Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Update references to Hugging Face DLC for TGI #816

Merged
merged 3 commits into from
Oct 17, 2024

Conversation

alvarobartt
Copy link
Collaborator

Description

This PR updates a bunch of outdated references to Text Generation Inference (TGI) container, as previously those were pointing to the GitHub Container Registry where Hugging Face publishes those containers i.e. ghcr.io/huggingface/text-generation-inference; but since the Hugging Face DLCs on Google Cloud were recently released publicly (see Google-Cloud-Containers), now those references have been updated to point to the DLCs hosted on Google Cloud's Artifact Registry.

cc @brandonroyal

Included a harmless TODO note so that we remember to come back to this
once the benchmark is verified with the latest Hugging Face DLC for TGI
cc @annapendleton
@annapendleton
Copy link
Collaborator

/gcbrun

@annapendleton
Copy link
Collaborator

@alvarobartt I also added you as a collaborator so you can test + merge this yourself :)

@alvarobartt
Copy link
Collaborator Author

Hi here @annapendleton, so the CI is failing, but not sure that's something I can fix or related to any of the changes within this PR, could you help me debug the issue further if related to the PR? 🤗

@annapendleton
Copy link
Collaborator

Hi here @annapendleton, so the CI is failing, but not sure that's something I can fix or related to any of the changes within this PR, could you help me debug the issue further if related to the PR? 🤗

From what I can see in the logs, it looks like the test cluster isn't present. We don't use these tests ourselves for the benchmarking, I think we should add someone else who has a better sense of if it's safe to bypass this test or not.

@annapendleton
Copy link
Collaborator

@imreddy13 could you help us determine if this is a true test failure?

@alvarobartt
Copy link
Collaborator Author

Hi here @annapendleton @brandonroyal is there any update with regards to this PR? Thanks in advance 🤗

@annapendleton
Copy link
Collaborator

Hey @alvarobartt, I just got back from vacation. Let me retrigger these tests to see if they're passing.

@annapendleton
Copy link
Collaborator

/gcbrun

@annapendleton
Copy link
Collaborator

@alvarobartt tests are passing, let me know if you still want to merge :)

@alvarobartt
Copy link
Collaborator Author

Hi here @annapendleton thanks for handling!

So now the latest released DLC for TGI is 2.3 (not 2.2 as in this PR) do you think makes sense to just update that now and re-run the tests?

Also what's the strategy you usually follow to keep the documentation examples point to the latest containers in these cases? Or should we just open a PR here updating those on each release and then waiting for the tests to pass?

Thanks in advance 🤗

@annapendleton annapendleton merged commit 0400243 into GoogleCloudPlatform:main Oct 17, 2024
7 checks passed
@annapendleton
Copy link
Collaborator

@alvarobartt you'll have to open a new PR to keep the containers updated with each release you'll want to upgrade to for the DLC used in these example workloads. It'll be the same process as done here.

If you want to upgrade, we'll have to rerun the tests anyways, so let's do that in a new PR :) I merged this PR for the current container image.

@raushan2016
Copy link
Member

This looks very similar to the change below which has issue where the new image doesn't work for LLAMA multiple GPU.
GoogleCloudPlatform/kubernetes-engine-samples#1581

@annapendleton

@annapendleton
Copy link
Collaborator

@alvarobartt will you be able to help fix this in both repositories?

@alvarobartt
Copy link
Collaborator Author

Hi here @annapendleton indeed see this PR with the fix for the current repository (on a previously opened PR updating the TGI DLC)

#852

@annapendleton
Copy link
Collaborator

Ah I see, I had originally approved that fix, but the tests failed. Thanks for bringing it back to my attention, I re-triggered the tests!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants