[Doc] Explicitly state that InternVL 2.5 is supported #10978
Conversation
Signed-off-by: DarkLight1337 <[email protected]>
👋 Hi! Thank you for contributing to the vLLM project. Once the PR is approved and ready to go, your PR reviewer(s) can run CI to test the changes comprehensively before merging. 🚀
I'm not completely sure where to get the stop tokens though. The link in …
You can get the stop tokens here: https://huggingface.co/OpenGVLab/InternVL2_5-4B/blob/9e3bfef341bf84ca3efed094ea6c598e6b34f527/conversation.py#L335-L391. It seems they deleted the stop tokens from the README.
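For anyone landing here later, a minimal sketch of passing those stop tokens to vLLM via `SamplingParams`; the `<|im_end|>` value is an assumption based on the Qwen2 chat template this variant uses, and the authoritative list lives in the `conversation.py` linked above:

```python
from vllm import LLM, SamplingParams

# Assumed stop-token list; verify against the model repo's conversation.py.
stop_tokens = ["<|im_end|>"]

llm = LLM(model="OpenGVLab/InternVL2_5-4B", trust_remote_code=True)
params = SamplingParams(temperature=0.0, max_tokens=128, stop=stop_tokens)

outputs = llm.generate("Describe this model in one sentence.", params)
print(outputs[0].outputs[0].text)
```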
I see, let me update the link then, thanks for pointing that out!
So basically, I should include the …
LGTM!
Can you show the full stack trace?
(RayWorkerWrapper pid=781) INFO 01-19 22:13:17 model_runner.py:1415] Capturing cudagraphs for decoding. This may lead to unexpected consequences if the model is not static. To run the model in eager mode, set 'enforce_eager=True' or use '--enforce-eager' in the CLI. If out-of-memory error occurs during cudagraph capture, consider decreasing … 0%| | 0/61 [00:00<?, ?it/s]
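As the log itself suggests, a quick way to check whether CUDA graph capture is involved is to rerun in eager mode. A minimal sketch, with the model path and tensor-parallel size as placeholder assumptions:

```python
from vllm import LLM

# enforce_eager=True skips cudagraph capture entirely, trading some decode
# throughput for lower memory use at startup.
llm = LLM(
    model="OpenGVLab/InternVL2_5-78B",  # placeholder: the model being debugged here
    trust_remote_code=True,
    tensor_parallel_size=4,             # placeholder: matches a multi-GPU (Ray) setup
    enforce_eager=True,
)
```

The CLI equivalent is passing `--enforce-eager` to `vllm serve`, as the log message notes.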
Additionally, I used the same code and environment to run inference with InternVL2_5-8B successfully. Maybe the vision part and the language part (Qwen2) are not well matched, so the wrong tensor is delivered from the vision part to the language part.
@LaoWangGB Can you try the InternVL2.5-26B model as well? If it also occurs on 26B, this might be the case.
InternVL2.5-26B works well, so the problem seems to come from the difference in language models. Did you test 78B successfully?
The model architecture of InternVL2.5 is the same as InternVL2 except for a different LM backbone. We have already implemented dynamic LM loading for this model, so no further changes are needed to support it in vLLM.
I have tested the 4B model locally (`vllm serve OpenGVLab/InternVL2_5-4B`) and it seems to be working fine.
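For completeness, once the server is up it exposes an OpenAI-compatible API. A minimal sketch of querying it, assuming the default port 8000 (and `--trust-remote-code` passed at launch if the model requires it):

```python
from openai import OpenAI

# vLLM's server implements the OpenAI chat completions protocol;
# the API key is unused by default but the client requires one.
client = OpenAI(base_url="http://localhost:8000/v1", api_key="EMPTY")

resp = client.chat.completions.create(
    model="OpenGVLab/InternVL2_5-4B",
    messages=[{"role": "user", "content": "What model are you?"}],
)
print(resp.choices[0].message.content)
```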