-
-
Notifications
You must be signed in to change notification settings - Fork 5.3k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[VLM][Core] Support profiling with multiple multi-modal inputs per prompt #7126
Conversation
👋 Hi! Thank you for contributing to the vLLM project. Once the PR is approved and ready to go, please make sure to run full CI as it is required to merge (or just use auto-merge). To run full CI, you can do one of these:
🚀 |
…ted metadata in the future
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@DarkLight1337 Left a few comments - PTAL!
@@ -180,6 +181,7 @@ def __init__( | |||
log_stats: bool, | |||
usage_context: UsageContext = UsageContext.ENGINE_CONTEXT, | |||
stat_loggers: Optional[Dict[str, StatLoggerBase]] = None, | |||
input_registry: InputRegistry = INPUT_REGISTRY, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why do we need to make this an variable of __init__
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Compared to assigning the global INPUT_REGISTRY
directly to the instance attribute, this makes it easier to see the dependencies of LLMEngine
.
if multimodal_config is None: | ||
raise ValueError("Provide vision related configurations " | ||
raise ValueError("Provide multi-modal related configurations " |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Now looking at the previous piece of code, is it ever possible that multimodal_config is None
? If not, then this should probably be assert multimodal_config is not None
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, it can't be None
now. It's a holdover from the previous implementation of config... we can remove this in a later PR since quite a few files have to be changed.
input_registry: InputRegistry = INPUT_REGISTRY, | ||
mm_registry: MultiModalRegistry = MULTIMODAL_REGISTRY, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ditto for having these as input variables
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM!
.get_max_multimodal_tokens(model_config) | ||
input_registry = self.input_registry | ||
mm_registry = self.mm_registry | ||
mm_registry.init_mm_limits_per_prompt(model_config, mm_config) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
How about moving mm_registry.init_mm_limits_per_prompt
into the model runner's __init__
phase? As some model runners don't have a profiling run phase, as well as enc_dec_model_runner
and xpu_model_runner
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think that's a good point - I assume this is regarding generating embeddings from a LMM? WDYT? @DarkLight1337
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, it should be fine to move it to __init__
. Can you also implement this in #7530?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Perhaps we need to factor out the profiling + input mapping logic into its own class. (so that _limits_by_model
is kept track somewhere close to the model runner instead of inside MultiModalRegistry
itself)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yea I'm doing it in #7530 @AllenDou @DarkLight1337
Hi~ Does vllm support multiple image input now? |
@xyfZzz Not yet - This PR itself allows profiling with multiple image input but there are still a few things we need to do to enable multi-image input for inference. Stay tuned! |
Thanks! Since another three weeks have passed, I would like to ask if vllm now supports multiple image inputs? |
Yes, it's supported now. Please check out the docs. |
@DarkLight1337 Hi~ I installed the latest main branch of vllm and deployed MiniCPM-V-2.6, but this error occurred when calling the openai style interface.
Could you please help me find out why this error occurred? |
I found the cause of the error. I should set |
…ompt (vllm-project#7126) Signed-off-by: Alvant <[email protected]>
The calculation of
get_max_multimodal_tokens
is designed for a single instance of multi-modal data (e.g. image), so it is inconsistent with dummy data when the dummy data contains multiple instances of multi-modal data.To support the above case, this PR introduces the
--limit-mm-per-prompt
argument which limits how many instances of multi-modal data are allowed per prompt. During profiling, the total number of multimodal tokens for a given modality can be obtained by multiplying the result ofget_max_multimodal_tokens
by the corresponding limit.Checklist
MultiModalConfig
and CLI args with the new argumentInputRegistry.dummy_data_for_profiling
)MultiModalRegistry.map_input
)