[VLM][Core] Support profiling with multiple multi-modal inputs per prompt #7126

Merged: 55 commits into vllm-project:main, Aug 14, 2024

DarkLight1337
Member

@DarkLight1337 DarkLight1337 commented Aug 4, 2024

The calculation of get_max_multimodal_tokens is designed for a single instance of multi-modal data (e.g. image), so it is inconsistent with dummy data when the dummy data contains multiple instances of multi-modal data.

To support the above case, this PR introduces the --limit-mm-per-prompt argument which limits how many instances of multi-modal data are allowed per prompt. During profiling, the total number of multimodal tokens for a given modality can be obtained by multiplying the result of get_max_multimodal_tokens by the corresponding limit.
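As a rough illustration of the calculation described above, the sketch below multiplies a per-instance token maximum by the per-prompt limit for each modality. The function name `total_mm_tokens`, the default of 1 for unlisted modalities, and the example numbers are assumptions made for this sketch and are not taken from the vLLM code; only the multiplication itself comes from the description.

```python
from typing import Dict, Mapping


def total_mm_tokens(
    max_tokens_per_item: Mapping[str, int],
    limit_per_prompt: Mapping[str, int],
) -> Dict[str, int]:
    """Return the worst-case token count per modality for one prompt.

    max_tokens_per_item: what get_max_multimodal_tokens would report for a
        single instance of each modality (the values used below are made up).
    limit_per_prompt: the --limit-mm-per-prompt setting; modalities that are
        not listed are assumed to default to 1 in this sketch.
    """
    return {
        modality: max_tokens * limit_per_prompt.get(modality, 1)
        for modality, max_tokens in max_tokens_per_item.items()
    }


# Hypothetical numbers: 576 tokens per image, 100 per audio clip.
totals = total_mm_tokens({"image": 576, "audio": 100}, {"image": 2})
print(totals)
```

With a limit of 2 images, the image budget doubles while the audio budget stays at its single-instance value.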

Checklist

  • Update MultiModalConfig and CLI args with the new argument
  • Update the calculation for the total number of multimodal tokens
  • Enforce the limit during profiling (InputRegistry.dummy_data_for_profiling)
  • Enforce the limit during inference (MultiModalRegistry.map_input)
  • Add corresponding tests (except for calculation and profiling)
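The enforcement items in the checklist can be pictured with a minimal sketch like the one below. It is not the code in `InputRegistry.dummy_data_for_profiling` or `MultiModalRegistry.map_input`; the helper `check_mm_limit`, its signature, and the default limit of 1 are hypothetical, and the error wording is only modeled on the message quoted later in this conversation.

```python
from typing import Mapping, Sequence


def check_mm_limit(
    modality: str,
    items: Sequence[object],
    limit_per_prompt: Mapping[str, int],
) -> None:
    """Raise ValueError if a prompt carries more items than allowed."""
    limit = limit_per_prompt.get(modality, 1)  # assumed default of 1
    if len(items) > limit:
        raise ValueError(
            f"At most {limit} {modality}(s) may be provided in one prompt, "
            f"but got {len(items)}."
        )


# Two images with a limit of 2: accepted, nothing is raised.
check_mm_limit("image", ["img0", "img1"], {"image": 2})

# Two images with no configured limit: rejected under the assumed default.
try:
    check_mm_limit("image", ["img0", "img1"], {})
except ValueError as exc:
    rejected = str(exc)
```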

@DarkLight1337 DarkLight1337 requested a review from ywang96 August 4, 2024 15:33
@DarkLight1337 DarkLight1337 changed the title [VLM] Support profiling for multiple multi-modal inputs per prompt [VLM][Core] Support profiling for multiple multi-modal inputs per prompt Aug 4, 2024

github-actions bot commented Aug 4, 2024

👋 Hi! Thank you for contributing to the vLLM project.
Just a reminder: PRs do not trigger a full CI run by default. Instead, they only run the fastcheck CI, which consists of a small, essential subset of CI tests to quickly catch errors. You can run other CI tests on top of the default ones by unblocking the steps in your fast-check build in the Buildkite UI.

Once the PR is approved and ready to go, please make sure to run full CI as it is required to merge (or just use auto-merge).

To run full CI, you can do one of these:

  • Comment /ready on the PR
  • Add ready label to the PR
  • Enable auto-merge.

🚀

@DarkLight1337 DarkLight1337 changed the title [VLM][Core] Support profiling for multiple multi-modal inputs per prompt [VLM][Core] Support profiling with multiple multi-modal inputs per prompt Aug 4, 2024
Member

@ywang96 ywang96 left a comment

@DarkLight1337 Left a few comments - PTAL!

vllm/engine/arg_utils.py (outdated, resolved)
@@ -180,6 +181,7 @@ def __init__(
log_stats: bool,
usage_context: UsageContext = UsageContext.ENGINE_CONTEXT,
stat_loggers: Optional[Dict[str, StatLoggerBase]] = None,
input_registry: InputRegistry = INPUT_REGISTRY,
Member

Why do we need to make this a parameter of __init__?

Member Author

Compared to assigning the global INPUT_REGISTRY directly to the instance attribute, this makes it easier to see the dependencies of LLMEngine.
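The pattern described in this reply can be sketched roughly as follows. `Registry`, `DEFAULT_REGISTRY`, and `Engine` are stand-ins invented for this example, not vLLM classes; the point is only that a global default passed through `__init__` shows up in the signature. The ability to pass a substitute in tests is a side effect of the pattern and is not something claimed in the thread.

```python
class Registry:
    """Stand-in for a registry type such as InputRegistry."""

    def __init__(self, name: str) -> None:
        self.name = name


# Module-level default instance, analogous to a global like INPUT_REGISTRY.
DEFAULT_REGISTRY = Registry("default")


class Engine:
    """Hypothetical engine that receives its registry through __init__."""

    def __init__(self, registry: Registry = DEFAULT_REGISTRY) -> None:
        # The dependency is visible in the signature and can be swapped out.
        self.registry = registry


engine = Engine()  # uses the global default
test_engine = Engine(registry=Registry("fake-for-tests"))  # explicit override
```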

vllm/inputs/registry.py (outdated, resolved)
vllm/engine/arg_utils.py (resolved)
Comment on lines 135 to +136
  if multimodal_config is None:
-     raise ValueError("Provide vision related configurations "
+     raise ValueError("Provide multi-modal related configurations "
Member

Now looking at the previous piece of code, is it ever possible that multimodal_config is None? If not, then this should probably be assert multimodal_config is not None?

Member Author

@DarkLight1337 DarkLight1337 Aug 14, 2024

Yeah, it can't be None now. It's a holdover from the previous implementation of config... we can remove this in a later PR since quite a few files have to be changed.

vllm/multimodal/registry.py (outdated, resolved)
Comment on lines +87 to +88
input_registry: InputRegistry = INPUT_REGISTRY,
mm_registry: MultiModalRegistry = MULTIMODAL_REGISTRY,
Member

Ditto for having these as input variables

Member

@ywang96 ywang96 left a comment

LGTM!

@ywang96 ywang96 enabled auto-merge (squash) August 14, 2024 16:36
@ywang96 ywang96 merged commit 3f674a4 into vllm-project:main Aug 14, 2024
52 checks passed
@DarkLight1337 DarkLight1337 deleted the multi-mm-profiling branch August 14, 2024 23:39
.get_max_multimodal_tokens(model_config)
input_registry = self.input_registry
mm_registry = self.mm_registry
mm_registry.init_mm_limits_per_prompt(model_config, mm_config)
Contributor

@AllenDou AllenDou Aug 15, 2024

How about moving mm_registry.init_mm_limits_per_prompt into the model runner's __init__ phase? Some model runners don't have a profiling run phase, and the same goes for enc_dec_model_runner and xpu_model_runner.

Member

I think that's a good point - I assume this is regarding generating embeddings from an LMM? WDYT? @DarkLight1337

Member Author

Yeah, it should be fine to move it to __init__. Can you also implement this in #7530?

Member Author

@DarkLight1337 DarkLight1337 Aug 15, 2024

Perhaps we need to factor out the profiling + input mapping logic into its own class, so that _limits_by_model is tracked somewhere close to the model runner instead of inside MultiModalRegistry itself.

Member

Yea I'm doing it in #7530 @AllenDou @DarkLight1337
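The suggestion in this thread, initializing the limits when the runner is constructed rather than during a profiling run, might look roughly like the sketch below. `LimitTracker` and `RunnerWithoutProfiling` are hypothetical names for illustration only; they are not the classes introduced in #7530, and the default of 1 for unlisted modalities is an assumption.

```python
from typing import Dict, Mapping


class LimitTracker:
    """Hypothetical holder for per-prompt limits (not vLLM's registry)."""

    def __init__(self) -> None:
        self.limits: Dict[str, int] = {}

    def init_limits(self, configured: Mapping[str, int]) -> None:
        self.limits = dict(configured)


class RunnerWithoutProfiling:
    """Hypothetical runner that never executes a profiling run."""

    def __init__(self, tracker: LimitTracker, configured: Mapping[str, int]) -> None:
        self.tracker = tracker
        # Initializing here means the limits exist even though no
        # profiling phase is ever executed for this runner.
        self.tracker.init_limits(configured)

    def max_items(self, modality: str) -> int:
        # Unlisted modalities fall back to an assumed default of 1.
        return self.tracker.limits.get(modality, 1)


runner = RunnerWithoutProfiling(LimitTracker(), {"image": 2})
```

Because the limits are set in the constructor, input mapping can rely on them whether or not the runner ever profiles.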

kylesayrs pushed a commit to neuralmagic/vllm that referenced this pull request Aug 17, 2024
@xyfZzz

xyfZzz commented Aug 19, 2024

Hi~ Does vllm support multiple image input now?

@ywang96
Member

ywang96 commented Aug 19, 2024

> Hi~ Does vllm support multiple image input now?

@xyfZzz Not yet - This PR itself allows profiling with multiple image input but there are still a few things we need to do to enable multi-image input for inference. Stay tuned!

fialhocoelho pushed a commit to opendatahub-io/vllm that referenced this pull request Aug 22, 2024
@xyfZzz

xyfZzz commented Sep 8, 2024

> > Hi~ Does vllm support multiple image input now?
>
> @xyfZzz Not yet - This PR itself allows profiling with multiple image input but there are still a few things we need to do to enable multi-image input for inference. Stay tuned!

Thanks! Since another three weeks have passed, I would like to ask if vllm now supports multiple image inputs?

@DarkLight1337
Member Author

Yes, it's supported now. Please check out the docs.

@xyfZzz

xyfZzz commented Sep 9, 2024

> Yes, it's supported now. Please check out the docs.

@DarkLight1337 Hi~ I installed the latest main branch of vllm and deployed MiniCPM-V-2.6, but this error occurred when calling the openai style interface.

openai.BadRequestError: Error code: 400 - {'object': 'error', 'message': 'At most 1 image(s) may be provided in one request.', 'type': 'BadRequestError', 'param': None, 'code': 400}

Could you please help me find out why this error occurred?

@xyfZzz

xyfZzz commented Sep 9, 2024

> > Yes, it's supported now. Please check out the docs.
>
> @DarkLight1337 Hi~ I installed the latest main branch of vllm and deployed MiniCPM-V-2.6, but this error occurred when calling the openai style interface.
>
> openai.BadRequestError: Error code: 400 - {'object': 'error', 'message': 'At most 1 image(s) may be provided in one request.', 'type': 'BadRequestError', 'param': None, 'code': 400}
>
> Could you please help me find out why this error occurred?

I found the cause of the error. I should set --limit-mm-per-prompt image=2 when deploying.
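To make the `image=2` value above concrete, here is a minimal sketch of how a `MODALITY=COUNT` string could be turned into a per-modality limit. The function `parse_limit_mm_per_prompt` is hypothetical and is not vLLM's actual argument parser; support for a comma-separated list of pairs is an assumption of this sketch.

```python
from typing import Dict


def parse_limit_mm_per_prompt(value: str) -> Dict[str, int]:
    """Parse a string such as "image=2" or "image=2,video=1" into a dict."""
    limits: Dict[str, int] = {}
    for part in value.split(","):
        modality, sep, count = part.partition("=")
        if not sep or not modality.strip():
            raise ValueError(f"Expected MODALITY=COUNT, got {part!r}")
        limits[modality.strip()] = int(count)
    return limits


limits = parse_limit_mm_per_prompt("image=2")
print(limits)
```

With a limit of 2 for images, a request carrying two images would no longer hit the "At most 1 image(s)" check quoted above.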

Alvant pushed a commit to compressa-ai/vllm that referenced this pull request Oct 26, 2024
KuntaiDu pushed a commit to KuntaiDu/vllm that referenced this pull request Nov 20, 2024