
[Misc] Add CustomOp Interface to UnquantizedFusedMoEMethod #6289

Merged — 5 commits into main from moe-backend, Jul 15, 2024

Conversation

@WoosukKwon (Collaborator, Author)

Currently, UnquantizedFusedMoEMethod directly imports the Triton fused MoE kernel and related CUDA kernels, preventing other hardware backends from supporting MoE models. This PR adds the CustomOp interface to it so that the kernels are imported only for NVIDIA and AMD GPUs.
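For context, a minimal sketch of the dispatch pattern the CustomOp interface provides (not the actual vLLM code; the class layout, method signatures, and the fused_moe import path are illustrative): the base class routes forward() to a backend-specific forward_* method, and only the GPU path imports the Triton fused-MoE kernel, so other backends never trigger that import.

```python
import torch
import torch.nn as nn


class CustomOp(nn.Module):
    """Dispatch forward() to a platform-specific implementation."""

    def forward(self, *args, **kwargs):
        # NVIDIA and AMD (ROCm) builds both report torch.cuda as available.
        if torch.cuda.is_available():
            return self.forward_cuda(*args, **kwargs)
        return self.forward_native(*args, **kwargs)

    def forward_native(self, *args, **kwargs):
        raise NotImplementedError

    def forward_cuda(self, *args, **kwargs):
        # Default: fall back to the native path.
        return self.forward_native(*args, **kwargs)


class UnquantizedFusedMoEMethod(CustomOp):
    """MoE method that only pulls in the Triton/CUDA kernels on GPU."""

    def forward_cuda(self, hidden_states, w1, w2, router_logits, top_k):
        # The Triton fused-MoE kernel is imported lazily here, so backends
        # without CUDA/ROCm never import it. (Illustrative import path and
        # argument list, not the exact vLLM API.)
        from vllm.model_executor.layers.fused_moe import fused_moe
        return fused_moe(hidden_states, w1, w2, router_logits, top_k,
                         renormalize=True)

    def forward_native(self, hidden_states, w1, w2, router_logits, top_k):
        # A non-GPU backend would plug its own fused-MoE implementation here.
        raise NotImplementedError(
            "No fused-MoE implementation for this backend.")
```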

@robertgshaw2-redhat (Collaborator)

Does this need to be added to the fp8 method as well? Or are we handling quantization separately?

https://github.com/vllm-project/vllm/blob/main/vllm/model_executor/layers/quantization/fp8.py#L220

@WoosukKwon (Collaborator, Author)

@robertgshaw2-neuralmagic We haven't used the CustomOp interface for the quantization-related ops, since they usually only support NVIDIA or AMD GPUs. Do you want to apply the interface to the quant ops?

@robertgshaw2-redhat (Collaborator)

> @robertgshaw2-neuralmagic We haven't used the CustomOp interface for the quantization-related ops, since they usually only support NVIDIA or AMD GPUs. Do you want to apply the interface to the quant ops?

I think it's okay to leave it for now and make the modifications once we have a need for it.

@WoosukKwon (Collaborator, Author)

This PR seems to break Mixtral. Let me check the reason.

@robertgshaw2-redhat (Collaborator)

What TP (tensor-parallel) size is it running at, @WoosukKwon?

@WoosukKwon (Collaborator, Author)

@comaniac Could you please take a look? This PR removes a few lines of code in the model loader that you had marked as FIXME.

WoosukKwon added the ready label (ONLY add when PR is ready to merge/full CI is needed) on Jul 15, 2024.
@comaniac (Collaborator)

> @comaniac Could you please take a look? This PR removes a few lines of code in the model loader that you had marked as FIXME.

That FIXME can safely be removed. Please let me know if the test still fails and I'll take a look.

@WoosukKwon (Collaborator, Author)

@comaniac Thanks for the confirmation! It works well.

WoosukKwon enabled auto-merge (squash) on July 15, 2024 at 18:35.
WoosukKwon merged commit ec9933f into main on Jul 15, 2024 — 89 checks passed.
WoosukKwon deleted the moe-backend branch on July 15, 2024 at 19:23.
dtrifiro pushed a commit to opendatahub-io/vllm that referenced this pull request Jul 17, 2024
fialhocoelho pushed a commit to opendatahub-io/vllm that referenced this pull request Jul 19, 2024
Alvant pushed a commit to compressa-ai/vllm that referenced this pull request Oct 26, 2024