[Misc] Add CustomOp Interface to UnquantizedFusedMoEMethod #6289
Conversation
Does this need to be added to the fp8 method as well? Or are we handling quantization separately? https://github.com/vllm-project/vllm/blob/main/vllm/model_executor/layers/quantization/fp8.py#L220
@robertgshaw2-neuralmagic We haven't used the …
I think it's okay to leave it for now and make the modifications once we have a need for it.
This PR seems to break Mixtral. Let me check the reason.
What TP is it running at? @WoosukKwon
@comaniac Could you please take a look? The PR removes a few lines of code in the model loader that you marked as FIXME.
That FIXME can be removed safely. Please let me know if the test still fails and I'll take a look.
@comaniac Thanks for the confirmation! It works well.
Currently, UnquantizedFusedMoEMethod directly imports the Triton fused MoE kernel and related CUDA kernels, preventing other hardware backends from supporting MoE models. This PR adds the CustomOp interface to it so that the kernels are imported only for NVIDIA and AMD GPUs.
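For readers unfamiliar with the pattern, here is a minimal sketch of how a CustomOp-style dispatch can defer GPU-only kernel imports into a `forward_cuda` method so that other backends never trigger them. This is an illustrative sketch, not vLLM's actual implementation: the class `UnquantizedFusedMoEMethodSketch`, the `forward_*` method split, and the commented-out `fused_moe` import path are assumptions made for the example.

```python
# Sketch of the CustomOp dispatch pattern (assumed names, not vLLM's code):
# the Triton/CUDA kernel import is deferred into forward_cuda, so merely
# importing this module does not require GPU kernels.
import torch


class CustomOp(torch.nn.Module):
    """Dispatches forward() to a backend-specific implementation."""

    def forward(self, *args, **kwargs):
        if torch.cuda.is_available():
            # ROCm builds of PyTorch also expose torch.cuda, so this branch
            # covers both NVIDIA and AMD GPUs.
            return self.forward_cuda(*args, **kwargs)
        return self.forward_native(*args, **kwargs)

    def forward_native(self, *args, **kwargs):
        raise NotImplementedError

    def forward_cuda(self, *args, **kwargs):
        # Fall back to the native path unless a subclass overrides this.
        return self.forward_native(*args, **kwargs)


class UnquantizedFusedMoEMethodSketch(CustomOp):
    """Hypothetical stand-in for UnquantizedFusedMoEMethod."""

    def forward_cuda(self, hidden_states, w1, w2, router_logits, top_k):
        # The fused MoE kernel would be imported lazily here, inside the
        # GPU-only path, e.g. (hypothetical import path):
        # from vllm.model_executor.layers.fused_moe import fused_moe
        # return fused_moe(hidden_states, w1, w2, router_logits, top_k)
        raise NotImplementedError("GPU-only fused MoE path")

    def forward_native(self, hidden_states, w1, w2, router_logits, top_k):
        raise NotImplementedError("Other backends plug in their own kernel")
```

The design point is simply that the import of the Triton kernel happens inside the GPU-specific method rather than at module import time, which is what lets non-GPU backends load MoE models without pulling in CUDA dependencies.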