Add renormalize parameter for FusedMOE's & modify experts_max arg of mixture_of_experts() #70

tangleintel · 2025-01-09T06:22:02Z

Add a renormalize parameter to FusedMOE, because whether routing_weights should be normalized depends on the norm_topk_prob attribute in the model's config.json file. Some models, such as Qwen2-MoE, set it to false.
The experts_max parameter is inclusive according to Habana's documentation.
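
To make the two changes concrete, here is a minimal pure-Python sketch of what a renormalize flag typically controls in MoE routing, plus the index arithmetic implied by an inclusive experts_max. It is not the code in vllm_hpu_extension/ops.py, which operates on torch tensors and whose body is not shown in this PR page. The function names `route` and `last_expert_index` are hypothetical, and the softmax-then-top-k order and zero-based expert indexing are assumptions.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def route(scores, topk, renormalize=True):
    """Hypothetical per-token router: returns (expert_ids, weights).

    Assumes softmax over all experts followed by top-k selection.
    With renormalize=True the kept weights are rescaled to sum to 1,
    which corresponds to norm_topk_prob=true in a model config.
    """
    probs = softmax(scores)
    # Indices of the topk largest probabilities, largest first.
    ids = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)[:topk]
    weights = [probs[i] for i in ids]
    if renormalize:
        kept = sum(weights)
        weights = [w / kept for w in weights]
    return ids, weights


def last_expert_index(num_experts):
    """Hypothetical helper: with an inclusive upper bound and zero-based
    indexing, the last valid expert index is num_experts - 1."""
    return num_experts - 1


if __name__ == "__main__":
    scores = [2.0, 1.0, 0.5, 0.1]
    print(route(scores, topk=2, renormalize=True))
    print(route(scores, topk=2, renormalize=False))
    print(last_expert_index(8))
```

With renormalize=False the selected weights keep their original softmax mass, so they sum to less than 1 whenever topk is smaller than the number of experts. That is the behavior a model with norm_topk_prob set to false expects.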

…torch.ops.hpu.mixture_of_experts()

michalkuligowski · 2025-01-14T09:38:30Z

vllm_hpu_extension/ops.py

@@ -357,30 +357,32 @@ def forward(self, state, expert_id, w):
        return torch.matmul(state, w[expert_id].transpose(0, 1))


-def calculate_routing_tensors(score, topk, hidden_states_dtype):
+def calculate_routing_tensors(score, topk, hidden_states_dtype, renormalize: bool = True):


Can you add a small test for this to the vllm-fork CI?

I found it hard to demonstrate this with a unit test, because it depends on a real model with trained weights: you have to compare the generated content with and without this PR. We can refer to optimum-habana's results and CUDA's results, but I don't think it's realistic to put them into a unit test. Do you have any suggestions?

michalkuligowski · 2025-01-16T08:24:22Z

We define tests in https://github.com/HabanaAI/vllm-fork/blob/habana_main/.jenkins/test_config.yaml. Please refer to that configuration file to see how a small test for FusedMoE with renormalize set to True and False could be invoked.
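
One possible shape for a weight-free check of both settings is sketched below. It tests properties of the routing weights rather than generated text, so it needs no trained model. This is only an assumption about what such a test could assert: `reference_routing_weights` and `check_both_modes` are hypothetical names, the reference is pure Python rather than the HPU op, and wiring it into .jenkins/test_config.yaml is not shown because that file's schema is not part of this page.

```python
import math
import random


def reference_routing_weights(scores, topk, renormalize):
    """Pure-Python reference: softmax, keep the topk largest, optionally rescale."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    top = sorted((e / total for e in exps), reverse=True)[:topk]
    if renormalize:
        kept = sum(top)
        top = [w / kept for w in top]
    return top


def check_both_modes(num_experts=8, topk=2, trials=100, seed=0):
    """Property checks on random scores; assumes 2 <= topk < num_experts."""
    rng = random.Random(seed)
    for _ in range(trials):
        scores = [rng.uniform(-4.0, 4.0) for _ in range(num_experts)]
        on = reference_routing_weights(scores, topk, renormalize=True)
        off = reference_routing_weights(scores, topk, renormalize=False)
        # Renormalized weights form a distribution over the kept experts.
        assert math.isclose(sum(on), 1.0, rel_tol=1e-9)
        # Raw top-k weights keep only part of the softmax mass.
        assert sum(off) < 1.0
        # Rescaling must not change the relative weighting of experts.
        assert math.isclose(on[0] / on[1], off[0] / off[1], rel_tol=1e-9)
    return True


if __name__ == "__main__":
    print(check_both_modes())
```

A real CI test would compare the output of the fused op against such a reference for renormalize True and False instead of checking the reference against itself.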

Commit 80ad23f: add renormalize parameter for FusedMOE's & modify experts_max arg of torch.ops.hpu.mixture_of_experts()

tangleintel mentioned this pull request Jan 9, 2025

add renormalize param for FusedMOE HabanaAI/vllm-fork#671

Open

michalkuligowski requested a review from kwisniewski98 January 13, 2025 07:08

kwisniewski98 approved these changes Jan 13, 2025

michalkuligowski reviewed Jan 14, 2025

kwisniewski98 mentioned this pull request Jan 14, 2025

Resolved ALIBI bias regression due to porting flat PA HabanaAI/vllm-fork#503

Open

tangleintel requested a review from michalkuligowski January 15, 2025 01:44

michalkuligowski requested changes Jan 16, 2025
