
[Feature][Spec Decode] Simplify the use of Eagle Spec Decode #12304

Open · wants to merge 2 commits into main
Conversation

ShangmingCai
Contributor

This PR tries to implement #11943.


👋 Hi! Thank you for contributing to the vLLM project.
Just a reminder: PRs do not trigger a full CI run by default. Instead, only the fastcheck CI runs, which covers a small and essential subset of CI tests to catch errors quickly. You can run other CI tests on top of those by going to your fastcheck build in the Buildkite UI (linked in the PR checks section) and unblocking them. If you do not have permission to unblock, ping simon-mo or khluu to add you to our Buildkite org.

Once the PR is approved and ready to go, your PR reviewer(s) can run CI to test the changes comprehensively before merging.

To run CI, PR reviewers can do one of these:

  • Add the ready label to the PR
  • Enable auto-merge.

🚀

@ShangmingCai
Contributor Author

ShangmingCai commented Jan 22, 2025

@LiuXiaoxuanPKU Hello, I saw the RFC you wrote, so I took some time to implement this PR.

According to the conversion script, we have to transform config.json by wrapping it as

{
    "model_type": "eagle",
    "model": {$origin_json_content}
}

so that it is recognized as an EAGLEConfig. To handle this automatically, I made some changes inside vllm/config.py. I first tried to implement this in the create_worker of SpecDecodeWorker, but it turned out that some config options were already wrong by that point, so the transformation has to happen earlier, in vllm/config.py.
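
For reference, a minimal sketch of what that wrapping could look like (the function name wrap_as_eagle_config and its placement are hypothetical, not the actual code in this PR):

import json

def wrap_as_eagle_config(config_path: str) -> dict:
    """Wrap a plain draft-model config so it parses as an EAGLEConfig."""
    # Hypothetical helper illustrating the transformation described above.
    with open(config_path) as f:
        origin = json.load(f)
    # Configs already converted by the offline script are left untouched.
    if origin.get("model_type") == "eagle":
        return origin
    return {"model_type": "eagle", "model": origin}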

Also, we have to copy the lm_head.weight of the scorer model into the EAGLE checkpoint so that it can be loaded in vllm/model_executor/models/eagle.py. After some thought, I felt it would be more appropriate to implement this in the init_device of SpecDecodeWorker.

Since the scorer model may run with tensor parallelism, I use tensor_model_parallel_gather to collect and concatenate the weight shards first. I have run experiments with different TP sizes and verified correctness, but only with EAGLE-llama2-chat-7B and Llama-2-7b-chat-hf.
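
As a rough illustration of that step (a sketch only: scorer_model and draft_model are placeholder names, and the exact gather call in this PR may differ):

from vllm.distributed import tensor_model_parallel_gather

def copy_lm_head(scorer_model, draft_model) -> None:
    # ParallelLMHead shards lm_head.weight along the vocab dimension (dim 0),
    # so gather the shards from all TP ranks and concatenate them back.
    shard = scorer_model.lm_head.weight.data
    full_weight = tensor_model_parallel_gather(shard, dst=0, dim=0)
    if full_weight is not None:  # only the dst rank receives the full tensor
        draft_model.lm_head.weight.data.copy_(full_weight)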

Do you think we need to add more tests? It appears that the conversion script is not sufficient for EAGLE-Qwen2-7B-Instruct; I am seeing this error:

ValueError("Found bias in the loaded weights but the model config doesn't have bias")

@ShangmingCai
Contributor Author

BTW, this PR remains compatible with the old conversion-script workflow: a converted config.json and weight binary can still be loaded successfully. In testing, the draft acceptance rate is consistent across both methods with EAGLE-llama2-chat-7B.

@ShangmingCai ShangmingCai changed the title [Feature][Spec Decode]: Simplify the use of eagle spec decode [Feature][Spec Decode] Simplify the use of Eagle Spec Decode Jan 22, 2025