
[Feature][Spec Decode] Simplify the use of Eagle Spec Decode #12304

Open · wants to merge 2 commits into main
Conversation

ShangmingCai
Contributor

This PR tries to implement #11943.


👋 Hi! Thank you for contributing to the vLLM project.
Just a reminder: PRs do not trigger a full CI run by default. Instead, only the fastcheck CI runs, which covers a small and essential subset of CI tests to catch errors quickly. You can run other CI tests on top of those by going to your fastcheck build in the Buildkite UI (linked in the PR checks section) and unblocking them. If you do not have permission to unblock, ping simon-mo or khluu to add you to our Buildkite org.

Once the PR is approved and ready to go, your PR reviewer(s) can run CI to test the changes comprehensively before merging.

To run CI, PR reviewers can do one of these:

  • Add the ready label to the PR
  • Enable auto-merge.

🚀

@ShangmingCai
Contributor Author

ShangmingCai commented Jan 22, 2025

@LiuXiaoxuanPKU Hello, I saw the RFC you wrote, so I took some time to implement this PR.

According to the conversion script, we have to transform config.json by wrapping it as

{
    "model_type": "eagle",
    "model": {$origin_json_content}
}

so that it is recognized as an EAGLEConfig. To handle this automatically, I made some changes inside vllm/config.py. I first tried to implement this in the create_worker of SpecDecodeWorker, but it turned out that some config options were already wrong by that point, so the transformation has to happen earlier, in vllm/config.py.
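
For reference, a minimal sketch of what that wrapping could look like (the function name wrap_as_eagle_config and its placement are hypothetical, not the actual code in this PR):

import json

def wrap_as_eagle_config(config_path: str) -> dict:
    """Wrap a plain draft-model config so it parses as an EAGLEConfig."""
    # Hypothetical helper illustrating the transformation described above.
    with open(config_path) as f:
        origin = json.load(f)
    # Configs already converted by the offline script are left untouched.
    if origin.get("model_type") == "eagle":
        return origin
    return {"model_type": "eagle", "model": origin}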

Also, we have to copy the lm_head.weight of the scorer model into the EAGLE checkpoint so that it can be loaded in vllm/model_executor/models/eagle.py. After some thought, I felt it would be more appropriate to implement this in the init_device of SpecDecodeWorker.

Since the scorer model may run with tensor parallelism, I use tensor_model_parallel_gather to collect and concatenate the weight shards first. I have run experiments with different TP sizes and verified correctness, but only with EAGLE-llama2-chat-7B and Llama-2-7b-chat-hf.
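
As a rough illustration of that step (a sketch only: scorer_model and draft_model are placeholder names, and the exact gather call in this PR may differ):

from vllm.distributed import tensor_model_parallel_gather

def copy_lm_head(scorer_model, draft_model) -> None:
    # ParallelLMHead shards lm_head.weight along the vocab dimension (dim 0),
    # so gather the shards from all TP ranks and concatenate them back.
    shard = scorer_model.lm_head.weight.data
    full_weight = tensor_model_parallel_gather(shard, dst=0, dim=0)
    if full_weight is not None:  # only the dst rank receives the full tensor
        draft_model.lm_head.weight.data.copy_(full_weight)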

Do you think we need to add more tests? It appears that the conversion script is not sufficient for EAGLE-Qwen2-7B-Instruct; I am seeing this error:

ValueError("Found bias in the loaded weights but the model config doesn't have bias")

@ShangmingCai
Contributor Author

BTW, this PR remains compatible with the old conversion-script workflow: a converted config.json and weight binary can still be loaded successfully. In testing, the draft acceptance rate is consistent across both methods with EAGLE-llama2-chat-7B.

@ShangmingCai ShangmingCai changed the title [Feature][Spec Decode]: Simplify the use of eagle spec decode [Feature][Spec Decode] Simplify the use of Eagle Spec Decode Jan 22, 2025