[core] separate builder init and builder prepare for each batch #12253

Merged
15 commits merged into vllm-project:main on Jan 22, 2025

youkaichao
Member

@youkaichao youkaichao commented Jan 21, 2025

Right now we create the input builder instance for every batch, and hence also create the attention metadata builder instance for every batch. As a result, no global information is available at the point where we build the input for each batch.

For the upgrade to flashinfer 0.2 (see #11194), the attention metadata builder needs access to global information, such as the sliding window, so that it can call the plan function before the flashinfer wrapper is used.

This PR fixes the problem by separating builder init from the per-batch builder prepare:

  • We can store global config once, when we create the builder.
  • For each batch, the builder calls prepare (we could also rename it to reset) to set up its state for the current batch, and it can now access the global config stored during construction.
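
A minimal sketch of the init/prepare split described above. The class names (`ToyInputBuilder`, `ToyMetadataBuilder`, `ToyAttentionMetadata`) and the `add_seq`/`build` methods are hypothetical stand-ins for illustration, not the actual vLLM classes; only `sliding_window`, `block_size`, and `prepare` are taken from this PR.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class ToyInputBuilder:
    """Hypothetical holder of global config (not the real vLLM input builder)."""
    sliding_window: Optional[int]
    block_size: int


@dataclass
class ToyAttentionMetadata:
    """Hypothetical per-batch result that also carries the global config."""
    seq_lens: List[int]
    sliding_window: Optional[int]
    block_size: int


class ToyMetadataBuilder:
    def __init__(self, input_builder: ToyInputBuilder) -> None:
        # Global config is captured once, at construction time.
        self.sliding_window = input_builder.sliding_window
        self.block_size = input_builder.block_size
        self.prepare()

    def prepare(self) -> None:
        # Per-batch state is reset here, so one instance serves every batch.
        self.seq_lens: List[int] = []

    def add_seq(self, seq_len: int) -> None:
        self.seq_lens.append(seq_len)

    def build(self) -> ToyAttentionMetadata:
        # The global config stored in __init__ is available when building.
        return ToyAttentionMetadata(
            seq_lens=list(self.seq_lens),
            sliding_window=self.sliding_window,
            block_size=self.block_size,
        )


# One builder instance is created up front and reused across batches.
builder = ToyMetadataBuilder(ToyInputBuilder(sliding_window=4096, block_size=16))
results = []
for batch in [[5, 7], [3]]:
    builder.prepare()  # clear state left over from the previous batch
    for seq_len in batch:
        builder.add_seq(seq_len)
    results.append(builder.build())
```

With this shape, adding another global value only touches `__init__`, while the per-batch path stays limited to `prepare` and `build`.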

👋 Hi! Thank you for contributing to the vLLM project.
Just a reminder: PRs do not trigger a full CI run by default. Instead, only the fastcheck CI runs, which covers a small and essential subset of CI tests to quickly catch errors. You can run other CI tests on top of those by going to your fastcheck build on the Buildkite UI (linked in the PR checks section) and unblocking them. If you do not have permission to unblock, ping simon-mo or khluu to add you to our Buildkite org.

Once the PR is approved and ready to go, your PR reviewer(s) can run CI to test the changes comprehensively before merging.

To run CI, PR reviewers can do one of these:

  • Add ready label to the PR
  • Enable auto-merge.

🚀

@youkaichao youkaichao marked this pull request as draft January 21, 2025 08:42
Signed-off-by: youkaichao <[email protected]>
@youkaichao youkaichao marked this pull request as ready for review January 21, 2025 09:13
Signed-off-by: youkaichao <[email protected]>
@comaniac
Collaborator

cc @elfiegg @pavanimajety for the FlashInfer 0.2 upgrade

@pavanimajety
Contributor

Thanks Kaichao! Nice change, LGTM.

self.runner = input_builder.runner

self.sliding_window = input_builder.sliding_window
self.block_size = input_builder.block_size
Contributor

We probably need sm_scale too, but that can go in with the FlashInfer PR.

Member Author

Yep, this PR just makes it easier to add more values to the builder. We can add them in the FlashInfer PR.

@youkaichao youkaichao enabled auto-merge (squash) January 22, 2025 03:17
@github-actions github-actions bot added the ready ONLY add when PR is ready to merge/full CI is needed label Jan 22, 2025
Signed-off-by: youkaichao <[email protected]>
@youkaichao youkaichao disabled auto-merge January 22, 2025 06:13
@youkaichao youkaichao merged commit 66818e5 into vllm-project:main Jan 22, 2025
10 of 17 checks passed
@youkaichao youkaichao deleted the builder branch January 22, 2025 06:13
3 participants