[Kernel] Fix Flashinfer Correctness #7284
👋 Hi! Thank you for contributing to the vLLM project. Once the PR is approved and ready to go, please make sure to run full CI as it is required to merge (or just use auto-merge). To run full CI, you can do one of these:
🚀
/ready
wait how did this fix it? did the profile run corrupt something?
@simon-mo looks like the assumption that "prefill doesn't read KV cache" was incorrect.
Originally, we set
test failures are not related and are fixed in the main branch.
@@ -127,6 +127,7 @@ def __post_init__(self):
             raise ValueError(
                 f"Only {supported_head_sizes} are supported for head_dim,",
                 f"received {self.head_dim}.")
+        self.is_profile_run = is_block_tables_empty(self.block_tables)
@LiuXiaoxuanPKU I think this line is buggy: `self.block_tables` here is always a tensor, so `is_block_tables_empty` should always return False. From what I've checked, this happens to work for flashinfer v0.1.2 but not v0.1.3, which raises `RuntimeError: CHECK_EQ(paged_kv_indptr.size(0), batch_size + 1) failed. 1 vs 257` during the profile run. (This is because the `if self.is_profile_run` block never runs.)
I tried a quick fix of patching `is_block_tables_empty` to check whether block_tables has num_el == 0, which lets the profile run pass. But this introduces logic issues for prefill, which obviously has a 0-length block table as well. Not sure what the best fix for this is; I just wanted to raise this concern here.
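The concern above can be illustrated with a small, self-contained sketch. It is not the vLLM code: `FakeTensor`, `is_block_tables_empty_sketch`, and `is_empty_by_numel` are hypothetical names, and the sketch assumes that the emptiness check treats only `None` (or a dict whose values are all `None`) as empty, and that a tensor's element count is read with a `numel()` method. Under those assumptions, a tensor-valued block table is never reported as empty, while an element-count check cannot tell a profile run from a prefill.

```python
class FakeTensor:
    """Hypothetical stand-in for a tensor; only numel() is modeled."""

    def __init__(self, num_elements: int) -> None:
        self._num_elements = num_elements

    def numel(self) -> int:
        return self._num_elements


def is_block_tables_empty_sketch(block_tables) -> bool:
    # Assumption: only None, or a dict whose values are all None, is "empty".
    if block_tables is None:
        return True
    if isinstance(block_tables, dict) and all(
            value is None for value in block_tables.values()):
        return True
    # Any other object, including a zero-element tensor, is "not empty".
    return False


def is_empty_by_numel(block_tables: FakeTensor) -> bool:
    # The "quick fix" described in the comment: treat zero elements as empty.
    return block_tables.numel() == 0


# Assumption: block tables arrive as tensors in all three situations.
profile_run_tables = FakeTensor(0)
prefill_tables = FakeTensor(0)
decode_tables = FakeTensor(16)

# The original-style check never fires for a tensor, even an empty one.
profile_flag_original = is_block_tables_empty_sketch(profile_run_tables)

# The numel-based check fires for the profile run, but also for prefill.
profile_flag_patched = is_empty_by_numel(profile_run_tables)
prefill_flag_patched = is_empty_by_numel(prefill_tables)
decode_flag_patched = is_empty_by_numel(decode_tables)
```

In this sketch the profile run and the prefill are indistinguishable by block-table contents alone, which suggests the profile-run flag would need to come from some other signal than the block table itself.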
Signed-off-by: Alvant <[email protected]>
FIX #7176