[Kernel] Flashinfer correctness fix for v0.1.3 #7319

LiuXiaoxuanPKU · 2024-08-08T23:16:52Z

Reported by @felixzhu555: is_profile_run is buggy in the FlashInfer backend, which causes failures with flashinfer v0.1.3. This PR fixes the bug and updates the CI flashinfer version to v0.1.3.

github-actions · 2024-08-08T23:17:03Z

👋 Hi! Thank you for contributing to the vLLM project.
Just a reminder: PRs do not trigger a full CI run by default. Instead, only the fastcheck CI runs, which consists of a small and essential subset of CI tests to quickly catch errors. You can run other CI tests on top of the default ones by unblocking the steps in your fast-check build in the Buildkite UI.

Once the PR is approved and ready to go, please make sure to run full CI as it is required to merge (or just use auto-merge).

To run full CI, you can do one of these:

Comment /ready on the PR
Add ready label to the PR
Enable auto-merge.

🚀

comaniac · 2024-08-08T23:20:59Z

.buildkite/test-pipeline.yaml

@@ -61,7 +61,7 @@ steps:
  - tests/basic_correctness
  commands:
  # This flashinfer installation will fail on AMD ROCm, so it is set as optional.
-  - pip install https://github.com/flashinfer-ai/flashinfer/releases/download/v0.1.2/flashinfer-0.1.2+cu121torch2.4-cp310-cp310-linux_x86_64.whl || true
+  - pip install https://github.com/flashinfer-ai/flashinfer/releases/download/v0.1.3/flashinfer-0.1.3+cu124torch2.4-cp310-cp310-linux_x86_64.whl || true


Just update the Dockerfile and remove all of these.

LiuXiaoxuanPKU · 2024-08-08T23:52:11Z

/ready

comaniac

LGTM. Just a nit

comaniac · 2024-08-09T00:15:55Z

vllm/attention/backends/flashinfer.py

-                self.paged_kv_indptr = torch.zeros(batch_size + 1,
-                                                   device=self.device)
-            else:
+            # Only use flashinfer in the non-profile run


This comment is a bit confusing as we are in the FlashInfer backend. Can we say more about why we skip the following logic in profile run and what's the outcome?
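
For context on the question above, here is a minimal sketch of the guard pattern being discussed. It assumes that the profile run only measures memory and has no real KV cache to plan against, so the wrapper setup can be skipped. `FakePrefillWrapper` and `SketchMetadata` are hypothetical stand-ins written for illustration; they are not the vLLM or FlashInfer classes, and the real signatures differ.

```python
from dataclasses import dataclass, field
from typing import List


class FakePrefillWrapper:
    """Hypothetical stand-in for a prefill wrapper; not the FlashInfer API."""

    def __init__(self) -> None:
        # Records the batch size of every planning call that reaches it.
        self.calls: List[int] = []

    def begin_forward(self, paged_kv_indptr: List[int], batch_size: int) -> None:
        self.calls.append(batch_size)


@dataclass
class SketchMetadata:
    """Hypothetical, simplified attention metadata (assumption, not vLLM code)."""

    batch_size: int
    paged_kv_indptr: List[int]
    is_profile_run: bool = False
    wrapper: FakePrefillWrapper = field(default_factory=FakePrefillWrapper)

    def begin_forward(self) -> None:
        # Assumption: a profile run has no allocated KV cache pages,
        # so there is nothing for the wrapper to plan and we return early.
        if self.is_profile_run:
            return
        self.wrapper.begin_forward(self.paged_kv_indptr, self.batch_size)


profile = SketchMetadata(batch_size=256, paged_kv_indptr=[0], is_profile_run=True)
profile.begin_forward()  # skipped: the wrapper is never called

real = SketchMetadata(batch_size=2, paged_kv_indptr=[0, 1, 3])
real.begin_forward()  # forwarded to the wrapper
```

With a guard like this, the outcome in a profile run is simply that no wrapper planning happens, which is the behavior the code comment would need to spell out.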

comaniac · 2024-08-09T16:08:44Z

CI failure seems like a real bug

[2024-08-09T04:52:05Z]   File "/usr/local/lib/python3.10/dist-packages/flashinfer/prefill.py", line 791, in begin_forward
[2024-08-09T04:52:05Z]     self._wrapper.begin_forward(
[2024-08-09T04:52:05Z] RuntimeError: CHECK_EQ(paged_kv_indptr.size(0), batch_size + 1) failed. 1 vs 257
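
The check in the log requires `paged_kv_indptr` to have `batch_size + 1` entries; "1 vs 257" means a single-element tensor was passed for a batch of 256. The sketch below illustrates that invariant with plain Python lists, assuming the usual CSR-style layout in which entry `i` is the offset of sequence `i`'s first page and the last entry is the total page count. The helper names are hypothetical and not part of vLLM or FlashInfer.

```python
from itertools import accumulate
from typing import List


def build_paged_kv_indptr(pages_per_seq: List[int]) -> List[int]:
    """Return CSR-style offsets: one leading 0 plus a running page count."""
    return [0] + list(accumulate(pages_per_seq))


def check_indptr(paged_kv_indptr: List[int], batch_size: int) -> None:
    """Mirror of the size check from the log (illustrative only)."""
    if len(paged_kv_indptr) != batch_size + 1:
        raise RuntimeError(
            f"CHECK_EQ(paged_kv_indptr.size(0), batch_size + 1) failed. "
            f"{len(paged_kv_indptr)} vs {batch_size + 1}"
        )


# Three sequences holding 2, 0, and 3 pages.
indptr = build_paged_kv_indptr([2, 0, 3])
check_indptr(indptr, batch_size=3)

# A batch of 256 sequences with no pages still needs 257 zero offsets.
empty = build_paged_kv_indptr([0] * 256)
check_indptr(empty, batch_size=256)
```

Passing a one-element list with `batch_size=256` to `check_indptr` raises the same "1 vs 257" mismatch that appears in the CI log.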

exceedzhang · 2024-08-10T03:59:54Z

@LiuXiaoxuanPKU This looks like a bug.

LiuXiaoxuanPKU added 3 commits August 7, 2024 14:48

ifx

00841ee

add is_profile_run

6db70e5

update ci flashinfer version

5f27eaa

LiuXiaoxuanPKU added 2 commits August 8, 2024 16:20

update ci version

8ccc52f

revert

7e77c1c

comaniac reviewed Aug 8, 2024

LiuXiaoxuanPKU added 3 commits August 8, 2024 16:30

update ci

519aac1

Merge branch 'main' into flashinfer-correct

da96484

minor

7dca259

github-actions bot added the ready label (ONLY add when PR is ready to merge/full CI is needed) Aug 8, 2024

comaniac approved these changes Aug 9, 2024

fix comments

3482424

LiuXiaoxuanPKU enabled auto-merge (squash) August 9, 2024 02:43

LiuXiaoxuanPKU added 2 commits August 11, 2024 22:20

minor

75e7515

Merge branch 'main' into flashinfer-correct

1580111

LiuXiaoxuanPKU merged commit ec2affa into vllm-project:main Aug 12, 2024
68 checks passed

sfc-gh-mkeralapura pushed a commit to sfc-gh-mkeralapura/vllm that referenced this pull request Aug 12, 2024

[Kernel] Flashinfer correctness fix for v0.1.3 (vllm-project#7319)

74494eb

kylesayrs pushed a commit to neuralmagic/vllm that referenced this pull request Aug 17, 2024

[Kernel] Flashinfer correctness fix for v0.1.3 (vllm-project#7319)

9a16370

fialhocoelho pushed a commit to opendatahub-io/vllm that referenced this pull request Aug 22, 2024

[Kernel] Flashinfer correctness fix for v0.1.3 (vllm-project#7319)

cf541f7

LiuXiaoxuanPKU deleted the flashinfer-correct branch September 17, 2024 04:29

Alvant pushed a commit to compressa-ai/vllm that referenced this pull request Oct 26, 2024

[Kernel] Flashinfer correctness fix for v0.1.3 (vllm-project#7319)

163a730

Signed-off-by: Alvant <[email protected]>

KuntaiDu pushed a commit to KuntaiDu/vllm that referenced this pull request Nov 20, 2024

[Kernel] Flashinfer correctness fix for v0.1.3 (vllm-project#7319)

d5a9805
