
[Misc] Remove flashinfer warning, add flashinfer tests to CI #6351

Merged 3 commits into vllm-project:main on Jul 12, 2024

Conversation

LiuXiaoxuanPKU (Collaborator)

Add flashinfer basic correctness tests to CI, and remove the llama-7b warning since that issue is fixed.
CI will fail for now; we need flashinfer's new release to pass CI, and will update the CI flashinfer version accordingly.
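
For context, a basic correctness test along these lines could look like the sketch below. This is not the code added by this PR; the model name, prompts, and the use of the `VLLM_ATTENTION_BACKEND` environment variable with the `FLASH_ATTN`/`FLASHINFER` values are illustrative assumptions about how one might compare backends.

```python
# Minimal sketch (not the actual CI test): generate greedily with two
# attention backends and check the outputs match.
import os

from vllm import LLM, SamplingParams

PROMPTS = ["Hello, my name is", "The future of AI is"]
GREEDY = SamplingParams(temperature=0.0, max_tokens=32)  # deterministic decoding

def generate_with_backend(backend: str) -> list[str]:
    # The env var must be set before the engine is constructed.
    os.environ["VLLM_ATTENTION_BACKEND"] = backend
    llm = LLM(model="meta-llama/Llama-2-7b-hf")  # model choice is illustrative
    return [out.outputs[0].text for out in llm.generate(PROMPTS, GREEDY)]

# In real CI each backend would typically run in a separate process so the
# two engines don't contend for GPU memory; shown inline here for brevity.
assert generate_with_backend("FLASH_ATTN") == generate_with_backend("FLASHINFER")
```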

@kostum123

Can we enable flash attention inference support for Gemma2 models with SoftCap? Flash attention supports this as of the 2.6 release.

comaniac enabled auto-merge (squash) on July 11, 2024 at 22:24
LiuXiaoxuanPKU (Collaborator, Author)

> Can we enable flash attention inference support for Gemma2 models with SoftCap? Flash attention supports this as of the 2.6 release.

Thanks for the info! Yeah, will do.
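
For reference, Gemma2's attention logit soft-capping can already be used in vLLM by selecting the FlashInfer backend. A minimal sketch, assuming the `VLLM_ATTENTION_BACKEND` environment variable and an illustrative model name:

```python
# Sketch: run a Gemma2 model with the FlashInfer attention backend, which
# supports the logit soft-capping Gemma2 uses. The backend env var must be
# set before the engine is constructed.
import os
os.environ["VLLM_ATTENTION_BACKEND"] = "FLASHINFER"

from vllm import LLM, SamplingParams

llm = LLM(model="google/gemma-2-9b-it")  # model choice is illustrative
outputs = llm.generate(["Why is the sky blue?"], SamplingParams(max_tokens=64))
print(outputs[0].outputs[0].text)
```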

comaniac merged commit d6ab528 into vllm-project:main on Jul 12, 2024
71 checks passed
dtrifiro pushed a commit to opendatahub-io/vllm that referenced this pull request Jul 17, 2024
LiuXiaoxuanPKU deleted the flashinfer-fix branch on September 17, 2024
Alvant pushed a commit to compressa-ai/vllm that referenced this pull request Oct 26, 2024