Multi step scheduling support for encoder-decoder models #12265

Closed
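This PR lands encoder-decoder support for multi-step scheduling (MSS) in the HabanaAI fork, building on the earlier MSS work visible in the commit list below (#441, #457, #501). As a rough usage sketch, and not code taken from this PR: multi-step scheduling in vLLM is driven by the upstream num_scheduler_steps engine argument, so assuming the fork keeps that interface, enabling it for an encoder-decoder model would look roughly like the following. The BART checkpoint is purely illustrative.

# Minimal sketch, not the PR's implementation. Assumes the HabanaAI fork
# exposes upstream vLLM's `num_scheduler_steps` engine argument; the model
# choice is illustrative only.
from vllm import LLM, SamplingParams

llm = LLM(
    model="facebook/bart-large-cnn",  # an encoder-decoder (BART) model
    num_scheduler_steps=8,            # values > 1 enable multi-step scheduling
)

params = SamplingParams(temperature=0.0, max_tokens=64)
outputs = llm.generate(
    ["vLLM is a high-throughput, memory-efficient inference engine."],
    params,
)
print(outputs[0].outputs[0].text)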
Changes from all commits (599 commits)
ebd42c4
Reformat README_GAUDI.md (#389)
kzawora-intel Oct 14, 2024
2d2bf7a
[CI] Prepare separate Jenkins tests for torch compile mode (#388)
anko-intel Oct 14, 2024
9df1d4a
Remove workaround added to resolve multi-card stall issue (#387)
SanjuCSudhakaran Oct 14, 2024
9777c9f
Update SynapseAI version in README & Dockerfile (#390)
kzawora-intel Oct 14, 2024
5ceda69
Merge remote-tracking branch 'origin/habana_main' into HEAD
kzawora-intel Oct 14, 2024
3e6a2d4
Merge remote-tracking branch 'upstream/main' into HEAD
kzawora-intel Oct 14, 2024
9ac52ab
fix attention backend selector:
kzawora-intel Oct 14, 2024
57bc31d
Oct 7 rebase (#367)
kzawora-intel Oct 14, 2024
55dd07e
enable mixtral quantization using INC (#372)
dudilester Oct 15, 2024
401f5ae
[CI] Temporarily increase test tolerances (#392)
kzawora-intel Oct 15, 2024
e598f3f
Add quickstart section to READMEs (#391)
kzawora-intel Oct 15, 2024
f77435d
Softmax: add weighted-sum normalization (#378)
madamczykhabana Oct 16, 2024
a59fc7b
Remove HPU changes from cache_engine.py (#400)
kzawora-intel Oct 16, 2024
05bcdf5
[bucketing overhaul 1/n] Add padding-aware scheduling and option to l…
kzawora-intel Oct 17, 2024
9276ccc
Add WA for RuntimeError: "fill_cpu" not implemented for 'Float8_e4m3f…
kzawora-intel Oct 17, 2024
07c98a5
Workaround for OOM during loading llama-405 (#396)
afierka-intel Oct 18, 2024
acde882
Add HPU specific arguments to benchmark_throughput (#406)
kdamaszk Oct 22, 2024
8c43ff1
Add forward_hpu to RotaryEmbedding, remove custom module (#404)
kzawora-intel Oct 22, 2024
aecd667
Remove if blocks smaller than bs in generate_decode_buckets (#412)
kamil-kaczor Oct 22, 2024
0cf5261
Remove CPU sync before Sampler (#414)
kdamaszk Oct 22, 2024
3af4b6c
Remove redundant set_active_loras call during warmup (#413)
SanjuCSudhakaran Oct 22, 2024
892c090
Change profile Run batch based on max_seq_len (#415)
hlahkar Oct 23, 2024
7f58ad1
Add support for various softmax normalization options (#420)
madamczykhabana Oct 23, 2024
f603353
Update README_GAUDI about fp8 calibration procedure (#423)
afierka-intel Oct 25, 2024
a5136ec
Set vllm-hpu-extension to 341a77f (#428)
madamczykhabana Oct 25, 2024
a926d14
Create scorecard.yml
rozhukov Oct 25, 2024
5b7f685
Contiguous PA (#424)
mfylcek Oct 25, 2024
e3ae2eb
Revert "Contiguous PA" (#432)
madamczykhabana Oct 25, 2024
93609a2
Enable Dynamic MoE for Mixtral on 1.19.0 (#425)
tpawlows Oct 25, 2024
3a55e77
Support long contexts with LoRA (#418)
SanjuCSudhakaran Oct 28, 2024
4fd5c4c
Add HPU specific changes to benchmark_latency.py (#436)
kdamaszk Oct 28, 2024
3e06110
Merge remote-tracking branch 'upstream/main' into HEAD
kzawora-intel Oct 28, 2024
96e0d6f
Rebase fix
kzawora-intel Oct 28, 2024
ebebbbb
fix ci fails
kzawora-intel Oct 28, 2024
4c0caa5
fix ci again
kzawora-intel Oct 28, 2024
72a2856
formatting
kzawora-intel Oct 28, 2024
2a38e6f
sarkar/Add htrandom generator for hpu (#246)
ssarkar2 Oct 28, 2024
3e135ae
Fix one_hot bug in torch compile mode (#427)
yuwenzho Oct 29, 2024
3203bd9
HPU: offload logits processing to CPU (#358)
madamczykhabana Oct 29, 2024
2fa54e2
Lora layers (#435)
rsshaik1 Oct 29, 2024
1dcdb37
initial works on enabling automatic prefix caching (#162)
huijjj Oct 29, 2024
78e947a
Multi step scheduling (#441)
tzielinski-habana Oct 29, 2024
a821717
Add fp8 test to jenkins CI (#429)
afierka-intel Oct 30, 2024
79dc102
Enable FusedSDPA prefill by default (#447)
kzawora-intel Oct 30, 2024
2f7f963
Contiguous PA (#433)
mfylcek Oct 30, 2024
94858b5
Fix default value for FSDPA (#448)
madamczykhabana Oct 30, 2024
d3257b2
Fix performance of top_p and top_k calculations (#449)
kdamaszk Oct 30, 2024
d42c2a2
Reduce block fragmentation (#426)
yangw1234 Oct 31, 2024
6643aa6
Create scorecard.yml (#431)
rozhukov Oct 31, 2024
0cc72b9
Enable HPUGraphs for lora long-contexts tests
SanjuCSudhakaran Nov 4, 2024
24ba4d4
[CI] Add Llama2 to torch compile tests (#446)
anko-intel Nov 4, 2024
1bb808a
Enable HPUGraphs for lora long-contexts tests (#454)
vivekgoe Nov 4, 2024
ac12d53
Fix SchedulerConfig params (#459)
ldurejko Nov 5, 2024
653e56c
Tensor parallelism for multi-step scheduling (#457)
tzielinski-habana Nov 5, 2024
1033c3e
Set tokenizers version to <0.20.2 (#460)
madamczykhabana Nov 5, 2024
5e56d88
Merge remote-tracking branch 'origin/habana_main' into private/kzawor…
kzawora-intel Nov 5, 2024
18f00d7
Merge remote-tracking branch 'upstream/main' into private/kzawora/oct…
kzawora-intel Nov 5, 2024
d397ba5
fix hpu execution
kzawora-intel Nov 5, 2024
4c0647f
format.sh
kzawora-intel Nov 5, 2024
c41788f
fix type checks
kzawora-intel Nov 5, 2024
c3c0e90
[BugFix][Habana_main][Multistep]Fix multistep deepcopy overhead (#452)
xuechendi Nov 6, 2024
dc5cdfb
Set vllm-hpu-extension to 0063520 (#455)
madamczykhabana Nov 6, 2024
7578f3b
Oct 28 rebase (#439)
kzawora-intel Nov 6, 2024
07a6441
Revert "Oct 28 rebase" (#466)
kzawora-intel Nov 6, 2024
5812cb6
Oct 28 rebase - attempt 2 (#467)
kzawora-intel Nov 6, 2024
40882f3
Merge commit 'a5fda50a10641e47c0c290907f30ef2add6d4e7a' into HEAD
kzawora-intel Nov 6, 2024
8e62377
format.sh
kzawora-intel Nov 6, 2024
5eb7f3d
Nov 6 rebase (sans vllm-project#6143) (#468)
kzawora-intel Nov 6, 2024
0a17a2e
Fix missed conflict (#469)
kzawora-intel Nov 6, 2024
b91403a
Merge commit 'a02a50e' into HEAD
kzawora-intel Nov 6, 2024
843ae37
Merge commit '6a585a2' into HEAD
kzawora-intel Nov 6, 2024
60b981e
Align fork with HPU upstream code (#465)
michalkuligowski Nov 6, 2024
3c39626
The output tensor from sampling is the input_tokens to the (#471)
tzielinski-habana Nov 6, 2024
11f5da6
Add multi step scheduling scenario to jenkins CI (#445)
afierka-intel Nov 7, 2024
6eed0ef
Handle offsets shape in long contexts
SanjuCSudhakaran Nov 7, 2024
e6087ea
[New Feature][Habana-Main] speculative_decoding HPU support (#375)
xuechendi Nov 7, 2024
ac16ba1
[Doc] Fix broken urls in gaudi-installation (#473)
MohitIntel Nov 8, 2024
e818cf3
[Installation] Avoid ModuleNotFoundError:setuptools-scm error (#475)
MohitIntel Nov 8, 2024
41dddab
Add option to disable duplicates in topk (#464)
kdamaszk Nov 8, 2024
1565944
Handle offsets shape in long contexts (#477)
vivekgoe Nov 11, 2024
65a920e
Merge remote-tracking branch 'upstream/main' into HEAD
kzawora-intel Nov 12, 2024
890b1f0
[New Feature][Habana main] spec decode PR2 - Medusa, MLP, Eagle (#461)
xuechendi Nov 12, 2024
3fb59c7
Add FP8 TP=2 scenario to Jenkins CI (#478)
afierka-intel Nov 13, 2024
c27899a
Commonalize code between contiguous and flat pa (#493)
madamczykhabana Nov 14, 2024
0548200
Config hidden layer number to run in 1 lazy graph (#451)
libinta Nov 14, 2024
eca9a83
Fix number of blocks when profiling contiguous pa (#496)
madamczykhabana Nov 14, 2024
ea8a23a
Warmup for multi-step scheduling (#501)
tzielinski-habana Nov 15, 2024
a029232
Enable patching matmuls in block2batch and batch2block (#500)
nirda7 Nov 15, 2024
875faa6
Add FP8 inference procedure (#504)
afierka-intel Nov 15, 2024
0467cc1
Warm up random sampler
mfylcek Nov 14, 2024
82e0521
Warmup random sampler only during decoding
mfylcek Nov 15, 2024
0014d34
Remove comment
mfylcek Nov 15, 2024
e0e37e0
Remove comments
mfylcek Nov 15, 2024
96467d8
Terminate ray workers on ray_hpu_executor shutdown (#505)
kzawora-intel Nov 15, 2024
0175fe0
Formatting
mfylcek Nov 15, 2024
b38b160
Move the warmup to graph capture function
mfylcek Nov 15, 2024
76aa48a
Bug fix
mfylcek Nov 15, 2024
e24a5af
Formatting
mfylcek Nov 15, 2024
0011e75
Add valid_seq_lengths to fusedsdpa - port from 1.18.0 (#509)
iboiko-habana Nov 18, 2024
c601886
Set vllm-hpu-extension to 2542c18 (#517)
iboiko-habana Nov 18, 2024
dac5d80
[BUGFIX] fix worker selector non-return issue (#508)
xuechendi Nov 18, 2024
a4e689a
Use contiguous pa by default (#519)
madamczykhabana Nov 18, 2024
fb308c9
Set vllm-hpu-extension to 3a60b49 (#520)
madamczykhabana Nov 18, 2024
9ebcb9b
Merge remote-tracking branch 'origin/habana_main' into HEAD
kzawora-intel Nov 18, 2024
295cabe
Merge remote-tracking branch 'upstream/main' into HEAD
kzawora-intel Nov 18, 2024
7c5038c
Add async copying to input preparation (#497)
jkaniecki Nov 18, 2024
8155ba7
Merge remote-tracking branch 'origin/habana_main' into HEAD
kzawora-intel Nov 18, 2024
3400180
format.sh
kzawora-intel Nov 18, 2024
6ae5229
Nov 18 rebase (#485)
kzawora-intel Nov 18, 2024
c79982d
[BUGFIX]fix FP8 failing issue on habana_main [PatchedVLLMKVCache fwd …
xuechendi Nov 18, 2024
2f43ebf
Set vllm-hpu-extension to a69bb99 (#521)
madamczykhabana Nov 19, 2024
8c3f56a
Update ray_hpu_executor.py (#522)
michalkuligowski Nov 20, 2024
6338608
Random sampler warmup (#506)
mfylcek Nov 20, 2024
efe0268
Skip empty steps in multi step sheduling (#526)
jkaniecki Nov 20, 2024
f481707
[bucketing overhaul 2/n] Delegate bucket management to HPUBucketingCo…
kdamaszk Nov 21, 2024
425d0be
[SW-201504] Adding Test Trigger (#533)
RonBenMosheHabana Nov 21, 2024
0d153cf
[SW-201504] Add Jenkins Tests Trigger (#537)
RonBenMosheHabana Nov 22, 2024
dbde4b8
[bucketing overhaul 3/n] Move HPUBucketingContext to vllm-hpu-extensi…
kdamaszk Nov 22, 2024
39c6b6c
Limit decode block size (#532)
mfylcek Nov 25, 2024
5eb8b1f
fix marlin flag set on hpu (#540)
nirda7 Nov 25, 2024
0f513bd
Fix profile run for multi LoRA (#549)
kdamaszk Nov 26, 2024
7133502
fix cutlass_fp8_supported flag set on hpu
nirda7 Nov 26, 2024
38c2d10
Fix cutlass_fp8_supported flag set on HPU (#550)
nirda7 Nov 26, 2024
b62f1b2
[HPU] Add mark_step configurable for the decoder layer. (#525)
jiminha Nov 26, 2024
633df59
Update cpu-test.yml (#544)
michalkuligowski Nov 26, 2024
4d8185f
Update *.sh (#545)
michalkuligowski Nov 26, 2024
3f0b0e4
Update run-lm-eval-gsm-vllm-baseline.sh (#552)
michalkuligowski Nov 26, 2024
b099337
Add HPU information to collect_env script (#430)
michalkuligowski Nov 26, 2024
b7d75b8
Intern2 habana (#489)
skirdey-inflection Nov 26, 2024
677741e
Added hpu as device argument
rsshaik1 Nov 26, 2024
0c62b0b
Added "hpu" as configurable device argument in test_lora_manager_hpu …
vivekgoe Nov 27, 2024
756485f
[BUG FIX] [SPEC DECODE] 0.6.4 rebase cause incorrectness in spec deco…
xuechendi Nov 28, 2024
d83b62f
CI fix (#563)
tzielinski-habana Nov 28, 2024
637bb57
Set vllm-hpu-extension to 50e10ea (#565)
mswiniarsk Nov 28, 2024
cff5c7f
Refactor FP8 Inc config and flow (#564)
nirda7 Nov 29, 2024
f295f07
Set vllm-hpu-extension to bc01901
iboiko-habana Nov 29, 2024
2aeea0b
Set vllm-hpu-extension to bc01901 (#567)
iboiko-habana Nov 29, 2024
cef2df0
to make repetition penalty faster (#442)
ccrhx4 Nov 29, 2024
49c9efa
Enable alibi fusedsdpa (#561)
itaraban Nov 29, 2024
56da9fc
Merge remote-tracking branch 'upstream/main' into HEAD
kzawora-intel Dec 2, 2024
e438503
fix syntax error
kzawora-intel Dec 2, 2024
4b502a6
Set vllm-hpu-extension to fb36408 (#572)
mswiniarsk Dec 2, 2024
3cb5420
Set vllm-hpu-extension to cd520df (#574)
mswiniarsk Dec 3, 2024
1440f45
Revert "to make repetition penalty faster" (#570)
michalkuligowski Dec 3, 2024
b9d6f69
Regional compilation support (#576)
Kacper-Pietkun Dec 4, 2024
4796d16
Revert "Enable alibi fusedsdpa" (#585)
madamczykhabana Dec 4, 2024
8c76728
Prepare sin/cos buffers for rope outside model forward (#566)
tzielinski-habana Dec 4, 2024
f6865f4
Enable DeepseekV2 Lite/Chat models (#516)
hlin99 Dec 4, 2024
8754e17
Set vllm-hpu-extension to 070591a (#591)
mswiniarsk Dec 4, 2024
ad29332
[CI/BUILD] Spec decode ci (#524)
xuechendi Dec 5, 2024
a805205
Add host traces to high-level profilings (#577)
szutenberg Dec 6, 2024
e349f70
Enable patching Fused SDPA (#569)
nirda7 Dec 6, 2024
6a4f673
revert INC fixed version installation in requirements-hpu.txt for 1.1…
xuechendi Dec 6, 2024
e0e47ed
Add multiprocessing HPU executor (#559)
kzawora-intel Dec 6, 2024
858e0a0
fix WorkerWrapperBase and spec_decode rebase (#582)
xuechendi Dec 6, 2024
21323ed
Merge remote-tracking branch 'origin/habana_main' into HEAD
kzawora-intel Dec 6, 2024
d8f395e
Merge remote-tracking branch 'upstream/main' into HEAD
kzawora-intel Dec 6, 2024
48ab12b
fix mypy errors
kzawora-intel Dec 6, 2024
9204975
fix (hopefully) all linter errors
kzawora-intel Dec 6, 2024
ad8d5b7
Dec 06 rebase (#571)
kzawora-intel Dec 9, 2024
db68690
fix hpu destructors flow and remove finish_measurements (#379)
nirda7 Dec 9, 2024
0cce63a
Set vllm-hpu-extension to 4312768
SanjuCSudhakaran Dec 10, 2024
3473bc1
Set vllm-hpu-extension to 4312768 (#604)
vivekgoe Dec 10, 2024
239739c
Support mllama (llama 3.2) model for HPU (#491)
yisonzhu Dec 10, 2024
2126fd2
Merge remote-tracking branch 'upstream/main' into HEAD
kzawora-intel Dec 10, 2024
89266bc
Merge remote-tracking branch 'origin/habana_main' into private/kzawor…
kzawora-intel Dec 10, 2024
5a166da
Update ray_hpu_executor.py
michalkuligowski Dec 10, 2024
0ad9b59
Enable padding aware scheduling by default on HPU (#606)
kzawora-intel Dec 10, 2024
17e6be7
Update CODEOWNERS
kzawora-intel Dec 10, 2024
15774c4
Update CODEOWNERS (#608)
kzawora-intel Dec 10, 2024
def7ac2
Fix TP>1 in encoder-decoder models (#607)
jkaniecki Dec 10, 2024
b8fff21
Add PunicaWrapperHPU to handle LoRA computations
SanjuCSudhakaran Dec 11, 2024
381453c
Align LoRA handling in HPU with PunicaWrapper class (#614)
kzawora-intel Dec 11, 2024
a9fde5f
Dec 10 rebase (#605)
michalkuligowski Dec 11, 2024
641367b
Revert "Dec 10 rebase"
michalkuligowski Dec 11, 2024
55f99ea
Revert "Dec 10 rebase" (#618)
kzawora-intel Dec 11, 2024
ad10b73
Revert "Revert "Dec 10 rebase""
kzawora-intel Dec 11, 2024
df7dd05
Revert "Revert "Dec 10 rebase"" (#619)
kzawora-intel Dec 11, 2024
07dbd34
fix graceful shutdown
kzawora-intel Dec 10, 2024
d312c92
Fix multiprocessing executor shutdown (#621)
michalkuligowski Dec 11, 2024
7ef6b2c
Update GitHub Actions targets (#622)
kzawora-intel Dec 11, 2024
449a89d
Add padding to encoder_seq_lens (#610)
kdamaszk Dec 12, 2024
d2128b4
Remove workaround for one_hot in eager/compile (#632)
anko-intel Dec 16, 2024
11c07e3
Add shutdown_inc method to MultiprocessingHPUExecutor (#634)
nirda7 Dec 16, 2024
ba1d24b
Fix recompilations due to different batch_sizes in MSS (#637)
mfylcek Dec 16, 2024
c9a740f
Fix CI reports (#636)
afierka-intel Dec 16, 2024
da61ecf
Unit scales in FP8 CI scenarios (#633)
afierka-intel Dec 16, 2024
adac58e
multimodality fix
adobrzyniewicz-habana Dec 17, 2024
e8ce81e
formating
adobrzyniewicz-habana Dec 17, 2024
d81f829
TC llama recompile fix - no_grad to inference_mode (#640)
RafLit Dec 18, 2024
88ef381
Generic call for prepare_cos_sin in rotary embedding (#638)
tzielinski-habana Dec 18, 2024
67df809
undo changes in layer.py
adobrzyniewicz-habana Dec 18, 2024
0fdac85
Merge branch 'habana_main' into adobrzyniewicz/multimodality_for_llava
adobrzyniewicz-habana Dec 18, 2024
9555fef
Update CODEOWNERS (#649)
vivekgoe Dec 19, 2024
1259d8d
remove past code
adobrzyniewicz-habana Dec 20, 2024
5c59ccd
Merge branch 'habana_main' into adobrzyniewicz/multimodality_for_llava
adobrzyniewicz-habana Dec 30, 2024
2443ba9
Fix long contexts in LoRA (#624)
SanjuCSudhakaran Jan 2, 2025
2012336
Lora manager tests fix (#652)
rsshaik1 Jan 2, 2025
5b5bf26
Fix LoRA tests (#664)
SanjuCSudhakaran Jan 2, 2025
2d24be7
[BUG fix] Rebase caused spec decode fix (#613)
xuechendi Jan 7, 2025
27a22ab
fix slow sampling when repetition_penalty is set. (#584)
ccrhx4 Jan 7, 2025
9d6917f
Optimize for topk=1 case if we do not handle duplicates (#603)
ssarkar2 Jan 7, 2025
5d582b5
[bugfix] fix RuntimeError on apc (#648)
kkimmk Jan 7, 2025
585ca9a
Add llava support to benchmark_throuhput (#665)
adobrzyniewicz-habana Jan 8, 2025
8f53dee
Add mllama support to benchmark_throughput (#668)
kdamaszk Jan 8, 2025
49a11e2
Add mark_step for encoder layers (#669)
yma11 Jan 8, 2025
cccf363
Use FusedSDPA for MllamaVisionSdpaAttention (#620)
kdamaszk Jan 8, 2025
fa9dbf2
Limit number of dummy cross attention blocks (#667)
kdamaszk Jan 8, 2025
cbfb022
send placeholder_index_maps
adobrzyniewicz-habana Jan 9, 2025
73aaf71
[SW-197036] - use torch._scaled_mm with hpu (#660)
nirda7 Jan 9, 2025
e411a64
Merge remote-tracking branch 'upstream/main' into HEAD
kzawora-intel Jan 10, 2025
ab1ca6d
make the code actually run
kzawora-intel Jan 10, 2025
f3ecf00
make linters happy
kzawora-intel Jan 10, 2025
c5975f8
Handle LoRA specific changes in MSS (#675)
SanjuCSudhakaran Jan 11, 2025
c83289e
[SW-201504] Trigger Internal Tests (#538)
RonBenMosheHabana Jan 12, 2025
c245ef0
Fix model OOM issue in llama-405 and mixtral - 2nd attempt (#644)
afierka-intel Jan 13, 2025
eb0d42f
Add inc fp8 qunatization documentation (#635)
nirda7 Jan 13, 2025
f6b6092
Adds LoRA tests to vLLM CI pipeline (#680)
rsshaik1 Jan 14, 2025
132d40e
Update CODEOWNERS (#683)
michalkuligowski Jan 14, 2025
f51e265
Merge remote-tracking branch 'upstream/main' into private/kzawora/jan…
kzawora-intel Jan 14, 2025
ca8cb82
Merge remote-tracking branch 'origin/habana_main' into private/kzawor…
kzawora-intel Jan 14, 2025
7d13823
linter updates + bugfixes
kzawora-intel Jan 14, 2025
885c60d
Set vllm-hpu-extension to 6ac93fb (#684)
mfylcek Jan 15, 2025
aeebe54
Set cache size for t.compile even if there is no warmup (#689)
anko-intel Jan 15, 2025
47391dc
Jan 10 rebase (#677)
kzawora-intel Jan 15, 2025
9af82cd
Workaround to handle multi-card stall issue (#688)
SanjuCSudhakaran Jan 16, 2025
567f7e7
Merge branch 'habana_main' into adobrzyniewicz/multimodality_for_llava
adobrzyniewicz-habana Jan 16, 2025
40bb71f
Fix weights load device use (#686)
nirda7 Jan 16, 2025
aaaac6c
format
adobrzyniewicz-habana Jan 16, 2025
a3197c6
Merge branch 'habana_main' into adobrzyniewicz/multimodality_for_llava
adobrzyniewicz-habana Jan 16, 2025
b3a0db2
Move scores to float32 in case of running xgrammar on cpu (#695)
madamczykhabana Jan 16, 2025
4db525d
Clean-up LoRA flow (#518)
SanjuCSudhakaran Jan 17, 2025
2d85682
Merge branch 'habana_main' into adobrzyniewicz/multimodality_for_llava
adobrzyniewicz-habana Jan 17, 2025
a685225
Check if kv_cache is tuple before calling split_kv_cache (#697)
kdamaszk Jan 17, 2025
a293e2e
Merge branch 'habana_main' into adobrzyniewicz/multimodality_for_llava
adobrzyniewicz-habana Jan 17, 2025
7eea2df
[CI] Cleanup run_tests.sh logs (#700)
kzawora-intel Jan 17, 2025
ce50b1a
Merge remote-tracking branch 'upstream/main' into private/kzawora/reb…
kzawora-intel Jan 17, 2025
a128878
fix TP crashes
kzawora-intel Jan 17, 2025
2e53e75
make mypy happy
kzawora-intel Jan 17, 2025
21f5fb2
¿what the heck is incquark?
kzawora-intel Jan 17, 2025
f1e911d
i forgot brackets again
kzawora-intel Jan 17, 2025
ae67e4d
Multimodality fix for llava (#641)
adobrzyniewicz-habana Jan 17, 2025
018ce62
Rebase 2025-01-17 (#701)
kzawora-intel Jan 17, 2025
b10992b
Fix LoRA tests (#696)
SanjuCSudhakaran Jan 20, 2025
1252646
Updating README_GAUDI in habana_main (#690)
MohitIntel Jan 20, 2025
293bd87
Change vllm-hpu-extension revision to ae726d4
iboiko-habana Jan 20, 2025
cc069cb
Change vllm-hpu-extension revision to ae726d4 (#707)
iboiko-habana Jan 20, 2025
fedf706
Capabilities overhaul (#692)
madamczykhabana Jan 20, 2025
37eb4fc
[SW-216156] Fix mixtral Fused MoE issues after rebase (#708)
dudilester Jan 21, 2025
ed496d6
Support for multi step scheduling in enc dec models
jkaniecki Jan 21, 2025
Files changed
31 changes: 1 addition & 30 deletions .github/CODEOWNERS
@@ -1,33 +1,4 @@
# See https://help.github.com/articles/about-codeowners/
# for more info about CODEOWNERS file

# This lists cover the "core" components of vLLM that require careful review
/vllm/attention/backends/abstract.py @WoosukKwon @zhuohan123 @youkaichao @alexm-neuralmagic @comaniac @njhill
/vllm/core @zhuohan123 @youkaichao @alexm-neuralmagic @comaniac @njhill
/vllm/engine/llm_engine.py @zhuohan123 @youkaichao @alexm-neuralmagic @comaniac @njhill
/vllm/executor/executor_base.py @zhuohan123 @youkaichao @alexm-neuralmagic @comaniac @njhill
/vllm/worker/worker_base.py @zhuohan123 @youkaichao @alexm-neuralmagic @comaniac @njhill
/vllm/worker/worker.py @zhuohan123 @youkaichao @alexm-neuralmagic @comaniac @njhill
/vllm/model_executor/layers/sampler.py @zhuohan123 @youkaichao @alexm-neuralmagic @comaniac @njhill
CMakeLists.txt @tlrmchlsmth

# vLLM V1
/vllm/v1 @WoosukKwon @robertgshaw2-neuralmagic @njhill @ywang96 @comaniac @alexm-neuralmagic

# Test ownership
/tests/async_engine @njhill @robertgshaw2-neuralmagic @simon-mo
/tests/test_inputs.py @DarkLight1337 @ywang96
/tests/entrypoints @DarkLight1337 @robertgshaw2-neuralmagic @simon-mo
/tests/models @DarkLight1337 @ywang96
/tests/multimodal @DarkLight1337 @ywang96
/tests/prefix_caching @comaniac @KuntaiDu
/tests/spec_decode @njhill @LiuXiaoxuanPKU
/tests/kernels @tlrmchlsmth @WoosukKwon
/tests/quantization @mgoin @robertgshaw2-neuralmagic
/.buildkite/lm-eval-harness @mgoin @simon-mo
/tests/distributed/test_multi_node_assignment.py @youkaichao
/tests/distributed/test_pipeline_parallel.py @youkaichao
/tests/distributed/test_same_node.py @youkaichao
/tests/multi_step @alexm-neuralmagic @comaniac
/tests/weight_loading @mgoin @youkaichao
/tests/basic_correctness/test_chunked_prefill @rkooo567 @comaniac
* @kzawora-intel @madamczykhabana @michalkuligowski @mgawarkiewicz @vivekgoe @afierka-intel
10 changes: 10 additions & 0 deletions .github/actionlint.yaml
@@ -0,0 +1,10 @@
self-hosted-runner:
# Labels of self-hosted runner in array of strings.
labels:
- generic-runner
paths:
.github/workflows/trigger_jenkins.yml:
ignore:
- shellcheck reported issue in this script: SC2116:.+
- shellcheck reported issue in this script: SC2086:.+
- shellcheck reported issue in this script: SC2001:.+
4 changes: 2 additions & 2 deletions .github/workflows/actionlint.yml
@@ -2,14 +2,14 @@ name: Lint GitHub Actions workflows
on:
push:
branches:
- "main"
- "habana_main"
paths:
- '.github/workflows/*.ya?ml'
- '.github/workflows/actionlint.*'
- '.github/workflows/matchers/actionlint.json'
pull_request:
branches:
- "main"
- "habana_main"
paths:
- '.github/workflows/*.ya?ml'
- '.github/workflows/actionlint.*'
6 changes: 3 additions & 3 deletions .github/workflows/clang-format.yml
@@ -2,10 +2,10 @@ name: clang-format

on:
# Trigger the workflow on push or pull request,
# but only for the main branch
# but only for the habana_main branch
push:
branches:
- main
- habana_main
paths:
- '**/*.h'
- '**/*.cpp'
@@ -14,7 +14,7 @@ on:
- '.github/workflows/clang-format.yml'
pull_request:
branches:
- main
- habana_main
paths:
- '**/*.h'
- '**/*.cpp'
4 changes: 2 additions & 2 deletions .github/workflows/codespell.yml
@@ -5,7 +5,7 @@ on:
# but only for the main branch
push:
branches:
- main
- habana_main
paths:
- "**/*.py"
- "**/*.md"
@@ -15,7 +15,7 @@ on:
- .github/workflows/codespell.yml
pull_request:
branches:
- main
- habana_main
paths:
- "**/*.py"
- "**/*.md"
35 changes: 35 additions & 0 deletions .github/workflows/cpu-test.yml
@@ -0,0 +1,35 @@
name: cpu-test

on:
# Trigger the workflow on push or pull request,
# but only for the habana_main branch
push:
branches:
- habana_main
pull_request:
branches:
- habana_main


jobs:
cputest:
runs-on: ubuntu-latest
strategy:
matrix:
python-version: ["3.11"]
steps:
- uses: actions/checkout@v3
- name: Set up Python ${{ matrix.python-version }}
uses: actions/setup-python@v3
with:
python-version: ${{ matrix.python-version }}
- name: Install dependencies
run: |
python -m pip install --upgrade pip
pip install torch --extra-index-url https://download.pytorch.org/whl/cpu
pip install -r requirements-build.txt
pip install -r requirements-hpu.txt
VLLM_TARGET_DEVICE=hpu python setup.py develop
- name: cpu-test
run: |
VLLM_SKIP_WARMUP=true VLLM_PROMPT_SEQ_BUCKET_MAX=128 VLLM_USE_FAKE_HPU=1 python examples/offline_inference_fakehpu.py
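A note on this new cpu-test job: it builds the HPU target on a plain Ubuntu runner and then runs examples/offline_inference_fakehpu.py with VLLM_USE_FAKE_HPU=1, which, judging by the script name and the absence of Gaudi hardware on ubuntu-latest, stubs out the HPU device so the fork can be smoke-tested on CPU. VLLM_SKIP_WARMUP=true and VLLM_PROMPT_SEQ_BUCKET_MAX=128 keep the run short, and the same one-liner should be reproducible locally after the VLLM_TARGET_DEVICE=hpu python setup.py develop step shown above.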
4 changes: 2 additions & 2 deletions .github/workflows/doc-lint.yml
@@ -3,12 +3,12 @@ name: Lint documentation
on:
push:
branches:
- main
- habana_main
paths:
- "docs/**"
pull_request:
branches:
- main
- habana_main
paths:
- "docs/**"

82 changes: 0 additions & 82 deletions .github/workflows/lint-and-deploy.yaml

This file was deleted.

6 changes: 3 additions & 3 deletions .github/workflows/mypy.yaml
@@ -2,18 +2,18 @@ name: mypy

on:
# Trigger the workflow on push or pull request,
# but only for the main branch
# but only for the habana_main branch
push:
branches:
- main
- habana_main
paths:
- '**/*.py'
- '.github/workflows/mypy.yaml'
- 'tools/mypy.sh'
- 'pyproject.toml'
pull_request:
branches:
- main
- habana_main
# This workflow is only relevant when one of the following files changes.
# However, we have github configured to expect and require this workflow
# to run and pass before github with auto-merge a pull request. Until github
4 changes: 2 additions & 2 deletions .github/workflows/png-lint.yml
@@ -2,13 +2,13 @@ name: Lint PNG exports from excalidraw
on:
push:
branches:
- "main"
- "habana_main"
paths:
- '*.excalidraw.png'
- '.github/workflows/png-lint.yml'
pull_request:
branches:
- "main"
- "habana_main"
paths:
- '*.excalidraw.png'
- '.github/workflows/png-lint.yml'
21 changes: 0 additions & 21 deletions .github/workflows/reminder_comment.yml

This file was deleted.

6 changes: 3 additions & 3 deletions .github/workflows/ruff.yml
@@ -2,10 +2,10 @@ name: ruff

on:
# Trigger the workflow on push or pull request,
# but only for the main branch
# but only for the habana_main branch
push:
branches:
- main
- habana_main
paths:
- "**/*.py"
- pyproject.toml
@@ -14,7 +14,7 @@ on:
- .github/workflows/ruff.yml
pull_request:
branches:
- main
- habana_main
# This workflow is only relevant when one of the following files changes.
# However, we have github configured to expect and require this workflow
# to run and pass before github with auto-merge a pull request. Until github
73 changes: 73 additions & 0 deletions .github/workflows/scorecard.yml
@@ -0,0 +1,73 @@
# This workflow uses actions that are not certified by GitHub. They are provided
# by a third-party and are governed by separate terms of service, privacy
# policy, and support documentation.

name: Scorecard supply-chain security
on:
# For Branch-Protection check. Only the default branch is supported. See
# https://github.com/ossf/scorecard/blob/main/docs/checks.md#branch-protection
branch_protection_rule:
# To guarantee Maintained check is occasionally updated. See
# https://github.com/ossf/scorecard/blob/main/docs/checks.md#maintained
schedule:
- cron: '20 13 * * 0'
push:
branches: [ "habana_main" ]

# Declare default permissions as read only.
permissions: read-all

jobs:
analysis:
name: Scorecard analysis
runs-on: ubuntu-latest
permissions:
# Needed to upload the results to code-scanning dashboard.
security-events: write
# Needed to publish results and get a badge (see publish_results below).
id-token: write
# Uncomment the permissions below if installing in a private repository.
# contents: read
# actions: read

steps:
- name: "Checkout code"
uses: actions/checkout@b4ffde65f46336ab88eb53be808477a3936bae11 # v4.1.1
with:
persist-credentials: false

- name: "Run analysis"
uses: ossf/scorecard-action@0864cf19026789058feabb7e87baa5f140aac736 # v2.3.1
with:
results_file: results.sarif
results_format: sarif
# (Optional) "write" PAT token. Uncomment the `repo_token` line below if:
# - you want to enable the Branch-Protection check on a *public* repository, or
# - you are installing Scorecard on a *private* repository
# To create the PAT, follow the steps in https://github.com/ossf/scorecard-action?tab=readme-ov-file#authentication-with-fine-grained-pat-optional.
# repo_token: ${{ secrets.SCORECARD_TOKEN }}

# Public repositories:
# - Publish results to OpenSSF REST API for easy access by consumers
# - Allows the repository to include the Scorecard badge.
# - See https://github.com/ossf/scorecard-action#publishing-results.
# For private repositories:
# - `publish_results` will always be set to `false`, regardless
# of the value entered here.
publish_results: false

# Upload the results as artifacts (optional). Commenting out will disable uploads of run results in SARIF
# format to the repository Actions tab.
- name: "Upload artifact"
uses: actions/upload-artifact@97a0fba1372883ab732affbe8f94b823f91727db # v3.pre.node20
with:
name: SARIF file
path: results.sarif
retention-days: 5

# Upload the results to GitHub's code scanning dashboard (optional).
# Commenting out will disable upload of results to your repo's Code Scanning dashboard
- name: "Upload to code-scanning"
uses: github/codeql-action/upload-sarif@v3
with:
sarif_file: results.sarif