[CI/Build] custom build backend and dynamic build dependencies #7525
base: main
Conversation
👋 Hi! Thank you for contributing to the vLLM project. Once the PR is approved and ready to go, please make sure to run full CI, as it is required to merge (or just use auto-merge). 🚀
Force-pushed from ff4d15f to b5edbeb
Force-pushed from b5edbeb to fdcae32
Force-pushed from fdcae32 to 05292a4
why do we need build isolation? The vLLM build needs to build against PyTorch, and it does not make sense to install PyTorch (which can be complicated) in an isolated environment and build against it. I'd like to specify the build dependency directly; it looks strange that pip does not support this out of the box.
As long as the build-time and run-time torch dependencies match, there's no issue with this.
We can do this using …
Force-pushed from 05292a4 to 6a1ae68
it does not make sense for vLLM: in our case, the build builds against PyTorch, with a binary dependency on it, and then it needs to run with that same PyTorch. I don't see any benefit w.r.t. an isolated build, while it introduces the additional cost of unnecessarily installing PyTorch again. And, as I mentioned, installing PyTorch in an isolated environment can be complicated, or even impossible: users might bring their own custom build, and PyTorch might come directly from the base container. Since you are building a custom build backend here, I assume you can disable the isolated build by default, and still read the build-time dependencies and use them.
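For what it's worth, with pip install --no-build-isolation pip does not install the build requirements reported by the backend, so a non-isolated build can only verify that they are already satisfied. A minimal sketch of such a check, assuming a hypothetical helper name and an illustrative torch==2.5.1 pin (neither is from this PR):

from importlib.metadata import PackageNotFoundError, version

from packaging.requirements import Requirement


def check_build_torch(pinned: str = "torch==2.5.1") -> None:
    # Verify that the torch already installed in the current (non-isolated)
    # environment satisfies the pinned build-time requirement.
    req = Requirement(pinned)
    try:
        installed = version(req.name)
    except PackageNotFoundError:
        raise RuntimeError(f"{req.name} is not installed, but the build requires {pinned}")
    if not req.specifier.contains(installed, prereleases=True):
        raise RuntimeError(
            f"installed {req.name}=={installed} does not satisfy the build requirement {pinned}")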
Force-pushed from 6a1ae68 to 4121ba1
_build_backend/vllm.py
Outdated
*requirements_extras,
]
print(
    f"vllm build-backend: resolved build dependencies to: {complete_requirements}"
print the list in a nicer way?
Ended up removing this altogether; before removal it looked like this:
$ python -m build -v --wheel --installer=uv
* Creating isolated environment: venv+uv...
* Using external uv from /usr/bin/uv
* Installing packages in isolated environment:
- setuptools
- setuptools-scm
> /usr/bin/uv pip install setuptools-scm setuptools
< Using Python 3.12.8 environment at: /tmp/build-env-t84pkm2n
< Resolved 3 packages in 3ms
< warning: Failed to hardlink files; falling back to full copy. This may lead to degraded performance.
< If the cache and target directories are on different filesystems, hardlinking may not be supported.
< If this is intentional, set `export UV_LINK_MODE=copy` or use `--link-mode=copy` to suppress this warning.
< Installed 3 packages in 16ms
< + packaging==24.2
< + setuptools==75.8.0
< + setuptools-scm==8.1.0
* Getting build dependencies for wheel...
vllm build-backend: resolved build dependencies to:
setuptools>=61
setuptools-scm>=8
cmake>=3.26
ninja
packaging
setuptools
setuptools-scm
wheel
torch==2.5.1
* Installing packages in isolated environment:
- cmake>=3.26
- ninja
- packaging
- setuptools
- setuptools-scm
- setuptools-scm>=8
- setuptools>=61
- torch==2.5.1
- wheel
> /usr/bin/uv pip install torch==2.5.1 ninja cmake>=3.26 setuptools-scm wheel setuptools setuptools>=61 setuptools-scm>=8 packaging
[...]
This ended up duplicating a lot of information.
requirements-neuron.txt
Outdated
are these files still required?
Yes, these are runtime dependencies, not build-time ones.
Force-pushed from fb84bf8 to 85ff124
This pull request has merge conflicts that must be resolved before it can be merged.
Force-pushed from 26905c7 to e754ed3
Force-pushed from e754ed3 to ccc246d
Force-pushed from 379867a to bd42837
@@ -22,7 +22,7 @@ ENV LD_PRELOAD="/usr/lib/x86_64-linux-gnu/libtcmalloc_minimal.so.4:/usr/local/li

RUN echo 'ulimit -c 0' >> ~/.bashrc

-RUN pip install intel_extension_for_pytorch==2.5.0
+RUN pip install intel_extension_for_pytorch==2.5.0  # FIXME: should this be a dependency in requirements-cpu.txt?
Does anybody know if there's a specific reason why this was added here instead of to requirements-cpu.txt?
One potential issue could be that requirements-cpu.txt is also used by the ppc and arm Dockerfiles, although this could be fixed by adding:
intel_extension_for_pytorch==2.5.0; platform_machine == "x86_64"
This does not solve the issue when using AMD processors, but that is already the case today when building Dockerfile.cpu.
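As a side note, a marker like that is evaluated by the packaging library that pip relies on; a small illustrative snippet (not part of this PR):

from packaging.markers import Marker

# The environment marker suggested above: limit the dependency to x86_64.
marker = Marker('platform_machine == "x86_64"')

# Evaluates against the current interpreter/platform: True on x86_64 hosts,
# False elsewhere, so pip would simply skip the requirement there.
print(marker.evaluate())

# The environment can be overridden to preview other platforms.
print(marker.evaluate({"platform_machine": "aarch64"}))  # False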
Force-pushed from bd42837 to 7b29fd6
Force-pushed from b26d4b4 to f9ea699
def _is_hip() -> bool:
-    return (VLLM_TARGET_DEVICE == "cuda"
-            or VLLM_TARGET_DEVICE == "rocm") and torch.version.hip is not None
+    return VLLM_TARGET_DEVICE == "rocm"
I want to stress this change: I think it's better to explicitly set the ROCm target device rather than inferring it from the value of torch.version.hip. Dockerfile.rocm has been updated accordingly.
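A minimal sketch of the intent behind this, assuming VLLM_TARGET_DEVICE is an environment variable defaulting to "cuda" and that the set of accepted values below is only illustrative:

import os

# Assumed set of accepted targets; only meant to illustrate the idea.
_VALID_TARGETS = {"cuda", "rocm", "cpu", "neuron", "empty"}

VLLM_TARGET_DEVICE = os.environ.get("VLLM_TARGET_DEVICE", "cuda")
if VLLM_TARGET_DEVICE not in _VALID_TARGETS:
    raise ValueError(f"unsupported VLLM_TARGET_DEVICE: {VLLM_TARGET_DEVICE!r}")


def _is_hip() -> bool:
    # ROCm builds now have to opt in explicitly (VLLM_TARGET_DEVICE=rocm)
    # instead of being detected through torch.version.hip.
    return VLLM_TARGET_DEVICE == "rocm"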
Force-pushed from f9ea699 to bafc285
The async-engine-inputs-utils-worker-test failure seems unrelated.
Force-pushed from a6a7288 to f510738
Signed-off-by: Daniele Trifirò <[email protected]>
The rationale for this is that the current build setup uses different methods in each Dockerfile:
- python setup.py bdist_wheel
- python setup.py develop
- python setup.py install
- pip install -e ./vllm
as well as different approaches for installing build requirements: using requirements-build.txt, pip install-ing the expected dependencies, or relying on the build dependencies defined in pyproject.toml (see discussion here).
As the first three methods above are deprecated, one might expect modern PEP 517/PEP 518 style builds to work (e.g. pip install git+https://github.com/vllm-project/vllm).
This PR attempts to solve these issues by:
- Adding a custom build backend (_custom_backend/vllm.py) to dynamically resolve build dependencies at build time, based on the value of VLLM_TARGET_DEVICE, consolidating build dependency requirements in a single place. This also covers the case where pip install --no-build-isolation or python -m build --no-isolation are used.
- Getting rid of torch in requirements-build.txt (the required torch version is computed per-device by the custom build backend).
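For readers unfamiliar with in-tree build backends, here is a rough sketch of what a backend along these lines can look like. This is not the actual _custom_backend/vllm.py: the per-device mapping, the torch pins and the default target are illustrative assumptions.

import os

from setuptools import build_meta as _orig
from setuptools.build_meta import *  # noqa: F401,F403 -- re-export the standard PEP 517 hooks

# Illustrative per-device build requirements; the real backend derives these
# dynamically, so this mapping (and the pins) should be read as placeholders.
_EXTRA_BUILD_REQUIRES = {
    "cuda": ["torch==2.5.1"],
    "rocm": ["torch==2.5.1"],
    "cpu": ["torch==2.5.1+cpu"],
    "empty": [],
}


def get_requires_for_build_wheel(config_settings=None):
    # Standard PEP 517 hook: build front-ends call it to discover dynamic
    # build dependencies, which are then installed into the isolated build
    # environment (unless isolation is disabled).
    target = os.environ.get("VLLM_TARGET_DEVICE", "cuda")
    extra = _EXTRA_BUILD_REQUIRES.get(target, [])
    return _orig.get_requires_for_build_wheel(config_settings) + extra

pyproject.toml then selects the in-tree backend through the build-backend and backend-path keys, which is how PEP 517 front-ends such as pip and build pick it up automatically.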