Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[CI/Build] custom build backend and dynamic build dependencies #7525

Open
wants to merge 15 commits into
base: main
Choose a base branch
from

Conversation

dtrifiro
Copy link
Contributor

@dtrifiro dtrifiro commented Aug 14, 2024

The rationale for this is that the current build setup uses different methods in each Dockerfile:

  • python setup.py bdist_wheel
  • python setup.py develop
  • python setup.py install
  • pip install -e ./vllm

as well as using different approaches for installing build requirements: using requirements-build.txt, pip install-ing the expected dependencies or relying on the build dependencies defined in pyproject.toml (see discussion here)

As the first three methods above are deprecated, one might expect modern PEP517/PEP518 style builds to work (e.g. pip install git+https://github.com/vllm-project/vllm).

This PR attempts solve these issues by:

  • adding a custom build backend _custom_backend/vllm.py to dynamically resolve build dependencies at build time, based on the value of VLLM_TARGET_DEVICE, consolidating build depedency requirements in a single place.
  • builds will now happen in an isolated environment unless pip install --no-build-isolation or python -m build --no-isolation are used.
    -Get rid of torch in requirements-build.txt (required torch version is computer per-device by the custom build backend)
  • modifying Dockerfiles to use the PEP517/518 isolated build environment (when possible).

Copy link

👋 Hi! Thank you for contributing to the vLLM project.
Just a reminder: PRs would not trigger full CI run by default. Instead, it would only run fastcheck CI which consists a small and essential subset of CI tests to quickly catch errors. You can run other CI tests on top of default ones by unblocking the steps in your fast-check build on Buildkite UI.

Once the PR is approved and ready to go, please make sure to run full CI as it is required to merge (or just use auto-merge).

To run full CI, you can do one of these:

  • Comment /ready on the PR
  • Add ready label to the PR
  • Enable auto-merge.

🚀

@dtrifiro dtrifiro force-pushed the pep517-pep518-improvements branch 2 times, most recently from ff4d15f to b5edbeb Compare August 14, 2024 17:37
@dtrifiro dtrifiro changed the title ] [CI/Build] custom build backend and dynamic build dependencies [CI/Build] custom build backend and dynamic build dependencies Aug 14, 2024
@dtrifiro dtrifiro force-pushed the pep517-pep518-improvements branch from b5edbeb to fdcae32 Compare August 20, 2024 09:10
@dtrifiro dtrifiro force-pushed the pep517-pep518-improvements branch from fdcae32 to 05292a4 Compare September 17, 2024 12:16
@dtrifiro dtrifiro marked this pull request as ready for review September 17, 2024 13:09
@youkaichao
Copy link
Member

why do we need build isolation? vllm build needs to build against pytorch, and it does not make sense to install pytorch (can be complicated) in an isolated environment and build against it.

I'd like to specify the build dependency in pyproject.toml, get rid of the requirements.txt, while still doing non-isolated build.

it looks strange that pip does not support this out of the box.

@dtrifiro
Copy link
Contributor Author

@youkaichao

why do we need build isolation? vllm build needs to build against pytorch, and it does not make sense to install pytorch (can be complicated) in an isolated environment and build against it.

As long as the build-time and run-time torch dependencies match, there's no issue with this.

I'd like to specify the build dependency in pyproject.toml, get rid of the requirements.txt, while still doing non-isolated build.
it looks strange that pip does not support this out of the box.

We can do this using pip install --no-build-isolation or python -m build --no-isolation, although you will have to install every build dependency beforehand.

@dtrifiro dtrifiro force-pushed the pep517-pep518-improvements branch from 05292a4 to 6a1ae68 Compare September 23, 2024 15:37
@youkaichao
Copy link
Member

As long as the build-time and run-time torch dependencies match, there's no issue with this.

it does not make sense for vllm. in our case, the build will build against pytorch, with binary dependency. and then it needs to run with that pytorch. I don't see any benefit w.r.t. isolated build, while it introduces additional cost of unnecessarily installing pytorch again.

and as i mentioned, installing pytorch in an isolated environment can be complicated, or even impossible. users might bring their own custom build, and the pytorch might come directly from the base container.

since you are building a custom build backend here, I assume you can disable the isolated build by default, and still read the build time dependency and use pip to install them if they are not installed (e.g. for cmake etc)

*requirements_extras,
]
print(
f"vllm build-backend: resolved build dependencies to: {complete_requirements}"
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

print the list in a nicer way?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ended up removing this altogether, before removal it looked like this:

$ python -m build  -v --wheel --installer=uv                                                                                                                                                                       
* Creating isolated environment: venv+uv...
* Using external uv from /usr/bin/uv
* Installing packages in isolated environment:
  - setuptools
  - setuptools-scm
> /usr/bin/uv pip install setuptools-scm setuptools
< Using Python 3.12.8 environment at: /tmp/build-env-t84pkm2n
< Resolved 3 packages in 3ms
< warning: Failed to hardlink files; falling back to full copy. This may lead to degraded performance.
<          If the cache and target directories are on different filesystems, hardlinking may not be supported.
<          If this is intentional, set `export UV_LINK_MODE=copy` or use `--link-mode=copy` to suppress this warning.
< Installed 3 packages in 16ms
<  + packaging==24.2
<  + setuptools==75.8.0
<  + setuptools-scm==8.1.0
* Getting build dependencies for wheel...
vllm build-backend: resolved build dependencies to:
setuptools>=61
setuptools-scm>=8
cmake>=3.26
ninja
packaging
setuptools
setuptools-scm
wheel
torch==2.5.1
* Installing packages in isolated environment:
  - cmake>=3.26
  - ninja
  - packaging
  - setuptools
  - setuptools-scm
  - setuptools-scm>=8
  - setuptools>=61
  - torch==2.5.1
  - wheel
> /usr/bin/uv pip install torch==2.5.1 ninja cmake>=3.26 setuptools-scm wheel setuptools setuptools>=61 setuptools-scm>=8 packaging
[...]

which ended up duplicating a lot of information

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

are these files still required?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, these are runtime dependencies, not build time.

@dtrifiro dtrifiro force-pushed the pep517-pep518-improvements branch from fb84bf8 to 85ff124 Compare January 20, 2025 16:27
@mergify mergify bot added the documentation Improvements or additions to documentation label Jan 20, 2025
Copy link

mergify bot commented Jan 21, 2025

This pull request has merge conflicts that must be resolved before it can be
merged. Please rebase the PR, @dtrifiro.

https://docs.github.com/en/pull-requests/collaborating-with-pull-requests/working-with-forks/syncing-a-fork

@mergify mergify bot added the needs-rebase label Jan 21, 2025
@dtrifiro dtrifiro force-pushed the pep517-pep518-improvements branch 4 times, most recently from 26905c7 to e754ed3 Compare January 21, 2025 17:49
@mergify mergify bot removed the needs-rebase label Jan 21, 2025
@dtrifiro dtrifiro force-pushed the pep517-pep518-improvements branch from e754ed3 to ccc246d Compare January 22, 2025 10:12
@dtrifiro dtrifiro force-pushed the pep517-pep518-improvements branch 2 times, most recently from 379867a to bd42837 Compare January 22, 2025 10:53
@@ -22,7 +22,7 @@ ENV LD_PRELOAD="/usr/lib/x86_64-linux-gnu/libtcmalloc_minimal.so.4:/usr/local/li

RUN echo 'ulimit -c 0' >> ~/.bashrc

RUN pip install intel_extension_for_pytorch==2.5.0
RUN pip install intel_extension_for_pytorch==2.5.0 # FIXME: should this be a dependency in requirements-cpu.txt?
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does anybody know if there's a specific reason on why this was added here instead of being added to requirements-cpu.txt?

One potential issue could be that requirements-cpu.txt also is used by ppc and arm Dockerfiles, although this could fixed by adding

intel_extension_for_pytorch==2.5.0; platform_machine == "x86_64"

This does not solve the issue when using AMD processors, but this is currently when building Dockerfile.cpu

@dtrifiro dtrifiro force-pushed the pep517-pep518-improvements branch from bd42837 to 7b29fd6 Compare January 22, 2025 11:19
@dtrifiro dtrifiro marked this pull request as ready for review January 22, 2025 11:20
@dtrifiro dtrifiro force-pushed the pep517-pep518-improvements branch 4 times, most recently from b26d4b4 to f9ea699 Compare January 22, 2025 12:27


def _is_hip() -> bool:
return (VLLM_TARGET_DEVICE == "cuda"
or VLLM_TARGET_DEVICE == "rocm") and torch.version.hip is not None
return VLLM_TARGET_DEVICE == "rocm"
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I want to stress out this change: I think it's better to explicitly set the ROCm target device rather than inferring it based on the value of torch.version.hip. Dockerfile.rocm has been updated accordingly.

@dtrifiro dtrifiro force-pushed the pep517-pep518-improvements branch from f9ea699 to bafc285 Compare January 22, 2025 12:34
@dtrifiro
Copy link
Contributor Author

async-engine-inputs-utils-worker-test failure seems unrelated

@dtrifiro dtrifiro force-pushed the pep517-pep518-improvements branch 3 times, most recently from a6a7288 to f510738 Compare January 22, 2025 15:27
Signed-off-by: Daniele Trifirò <[email protected]>

fix Dockerfile build

Signed-off-by: Daniele Trifirò <[email protected]>
Signed-off-by: Daniele Trifirò <[email protected]>
Signed-off-by: Daniele Trifirò <[email protected]>
Signed-off-by: Daniele Trifirò <[email protected]>
Signed-off-by: Daniele Trifirò <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
ci/build documentation Improvements or additions to documentation
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants