
Use GCC 13 in CUDA 12+ builds #129

Open · bdice opened this issue Dec 23, 2024 · 4 comments

Comments

@bdice (Contributor) commented Dec 23, 2024

CUDA 12.5 added support for GCC 13. Recently, conda-forge began using GCC 13 for CUDA 12 builds (conda-forge/conda-forge-pinning-feedstock#6736, conda-forge/conda-forge-pinning-feedstock#6849).

This issue proposes using GCC 13 for CUDA 12 builds of RAPIDS, to align with conda-forge.

One proposal for implementation is here: rapidsai/rmm#1773

I propose that we target this update for 25.02, to stay aligned with conda-forge.

@jameslamb (Member)

@bdice @robertmaynard and I talked in an offline conversation and decided to only pursue this for conda builds, leaving wheel builds on GCC 11 (which is set here in ci-imgs).

Summarizing...

On the one hand... it'd probably be safe to update to GCC 13. We're producing manylinux_2_28 wheels, and the official PyPA manylinux images for building those just switched to GCC 14 (!) (pypa/manylinux#1730), which suggests that moving to a newer compiler for RAPIDS wheel builds might be ok. Improved diagnostics in GCC 13 would be helpful for catching issues, and wheel builds do sometimes go down different codepaths than conda builds (for example, because more dependencies are built from source instead of linked to).

On the other hand... switching the host compiler based on CUDA version (as we'd have to stay on GCC 11 for CUDA 11) would be more painful and error-prone for wheel builds than it is for conda. And I can't think of specific reasons that continuing to use GCC 11 for wheel builds while using GCC 13 for conda builds would be problematic.
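For context on why this is easier on the conda side: conda-build lets a recipe pair the host compiler version with the CUDA version through its variant config. A minimal, hypothetical conda_build_config.yaml sketch (illustrative only, not the actual RAPIDS recipe setup) could look like this:

# Hypothetical variant config: zip_keys makes the compiler and CUDA
# versions vary together, so CUDA 11.8 builds keep GCC 11 while
# CUDA 12 builds pick up GCC 13.
c_compiler_version:
  - "11"
  - "13"
cxx_compiler_version:
  - "11"
  - "13"
cuda_compiler_version:
  - "11.8"
  - "12.5"
zip_keys:
  - [c_compiler_version, cxx_compiler_version, cuda_compiler_version]

For wheels there is no equivalent per-variant knob; the compiler version comes from the ci-imgs base images, which is part of why a per-CUDA split is more painful there.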

@jameslamb (Member) commented Jan 2, 2025

One other issue will need to be figured out... unified devcontainers.

The conda devcontainers expect every project to provide compiler pins, like this:

specific:
  - output_types: conda
    matrices:
      - matrix:
          arch: x86_64
        packages:
          - gcc_linux-64=11.*
          - sysroot_linux-64==2.17
      - matrix:
          arch: aarch64
        packages:
          - gcc_linux-aarch64=11.*
          - sysroot_linux-aarch64==2.17
  - output_types: conda
    matrices:
      - matrix:
          arch: x86_64
          cuda: "11.8"
        packages:
          - nvcc_linux-64=11.8
      - matrix:
          arch: aarch64
          cuda: "11.8"
        packages:
          - nvcc_linux-aarch64=11.8
      - matrix:
          cuda: "12.*"
        packages:
          - cuda-nvcc

(rmm/dependencies.yaml)

These all get merged into a single conda environment by rapids-make-conda-env: https://github.com/rapidsai/devcontainers/blob/e1168d73bcbe5d5c96010471ac2f9accef943592/features/src/rapids-build-utils/opt/rapids-build-utils/bin/post-start-command.sh#L9

As soon as any one of the RAPIDS repos switches to GCC 13, I think the unified conda devcontainers will be broken until all of them are updated (because gcc_linux-64=11.* and gcc_linux-64=13.* are incompatible pins).
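To make the conflict concrete, here is a hypothetical merged spec (illustrative only, not a real generated environment file) showing why the unified environment stops solving once only some repos have moved:

# Hypothetical result of merging two repos' conda pins into one env:
dependencies:
  - gcc_linux-64=11.*   # contributed by a repo still pinned to GCC 11
  - gcc_linux-64=13.*   # contributed by a repo already moved to GCC 13
# No gcc_linux-64 version satisfies both specs, so the solve fails.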

Since those matrices in dependencies.yaml only affect local development (not packages built in CI), the best way I can think of to handle this is to make them ranges in each "use GCC 13" PR, so that a range like gcc_linux-64>=11,<14 still solves alongside a repo that pins =11.*:

specific:
  - output_types: conda
    matrices:
      - matrix:
          arch: x86_64
          cuda: "11.*"
        packages:
          - gcc_linux-64=11.*
          - &sysroot_x86_64 sysroot_linux-64==2.17
      - matrix:
          arch: x86_64
          cuda: "12.*"
        packages:
          - gcc_linux-64>=11,<14
          - *sysroot_x86_64
      - matrix:
          arch: aarch64
          cuda: "11.*"
        packages:
          - gcc_linux-aarch64=11.*
          - &sysroot_aarch64 sysroot_linux-aarch64==2.17
      - matrix:
          arch: aarch64
          cuda: "12.*"
        packages:
          - gcc_linux-aarch64>=11,<14
          - *sysroot_aarch64

And then, once every project is updated, do one more round of PRs to tighten them:

specific:
  - output_types: conda
    matrices:
      - matrix:
          arch: x86_64
          cuda: "11.*"
        packages:
          - gcc_linux-64=11.*
          - &sysroot_x86_64 sysroot_linux-64==2.17
      - matrix:
          arch: x86_64
          cuda: "12.*"
        packages:
          - gcc_linux-64=13.*
          - *sysroot_x86_64
      - matrix:
          arch: aarch64
          cuda: "11.*"
        packages:
          - gcc_linux-aarch64=11.*
          - &sysroot_aarch64 sysroot_linux-aarch64==2.17
      - matrix:
          arch: aarch64
          cuda: "12.*"
        packages:
          - gcc_linux-aarch64=13.*
          - *sysroot_aarch64

@jakirkham (Member)

This argues in favor of having centralized pinnings that we can apply across RAPIDS projects. That could be useful even outside this context.

@bdice (Contributor, Author) commented Jan 13, 2025

I opened a series of PRs to implement this. I used automation but also did some manual review and updates for each PR, mostly to fix up dependencies.yaml and meta.yaml files (which are specified a little differently in every repository).

These PRs also contain changes from #131 to use glibc 2.28 / sysroot 2.28 in all builds.

Here is the merge order I propose (reverse topological order of the RAPIDS dependency tree).
