
[BACKEND] Promote tl.atomic_add and tl.atomic_xchg to PTX ld/st when possible #5187

Open · plotfi wants to merge 1 commit into main from plotfi-atomic-ldst
Conversation

@plotfi (Contributor) commented Nov 19, 2024

This handles an optimizable case of tl.atomic_add(ptr, 0) and of tl.atomic_xchg: the ptr type is scalar and, for the xchg, the op's result is never used.

This path lowers these ops to PTX `ld` and `st` with support for:

  • scoping: { .cta, .cluster, .gpu, .sys }, and
  • acquire/release semantics

Generating:

```
ld{ .relaxed, .acquire }{ .cta, .cluster, .gpu, .sys }
st{ .relaxed, .release }{ .cta, .cluster, .gpu, .sys }
```

The purpose is to generate better code for synchronizing groups of threads during a cooperative thread launch.
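For illustration, here is a minimal sketch (my own, not code from this PR; the kernel name, the busy-wait structure, and the choice of `sem=`/`scope=` values are assumptions) of the two patterns the promotion targets:

```python
# Sketch of the pattern this PR targets, assuming a Triton build where
# tl.atomic_add / tl.atomic_xchg accept the sem= / scope= keyword arguments.
import triton
import triton.language as tl


@triton.jit
def wait_then_signal(flag_ptr, done_ptr):
    # Scalar tl.atomic_add(ptr, 0): the add is a no-op and only the loaded
    # value is used, so it can become a plain PTX ld.acquire.<scope>.
    while tl.atomic_add(flag_ptr, 0, sem="acquire", scope="gpu") == 0:
        pass
    # tl.atomic_xchg whose result is unused: promotable to st.release.<scope>.
    tl.atomic_xchg(done_ptr, 1, sem="release", scope="gpu")
```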

  • I am not making a trivial change, such as fixing a typo in a comment.

• I have written a PR description following these rules.

• I have run `pre-commit run --from-ref origin/main --to-ref HEAD`.

  • Select one of the following.

    • I have added tests.

      • /test for lit tests
      • /unittest for C++ tests
      • /python/test for end-to-end tests
• The lit tests I have added follow these best practices, including the "tests should be minimal" section. (Usually running Python code and using the instructions it generates is not minimal.)

@Jokeren (Contributor) commented Nov 19, 2024

@peterbell10 is familiar with this

plotfi force-pushed the plotfi-atomic-ldst branch 4 times, most recently from 0f0a416 to da6a9f1 on December 5, 2024 02:19
@plotfi (Contributor, Author) commented Dec 5, 2024

Updated to incorporate feedback on:

  • making tl.atomic_load/tl.atomic_store lower into separate AtomicLoadOp/AtomicStoreOp operations.
  • making support for these new atomic ops handle only the scalar case.

TODO: For now the lowering to LLVM is PTX-only, and it is also a copy of the LoadOp/StoreOp matchAndRewrite lowering.
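As a rough sketch of the intermediate design described above (the frontend signatures are not spelled out in this thread, so the `sem=`/`scope=` keywords here are assumptions; this API was later dropped in favor of promoting the existing atomics, per the title change below):

```python
# Hypothetical usage of the scalar-only atomic load/store frontend ops
# discussed above; the signatures are assumed, not confirmed by this thread.
import triton
import triton.language as tl


@triton.jit
def handshake(flag_ptr):
    # Would lower to the new AtomicLoadOp (scalar pointers only).
    v = tl.atomic_load(flag_ptr, sem="acquire", scope="gpu")
    # Would lower to the new AtomicStoreOp (scalar pointers only).
    tl.atomic_store(flag_ptr, v + 1, sem="release", scope="gpu")
```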

plotfi force-pushed the plotfi-atomic-ldst branch 4 times, most recently from 906c05b to ba82b05 on December 9, 2024 18:43
plotfi force-pushed the plotfi-atomic-ldst branch 4 times, most recently from 70f69a7 to f435fad on December 10, 2024 20:03
plotfi marked this pull request as ready for review on December 10, 2024 20:07
plotfi requested a review from ptillet as a code owner on December 10, 2024 20:07
plotfi force-pushed the plotfi-atomic-ldst branch from f435fad to 19120b5 on December 10, 2024 22:32
plotfi force-pushed the plotfi-atomic-ldst branch from 19120b5 to e6d5388 on December 18, 2024 08:41
plotfi changed the title from "Atomic Load and Store operations for Triton (tl.atomic_store/tl.atomic_load)" to "[FRONTEND][BACKEND] Atomic Load and Store operations for Triton (tl.atomic_store/tl.atomic_load)" on December 18, 2024
@plotfi (Contributor, Author) commented Dec 18, 2024

Updated to remove boilerplate and to do end-to-end codegen of the atomic load and store ops all the way down to LLVM.

plotfi force-pushed the plotfi-atomic-ldst branch from e6d5388 to 7585650 on December 18, 2024 08:46
htyu pushed a commit that referenced this pull request on December 19, 2024: …_COOPERATIVE) (#5381)

This change sets the launch grid attribute before calling `cuLaunchKernelEx`.

It is intended to pair with the load/store atomics from #5187, adding grid synchronization similar to what cooperative groups do.

@ptillet Any recommendations on the UI for using this in code would be
most welcome :-)

- [X] I am not making a trivial change, such as fixing a typo in a
comment.

- [X] I have written a PR description following these
  [rules](https://cbea.ms/git-commit/#why-not-how).

- [X] I have run `pre-commit run --from-ref origin/main --to-ref HEAD`.

- Select one of the following.
  - [x] I have added tests.
    - `/python/test` for end-to-end tests
  - [?] This PR does not need a test because:

I am not entirely sure how to test the use of one driver API attribute versus another for this case yet.

I did add a test that exercises the `launch_cooperative_grid=True` launch flag (see the sketch after this commit message), but the test does not confirm that the plumbing triggers the use of the API attribute; I did confirm offline that it does, using an assert.

- Select one of the following.
  - [X] I have not added any `lit` tests.
- [ ] The `lit` tests I have added follow these [best practices](https://mlir.llvm.org/getting_started/TestingGuide/#filecheck-best-practices), including the "tests should be minimal" section. (Usually running Python code and using the instructions it generates is not minimal.)
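For reference, a hedged sketch of how the cooperative-launch flag from #5381 combines with the promoted atomics (the kernel body and grid size are mine; only the `launch_cooperative_grid=True` keyword comes from the commit text above):

```python
# Sketch: a cooperative launch keeps all program instances resident at once,
# so they can safely spin on a flag written through the promoted ld/st atomics.
import torch
import triton
import triton.language as tl


@triton.jit
def grid_sync_demo(flag_ptr):
    if tl.program_id(0) == 0:
        tl.atomic_xchg(flag_ptr, 1, sem="release", scope="gpu")  # signal
    else:
        # Spin on an acquire load: tl.atomic_add(ptr, 0) promoted to ld.acquire.
        while tl.atomic_add(flag_ptr, 0, sem="acquire", scope="gpu") == 0:
            pass


flag = torch.zeros(1, dtype=torch.int32, device="cuda")
grid_sync_demo[(8,)](flag, launch_cooperative_grid=True)
```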
plotfi force-pushed the plotfi-atomic-ldst branch 2 times, most recently from 99f292b to 9ac3e61 on December 20, 2024 19:57
plotfi changed the title from "[FRONTEND][BACKEND] Atomic Load and Store operations for Triton (tl.atomic_store/tl.atomic_load)" to "[BACKEND] Promote tl.atomic_add and tl.atomic_xchg to PTX ld/st when possible" on December 20, 2024
plotfi force-pushed the plotfi-atomic-ldst branch 2 times, most recently from 0866ebf to 25fde29 on December 20, 2024 23:50
plotfi force-pushed the plotfi-atomic-ldst branch 2 times, most recently from b6b115b to f73fa4c on January 10, 2025 04:21
This handles an optimizable case of tl.atomic_add(ptr, 0) for scalars.

This path lowers to PTX `ld.acquire.<scope>`, where the scope is one of `.cta`, `.gpu`, `.sys`.

The purpose is to generate better code for synchronizing groups of
threads during a cooperative thread launch.
plotfi force-pushed the plotfi-atomic-ldst branch from f73fa4c to 9de5ed4 on January 10, 2025 04:28