
[BACKEND] Promote tl.atomic_add and tl.atomic_xchg to PTX ld/st when possible #5187

Open · plotfi wants to merge 1 commit into main from plotfi-atomic-ldst
Conversation

@plotfi (Contributor) commented Nov 19, 2024

This handles an optimizable case of tl.atomic_add(ptr, 0) and of tl.atomic_xchg: the ptr type is scalar and, for the xchg, the op's result is never used.

This path lowers these ops to PTX `ld` and `st` with support for:

  • scoping: { .cta, .cluster, .gpu, .sys }, and
  • acquire/release semantics

Generating:

```
ld{ .relaxed, .acquire }{ .cta, .cluster, .gpu, .sys }
st{ .relaxed, .release }{ .cta, .cluster, .gpu, .sys }
```

The purpose is to generate better code for synchronizing groups of threads during a cooperative thread launch.
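For illustration, here is a minimal sketch (my own, not code from this PR; the kernel name, the busy-wait structure, and the choice of `sem=`/`scope=` values are assumptions) of the two patterns the promotion targets:

```python
# Sketch of the pattern this PR targets, assuming a Triton build where
# tl.atomic_add / tl.atomic_xchg accept the sem= / scope= keyword arguments.
import triton
import triton.language as tl


@triton.jit
def wait_then_signal(flag_ptr, done_ptr):
    # Scalar tl.atomic_add(ptr, 0): the add is a no-op and only the loaded
    # value is used, so it can become a plain PTX ld.acquire.<scope>.
    while tl.atomic_add(flag_ptr, 0, sem="acquire", scope="gpu") == 0:
        pass
    # tl.atomic_xchg whose result is unused: promotable to st.release.<scope>.
    tl.atomic_xchg(done_ptr, 1, sem="release", scope="gpu")
```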

  • I am not making a trivial change, such as fixing a typo in a comment.

• I have written a PR description following these rules.

• I have run `pre-commit run --from-ref origin/main --to-ref HEAD`.

  • Select one of the following.

    • I have added tests.

      • /test for lit tests
      • /unittest for C++ tests
      • /python/test for end-to-end tests
• The lit tests I have added follow these best practices, including the "tests should be minimal" section. (Usually running Python code and using the instructions it generates is not minimal.)

@Jokeren (Contributor) commented Nov 19, 2024

@peterbell10 is familiar with this

plotfi force-pushed the plotfi-atomic-ldst branch 4 times, most recently from 0f0a416 to da6a9f1 on December 5, 2024 02:19
@plotfi (Contributor, Author) commented Dec 5, 2024

Updated to incorporate feedback on:

  • making tl.atomic_load/tl.atomic_store lower into separate AtomicLoadOp/AtomicStoreOp operations.
  • making support for these new atomic ops handle only the scalar case.

TODO: For now the lowering to LLVM is PTX-only, and it is also a copy of the LoadOp/StoreOp matchAndRewrite lowering.
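As a rough sketch of the intermediate design described above (the frontend signatures are not spelled out in this thread, so the `sem=`/`scope=` keywords here are assumptions; this API was later dropped in favor of promoting the existing atomics, per the title change below):

```python
# Hypothetical usage of the scalar-only atomic load/store frontend ops
# discussed above; the signatures are assumed, not confirmed by this thread.
import triton
import triton.language as tl


@triton.jit
def handshake(flag_ptr):
    # Would lower to the new AtomicLoadOp (scalar pointers only).
    v = tl.atomic_load(flag_ptr, sem="acquire", scope="gpu")
    # Would lower to the new AtomicStoreOp (scalar pointers only).
    tl.atomic_store(flag_ptr, v + 1, sem="release", scope="gpu")
```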

plotfi force-pushed the plotfi-atomic-ldst branch 4 times, most recently from 906c05b to ba82b05 on December 9, 2024 18:43
plotfi force-pushed the plotfi-atomic-ldst branch 4 times, most recently from 70f69a7 to f435fad on December 10, 2024 20:03
plotfi marked this pull request as ready for review on December 10, 2024 20:07
plotfi requested a review from ptillet as a code owner on December 10, 2024 20:07
plotfi force-pushed the plotfi-atomic-ldst branch from f435fad to 19120b5 on December 10, 2024 22:32
plotfi force-pushed the plotfi-atomic-ldst branch from 19120b5 to e6d5388 on December 18, 2024 08:41
plotfi changed the title from "Atomic Load and Store operations for Triton (tl.atomic_store/tl.atomic_load)" to "[FRONTEND][BACKEND] Atomic Load and Store operations for Triton (tl.atomic_store/tl.atomic_load)" on December 18, 2024
@plotfi (Contributor, Author) commented Dec 18, 2024

Updated to remove boilerplate and to do end-to-end codegen of the atomic load and store ops all the way down to LLVM.

plotfi force-pushed the plotfi-atomic-ldst branch from e6d5388 to 7585650 on December 18, 2024 08:46
htyu pushed a commit that referenced this pull request on December 19, 2024: …_COOPERATIVE) (#5381)

This change sets the launch grid attribute before calling `cuLaunchKernelEx`.

It is intended to pair with the load/store atomics from #5187, adding grid synchronization similar to what cooperative groups do.

@ptillet Any recommendations on the UI for using this in code would be
most welcome :-)

- [X] I am not making a trivial change, such as fixing a typo in a
comment.

- [X] I have written a PR description following these
  [rules](https://cbea.ms/git-commit/#why-not-how).

- [X] I have run `pre-commit run --from-ref origin/main --to-ref HEAD`.

- Select one of the following.
  - [x] I have added tests.
    - `/python/test` for end-to-end tests
  - [?] This PR does not need a test because:

I am not entirely sure how to test the use of one driver API attribute versus another for this case yet.

I did add a test that exercises the `launch_cooperative_grid=True` launch flag (see the sketch after this commit message), but the test does not confirm that the plumbing triggers the use of the API attribute; I did confirm offline that it does, using an assert.

- Select one of the following.
  - [X] I have not added any `lit` tests.
- [ ] The `lit` tests I have added follow these [best practices](https://mlir.llvm.org/getting_started/TestingGuide/#filecheck-best-practices), including the "tests should be minimal" section. (Usually running Python code and using the instructions it generates is not minimal.)
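For reference, a hedged sketch of how the cooperative-launch flag from #5381 combines with the promoted atomics (the kernel body and grid size are mine; only the `launch_cooperative_grid=True` keyword comes from the commit text above):

```python
# Sketch: a cooperative launch keeps all program instances resident at once,
# so they can safely spin on a flag written through the promoted ld/st atomics.
import torch
import triton
import triton.language as tl


@triton.jit
def grid_sync_demo(flag_ptr):
    if tl.program_id(0) == 0:
        tl.atomic_xchg(flag_ptr, 1, sem="release", scope="gpu")  # signal
    else:
        # Spin on an acquire load: tl.atomic_add(ptr, 0) promoted to ld.acquire.
        while tl.atomic_add(flag_ptr, 0, sem="acquire", scope="gpu") == 0:
            pass


flag = torch.zeros(1, dtype=torch.int32, device="cuda")
grid_sync_demo[(8,)](flag, launch_cooperative_grid=True)
```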
plotfi force-pushed the plotfi-atomic-ldst branch 2 times, most recently from 99f292b to 9ac3e61 on December 20, 2024 19:57
plotfi changed the title from "[FRONTEND][BACKEND] Atomic Load and Store operations for Triton (tl.atomic_store/tl.atomic_load)" to "[BACKEND] Promote tl.atomic_add and tl.atomic_xchg to PTX ld/st when possible" on December 20, 2024
plotfi force-pushed the plotfi-atomic-ldst branch 2 times, most recently from 0866ebf to 25fde29 on December 20, 2024 23:50
plotfi force-pushed the plotfi-atomic-ldst branch 2 times, most recently from b6b115b to f73fa4c on January 10, 2025 04:21
This handles an optimizable case of tl.atomic_add(ptr, 0) for scalars.

This path lowers to PTX `ld.acquire.<scope>`, where the scope is one of `.cta`, `.gpu`, `.sys`.

The purpose is to generate better code for synchronizing groups of
threads during a cooperative thread launch.
plotfi force-pushed the plotfi-atomic-ldst branch from f73fa4c to 9de5ed4 on January 10, 2025 04:28