[BACKEND] Promote tl.atomic_add and tl.atomic_xchg to PTX ld/st when possible #5187
Open: plotfi wants to merge 1 commit into triton-lang:main from plotfi:plotfi-atomic-ldst
+246 −0
plotfi commented Nov 19, 2024
@peterbell10 is familiar with this
plotfi force-pushed the plotfi-atomic-ldst branch 4 times, most recently from 0f0a416 to da6a9f1, on December 5, 2024 02:19
Updated to incorporate feedback on:
TODO: For now, the lowering to LLVM is PTX-only, and it is also a copy of the LoadOp/StoreOp matchAndRewrite lowering.
plotfi force-pushed the plotfi-atomic-ldst branch 4 times, most recently from 906c05b to ba82b05, on December 9, 2024 18:43
plotfi force-pushed the plotfi-atomic-ldst branch 4 times, most recently from 70f69a7 to f435fad, on December 10, 2024 20:03
plotfi force-pushed the plotfi-atomic-ldst branch from f435fad to 19120b5 on December 10, 2024 22:32
plotfi force-pushed the plotfi-atomic-ldst branch from 19120b5 to e6d5388 on December 18, 2024 08:41
plotfi changed the title on Dec 18, 2024
from: Atomic Load and Store operations for Triton (tl.atomic_store/tl.atomic_load)
to: [FRONTEND][BACKEND] Atomic Load and Store operations for Triton (tl.atomic_store/tl.atomic_load)
Updated to remove boilerplate and to do end-to-end codegen of the atomic load and store ops all the way down to LLVM.
plotfi force-pushed the plotfi-atomic-ldst branch from e6d5388 to 7585650 on December 18, 2024 08:46
htyu pushed a commit that referenced this pull request on Dec 19, 2024:

…_COOPERATIVE) (#5381)

This change sets the launch grid attribute before calling cuLaunchKernelEx. This change is intended to pair with load/store atomics from #5187 and is intended to add grid synchronization similar to what cooperative groups do.

@ptillet Any recommendations on the UI for using this in code would be most welcome :-)

- [X] I am not making a trivial change, such as fixing a typo in a comment.
- [X] I have written a PR description following these [rules](https://cbea.ms/git-commit/#why-not-how).
- [X] I have run `pre-commit run --from-ref origin/main --to-ref HEAD`.
- Select one of the following.
  - [x] I have added tests.
    - `/python/test` for end-to-end tests
  - [?] This PR does not need a test because: I am not entirely sure how to test the use of one driver API attr versus another for this case yet. I did add a test that exercises the launch_cooperative_grid=True launch flag but I am not confirming that the plumbing triggers the use of the API attr in test, although I did confirm it does offline using an assert.
- Select one of the following.
  - [X] I have not added any `lit` tests.
  - [ ] The `lit` tests I have added follow these [best practices](https://mlir.llvm.org/getting_started/TestingGuide/#filecheck-best-practices), including the "tests should be minimal" section. (Usually running Python code and using the instructions it generates is not minimal.)
plotfi force-pushed the plotfi-atomic-ldst branch 2 times, most recently from 99f292b to 9ac3e61, on December 20, 2024 19:57
plotfi changed the title on Dec 20, 2024
from: [FRONTEND][BACKEND] Atomic Load and Store operations for Triton (tl.atomic_store/tl.atomic_load)
to: [BACKEND] Promote tl.atomic_add and tl.atomic_xchg to PTX ld/st when possible
plotfi force-pushed the plotfi-atomic-ldst branch 2 times, most recently from 0866ebf to 25fde29, on December 20, 2024 23:50
plotfi force-pushed the plotfi-atomic-ldst branch 2 times, most recently from b6b115b to f73fa4c, on January 10, 2025 04:21
This is for handling an optimized case of `tl.atomic_add(ptr, 0)` for scalars. This path lowers to PTX `ld.acquire.scope` (with scope `.cta`, `.gpu`, or `.sys`). The purpose is to generate better code for synchronizing groups of threads during a cooperative thread launch.
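To illustrate why `tl.atomic_add(ptr, 0)` tends to show up in synchronization code, here is a minimal CPU-side sketch of a one-shot barrier. It is an analogy only, not Triton or PTX: `AtomicCounter`, `grid_barrier`, and `run_blocks` are hypothetical names, Python threads stand in for thread blocks, and a lock stands in for hardware atomics. Each "block" arrives with an add of 1 and then polls with an add of 0, which is a pure read and is the kind of operation this change would lower to a load.

```python
import threading


class AtomicCounter:
    """Hypothetical CPU stand-in for a global-memory word accessed atomically."""

    def __init__(self):
        self._value = 0
        self._lock = threading.Lock()

    def atomic_add(self, delta):
        # Returns the old value, mirroring the semantics of tl.atomic_add.
        with self._lock:
            old = self._value
            self._value += delta
            return old


def grid_barrier(counter, num_blocks):
    """One-shot barrier: arrive, then spin until every block has arrived."""
    counter.atomic_add(1)  # arrive
    # Polling with atomic_add(ptr, 0) never changes the value; it only reads it.
    while counter.atomic_add(0) < num_blocks:
        pass


def run_blocks(num_blocks):
    """Run num_blocks threads and report whether each saw all pre-barrier writes."""
    counter = AtomicCounter()
    wrote_before = [False] * num_blocks
    saw_all = [False] * num_blocks

    def block(pid):
        wrote_before[pid] = True           # work done before the barrier
        grid_barrier(counter, num_blocks)
        saw_all[pid] = all(wrote_before)   # checked after the barrier

    threads = [threading.Thread(target=block, args=(i,)) for i in range(num_blocks)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return saw_all
```

Calling `run_blocks(4)` should return a list of four `True` values, since no thread leaves the barrier before all four have arrived. On a GPU the same pattern only makes progress if all blocks are resident at once, which is what a cooperative launch is meant to guarantee.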
plotfi force-pushed the plotfi-atomic-ldst branch from f73fa4c to 9de5ed4 on January 10, 2025 04:28
This is for handling an optimizable case of tl.atomic_add(ptr, 0) and tl.atomic_xchg where the ptr type is scalar and the xchg op result is never used.
This path lowers to PTX ld and st with supported fields for:
Generating:
The purpose is to generate better code for synchronizing groups of threads during a cooperative thread launch.
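The conditions described above can be sketched as a small decision function. This is a simplified model, not the pass in this PR: `AtomicOp` and `promote` are hypothetical names, and the rule that a load may only carry `relaxed` or `acquire` semantics and a store only `relaxed` or `release` is an assumption taken from the PTX `ld`/`st` instruction forms rather than from the PR text, which only names `ld.acquire` explicitly.

```python
from dataclasses import dataclass
from typing import Optional

SCOPES = ("cta", "gpu", "sys")


@dataclass(frozen=True)
class AtomicOp:
    """Hypothetical, simplified description of an atomic read-modify-write op."""
    kind: str                    # "add" or "xchg"
    ptr_is_scalar: bool          # pointer operand is a scalar, not a tensor
    value_is_zero: bool = False  # value operand is the constant 0
    result_used: bool = True     # whether anything consumes the op's result
    sem: str = "acq_rel"         # "relaxed", "acquire", "release", or "acq_rel"
    scope: str = "gpu"           # "cta", "gpu", or "sys"


def promote(op: AtomicOp) -> Optional[str]:
    """Return a PTX-style mnemonic prefix if the op can become ld/st, else None."""
    if not op.ptr_is_scalar or op.scope not in SCOPES:
        return None
    # atomic_add(ptr, 0) leaves memory unchanged, so it behaves like a load.
    if op.kind == "add" and op.value_is_zero and op.sem in ("relaxed", "acquire"):
        return f"ld.{op.sem}.{op.scope}"
    # atomic_xchg whose old value is discarded behaves like a store.
    # Assumption: only store-compatible orderings are eligible.
    if op.kind == "xchg" and not op.result_used and op.sem in ("relaxed", "release"):
        return f"st.{op.sem}.{op.scope}"
    return None
```

For example, `promote(AtomicOp("add", True, value_is_zero=True, sem="acquire"))` yields `"ld.acquire.gpu"`, while an `xchg` whose result is read stays an atomic and yields `None`.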
- I am not making a trivial change, such as fixing a typo in a comment.
- I have written a PR description following these rules.
- I have run `pre-commit run --from-ref origin/main --to-ref HEAD`.
- Select one of the following.
  - I have added tests.
    - `/test` for `lit` tests
    - `/unittest` for C++ tests
    - `/python/test` for end-to-end tests
- The `lit` tests I have added follow these best practices, including the "tests should be minimal" section. (Usually running Python code and using the instructions it generates is not minimal.)