[AMD] Support fp16 upcast in scaled dot #5543

Open
antiagainst wants to merge 4 commits into main from amd-mxfp-fp16

Conversation

antiagainst (Collaborator) commented:

AMD gfx9 architectures do not have native bf16 VALU instructions, so bf16 scaling can be expensive.

This commit prototypes upcasting to fp16 for the computation instead. That means relaxing the dot_scaled frontend and the upcast_mxfp op definition to also accept fp16.

Right now, for prototyping, the fp16 path is turned on whenever one input is fp16. A more proper approach might be to introduce a math_dtype attribute for explicit control.
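For illustration, here is a minimal sketch of what the relaxed frontend could look like from the kernel side. It assumes tl.dot_scaled's existing signature with string operand formats and that "fp16" becomes an accepted format for the non-scaled operand; the pointer layout and block sizes are illustrative, not taken from this PR:

import triton
import triton.language as tl

@triton.jit
def mxfp4_fp16_dot(a_ptr, a_scale_ptr, b_ptr, c_ptr,
                   BLOCK_M: tl.constexpr, BLOCK_N: tl.constexpr,
                   BLOCK_K: tl.constexpr):
    offs_m = tl.arange(0, BLOCK_M)
    offs_n = tl.arange(0, BLOCK_N)
    # mxfp4 packs two e2m1 values per byte, so a row holds BLOCK_K // 2 bytes.
    offs_kp = tl.arange(0, BLOCK_K // 2)
    a = tl.load(a_ptr + offs_m[:, None] * (BLOCK_K // 2) + offs_kp[None, :])
    # One e8m0 scale per 32 elements along K.
    offs_s = tl.arange(0, BLOCK_K // 32)
    a_scale = tl.load(a_scale_ptr + offs_m[:, None] * (BLOCK_K // 32) + offs_s[None, :])
    # The RHS is an ordinary fp16 tensor and carries no scale.
    offs_k = tl.arange(0, BLOCK_K)
    b = tl.load(b_ptr + offs_k[:, None] * BLOCK_N + offs_n[None, :])
    # With this change, the mxfp4 side can be upcast to fp16 rather than
    # bf16 on gfx9, avoiding emulated bf16 VALU arithmetic.
    c = tl.dot_scaled(a, a_scale, "e2m1", b, None, "fp16")
    tl.store(c_ptr + offs_m[:, None] * BLOCK_N + offs_n[None, :], c)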

@antiagainst force-pushed the amd-mxfp-fp16 branch 2 times, most recently from 34259a4 to d085268 on January 9, 2025 00:10
@antiagainst marked this pull request as ready for review on January 10, 2025 06:58
@lezcano (Contributor) left a comment:

Just one small question.

FWIW, once #5475 becomes a thing all this will be trivial to implement. That PR is still very much WIP tho :)

Comment on lines +386 to +389
RankedTensorType
UpcastMXFPOp::deduceOutputType(TypedValue<RankedTensorType> inputTensor,
ScaleDotElemType inputElemType,
Type outputElemType) {

Why this change?

Software emulation enables targeting hardware architectures without native microscaling
operation support. Right now, in such cases, the microscaled lhs/rhs are upcast to
:code:`bf16` element type beforehand for the dot computation, with one exception:
for AMD CDNA3 specifically, if one of the inputs is of normal :code:`fp16` element type,
nit.

Suggested change:
- for AMD CDNA3 specifically, if one of the inputs is of normal :code:`fp16` element type,
+ for AMD CDNA3 specifically, if one of the inputs is of :code:`fp16` element type,
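For context on what the emulation in the excerpt above computes, here is a minimal standalone sketch of the mxfp4 (e2m1) decode and e8m0 scale application, following the OCP microscaling format definitions; the helper names are hypothetical and this is not the op's actual lowering:

def decode_e2m1(nibble: int) -> float:
    # e2m1: 1 sign bit, 2 exponent bits (bias 1), 1 mantissa bit.
    sign = -1.0 if (nibble >> 3) & 1 else 1.0
    exp = (nibble >> 1) & 0b11
    man = nibble & 0b1
    if exp == 0:
        mag = 0.5 * man  # subnormal: 0.0 or 0.5
    else:
        mag = (1.0 + 0.5 * man) * 2.0 ** (exp - 1)
    return sign * mag  # representable magnitudes: 0, 0.5, 1, 1.5, 2, 3, 4, 6

def apply_e8m0_scale(value: float, scale_byte: int) -> float:
    # An e8m0 scale is a pure power of two with exponent bias 127.
    return value * 2.0 ** (scale_byte - 127)

# 0b0111 decodes to 6.0; a scale byte of 126 (i.e. 2**-1) halves it to 3.0.
assert apply_e8m0_scale(decode_e2m1(0b0111), 126) == 3.0

Note that fp16 has a much narrower exponent range than bf16, so large e8m0 scales that fit in bf16 can overflow fp16; that trade-off is part of what an explicit math_dtype control would let users opt into.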
