# BMM-style neighborhood attention

*Figure: simplified visualization of GEMM-based neighborhood attention.*

BMM-style implementations break the attention operation into three primary stages (a minimal sketch follows the list):

- `A = QK^T`
- `P = Softmax(A)`
- `X = PV`
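As an illustration, here is a minimal sketch of the three stages using plain PyTorch batched matrix multiplies. This is not NATTEN's implementation: the function name, shapes, and the toy 1-D windowing mask are assumptions made for the example, and real neighborhood attention handles queries near the borders differently than a simple shrinking window.

```python
# Minimal BMM-style attention sketch in plain PyTorch (illustrative only;
# not NATTEN's actual kernels or API).
import torch
import torch.nn.functional as F

def bmm_style_attention(q, k, v, mask=None):
    # q, k, v: [batch * heads, seq_len, head_dim]
    scale = q.shape[-1] ** -0.5
    # Stage 1: A = QK^T, a batched matrix multiply.
    attn = torch.bmm(q, k.transpose(-2, -1)) * scale
    if mask is not None:
        # Disallow attention outside the neighborhood.
        attn = attn.masked_fill(~mask, float("-inf"))
    # Stage 2: P = Softmax(A), using PyTorch's native softmax op.
    attn = F.softmax(attn, dim=-1)
    # Stage 3: X = PV, another batched matrix multiply.
    return torch.bmm(attn, v)

# Toy 1-D neighborhood mask: each query attends only to keys within a
# window of `kernel_size` around it. This is a simplified stand-in for
# neighborhood attention's actual windowing rules.
seq_len, kernel_size = 16, 5
idx = torch.arange(seq_len)
mask = (idx[None, :] - idx[:, None]).abs() <= kernel_size // 2

q = torch.randn(2, seq_len, 8)
k = torch.randn(2, seq_len, 8)
v = torch.randn(2, seq_len, 8)
out = bmm_style_attention(q, k, v, mask)  # [2, 16, 8]
```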

BMM-style is typically the most straightforward way to implement attention, and as a result most of NATTEN's implementations are BMM-style. Since the softmax stage can simply use PyTorch's native softmax op, it is not implemented independently in NATTEN.

For more information on how the two matrix-multiply operations (QK and PV) are implemented, please refer to the backend docs.

For more details, we highly recommend reading our paper.