# BMM-style neighborhood attention

*Figure: simplified visualization of GEMM-based neighborhood attention.*

BMM-style implementations break the attention operation into three primary stages (a minimal sketch follows the list):

- `A = QK^T`
- `P = Softmax(A)`
- `X = PV`
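As an illustration, here is a minimal sketch of the three stages using plain PyTorch batched matrix multiplies. This is not NATTEN's implementation: the function name, shapes, and the toy 1-D windowing mask are assumptions made for the example, and real neighborhood attention handles queries near the borders differently than a simple shrinking window.

```python
# Minimal BMM-style attention sketch in plain PyTorch (illustrative only;
# not NATTEN's actual kernels or API).
import torch
import torch.nn.functional as F

def bmm_style_attention(q, k, v, mask=None):
    # q, k, v: [batch * heads, seq_len, head_dim]
    scale = q.shape[-1] ** -0.5
    # Stage 1: A = QK^T, a batched matrix multiply.
    attn = torch.bmm(q, k.transpose(-2, -1)) * scale
    if mask is not None:
        # Disallow attention outside the neighborhood.
        attn = attn.masked_fill(~mask, float("-inf"))
    # Stage 2: P = Softmax(A), using PyTorch's native softmax op.
    attn = F.softmax(attn, dim=-1)
    # Stage 3: X = PV, another batched matrix multiply.
    return torch.bmm(attn, v)

# Toy 1-D neighborhood mask: each query attends only to keys within a
# window of `kernel_size` around it. This is a simplified stand-in for
# neighborhood attention's actual windowing rules.
seq_len, kernel_size = 16, 5
idx = torch.arange(seq_len)
mask = (idx[None, :] - idx[:, None]).abs() <= kernel_size // 2

q = torch.randn(2, seq_len, 8)
k = torch.randn(2, seq_len, 8)
v = torch.randn(2, seq_len, 8)
out = bmm_style_attention(q, k, v, mask)  # [2, 16, 8]
```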

BMM-style is typically the most straightforward way to implement attention, and as a result most of NATTEN's implementations are BMM-style. Since the softmax stage can simply use PyTorch's native softmax op, it is not implemented independently in NATTEN.

For more information on how the two matrix-multiply operations (QK and PV) are implemented, please refer to the backend docs.

For more details, we highly recommend reading our paper.