Replay buffers are currently under-utilized in the kernels. In unpacker kernels, replay buffers can be used to update the L1 tile addresses without MMIO accesses.
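To make the idea concrete, here is a toy C++ model of the trade-off. It is only a sketch: `ToyUnpacker`, `AddrIncr`, `run_with_mmio`, and `run_with_replay` are hypothetical names invented for illustration and do not correspond to any LLK or hardware API. The model assumes that an address update sequence can be recorded once and then re-issued with a single instruction, instead of writing each new address from the core through MMIO.

```cpp
#include <cassert>
#include <cstdint>
#include <utility>
#include <vector>

// Hypothetical recorded instruction: add a fixed stride to the tile address.
struct AddrIncr {
    std::uint32_t stride;
};

// Toy model only: this does not mirror real unpacker registers or instructions.
struct ToyUnpacker {
    std::uint32_t l1_tile_addr = 0;   // address the unpacker would read next
    std::uint32_t mmio_writes = 0;    // slow register writes issued by the core
    std::uint32_t issued_instrs = 0;  // replay instructions issued by the core
    std::vector<AddrIncr> replay_buffer;

    // Baseline: the core computes each address and writes it through MMIO.
    void set_addr_mmio(std::uint32_t addr) {
        l1_tile_addr = addr;
        ++mmio_writes;
    }

    // Record the address update sequence once, up front.
    void load_replay(std::vector<AddrIncr> seq) {
        replay_buffer = std::move(seq);
    }

    // One issued instruction replays the whole recorded sequence.
    void replay() {
        assert(!replay_buffer.empty() && "replay buffer must be loaded first");
        ++issued_instrs;
        for (const AddrIncr& op : replay_buffer) {
            l1_tile_addr += op.stride;
        }
    }
};

// Walk `tiles` consecutive tiles using one MMIO write per tile.
ToyUnpacker run_with_mmio(std::uint32_t base, std::uint32_t tile_bytes, std::uint32_t tiles) {
    ToyUnpacker u;
    for (std::uint32_t i = 0; i < tiles; ++i) {
        u.set_addr_mmio(base + i * tile_bytes);
    }
    return u;
}

// Walk the same tiles with one initial MMIO write, then replayed increments.
ToyUnpacker run_with_replay(std::uint32_t base, std::uint32_t tile_bytes, std::uint32_t tiles) {
    ToyUnpacker u;
    u.set_addr_mmio(base);
    u.load_replay({AddrIncr{tile_bytes}});
    for (std::uint32_t i = 1; i < tiles; ++i) {
        u.replay();
    }
    return u;
}
```

In this model both paths finish on the same tile address, but the replay path needs a single MMIO write regardless of the tile count. Whether that translates into a measurable gain on hardware depends on the actual kernel and has to be profiled.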
The following unpacker kernels do not use replay buffers:
llk_unpack_A.h
llk_unpack_AB.h
llk_unpack_reduce.h
llk_unpack_tilize.h
The kernels above should be profiled, and operations that turn out to be unpack-bound can try implementing the address updates with replay buffers to improve performance. Eltwise binary/unary operations, for example, reach only about 15% math utilization in Buda performance measurements.
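As a rough aid for reading such measurements, the sketch below computes math utilization and applies a simple triage rule. It assumes that math utilization means the fraction of total kernel cycles in which the math engine is busy. The function names, the cycle counters, and the 0.5 threshold are all hypothetical and not taken from any profiler.

```cpp
#include <cassert>
#include <cstdint>

// Assumed definition: fraction of total cycles in which the math engine is busy.
double math_utilization(std::uint64_t math_busy_cycles, std::uint64_t total_cycles) {
    assert(total_cycles > 0);
    return static_cast<double>(math_busy_cycles) / static_cast<double>(total_cycles);
}

// Hypothetical triage rule: low math utilization combined with an unpacker
// that is busier than the math engine suggests the kernel may be unpack-bound.
bool looks_unpack_bound(std::uint64_t math_busy_cycles,
                        std::uint64_t unpack_busy_cycles,
                        std::uint64_t total_cycles,
                        double threshold = 0.5) {
    return math_utilization(math_busy_cycles, total_cycles) < threshold &&
           unpack_busy_cycles > math_busy_cycles;
}
```

For instance, a kernel with 150 math-busy cycles out of 1000 sits at 15% utilization under this definition. If its unpacker is busy for most of those 1000 cycles, it is a plausible candidate for the replay buffer change.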
Alternatively, the L1 tile addresses can be updated with the CFGSHIFTMASK method described in #4.
@ttmtrajkovic @rdjogoTT fyi