Review kernels for replay buffer usage #7

Open
4 tasks
rtawfik01 opened this issue May 28, 2024 · 0 comments
Labels
Feature New feature request Performance Feature that helps with performance, not a blocker for functionality

Comments

Collaborator

rtawfik01 commented May 28, 2024

Replay buffers are currently under-utilized in the kernels. In the unpacker kernels, replay buffers can be used to update the L1 tile addresses without MMIO accesses:

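  // Read the current L1 base address from the config register into a GPR.
  TTI_RDCFG(p_gpr_unpack::TMP0, THCON_SEC0_REG3_Base_address_ADDR32);
  // Advance the address by the size of one tile of operand A.
  TTI_ADDDMAREG(0, p_gpr_unpack::TMP0, p_gpr_unpack::TMP0, p_gpr_unpack::TILE_SIZE_A);
  // Stall config-register access until THCON is done.
  TTI_STALLWAIT(p_stall::STALL_CFG, p_stall::THCON);
  // Write the updated address back to the config register.
  TTI_WRCFG(p_gpr_unpack::TMP0, 0, THCON_SEC0_REG3_Base_address_ADDR32);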

Alternatively, the CFGSHIFTMASK method described in #4 can be used.
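To make the record-once, replay-per-tile idea concrete, here is a minimal host-side sketch of the read/add/stall/write sequence. It is a toy model only: `ToyUnpacker`, `ToyReplayBuffer`, and `advance_base_address` are hypothetical names that do not exist in the LLK code base, and the model ignores instruction timing, stalls, and the units the hardware uses for addresses.

```cpp
#include <cassert>
#include <cstdint>
#include <functional>
#include <utility>
#include <vector>

// Toy host-side model. None of these types are part of the real LLK API.
struct ToyUnpacker {
    uint32_t base_address_cfg = 0;  // stands in for THCON_SEC0_REG3_Base_address
    uint32_t tmp0 = 0;              // stands in for p_gpr_unpack::TMP0
    uint32_t tile_size_a = 0;       // stands in for p_gpr_unpack::TILE_SIZE_A
};

using ToyInstr = std::function<void(ToyUnpacker&)>;

// Stores a short instruction sequence once and re-issues it on demand.
struct ToyReplayBuffer {
    std::vector<ToyInstr> slots;

    void record(ToyInstr instr) { slots.push_back(std::move(instr)); }

    void replay(ToyUnpacker& unpacker) const {
        for (const auto& instr : slots) {
            instr(unpacker);
        }
    }
};

// Records the four-step address update once, then replays it once per tile.
// Returns the base address after num_tiles updates.
uint32_t advance_base_address(uint32_t base, uint32_t tile_size, uint32_t num_tiles) {
    ToyUnpacker unpacker;
    unpacker.base_address_cfg = base;
    unpacker.tile_size_a = tile_size;

    ToyReplayBuffer replay_buffer;
    // Models RDCFG: config register -> GPR.
    replay_buffer.record([](ToyUnpacker& u) { u.tmp0 = u.base_address_cfg; });
    // Models ADDDMAREG: GPR += tile size.
    replay_buffer.record([](ToyUnpacker& u) { u.tmp0 += u.tile_size_a; });
    // Models STALLWAIT: a no-op here, since the model has no pipeline.
    replay_buffer.record([](ToyUnpacker&) {});
    // Models WRCFG: GPR -> config register.
    replay_buffer.record([](ToyUnpacker& u) { u.base_address_cfg = u.tmp0; });

    for (uint32_t tile = 0; tile < num_tiles; ++tile) {
        replay_buffer.replay(unpacker);
    }
    return unpacker.base_address_cfg;
}
```

In this model, each call to `replay` advances the base address by exactly one tile, so `advance_base_address(0x1000, 0x80, 4)` returns `0x1200`. Whether the real kernels see a speedup depends on measurements the model cannot provide.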

The following unpacker kernels do not use replay buffers:

  • llk_unpack_A.h
  • llk_unpack_AB.h
  • llk_unpack_reduce.h
  • llk_unpack_tilize.h

The kernels above should be profiled, and operations that turn out to be unpack-bound should try these address updates to improve performance. For example, eltwise binary/unary operations reach only about 15% math utilization in Buda performance measurements.

@ttmtrajkovic @rdjogoTT fyi

@rtawfik01 rtawfik01 added Performance Feature that helps with performance, not a blocker for functionality Feature New feature request labels May 28, 2024