Use CFGSHIFTMASK instruction #4

Open
rtawfik01 opened this issue May 28, 2024 · 0 comments
Labels
Performance: Feature that helps with performance, not a blocker for functionality

Comments

@rtawfik01
Collaborator

Blackhole has a new CFGSHIFTMASK instruction that can update addresses for the unpacker instructions inside the mop/replay buffers.

Instead of updating addresses with a read-modify-write sequence through a GPR, like this:

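  TTI_RDCFG(p_gpr_unpack::TMP0, THCON_SEC0_REG3_Base_address_ADDR32);  // TMP0 = current base address
  TTI_ADDDMAREG(0, p_gpr_unpack::TMP0, p_gpr_unpack::TMP0, p_gpr_unpack::TILE_SIZE_A);  // TMP0 += TILE_SIZE_A
  TTI_STALLWAIT(p_stall::STALL_CFG, p_stall::THCON);  // hold the config write until THCON is done
  TTI_WRCFG(p_gpr_unpack::TMP0, 0, THCON_SEC0_REG3_Base_address_ADDR32);  // write the new base address back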

Using the CFGSHIFTMASK instruction, it could be done like this:

TTI_CFGSHIFTMASK(1, 0b011, 32 - 1, 0, 0b11, THCON_SEC0_REG3_Base_address_ADDR32); // THCON_SEC0_REG3_Base_address_ADDR32 = THCON_SEC0_REG3_Base_address_ADDR32 + SCRATCH_SEC0_val_ADDR32

as long as the scratch buffer is correctly populated:

TTI_WRCFG(p_gpr_unpack::TILE_SIZE_A, 0, SCRATCH_SEC0_val_ADDR32);
TTI_NOP;

If an operation is unpacker-bound, then using CFGSHIFTMASK should increase performance, since each address update becomes a single instruction instead of the four-instruction sequence above.
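
To illustrate why the two sequences should be interchangeable, here is a small host-side C++ sketch. It is only a toy software model, not LLK code: `ToyState`, the register indices, and the helper function names are hypothetical, and the CFGSHIFTMASK behaviour is assumed to be exactly what the comment in the snippet above describes (base address += scratch value). The returned instruction counts simply mirror the number of TTI calls in each snippet.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical indices standing in for the real config registers.
constexpr std::size_t kBaseAddrReg = 0;  // models THCON_SEC0_REG3_Base_address_ADDR32
constexpr std::size_t kScratchReg  = 1;  // models SCRATCH_SEC0_val_ADDR32

// Toy model of the state touched by both sequences.
struct ToyState {
    std::array<std::uint32_t, 2> cfg{};  // config registers
    std::uint32_t tmp0 = 0;              // models p_gpr_unpack::TMP0
    std::uint32_t tile_size_a = 0;       // models p_gpr_unpack::TILE_SIZE_A
};

// Models RDCFG + ADDDMAREG + STALLWAIT + WRCFG.
// Returns the number of instructions issued per update.
int advance_with_gpr(ToyState& s) {
    s.tmp0 = s.cfg[kBaseAddrReg];   // RDCFG
    s.tmp0 += s.tile_size_a;        // ADDDMAREG
    // STALLWAIT only orders execution; it has no data effect in this model.
    s.cfg[kBaseAddrReg] = s.tmp0;   // WRCFG
    return 4;
}

// One-time setup: models the WRCFG of TILE_SIZE_A into the scratch register.
void populate_scratch(ToyState& s) {
    s.cfg[kScratchReg] = s.tile_size_a;
}

// Models the assumed CFGSHIFTMASK effect: base address += scratch value.
// Returns the number of instructions issued per update.
int advance_with_cfgshiftmask(ToyState& s) {
    // The update is only correct if the scratch register was populated first.
    assert(s.cfg[kScratchReg] == s.tile_size_a);
    s.cfg[kBaseAddrReg] += s.cfg[kScratchReg];
    return 1;
}
```

In this model, advancing the base address N times costs 4N instructions with the GPR path and N instructions (plus the one-time scratch setup) with the CFGSHIFTMASK path, while both end at the same address.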

@ttmtrajkovic @rdjogoTT fyi.

rtawfik01 added the Performance label (Feature that helps with performance, not a blocker for functionality) on May 28, 2024