MediaTek DSP memory layout improvements #83292
Open
andyross wants to merge 3 commits into zephyrproject-rtos:main from andyross:mtk-mem-layout
+117
−22
Add a tiny feature at the top level allowing a new section (".stacks") for all stack memory (vs. .noinit and/or .user_stacks) such that this device can place them in SRAM for performance.

Longer term we should come up with a unified plan for "stack placement" such that it can coexist with USERSPACE and KERNEL_COHERENCE, which also want control over stack memory at the linker level. But this is very simple.

Signed-off-by: Andy Ross <[email protected]>
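To illustrate the general mechanism this commit relies on, here is a minimal sketch of routing a stack-like buffer into a named section with the GCC/Clang `section` attribute, so that a linker script can then map that section to a chosen memory region. This is not the PR's implementation: `IN_STACKS_SECTION`, `DEMO_STACK_SIZE`, `demo_stack` and `demo_stack_top` are hypothetical names, and the sketch assumes a downward-growing stack.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical macro, not a Zephyr API: ask the toolchain to emit the
 * object into a section named ".stacks". Mach-O targets need a
 * "segment,section" pair, hence the guard.
 */
#if defined(__APPLE__)
#define IN_STACKS_SECTION __attribute__((section("__DATA,.stacks")))
#else
#define IN_STACKS_SECTION __attribute__((section(".stacks")))
#endif

#define DEMO_STACK_SIZE 1024 /* hypothetical size in bytes */

/* Stack-like buffer emitted into ".stacks" instead of .bss or .noinit. */
static uint8_t demo_stack[DEMO_STACK_SIZE]
	__attribute__((aligned(16))) IN_STACKS_SECTION;

/* Initial stack pointer, assuming the stack grows downward. */
static uint8_t *demo_stack_top(void)
{
	return demo_stack + sizeof(demo_stack);
}
```

The attribute only names the section. Where it ends up (SRAM or DRAM) is decided entirely by the linker script, which is the part this commit makes configurable.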
Further cleanup for this architecture. With .text/.rodata left in DRAM, use the higher performance memory for what is likely to be the most used app data.

Signed-off-by: Andy Ross <[email protected]>
Both as a way to validate the "stacks/data in SRAM" feature and for personal curiosity, I added a quick test case that measures memory read latency over various linker memory sections, ex:

    START - mem_lat
    Measuring estimated load latency (dcache disabled):
          .data:  40.011 cyc
        .rodata: 224.117 cyc
           .bss:  40.000 cyc
          .text: 222.756 cyc
      __nocache: 221.512 cyc
    Measuring estimated load latency (dcache enabled):
          .data:   1.002 cyc
        .rodata:   1.014 cyc
           .bss:   1.002 cyc
          .text:   1.014 cyc
      __nocache: 222.564 cyc
    PASS - mem_lat in 0.022 seconds

Note that there isn't actually any board-specific code here. This should run on any Xtensa device with a cache. Consider moving somewhere better.

Signed-off-by: Andy Ross <[email protected]>
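One common way to estimate per-load latency is to time a chain of dependent loads, where each load's address comes from the previous load's result. The sketch below shows that idea on a host, using the standard `clock()` as a stand-in timer. It is not the test added in this commit, whose method is not shown here. `RING_LEN`, `STRIDE` and the function names are hypothetical, and on a real target one would read the CPU cycle counter and place the buffer in the section under test.

```c
#include <stddef.h>
#include <stdint.h>
#include <time.h>

#define RING_LEN 1024u /* hypothetical buffer length, in elements */
#define STRIDE   17u   /* hypothetical stride, coprime with RING_LEN */

/* Fill the ring so each element holds the index of the next one to visit. */
static void ring_init(uint32_t *ring)
{
	for (uint32_t i = 0; i < RING_LEN; i++) {
		ring[i] = (i + STRIDE) % RING_LEN;
	}
}

/* Perform `steps` dependent loads starting at index 0; return the final index.
 * The volatile qualifier keeps the compiler from eliding the loads.
 */
static uint32_t ring_chase(const volatile uint32_t *ring, uint32_t steps)
{
	uint32_t idx = 0;

	while (steps--) {
		idx = ring[idx];
	}
	return idx;
}

/* Average clock() ticks per load over `steps` dependent loads. */
static double ticks_per_load(const volatile uint32_t *ring, uint32_t steps,
			     uint32_t *final_idx)
{
	clock_t t0 = clock();

	*final_idx = ring_chase(ring, steps);

	clock_t t1 = clock();

	return (double)(t1 - t0) / (double)steps;
}
```

Returning the final index gives the caller a value to check and gives the compiler a reason to keep the loop. Because the stride is coprime with the ring length, the chain visits every element before repeating.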
eddy1021 approved these changes on Dec 23, 2024
LGTM from google team.
fabiobaltieri approved these changes on Dec 23, 2024
dcpleung approved these changes on Jan 6, 2025
nashif approved these changes on Jan 6, 2025
@andyross please rebase / resolve merge conflicts
@andyross a reminder that this needs a rebase
Move stacks, .data and .bss from DRAM to SRAM. The SRAM on these devices is limited; we can't put the entire firmware in it. But reads from the DSP show roughly 5x lower latency to SRAM than to DRAM, so prioritize default linkage to put mutable data there.
In fact, even the SRAM is pretty slow (~40 cycle read latency). Thankfully these devices have monster caches, so in practice things usually work out.
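As a rough illustration of why a large cache hides most of that latency, the sketch below computes an expected cycles-per-load figure for a simple single-level cache model. The model and the function name `effective_load_cycles` are assumptions for illustration only. The ~1 cycle hit cost and the ~40 (SRAM) and ~224 (DRAM) cycle miss costs are taken from the mem_lat output in the test commit, and the hit rate is a hypothetical input.

```c
/* Expected cycles per load for a single-level cache model:
 * hits cost hit_cycles, misses cost miss_cycles.
 */
static double effective_load_cycles(double hit_rate, double hit_cycles,
				    double miss_cycles)
{
	return hit_rate * hit_cycles + (1.0 - hit_rate) * miss_cycles;
}
```

With a hypothetical 99% hit rate, this model gives about 1.39 cycles per load for SRAM-backed data (`effective_load_cycles(0.99, 1.0, 40.0)`) and about 3.23 for DRAM-backed data (`effective_load_cycles(0.99, 1.0, 224.0)`), so the backing memory matters mostly on misses.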