[BUG] [ZEPHYR] sof-logger dead: Invalid filename length 1650680879 or ldc file does not match firmware #5352
Comments
P1 because it affects PR and daily testing. @kv2019i could this be related to thesofproject/linux#3275? @jsarha any clue now that you're an expert after fixing #5120?
1650680879 == 0x6263642F == "bcd/". So it looks like we are reading part of a header (?) as the length.
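The conversion above can be checked with a short sketch. The helper name `length_as_ascii` is hypothetical and not part of sof-logger; it only reinterprets a bogus 32-bit length as four ASCII bytes in both byte orders, which may help when triaging other samples of this error.

```python
import struct


def length_as_ascii(value: int) -> dict:
    """Reinterpret a 32-bit 'length' as four ASCII characters.

    Hypothetical triage helper: it does not parse the real trace format,
    it only shows what text the four bytes would spell.
    """
    raw_be = struct.pack(">I", value)  # most significant byte first
    raw_le = struct.pack("<I", value)  # byte order in memory on a little-endian host
    return {
        "hex": f"0x{value:08X}",
        "big_endian": raw_be.decode("ascii", errors="replace"),
        "little_endian": raw_le.decode("ascii", errors="replace"),
    }


result = length_as_ascii(1650680879)
print(result)
```

Read most significant byte first, the value spells "bcd/"; on a little-endian host the same four bytes sit in memory as "/dcb". Whether either string really comes from a header is still an open question.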
I don't think so, the values seem fairly random. Here's a sample of a few different failures:
Something strange happened in the middle of a test on WHL:
This does not appear to happen on my TGL board. I have now run test-speaker.sh 730 times (had it running in a loop overnight) on a Zephyr build of the latest https://github.com/thesofproject/sof main.
TGL does not seem to be the platform that reproduces this most frequently, and test-speaker is not the most affected test either. Do you have an APL board? I would also recommend reloading the firmware regularly, just an intuition based on staring at a lot of failures recently. https://github.com/thesofproject/sof-test/tree/main/tools/kmod works great. EDIT: seen again in daily 10429 Start Time: 2022-02-22 22:27:21 UTC 10429?model=APL_UP2_NOCODEC_ZEPHYR&testcase=multiple-pipeline-all-50
@jsarha, this issue is still valid in CI; we see it almost every day in the CI daily test (today's test ID: 10429).
Lowering priority to P2, as we're going to abandon the DMA-based SOF trace and use the Zephyr native LOG implementation, which will probably fix this problem.
Unlike bug severity, priority is never "objectively" defined with specified metrics. Priority can be defined in many different, project-specific ways based on various inputs and preferences. However, I don't think I've seen the implementation details of the fix make any priority difference before.
Slightly different failure in daily 10797?model=WHL_UPEXT_HDA_ZEPHYR&testcase=check-capture-50rounds Still happening at the very start of the test.
New failure in daily 11126?modelFirmwareType=SOF-Zephyr&model=ADLP_RVP_SDW_ZEPHYR&testcase=multiple-pipeline-capture-50, happening at the very start of the test.
More DMA trace corruption with Zephyr, this time not at the very start: https://sof-ci.01.org/sofpr/PR5631/build12594/devicetest/
This failure is just too frequent on Zephyr, downgrade it to a SKIP. See long story in thesofproject/sof#5352 Signed-off-by: Marc Herbert <[email protected]>
Downgrading this failure to a SKIP because it's too frequent; too much pollution.
This made a huge difference but there are still some failures, typically like this:
There is only so much DMA corruption you can ignore.
Won't fix the cavs tool issue. Will switch to the new logging tool.
Rarely, the sof-logger dies like this at the very start of a test:
https://sof-ci.01.org/sofpr/PR5340/build11976/devicetest/?model=TGLU_UP_HDA_ZEPHYR&testcase=test-speaker
This happens in about 1 out of 50 test runs.
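As a rough sanity check on that rate, the sketch below assumes every test run fails independently with probability 1/50, which is an assumption and not something measured in CI. The function names are hypothetical. It estimates how likely a clean streak of 730 runs (the number reported in the comments for one TGL board) would be under that assumption, and how many runs would be needed to expect at least one failure with 95% probability.

```python
import math

# Assumption: independent runs, each failing with the rough 1-in-50 rate above.
FAILURE_RATE = 1 / 50


def prob_no_failure(runs: int, rate: float = FAILURE_RATE) -> float:
    """Probability of seeing zero failures in `runs` independent runs."""
    return (1 - rate) ** runs


def runs_for_confidence(confidence: float, rate: float = FAILURE_RATE) -> int:
    """Smallest number of runs giving at least `confidence` chance of one failure."""
    return math.ceil(math.log(1 - confidence) / math.log(1 - rate))


p_clean_730 = prob_no_failure(730)
runs_95 = runs_for_confidence(0.95)
print(f"P(no failure in 730 runs) = {p_clean_730:.2e}")
print(f"Runs needed for a 95% chance of one failure: {runs_95}")
```

If the clean-streak probability comes out tiny, the 1-in-50 rate probably does not apply uniformly to every board and test, which would fit the observation that some platforms reproduce the failure far more often than others.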
The dictionary is valid because other tests immediately before and after are fine in the exact same configuration.
While the "sof-logger was already dead" error message is in the end the same as #5120 (which was hiding this bug), the symptoms are extremely different.
This caused a number of failures in recent daily runs 10079, 10105 and 10146. It seems Zephyr-specific. It's not clear why it's even rarer in PR testing.
10079?model=TGLU_UP_HDA_ZEPHYR&testcase=volume-basic-test-50
10079?model=WHL_UPEXT_HDA_ZEPHYR&testcase=multiple-pipeline-capture-50
10105?model=APL_UP2_NOCODEC_ZEPHYR&testcase=multiple-pipeline-playback-50
10146?model=APL_UP2_NOCODEC_ZEPHYR&testcase=check-capture-50rounds
10146?model=APL_UP2_NOCODEC_ZEPHYR&testcase=multiple-pause-resume-50
10146?model=TGLH_RVP_NOCODEC_ZEPHYR&testcase=check-xrun-injection-playback-10
Daily run 10079:
DMA alignment fix c11562b for #5120 was between 10079 and 10105
Daily run 10105 was:
Daily run 10146 was: