[BUG] IPC timed out when multiple-pipeline on APL_UP2_NOCODEC_ZEPHYR #4414
Comments
@XiaoyunWu6666 I assume this is APL only? If so, it looks like we are OOM (out of memory).
Could #4428 (just merged) make any difference, or is it a completely different code path?
But APL_UP2_NOCODEC is good these days.
According to inner daily 5000.
[update inner daily logs on this case, this platform]
I'm looking at the daily test from 15.07, number 5308. APL_UP2_NOCODEC_ZEPHYR failed several tests there, e.g. multiple-pipeline-playback-50, and the failure links to this bug report. However, this issue is specifically for an arecord failure, whereas that failure is playback. Furthermore, 5308 doesn't have any "IPC timed out" entries in the kernel log. That failure therefore seems to be unrelated.
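The kernel-log check described above can be scripted. The snippet below is only a minimal sketch: the function name `find_ipc_timeouts` is made up for illustration, and it assumes the timeout is logged with the words "ipc timed out" in some letter case, so the pattern may need adjusting to the exact message a given kernel prints.

```python
import re
from typing import Iterable, List

# Assumption: the timeout message contains "ipc timed out" in some letter
# case. This is not taken from the kernel sources; adjust as needed.
IPC_TIMEOUT = re.compile(r"ipc timed out", re.IGNORECASE)


def find_ipc_timeouts(lines: Iterable[str]) -> List[str]:
    """Return the log lines that mention an IPC timeout, without newlines."""
    return [line.rstrip("\n") for line in lines if IPC_TIMEOUT.search(line)]


def count_ipc_timeouts(path: str) -> int:
    """Count IPC timeout lines in a saved kernel log (hypothetical helper)."""
    # errors="replace" keeps the scan going if the log has stray bytes.
    with open(path, errors="replace") as log:
        return len(find_ipc_timeouts(log))
```

Running `count_ipc_timeouts()` on the dmesg file saved from a test run and getting zero would match the observation above for 5308, under the stated assumption about the message text.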
Potential bugfix: #4673
@XiaoyunWu6666 can you retest now that #4673 is merged? Thanks!
#4673 was apparently merged 2 days ago, yet I saw plenty of APL_UP2_NOCODEC_ZEPHYR timeouts yesterday in random PRs. I'm afraid manual re-testing is not even required.
@lgirdwood @marc-hb @kv2019i The multiple-pause-resume-25 test case, which runs before the multiple-pipeline test, did hit an I/O error, but it recovered after multiple-pause-resume failed. So the previous test cases had no impact.
In inner daily 6401. Related issue on TGLH_RVP_NOCODEC_ZEPHYR: #4680
I'm trying to fix this and I see the following flow taking place:
@lgirdwood Is this how it is intended?
I think so, but this may not be optimal and may require fixing.
Added some analysis on the matching TGLH bug -> #4680 (comment)
@lgirdwood interesting, what's the reason for the asymmetry: sending the XRUN notification to user space for capture overruns but not for playback underruns?
Replying to myself after a discussion with @kv2019i and @ujfalusi: it seems an asymmetry might indeed be justified there. For capture, any overrun means data loss, whereas for playback some host DMA underruns might be recoverable if the DAI DMA still has enough data to copy for an additional timer period.
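To make the asymmetry concrete, here is a toy model of the decision. It is only a sketch of the reasoning above, not the SOF firmware logic: `StreamState`, its fields and `should_notify_xrun` are all hypothetical names, and "enough data for an additional timer period" is modelled as a simple byte comparison.

```python
from dataclasses import dataclass


@dataclass
class StreamState:
    # All fields are hypothetical; they do not mirror SOF structures.
    is_capture: bool
    dai_buffered_bytes: int  # data still queued for the DAI DMA
    bytes_per_period: int    # data the DAI DMA copies per timer period


def should_notify_xrun(state: StreamState) -> bool:
    """Decide whether a host DMA xrun should be reported to user space."""
    if state.is_capture:
        # A capture overrun always means captured samples were lost.
        return True
    # Playback: treat the underrun as recoverable while the DAI DMA still
    # holds at least one more timer period of data.
    return state.dai_buffered_bytes < state.bytes_per_period
```

In this model a capture stream is always reported, while a playback stream is reported only once the DAI side can no longer cover one more period.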
As for the actual issue, I've just double-checked and was unable to reproduce it on UP2 / APL. EDITED: the last daily test passed too. EDITED 2: this is now used to track an "Unable to install hw params" issue on TGL. I tried to reproduce it on UP Xtreme, for which I had to remove SSP1 and SSP2 from the topology, and couldn't reproduce it.
Let's close, we can reopen if needed.
This failure seemed to happen less systematically, so I spent some time digging into recent test results. It's difficult to draw conclusions because the symptoms vary significantly. I still think it's the same failure because, one way or another, it always happens when testing multiple pipelines on APL_UP2_NOCODEC_ZEPHYR. It's also hard to tell the status of #4414 because of the more recent failure #5352. The #5352 failure is reported only at the end, so seeing a #5352 report proves that #4414 is not happening. However, telling the two apart is very time-consuming because on a dashboard they both appear as the same "FAIL". In daily 10146 there was no #4414 failure, only #5352.
In daily 10079?model=APL_UP2_NOCODEC_ZEPHYR&testcase=multiple-pipeline-capture-50
In 10086?model=APL_UP2_NOCODEC_ZEPHYR&testcase=multiple-pipeline-all-50
In daily 10105?model=APL_UP2_NOCODEC_ZEPHYR&testcase=multiple-pipeline-capture-50 failed on
Another very recent one: https://sof-ci.01.org/sofpr/PR5393/build12115/devicetest/?model=APL_UP2_NOCODEC_ZEPHYR&testcase=multiple-pipeline-capture Over the last few days this failure has been hitting test runs the hardest, see the links above and below this comment. It's especially bad because it puts the FW in a bad state and makes all subsequent tests fail until the next FW boot.
Looks like turning off SSP0 and SSP1 is a good workaround: no failure like this in daily test 10879 (Start Time: 2022-03-09 22:27:42 UTC).
A bug is not fixed when it stops being tested.
Known performance delta with more use of uncached memory that is currently being optimised.
Cannot reproduce. Closing.
Describe the bug
IPC timed out when running the multiple-pipeline test on APL_UP2_NOCODEC_ZEPHYR
To Reproduce
Run ‘arecord -D hw:0,10 -c 4 -r 48000 -f S16_LE /dev/null -q’ on the DUT
Reproduction Rate
100%
Environment
Start Time: 2021-06-27 21:21:11 UTC
End Time: 2021-06-28 02:15:54 UTC
Kernel Branch: topic/sof-dev
Kernel Commit: 5b851f48
SOF Branch: main
SOF Commit: bccecb1
Screenshots or console output
[console]
[dmesg]