Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[BUG] IPC timed out when multiple-pipeline on APL_UP2_NOCODEC_ZEPHYR #4414

Closed
XiaoyunWu6666 opened this issue Jun 28, 2021 · 27 comments
Closed
Assignees
Labels
APL Applies to Apollolake platform bug Something isn't working as expected Intel Linux Daily tests This issue can be found in internal Linux daily tests P2 Critical bugs or normal features Zephyr Issues only observed with Zephyr integrated
Milestone

Comments

@XiaoyunWu6666
Copy link
Contributor

XiaoyunWu6666 commented Jun 28, 2021

Describe the bug
ipc timed out when multiple-pipeline on APL_UP2_NOCODEC_ZEPHYR

To Reproduce
arecord -D hw:0,10 -c 4 -r 48000 -f S16_LE /dev/null -q’ on the DUT
Reproduction Rate
100%

Environment
Start Time: 2021-06-27 21:21:11 UTC
End Time: 2021-06-28 02:15:54 UTC
Kernel Branch: topic/sof-dev
Kernel Commit: 5b851f48
SOF Branch: main
SOF Commit: bccecb1

Screenshots or console output
[console]

ubuntu@sh-apl-up2-nocodec-zephyr-01:/etc/sof$ arecord   -D hw:0,10 -c 4 -r 48000 -f S16_LE /dev/null -q -d 5
arecord: set_params:1407: Unable to install hw params:
ACCESS:  RW_INTERLEAVED
FORMAT:  S16_LE
SUBFORMAT:  STD
SAMPLE_BITS: 16
FRAME_BITS: 64
CHANNELS: 4
RATE: 48000
PERIOD_TIME: (42666 42667)
PERIOD_SIZE: 2048
PERIOD_BYTES: 16384
PERIODS: 4
BUFFER_TIME: (170666 170667)
BUFFER_SIZE: 8192
BUFFER_BYTES: 65536
TICK_TIME: 0

[dmesg]

ubuntu@sh-apl-up2-nocodec-zephyr-01:/etc/sof$ dmesg
[26837.939358] sof-audio-pci-intel-apl 0000:00:0e.0: pcm: open stream 10 dir 1
[26837.939373] sof-audio-pci-intel-apl 0000:00:0e.0: period min 192 max 16384 bytes
[26837.939376] sof-audio-pci-intel-apl 0000:00:0e.0: period count 2 max 16
[26837.939379] sof-audio-pci-intel-apl 0000:00:0e.0: buffer max 65536 bytes
[26837.939582] sof-audio-pci-intel-apl 0000:00:0e.0: pcm: hw params stream 10 dir 1
[26837.939587] sof-audio-pci-intel-apl 0000:00:0e.0: generating page table for 0000000011b65708 size 0x10000 pages 16
[26837.939598] sof-audio-pci-intel-apl 0000:00:0e.0: FW Poll Status: reg[0xa0]=0x240000 successful
[26837.939616] sof-audio-pci-intel-apl 0000:00:0e.0: FW Poll Status: reg[0xa0]=0x40000 successful
[26837.939621] sof-audio-pci-intel-apl 0000:00:0e.0: period_bytes:0x4000
[26837.939624] sof-audio-pci-intel-apl 0000:00:0e.0: periods:4
[26837.939635] sof-audio-pci-intel-apl 0000:00:0e.0: stream_tag 2
[26837.939651] sof-audio-pci-intel-apl 0000:00:0e.0: ipc tx: 0x60010000: GLB_STREAM_MSG: PCM_PARAMS
[26838.442532] sof-audio-pci-intel-apl 0000:00:0e.0: error: ipc timed out for 0x60010000 size 108
[26838.442553] sof-audio-pci-intel-apl 0000:00:0e.0: info: preventing DSP entering D3 state to preserve context
[26838.442561] sof-audio-pci-intel-apl 0000:00:0e.0: status: fw entered - code 00000005
[26838.442608] sof-audio-pci-intel-apl 0000:00:0e.0: error: unexpected fault 0x00000000 trace 0x00004000
[26838.442619] sof-audio-pci-intel-apl 0000:00:0e.0: error: hda irq intsts 0x00000000 intlctl 0xc0000001 rirb 00
[26838.442625] sof-audio-pci-intel-apl 0000:00:0e.0: error: dsp irq ppsts 0x00000000 adspis 0x00000000
[26838.442633] sof-audio-pci-intel-apl 0000:00:0e.0: error: host status 0x00000000 dsp status 0x00000000 mask 0x00000003
[26838.442639] sof-audio-pci-intel-apl 0000:00:0e.0: error: waking up any trace sleepers
[26838.442647] sof-audio-pci-intel-apl 0000:00:0e.0: error: hw params ipc failed for stream 2
[26838.442654] sof-audio-pci-intel-apl 0000:00:0e.0: ASoC: error at snd_soc_pcm_component_hw_params on 0000:00:0e.0: -110
[26838.442692]  DMIC: ASoC: soc_pcm_hw_params() failed (-110)
[26838.442722] sof-audio-pci-intel-apl 0000:00:0e.0: ipc tx: 0x80010000: GLB_DAI_MSG: CONFIG
[26838.946538] sof-audio-pci-intel-apl 0000:00:0e.0: error: ipc timed out for 0x80010000 size 216
[26838.946570] sof-audio-pci-intel-apl 0000:00:0e.0: info: preventing DSP entering D3 state to preserve context
[26838.946582] sof-audio-pci-intel-apl 0000:00:0e.0: status: fw entered - code 00000005
[26838.946636] sof-audio-pci-intel-apl 0000:00:0e.0: error: unexpected fault 0x00000000 trace 0x00004000
[26838.946652] sof-audio-pci-intel-apl 0000:00:0e.0: error: hda irq intsts 0x00000000 intlctl 0xc0000001 rirb 00
[26838.946664] sof-audio-pci-intel-apl 0000:00:0e.0: error: dsp irq ppsts 0x00000000 adspis 0x00000000
[26838.946677] sof-audio-pci-intel-apl 0000:00:0e.0: error: host status 0x00000000 dsp status 0x00000000 mask 0x00000003
[26838.946688] sof-audio-pci-intel-apl 0000:00:0e.0: error: waking up any trace sleepers
[26838.946701] sof-audio-pci-intel-apl 0000:00:0e.0: error: failed resetting DAI config for DMIC0.IN
[26838.946722]  DMIC: ASoC: dpcm_fe_dai_hw_params failed (-110)
[26838.946775] sof-audio-pci-intel-apl 0000:00:0e.0: pcm: free stream 10 dir 1
[26838.946828] sof-audio-pci-intel-apl 0000:00:0e.0: ipc tx: 0x80010000: GLB_DAI_MSG: CONFIG
[26839.450579] sof-audio-pci-intel-apl 0000:00:0e.0: error: ipc timed out for 0x80010000 size 216
[26839.450610] sof-audio-pci-intel-apl 0000:00:0e.0: info: preventing DSP entering D3 state to preserve context
[26839.450622] sof-audio-pci-intel-apl 0000:00:0e.0: status: fw entered - code 00000005
[26839.450677] sof-audio-pci-intel-apl 0000:00:0e.0: error: unexpected fault 0x00000000 trace 0x00004000
[26839.450694] sof-audio-pci-intel-apl 0000:00:0e.0: error: hda irq intsts 0x00000000 intlctl 0xc0000001 rirb 00
[26839.450706] sof-audio-pci-intel-apl 0000:00:0e.0: error: dsp irq ppsts 0x00000000 adspis 0x00000000
[26839.450718] sof-audio-pci-intel-apl 0000:00:0e.0: error: host status 0x00000000 dsp status 0x00000000 mask 0x00000003
[26839.450730] sof-audio-pci-intel-apl 0000:00:0e.0: error: waking up any trace sleepers
[26839.450742] sof-audio-pci-intel-apl 0000:00:0e.0: error: failed resetting DAI config for DMIC0.IN
[26839.452340] sof-audio-pci-intel-apl 0000:00:0e.0: pcm: close stream 10 dir 1

@XiaoyunWu6666 XiaoyunWu6666 added bug Something isn't working as expected APL Applies to Apollolake platform Zephyr Issues only observed with Zephyr integrated Intel Linux Daily tests This issue can be found in internal Linux daily tests labels Jun 28, 2021
@marc-hb marc-hb changed the title [BUG]ipc timed out when PCM_PARAMS on APL_UP2_NOCODEC_ZEPHYR [BUG] IPC timed out when PCM_PARAMS on APL_UP2_NOCODEC_ZEPHYR Jun 28, 2021
@lgirdwood
Copy link
Member

@XiaoyunWu6666 I assume this is APL only ? If so, looks like we are OOM (out of memory)

@marc-hb
Copy link
Collaborator

marc-hb commented Jun 29, 2021

Could #4428 (just merged) make any difference or is it a completely different code path?

@XiaoyunWu6666
Copy link
Contributor Author

@XiaoyunWu6666 I assume this is APL only ? If so, looks like we are OOM (out of memory)

But APL_UP2_NOCODEC is good these days.
see inner daily test 4904 4928 4961

@XiaoyunWu6666 XiaoyunWu6666 changed the title [BUG] IPC timed out when PCM_PARAMS on APL_UP2_NOCODEC_ZEPHYR [BUG] IPC timed out when multiple-pipeline on APL_UP2_NOCODEC_ZEPHYR Jul 1, 2021
@XiaoyunWu6666
Copy link
Contributor Author

According to inner daily 5000.
When multiple-pipeline ( playback , capture or both ), ipc timed out can happen at several stages including PCM_PARAMS, TRIG_START , TRIG_STOP .... not only at a specific stage .

@XiaoyunWu6666
Copy link
Contributor Author

[update inner daily logs on this case , this platform]
according to inner daily ADL 5071?model=APL_UP2_NOCODEC_ZEPHYR&testcase=multiple-pipeline-playback-50 , still IO error but
not much error message in dmesg logs

@XiaoyunWu6666 XiaoyunWu6666 removed the APL Applies to Apollolake platform label Jul 5, 2021
@mengdonglin mengdonglin added the APL Applies to Apollolake platform label Jul 6, 2021
@lyakh
Copy link
Collaborator

lyakh commented Jul 16, 2021

I'm looking at daily test from 15.07 number 5308. APL_UP2_NOCODEC_ZEPHYR failed several tests there, e.g. multiple-pipeline-playback-50 and the failure links to this bug report. However this issue is specifically for an arecord failure, whereas that failure is playback. Further 5308 doesn't have any "IPC timed out" entries in the kernel log. That failure therefore seems to be unrelated.

@kv2019i
Copy link
Collaborator

kv2019i commented Aug 24, 2021

Potential bugfix: #4673

@lgirdwood
Copy link
Member

@XiaoyunWu6666 can you retest as #4673 is merged. Thanks !

@marc-hb
Copy link
Collaborator

marc-hb commented Aug 26, 2021

#4673 has apparently been merged 2 days ago yet I've seen plenty of APL_UP2_NOCODEC_ZEPHYR timeouts yesterday in random PRs, Afraid manual re-testing is not even required.

@XiaoyunWu6666
Copy link
Contributor Author

XiaoyunWu6666 commented Aug 27, 2021

@lgirdwood @marc-hb @kv2019i
still ipc timed out when multiple-pipeline in inner daily 6126

Although before the multiple pipeline test , there is multiple-pause-resume-25 testcase which brought IO error , but it was recovered after multiple-pause-resume failed . So no impacts from previous cases

@XiaoyunWu6666
Copy link
Contributor Author

XiaoyunWu6666 commented Sep 8, 2021

in inner daily 6401
IPC timed out when multiple-pipeline on APL_UP2_NOCODEC_ZEPHYR still happened , when TRIG_STOP/TRIG_START

Related issue on TGLH_RVP_NOCODEC_ZEPHYR : #4680

@lyakh
Copy link
Collaborator

lyakh commented Oct 12, 2021

I'm trying to fix this and I see the following flow taking place:

  1. LL scheduler runs tasks with all interrupts disabled (as discussed before)
  2. dai_copy() detects an xrun and calls dai_report_xrun()
  3. dai_report_xrun() calls either comp_underrun() or comp_overrun() depending on direction. Now those two functions seem to be generic to be called from any component, but they're only called from this single function dai_report_xrun()
  4. both those functions call pipeline_xrun()
  5. the latter begins recursion over all components, calling pipeline_comp_xrun()
  6. the recursion is executed in the direction of the flow, so, since it begins with the DAI, for capture it indeed recurses the entire pipeline, whereas for playback it terminates immediately
  7. for capture then the recursion reaches the "host" component and sends an XRUN IPC to the host, whereas for playback this never happens

@lgirdwood Is this how it is intended?

@lgirdwood
Copy link
Member

I'm trying to fix this and I see the following flow taking place:

  1. LL scheduler runs tasks with all interrupts disabled (as discussed before)
  2. dai_copy() detects an xrun and calls dai_report_xrun()
  3. dai_report_xrun() calls either comp_underrun() or comp_overrun() depending on direction. Now those two functions seem to be generic to be called from any component, but they're only called from this single function dai_report_xrun()
  4. both those functions call pipeline_xrun()
  5. the latter begins recursion over all components, calling pipeline_comp_xrun()
  6. the recursion is executed in the direction of the flow, so, since it begins with the DAI, for capture it indeed recurses the entire pipeline, whereas for playback it terminates immediately
  7. for capture then the recursion reaches the "host" component and sends an XRUN IPC to the host, whereas for playback this never happens

@lgirdwood Is this how it is intended?

I think so, but this may not be optimal and may require fixing.

@kv2019i
Copy link
Collaborator

kv2019i commented Oct 28, 2021

Added some analysis on the matching TGLH bug -> #4680 (comment)

@lgirdwood
Copy link
Member

@lyakh @kv2019i still an issue ?

@lyakh
Copy link
Collaborator

lyakh commented Nov 15, 2021

  1. the recursion is executed in the direction of the flow, so, since it begins with the DAI, for capture it indeed recurses the entire pipeline, whereas for playback it terminates immediately
  2. for capture then the recursion reaches the "host" component and sends an XRUN IPC to the host, whereas for playback this never happens

I think so, but this may not be optimal and may require fixing.

@lgirdwood interesting, what's the reason for the asymmetry - sending the XRUN notification to the user-space for capture overruns but not for playback underruns?

@lyakh
Copy link
Collaborator

lyakh commented Nov 15, 2021

Replying to myself after a discussion with @kv2019i and @ujfalusi: seems like an asymmetry might indeed be justified there. For capture any overrun means a data loss, whereas for playback some host DMA underruns might be recoverable if the DAI DMA still has enough data to copy for an additional timer period.

@lyakh
Copy link
Collaborator

lyakh commented Nov 15, 2021

as for the actual issue, I've just double-checked and was unable to reproduce it on UP2 / APL. EDITED: the last daily test passed too. EDITED 2: this is now used to track an "Unable to install hw params" issue on TGL. I tried to reproduce it on UP Xtreme, for which I had to remove SSP1 and SSP2 from the topology, and couldn't reproduce it.

@lgirdwood
Copy link
Member

Lets close, we can reopen if needed.

@lgirdwood lgirdwood added this to the v2.1 milestone Feb 14, 2022
@marc-hb
Copy link
Collaborator

marc-hb commented Feb 15, 2022

This failure seemed to happen less systematically so I spent some time diving in recent test results. It's difficult to draw conclusions because the symptoms vary significantly. I still think it's the same failure because one way or the other it's always when testing multiple pipelines on APL_UP2_NOCODEC_ZEPHYR. The multiple-pipeline-all test reports errors with *_hw_params the most frequently but sometimes (as in recent https://sof-ci.01.org/linuxpr/PR3409/build7157/devicetest/) another multiple-pipeline test failed first - did it leave the system in a bad state?

It's also hard to tell the status of #4414 because of more recent failure #5352. The #5352 failure is reported only at the end, so seeing a #5352 report proves that #4414 is not happening. However it's very time consuming to make this difference because on a dashboard because they both appear as the same "FAIL".

In daily 10146 there was no #4414 failure, only #5352.

In daily 10079?model=APL_UP2_NOCODEC_ZEPHYR&testcase=multiple-pipeline-capture-50
and maybe others the user space error was different, there was no failure to install hw params but this instead:

022-02-10 00:12:42 UTC [REMOTE_INFO] Testing: Port0 [hw:0,0]
2022-02-10 00:12:42 UTC [REMOTE_COMMAND] arecord   -D hw:0,0 -c 2 -r 48000 -f S16_LE /dev/null -q
2022-02-10 00:12:42 UTC [REMOTE_INFO] Testing: Port1 [hw:0,1]
2022-02-10 00:12:42 UTC [REMOTE_COMMAND] arecord   -D hw:0,1 -c 2 -r 48000 -f S16_LE /dev/null -q
2022-02-10 00:12:42 UTC [REMOTE_INFO] wait 0.5s for aplay_opts()
2022-02-10 00:12:43 UTC [REMOTE_INFO] checking pipeline status
arecord: pcm_read:2153: read error: Input/output error
2022-02-10 00:12:43 UTC [REMOTE_INFO] Letting playback/capture run for 5s
arecord: pcm_read:2153: arecord: pcm_read:2153: read error: Input/output error
read error: Input/output error
arecord: pcm_read:2153: read error: Input/output error
error: in logger_read(), fread(..., /sys/kernel/debug/sof/trace) failed: Operation not permitted(-1)
2022-02-10 00:12:48 UTC [REMOTE_INFO] checking pipeline status again
2022-02-10 00:12:48 UTC [REMOTE_COMMAND] pkill -9 aplay arecord
declare -- cmd="journalctl_cmd --since=@1644451954"
2022-02-10 00:12:50 UTC [ERROR] Caught kernel log error
===========================>>
[ 5297.135422] kernel: sof-audio-pci-intel-apl 0000:00:0e.0: ipc tx timed out for 0x60050000 (msg/reply size: 12/12)
[ 5297.135587] kernel: sof-audio-pci-intel-apl 0000:00:0e.0: ------------[ IPC dump start ]------------
[ 5297.135597] kernel: sof-audio-pci-intel-apl 0000:00:0e.0: hda irq intsts 0x00000000 intlctl 0xc000001f rirb 00
[ 5297.135603] kernel: sof-audio-pci-intel-apl 0000:00:0e.0: dsp irq ppsts 0x00000000 adspis 0x00000000
[ 5297.135611] kernel: sof-audio-pci-intel-apl 0000:00:0e.0: host status 0x00000000 dsp status 0x00000000 mask 0x00000003

In 10086?model=APL_UP2_NOCODEC_ZEPHYR&testcase=multiple-pipeline-all-50
it was arecord: set_params:1432: Unable to install hw params: again. All other tests passed.

[ 1358.896013] kernel: sof-audio-pci-intel-apl 0000:00:0e.0: ipc rx: 0x90020000: GLB_TRACE_MSG: DMA_POSITION
[ 1358.896034] kernel: sof-audio-pci-intel-apl 0000:00:0e.0: ipc rx done: 0x90020000: GLB_TRACE_MSG: DMA_POSITION
[ 1358.976058] kernel: sof-audio-pci-intel-apl 0000:00:0e.0: pcm: open stream 10 dir 1
[ 1358.976070] kernel: sof-audio-pci-intel-apl 0000:00:0e.0: period min 192 max 16384 bytes
[ 1358.976074] kernel: sof-audio-pci-intel-apl 0000:00:0e.0: period count 2 max 16
[ 1358.976077] kernel: sof-audio-pci-intel-apl 0000:00:0e.0: buffer max 65536 bytes
[ 1358.976344] kernel: sof-audio-pci-intel-apl 0000:00:0e.0: ipc tx: 0x30010000: GLB_TPLG_MSG: COMP_NEW
[ 1359.481636] kernel: sof-audio-pci-intel-apl 0000:00:0e.0: ipc tx timed out for 0x30010000 (msg/reply size: 96/20)
[ 1359.481828] kernel: sof-audio-pci-intel-apl 0000:00:0e.0: preventing DSP entering D3 state to preserve context
[ 1359.481837] kernel: sof-audio-pci-intel-apl 0000:00:0e.0: ------------[ IPC dump start ]------------
[ 1359.481853] kernel: sof-audio-pci-intel-apl 0000:00:0e.0: hda irq intsts 0x00000000 intlctl 0xc0000781 rirb 00
[ 1359.481866] kernel: sof-audio-pci-intel-apl 0000:00:0e.0: dsp irq ppsts 0x00000000 adspis 0x00000000
[ 1359.481880] kernel: sof-audio-pci-intel-apl 0000:00:0e.0: host status 0x00000000 dsp status 0x00000000 mask 0x00000003
[ 1359.481892] kernel: sof-audio-pci-intel-apl 0000:00:0e.0: ------------[ IPC dump end ]------------
[ 1359.481903] kernel: sof-audio-pci-intel-apl 0000:00:0e.0: ------------[ DSP dump start ]------------
[ 1359.481913] kernel: sof-audio-pci-intel-apl 0000:00:0e.0: Firmware exception
[ 1359.481924] kernel: sof-audio-pci-intel-apl 0000:00:0e.0: fw_state: SOF_FW_BOOT_COMPLETE (6)
[ 1359.481938] kernel: sof-audio-pci-intel-apl 0000:00:0e.0: status: fw entered - code 00000005
[ 1359.481992] kernel: sof-audio-pci-intel-apl 0000:00:0e.0: unexpected fault 0x00000000 trace 0x00004000
[ 1359.482005] kernel: sof-audio-pci-intel-apl 0000:00:0e.0: ------------[ DSP dump end ]------------
[ 1359.482041] kernel: sof-audio-pci-intel-apl 0000:00:0e.0: error: failed to load widget DMIC0.IN
[ 1359.482055] kernel: sof-audio-pci-intel-apl 0000:00:0e.0: error: failed setting up DAI widget DMIC0.IN
[ 1359.482068] kernel: sof-audio-pci-intel-apl 0000:00:0e.0: ASoC: error at snd_soc_dai_hw_params on SSP3 Pin: -110
[ 1359.482097] kernel: sof-audio-pci-intel-apl 0000:00:0e.0: error: trace IO error
[ 1359.482126] kernel:  NoCodec-3: ASoC: __soc_pcm_hw_params() failed (-110)
[ 1359.482164] kernel:  DMIC: ASoC: dpcm_fe_dai_hw_params failed (-110)
[ 1359.482202] kernel: sof-audio-pci-intel-apl 0000:00:0e.0: pcm: free stream 10 dir 1
[ 1359.483534] kernel: sof-audio-pci-intel-apl 0000:00:0e.0: pcm: open stream 11 dir 1
[ 1359.483554] kernel: sof-audio-pci-intel-apl 0000:00:0e.0: period min 192 max 16384 bytes

In daily 10105?model=APL_UP2_NOCODEC_ZEPHYR&testcase=multiple-pipeline-capture-50 failed on soc_component_trigger:

[ 5501.649443] kernel: sof-audio-pci-intel-apl 0000:00:0e.0: ipc tx succeeded: 0x60040000: GLB_STREAM_MSG: TRIG_START
[ 5501.649471] kernel: sof-audio-pci-intel-apl 0000:00:0e.0: ipc tx: 0x60040000: GLB_STREAM_MSG: TRIG_START
[ 5501.650520] kernel: sof-audio-pci-intel-apl 0000:00:0e.0: ipc tx succeeded: 0x60040000: GLB_STREAM_MSG: TRIG_START
[ 5501.654448] kernel: sof-audio-pci-intel-apl 0000:00:0e.0: ipc rx: 0x90020000: GLB_TRACE_MSG: DMA_POSITION
[ 5501.654470] kernel: sof-audio-pci-intel-apl 0000:00:0e.0: ipc rx done: 0x90020000: GLB_TRACE_MSG: DMA_POSITION
[ 5502.142982] kernel: sof-audio-pci-intel-apl 0000:00:0e.0: pcm: trigger stream 10 dir 1 cmd 0
[ 5502.143003] kernel: sof-audio-pci-intel-apl 0000:00:0e.0: ipc tx: 0x60050000: GLB_STREAM_MSG: TRIG_STOP
[ 5502.150987] kernel: sof-audio-pci-intel-apl 0000:00:0e.0: pcm: trigger stream 1 dir 1 cmd 0
[ 5502.150987] kernel: sof-audio-pci-intel-apl 0000:00:0e.0: pcm: trigger stream 0 dir 1 cmd 0
[ 5502.159117] kernel: sof-audio-pci-intel-apl 0000:00:0e.0: pcm: trigger stream 11 dir 1 cmd 0
[ 5502.646724] kernel: sof-audio-pci-intel-apl 0000:00:0e.0: ipc tx timed out for 0x60050000 (msg/reply size: 12/12)
[ 5502.646914] kernel: sof-audio-pci-intel-apl 0000:00:0e.0: preventing DSP entering D3 state to preserve context
[ 5502.646921] kernel: sof-audio-pci-intel-apl 0000:00:0e.0: ------------[ IPC dump start ]------------
[ 5502.646936] kernel: sof-audio-pci-intel-apl 0000:00:0e.0: hda irq intsts 0x00000000 intlctl 0xc000001f rirb 00
[ 5502.646948] kernel: sof-audio-pci-intel-apl 0000:00:0e.0: dsp irq ppsts 0x00000000 adspis 0x00000000
[ 5502.646961] kernel: sof-audio-pci-intel-apl 0000:00:0e.0: host status 0x00000000 dsp status 0x00000000 mask 0x00000003
[ 5502.646972] kernel: sof-audio-pci-intel-apl 0000:00:0e.0: ------------[ IPC dump end ]------------
[ 5502.646981] kernel: sof-audio-pci-intel-apl 0000:00:0e.0: ------------[ DSP dump start ]------------
[ 5502.646990] kernel: sof-audio-pci-intel-apl 0000:00:0e.0: Firmware exception
[ 5502.647000] kernel: sof-audio-pci-intel-apl 0000:00:0e.0: fw_state: SOF_FW_BOOT_COMPLETE (6)
[ 5502.647013] kernel: sof-audio-pci-intel-apl 0000:00:0e.0: status: fw entered - code 00000005
[ 5502.647065] kernel: sof-audio-pci-intel-apl 0000:00:0e.0: unexpected fault 0x00000000 trace 0x00004000
[ 5502.647077] kernel: sof-audio-pci-intel-apl 0000:00:0e.0: ------------[ DSP dump end ]------------
[ 5502.647156] kernel: sof-audio-pci-intel-apl 0000:00:0e.0: FW Poll Status: reg[0xa0]=0x20240000 successful
[ 5502.647171] kernel: sof-audio-pci-intel-apl 0000:00:0e.0: ASoC: error at soc_component_trigger on 0000:00:0e.0: -110
[ 5502.647188] kernel:  DMIC: ASoC: trigger FE cmd: 0 failed: -110
[ 5502.647194] kernel: sof-audio-pci-intel-apl 0000:00:0e.0: error: trace IO error
[ 5502.647296] kernel: sof-audio-pci-intel-apl 0000:00:0e.0: pcm: free stream 10 dir 1
[ 5502.647312] kernel: sof-audio-pci-intel-apl 0000:00:0e.0: ipc tx: 0x60030000: GLB_STREAM_MSG: PCM_FREE
[ 5503.150790] kernel: sof-audio-pci-intel-apl 0000:00:0e.0: ipc tx timed out for 0x60030000 (msg/reply size: 12/12)
[ 5503.150824] kernel: sof-audio-pci-intel-apl 0000:00:0e.0: preventing DSP entering D3 state to preserve context
[ 5503.150833] kernel: sof-audio-pci-intel-apl 0000:00:0e.0: ------------[ IPC dump start ]------------
[ 5503.150850] kernel: sof-audio-pci-intel-apl 0000:00:0e.0: hda irq intsts 0x00000000 intlctl 0xc000001d rirb 00
[ 5503.150864] kernel: sof-audio-pci-intel-apl 0000:00:0e.0: dsp irq ppsts 0x00000000 adspis 0x00000000
[ 5503.150877] kernel: sof-audio-pci-intel-apl 0000:00:0e.0: host status 0x00000000 dsp status 0x00000000 mask 0x00000003

@marc-hb
Copy link
Collaborator

marc-hb commented Feb 23, 2022

Another very recent one: https://sof-ci.01.org/sofpr/PR5393/build12115/devicetest/?model=APL_UP2_NOCODEC_ZEPHYR&testcase=multiple-pipeline-capture

Over the last few days this failure is what has been hitting test runs the hardest, see links above and below this comment. It's especially bad because it puts the FW in a bad state and makes all consecutive tests fail until the next FW boot.

@marc-hb
Copy link
Collaborator

marc-hb commented Mar 10, 2022

Looks like turning off SSP0 and SSP1 is a good workaround: no failure like this in daily test 10879

Start Time: 2022-03-09 22:27:42 UTC
End Time: 2022-03-10 02:31:16 UTC
Kernel Branch: topic/sof-dev
Kernel Commit: 92d71723d47c
SOF Branch: main
SOF Commit: b9d971920e30
Zephyr Commit: adc901aa6a39

@marc-hb
Copy link
Collaborator

marc-hb commented Mar 15, 2022

A bug is not fixed when it stops being tested.

@marc-hb marc-hb reopened this Mar 15, 2022
@lgirdwood lgirdwood added P2 Critical bugs or normal features and removed P1 Blocker bugs or important features labels Mar 16, 2022
@lgirdwood
Copy link
Member

Known performance delta with more use of uncached memory that is currently being optimised.

@mengdonglin
Copy link
Collaborator

Cannot reproduce. Closing.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
APL Applies to Apollolake platform bug Something isn't working as expected Intel Linux Daily tests This issue can be found in internal Linux daily tests P2 Critical bugs or normal features Zephyr Issues only observed with Zephyr integrated
Projects
None yet
Development

No branches or pull requests

7 participants