-
Notifications
You must be signed in to change notification settings - Fork 132
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
ASoC: SOF: Intel: hda: fix null deref on system suspend entry #5085
ASoC: SOF: Intel: hda: fix null deref on system suspend entry #5085
Conversation
When system enters suspend with an active stream, SOF core calls hw_params_upon_resume(). On Intel platforms with HDA DMA used to manage the link DMA, this leads to call chain of hda_dsp_set_hw_params_upon_resume() -> hda_dsp_dais_suspend() -> hda_dai_suspend() -> hda_ipc4_post_trigger() A bug is hit in hda_dai_suspend() as hda_link_dma_cleanup() is run first, which clears hext_stream->link_substream, and then hda_ipc4_post_trigger() is called with a NULL snd_pcm_substream pointer. Fixes: 2b009fa ("ASoC: SOF: Intel: hda: Unify DAI drv ops for IPC3 and IPC4") Link: thesofproject#5080 Signed-off-by: Kai Vehmanen <[email protected]>
humm, this looks relatively similar to f710445 But then the original code in has a comment that "This fixes the suspend while paused case for This suggests that this piece of code is indeed controversial. the code we're trying to be consistent with does this static int __maybe_unused hda_dai_trigger(struct snd_pcm_substream *substream, int cmd,
struct snd_soc_dai *dai)
{
...
if (ops->post_trigger) {
ret = ops->post_trigger(sdev, dai, substream, cmd);
if (ret < 0)
return ret;
}
switch (cmd) {
case SNDRV_PCM_TRIGGER_SUSPEND:
ret = hda_link_dma_cleanup(substream, hext_stream, dai);
if (ret < 0) {
dev_err(sdev->dev, "%s: failed to clean up link DMA\n", __func__);
return ret;
}
break;
default:
break;
}
return 0;
} so it's not clear why we used the reverse order to deal with the corner case of suspend while paused. What's also not clear is why we end-up executing this code if the stream was active. In that case, the ALSA framework would send the TRIGGER_SUSPEND command and the link_substream would be freed. So while the change looks ok, I am not clear on why we hit this problem in the first place. |
Also now found #5054 (update: and #5058 which replaces 5054) -- will dig a bit deeper why this code gets called in this scenario. The test results for this PR are positive though. Not a full pass, but test run 43174 versus 43138 (using stable-v2.10-rc1 as fw) shows a big improvement in system-pm tests. Still failures, but I think some variant of this PR is needed to fix the "system suspend when audio stream is active" case. |
@plbossart I did more test runs and indeed the normal case is that SUSPEND trigger is called for the streams at suspend. When the NULL deref is hit, it seems the DAI is stopped for some reason just before suspend, but hw_free is not called on the dai.
But while this PR avoids the kernel crash, I don't think the driver is working correctly anymore here and the rootcause is probably somewhere else. I'll continue to debug this aspect tomorrow. Merging this PR might make debugging easier in CI (given there is some timing aspect here)... without this PR, you need console access to the DUT. UPDATE: a lot more details on the condition where this bug is hit: #5080 (comment) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I am ok with merging this, the code as it is is not consistent despite the comment that it is.
But we MUST root-cause why this oops was hit. To some extent, this helped identify a larger problem before this code is run.
@bardliao can you review/approve/merge if you are ok?
When system enters suspend with an active stream, SOF core calls hw_params_upon_resume(). On Intel platforms with HDA DMA used to manage the link DMA, this leads to call chain of
hda_dsp_set_hw_params_upon_resume()
-> hda_dsp_dais_suspend()
-> hda_dai_suspend()
-> hda_ipc4_post_trigger()
A bug is hit in hda_dai_suspend() as hda_link_dma_cleanup() is run first, which clears hext_stream->link_substream, and then hda_ipc4_post_trigger() is called with a NULL snd_pcm_substream pointer.
Fixes: 2b009fa ("ASoC: SOF: Intel: hda: Unify DAI drv ops for IPC3 and IPC4")
Link: #5080