Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[FEATURE] Ability to start DMA-trace while DSP is running #3275

Open
kv2019i opened this issue Nov 15, 2021 · 5 comments
Open

[FEATURE] Ability to start DMA-trace while DSP is running #3275

kv2019i opened this issue Nov 15, 2021 · 5 comments
Labels
enhancement New feature or request suspend resume Issues related to suspend resume (e.g. rtcwake)

Comments

@kv2019i
Copy link
Collaborator

kv2019i commented Nov 15, 2021

As documented in thesofproject/sof-docs#381 and discussed at length in thesofproject/sof-test#802 (comment) , current DMA trace is not designed to support the scenario where DMA tracing is started by user-space (sof-logger) while the DSP is in fact running. The current kernel driver allows to do this, but logs are provided in best-effort manner.

For development purposes, getting some traces is better than not getting any traces. When used for validation, lack of determinism is a problem. A specific problem for validation is how to capture early traces reliable. The sof-logger cannot be started when SOF driver is not loaded. But when SOF driver is loaded, then the DSP might be already be powered on. This leaves a race that cannot be easily handled in user-space.

Improve the kernel tracing interface to guarantee:

  • sof-logger can be restarted independently of DSP state
  • upon start, the initial traces are always read in the order DSP sent them
    - driver needs to hide runtime-suspend transitions of DSP
    - if the ringbuffer has wrapped while sof-logger was not running, the returned traces must still be in order

FYI @marc-hb @ujfalusi @plbossart @lgirdwood

@kv2019i kv2019i added the enhancement New feature or request label Nov 15, 2021
kv2019i added a commit to kv2019i/sof-docs that referenced this issue Nov 15, 2021
If sof-logger is started (or restarted) while DSP is running, the
initial traces may be incomplete. Document the limitation and give a
brief explanation of the current ringbuffer design and how it affects
the start-up behaviour.

BugLink: thesofproject/sof-test#297
BugLink: thesofproject/linux#3275
Signed-off-by: Kai Vehmanen <[email protected]>
lgirdwood pushed a commit to thesofproject/sof-docs that referenced this issue Nov 15, 2021
If sof-logger is started (or restarted) while DSP is running, the
initial traces may be incomplete. Document the limitation and give a
brief explanation of the current ringbuffer design and how it affects
the start-up behaviour.

BugLink: thesofproject/sof-test#297
BugLink: thesofproject/linux#3275
Signed-off-by: Kai Vehmanen <[email protected]>
@marc-hb
Copy link
Collaborator

marc-hb commented Nov 16, 2021

The sof-logger cannot be started when SOF driver is not loaded. But when SOF driver is loaded, then the DSP might be already be powered on. This leaves a race that cannot be easily handled in user-space.

In practice this race is a problem only when the FW logs are so verbose that they fill the entire ring buffer. It's never been a problem for me.

driver needs to hide runtime-suspend transitions of DSP

Which current problem does this solve?

  • if the ringbuffer has wrapped while sof-logger was not running, the returned traces must still be in order

Yes but not urgent, sof-logger already has mitigation for that: it detects and skips garbage and is smart enough to notice timestamps going backwards and report them.

@marc-hb marc-hb assigned RanderWang and unassigned RanderWang Nov 16, 2021
@kv2019i
Copy link
Collaborator Author

kv2019i commented Nov 16, 2021

@marc-hb wrote:

The sof-logger cannot be started when SOF driver is not loaded. But when SOF driver is loaded, then the DSP might be already be powered on. This leaves a race that cannot be easily handled in user-space.

In practice this race is a problem only when the FW logs are so verbose that they fill the entire ring buffer. It's never been a problem for me.

driver needs to hide runtime-suspend transitions of DSP

Which current problem does this solve?

Let's discuss that. It seems based on discussion on #3136 that currently sof-test cases are assuming some data is always in the trace buffer. This was tracked also in thesofproject/sof-test#297

If we'd move to a "double ringbuffer" in the SOF driver, "sof-logger -t" would always return at least some traces and this would be guaranteed if the firmware boot was successful. IOW, if no traces are read, this is a guaranteed sign of failure and can be
raised.

This mostly impacts cases where DSP is runtime suspended and resumed during (or between) test cases, and the test cases are restarting sof-logger. With current semantics, if the DSP has suspended/resumed with sof-logger not running continuously, some traces are missed, and it is possible trace log is empty when sof-logger is started.

I don't fully understand the problems involved, but I can see with a double-buffer kernel trace, test case development would be easier.

One non-kernel solution is to improve sof-logger with the ability to keep it running even when driver is not loaded. It could keep polling the availability of kernel trace node and start emitting trace whenever the trace node becomes available. This would allow to get the complete FW trace for a full test plan, including driver load/unload cases. Currently sof-logger must be restarted as the driver must be loaded, but this can be changed.

@marc-hb
Copy link
Collaborator

marc-hb commented Nov 16, 2021

It seems based on discussion on #3136 that currently sof-test cases are assuming some data is always in the trace buffer.

Yes, and for various good and bad reasons this kept passing until #3136.

If we'd move to a "double ringbuffer" in the SOF driver, "sof-logger -t" would always return at least some traces and this would be guaranteed if the firmware boot was successful.

From a validation perspective the most important thing is as you wrote: determinism. You obviously can't assert something non-deterministic. So in any simple situation the logs should be either always empty or always present.

Determinism is made more complicated by power management that powers off the DSP (D3) after a timeout. Tests have little control on this timeout. Also, it's possible (and even recommended by the logger docs) to disable D3 and it would simpler from a test perspective if logging behavior is not totally different when disabling D3 (of course content differences are unavoidable).

So given a choice, I think I prefer logs to be never empty as long as the DSP has booted at least once in the past. Logs obviously provide more information than no logs and this matches other tools like dmesg and journalctl -e or -f.

If we'd move to a "double ringbuffer" in the SOF driver, "sof-logger -t" would always return at least some traces and this would be guaranteed if the firmware boot was successful. IOW, if no traces are read, this is a guaranteed sign of failure and can be raised.

Sounds good, thanks.

One non-kernel solution is to improve sof-logger with the ability to keep it running even when driver is not loaded. It could keep polling the availability of kernel trace node and start emitting trace whenever the trace node becomes available.

Good idea, I'll try to find some time and do that. I'm already familiar with that code. Is there some inotify-like API you would recommend to avoid polling /sys? EDIT: submitted in thesofproject/sof#5027

@marc-hb
Copy link
Collaborator

marc-hb commented Nov 16, 2021

Rescuing from a private email thread a non-intuitive (to me) host_offset behavior explained by @ujfalusi

I stopped disabling D3 (disabling D3 is what sof-docs recommend when using the logger, don't forget that...) and I can reproduce this every single time:

sof-logger -t

# in a different terminal:
speakertest -l 1 && sleep 1 && sudo pkill sof-logger

<wait at least 2s for D3>

sof-logger -t # full logs from DSP boot without waking up the DSP,
# COMPLETE duplicate of what the first sof-logger already showed
Ctrl-C
sof-logger -t # empty

As you can see no filter updated involved. I guess this could be how we always got at least the logs from the previous test so far. Even before/without the filter updates.

Now replace "sleep 1" by "sleep 3" and all sof-logger (except the very first one) return empty.

Which is pretty much expected based on the kernel code:

-- sleep 1 --

  • If you kill the sof-logger _while the DSP is still on then the
    host_offset is not reset.
  • You wait for 2 more sec to hit D3.
  • when you run sof-logger next time you will get the log from 0 to the
    previous host_offset
  • kill sof-logger and the host_offset is rest to 0 as the DSP is off
  • run sof-logger again and you will have empty log (you receive from 0
    to offset 0)

-- sleep 3 --

  • if you kill sof-logger the DSP is already powered down, the
    host_offset is reset to 0
  • wait 2 sec more
  • sof-logger will return with empty log (log from offset 0 to offset 0)
  • kill it
  • sof-logger still shows empty logs.

@marc-hb
Copy link
Collaborator

marc-hb commented Mar 16, 2022

@marc-hb marc-hb added the suspend resume Issues related to suspend resume (e.g. rtcwake) label Jul 19, 2022
marc-hb added a commit to marc-hb/sof that referenced this issue Jul 23, 2022
Don't fail immediately when the driver is not loaded. Use inotify
instead to wait for /sys/kernel/debug/sof/[e]trace to appear.

This makes it possible to start before the driver is loaded which
reduces considerably the chances of missing early logs.

Fixes a small part of thesofproject/linux#3275

Signed-off-by: Marc Herbert <[email protected]>
lgirdwood pushed a commit to thesofproject/sof that referenced this issue Aug 2, 2022
Don't fail immediately when the driver is not loaded. Use inotify
instead to wait for /sys/kernel/debug/sof/[e]trace to appear.

This makes it possible to start before the driver is loaded which
reduces considerably the chances of missing early logs.

Fixes a small part of thesofproject/linux#3275

Signed-off-by: Marc Herbert <[email protected]>
pjdobrowolski pushed a commit to pjdobrowolski/sof_zephyr that referenced this issue Aug 10, 2022
Don't fail immediately when the driver is not loaded. Use inotify
instead to wait for /sys/kernel/debug/sof/[e]trace to appear.

This makes it possible to start before the driver is loaded which
reduces considerably the chances of missing early logs.

Fixes a small part of thesofproject/linux#3275

Signed-off-by: Marc Herbert <[email protected]>
marc-hb added a commit to marc-hb/sof that referenced this issue Sep 20, 2023
Don't fail immediately when the driver is not loaded. Use inotify
instead to wait for /sys/kernel/debug/sof/[e]trace to appear.

This makes it possible to start before the driver is loaded which
reduces considerably the chances of missing early logs.

Fixes a small part of thesofproject/linux#3275

Signed-off-by: Marc Herbert <[email protected]>
(cherry picked from commit dcf0577)
lgirdwood pushed a commit to thesofproject/sof that referenced this issue Sep 25, 2023
Don't fail immediately when the driver is not loaded. Use inotify
instead to wait for /sys/kernel/debug/sof/[e]trace to appear.

This makes it possible to start before the driver is loaded which
reduces considerably the chances of missing early logs.

Fixes a small part of thesofproject/linux#3275

Signed-off-by: Marc Herbert <[email protected]>
(cherry picked from commit dcf0577)
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement New feature or request suspend resume Issues related to suspend resume (e.g. rtcwake)
Projects
None yet
Development

No branches or pull requests

3 participants