Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

hijack.sh: don't return error when test case is already skip #802

Open
wants to merge 1 commit into
base: main
Choose a base branch
from

Conversation

fredoh9
Copy link
Collaborator

@fredoh9 fredoh9 commented Nov 5, 2021

Empty logger in skipped test is expected. Don't return error
because sof-logger is empty in skipped test.

Signed-off-by: Fred Oh [email protected]

Empty logger in skipped test is expected. Don't return error
because sof-logger is empty in skipped test.

Signed-off-by: Fred Oh <[email protected]>
@fredoh9 fredoh9 requested a review from a team as a code owner November 5, 2021 20:15
Copy link
Member

@plbossart plbossart left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

confused!

@@ -72,7 +72,8 @@ function func_exit_handler()

local nlines; nlines=$(wc -l < "$logfile") # line count only
# The first line is the sof-logger header
if [ "$nlines" -le 1 ]; then
# Don't override exit_status if already SKIPped test case
if ([ "$nlines" -le 1 ] && [ $exit_status -ne 2 ]); then
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I really don't get the flow in this file. If an error was already identified (exist_status=1), then what is the point of trying to check additional things?

In this case, if the exit_status is already not zero, why add something related to an empty logger trace?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

agree, flow is not clean here. exit_status is overwritten under many conditions in this hijack function. Original value of exit_status is easily forgotten.

@marc-hb
Copy link
Collaborator

marc-hb commented Nov 5, 2021

I believe this was initiated by https://sof-ci.01.org/linuxpr/PR3136/build6666/devicetest/?model=APL_UP2_HDA&testcase=check-capture-3times but there was no SKIP in that case a SKIP was overridden by a failure to find logs.

marc-hb
marc-hb previously requested changes Nov 5, 2021
Copy link
Collaborator

@marc-hb marc-hb left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

SKIP and FAIL should still have logs. If a SKIP or a FAIL has no logs then there is a logging issue and we want to know about it, see #297.

This code hasn't changed in months and has been working perfectly fine, what recent event prompts this?

@marc-hb
Copy link
Collaborator

marc-hb commented Nov 5, 2021

This code hasn't changed in months and has been working perfectly fine, what recent event prompts this?

I think I understand now, apologies to @fredoh9 for answering too fast. The recent event is that FW logging started failing for some reason (there have been many DMA code changes recently). The current code works as intended: a logging failure takes precedence over a SKIP.

@fredoh9
Copy link
Collaborator Author

fredoh9 commented Nov 5, 2021

I agree on that it has been working fine as it is.
But I'm wondering where sof-logger came from when a test case is SKIP.

@marc-hb
Copy link
Collaborator

marc-hb commented Nov 5, 2021

But I'm wondering where sof-logger came from when a test case is SKIP.

The sof-logger (unless disabled with -s) is started at the very start of the test. Every SKIP we've done so far happened after that. I think it's fine.

@marc-hb
Copy link
Collaborator

marc-hb commented Nov 6, 2021

BTW we still miss the last line of the DMA logs more often than not: thesofproject/sof#4333 (comment)

That does not explain why these logs were completely empty though because they should have at least the content from the previous test. I think we need a new bug.

The sof-logger (unless disabled with -s) is started at the very start of the test. Every SKIP we've done so far happened after that.

That does not even matter because at least the logs from the previous test should be found. FW logs are NOT cleared when you read them multiple times, this is like running dmesg multiple times. There's no dmesg -c.

For instance have a look at this test which was also skipped (without issue). If you look at its FW logs, you can see they all come from the end of the previous test, it's a perfect match to the microsecond (so not a coincidence)

https://sof-ci.01.org/linuxpr/PR3136/build6666/devicetest/?model=APL_UP2_HDA&testcase=check-suspend-resume-with-capture-5

@kv2019i
Copy link
Contributor

kv2019i commented Nov 9, 2021

@marc-hb wrote:

That does not even matter because at least the logs from the previous test should be found. FW logs are NOT cleared when you read them multiple times, this is like running dmesg multiple times. There's no dmesg -c.

No, that's not correct. We only have "dmesg -c" behaviour. So if you consume all logs from DSP, and DSP is not powered on, there will be no traces.

For instance have a look at this test which was also skipped (without issue). If you look at its FW logs, you can see they all come from the end of the previous test, it's a perfect match.

https://sof-ci.01.org/linuxpr/PR3136/build6666/devicetest/?model=APL_UP2_HDA&testcase=check-suspend-resume-with-capture-5

In that case, DSP is booted to update the trace filter, so you have DSP log for the boot. This is a bit dubious to boot DSP just for this as the filter update is lost when the DSP is powered down.

@marc-hb
Copy link
Collaborator

marc-hb commented Nov 9, 2021

No, that's not correct. We only have "dmesg -c" behaviour.

It looks like you never tried to run the sof-logger, hit Ctrl-C and then run it again. Just try it now.

You also didn't look at this test results example I just gave:

For instance have a look at this test which was also skipped (without issue). If you look at its FW logs, you can see they all come from the end of the previous test, it's a perfect match.

@marc-hb
Copy link
Collaborator

marc-hb commented Nov 9, 2021

No, that's not correct. We only have "dmesg -c" behaviour.

You may have that impression because the DMA logs seem (sometimes?!?) cleared when going to D3. Disable D3 and you will see there is NO -c.

My really big concern right now is not what happens but the fact that no one seems to know what's supposed to happen (I don't know either).

@marc-hb
Copy link
Collaborator

marc-hb commented Nov 9, 2021

Parallel discussion in issue #804 (an issue is probably a better place than a PR for this)

@kv2019i
Copy link
Contributor

kv2019i commented Nov 10, 2021

@marc-hb @ujfalusi Restarting sof-logger while DSP is on has never worked (or it works as a convenience, but it's never been a reliable way to get all the traces from DSP and most certainly there has been no guarantee that you will get some traces).

$ cat /sys/devices/pci0000\:00/0000\:00\:1f.3/power/runtime_status
suspended
$ sudo sof-logger -t -l /lib/firmware/intel/sof/sof-tgl-h.ldc
TIMESTAMP         (us)              DELTA  C# COMPONENT          LOCATION                      CONTENT ktime=4695264.572s  @  2021-11-10 14:37:36 -0500 EST
[743598486590.800903] (           0.000000) c0 sa                          src/lib/agent.c:79   WARN validate(), ll drift detected, delta = 40536
...
[test ends]
[    47099941.045083] (          57.135414) c0 zll-schedule       ......../zephyr_domain.c:89   INFO ll timer avg 512, max 1384, overruns 0
warn: failed to fread() 4 params from the log for ../../src/schedule/zephyr_ll.c:117
warn: log's End Of File. Device suspend?

       ---- Re-opening trace input file; device suspend? -----
[ctrl-c to sof-logger]
[wait until device is suspended]
$ cat /sys/devices/pci0000\:00/0000\:00\:1f.3/power/runtime_status
suspended
$ sudo sof-logger -t -l /lib/firmware/intel/sof/sof-tgl-h.ldc
 TIMESTAMP         (us)              DELTA  C# COMPONENT          LOCATION                      CONTENT ktime=4695516.298s  @  2021-11-10 14:41:48 -0500 EST

[**NO DATA REPEATED**]

That's the expected behaviour and how it has always been since before time SOF was added to upstream kernel a few years back.

I today submitted a sof-docs addition that adds a specific note about starting sof-logger while DSP is running.
thesofproject/sof-docs#381

This is by no means perfect, but this is what the driver currently does. If you or anyone wants "dmesg" type of semantics (reliably), new development in driver and FW is needed.

The current trace module stores the traces into ringbuffer. Let's describe it as ABCD (with A the first entry and D the last entry). When sof-logger starts, it will start always reading from A. FW will keep writing to the ringbuffer independently, so the latest traces might have been written to A, B, C or D. It is possible FW just filled D, so if you start sof-logger with bad enough luck, you will not get any traces although the trace buffer is actually full. The FW will also restart from A whenever it's suspended and resumed and to ensure no data is lost, sof-logger needs to consume data all the time.

So it is a very simple design. One could implement a double-buffering, and/or new mechanism for SW/FW to negotiate where to write, and/or enable DMA tracing only when sof-logger is running, but none of these have been implemented. And it's been like this for 2 years (or more). If you keep sof-logger running during a test/use-case, you'll reliably get all the traces from the DSP (independently of whether runtime-PM is used or not).

@marc-hb marc-hb dismissed their stale review November 13, 2021 01:04

logging is utterly broken and hopeless, do whatever you want

@marc-hb marc-hb requested review from marc-hb and removed request for marc-hb November 13, 2021 01:04
@marc-hb
Copy link
Collaborator

marc-hb commented Nov 13, 2021

The current trace module stores the traces into ringbuffer. ... So it is a very simple design. ...

That's the short description you should submit to sof-docs instead of the way too short "things may go wrong" update in thesofproject/sof-docs#381 right now. Don't try to dumb down a very simple design.

The logger is a developer tool, its users are expected to understand what a ringbuffer is (and even if they don't then they can easily skip it)

@marc-hb
Copy link
Collaborator

marc-hb commented Nov 24, 2021

This discussion unfortunately ended up scattered across multiple places, see links above.

The latest news is thesofproject/linux#3136 was merged without the part that required this test change. So I suggest that for now this test PR waits until we see what directions logging goes to.

@ujfalusi
Copy link
Contributor

The latest news is thesofproject/linux#3136 was merged without the part that required this test change. So I suggest that for now this test PR waits until we see what directions logging goes to.

If the test case is marked as SKIP on a given platform then it does not matter which direction the tracing is going to go, it will not change the fact that the case should not have been run in the first place, but if we run it at least we should not mark fail.
imho

@marc-hb
Copy link
Collaborator

marc-hb commented Nov 24, 2021

it will not change the fact that the case should not have been run in the first place,

As already explained above, there are several flavours of SKIP. If (for instance) the firmware crashes while trying to SKIP a particular test from within the test itself then yes, we absolutely want to report that as a FAIL. Because the firmware should never crash ever. Same for the logger: the logs may be empty or maybe not empty but the logger itself should never crash.

Test SetUp, TearDown or anything in the test environment should never fail - even when the test decides to "skip itself".

@marc-hb marc-hb added the area:logs Log and results collection, storage, etc. label Jan 28, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
area:logs Log and results collection, storage, etc.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants