hijack.sh: don't return error when test case is already skip #802

fredoh9 · 2021-11-05T20:15:23Z

Empty logger in skipped test is expected. Don't return error
because sof-logger is empty in skipped test.

Signed-off-by: Fred Oh [email protected]

Empty logger in skipped test is expected. Don't return error because sof-logger is empty in skipped test. Signed-off-by: Fred Oh <[email protected]>

plbossart

confused!

plbossart · 2021-11-05T20:20:06Z

case-lib/hijack.sh

@@ -72,7 +72,8 @@ function func_exit_handler()

            local nlines; nlines=$(wc -l < "$logfile") # line count only
            # The first line is the sof-logger header
-            if [ "$nlines" -le 1 ]; then
+            # Don't override exit_status if already SKIPped test case
+            if ([ "$nlines" -le 1 ] && [ $exit_status -ne 2 ]); then


I really don't get the flow in this file. If an error was already identified (exist_status=1), then what is the point of trying to check additional things?

In this case, if the exit_status is already not zero, why add something related to an empty logger trace?

agree, flow is not clean here. exit_status is overwritten under many conditions in this hijack function. Original value of exit_status is easily forgotten.

marc-hb · 2021-11-05T21:50:10Z

I believe this was initiated by https://sof-ci.01.org/linuxpr/PR3136/build6666/devicetest/?model=APL_UP2_HDA&testcase=check-capture-3times ~~but there was no SKIP in that case~~ a SKIP was overridden by a failure to find logs.

marc-hb

SKIP and FAIL should still have logs. If a SKIP or a FAIL has no logs then there is a logging issue and we want to know about it, see #297.

This code hasn't changed in months and has been working perfectly fine, what recent event prompts this?

marc-hb · 2021-11-05T22:38:38Z

This code hasn't changed in months and has been working perfectly fine, what recent event prompts this?

I think I understand now, apologies to @fredoh9 for answering too fast. The recent event is that FW logging started failing for some reason (there have been many DMA code changes recently). The current code works as intended: a logging failure takes precedence over a SKIP.

fredoh9 · 2021-11-05T22:44:06Z

I agree on that it has been working fine as it is.
But I'm wondering where sof-logger came from when a test case is SKIP.

marc-hb · 2021-11-05T22:48:56Z

But I'm wondering where sof-logger came from when a test case is SKIP.

The sof-logger (unless disabled with -s) is started at the very start of the test. Every SKIP we've done so far happened after that. I think it's fine.

marc-hb · 2021-11-06T00:13:06Z

BTW we still miss the last line of the DMA logs more often than not: thesofproject/sof#4333 (comment)

That does not explain why these logs were completely empty though because they should have at least the content from the previous test. I think we need a new bug.

The sof-logger (unless disabled with -s) is started at the very start of the test. Every SKIP we've done so far happened after that.

That does not even matter because at least the logs from the previous test should be found. FW logs are NOT cleared when you read them multiple times, this is like running dmesg multiple times. There's no dmesg -c.

For instance have a look at this test which was also skipped (without issue). If you look at its FW logs, you can see they all come from the end of the previous test, it's a perfect match to the microsecond (so not a coincidence)

https://sof-ci.01.org/linuxpr/PR3136/build6666/devicetest/?model=APL_UP2_HDA&testcase=check-suspend-resume-with-capture-5

kv2019i · 2021-11-09T12:26:15Z

@marc-hb wrote:

That does not even matter because at least the logs from the previous test should be found. FW logs are NOT cleared when you read them multiple times, this is like running dmesg multiple times. There's no dmesg -c.

No, that's not correct. We only have "dmesg -c" behaviour. So if you consume all logs from DSP, and DSP is not powered on, there will be no traces.

For instance have a look at this test which was also skipped (without issue). If you look at its FW logs, you can see they all come from the end of the previous test, it's a perfect match.

https://sof-ci.01.org/linuxpr/PR3136/build6666/devicetest/?model=APL_UP2_HDA&testcase=check-suspend-resume-with-capture-5

In that case, DSP is booted to update the trace filter, so you have DSP log for the boot. This is a bit dubious to boot DSP just for this as the filter update is lost when the DSP is powered down.

marc-hb · 2021-11-09T17:30:21Z

No, that's not correct. We only have "dmesg -c" behaviour.

It looks like you never tried to run the sof-logger, hit Ctrl-C and then run it again. Just try it now.

You also didn't look at this test results example I just gave:

For instance have a look at this test which was also skipped (without issue). If you look at its FW logs, you can see they all come from the end of the previous test, it's a perfect match.

marc-hb · 2021-11-09T17:41:03Z

No, that's not correct. We only have "dmesg -c" behaviour.

You may have that impression because the DMA logs seem (sometimes?!?) cleared when going to D3. Disable D3 and you will see there is NO -c.

My really big concern right now is not what happens but the fact that no one seems to know what's supposed to happen (I don't know either).

marc-hb · 2021-11-09T17:53:08Z

Parallel discussion in issue #804 (an issue is probably a better place than a PR for this)

kv2019i · 2021-11-10T20:04:05Z

@marc-hb @ujfalusi Restarting sof-logger while DSP is on has never worked (or it works as a convenience, but it's never been a reliable way to get all the traces from DSP and most certainly there has been no guarantee that you will get some traces).

$ cat /sys/devices/pci0000\:00/0000\:00\:1f.3/power/runtime_status
suspended
$ sudo sof-logger -t -l /lib/firmware/intel/sof/sof-tgl-h.ldc
TIMESTAMP         (us)              DELTA  C# COMPONENT          LOCATION                      CONTENT ktime=4695264.572s  @  2021-11-10 14:37:36 -0500 EST
[743598486590.800903] (           0.000000) c0 sa                          src/lib/agent.c:79   WARN validate(), ll drift detected, delta = 40536
...
[test ends]
[    47099941.045083] (          57.135414) c0 zll-schedule       ......../zephyr_domain.c:89   INFO ll timer avg 512, max 1384, overruns 0
warn: failed to fread() 4 params from the log for ../../src/schedule/zephyr_ll.c:117
warn: log's End Of File. Device suspend?

       ---- Re-opening trace input file; device suspend? -----
[ctrl-c to sof-logger]
[wait until device is suspended]
$ cat /sys/devices/pci0000\:00/0000\:00\:1f.3/power/runtime_status
suspended
$ sudo sof-logger -t -l /lib/firmware/intel/sof/sof-tgl-h.ldc
 TIMESTAMP         (us)              DELTA  C# COMPONENT          LOCATION                      CONTENT ktime=4695516.298s  @  2021-11-10 14:41:48 -0500 EST

[**NO DATA REPEATED**]

That's the expected behaviour and how it has always been since before time SOF was added to upstream kernel a few years back.

I today submitted a sof-docs addition that adds a specific note about starting sof-logger while DSP is running.
thesofproject/sof-docs#381

This is by no means perfect, but this is what the driver currently does. If you or anyone wants "dmesg" type of semantics (reliably), new development in driver and FW is needed.

The current trace module stores the traces into ringbuffer. Let's describe it as ABCD (with A the first entry and D the last entry). When sof-logger starts, it will start always reading from A. FW will keep writing to the ringbuffer independently, so the latest traces might have been written to A, B, C or D. It is possible FW just filled D, so if you start sof-logger with bad enough luck, you will not get any traces although the trace buffer is actually full. The FW will also restart from A whenever it's suspended and resumed and to ensure no data is lost, sof-logger needs to consume data all the time.

So it is a very simple design. One could implement a double-buffering, and/or new mechanism for SW/FW to negotiate where to write, and/or enable DMA tracing only when sof-logger is running, but none of these have been implemented. And it's been like this for 2 years (or more). If you keep sof-logger running during a test/use-case, you'll reliably get all the traces from the DSP (independently of whether runtime-PM is used or not).

logging is utterly broken and hopeless, do whatever you want

marc-hb · 2021-11-13T01:52:54Z

The current trace module stores the traces into ringbuffer. ... So it is a very simple design. ...

That's the short description you should submit to sof-docs instead of the way too short "things may go wrong" update in thesofproject/sof-docs#381 right now. Don't try to dumb down a very simple design.

The logger is a developer tool, its users are expected to understand what a ringbuffer is (and even if they don't then they can easily skip it)

marc-hb · 2021-11-24T18:57:13Z

This discussion unfortunately ended up scattered across multiple places, see links above.

The latest news is thesofproject/linux#3136 was merged without the part that required this test change. So I suggest that for now this test PR waits until we see what directions logging goes to.

ujfalusi · 2021-11-24T19:56:35Z

The latest news is thesofproject/linux#3136 was merged without the part that required this test change. So I suggest that for now this test PR waits until we see what directions logging goes to.

If the test case is marked as SKIP on a given platform then it does not matter which direction the tracing is going to go, it will not change the fact that the case should not have been run in the first place, but if we run it at least we should not mark fail.
imho

marc-hb · 2021-11-24T20:01:41Z

it will not change the fact that the case should not have been run in the first place,

As already explained above, there are several flavours of SKIP. If (for instance) the firmware crashes while trying to SKIP a particular test from within the test itself then yes, we absolutely want to report that as a FAIL. Because the firmware should never crash ever. Same for the logger: the logs may be empty or maybe not empty but the logger itself should never crash.

Test SetUp, TearDown or anything in the test environment should never fail - even when the test decides to "skip itself".

hijack.sh: don't return error when test case is already skip

6877b04

Empty logger in skipped test is expected. Don't return error because sof-logger is empty in skipped test. Signed-off-by: Fred Oh <[email protected]>

fredoh9 requested a review from a team as a code owner November 5, 2021 20:15

plbossart reviewed Nov 5, 2021

View reviewed changes

marc-hb previously requested changes Nov 5, 2021

View reviewed changes

marc-hb mentioned this pull request Nov 6, 2021

SOF client support (auxiliary bus) - take3 thesofproject/linux#3136

Merged

kv2019i mentioned this pull request Nov 10, 2021

[BUG] Irrelevant tests should be completely skipped #804

Closed

marc-hb requested review from marc-hb and removed request for marc-hb November 13, 2021 01:04

kv2019i mentioned this pull request Nov 15, 2021

[FEATURE] Ability to start DMA-trace while DSP is running thesofproject/linux#3275

Open

marc-hb added the area:logs Log and results collection, storage, etc. label Jan 28, 2022

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

hijack.sh: don't return error when test case is already skip #802

hijack.sh: don't return error when test case is already skip #802

fredoh9 commented Nov 5, 2021

plbossart left a comment

plbossart Nov 5, 2021

fredoh9 Nov 5, 2021

marc-hb commented Nov 5, 2021 •

edited

Loading

marc-hb left a comment •

edited

Loading

marc-hb commented Nov 5, 2021

fredoh9 commented Nov 5, 2021

marc-hb commented Nov 5, 2021

marc-hb commented Nov 6, 2021 •

edited

Loading

kv2019i commented Nov 9, 2021

marc-hb commented Nov 9, 2021

marc-hb commented Nov 9, 2021 •

edited

Loading

marc-hb commented Nov 9, 2021

kv2019i commented Nov 10, 2021

marc-hb commented Nov 13, 2021 •

edited

Loading

marc-hb commented Nov 24, 2021

ujfalusi commented Nov 24, 2021

marc-hb commented Nov 24, 2021 •

edited

Loading

hijack.sh: don't return error when test case is already skip #802

Are you sure you want to change the base?

hijack.sh: don't return error when test case is already skip #802

Conversation

fredoh9 commented Nov 5, 2021

plbossart left a comment

Choose a reason for hiding this comment

plbossart Nov 5, 2021

Choose a reason for hiding this comment

fredoh9 Nov 5, 2021

Choose a reason for hiding this comment

marc-hb commented Nov 5, 2021 • edited Loading

marc-hb left a comment • edited Loading

Choose a reason for hiding this comment

marc-hb commented Nov 5, 2021

fredoh9 commented Nov 5, 2021

marc-hb commented Nov 5, 2021

marc-hb commented Nov 6, 2021 • edited Loading

kv2019i commented Nov 9, 2021

marc-hb commented Nov 9, 2021

marc-hb commented Nov 9, 2021 • edited Loading

marc-hb commented Nov 9, 2021

kv2019i commented Nov 10, 2021

marc-hb commented Nov 13, 2021 • edited Loading

marc-hb commented Nov 24, 2021

ujfalusi commented Nov 24, 2021

marc-hb commented Nov 24, 2021 • edited Loading

marc-hb commented Nov 5, 2021 •

edited

Loading

marc-hb left a comment •

edited

Loading

marc-hb commented Nov 6, 2021 •

edited

Loading

marc-hb commented Nov 9, 2021 •

edited

Loading

marc-hb commented Nov 13, 2021 •

edited

Loading

marc-hb commented Nov 24, 2021 •

edited

Loading