Dashboard no longer reports accurate scores for Cascade Layers #281
@DanielRyanSmith Do you know why this could be?
The dashboard matches https://github.com/web-platform-tests/results-analysis/blob/cc161045215088212e0eb4c678024cd221e7d07e/data/interop-2022/interop-2022-experimental-v2.csv, so it's not a dashboard problem per se. The Cascade Layers runs today don't show this; I presume this is the exact set of runs being scored, and it still doesn't show these scores. Could we somehow have fetched web-platform-tests/results-analysis-cache in some intermediary state?
So the following columns have changes in the last two rows:
- In Interop 2023: chrome-interop-2022-cascade
- In Interop 2022: chrome-interop-2022-cascade
I have no theory of what's gone wrong, but I don't think so. Each run gets a tag only once the tree is built/committed, so there should be no intermediary state to fetch. It's possible a broken conversion has happened, but that then isn't going to fix itself. @DanielRyanSmith what do you think?
The alternative is that it gets written to the cache in an intermediary state, depending on when wpt.fyi starts serving it?
Ohhh, @DanielRyanSmith: this is because all-prop-revert-layer-noop.html is labelled, and thus considered part of the total number of tests, but we don't have any results for it because it's been split into variants. |
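To illustrate the mechanism described above, here is a minimal sketch (hypothetical names and numbers, not the actual results-analysis code) of how a test that is still labeled, but has no results because it was split into variants, drags the focus-area score down:

```python
# Hypothetical sketch: a labeled test counts in the denominator,
# but if no results exist for it (e.g. it was split into variants
# under new paths), it contributes 0 to the score.

def focus_area_score(labeled_tests, results):
    """Score = mean per-test pass fraction over all labeled tests.

    labeled_tests: list of test paths carrying the interop label.
    results: dict mapping test path -> (passing subtests, total subtests).
    A labeled test with no results contributes 0.
    """
    total = 0.0
    for test in labeled_tests:
        passing, subtotal = results.get(test, (0, 1))
        total += passing / subtotal
    return round(100 * total / len(labeled_tests), 1)

labeled = ["a.html", "b.html", "all-prop-revert-layer-noop.html"]

# Before the split: every labeled test has results and fully passes.
before = focus_area_score(labeled, {
    "a.html": (2, 2),
    "b.html": (3, 3),
    "all-prop-revert-layer-noop.html": (1, 1),
})

# After the split: the old path is still labeled, but results now
# exist only under the new variant paths, so it scores 0.
after = focus_area_score(labeled, {
    "a.html": (2, 2),
    "b.html": (3, 3),
})
```

With these made-up numbers the score drops from 100.0 to 66.7 even though no test actually regressed, which matches the shape of the drop seen on the dashboard.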
I think you're right. This is a comparison of 3 recent runs today as opposed to 3 yesterday, filtering for the interop-2022-cascade label. Is this something that requires changing the wpt-metadata to remove the old labeled tests? This is a new scenario that I don't think we've prepared for, and just changing the test name/path could change the historical data overall, as they're listed in older runs with a different path. |
Yeah, this has been the problem whenever anything like this happens, which is always unfortunate. This was probably most apparent in 2021, when the tests were being revised pretty much throughout the year. I've filed web-platform-tests/wpt-metadata#3769 to change the metadata, and filed web-platform-tests/results-analysis#149 to try and make this easier to debug. Note that a change like this splitting a test into variants would also impact the score if it weren't 100%, because it changes the total number of tests we're scoring against. See #256 for some discussion about similar variant scoring.
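As a worked example of the denominator effect mentioned above (hypothetical numbers, not real run data):

```python
# Hypothetical arithmetic: splitting one labeled test into 8 variants
# changes the denominator, and hence the score, even when the
# affected test wasn't at 100%.

# Before: 10 labeled tests, 9 fully passing, 1 half-passing.
before = 100 * (9 + 0.5) / 10   # 95.0

# After: the half-passing test becomes 8 variants (say 4 pass,
# 4 fail), so there are now 17 labeled tests in the denominator.
after = 100 * (9 + 4) / 17      # about 76.5
```

So even with identical browser behavior, the same work can be weighted very differently before and after a split.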
I've filed #284 about the generic problem. @nt1m unlabeled the test in web-platform-tests/wpt-metadata#3750 which should restore the Cascade Layers scores to 100 where they were before. Leaving this issue open to decide what we do about the split test, which is now 8 individual tests. I'll comment on #256 about what I think makes sense in general. |
I also merged Sam's PR: web-platform-tests/wpt-metadata#3769 which relabeled the tests, the change that unlabeled the test was just temporary in order to merge the WPT manifest PR. |
And #284 actually, that's the issue directly about this problem. |
Cascade Layers used to report: Chrome 100 / Firefox 96.3 / Safari 100 (preview scores)
Now it reports: Chrome 96.7 / Firefox 93.1 / Safari 96.7 (preview scores)
If you click through to see the tests & the results, you can see that the earlier numbers are the correct numbers. So currently, the dashboard is wrong.
This may be happening for other focus areas, too.