Dashboard no longer reports accurate scores for Cascade Layers #281

Closed
jensimmons opened this issue Feb 9, 2023 · 13 comments

Comments

@jensimmons
Contributor

jensimmons commented Feb 9, 2023

Cascade Layers used to report: Chrome 100 / Firefox 96.3 / Safari 100 (preview scores)
Now it reports: Chrome 96.7 / Firefox 93.1 / Safari 96.7 (preview scores)

If you click through to see the tests & the results, you can see that the earlier numbers are the correct numbers. So currently, the dashboard is wrong.

This may be happening for other focus areas, too.

@robnyman
Contributor

robnyman commented Feb 9, 2023

@DanielRyanSmith Do you know why this could be?

@gsnedders
Member

The dashboard matches https://github.com/web-platform-tests/results-analysis/blob/cc161045215088212e0eb4c678024cd221e7d07e/data/interop-2022/interop-2022-experimental-v2.csv, so it's not a dashboard problem per se.

The Cascade Layers runs today don't show this; I presume this is the exact set of runs being scored. And that still doesn't show these scores.

Could we somehow have fetched web-platform-tests/results-analysis-cache in some intermediary state?

@gsnedders
Member

So the following columns have changes in the last two rows:

In Interop 2023:

chrome-interop-2022-cascade
firefox-interop-2022-cascade
firefox-interop-2022-dialog
safari-interop-2022-cascade
safari-interop-2023-contain
safari-interop-2023-flexbox
safari-interop-2023-offscreencanvas

In Interop 2022:

chrome-interop-2022-cascade
firefox-interop-2022-cascade
firefox-interop-2022-dialog
safari-interop-2021-flexbox
safari-interop-2022-cascade

@foolip
Member

foolip commented Feb 9, 2023

> Could we somehow have fetched web-platform-tests/results-analysis-cache in some intermediary state?

I have no theory of what's gone wrong, but I don't think so. Each run gets a tag only once the tree is built/committed, so there should be no intermediary state to fetch.

It's possible a broken conversion has happened, but that then isn't going to fix itself.

@DanielRyanSmith what do you think?

@gsnedders
Member

> I have no theory of what's gone wrong, but I don't think so. Each run gets a tag only once the tree is built/committed, so there should be no intermediary state to fetch.

The alternative is that it gets written to the cache in an intermediary state, depending on when wpt.fyi starts serving it?

@gsnedders
Member

gsnedders commented Feb 9, 2023

Running git diff run/5128159591137280/results..run/5192142658469888/results -- css/css-cascade shows quite a lot of changes (that should be the diff from yesterday's run to today's current run, for Chrome), which is basically: https://wpt.fyi/results/css/css-cascade?view=subtest&diff&filter=ADC&run_id=5128159591137280&run_id=5192142658469888

@gsnedders
Member

Ohhh, @DanielRyanSmith: this is because all-prop-revert-layer-noop.html is labelled, and thus considered part of the total number of tests, but we don't have any results for it because it's been split into variants.
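
A minimal sketch of why a labelled test with no results pulls the score down. It assumes the focus-area score is the mean per-test pass rate over all labelled tests and that a labelled test with no result counts as zero. The function name, the placeholder test paths, the directory of the split test, and the count of 30 labelled tests are illustrative assumptions, not taken from the real scoring code or label set.

```python
def focus_area_score(labelled_tests, results):
    """Mean pass rate over every labelled test, as a percentage.

    `results` maps test path -> fraction of subtests passing (0.0 to 1.0).
    A labelled test that is absent from `results` contributes 0, which is
    the assumed behaviour behind the drop described above.
    """
    if not labelled_tests:
        return 0.0
    total = sum(results.get(path, 0.0) for path in labelled_tests)
    return 100.0 * total / len(labelled_tests)


# Hypothetical label set: 29 placeholder tests plus the one that was split.
labelled = [f"/css/css-cascade/placeholder-{i}.html" for i in range(29)]
labelled.append("/css/css-cascade/all-prop-revert-layer-noop.html")

# Yesterday: every labelled test has a fully passing result.
yesterday = {path: 1.0 for path in labelled}

# Today: the split test's old path no longer appears in the run.
today = {path: 1.0 for path in labelled[:-1]}

print(round(focus_area_score(labelled, yesterday), 1))  # 100.0
print(round(focus_area_score(labelled, today), 1))      # 96.7
```

Under these assumptions, one missing result out of 30 labelled tests gives 96.7, which lines up with the Chrome and Safari numbers in the report, though the real label count is not confirmed here.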

@DanielRyanSmith
Contributor

> Ohhh, @DanielRyanSmith: this is because all-prop-revert-layer-noop.html is labelled, and thus considered part of the total number of tests, but we don't have any results for it because it's been split into variants.

I think you're right. This is a comparison of 3 recent runs today as opposed to 3 yesterday, filtering for the interop-2022-cascade label.

Is this something that requires changing the wpt-metadata to remove the old labeled tests? This is a new scenario that I don't think we've prepared for, and just changing the test name/path could change the historical data overall, as they're listed in older runs with a different path.
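
One way to spot this situation might be to list labelled paths that have no result in any recent run. The sketch below assumes each run is available as a mapping from test ID to result and that variant results are keyed by path plus query string, so they never match the bare labelled path. The function name, the second test path, and the variant query string are hypothetical.

```python
def labelled_tests_without_results(labelled_tests, run_results):
    """Return labelled test paths that are missing from every given run.

    `run_results` is a list of dicts, one per run, each mapping a test ID
    to its result. A variant's ID includes its query string, so it does
    not match the bare path used in the label.
    """
    seen = set()
    for results in run_results:
        seen.update(results)  # collect every test ID present in any run
    return sorted(path for path in labelled_tests if path not in seen)


# Hypothetical labels and a single hypothetical run.
labelled = [
    "/css/css-cascade/all-prop-revert-layer-noop.html",
    "/css/css-cascade/placeholder.html",
]
runs = [
    {
        "/css/css-cascade/all-prop-revert-layer-noop.html?variant-1": "PASS",
        "/css/css-cascade/placeholder.html": "PASS",
    }
]

print(labelled_tests_without_results(labelled, runs))
```

Any path this reports is a candidate for relabelling in wpt-metadata, since it is still counted in the total but can no longer contribute a result.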

@gsnedders
Member

gsnedders commented Feb 9, 2023

> Is this something that requires changing the wpt-metadata to remove the old labeled tests? This is a new scenario that I don't think we've prepared for, and just changing the test name/path could change the historical data overall, as they're listed in older runs with a different path.

Yeah, this has been the problem any time anything like this happens, which is always unfortunate. This was probably most apparent in 2021, when the tests were being revised pretty much throughout the year.

I've filed web-platform-tests/wpt-metadata#3769 to change the metadata, and filed web-platform-tests/results-analysis#149 to try and make this easier to debug.

Note that a change like this splitting a test into variants also would impact the score if it weren't 100%, because it changes the total number of tests we're scoring against. See #256 for some discussion about similar variant scoring.
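
The effect on the denominator can be sketched as follows, assuming again that the score is the mean per-test pass rate and that each variant is scored as its own test. The test counts and pass rates are made up for illustration; only the idea of one test becoming several comes from the discussion above.

```python
def score(pass_rates):
    """Mean of per-test pass rates, as a percentage."""
    return 100.0 * sum(pass_rates) / len(pass_rates)


# Hypothetical focus area: 9 fully passing tests and 1 fully failing test.
before = [1.0] * 9 + [0.0]

# Case 1: the failing test is split into 8 variants, all still failing.
failing_split = [1.0] * 9 + [0.0] * 8

# Case 2: one of the passing tests is split into 8 passing variants.
passing_split = [1.0] * 8 + [1.0] * 8 + [0.0]

# Case 3: everything passes, so the split leaves the score unchanged.
all_passing_split = [1.0] * 9 + [1.0] * 8

print(round(score(before), 1))             # 90.0
print(round(score(failing_split), 1))      # 52.9
print(round(score(passing_split), 1))      # 94.1
print(round(score(all_passing_split), 1))  # 100.0
```

The browser's behaviour is identical in every case; only the number of tests being scored against changes, which is why the split matters whenever the score is below 100%.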

@foolip
Member

foolip commented Feb 10, 2023

I've filed #284 about the generic problem.

@nt1m unlabeled the test in web-platform-tests/wpt-metadata#3750 which should restore the Cascade Layers scores to 100 where they were before.

Leaving this issue open to decide what we do about the split test, which is now 8 individual tests. I'll comment on #256 about what I think makes sense in general.

@nt1m
Member

nt1m commented Feb 10, 2023

> @nt1m unlabeled the test in web-platform-tests/wpt-metadata#3750 which should restore the Cascade Layers scores to 100 where they were before.

I also merged Sam's PR, web-platform-tests/wpt-metadata#3769, which relabeled the tests. The change that unlabeled the test was only temporary, made so that the WPT manifest PR could be merged.

@foolip
Member

foolip commented Feb 16, 2023

OK, so the scores are now back to 100 for both 2022 and 2023 dashboards. But historical scores from before the new tests existed will be lower. We should work this out in #283 + #286.

@foolip foolip closed this as completed Feb 16, 2023
@foolip
Member

foolip commented Feb 16, 2023

And #284 actually, that's the issue directly about this problem.

6 participants