Log stats about test runs somewhere #42

evansd · 2022-04-06T17:37:03Z

This is still a bit vague and handwavy but noting it down following discussion with Seb around identifying and debugging projects which run into, or are likely to run into, performance issues.

The idea is that this action can log stats about project test runs to some central location where we can do something with them.

The sort of things that could be logged are:

How many actions in the pipeline?
How long do they take to run?
What's the total size of the output files?
How many columns in the study definition? (Needs this issue closing first)
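To make the first three items concrete, here is a minimal Python sketch of what one stats record might look like. It assumes the pipeline config has already been parsed into a dict with a top-level `actions` mapping, that output files sit under one directory, and that per-action timings are available from somewhere. The function names, field names, and the `action_durations` argument are all hypothetical, not part of any existing tool.

```python
from pathlib import Path


def total_output_bytes(output_dir):
    """Sum the sizes of all regular files under output_dir (recursively)."""
    return sum(p.stat().st_size for p in Path(output_dir).rglob("*") if p.is_file())


def collect_run_stats(project, output_dir, action_durations):
    """Build a flat stats record for one test run.

    project: parsed pipeline config (assumed to have an "actions" mapping)
    action_durations: hypothetical mapping of action name -> seconds taken
    """
    return {
        "action_count": len(project.get("actions", {})),
        "total_duration_seconds": sum(action_durations.values()),
        # Name of the slowest action, or None if no timings were recorded
        "slowest_action": max(action_durations, key=action_durations.get, default=None),
        "output_bytes": total_output_bytes(output_dir),
    }
```

A flat dict like this is easy to serialise as JSON, whichever backend ends up receiving it.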

Possibly we can push these straight into Honeycomb and avoid having to build any backend infrastructure for this. If we can't do this directly, maybe we can have an endpoint on job-server which acts as a proxy so that we still don't need to worry about storing or analysing the data.
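A rough sketch of the "push straight into Honeycomb" option, assuming Honeycomb's Events API (a JSON `POST` to `/1/events/<dataset>` authenticated with an `X-Honeycomb-Team` header). The dataset name, the API key handling, and the stats fields are placeholders. The function only builds the request, so nothing is sent until it is passed to `urllib.request.urlopen`.

```python
import json
import urllib.parse
import urllib.request

HONEYCOMB_EVENTS_URL = "https://api.honeycomb.io/1/events/"


def build_honeycomb_request(stats, dataset, api_key):
    """Build (but do not send) a POST request carrying one event.

    stats: flat dict of fields for the event (hypothetical field names)
    dataset: Honeycomb dataset name (placeholder)
    api_key: Honeycomb API key, e.g. read from a CI secret
    """
    return urllib.request.Request(
        HONEYCOMB_EVENTS_URL + urllib.parse.quote(dataset, safe=""),
        data=json.dumps(stats).encode("utf-8"),
        headers={"X-Honeycomb-Team": api_key, "Content-Type": "application/json"},
        method="POST",
    )
```

If a job-server proxy turned out to be needed instead, only the URL and the auth header would have to change; the event payload could stay the same.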

sebbacon · 2022-04-07T11:24:17Z

Quick clarification: when Dave says "this action" above, he's referring to GitHub Actions. The thought was that, although we'd ideally be running this in production too, it would be very instructive to push stats generated in CI to Honeycomb in the first instance. The same process could record whether the tests finished with success or failure, and help us detect patterns such as users who are not using CI as intended.

Thinking aloud, the first thing would be a wrapper script that parses log files for interesting info (see opensafely-core/cohort-extractor#777), and turns those into structured data, which includes the job id, info from the manifest, etc.

This could then be run by hand in production, to get data about the most expensive (in terms of time, at least) variables, for example.

The same script would be run as part of our CI run, and its output posted somewhere else.
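As a very rough sketch of the wrapper script idea: the code below assumes a hypothetical log format in which interesting lines carry `key=value` pairs. The real cohort-extractor log structure is not shown in this thread, so the regex, the field names, and the `job_id` handling are all assumptions.

```python
import re

# Hypothetical format: free text followed by key=value pairs, e.g.
# "2022-04-07 11:00:01 finished query variable=age execution_time_secs=1.25"
PAIR_RE = re.compile(r"(\w+)=(\S+)")


def parse_log_line(line):
    """Return the key=value pairs on a line as a dict, or None if there are none."""
    pairs = dict(PAIR_RE.findall(line))
    return pairs or None


def parse_log(lines, job_id):
    """Turn raw log lines into structured events, each tagged with the job id."""
    events = []
    for line in lines:
        fields = parse_log_line(line)
        if fields is not None:
            # Values are kept as strings; callers convert what they need.
            events.append({"job_id": job_id, **fields})
    return events
```

Run by hand, the resulting list could be sorted on a timing field to find the most expensive variables; in CI, each event could be posted onwards as it stands.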

Honeycomb seems like a good starting point, but we need to find out how to use its API and obtain the correct security tokens (@madwort will be able to help here)

rebkwok · 2022-06-29T10:28:07Z

Honeycomb is currently not receiving any logs from test runs. This is because the structure of the logs that cohort-extractor outputs has changed, and the opensafely-cli script that extracts them needs to be updated.

Note that the script of the same name in stats-logs-notebooks extracts logs with the latest structure. It's designed to work with the collated logs on the server rather than the logs generated by a job, but otherwise does the same log parsing.

madwort · 2022-06-29T15:44:44Z

I think something was implemented, which has subsequently failed. IMHO we can close this ticket in favour of #93 and/or fresh tickets about future work/goals.

lucyb mentioned this issue Apr 7, 2022

Load test the options for retrieving counts over time opensafely-core/interactive.opensafely.org#1

Closed

rebkwok self-assigned this Apr 25, 2022

benbc added this to Data Team Jun 22, 2022

benbc moved this to Under Review in Data Team Jun 22, 2022

iaindillingham moved this from Under Review to Blocked in Data Team Jun 27, 2022

inglesp moved this from Blocked to Next in Data Team Jun 28, 2022

madwort mentioned this issue Jun 29, 2022

Fix broken honeytail upload #93

Open

madwort closed this as completed Jun 30, 2022

Repository owner moved this from Next to Done in Data Team Jun 30, 2022
