Rewrite to add more metrics and be more modular #7

Merged 1 commit into comunica:master on Apr 30, 2024
Conversation

surilindur
Contributor

Here is a set of changes to rewrite the runner, with the following major differences:

  • The input and output file handling, as well as the result aggregation, have been split into more modular components (a rough wiring sketch follows after this list):
    • QueryLoaderFile handles the loading of queries from a file.
    • ResultAggregator handles the result aggregation, and ResultAggregatorComunica handles additional Comunica-specific metadata (currently only the HTTP request counts).
    • ResultSerializerCsv handles the serialization of results into a CSV file. The new structure allows adding JSON and other serializers as needed, in case the CSV format is insufficient (for example, for dumping an entire error object, or other items that cannot be reduced to a simple value).
    • SparqlBenchmarkRunner now only handles running the queries and recording the metrics, and calls the result aggregator (passed to it as a config option) to aggregate the results.
  • The endpoint availability check now uses a simple HEAD request, which seems to work fine. This is more lightweight for the endpoint to handle, and avoids unnecessary processing.
  • The runner now records the following additional metrics to help provide a better view of the variance between replications:
    • Minimum, maximum and average query duration.
    • Minimum, maximum and average result count.
    • Minimum, maximum and average HTTP request count (for Comunica).
    • Minimum, maximum and average result arrival timestamps.
    • A bindings hash for the results, to ensure the same query produces exactly the same results across all replications. When the result hash differs between executions, the runner marks the query as failed and records a corresponding error (unless another error has already been reported).
  • The metrics recorded by the runner can no longer be toggled off, to keep code complexity low and to ensure all the relevant metrics are available.
  • The runner now records the thrown error as the error object, and serializes the error message in the CSV file by default, to help identify exactly what went wrong when a specific query fails.
  • The runner now always waits for the endpoint to become available between executions, in case the endpoint crashes and needs some time to recover between subsequent queries. This ensures that a query does not suffer because a previous one crashed the endpoint.
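
To make the new structure more concrete, the sketch below shows one way the pieces could be wired together from JavaScript/TypeScript. The class names come from the list above, but the import path, constructor options, and method names are assumptions made for illustration rather than the actual API.

// Illustrative wiring only: the class names come from this pull request's
// description, but the option names and method signatures are assumptions.
import {
  QueryLoaderFile,
  ResultAggregatorComunica,
  ResultSerializerCsv,
  SparqlBenchmarkRunner,
} from 'sparql-benchmark-runner';

async function runBenchmark(): Promise<void> {
  // Load the queries to benchmark from files on disk.
  const queryLoader = new QueryLoaderFile({ path: 'queries/' });

  // Aggregate the per-replication metrics; the Comunica variant additionally
  // picks up the HTTP request counts exposed as metadata on the bindings stream.
  const resultAggregator = new ResultAggregatorComunica();

  // Serialize the aggregated results into a CSV file; other serializers
  // (for example JSON) could be added alongside this one.
  const resultSerializer = new ResultSerializerCsv();

  // The runner only executes the queries and records the metrics,
  // delegating aggregation to the aggregator passed in as a config option.
  const runner = new SparqlBenchmarkRunner({
    endpointUrl: 'http://localhost:3000/sparql',
    queryLoader,
    resultAggregator,
    replication: 5,
    warmup: 1,
  });

  const results = await runner.run();
  await resultSerializer.serialize('output/results.csv', results);
}

runBenchmark().catch(error => console.error(error));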

@surilindur mentioned this pull request on Apr 3, 2024
@surilindur
Contributor Author

This is based on the changes in #6 and will probably need to be updated after that one is merged.

await this.sleep(1_000);
this.log(`\rEndpoint not available yet, waited for ${++counter} seconds...`);
Member

Is there a particular reason this was removed?
We had it there to indicate to the user that the script is still running, and hasn't frozen.

Contributor Author

The only reason was that I added timestamps to the default CLI logger, and the \r was deleting them. Without the \r, there would be a lot of lines printed if the endpoint took a long while to become available. I will add back this line, though, because it does make sense to let the user know that the tool is not frozen. Thank you for the feedback!
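
For illustration, a waiting loop without the \r could look roughly like this self-contained sketch. The sleep and log helpers stand in for the runner's own ones used in the snippet above, and the HEAD request mirrors the availability check described in this pull request; none of this is the actual implementation.

// Hypothetical sketch, not the actual implementation: sleep and log stand in
// for the runner's helpers, and the HEAD request mirrors the new availability check.
async function waitForEndpoint(url: string): Promise<void> {
  const sleep = (ms: number): Promise<void> => new Promise(resolve => setTimeout(resolve, ms));
  const log = (message: string): void => console.log(`[${new Date().toISOString()}] ${message}`);
  let counter = 0;
  for (;;) {
    try {
      // A lightweight HEAD request avoids unnecessary query processing on the endpoint.
      const response = await fetch(url, { method: 'HEAD' });
      if (response.ok) {
        return;
      }
    } catch {
      // Endpoint not reachable yet; fall through, wait, and retry.
    }
    await sleep(1_000);
    // Every second prints a new timestamped line, so the timestamps stay intact
    // and the user can still see that the tool has not frozen.
    log(`Endpoint not available yet, waited for ${++counter} seconds...`);
  }
}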

Contributor Author

This logging has been added back now.

@rubensworks (Member) left a comment

Looks very neat!

Could you also update the README to reflect the changes?
We may also need a small section on how to use it from JS, as things may seem a bit less straightforward now.

Note to self: major update

@coveralls

coveralls commented Apr 25, 2024

Pull Request Test Coverage Report for Build 8893996937

  • 225 of 225 (100.0%) changed or added relevant lines in 6 files are covered.
  • No unchanged relevant lines lost coverage.
  • Overall coverage remained the same at 100.0%

Totals coverage status:
  • Change from base Build 8847154501: 0.0%
  • Covered Lines: 241
  • Relevant Lines: 241

💛 - Coveralls

@surilindur
Contributor Author

Could you also update the README to reflect the changes?

There is now a short note in the README about using the tool as a library. I did not add a full example of the output, because it would be a lot of text, and the current examples use WatDiv whereas I have been testing with SolidBench.

@rubensworks
Member

I did not add a full example of the output, because it would be a lot of test and the current examples use WatDiv and not SolidBench, and I have been testing with SolidBench.

But is the current CSV example output still valid? Based on what I understand, we now output more metrics.
Or is the example still the default output (which is definitely ok by me, as it means my old script will still work)?

@surilindur
Contributor Author

It is valid in the sense that everything in the example is still produced, but it is true that it does not contain the additional columns that are also output. Additionally, the column ordering can differ from how it used to be: name and id are still first, but everything after them is now sorted alphabetically.

I will look into updating the example, and also adding the option to configure a delay between subsequent requests sent to the endpoint if someone has issues with too many requests being sent in rapid succession.
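
The ordering rule itself could be expressed roughly like this (a sketch of the idea only, with hypothetical column names; not the serializer's actual code):

// Illustrative sketch of the column ordering rule described above:
// keep name and id first, then sort all remaining columns alphabetically.
function orderColumns(columns: string[]): string[] {
  const fixed = ['name', 'id'];
  const rest = columns.filter(column => !fixed.includes(column)).sort();
  return [...fixed.filter(column => columns.includes(column)), ...rest];
}

// Example with hypothetical column names:
// orderColumns(['results', 'id', 'error', 'name', 'duration'])
//   -> ['name', 'id', 'duration', 'error', 'results']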

@rubensworks
Member

I will look into updating the example, and also adding the option to configure a delay between subsequent requests sent to the endpoint if someone has issues with too many requests being sent in rapid succession.

Sounds good!
The README doesn't have to contain actually produced data, though; as long as the column names and order correspond, it's fine by me.

@surilindur
Contributor Author

This should be ready for another review now!

@surilindur requested a review from rubensworks on April 30, 2024 at 11:33
@rubensworks merged commit 8fd9b18 into comunica:master on Apr 30, 2024
6 checks passed
@rubensworks
Member

Thanks! Released as 3.0.0.

@surilindur deleted the feat/modular-rewrite branch on April 30, 2024 at 12:04
@rubensworks
Member

@surilindur I just had another look at the current README, and it looks like there's no httpRequests column anymore in the example.
I guess it's still measured, but not reported by default?
Could you look into documenting in the README how to enable that?

@surilindur
Contributor Author

I guess it's still measured, but not reported by default?

It is actually both measured and reported by default. The metadata is always extracted from the bindings stream (regardless of the result aggregator), and when the Comunica-specific result aggregator is used (which differs from the base aggregator only in that it does the httpRequest aggregation), it will also be reported (httpRequests, httpRequestsMax, httpRequestsMin). The default aggregator is the Comunica one.

I left out the columns in the example because they are only reported for the Comunica link traversal SPARQL endpoint that provides the HTTP request count as metadata, but the tool could also be used for other endpoints that do not provide it (or that provide other metadata which would require their own aggregators extending the base one). So the example only contains what can be expected from any endpoint.

If it is confusing, I can edit the example output in the readme. I did not actually think about it beyond the reasoning above. 🙂
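
As an illustration of the extension point being discussed, an aggregator for endpoint-specific metadata could look roughly like the sketch below. Since the actual classes are not shown in this thread, the sketch uses minimal stand-ins: the base class approximates ResultAggregator, and the subclass shows the pattern that ResultAggregatorComunica presumably follows for the httpRequests columns; an aggregator for another endpoint's metadata would extend the base class the same way. The record fields and method names are assumptions for illustration only.

// Self-contained sketch: this ResultAggregator is a minimal stand-in for the
// base class introduced by the rewrite, not the real implementation, and the
// execution record fields and method names are assumptions.
interface IQueryExecution {
  duration: number;
  resultCount: number;
  metadata?: Record<string, number>;
}

class ResultAggregator {
  public aggregate(executions: IQueryExecution[]): Record<string, number> {
    const durations = executions.map(execution => execution.duration);
    return {
      duration: durations.reduce((sum, value) => sum + value, 0) / durations.length,
      durationMin: Math.min(...durations),
      durationMax: Math.max(...durations),
    };
  }
}

// Follows the pattern the Comunica aggregator is described to use for the
// httpRequests columns; a custom aggregator for other metadata keys would
// extend the base class in the same way.
class ResultAggregatorHttpRequests extends ResultAggregator {
  public aggregate(executions: IQueryExecution[]): Record<string, number> {
    const aggregated = super.aggregate(executions);
    const requests = executions.map(execution => execution.metadata?.httpRequests ?? 0);
    aggregated.httpRequests = requests.reduce((sum, value) => sum + value, 0) / requests.length;
    aggregated.httpRequestsMin = Math.min(...requests);
    aggregated.httpRequestsMax = Math.max(...requests);
    return aggregated;
  }
}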

@surilindur
Contributor Author

Maybe I should change the default aggregator to be the base one, and add a note in the readme on using different aggregators and creating custom ones? Would that be reasonable?

@rubensworks
Member

I left out the columns in the example because they are only reported for the Comunica link traversal SPARQL endpoint that provides the HTTP request count as metadata

I thought Comunica base always reports request count as well? (e.g. for TPF queries) But could be wrong.

Maybe I should change the default aggregator to be the base one, and add a note in the readme on using different aggregators and creating custom ones? Would that be reasonable?

Since we are the primary ones who will be using this tool, I think the comunica one as default is fine (including the requests columns in the README). But some docs on how to use a different one definitely make sense!

@surilindur
Contributor Author

I thought Comunica base always reports request count as well? (e.g. for TPF queries) But could be wrong.

Oh right, I forgot TPF! But for example Virtuoso or Jena probably do not provide the HTTP request count metadata, or Comunica when used with an HDT file. That was my reasoning. 😄

But some docs on how to use a different one definitely make sense!

I will look into adding some notes in the usage section, then!
