Rewrite to add more metrics and be more modular #7
Conversation
This is based on the changes in #6 and will probably need to be updated after that one is merged.

```ts
await this.sleep(1_000);
this.log(`\rEndpoint not available yet, waited for ${++counter} seconds...`);
```
Is there a particular reason this was removed?
We had it there to indicate to the user that the script is still running, and hasn't frozen.
The only reason was that I added timestamps to the default CLI logger, and the `\r` was deleting them. Without the `\r`, there would be a lot of lines printed if the endpoint took a long while to become available. I will add back this line, though, because it does make sense to let the user know that the tool is not frozen. Thank you for the feedback!
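For context, a minimal illustration of the trade-off (plain Node.js, not the project's logger; `counter` is just a local variable here):

```ts
let counter = 0;

// With \r, the cursor returns to the start of the line before writing,
// so any timestamp prefix already printed on that line is overwritten.
process.stdout.write(`\rEndpoint not available yet, waited for ${++counter} seconds...`);

// Without \r, each update becomes its own line, so a timestamped logger
// keeps its prefix, at the cost of one printed line per second of waiting.
console.log(`Endpoint not available yet, waited for ${++counter} seconds...`);
```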
This logging has been added back now.
Looks very neat!
Could you also update the README to reflect the changes?
We may also need a small section on how to use it from JS, as things may seem a bit less straightforward now.
Note to self: major update
Pull Request Test Coverage Report for Build 8893996937
💛 - Coveralls
There is now a short note about using the tool as a library in the readme. I did not add a full example of the output, because it would be a lot of text, and the current examples use WatDiv and not SolidBench, while I have been testing with SolidBench.
But is the current CSV example output still valid? Because based on what I understand, we now output more metrics?
It is valid in the sense that everything in the example is still produced, but it is true that it does not contain the additional columns that are also output. Additionally, the column ordering can differ from how it used to be: `name` and `id` will still be first, but everything after them is sorted alphabetically. I will look into updating the example, and also into adding an option to configure a delay between subsequent requests sent to the endpoint, in case someone has issues with too many requests being sent in rapid succession.
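As a rough sketch, the ordering rule could be expressed like this (a hypothetical helper, not the actual implementation; the non-fixed column names are made up for the example):

```ts
// Keep "name" and "id" first, then sort the remaining columns alphabetically.
function orderColumns(columns: string[]): string[] {
  const fixed = ['name', 'id'].filter((column) => columns.includes(column));
  const rest = columns.filter((column) => !fixed.includes(column)).sort();
  return [...fixed, ...rest];
}

// Returns ['name', 'id', 'duration', 'error', 'httpRequests', 'results']
orderColumns(['name', 'id', 'results', 'duration', 'httpRequests', 'error']);
```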
Sounds good!
This should be ready for another review now!
Thanks! Released as 3.0.0.
@surilindur I just had another look at the current README, and it looks like there's no mention of the HTTP request count being measured or reported.
It is actually both measured and reported by default. The metadata is always extracted from the bindings stream (regardless of the result aggregator), and when the Comunica-specific result aggregator is used (which differs from the base aggregator only in that it does the httpRequest aggregation), it will also be reported. I left out the columns in the example because they are only reported for the Comunica link traversal SPARQL endpoint that provides the HTTP request count as metadata, but the tool could also be used with other endpoints that do not provide it (or that provide other metadata, which would require their own aggregators extending the base one). So the example only contains what can be expected from any endpoint. If it is confusing, I can edit the example output in the readme. I did not actually think about it beyond the reasoning above. 🙂
Maybe I should change the default aggregator to be the base one, and add a note in the readme on using different aggregators and creating custom ones? Would that be reasonable?
I thought Comunica base always reports request count as well? (e.g. for TPF queries) But could be wrong.
Since we are the primary ones who will be using this tool, I think the Comunica one as default is fine (including the requests columns in the README). But some docs on how to use a different one definitely make sense!
Oh right, I forgot TPF! But for example Virtuoso or Jena probably do not provide the HTTP request count metadata, or Comunica when used with an HDT file. That was my reasoning. 😄
I will look into adding some notes in the usage section, then!
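As a sketch of what such a custom aggregator might look like (hypothetical: the base class interface is not shown in this thread, so the method signature, types, and the `cacheHits` metadata field are all assumptions):

```ts
import { ResultAggregator } from 'sparql-benchmark-runner';

// Hypothetical aggregator for an endpoint exposing a "cacheHits" metadata
// field, mirroring how the Comunica aggregator extends the base one for
// HTTP request counts.
export class ResultAggregatorCacheHits extends ResultAggregator {
  public aggregate(results: any[]): Record<string, any> {
    // Let the base aggregator handle the common metrics first.
    const aggregated: Record<string, any> = super.aggregate(results);
    // Then add the endpoint-specific metric on top.
    aggregated.cacheHits = results.reduce(
      (sum, result) => sum + (result.metadata?.cacheHits ?? 0),
      0,
    );
    return aggregated;
  }
}
```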
Here is a set of changes to rewrite the runner, with the following major differences (a rough usage sketch follows the list):

- `QueryLoaderFile` handles the loading of queries from a file.
- `ResultAggregator` handles the result aggregation, and `ResultAggregatorComunica` handles additional Comunica-specific metadata (currently only the HTTP request counts).
- `ResultSerializerCsv` handles the serialization of results into a CSV file. The new structure allows adding JSON and other serializers as needed, in case the CSV format is insufficient (for example, for dumping an entire error object, or other items that cannot be reduced to a simple value).
- `SparqlBenchmarkRunner` now only handles the running of queries and the recording of metrics, and calls the result aggregator (passed to it as a config option) to aggregate the results.
- The endpoint availability check is now done using a `HEAD` request, which seems to work fine. This is more lightweight for the endpoint to handle, and does not cause any unnecessary processing.
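For illustration, the pieces might be wired together from JavaScript roughly as follows. The class names come from the list above, but the constructor options, method names, and paths (`endpoint`, `loadQueries`, `run`, `serialize`, `./queries`, `./results.csv`) are assumptions rather than the confirmed API:

```ts
import {
  QueryLoaderFile,
  ResultAggregatorComunica,
  ResultSerializerCsv,
  SparqlBenchmarkRunner,
} from 'sparql-benchmark-runner';

// Hypothetical wiring; option and method names are assumptions.
const queryLoader = new QueryLoaderFile('./queries');
const resultAggregator = new ResultAggregatorComunica();
const resultSerializer = new ResultSerializerCsv();

const runner = new SparqlBenchmarkRunner({
  endpoint: 'http://localhost:3000/sparql',
  resultAggregator,
});

// Load the queries, run the benchmark, and serialize the aggregated results.
const queries = await queryLoader.loadQueries();
const results = await runner.run(queries);
await resultSerializer.serialize('./results.csv', results);
```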