
RFC: Add data-file output option #305

Closed
bbros-dev opened this issue Jul 5, 2021 · 6 comments

@bbros-dev
Contributor

bbros-dev commented Jul 5, 2021

Prompted by issue #303 and PR #304

Add a data-file output option (under Metrics)

Metrics:
.....
-D, --data-file NAME        Create a CSV-formatted data file of response times (nanoseconds)
    --data-fields f1,...fn  Comma-delimited list of fields.  Values: elapsed, co_elapsed, raw, co_adjusted
    --append                Append data to the data file NAME if it exists.  Create it if it does not.

This produces a data file of response times suitable for analysis such as that described in PR #304.

Example (data comes from README mitigation section):

datetime, sample, user, code, elapsed, co_elapsed, raw, co_adjusted
2021-04-05T14:30:01Z,1,2,200,185835,11965,11965,11965
2021-04-05T14:30:01Z,2,2,200,185835,11965,,8804
2021-04-05T14:30:01Z,3,2,200,185835,11965,,5643
2021-04-05T14:30:01Z,4,2,200,185835,11965,,2482
....
....
....

where all data files contain the fields:

  • datetime: ISO-8601 date-time stamp of the start of the Goose run. Used to distinguish between Goose runs; not used to calculate response times or for debugging.
  • sample: integer, row counter
  • user: The Goose user ID (integer)
  • code: The HTTP response code

Data files include additional fields, passed as a comma-delimited list via the --data-fields option:

  • raw: integer, raw response time (nanoseconds)
  • co_adjusted: integer, CO-adjusted response time (nanoseconds)
  • elapsed
  • co_elapsed
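
For illustration, an invocation combining the proposed options might look like the sketch below. None of these flags exist yet (they are the ones proposed above), and the host is a placeholder; since a Goose loadtest is a Rust binary, the options would be passed through cargo:

# hypothetical invocation of the proposed options from a loadtest binary built with Goose
cargo run --release -- --host http://example.com -D responses.csv --data-fields raw,co_adjusted --append
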
@jeremyandrews
Member

Please see the --requests-file, which is intended to provide request-related logs. It can be written in multiple formats, including csv, json, and raw (i.e., Rust's debug display of a struct). When I only want a subset of the information, I generally create a json log and then use jq to grab the specific columns I'm interested in.
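
For example, assuming the request log is written in JSON format with one entry per line, a subset of columns can be pulled into CSV with a jq one-liner along these lines; the field names used here (elapsed and coordinated_omission_elapsed) are assumptions about the log's schema:

# extract two columns from a JSON-lines request log into CSV (field names assumed)
jq -r '[.elapsed, .coordinated_omission_elapsed] | @csv' requests.log > subset.csv
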

A datetime field would need to be added, PR welcome.

We don't currently count the rows; I'm not sure what it adds or why it's necessary. As there are lots of threads making requests at the same time, we'd have to "fake" it with a counter.

Finally, the individual response times are not adjusted, so the last two rows aren't an option (and this is the same in HdrHistogram). If you review the mitigation section of the README, you can see an example of what shows up in the Requests log. Specifically, if the coordinated_omission_elapsed column is non-zero, this metric was generated through coordinated omission mitigation.

@jeremyandrews
Member

BTW: It's also worth noting that I'm reworking the logger logic in this issue which should get merged soon:
#302

@bbros-dev
Contributor Author

Please see the --requests-file

Yes, the requests file is a reasonable workaround.
The data file is targeted at analysis use cases, for questions such as 'Is this latency sample distribution different from a prior run?'

We don't currently count the rows, I'm not sure what it adds

It adds two things:

  • It aids when truncating k samples at a common row count. I've added an --append option, which will help produce a single (stacked) data file for analysis (a sketch of this follows the list below).
  • It's a ready-made index that, combined with the date-time, should be unique in most cases.
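
A minimal sketch of that stacking and truncation with standard tools, using hypothetical per-run files run1.csv, run2.csv, run3.csv; this is the manual workflow the --append option and the sample counter are meant to simplify:

# keep the header once, then the first 1000 data rows of each run, stacked into one file
head -n 1 run1.csv > stacked.csv
for f in run1.csv run2.csv run3.csv; do
  tail -n +2 "$f" | head -n 1000 >> stacked.csv
done
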

If you review the mitigation section of the README...

Apologies for seeming obtuse, but I found it very confusing. I'll add comments in the relevant PR.

@jeremyandrews
Member

I'd consider a PR that makes it configurable which fields from a given entity are logged, but for now I think it's enough to just be sure that all necessary fields are included. I don't want to get too bogged down in this just yet when tools already exist that can easily map the --requests-file to what you need.

Due to the way GooseUsers each run in their own thread, I don't think you're going to find sufficient consistency simply by numbering all requests in the order made: this will inherently get shuffled a little each time you run (not due to Goose, which works very hard to ensure each test is sequenced the same, but due to whatever Goose is interacting with and the network in between).

Instead, perhaps it would work better to filter by a given GooseUser (they each have an integer ID, starting at 0 and incrementing by 1) and then to compare the elapsed times for requests made by that thread. You could potentially include a per-GooseUser counter to aid in comparisons.
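
As a sketch of that per-user filtering against a JSON-lines request log, assuming the log records the GooseUser ID in a user field and the response time in an elapsed field (both names are assumptions):

# response times for GooseUser 0 only, in the order the requests were logged
jq -r 'select(.user == 0) | .elapsed' requests.log
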

Apologies for seeming obtuse, but I found it very confusing. I'll add comments in the relevant PR.

Not at all: it's new documentation and a work in progress. PRs to improve readability are welcome.

@bbros-dev
Contributor Author

bbros-dev commented Jul 7, 2021

I don't want to get too bogged down in this just yet when tools already exist that can easily map the --requests-file to what you need.

Agreed, the requests file is an existing workaround. This is aimed at making it convenient to generate data ready for analysis.

I've amended the proposal to reflect this. One thought was to alter the RFC to be a generic solution:

--analysis-file NAME          Create a CSV-formatted data file for analysis
--analysis-fields F1,F2,..FN  Comma-delimited field names to output to the analysis file.  Field names: <TBD>

I don't think you're going to find sufficient consistency simply numbering all requests in the order made

Not a problem. Generally people seem interested in the distributional properties, e.g. the tail, 99th percentile, etc. For distributional purposes the order in which requests are made is not relevant, hence the absence of a request time stamp; the datetime field is a 'run' identifier.
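
For instance, with a data file laid out as in the example earlier in this issue (co_adjusted is the 8th column), an approximate 99th percentile could be read off with standard tools (the file name is illustrative):

# approximate 99th percentile of the co_adjusted column, skipping the header row
tail -n +2 responses.csv | cut -d, -f8 | sort -n | awk '{ v[NR] = $1 } END { print v[int(NR * 0.99)] }'
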

While time-series properties are, IMO, of interest, e.g. whether slow/fast responses are clustered (tests for heteroskedasticity are well established), they are not at this stage the subject of this RFC.
That is for the reason you cite: it's not yet clear to us that Goose/Gaggle can generate that data.

Though, as we mentioned elsewhere, it is not clear to us these systems are always ergodic - perhaps that is why we generally see no sophisticated analysis of this type?

Instead, perhaps ...

Hopefully all that analysis is possible given the updated RFC.

PRs to improve readability are welcome.

OK, we'll open an RFC to work out the direction.

@bbros-dev
Contributor Author

bbros-dev commented Jul 10, 2021

We won't be able to pursue this beyond the documentation contribution which has landed.
