Introduce vlaudit-stats #9

adeason · 2020-12-10T20:30:27Z

Introduce a new tool, called vlaudit-stats, which can generate vldb
access statistics from a vlserver audit log (currently, only the
'pipe' audit interface format). Typically, a daemon process is run via
'vlaudit-stats daemon', and the stats are collected via 'vlaudit-stats
stats-get'.

adeason

this is a bit of a "first pass", though I think functionally it should be complete. this is missing proper end-user documentation (but I'm not sure how much that's needed?) and should probably have some more comments and docstrings. the tests don't cover everything, but they should cover a decent amount of functionality.

this also of course requires that the vlserver actually generates the needed audit messages. adding those to openafs is in openafs gerrit 14467, so this can't be used with any existing openafs release yet.

the performance of this has been a concern in the back of my mind... when using --bench with a big artificial audit log, I get about the following (on server-class hardware, a proliant of some kind):

rh7 in a vm: 230k audit messages per second
debian 10 bare metal: 320k
debian 10 bare metal, pypy: 630k

and on my old laptop I get more like 110k. that seems like it should be good enough given the current performance capabilities of the actual vlserver.

adeason · 2020-12-10T20:33:00Z

admin/vlaudit-stats/vlaudit-stats

+        warn("msgid went backwards: %d -> %d (audit tstamp %s -> %s)" % (
+             from_id, to_id, from_ts, to_ts))
+
+    def msgid_gap(self, n_miss, from_ts, to_ts):


I wasn't sure if message "gaps" should even be recorded in the stats, or if we should just log them. they should be rare, and it might be easier to just detect them by looking in the logs
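for reference, here is a rough sketch of the kind of check being discussed. it is not the code in this PR: `MsgidTracker`, `observe`, and the counter names are hypothetical, and it assumes msgids normally increase by exactly 1 per audit message. it counts gaps and backwards jumps in the stats rather than only logging them, which is one of the two options above.

```python
class MsgidTracker:
    """Hypothetical helper: classify each msgid relative to the previous one.

    Assumes msgids normally increase by exactly 1 per audit message.
    """

    def __init__(self):
        self.last_id = None
        self.last_ts = None
        self.n_gaps = 0       # number of distinct gaps seen
        self.n_missed = 0     # total messages missing across all gaps
        self.n_backwards = 0  # times the msgid failed to advance

    def observe(self, msgid, ts):
        """Record one message; return 'ok', 'gap', or 'backwards'."""
        result = 'ok'
        if self.last_id is not None:
            if msgid <= self.last_id:
                # Equal or lower: the id did not advance. Both cases are
                # counted as "backwards" in this sketch.
                self.n_backwards += 1
                result = 'backwards'
            elif msgid > self.last_id + 1:
                # One or more messages were skipped.
                self.n_gaps += 1
                self.n_missed += msgid - self.last_id - 1
                result = 'gap'
        self.last_id = msgid
        self.last_ts = ts
        return result


tracker = MsgidTracker()
results = [tracker.observe(i, None) for i in (1, 2, 5, 3, 4)]
```

keeping the counters costs almost nothing per message, so the choice between stats and logs is mostly about where an operator would look first.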

adeason

note that 14467 has changed the format of the relevant audit line. I'm trying to -1 this PR to flag that this needs a change, but github seems to not want to let me (maybe I can't -1 my own PR? ugh)

adeason · 2020-12-14T23:03:46Z

admin/vlaudit-stats/vlaudit-stats

+    # We don't really distinguish between GetEntryByID and GetEntryByName
+    # requests in here. If we wanted to in the future, just check for
+    # reqvol.isdigit() to see if it's a numeric request.
+    getent_pat = re.compile(r'^\[\d+\] ... ... \d\d \d\d:\d\d:\d\d \d{4} EVENT AFS_VL_GetEnt CODE ([0-9]+) NAME .* HOST ([^ ]+) (STR|LONG) (.*) LONG ([0-9]+) STR (.*) $')


openafs gerrit 14467 has changed the relevant audit names here. I have this fixed locally, but I won't push the change here yet, in case 14467 changes any more, or if this PR yields any additional comments in the meantime :)
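to illustrate the reqvol.isdigit() idea from the code comment, here is a minimal sketch using the pattern exactly as quoted in the diff above (so it predates the 14467 format change). the sample lines are made up to fit that pattern and are not real vlserver output; `parse_getent`, the field names, and the guess that the fourth capture group is the requested volume are all assumptions for illustration.

```python
import re

# Pattern copied from the diff above. It reflects the audit format before
# the change in openafs gerrit 14467, so treat it as illustrative only.
getent_pat = re.compile(r'^\[\d+\] ... ... \d\d \d\d:\d\d:\d\d \d{4} EVENT AFS_VL_GetEnt CODE ([0-9]+) NAME .* HOST ([^ ]+) (STR|LONG) (.*) LONG ([0-9]+) STR (.*) $')


def parse_getent(line):
    """Return a dict of captured fields, or None if the line does not match.

    Assumption: capture group 4 is the requested volume ("reqvol"), which
    is numeric for by-ID requests and a name otherwise.
    """
    m = getent_pat.match(line)
    if m is None:
        return None
    code, host, reqtype, reqvol, _long_field, _str_field = m.groups()
    return {
        'code': int(code),
        'host': host,
        'reqtype': reqtype,
        'reqvol': reqvol,
        'by_id': reqvol.isdigit(),
    }


# Made-up sample lines; note the trailing space that the pattern requires.
by_name = ('[42] Thu Dec 10 20:30:27 2020 EVENT AFS_VL_GetEnt CODE 0 '
           'NAME someuser HOST 192.0.2.10 STR root.cell '
           'LONG 536870912 STR root.cell ')
by_id = ('[43] Thu Dec 10 20:30:28 2020 EVENT AFS_VL_GetEnt CODE 0 '
         'NAME someuser HOST 192.0.2.10 LONG 536870912 '
         'LONG 536870912 STR root.cell ')

name_req = parse_getent(by_name)
id_req = parse_getent(by_id)
```

a volume whose name is all digits would be misclassified as a by-ID request, so checking the STR/LONG type marker (group 3) may be a more robust way to make the distinction if it is ever needed.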
