Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Ban Dirty Save performance is very bad #5

Open
Lazersmoke opened this issue Feb 26, 2018 · 15 comments
Open

Ban Dirty Save performance is very bad #5

Lazersmoke opened this issue Feb 26, 2018 · 15 comments

Comments

@Lazersmoke
Copy link

When using /bansave or shutting down the server, BanStick hangs on Ban Dirty Save for a very long time.

It appears to not batch properly, as in it saves each dirty ban one at a time in its own SQL query, since SQL show processlist always reports one or two queries running for 0ms. This is probably because dirtyBans is being populated by one of the scrapers at the same time as it is being consumed by the saveDirty method, creating a race condition between adding the next ban from the scraper and saving the next ban. If the saving wins,, it finishes the loop and exits. If the scraper wins, it continues the loop, which is the observed behavior.

@ProgrammerDan
Copy link

ProgrammerDan commented Feb 26, 2018

referring to here: https://github.com/DevotedMC/BanStick/blob/master/src/main/java/com/programmerdan/minecraft/banstick/data/BSBan.java#L119
?

as called by this:

https://github.com/DevotedMC/BanStick/blob/master/src/main/java/com/programmerdan/minecraft/banstick/handler/BanStickDatabaseHandler.java#L333

each dirty ban should not be saved individually. It creates a batches of 100 if there are any left to save, otherwise ends quickly...

I can't match "scraper" notes here to actual code, as dirty save has no direct relationship at all to scrapers -- they simply share a queue, which is non-blocking? They aren't even running on the same thread, so unless it's being powered by potatoes this should be good.

Toss me some logs?

@Lazersmoke
Copy link
Author

Lazersmoke commented Feb 26, 2018

This adds more bans to the queue:

public void setBanEndTime(Timestamp banEnd) {
this.banEnd = banEnd;
this.dirty = true;
dirtyBans.offer(new WeakReference<BSBan>(this));
}

It is called here: 1 2

I think (but not sure) that this causes this loop to run many times. It seems like the scrapers will put a handful of bans in the queue, then the saver saves those in a small batch, then the scraper adds more in while that batch is executing, then the saver tries to save these new ones.

Toss me some logs?

What logs do you need, and how can I get them? The issue is that it hangs on Ban dirty save, which produces no logs.

Also worth noting this happens sporadically, not every time.

@ProgrammerDan
Copy link

Are you observing

Ban batch: 1 saves

repeatedly in the logs? Otherwise I wonder if this is a suboptimal database -- each save is indeed its own line, but is sent to the database in a batch -- e.g. the SOP for batching of UPDATES... no other way to really do it at scale...

At the time you see Ban Dirty Save the scrapers should have long since shut down, so unless your queue is already absurdly large at that point, there is no good reason for a slowdown.

CivSpy used to give me trouble but never banstick :-/

@Lazersmoke
Copy link
Author

bansticklog

@Lazersmoke
Copy link
Author

Nothing about the batch size in logs... Is there a way to turn that on?

@ProgrammerDan
Copy link

Ok. Prepare to get blasted but, turn on debug: true in the config. Then, next time this happens, send me the log of that same stretch. Probably just a ton of 100 element batches if the scrapers had generated a great volume of items.

Make sure that your database isn't IO bound at some point -- if binary logging is turned on, I strongly recommend turning it off unless you're master/slaving...

@Lazersmoke
Copy link
Author

Is that in BanStick or Spigot config?

@ProgrammerDan
Copy link

Banstick. Root option, based on ACivMod iirc

@Lazersmoke
Copy link
Author

Looks like you were right, it runs in batches of 100, and each one takes a couple seconds. Does it really need to save this many bans, though? It makes the server freeze for ~30 seconds when shutting down, which makes players timeout instead of disconnecting normally.

banstickverboselog

@ProgrammerDan
Copy link

Does it need to? No, turn off some scrapers. These are exclusively data updates to existing bans, so something is telling banstick that there is new information to be saved. We could spend some time tracing stuff out and try to figure out if its updating things it doesn't need to, but at this point my question is -- why does it take 3 seconds for your dbms to run 100 updates? I wonder if there is something could be done to make that more efficient -- perhaps we are missing an index somewhere

@Lazersmoke
Copy link
Author

This might be the root cause:

mainproblem

Every record has its own ban message :/

@ProgrammerDan
Copy link

ProgrammerDan commented Feb 26, 2018

If it seems like the TOR tracking is out of hand, might be worthwhile to just disable it. I can count on one hand how many bans from TOR I issued...

@ProgrammerDan
Copy link

The only thought I had is, I suspect that only the end date is changing, but it's sending the message in the update package... that could cause it to go a bit slower, even though, 30ms per update is shockingly slow, even with a tiny TEXT message.

@Lazersmoke
Copy link
Author

This is on Teal's test server, so it a) is probably a potato and b) doesn't really need BanStick at all, except for consistency with the real server. This isn't really an issue on live except for the freeze when shutting down, so I'll just disable most of the TOR/scraper stuff on the test server. :)

@ProgrammerDan
Copy link

Ah! Ok... yes, on low-end servers I could see this approach being an issue, hmm. It's worthwhile to note, so I'll leave this open. Never did much testing on potato-servs, but most people start on one.

As for live, I'm tempted to say, let's put these "final" shutdown tasks on a separate thread... the risk, however, is that other database processes might also then run concurrent and cause new, worse slowdowns / resourcing problems... hrm.

Any chance to via an admintools plugin to just, I dunno, kick everyone before this stuff starts?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants