Extreme slowness with /replay list with 13,000 replays #277
Comments
Have you identified whether the slowness is in reading the files or in sorting? If it's the sorting, the best a full sort can do is n*log(n), but if the results are capped at 100 we could do that selection in about 100*n, which for very, very big n can be lower. :( |
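For reference, here's a minimal sketch of that "only order the top results" idea, using `std::partial_sort`; the `ReplayEntry` struct and the sort-by-duration comparator are hypothetical stand-ins for whatever bzfs actually keeps per replay.

```cpp
#include <algorithm>
#include <cstddef>
#include <cstdint>
#include <string>
#include <vector>

// Hypothetical list entry; the real code keeps different fields per replay.
struct ReplayEntry {
    std::string filename;
    uint32_t    seconds;   // duration read from the replay header
};

// Order only the first 'limit' entries (e.g. 100) instead of fully sorting
// all n of them; std::partial_sort needs roughly n*log(limit) comparisons.
void selectTop(std::vector<ReplayEntry>& entries, std::size_t limit)
{
    if (limit > entries.size())
        limit = entries.size();
    std::partial_sort(entries.begin(),
                      entries.begin() + static_cast<std::ptrdiff_t>(limit),
                      entries.end(),
                      [](const ReplayEntry& a, const ReplayEntry& b) {
                          return a.seconds > b.seconds;   // longest first
                      });
}
```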
Basic optimization rules apply. For any operation that is non-performant, you can do it less often, do it faster, or avoid doing it altogether.
If scanning 13,000 files is too costly (one might imagine so), you can perform a time/memory exchange and store the results, either in memory or in a database of sorts. It is the nature of this problem that the information in each file doesn't change, so really only new files need to be scanned. This is a form of "do it less often" optimization. Let's suppose you had a cached replay list; the remaining challenge is one of performance versus freshness: how quickly after a new replay is created do you want it to be seen in response to /replay list? FWIW, you could use … |
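A minimal sketch of that time/memory exchange, assuming C++17 and entirely hypothetical names (`ReplayInfo`, `readHeader`, `replayCache`); the point is just that a filename already in the cache never gets opened again:

```cpp
#include <cstdint>
#include <map>
#include <optional>
#include <string>

// Hypothetical per-file record; the real code would keep whatever the
// listing needs (duration, timestamp, ...).
struct ReplayInfo {
    uint32_t duration;
};

// Stub for the expensive part: opening one file and reading its header.
std::optional<ReplayInfo> readHeader(const std::string& /*path*/)
{
    return std::nullopt;   // the real version would read the magic string and duration
}

// Cache keyed by filename. Header data never changes once written, so each
// file only has to be read the first time it shows up in the directory.
std::map<std::string, ReplayInfo> replayCache;

void cacheIfNew(const std::string& path)
{
    if (replayCache.count(path))
        return;                        // already scanned on a previous listing
    if (auto info = readHeader(path))
        replayCache[path] = *info;
}
```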
How long does it take bzfs to scan through those 13,000 replay files? |
On server startup, it could look up the files and cache the results. |
Doing some very basic/stupid profiling of where the slowdown is, I've added some timing calls.
Removing the call that opens and reads each file makes the listing fast again. Sorting seems pretty instant, so it appears to me that the biggest slowdown is opening the files. |
13,000 files will stress the OS I/O cache. 26,000 will be even worse... Starting an asynchronous cache of the required information at server start is a pretty easy fix |
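A rough sketch of kicking off such an asynchronous scan at server start (C++17, hypothetical names); the callback stands in for whatever records one file's header, and real code would still need to synchronize the cache with the main thread:

```cpp
#include <filesystem>
#include <functional>
#include <string>
#include <thread>

// Kick off a background directory walk at server start so that the first
// /replay list doesn't have to pay the full cost of opening every file.
// 'onFile' is a placeholder for whatever records one file's header info.
void startReplayScan(const std::string& recordDir,
                     std::function<void(const std::string&)> onFile)
{
    std::thread([recordDir, onFile] {
        for (const auto& entry : std::filesystem::directory_iterator(recordDir))
            if (entry.is_regular_file())
                onFile(entry.path().string());
    }).detach();
}
```

A detached thread is the bluntest way to do it; handing results back through a queue polled from the main loop would avoid locking the cache.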
A suggestion I found is to use openat instead of open, and pread instead of fseek and fread. |
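A sketch of that combination: open each file relative to an already-open directory descriptor and read each header field with a positioned read, so no seek calls are needed. The offsets below are placeholders, not the actual BZFlag replay header layout:

```cpp
#include <cstdint>
#include <fcntl.h>
#include <unistd.h>

// Placeholder offsets -- the real replay header layout differs.
static const off_t MAGIC_OFFSET    = 0;
static const off_t DURATION_OFFSET = 16;

// Open the file relative to an already-open directory fd and read each field
// at its absolute offset, avoiding chdir/fseek round trips.
bool readHeaderFields(int dirFd, const char* name,
                      uint32_t& magic, uint32_t& duration)
{
    int fd = openat(dirFd, name, O_RDONLY);
    if (fd < 0)
        return false;

    bool ok =
        pread(fd, &magic, sizeof(magic), MAGIC_OFFSET) == (ssize_t)sizeof(magic) &&
        pread(fd, &duration, sizeof(duration), DURATION_OFFSET) == (ssize_t)sizeof(duration);

    close(fd);
    return ok;
}
```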
How big of an issue would it be if you cannot do … ? The approach I'm taking is twofold: …
Also, a small side question: doing a … |
Something to keep in mind is that most recordings are actively streamed to the disk. In this mode, the replay header is initially written with a recording duration of 0, and when the recording is stopped this duration is updated. So if we're able to read the recording while it's being actively written (I've not tested that), you would not want to cache information about recordings that have a duration of 0. |
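Reusing the hypothetical names from the earlier cache sketch, the cache update could simply skip entries whose header duration still reads zero:

```cpp
#include <cstdint>
#include <map>
#include <string>

struct ReplayInfo { uint32_t duration; };         // hypothetical record, as above
std::map<std::string, ReplayInfo> replayCache;    // hypothetical cache, as above

// Only cache finished recordings: while a replay is still being streamed to
// disk its header duration is 0, and it will change when recording stops.
void cacheIfFinished(const std::string& path, const ReplayInfo& info)
{
    if (info.duration == 0)
        return;                  // leave it out so it gets re-read on a later listing
    replayCache[path] = info;
}
```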
I've been looking into this further the last few days. When we're listing replays, we loop through all of them, open the file, and run two fread calls. The first fread reads the magic string to identify whether it is a BZFlag replay file. Then we seek ahead a bit and read the replay duration, which we show in the list output. So far it looks like just the act of running an fread for the magic string on thousands of files takes a good chunk of time.

I made two example directories of replays: one with 3000 files of 16MB each (some tournament replay I had), and one with 15000 replay files I created with a basic program that just writes out the replay header and a world database dumped from an empty world file (502 bytes per replay). I compiled this minimal program:

```cpp
#include <stdio.h>
#include <stdlib.h>
#include <stdint.h>
#include <dirent.h>
#include <string>
#include <iostream>

typedef uint32_t u32;

int main() {
    std::string RecordDir = "recordings/";
    DIR* dir = opendir(RecordDir.c_str());
    if (dir == NULL)
        return 1;

    struct dirent *de;
    unsigned long filesRead = 0;
    while ((de = readdir(dir)) != NULL)
    {
        std::string name = RecordDir;
        name += de->d_name;

        FILE* file = fopen(name.c_str(), "rb");
        if (file == NULL)
            continue;   // couldn't open this entry; skip it

        // Read just the first 4 bytes, as the real code does for the magic string.
        u32 magic;
        char buffer[sizeof(magic)];
        if (fread(buffer, sizeof(magic), 1, file) != 1)
        {
            // Not a readable regular file (e.g. "." or "..")
            fclose(file);
            continue;
        }

        filesRead++;
        fclose(file);
    }
    closedir(dir);

    std::cout << "Total files read: " << filesRead << std::endl;
    return 0;
}
```

I ran the test on a Debian system with an Intel Core 2 Duo E8400 3GHz, 6GB RAM, and an 80GB SATA hard drive. With the 3000 files the program took 40 seconds, and with the 15000 files it took 94 seconds. Before each run I cleared the filesystem caches as root.

Things to think about:
|
I have not tested it, and cannot, but maybe this is faster:

```cpp
#include <stdint.h>
#include <dirent.h>
#include <fcntl.h>
#include <unistd.h>
#include <iostream>
#include <string>

typedef uint32_t u32;

int main() {
    std::string RecordDir = "recordings/";
    DIR* dir = opendir(RecordDir.c_str());
    if (dir == NULL)
        return 1;

    // Open the directory once so openat() can resolve names relative to it.
    int dfd = open(RecordDir.c_str(), O_RDONLY);
    if (dfd < 0)
        return 1;

    struct dirent *de;
    unsigned long filesRead = 0;
    while ((de = readdir(dir)) != NULL)
    {
        int file = openat(dfd, de->d_name, O_RDONLY);
        if (file < 0)
            continue;

        u32 magic;
        char buffer[sizeof(magic)];
        if (read(file, buffer, sizeof(magic)) != sizeof(magic))
        {
            close(file);
            continue;
        }

        filesRead++;
        close(file);
    }
    close(dfd);
    closedir(dir);

    std::cout << "Total files read: " << filesRead << std::endl;
    return 0;
}
```
|
I just tested that variant and it takes basically the same amount of time. |
In case this is helpful to anyone, this is the output from two consecutive runs of my original test program, with the first time being with caches cleared:

```
blast007@debian:~$ /usr/bin/time ./filetest
Total files read: 3001
0.02user 0.13system 0:41.01elapsed 0%CPU (0avgtext+0avgdata 3488maxresident)k
102104inputs+0outputs (25major+137minor)pagefaults 0swaps
blast007@debian:~$ /usr/bin/time ./filetest
Total files read: 3001
0.01user 0.03system 0:00.04elapsed 93%CPU (0avgtext+0avgdata 3468maxresident)k
0inputs+0outputs (0major+148minor)pagefaults 0swaps
```

I'll also attach my 15000 generated replay files. They're enough for listing replays, but will error out if you try to actually load one, since it's missing data after the world database. They even have randomized replay durations, so sorting by duration is an option for testing performance. |
Do we have a target for how fast we should be able to read the headers of 15000 files? How fast is fast enough? |
Using modern C++:

```cpp
#include <cstdint>
#include <filesystem>
#include <fstream>
#include <iostream>
#include <chrono>

void test(std::filesystem::path const& path)
{
    for (auto const& dir_ent : std::filesystem::directory_iterator(path))
    {
        std::ifstream f(dir_ent.path(), std::ios_base::in|std::ios::binary);
        if (!f.good()) { std::cout << "problem opening " << dir_ent.path() << std::endl; return; }

        uint32_t magic;
        auto expected = sizeof(magic);
        f.read(reinterpret_cast<char*>(&magic), expected);

        auto nRead = f.gcount();   // bytes actually read by the last read()
        if (nRead != static_cast<std::streamsize>(expected))
        {
            std::cout << "problem reading " << dir_ent.path()
                      << ", read " << nRead << " of " << expected << " bytes."
                      << std::endl;
        }
    }
}

int main()
{
    auto path = std::filesystem::path("recordings");

    auto start = std::chrono::steady_clock::now();
    test(path);
    auto stop = std::chrono::steady_clock::now();

    std::cout << "Elapsed: "
              << std::chrono::duration_cast<std::chrono::duration<double>>(stop - start).count()
              << " seconds"
              << std::endl;
    return 0;
}
```

iMac, 3.3GHz Intel Core i5, 32GB 2667 MHz DDR4, 2TB flash storage: ran your files in 2 seconds.
Does that help? |
Basically the same on my test system (with the 3001 x 16MB replays):

```
blast007@debian:~$ /usr/bin/time ./filetest3
Elapsed: 40.2098 seconds
0.04user 0.14system 0:40.37elapsed 0%CPU (0avgtext+0avgdata 3644maxresident)k
102456inputs+0outputs (26major+140minor)pagefaults 0swaps
```

For the 15000 files, here's what it's like on that Core 2 Duo system with the hard drive:

```
blast007@debian:~$ /usr/bin/time ./filetest3
Elapsed: 93.2741 seconds
0.13user 0.66system 1:33.44elapsed 0%CPU (0avgtext+0avgdata 3504maxresident)k
133448inputs+0outputs (26major+138minor)pagefaults 0swaps
```

And on a $6/month Digital Ocean droplet (1 shared CPU core, 25GB SSD, 1GB RAM):

```
blast007@xs:~$ /usr/bin/time ./filetest3
Elapsed: 4.91171 seconds
0.18user 0.69system 0:04.92elapsed 17%CPU (0avgtext+0avgdata 3564maxresident)k
132480inputs+0outputs (9major+145minor)pagefaults 0swaps
```
|
Looks like your files are on a network drive? That's going to be painfully slow. |
No, there are no network drives involved in any of this. |
So I'm kinda stumped on how to proceed with fixing this situation. On my Leagues United replay server, I have over 13,000 replays, and running /replay list is extremely inefficient and leads to server freezes.

bzflag/src/bzfs/RecordReplay.cxx, lines 865 to 942 (at 3be72c3)

What I've gathered from reading through this code is: opening up 13,000 files and extracting the match date isn't the most efficient solution. Even though sendFileList will only output 100 results to the user as private messages, the server is still processing all 13,000 replays.

My thought for an immediate fix (2.4) was to change this functionality to use the filesystem "last modified" date for the sorting by time, even though this can sometimes be inaccurate. (This is not the cause of the slowdown, so doing this would prove useless.)

For 2.6, I would think having a replay server do some sort of caching on boot to optimize these listings would be ideal.