Cache-less fast file search by directly reading the file system #61
I second this.
I was just about to suggest the MFT approach as well; however, I thought of using it to build the index/cache, which would speed up subdirectory search when/if that is implemented.
But you would have to overhaul the cache, as the "path" operation is really slow: you have to walk the entire path up to the root directory. This can be solved by normal tree walking utilizing the index attribute, or by directly building a tree from leaf to root (see the sketch after this comment).
This method is a trade-off between linear search speed (good CPU cache behavior, because there is no pointer chasing as with a Vec of Strings), hash lookup speed, and memory footprint.
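A minimal sketch of the leaf-to-root idea, assuming a hypothetical flat map of records where each entry stores only its name and its parent's record number (the way an NTFS $FILE_NAME attribute does); `Record`, `path_of`, and the root constant are illustrative, not taken from any existing implementation:

```rust
use std::collections::HashMap;

/// Hypothetical, simplified view of one file record: just a name and the
/// record number of its parent directory.
struct Record {
    name: String,
    parent: u64,
}

/// Rebuild the full path of `record_no` by walking parent references from
/// leaf to root. No per-record path strings are cached, so memory stays flat,
/// but every path lookup pays O(depth) map lookups.
fn path_of(records: &HashMap<u64, Record>, mut record_no: u64) -> Option<String> {
    const ROOT: u64 = 5; // the NTFS root directory is file record number 5
    let mut parts = Vec::new();
    while record_no != ROOT {
        let rec = records.get(&record_no)?;
        parts.push(rec.name.clone());
        record_no = rec.parent;
    }
    parts.reverse();
    Some(format!("\\{}", parts.join("\\")))
}
```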
There is a way on Windows, and definitely also on Linux, to directly read the file system structure, which doesn't involve the operating system's file caching, search indices, etc.
When searching for a file over the entire disk, it's way faster to just read, for example, the NTFS Master File Table. As a proof of concept, I created a project which currently implements this kind of file search for the Windows NTFS file system: https://github.com/valaphee/curtain/blob/main/storage/examples/search.rs. It is only limited by how fast the Master File Table can be read, and takes about 1 second on my PC when searching a 512 GB NVMe volume with around 200k files.
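For illustration only, a minimal Rust sketch of the entry point (not the linked implementation): open the raw volume, read the boot sector, and compute where the MFT starts. The offsets used (0x0B bytes per sector, 0x0D sectors per cluster, 0x30 first MFT cluster) are standard NTFS boot sector fields; everything after that (following the $MFT data runs, applying fixups, parsing $FILE_NAME attributes) is the actual work and is omitted here. Opening `\\.\C:` requires administrator privileges.

```rust
use std::fs::File;
use std::io::{Read, Seek, SeekFrom};

fn main() -> std::io::Result<()> {
    // Open the raw NTFS volume (administrator privileges required).
    let mut volume = File::open(r"\\.\C:")?;

    // The NTFS boot sector is the first sector of the volume.
    let mut boot = [0u8; 512];
    volume.read_exact(&mut boot)?;

    let bytes_per_sector = u16::from_le_bytes([boot[0x0B], boot[0x0C]]) as u64;
    let sectors_per_cluster = boot[0x0D] as u64;
    let mft_cluster = u64::from_le_bytes(boot[0x30..0x38].try_into().unwrap());

    let mft_offset = mft_cluster * sectors_per_cluster * bytes_per_sector;
    println!("MFT starts at byte offset {mft_offset}");

    // Read the first file record ($MFT itself); records are typically 1 KiB.
    // A real reader would then parse its data runs to find all MFT fragments.
    let mut record = [0u8; 1024];
    volume.seek(SeekFrom::Start(mft_offset))?;
    volume.read_exact(&mut record)?;
    assert_eq!(&record[0..4], &b"FILE"[..]); // file record signature
    Ok(())
}
```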
This might also be combined with file system watches and caching, but without writing the cache to disk, as loading the cache is slower than just reading the file entries. Persisting it also increases disk wear by an unnecessary amount, especially since the compressed data may change completely by just adding or removing one file.
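As a rough sketch of the watch side, assuming the `notify` crate is used (an assumption, nothing here prescribes it): the index lives only in memory, is rebuilt from the MFT on startup, and is then patched from create/remove events rather than being written back to disk.

```rust
use notify::{recommended_watcher, Event, EventKind, RecursiveMode, Watcher};
use std::collections::HashSet;
use std::path::{Path, PathBuf};
use std::sync::mpsc::channel;

fn main() -> notify::Result<()> {
    // In-memory index only; never persisted.
    let mut index: HashSet<PathBuf> = HashSet::new();

    let (tx, rx) = channel::<notify::Result<Event>>();
    let mut watcher = recommended_watcher(tx)?;
    watcher.watch(Path::new(r"C:\"), RecursiveMode::Recursive)?;

    for event in rx {
        let event = event?;
        match event.kind {
            // Add newly created paths to the index.
            EventKind::Create(_) => {
                for path in event.paths {
                    index.insert(path);
                }
            }
            // Drop removed paths from the index.
            EventKind::Remove(_) => {
                for path in &event.paths {
                    index.remove(path);
                }
            }
            _ => {}
        }
    }
    Ok(())
}
```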
The internals are a little more complicated, as it basically simulates the NTFS file system's read operations, and it also requires administrator privileges. One other downside is that it can be slower when the search is limited, for example to a directory that contains fewer than half of the files on the volume, since the whole MFT is scanned anyway. This can be optimized by parsing the file structure into a tree (a sketch follows below), at the cost of higher memory usage.
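A small sketch of that optimization, with hypothetical `Entry`, `is_under`, and `search_in_dir` names: keep a record-to-parent map in memory and use it to test whether a record lies under the requested directory, so a scoped search only keeps results from that subtree without reconstructing full path strings. The extra map is the higher memory cost mentioned above.

```rust
use std::collections::HashMap;

/// Hypothetical parsed entry: file record number, parent record number, name.
struct Entry {
    record: u64,
    parent: u64,
    name: String,
}

/// True if `record` lies underneath `dir_record`, found by walking parent
/// references upward until the root is reached.
fn is_under(parents: &HashMap<u64, u64>, mut record: u64, dir_record: u64) -> bool {
    const ROOT: u64 = 5; // NTFS root directory record
    while record != ROOT {
        if record == dir_record {
            return true;
        }
        match parents.get(&record) {
            Some(&p) => record = p,
            None => return false,
        }
    }
    record == dir_record
}

/// Scoped search: still iterates all entries, but only matches names inside
/// the requested directory subtree.
fn search_in_dir<'a>(
    entries: &'a [Entry],
    parents: &HashMap<u64, u64>,
    dir_record: u64,
    needle: &str,
) -> Vec<&'a Entry> {
    entries
        .iter()
        .filter(|e| is_under(parents, e.record, dir_record) && e.name.contains(needle))
        .collect()
}
```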
But it is technically the fastest way to search for files.