-
-
Notifications
You must be signed in to change notification settings - Fork 3k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
feat(server): library refresh go brrr #14456
base: main
Are you sure you want to change the base?
Conversation
0eb1440
to
80aa615
Compare
80aa615
to
8ecde3b
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nice start! I think there are still a lot of untapped potential improvements here.
The update to |
Thanks for your comments @mertalev ! I'll first attempt to do the import path and exclusion pattern checks in SQL and then move to your suggestions |
d394654
to
8b2a48c
Compare
6d69307
to
c26f6aa
Compare
c26f6aa
to
a3be620
Compare
775b817
to
69b273d
Compare
Never thought of that, I've implemented your suggestion. I'm also considering changing the initial import code to ignore file mtime, this allows us to not do any file system calls except for the crawl. Metadata extraction will have to do the heavy lifting instead |
Would that mean you queue them for metadata extraction even if they're unchanged? You can test it but I think it'd be more overhead than the stat calls. Edit: also if you do this with the source set to |
I was referring to new imports, files that are new to immich. I hoped to improve the ingest performance by removing the stat call. After testing, there are two issues:
If we can mitigate the two issues above, I can rewrite the library import feature and do that in batches as well! |
I don't see why fileModifiedAt needs a non-null constraint in the DB. Might just be an oversight that didn't matter because it didn't affect our usage. I think you can change the asset entity and generate a migration to remove that constraint. For sidecar files, maybe you could add |
I might just put new Date() in at the moment to keep the PR somewhat constrained. Regarding sidecars, I have thought about that, problem right now is that we're batching the crawled files in batches of 10k. It might be hard to do get that working alright. Maybe I'll just queue a sidecar discovery for every imported asset for now |
d876ee9
to
4d1dd39
Compare
…/inline-offline-check
4d1dd39
to
49eaafa
Compare
c49b406
to
1155de6
Compare
…/inline-offline-check
1155de6
to
9d8a9d9
Compare
e266307
to
0669169
Compare
…/inline-offline-check
async getLibraryAssetCount(options: AssetSearchOptions = {}): Promise<number | undefined> { | ||
const { count } = await this.db | ||
.selectFrom('assets') | ||
.select(sql`COUNT(*)`.as('count')) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You can eb.fn.count
…/inline-offline-check
6f7b903
to
a4421dd
Compare
a4421dd
to
0830e9d
Compare
This PR significantly improves library scanning performance. Wherever suitable, we are doing jobs in batches, and many looped database interactions are replaced with SQL queries.
User testimonials
"@etnoy what on earth have you done. I tried your PR and it finished the scan for 1M assets in 37 seconds down from 728s on main. It takes 188s just to finish queuing on main" -- @mertalev
Changes made
Plus several minor cleanups and performance enhancements.
The performance improvements are at least an order of magnitude in library scanning.
Benchmark 1
A library scan with 22k items where nothing has changed since the last scan used to take 1m 22s, now it's below 10 seconds, an improvement of 87 percent!
Benchmark 2
A clean library import with 19k items takes 1m40s in main and 7 seconds in this PR.
NOTE: this benchmark is only the library service scan and does not include the metadata extraction. Also, some fs calls have been migrated from the library service to the metadata service, although this should only have a minor impact on overall scan performance
Benchmark 3
Importing a library with >5M assets.
No need to compare to main, you know it's fast!
Benchmark 4
Importing a library of 527041 files took 1m58s (without metadata extraction) in this PR.
No need to compare to main, you know it's fast!
Bonus:
This scan imports all new files:
This is an "idle scan", where a refresh finds no changes:
Future work:
Final note:
This PR allowed me to hit a milestone of 10M assets in a single Immich instance, likely a world-first. This does require max-old-space-size=8096, but that's to be expected anyway