Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Suggestion to default to use_binary_file=True for KS4 sorting #3611

Closed
guptadivyansh opened this issue Jan 13, 2025 · 6 comments · Fixed by #3614
Closed

Suggestion to default to use_binary_file=True for KS4 sorting #3611

guptadivyansh opened this issue Jan 13, 2025 · 6 comments · Fixed by #3614
Labels
sorters Related to sorters module

Comments

@guptadivyansh
Copy link
Contributor

guptadivyansh commented Jan 13, 2025

Continuing the discussion from the merged PR here..

The suggestion is to make change the default value for the parameter.

The flag use_binary_file makes a big difference to KS4 sorting performance. Writing the binary does take a while but overall I managed to sort a ~2 hour recording in ~4 hours with use_binary_file = True compared to ~22 hours with the default use_binary_file = False for the same recording. So, thank you for adding this param!

Workstation: Intel i7 8700K, Nvidia 3060 12GB, 64GB RAM, SSD, Windows

It would be good to hear if others are also seeing such a big performance difference.

Although as @sameulgarcia mentioned, it is weird that KS4 is so slow when passed the SI recording internally. If someone can get to the bottom of that, then perhaps we could have the best of both worlds; fast sorting without explicitly writing to disk.

In the meantime, use_binary_file = True might be a better default

Originally posted by @guptadivyansh in #3339 (comment)

@zm711
Copy link
Collaborator

zm711 commented Jan 13, 2025

This was also found to be true for Mountainsort5 which started caching the recording (ie writing to disk) before sorting which leads to great speed-ups to the sorting itself. So we see this in multiple sorters. One hypothesis that may contribute in the case of KS4 is that although they use spikeinterface objects natively they do not make use of any of our multiprocessing. Maybe their binary support has speed-ups that they have not applied to our native objects. They directly say this in the comments of their code (the comment about not using spikeinterface multiprocessing for their wrapper of a recording).

@zm711 zm711 added the sorters Related to sorters module label Jan 13, 2025
@zm711
Copy link
Collaborator

zm711 commented Jan 13, 2025

@samuelgarcia do you want to argue for keeping the current default? I think @alejoe91 may have also slightly leaned toward the current default too?

@alejoe91
Copy link
Member

Given that this came up several times, I think we should have use_binary_file true by default. From Sam's comment #3339 (comment) he also agrees :)

@zm711
Copy link
Collaborator

zm711 commented Jan 13, 2025

That's a really chill PR @guptadivyansh any interest in trying it yourself or do you want us to just deal with it?

@guptadivyansh
Copy link
Contributor Author

Sure, I'll give it a go! (excited for first contribution)

But first, do others also see such a big difference in performance between the two?

@chrishalcrow
Copy link
Collaborator

I've found that the difference in performance depends on the system I'm using. On my Mac, it didn't make a big difference, but on the HPC I use, it did! No idea why...

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
sorters Related to sorters module
Projects
None yet
Development

Successfully merging a pull request may close this issue.

4 participants