Shrink the size of OpenBLAS DLLs on Windows #175

Open
carlkl opened this issue Aug 8, 2024 · 9 comments

Comments

@carlkl

carlkl commented Aug 8, 2024

The OpenBLAS binary can be shrunk in several ways:

  1. Use the same OpenBLAS DLL for numpy as well as for scipy
  2. Use DYNAMIC_LIST to reduce the number of targets
  3. Strip the DLL

(1) has the greatest impact on the overall size of a Python installation as well as on the memory consumption of a Python process. There is no good reason to keep two separate OpenBLAS binaries in one process.
There are two ways to accomplish this: a) scipy could use the OpenBLAS DLL from numpy, or b) numpy and scipy could both depend on a dedicated OpenBLAS wheel that includes an OpenBLAS.dll. (b) has the advantage of allowing easy monkey-patching of the OpenBLAS DLL, e.g. with one that has fewer or more threads enabled if needed.

(2) is in a similar vein to #166.

(3) is included in #85, with the help of -Wl,-gc-sections -Wl,-s in the linking stage.
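To gauge how much point (1) would save on a given installation, one can list the OpenBLAS DLLs bundled under site-packages and add up their sizes. The following is only a minimal sketch and not part of this repository: the function names `find_openblas_dlls` and `total_size_mb` are hypothetical, and matching on "openblas" in a `.dll` file name is an assumption about how the wheels name their bundled libraries.

```python
from pathlib import Path


def find_openblas_dlls(site_packages):
    """Return (path, size_in_bytes) for each file that looks like an OpenBLAS DLL.

    Assumption: a bundled OpenBLAS library is any ".dll" file whose name
    contains "openblas" (case-insensitive). Adjust to the names your wheels ship.
    """
    root = Path(site_packages)
    hits = []
    for path in sorted(root.rglob("*")):
        name = path.name.lower()
        if path.is_file() and name.endswith(".dll") and "openblas" in name:
            hits.append((path, path.stat().st_size))
    return hits


def total_size_mb(hits):
    """Sum the sizes reported by find_openblas_dlls, in MB (10**6 bytes)."""
    return sum(size for _, size in hits) / 1e6
```

Calling `find_openblas_dlls(sysconfig.get_paths()["purelib"])` on a Windows environment with both numpy and scipy installed would be expected to report more than one DLL, which is the duplication that point (1) is about.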

@rgommers
Collaborator

rgommers commented Aug 8, 2024

Re (1), the relevant issue is scipy/scipy#15129. This isn't happening soon, it's blocked for at least two reasons (ILP64 vs. LP64, and Python packaging standards forbidding us from having an extra dependency in some wheels only).

(2) is being done.

(3) is always a good idea - if stripping isn't optimal yet, that's great to fix.

@mattip
Collaborator

mattip commented Aug 8, 2024

The scipy-openblas-0.3.27.44.3 wheels, without #85, are here. The win_amd64 one is 10.7MB.

The scipy-openblas-0.3.27.44.4 wheels, with #85, are here. The win_amd64 one is 10.0 MB. Adding #85 and backing out a windows threading issue saved ~0.7MB.

It seems #177 will shrink the wheel to 6.7MB. 🎉
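For reference, the savings implied by those wheel sizes can be worked out as below. This is only an illustrative calculation using the figures quoted in this comment; the helper name `savings` and the dictionary labels are made up for the example.

```python
# win_amd64 wheel sizes in MB, as quoted in the comment above.
sizes_mb = {
    "0.3.27.44.3 (without #85)": 10.7,
    "0.3.27.44.4 (with #85)": 10.0,
    "with #177 (projected)": 6.7,
}


def savings(before, after):
    """Return (absolute saving in MB, relative saving in percent), rounded to 0.1."""
    saved = before - after
    return round(saved, 1), round(100 * saved / before, 1)


baseline = sizes_mb["0.3.27.44.3 (without #85)"]
for label, size in sizes_mb.items():
    saved, pct = savings(baseline, size)
    print(f"{label}: {size} MB, saves {saved} MB ({pct}%) vs. baseline")
```

On these numbers, #85 plus the threading revert accounts for roughly 6.5% of the original wheel, and the projected 6.7 MB wheel would be about 37% smaller than the 10.7 MB one.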

@rgommers
Collaborator

rgommers commented Aug 8, 2024

That is excellent, thanks Matti. Also quite useful to have two tagged versions with the only change being the size change due to the dropped architectures - that's going to help in case we get some issue that may possibly be related.

@mattip
Collaborator

mattip commented Aug 8, 2024

Yes, although it might be difficult to untangle Windows performance bug reports. 0.3.27.44.4 reverts Windows threading improvements, which obviously impacts performance, and then 0.3.27.44.5 removes some kernels. We can use Linux as a control platform, since it will only have the kernel removals.

@carlkl
Author

carlkl commented Aug 9, 2024

Adding the flag -fno-ident will take some noise out of the binary as well.

@mattip
Collaborator

mattip commented Aug 9, 2024

According to the documentation

-fno-ident
    Ignore the #ident directive.

Is there much use of that directive in gcc and/or OpenBLAS?

@carlkl
Author

carlkl commented Aug 9, 2024

It puts a string constant into the binary - for each individual function! You can identify the gcc version used for the build process.

If you are not sure whether anything elsewhere relies on it, one could compile a single function with -fident, which is enough to identify the compiler for the OpenBLAS binary.
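One way to check how many of these strings actually end up in a built library is to scan the file for them. The sketch below assumes GCC's usual ident format, which begins with `GCC: (` followed by a vendor tag and version (for example `GCC: (GNU) 13.2.0`); the helper names are hypothetical, and the byte count ignores any section padding, so it is only an approximation.

```python
import re

# Assumption: ident strings start with "GCC: (" and run up to the next NUL byte.
IDENT_RE = re.compile(rb"GCC: \([^\x00]*")


def gcc_ident_strings(data):
    """Return the raw GCC ident strings (as bytes) found in binary data."""
    return [m.group() for m in IDENT_RE.finditer(data)]


def summarize(path):
    """Return (number of ident strings, approximate bytes they occupy) for a file."""
    with open(path, "rb") as fh:
        idents = gcc_ident_strings(fh.read())
    # Add one byte per string for its terminating NUL.
    return len(idents), sum(len(s) + 1 for s in idents)
```

Running `summarize` on the DLL from a build with and without the flag (or with and without section garbage collection) would show whether the strings are present at all and how much space they take.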

@mattip mattip mentioned this issue Aug 9, 2024
1 task
@mattip
Collaborator

mattip commented Aug 11, 2024

Looking at the artifacts in the CI runs from #178 before and after adding -fno-ident, it does not seem to change the size of the shared object.

@carlkl
Author

carlkl commented Aug 17, 2024

Hm, I forgot that -Wl,-gc-sections removes all ident strings, so -fno-ident no longer has any effect.


3 participants