Shrink aarch64 wheels #170
Comments
The |
I remapped any common targets back together in OpenMathLib/OpenBLAS#4389, but I'm unsure how to tell which targets are less used and could be removed 🤔 Also ref: https://github.com/OpenMathLib/OpenBLAS/blob/develop/Makefile.system#L686-L700 |
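For anyone experimenting with a reduced target set, here is a minimal sketch of how a restricted build command could be composed. It assumes the `DYNAMIC_ARCH` and `DYNAMIC_LIST` make variables described in OpenBLAS's Makefile.rule are honored for arm64 in the version being built (worth checking against the Makefile.system lines linked above). The helper name `openblas_make_args` and the chosen target list are hypothetical, purely for illustration.

```python
import shlex


def openblas_make_args(targets, extra=()):
    """Compose a make invocation for a DYNAMIC_ARCH build limited to `targets`.

    Hypothetical helper: it only builds the argument list, it does not run make.
    """
    if not targets:
        raise ValueError("need at least one target")
    # DYNAMIC_LIST takes a space-separated list of core names.
    args = ["make", "DYNAMIC_ARCH=1", "DYNAMIC_LIST=" + " ".join(targets)]
    args.extend(extra)
    return args


# Example target selection (an assumption, not a recommendation).
args = openblas_make_args(["NEOVERSEN1", "NEOVERSEV1", "THUNDERX2T99"])
print(shlex.join(args))  # shell-quoted form, suitable for pasting into a build script
```

Comparing the size of the resulting library across a few such lists would show how much each target contributes to the wheel.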
Thanks. Is |
BLAS-benchmarks runs on a c7g.large instance (https://aws.amazon.com/ec2/instance-types/c7g/) via https://github.com/OpenMathLib/BLAS-Benchmarks/blob/main/.cirun.yml Also, does @czgdp1807's benchmarking machinery handle aarch64 architectures? |
In manylinux2014 with GCC 10.2 you should get the SVE targets. For certain toolchains, such as the |
Cool, thanks
That is Graviton3, so it should be as good as it gets.
I think so, you need to specify a different set of kernels. You can see which ones in the Makefile.system from this comment |
OK, one benchmark: this is Linux on arm64 (not macOS), on a c7g.large machine on AWS:
The rest of benchmarks are running, will see how different they look. |
It'd be good to test these on an |
full bench suite on c7g: https://gist.github.com/ev-br/c1a35b386c90d8eaac484520d8256927 |
I've tried tweaking some constants in OpenMathLib/OpenBLAS#4833; if we do this, we could potentially have Do you mind benchmarking these changes, @ev-br? |
TL;DR: not easily, sadly. There are two ways OpenBLAS benchmarks run currently:
Both were set up as part of an STF project co-PI-ed by @martin-frbg and @rgommers. The AWS costs for the weekly blas-benchmarks runs are also picked up by Quansight (I believe).

I'm happy to help extend the set of benchmarks these two services run. Do you have suggestions for what would be useful to add? I'm also happy to work on large-scale restructurings, but these will have to be cleared through Quansight first.

Neither of these has per-kernel granularity, though. I was only able to run these one-off experiments because a) Matti and Gagan had the benchmarking scripts, b) I have the AWS setup ready from the blas-benchmark work, and c) Quansight basically shrugged off the costs of a couple of hours of CPU and engineering time. I'm definitely happy to evolve either set of benchmarks or set up some other strategy, once it's cleared with Quansight.

So possible concrete steps:
Easy ones:
Needs some design:
|
I think this is fairly low-prio? I'd move from TravisCI to Cirrus CI and be done with it to address the CI problem. The gain in binary size is much more limited than for x86-64, plus download numbers are way lower. So I don't think this is worth spending a lot of time on at the moment. |
Hi @ev-br, I meant the benchmarks in #170 (comment) only 😸 If those one-shot benchmarks show the

@mattip is it easy to use the infra in this repo to build from my branch of OpenBLAS? It'd be easier than trying to recreate the build parameters you've used 😸

@rgommers understood, hopefully this minimal step is enough 😸 |
Yeah, a technical hurdle here is that the numpy benchmarks need a Python wheel, and I'm not sure how to generate one from a local OpenBLAS build. |
🐸only do flywheels, but perhaps it would be sufficient to replace the libscipy-openblas in numpy.libs with your own identically named build after installing the stock numpy wheel? |
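The library swap suggested above could be scripted roughly as follows. This is only a sketch under assumptions: the glob pattern `libscipy_openblas*` and the helper name `swap_openblas` are hypothetical, the actual file name inside `numpy.libs` should be checked in the installed wheel, and the replacement build would presumably need the same symbol naming as the stock library for numpy to load it.

```python
import shutil
from pathlib import Path


def swap_openblas(site_packages, replacement, pattern="libscipy_openblas*"):
    """Overwrite the bundled OpenBLAS in numpy.libs with `replacement`.

    Keeps a one-time backup next to the original with an added ".orig" suffix.
    The default `pattern` is an assumption about the bundled file name.
    """
    libs_dir = Path(site_packages) / "numpy.libs"
    # Ignore backups left behind by an earlier swap.
    matches = sorted(
        p for p in libs_dir.glob(pattern) if not p.name.endswith(".orig")
    )
    if len(matches) != 1:
        raise RuntimeError(f"expected exactly one match, found {len(matches)}")
    target = matches[0]
    backup = target.with_name(target.name + ".orig")
    if not backup.exists():
        shutil.copy2(target, backup)  # preserve the stock library once
    shutil.copy2(replacement, target)  # same name, new contents
    return target
```

Restoring the stock library is then a matter of copying the `.orig` file back over the target.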
I wonder if the problem with aarch64 builds on Travis CI is that we are running out of memory and the build process is killed (on manylinux/glibc). Travis has a 3GB limit. As with issue #144 and PR #166, we should benchmark aarch64 on a high-end aarch64 machine.
@ev-br is this something you could do? Is the AWS m7g instance (with a Graviton3 processor) advanced enough to use the THUNDERX3T110 kernels, or is that targeting some other processor?
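One way to explore which kernels a given machine ends up using is sketched below. It relies on two OpenBLAS runtime environment variables: `OPENBLAS_VERBOSE=2`, which makes a DYNAMIC_ARCH build report the selected core on stderr, and `OPENBLAS_CORETYPE`, which overrides the autodetected core. The helper names `openblas_env` and `run_snippet` are hypothetical, and forcing a core the CPU does not support may crash the process, so this is for experimentation only.

```python
import os
import subprocess
import sys


def openblas_env(coretype=None, base=None):
    """Return an environment that asks OpenBLAS to report its selected core."""
    env = dict(os.environ if base is None else base)
    env["OPENBLAS_VERBOSE"] = "2"
    if coretype is not None:
        # Overrides autodetection; only meaningful for DYNAMIC_ARCH builds.
        env["OPENBLAS_CORETYPE"] = coretype
    return env


def run_snippet(code, coretype=None):
    """Run `code` in a fresh interpreter so OpenBLAS is initialized with the new env."""
    return subprocess.run(
        [sys.executable, "-c", code],
        env=openblas_env(coretype),
        capture_output=True,
        text=True,
    )
```

For example, `run_snippet("import numpy", coretype="THUNDERX3T110").stderr` on the m7g instance would show whether the forced core is accepted, assuming numpy is installed with a DYNAMIC_ARCH OpenBLAS.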