LoongArch64: Update symv #5061
The SSYMV behaviour looks a bit curious here, both before and after your change. (My switchover point for multithreading in interface/symv.c, at 200x200, is probably not optimal either.)
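For readers unfamiliar with the switchover being discussed, the idea can be sketched as below. This is a minimal illustration, not the code in interface/symv.c: the function name `symv_pick_threads`, the macro `SYMV_MT_MIN_N`, and the `available` parameter are all hypothetical, and only the 200x200 figure comes from the comment above.

```c
#include <stddef.h>

/* Hypothetical cutoff; 200 is the switchover size mentioned in the thread. */
#define SYMV_MT_MIN_N 200

/* Return how many threads to use for an n-by-n SYMV call.
 * `available` is the number of threads the caller could use. */
int symv_pick_threads(size_t n, int available)
{
    if (available < 1)
        return 1;               /* defensive: never report zero threads */
    if (n < SYMV_MT_MIN_N)
        return 1;               /* small problem: threading overhead dominates */
    return available;           /* large problem: use every available thread */
}
```

Raising or lowering `SYMV_MT_MIN_N` in a sketch like this corresponds to the kind of threshold experiment (300x300, 400x400) reported later in the thread.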
Hi, @XiWeiGu. With the just-released OpenBLAS 0.3.29, the gmake build fails on the cblas_ssymv test (see below). System: Debian sid loong64, kernel 6.12.9-loong64 (and also AOSC OS (12.0.3) loongarch64 with kernel 6.11.10).
I cannot reproduce this problem on the only LoongArch64 hardware I have access to, a 3C5000L-LL in the GCC Compile Farm running Debian Trixie (and gcc 14.2 built from source), binutils 2.43 from Debian, LA464 target was autodetected. (Not sure if that helps, but at least this is using the new kernels from this PR.)
Thank you for your reply. I will test to see if it is related to the threshold for enabling multithreading.
It seems to be a precision issue. My testing environment is as follows, and all tests pass.
I will verify it later on an AOSC OS (12.0.3) system.
I got the same results on an AOSC OS (12.0.3) system with gcc/gfortran 14, but let us see whether you can reproduce them. In my tests, both cblas_ssymv and cblas_dsymv seem to have a precision issue. (To verify, enter the ctest sub-directory and run "./xscblat2 < sin2" or "./xdcblat2 < din2".) The cmake build, on the other hand, seems to work without any problem.
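To make "precision issue" concrete, one simple way to quantify it is to measure how far a single-precision result is from a double-precision reference, in units of machine epsilon. The helper below is only a sketch under that assumption: the name `max_error_in_eps` is hypothetical, and the scaling is a simplification that is not necessarily the criterion used by xscblat2 or xdcblat2.

```c
#include <float.h>
#include <stddef.h>

/* Largest element-wise error of a float result against a double reference,
 * expressed in multiples of FLT_EPSILON and scaled by max(|ref[i]|, 1). */
double max_error_in_eps(const float *got, const double *ref, size_t n)
{
    double worst = 0.0;
    for (size_t i = 0; i < n; i++) {
        double diff = (double)got[i] - ref[i];
        if (diff < 0.0)
            diff = -diff;                       /* absolute difference */
        double scale = ref[i] < 0.0 ? -ref[i] : ref[i];
        if (scale < 1.0)
            scale = 1.0;                        /* avoid dividing by tiny values */
        double err = diff / (scale * (double)FLT_EPSILON);
        if (err > worst)
            worst = err;
    }
    return worst;
}
```

A result of a few units usually indicates ordinary rounding differences, while values in the thousands or more point to a genuinely wrong result rather than a reordering of floating-point operations.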
cmake uses O3 by design in Release mode, and no optimization option if no mode was specified (IIRC). Maybe this is enough to cause the difference in the Loongson backend. There were a few cases in the past where I had to disable optimization with a pragma in one of the ctest sources to get rid of spurious test failures.
Only ctest. test/sblat3 is OK, though there are floating-point exceptions as follows:
I conducted some tests, and the behaviour does not appear to be related to the matrix size threshold you set for enabling multithreading. I tried raising the threshold (300x300, 400x400), and at these sizes there is still a sharp performance drop compared to single-threading. However, the performance gradually recovers as the matrix size increases. This issue only occurs on LoongArch64; I did not observe similar behavior on x86-64.
I encountered the same issue in the following environment:
Error message:
I currently have no idea what is causing the failure; further analysis is necessary.
I asked the maintainers of AOSC OS for help on this issue since they have more Loongson computers. They confirmed this compilation error and say it happens only on the 6000 series (la664) architecture. They are now investigating it as well:
I wonder if the error goes away if we apply #4667 unconditionally.
No, removing those three lines does not solve the problem.
Uh, what exactly did you remove if you write of "those three lines"? Just to be sure, you should have ended up with
Sorry, I removed the line
@azuresky01 Can you please try #5070?
I just finished a new test. With two straight override lines the compilation error still appears.
Yes, I confirm that #5070 does fix the problem. The compilation error disappears.
Improve the performance of the {s/d}symv interface with LASX optimization when INCX=1 and INCY=1.
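For reference, the operation being optimized is y := alpha*A*x + beta*y with A symmetric, and INCX=1, INCY=1 means x and y are contiguous. The scalar routine below is a minimal sketch of that unit-stride case, not the LASX kernel from this PR: the name `ref_ssymv_lower` is hypothetical, and it assumes column-major storage with only the lower triangle of A referenced.

```c
#include <stddef.h>

/* y := alpha*A*x + beta*y for a symmetric n-by-n matrix A.
 * Only the lower triangle of A is read; A is column-major with leading
 * dimension lda. x and y are contiguous (unit stride). */
void ref_ssymv_lower(size_t n, float alpha, const float *a, size_t lda,
                     const float *x, float beta, float *y)
{
    /* Scale y first; beta == 0 overwrites y so stale values are ignored. */
    for (size_t i = 0; i < n; i++)
        y[i] = (beta == 0.0f) ? 0.0f : beta * y[i];

    for (size_t j = 0; j < n; j++) {
        float t1 = alpha * x[j];  /* contribution of column j to y */
        float t2 = 0.0f;          /* dot product of column j (below diagonal) with x */

        y[j] += t1 * a[j + j * lda];            /* diagonal element */
        for (size_t i = j + 1; i < n; i++) {
            y[i] += t1 * a[i + j * lda];        /* lower-triangle element A(i,j) */
            t2   += a[i + j * lda] * x[i];      /* mirrored element A(j,i) */
        }
        y[j] += alpha * t2;
    }
}
```

Because x and y are contiguous here, the inner loop reads and writes consecutive elements, which is the access pattern a SIMD kernel can load as whole vectors.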