From ce66ffe7bb5e554ac9c87c4b49fad13f122ce769 Mon Sep 17 00:00:00 2001
From: Martin Kroeker
Date: Sun, 12 Jan 2025 00:57:10 +0100
Subject: [PATCH 1/2] Update the Changelog for version 0.3.29

---
 Changelog.txt | 94 +++++++++++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 94 insertions(+)

diff --git a/Changelog.txt b/Changelog.txt
index 7f89a2eab7..b131dca5c4 100644
--- a/Changelog.txt
+++ b/Changelog.txt
@@ -1,4 +1,98 @@
 OpenBLAS ChangeLog
+====================================================================
+Version 0.3.29
+12-Jan-2025
+
+general:
+ - fixed a potential NULL pointer dereference in multithreaded builds
+ - added function aliases for GEMMT using its new name GEMMTR adopted by Reference-BLAS
+ - fixed a build failure when building without LAPACK_DEPRECATED functions
+ - the minimum required CMake version for CMake-based builds was raised to 3.16.0 in order
+   to remove many compatibility and deprecation warnings
+ - added more detailed CMake rules for OpenMP builds (mainly to support recent LLVM)
+ - fixed the behavior of the recently added CBLAS_?GEMMT functions with row-major data
+ - improved thread scaling of multithreaded SBGEMV
+ - improved thread scaling of multithreaded TRTRI
+ - fixed compilation of the CBLAS testsuite with gcc14 (and no Fortran compiler)
+ - added support for option handling changes in flang-new from LLVM18 onwards
+ - added support for recent calling conventions changes in Cray and NVIDIA compilers
+ - added support for compilation with the NAG Fortran compiler
+ - fixed placement of the -fopenmp flag and libsuffix in the generated pkgconfig file
+ - improved the CMakeConfig file generated by the Makefile build
+ - fixed const-correctness of cblas_?geadd in cblas.h
+ - fixed a potential inaccuracy in multithreaded BLAS3 calls
+ - fixed empty implementations of get/set_affinity that print a warning in OpenMP builds
+ - fixed function signatures for TRTRS in the converted C version of LAPACK
+ - fixed omission of several single-precision LAPACK symbols in the shared library
+ - improved build instructions for the provided "pybench" benchmarks
+ - improved documentation, including added build instructions for WoA and HarmonyOS
+ - added a separate "make install_tests" target for use with cross-compilations
+ - integrated improvements and corrections from Reference-LAPACK:
+    - removed a comparison in LAPACKE ?tpmqrt that is always false (LAPACK PR 1062)
+    - fixed the leading dimension for B in tests for GGEV (LAPACK PR 1064)
+    - replaced the ?LARFT functions with a recursive implementation (LAPACK PR 1080)
+
+arm:
+ - fixed build with recent versions of the NDK (missing .type declaration of symbols)
+
+arm64:
+ - fixed a long-standing bug in the (generic) c/zgemm_beta kernel that could lead to
+   reads and writes outside the array bounds in some circumstances
+ - rewrote cpu autodetection to scan all cores and return the highest performing type
+ - improved the DGEMM performance for SVE targets and small matrix sizes
+ - improved dimension criteria for forwarding from GEMM to GEMV kernels
+ - added SVE kernels for ROT and SWAP
+ - improved SVE kernels for SGEMV and DGEMV on A64FX and NEOVERSEV1
+ - added support for using the "small matrix" kernels with CMake as well
+ - fixed compilation on Windows on Arm
+ - improved compile-time detection of SVE capability
+ - added cpu autodetection and initial support for Apple M4
+ - added support for compilation on systems running iOS
+ - added support for compilation on NetBSD ("evbarm" architecture)
+ - fixed NRM2 implementations for generic SVE targets and the Neoverse N2
+ - fixed compilation for SVE-capable targets with the NVIDIA compiler
+
+x86_64:
+ - fixed a wrong storage size in the SBGEMV kernel for Cooper Lake
+ - added cpu autodetection for Intel Granite Rapids
+ - added cpu autodetection for AMD Ryzen 5 series
+ - added optimized SOMATCOPY_CT for AVX-capable targets
+ - fixed the fallback implementation of GEMM3M in GENERIC builds
+ - tentatively re-enabled builds with the EXPRECISION option
+ - worked around a miscompilation of tests with mingw32-gfortran14
+ - added support for compilation with the Intel oneAPI 2025.0 compiler on Windows
+
+power:
+ - fixed multithreaded SBGEMM
+ - fixed a CMake build problem on POWER10
+ - improved the performance of SGEMV
+ - added vectorized implementations of SBGEMV and support for forwarding 1xN SBGEMM to them
+ - fixed illegal instructions and potential memory overflow in SGEMM on PPCG4
+ - fixed handling of NaN and Inf arguments in SSCAL and DSCAL on PPC440, G4 and 970
+ - added improved CGEMM and ZGEMM kernels for POWER10
+ - added Makefile logic to remove all optimization flags in DEBUG builds
+
+mips64:
+ - fixed compilation with gcc14
+ - fixed GEMM parameter selection for the MIPS64_GENERIC target
+ - fixed a potential build failure when compiling with OpenMP
+
+loongarch64:
+ - fixed compilation for Loongson3 with recent versions of gmake
+ - fixed a potential loss of precision in Loongson3A GEMM
+ - fixed a potential build failure when compiling with OpenMP
+ - added optimized SOMATCOPY for LASX-capable targets
+ - introduced a new cpu naming scheme while retaining compatibility
+ - added support for cross-compiling Loongarch64 targets with CMake
+ - added support for compilation with LLVM
+
+riscv64:
+ - removed thread yielding overhead caused by sched_yield
+ - replaced some non-standard intrinsics with their official names
+ - fixed and sped up the implementations of CGEMM/ZGEMM TCOPY for vector lengths 128 and 256
+ - improved the performance of SNRM2/DNRM2 for RVV1.0 targets
+ - added optimized ?OMATCOPY_CN kernels for RVV1.0 targets
+
 ====================================================================
 Version 0.3.28
 8-Aug-2024

From 20f6114e98bce519a3c4f8af4a22a868a1c2d946 Mon Sep 17 00:00:00 2001
From: Martin Kroeker
Date: Sun, 12 Jan 2025 13:12:41 +0100
Subject: [PATCH 2/2] add descriptions of build/runtime vars to 0.3.29 improvements

---
 Changelog.txt | 1 +
 1 file changed, 1 insertion(+)

diff --git a/Changelog.txt b/Changelog.txt
index b131dca5c4..b52734c82c 100644
--- a/Changelog.txt
+++ b/Changelog.txt
@@ -26,6 +26,7 @@ general:
  - fixed omission of several single-precision LAPACK symbols in the shared library
  - improved build instructions for the provided "pybench" benchmarks
  - improved documentation, including added build instructions for WoA and HarmonyOS
+   as well as descriptions of environment variables that affect build and runtime behavior
  - added a separate "make install_tests" target for use with cross-compilations
  - integrated improvements and corrections from Reference-LAPACK:
     - removed a comparison in LAPACKE ?tpmqrt that is always false (LAPACK PR 1062)
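Two changelog entries concern GEMMT. The first adds the GEMMTR alias. The second fixes CBLAS_?GEMMT with row-major data. GEMMT forms a general matrix product but updates only the upper or lower triangle of C.

The C sketch below is a rough illustration only and is not OpenBLAS's implementation. The helpers `ref_gemmt_colmajor` and `ref_gemmt_rowmajor` are hypothetical, and the sketch covers only the no-transpose case. It shows one common way a row-major interface can reuse a column-major kernel: swap the operands and flip the requested triangle.

```c
#include <assert.h>

/*
 * Hypothetical naive reference for GEMMT/GEMMTR semantics (not OpenBLAS code).
 * Column-major, no-transpose case only: updates just the 'U'pper or 'L'ower
 * triangle of the n-by-n matrix C := alpha * A * B + beta * C,
 * where A is n-by-k and B is k-by-n. The other triangle is left untouched.
 * Simplification: unlike reference BLAS, beta == 0 does not skip reading C.
 */
static void ref_gemmt_colmajor(char uplo, int n, int k, double alpha,
                               const double *A, int lda,
                               const double *B, int ldb,
                               double beta, double *C, int ldc)
{
    assert(uplo == 'U' || uplo == 'L');
    for (int j = 0; j < n; j++) {
        /* Row range of column j that lies in the requested triangle. */
        int i_first = (uplo == 'U') ? 0 : j;
        int i_last  = (uplo == 'U') ? j : n - 1;
        for (int i = i_first; i <= i_last; i++) {
            double sum = 0.0;
            for (int p = 0; p < k; p++)
                sum += A[i + p * lda] * B[p + j * ldb];
            C[i + j * ldc] = alpha * sum + beta * C[i + j * ldc];
        }
    }
}

/*
 * Hypothetical row-major wrapper. Row-major storage of a matrix is the
 * column-major storage of its transpose, and (A*B)^T = B^T * A^T, so the
 * operands are swapped and the triangle is flipped: the upper triangle of
 * row-major C is the lower triangle of the column-major view C^T.
 */
static void ref_gemmt_rowmajor(char uplo, int n, int k, double alpha,
                               const double *A, int lda,
                               const double *B, int ldb,
                               double beta, double *C, int ldc)
{
    assert(uplo == 'U' || uplo == 'L');
    char flipped = (uplo == 'U') ? 'L' : 'U';
    ref_gemmt_colmajor(flipped, n, k, alpha, B, ldb, A, lda, beta, C, ldc);
}
```

Consider `ref_gemmt_rowmajor('U', ...)` with a 3x2 row-major A and a 2x3 row-major B. It writes only the entries `C[i*ldc + j]` with `i <= j`. A production kernel would also handle transposed operands and block the loops for cache reuse. The triple loop here only pins down which entries are written.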