Support for AArch64

JayDDee edited this page Nov 28, 2023 · 75 revisions

AArch64 with AES and SHA2 is now fully supported, with most algos optimized for NEON, AES & SHA. Exceptions are noted below.

Support is provided as source code only and may be built natively on Linux by following the existing procedure, subject to the modifications described below.

Bitcoin talk discussion thread: https://bitcointalk.org/index.php?topic=5226770.0

Requirements:

  • An ARM CPU supporting AArch64.
  • Linux OS.
  • MacOS on ARM does not work natively but does work at full speed from a Linux VM using UTM & Qemu.

Status

cpuminer-opt-23.14 is released

Highlights of this release: Groestl AES is working and enabled for x16*, minotaurx, hmq1725, allium and others.

Development environment:

  • Orange Pi 5 Plus 16 GB, Rockchip 8 core CPU with AES & SHA2
  • Ubuntu Mate 22.04
  • GCC-11.4

Secondary environment:

  • Mac Mini M2
  • MacOS 14.1 Sonoma
  • UTM/Qemu VM emulator
  • Ubuntu Mate 22.04 VM guest

Compile with:

$ ./arm-build.sh

The only change from build.sh is the addition of "-flax-vector-conversions" to CFLAGS. The compiler will remind you if you forget. Specific architectures and features can be compiled using the examples in armbuild-all.sh.

The miner has been tested on Raspberry Pi 4B, Orange Pi 5 Plus, and Mac Mini from a Linux VM. It compiles for all minor versions of armv8.x with AES, SHA2, both, or neither.
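Which of these feature combinations a given build targets can be checked at compile time. The following is a minimal sketch, assuming the compiler defines the ACLE macros __ARM_NEON, __ARM_FEATURE_AES and __ARM_FEATURE_SHA2 (recent GCC and Clang are expected to). The helper name compiled_arm_features is hypothetical and not part of cpuminer-opt.

```c
/* Hypothetical helper, not a cpuminer-opt function.
   Returns a string naming the ARM features this file was compiled for,
   using the ACLE predefined macros (assumed to be set by the compiler). */
const char *compiled_arm_features(void)
{
#if !defined(__aarch64__)
    return "not AArch64";
#elif !defined(__ARM_NEON)
    return "AArch64 without NEON";
#elif defined(__ARM_FEATURE_AES) && defined(__ARM_FEATURE_SHA2)
    return "NEON AES SHA2";
#elif defined(__ARM_FEATURE_AES)
    return "NEON AES";
#elif defined(__ARM_FEATURE_SHA2)
    return "NEON SHA2";
#else
    return "NEON";
#endif
}
```

Printing the returned string at startup is one way to confirm that the flags chosen from armbuild-all.sh had the intended effect.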

Known problems:

  • Verthash algo is not working.
  • MacOS is not working natively; the workaround is a Linux VM.
  • CPU and feature detection and reporting is incomplete.
  • Fugue: multiple issues not related to AES; the unoptimized version is used.
  • Some algorithms too difficult to test with a CPU are not optimized for NEON.
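For the feature detection item, one common approach on Linux is to read the kernel's hardware capability bits. This is only a sketch of that general technique, not the miner's detection code. It assumes Linux on AArch64 with getauxval and the HWCAP_ASIMD, HWCAP_AES and HWCAP_SHA2 bits from <asm/hwcap.h>. The struct and function names are hypothetical, and it does not identify the CPU model or architecture minor version.

```c
#include <stdbool.h>

#if defined(__linux__) && defined(__aarch64__)
#include <sys/auxv.h>   /* getauxval, AT_HWCAP */
#include <asm/hwcap.h>  /* HWCAP_ASIMD, HWCAP_AES, HWCAP_SHA2 */
#endif

/* Hypothetical result type for this sketch. */
struct arm_features { bool neon, aes, sha2; };

/* Query the running CPU's features at run time.
   On anything other than Linux/AArch64 all fields stay false. */
struct arm_features detect_arm_features(void)
{
    struct arm_features f = { false, false, false };
#if defined(__linux__) && defined(__aarch64__)
    unsigned long caps = getauxval(AT_HWCAP);
    f.neon = (caps & HWCAP_ASIMD) != 0;  /* Advanced SIMD, i.e. NEON */
    f.aes  = (caps & HWCAP_AES)   != 0;
    f.sha2 = (caps & HWCAP_SHA2)  != 0;
#endif
    return f;
}
```

Run-time results like these can differ from the compile-time feature set, for example when a binary built without AES runs on a CPU that has it.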

Short term plan:

  • Figure out what's going on with verthash.
  • Fugue AES.
  • Migrate NEON to more algos.

Medium term:

  • Detection of ARM CPU model and architecture minor version.
  • Find NEON optimization opportunities that exploit its architecture and instruction set.
  • Apply lessons learned to x86_64.
  • SHA512, x86_64 & AArch64.

Long term:

  • ARM SVE
  • x86_64 AVX10
  • RISC-V

Some notable observations about the problems encountered:

Verthash is a mystery: it only produces rejects on ARM even with no targeted code, only compiled C. The same C source works on x86_64 but not on AArch64. This was tried with both -O3 and -O2. In all other cases, falling back to C was successful. Verthash data file creation and verification work. Verthash has one unique feature in the data file. No other algo has that, and no other algo fails with unoptimized code.

Multiplications are implemented differently, particularly widening multiplication, where the product is twice the bit width of the sources. x86_64 operates on lanes 0 & 2 of the source data, while ARM operates on lanes 0 & 1. In effect, x86_64 assumes the data is pre-widened and discards lanes 1 & 3, leaving two zero-extended 64-bit source integers. With ARM, the source arguments are packed into a smaller vector and the product is widened to 64 bits upon multiplication:

uint64x2_t = uint32x2_t * uint32x2_t

Most uses are in the x86_64 format, which requires a workaround on ARM. The current workaround seems to be functioning correctly where needed.
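The lane difference can be illustrated with a small sketch. It is not the miner's actual workaround, only one possible way to get x86_64-style results on NEON, and the function name mul_epu32_compat is hypothetical. On AArch64 it narrows each 64-bit lane to its low 32 bits, which selects lanes 0 & 2, before the widening multiply. On other targets a scalar fallback models the same semantics.

```c
#include <stdint.h>

#if defined(__aarch64__)
#include <arm_neon.h>
#endif

/* Hypothetical helper: widening multiply with x86_64 lane semantics.
   a and b hold 4 x 32-bit lanes; only lanes 0 and 2 are multiplied,
   giving 2 x 64-bit products. */
void mul_epu32_compat(uint64_t out[2], const uint32_t a[4], const uint32_t b[4])
{
#if defined(__aarch64__)
    /* View 4x32 as 2x64, then keep the low 32 bits of each 64-bit lane.
       On little-endian AArch64 that picks source lanes 0 and 2. */
    uint32x2_t a02 = vmovn_u64(vreinterpretq_u64_u32(vld1q_u32(a)));
    uint32x2_t b02 = vmovn_u64(vreinterpretq_u64_u32(vld1q_u32(b)));
    /* Native ARM form: uint64x2_t = uint32x2_t * uint32x2_t */
    vst1q_u64(out, vmull_u32(a02, b02));
#else
    /* Scalar model of the same operation. */
    out[0] = (uint64_t)a[0] * b[0];
    out[1] = (uint64_t)a[2] * b[2];
#endif
}
```

The extra narrowing step is the cost of keeping the x86_64 data layout. Code written for ARM from the start could keep the operands packed and call the widening multiply directly.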

NEON has no blend instruction but can emulate one compatible with x86_64 blendv using boolean algebra, but not very efficiently.
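A boolean-algebra blend of this kind might look like the sketch below. It assumes byte-wise blendv semantics, where the top bit of each mask byte selects the second source. The function name blendv_epi8_compat is hypothetical, and the miner's own emulation may differ. The mask is first expanded to all ones or all zeros per byte, then combined as (b & m) | (a & ~m).

```c
#include <stdint.h>

#if defined(__aarch64__)
#include <arm_neon.h>
#endif

/* Hypothetical helper: for each byte, take b if the top bit of the
   corresponding mask byte is set, otherwise take a. */
void blendv_epi8_compat(uint8_t out[16], const uint8_t a[16],
                        const uint8_t b[16], const uint8_t mask[16])
{
#if defined(__aarch64__)
    uint8x16_t va = vld1q_u8(a);
    uint8x16_t vb = vld1q_u8(b);
    /* Arithmetic shift right by 7 replicates each byte's top bit,
       giving 0xFF or 0x00 per byte. */
    uint8x16_t m = vreinterpretq_u8_s8(
        vshrq_n_s8(vreinterpretq_s8_u8(vld1q_u8(mask)), 7));
    /* (b & m) | (a & ~m); vbicq_u8(x, y) computes x & ~y. */
    vst1q_u8(out, vorrq_u8(vandq_u8(vb, m), vbicq_u8(va, m)));
#else
    /* Scalar model of the same boolean algebra. */
    for (int i = 0; i < 16; i++) {
        uint8_t m = (mask[i] & 0x80) ? 0xFF : 0x00;
        out[i] = (uint8_t)((b[i] & m) | (a[i] & (uint8_t)~m));
    }
#endif
}
```

The NEON path needs a shift, an AND, a bit-clear and an OR where x86_64 uses a single instruction, which is consistent with the inefficiency noted above.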