RISC-V Cryptography Extensions? #239

Open
nazar-pc opened this issue Dec 13, 2024 · 12 comments

Comments

@nazar-pc

There is https://github.com/riscv/riscv-crypto, which defines a lot of useful primitives that would accelerate typical blockchain operations with generic RISC-V code instead of custom host functions.

I'm wondering if this is something that is considered to be supported?

I wasn't sure what the best place to ask or discuss such questions would be; feel free to point me elsewhere if topics like this are not welcome here.

@koute
Collaborator

koute commented Dec 13, 2024

Currently there are no plans, and we want to finalize the instruction set soon-ish.

Of course you can propose an extension if you'd like, but:

  1. it'd have to be general-purpose (e.g. an extension to the instruction set which calculates SHA-2 is pointless, since you can do it with a host function, and you can special-case that host function in the VM to be fast even without a dedicated instruction),
  2. you'd have to show an actual gain in real code that'd result from the extension,
  3. you'd most likely have to implement it in PolkaVM to show (2)

For example, we've recently added the Zbb extension, which accelerates EVM contracts recompiled to PolkaVM by quite a bit and significantly cuts down on their size (I don't remember the numbers off the top of my head, but the difference was significant; I don't have a link to the benchmarks on hand).
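The comment doesn't say which Zbb instructions drove the gain, so the following is only one plausible contributor: EVM words are 256-bit big-endian while RISC-V is little-endian, so recompiled contracts byte-swap a lot, and with Zbb enabled LLVM can lower a 64-bit byte swap on riscv64 to a single `rev8` instead of a shift-and-mask sequence. A minimal sketch of such a conversion (the helper name `evm_word_to_limbs` is hypothetical):

```rust
/// Converts a 32-byte big-endian EVM word into four u64 limbs,
/// least significant limb first. Hypothetical helper for illustration.
pub fn evm_word_to_limbs(word: &[u8; 32]) -> [u64; 4] {
    let mut limbs = [0u64; 4];
    for (i, limb) in limbs.iter_mut().enumerate() {
        // The last 8 bytes of the big-endian word form the least significant limb.
        let start = 32 - 8 * (i + 1);
        let mut chunk = [0u8; 8];
        chunk.copy_from_slice(&word[start..start + 8]);
        // On riscv64 with +zbb this byte swap can compile to a single `rev8`.
        *limb = u64::from_be_bytes(chunk);
    }
    limbs
}
```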

@nazar-pc
Author

Currently there are no plans, and we want to finalize the instruction set soon-ish.

Makes perfect sense to me.

e.g. an extension to the instruction set which calculates SHA-2 is pointless, since you can do it with a host function, and you can special-case that host function in the VM to be fast even without a dedicated instruction

I'll respectfully disagree with this. Having support for standardized instructions for things like SHA-2 and AES means a lot of crates out there will be able to take advantage of them without being aware of the custom host functions of a particular VM instance.

All that would be needed to take advantage of hardware acceleration is to compile the code with -C target-feature=+zknh and call the RISC-V intrinsics, or use the corresponding inline assembly if the standard library doesn't provide such intrinsics yet. In fact, the sha2 crate already supports this for RISC-V (unstably).
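A sketch of what compiling with +zknh and using inline assembly could look like, for the SHA-256 message-schedule function σ0. The riscv64 path assumes the RV64 Zknh semantics of `sha256sig0` (it reads the low 32 bits of the source register); every other target, including a VM without Zknh, gets the portable fallback. This is an illustration, not the code the sha2 crate uses.

```rust
/// SHA-256 small sigma 0: ROTR^7(x) ^ ROTR^18(x) ^ SHR^3(x), via one Zknh instruction.
#[cfg(all(target_arch = "riscv64", target_feature = "zknh"))]
pub fn sigma0(x: u32) -> u32 {
    let out: u64;
    // Assumed RV64 behavior: operates on the low 32 bits and sign-extends the result.
    unsafe {
        core::arch::asm!(
            "sha256sig0 {rd}, {rs}",
            rd = out(reg) out,
            rs = in(reg) x as u64,
            options(pure, nomem, nostack),
        );
    }
    out as u32
}

/// Portable fallback used on every other target.
#[cfg(not(all(target_arch = "riscv64", target_feature = "zknh")))]
pub fn sigma0(x: u32) -> u32 {
    x.rotate_right(7) ^ x.rotate_right(18) ^ (x >> 3)
}
```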

In my eyes, it'd be nice not to require host functions for cryptography at all and just use off-the-shelf crates as is; it reduces the barrier to entry. I understand that this is an explicit non-goal, and it makes sense at the current stage of the VM to have a simpler instruction set implementation to get off the ground.


What would be the best place to ask random unstructured questions? I might be experimenting with polkavm in the near future and documentation is not that extensive at the moment, especially for those without experience writing VMs from scratch or even using them directly.

@koute
Collaborator

koute commented Dec 13, 2024

Be that as it may, since the PolkaVM instruction set must be synchronized with the Graypaper, and there are currently ~34 other teams implementing this instruction set, any extra additions have a very high bar of entry.

What would be the best place to ask random unstructured questions?

I've enabled discussions on the repo; feel free to post there:

https://github.com/paritytech/polkavm/discussions

@nazar-pc
Author

any extra additions have a very high bar of entry

This issue is more opportunistic; I certainly don't expect it to be implemented in the short term or before 1.0, or even to be mandatory for implementations. That said, I do think it would be valuable to support it eventually, especially if it is possible to map these to native x86-64/aarch64 instructions directly.

One other reason is that Subspace consensus has a proof-of-time component based on iterated AES-CBC (heavily optimized in x86-64 hardware for many years, with only a bounded theoretical improvement possible with an ASIC). On aarch64, a "software" implementation is over 50x slower than one using hardware intrinsics.

Being able to verify (re-execute) AES under the VM absolutely requires support for acceleration in some form, and Rust already happens to have intrinsics for this; we just need to plug them into the aes crate (which we're using for verification) and we'll be in good shape.
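What makes this kind of construction a proof-of-time is the serial dependency: in CBC chaining each block is encrypted from the previous ciphertext, so a verifier re-executing it pays for every block in sequence and cannot parallelize. A minimal sketch of that chaining, with the block cipher passed in as a closure; the `encrypt` parameter stands in for AES-128 and is an assumption, and the exact Subspace construction (keys, checkpoints) is not described here.

```rust
/// One 128-bit block.
pub type Block = [u8; 16];

/// CBC-style chaining: c_i = E(c_{i-1} XOR p_i), starting from c_0 = iv.
/// Returns the final ciphertext block; every step depends on the previous one.
pub fn cbc_chain<F>(encrypt: F, iv: Block, plaintexts: &[Block]) -> Block
where
    F: Fn(&Block) -> Block,
{
    let mut prev = iv;
    for p in plaintexts {
        let mut x = prev;
        for (a, b) in x.iter_mut().zip(p.iter()) {
            *a ^= *b;
        }
        prev = encrypt(&x);
    }
    prev
}

/// Verification is plain re-execution: recompute the chain and compare.
pub fn verify_chain<F>(encrypt: F, iv: Block, plaintexts: &[Block], claimed: &Block) -> bool
where
    F: Fn(&Block) -> Block,
{
    &cbc_chain(encrypt, iv, plaintexts) == claimed
}
```

In a VM without AES acceleration, each `encrypt` call in that loop is a full software AES, which is where the slowdown described above would show up.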

AES is also a primitive to some hash functions and other cryptographic constructions in the wild.

Anyway, feel free to close this issue if it doesn't make sense to keep it open, but I hope the rationale makes some sense.

@burdges

burdges commented Dec 13, 2024

Is it possible to map these to native x86-64/aarch64 instructions directly?

AES deserves discussion because it's faster on the hardware we actually use, but I've mostly pushed people towards ChaCha anyway.

Also..

These elliptic curve host functions linked from polkadot-fellows/RFCs#113 (comment) have multi-threading baked in, which reduces their latency, especially for larger MSMs and pairings.

If you compare the single-threaded curve25519-dalek against any other ed25519 implementation on https://zka.lc/, you'll notice that curve25519-dalek is much faster for small MSMs but much slower for really large ones. In essence, curve25519-dalek has much better optimizations than anything else, but multi-threading always wins for a big MSM.

We should discuss whether hostcalls should use multi-threading under the hood, because this obviously creates weight-like issues, and it may waste total CPU time if Polkadot were fully loaded; either way, this requires actual discussion. JAM depends on these hostcalls for Sassafras performance, by the way.
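To make the trade-off concrete: a multi-scalar multiplication Σ sᵢ·Pᵢ splits naturally into independent chunks whose partial sums are added at the end, which is what a multi-threaded host call can exploit. The sketch below uses a toy additive group (integers mod a prime) as a stand-in for curve points and naive per-term products rather than the bucket (Pippenger) methods real curve libraries use; `PARALLEL_THRESHOLD` is a hypothetical cutoff below which spawning threads is not worth it.

```rust
use std::thread;

/// Toy stand-in for an elliptic-curve group: integers mod a prime under addition.
const MODULUS: u64 = 1_000_000_007;

/// Hypothetical cutoff; below it, thread spawn overhead outweighs any gain.
const PARALLEL_THRESHOLD: usize = 1024;

/// Naive serial MSM over the toy group: sum of s_i * p_i (mod MODULUS).
pub fn msm_serial(scalars: &[u64], points: &[u64]) -> u64 {
    scalars.iter().zip(points).fold(0u64, |acc, (&s, &p)| {
        let term = ((s as u128 * p as u128) % MODULUS as u128) as u64;
        (acc + term) % MODULUS
    })
}

/// Splits the MSM into per-thread chunks and adds the partial sums.
pub fn msm(scalars: &[u64], points: &[u64], threads: usize) -> u64 {
    assert_eq!(scalars.len(), points.len());
    if scalars.len() < PARALLEL_THRESHOLD || threads <= 1 {
        return msm_serial(scalars, points);
    }
    let chunk = (scalars.len() + threads - 1) / threads;
    thread::scope(|scope| {
        let handles: Vec<_> = scalars
            .chunks(chunk)
            .zip(points.chunks(chunk))
            .map(|(s, p)| scope.spawn(move || msm_serial(s, p)))
            .collect();
        handles
            .into_iter()
            .map(|h| h.join().expect("worker panicked"))
            .fold(0u64, |acc, part| (acc + part) % MODULUS)
    })
}
```

The latency drops roughly with the thread count for large inputs, but the total CPU time does not, which is the weight-accounting concern raised above.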

@athei
Member

athei commented Dec 13, 2024

any extra additions have a very high bar of entry

This issue is more opportunistic; I certainly don't expect it to be implemented in the short term or before 1.0, or even to be mandatory for implementations. That said, I do think it would be valuable to support it eventually, especially if it is possible to map these to native x86-64/aarch64 instructions directly.

I don't think the two of you disagree here. It is a high bar. AES might meet the bar, though (your opinion). It still needs somebody to prove that.

Is it possible to map these to native x86-64/aarch64 instructions directly?

The RISC-V AES instructions (K extension) are more fine-grained than the x86 AES-NI instructions, as far as I remember. That could make them difficult to translate: RISC-V instructions can apply individual transformations, while AES-NI deals in full rounds. Correct me if I'm wrong. Either way, it needs some research to determine whether a translation is possible.

@nazar-pc
Author

I just checked, and I think you're right: RISC-V scalar crypto is lower-level than AES-NI. The vector AES extension (Zvkned) is higher-level, though, and can technically be used for scalar values. It may map better onto AES-NI/VAES and, while more wasteful for single values, should still be significantly faster than a software implementation under the VM. Aarch64 seems to be closer to AES-NI, but a bit lower-level than that (it doesn't have *last variants).
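The granularity difference can be checked with a small software model of the AES round steps. The sketch defines SubBytes, ShiftRows, MixColumns and AddRoundKey on a column-major 16-byte state and composes them the way each ISA groups them: x86 `AESENC` is a full round ending in the key XOR; Arm `AESE` starts with the key XOR and `AESMC` is a separate MixColumns, so a final round needs no dedicated variant; RV64 `aes64esm` yields only one 64-bit half of the mixed state, so a round takes two of them (operands swapped for the upper half) plus two plain XORs. These are models for comparison, not the instructions; the `riscv_aes64esm` model assumes `rs1` holds state bytes 0..7 in little-endian order.

```rust
/// AES state, column-major: byte index = 4 * column + row (FIPS-197 input order).
pub type State = [u8; 16];

/// Multiplication in GF(2^8) modulo x^8 + x^4 + x^3 + x + 1.
fn gmul(mut a: u8, mut b: u8) -> u8 {
    let mut p = 0u8;
    while b != 0 {
        if (b & 1) != 0 {
            p ^= a;
        }
        let carry = (a & 0x80) != 0;
        a <<= 1;
        if carry {
            a ^= 0x1b;
        }
        b >>= 1;
    }
    p
}

/// S-box from its definition: inverse computed as x^254 (so 0 maps to 0), then the affine map.
pub fn sbox(x: u8) -> u8 {
    let (mut inv, mut base, mut e) = (1u8, x, 254u32);
    while e > 0 {
        if (e & 1) == 1 {
            inv = gmul(inv, base);
        }
        base = gmul(base, base);
        e >>= 1;
    }
    inv ^ inv.rotate_left(1) ^ inv.rotate_left(2) ^ inv.rotate_left(3) ^ inv.rotate_left(4) ^ 0x63
}

fn sub_bytes(s: State) -> State {
    s.map(sbox)
}

/// Row r is rotated left by r positions.
fn shift_rows(s: State) -> State {
    std::array::from_fn(|i| s[4 * ((i / 4 + i % 4) % 4) + i % 4])
}

pub fn mix_columns(s: State) -> State {
    let mut out = [0u8; 16];
    for c in 0..4 {
        let a = &s[4 * c..4 * c + 4];
        out[4 * c] = gmul(a[0], 2) ^ gmul(a[1], 3) ^ a[2] ^ a[3];
        out[4 * c + 1] = a[0] ^ gmul(a[1], 2) ^ gmul(a[2], 3) ^ a[3];
        out[4 * c + 2] = a[0] ^ a[1] ^ gmul(a[2], 2) ^ gmul(a[3], 3);
        out[4 * c + 3] = gmul(a[0], 3) ^ a[1] ^ a[2] ^ gmul(a[3], 2);
    }
    out
}

pub fn xor(a: State, b: State) -> State {
    std::array::from_fn(|i| a[i] ^ b[i])
}

/// x86 AESENC: ShiftRows, SubBytes, MixColumns, then XOR with the round key.
pub fn x86_aesenc(s: State, k: State) -> State {
    xor(mix_columns(sub_bytes(shift_rows(s))), k)
}

/// Arm AESE: XOR with the key first, then SubBytes and ShiftRows (no MixColumns).
pub fn arm_aese(s: State, k: State) -> State {
    shift_rows(sub_bytes(xor(s, k)))
}

/// Arm AESMC: MixColumns alone.
pub fn arm_aesmc(s: State) -> State {
    mix_columns(s)
}

fn load(b: &[u8]) -> u64 {
    let mut a = [0u8; 8];
    a.copy_from_slice(b);
    u64::from_le_bytes(a)
}

/// Model of RV64 aes64esm: low 64 bits of MixColumns(SubBytes(ShiftRows({rs2, rs1}))).
pub fn riscv_aes64esm(rs1: u64, rs2: u64) -> u64 {
    let mut s = [0u8; 16];
    s[..8].copy_from_slice(&rs1.to_le_bytes());
    s[8..].copy_from_slice(&rs2.to_le_bytes());
    load(&mix_columns(sub_bytes(shift_rows(s)))[..8])
}

/// One middle round on RV64: two aes64esm (operands swapped for the upper half) and two XORs.
pub fn riscv_round(s: State, k: State) -> State {
    let (lo, hi) = (load(&s[..8]), load(&s[8..]));
    let new_lo = riscv_aes64esm(lo, hi) ^ load(&k[..8]);
    let new_hi = riscv_aes64esm(hi, lo) ^ load(&k[8..]);
    let mut out = [0u8; 16];
    out[..8].copy_from_slice(&new_lo.to_le_bytes());
    out[8..].copy_from_slice(&new_hi.to_le_bytes());
    out
}
```

Under this model, one x86 instruction corresponds to four RV64 instructions, and the Arm pair differs only in where the key XOR sits, which is why the Arm side needs no "last" variant.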

@athei
Member

athei commented Dec 13, 2024

I think anything we add needs to have a mapping to x86 and ARM instructions; otherwise it is kind of useless. We might determine that we need AES, that the official extensions don't map, and that a host function is not sufficient.

A custom extension that maps well would probably be the way to go. But it has the same drawback that a host function has: the software (like the aes crate) needs to be aware of it to use it. The only reason to add it over a host function would be performance, if the host function call overhead is too high.

@nazar-pc
Author

I agree, but with a bit of effort it should also be possible to convert a typical pattern of scalar AES instructions into a functionally equivalent set of x86-64/aarch64 instructions, even if it is not an exact 1:1 instruction-to-instruction mapping.
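A sketch of that idea as a recompiler peephole pass over a hypothetical mini-IR (the `Insn` enum and the `NativeAesMix` op are illustrative, not PolkaVM types). It looks for the adjacent pair `aes64esm a, x, y; aes64esm b, y, x`, which together compute both halves of one SubBytes/ShiftRows/MixColumns step, and replaces it with a single fused op. On x86-64 that op could plausibly be `AESENC` with an all-zero round key, after packing the two 64-bit halves into one XMM register. The pass also has to check that the first destination doesn't overwrite a register the second instruction still reads.

```rust
/// Hypothetical guest instructions; registers are plain indices.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub enum Insn {
    /// rd = low half of MixColumns(SubBytes(ShiftRows({rs2, rs1}))).
    Aes64Esm { rd: u8, rs1: u8, rs2: u8 },
    Xor { rd: u8, rs1: u8, rs2: u8 },
    /// Fused op: writes both halves of the mixed state of (lo, hi) at once.
    NativeAesMix { lo_out: u8, hi_out: u8, lo: u8, hi: u8 },
}

/// Replaces adjacent `aes64esm a, x, y; aes64esm b, y, x` pairs with one fused op.
pub fn fuse_aes_pairs(code: &[Insn]) -> Vec<Insn> {
    let mut out = Vec::with_capacity(code.len());
    let mut i = 0;
    while i < code.len() {
        if let (
            Some(&Insn::Aes64Esm { rd: a, rs1: x, rs2: y }),
            Some(&Insn::Aes64Esm { rd: b, rs1: y2, rs2: x2 }),
        ) = (code.get(i), code.get(i + 1))
        {
            // Operands must be swapped, the first result must not clobber an input
            // of the second instruction, and the two destinations must differ.
            if x2 == x && y2 == y && a != x && a != y && a != b {
                out.push(Insn::NativeAesMix { lo_out: a, hi_out: b, lo: x, hi: y });
                i += 2;
                continue;
            }
        }
        out.push(code[i]);
        i += 1;
    }
    out
}
```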

I'm also hoping for desktop-grade RISC-V hardware in the next few years 🙄

@burdges

burdges commented Dec 14, 2024

I've suggested this approach for symmetric cryptography like hashing before:

We've some lightweight pure host function (LPFH) which avoids a context switch, but still uses native code, and enforces jump correctness somehow, including banning callbacks. We then fork the RustCrypto crate for a symmetric crypto function, pull its inner loop out into an LPFH, and provide the forked crate as sub-blake3, sub-aes, or whatever. AES could employ the native instruction within its LPFH.

This is similar to our elliptic curve scheme and incurs a similar maintenance burden as arkworks or RustCrypto evolve. Yet it should be simpler than maintaining some special instruction, because the K extension doesn't map to AES-NI.
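A sketch of the fork-and-extract pattern: the guest-side crate keeps buffering (and, in a real hash, padding), and only the per-block compression loop crosses into a backend. The `CompressBackend` trait, the toy mixing function, and the call counter are all hypothetical; the toy is not a real hash. The point illustrated is that handing the backend all complete blocks at once costs one host call per `update` rather than one per 64-byte block, which matters if each call pays for a context switch and a copy.

```rust
use std::cell::Cell;

/// Hypothetical backend boundary: compress any number of 64-byte blocks into the state.
pub trait CompressBackend {
    fn compress_blocks(&self, state: &mut [u32; 8], blocks: &[[u8; 64]]);
}

/// Stand-in for a native lightweight host function; counts how often it is entered.
pub struct CountingHost {
    pub calls: Cell<usize>,
}

impl CompressBackend for CountingHost {
    fn compress_blocks(&self, state: &mut [u32; 8], blocks: &[[u8; 64]]) {
        self.calls.set(self.calls.get() + 1);
        for block in blocks {
            // Toy mixing only, NOT a real hash compression function.
            for (i, word) in block.chunks_exact(4).enumerate() {
                let w = u32::from_le_bytes([word[0], word[1], word[2], word[3]]);
                let j = i % 8;
                state[j] = state[j].rotate_left(5).wrapping_add(w) ^ state[(j + 1) % 8];
            }
        }
    }
}

/// Guest-side hasher from the forked crate: buffering stays in the guest.
pub struct Hasher<'a, B: CompressBackend> {
    backend: &'a B,
    state: [u32; 8],
    buf: Vec<u8>,
}

impl<'a, B: CompressBackend> Hasher<'a, B> {
    pub fn new(backend: &'a B) -> Self {
        Hasher { backend, state: [0u32; 8], buf: Vec::new() }
    }

    pub fn update(&mut self, data: &[u8]) {
        self.buf.extend_from_slice(data);
        let full = self.buf.len() / 64;
        if full == 0 {
            return;
        }
        let blocks: Vec<[u8; 64]> = self.buf[..full * 64]
            .chunks_exact(64)
            .map(|c| {
                let mut b = [0u8; 64];
                b.copy_from_slice(c);
                b
            })
            .collect();
        // One backend call for all complete blocks; the tail stays buffered.
        self.backend.compress_blocks(&mut self.state, &blocks);
        self.buf.drain(..full * 64);
    }

    pub fn state(&self) -> [u32; 8] {
        self.state
    }
}
```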

@koute
Collaborator

koute commented Jan 1, 2025

We've some lightweight pure host function (LPFH) which avoids a context switch, but still uses native code

Yes, that's what I meant by saying "you can do it with a host function, and you can special-case that host function in the VM to be fast even without a dedicated instruction".

Anyway, what Alex said is correct - we want to only introduce instructions which map cleanly to AMD64 and Aarch64. A major design point of the PolkaVM ISA is that no instruction should require any complicated handling in the VM itself when recompiling it into native code (it should be easy-ish to write a recompiler), and that can make some stuff in RISC-V not appropriate (which some of the crypto instructions may or may not be; I haven't investigated them in much detail).

@burdges

burdges commented Jan 4, 2025

I suppose host function overhead breaks down primarily into context switches and memory copies, with other overhead being very small, yes? And memory copies are being discussed in paritytech/polkadot-sdk#6761 and polkadot-fellows/RFCs#4, I think.
