Bit-packed GF2 #583
base: release/0.4.x
Thanks for working on / thinking through this. Here are some initial thoughts. I think a good API would be to have

    x = galois.GF2([1, 0, 1, 1])
    print(repr(x))
    # GF([1, 0, 1, 1], order=2)
    x_bp = np.packbits(x)
    print(repr(x_bp))
    # GF([176], order=2, bitpacked=True)

Whereas, creating a

    x_bp = galois.GF2BP([176])
    print(repr(x_bp))
    # GF([176], order=2, bitpacked=True)
    x = np.unpackbits(x_bp)
    print(repr(x))
    # GF([1, 0, 1, 1, 0, 0, 0, 0], order=2)

It's an open question whether to use only If we use

    def __call__(self, ufunc, method, inputs, kwargs, meta):
        output = super().__call__(ufunc, method, inputs, kwargs, meta)
        output._unpacked_shape = inputs[0]._unpacked_shape
        return output

Looking at the ufuncs for GF2, most of them use bitwise arithmetic (XOR, AND, etc).

    class UFuncMixin_2_1(UFuncMixin):
        """
        A mixin class that provides explicit calculation arithmetic for GF(2).
        """

        def __init_subclass__(cls) -> None:
            super().__init_subclass__()
            cls._add = add_ufunc(cls, override=np.bitwise_xor)
            cls._negative = negative_ufunc(cls, override=np.positive)
            cls._subtract = subtract_ufunc(cls, override=np.bitwise_xor)
            cls._multiply = multiply_ufunc(cls, override=np.bitwise_and)
            cls._reciprocal = reciprocal(cls)
            cls._divide = divide(cls)
            cls._power = power(cls)
            cls._log = log(cls)
            cls._sqrt = sqrt(cls)

We can probably make all of the ufuncs for GF2 (reciprocal, divide, etc) also work for bit-packed GF2 arrays. If that's the case, maybe a more elegant solution is to not have a separate class for

I'm uncomfortable bumping the minimum NumPy version to 2.0. It's too minimally adopted and too much infrastructure relies on v1. A compromise could be having a v1 and v2 implementation of matmul for
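To illustrate why the XOR and AND overrides carry over to packed data, here is a minimal sketch that uses plain NumPy `uint8` arrays rather than `galois` arrays. It assumes only `np.packbits` and `np.unpackbits` (with the `count` argument); the variable names are illustrative and not part of the PR.

```python
import numpy as np

# Two GF(2) vectors stored one element per byte (plain uint8, not galois arrays).
a = np.array([1, 0, 1, 1, 0, 1, 0, 0, 1, 1], dtype=np.uint8)
b = np.array([0, 1, 1, 0, 0, 1, 1, 0, 1, 0], dtype=np.uint8)

# Pack 8 elements per byte; the final byte is padded with zero bits.
a_bp = np.packbits(a)
b_bp = np.packbits(b)

# GF(2) addition is XOR and multiplication is AND, so both act bytewise
# on the packed representation without unpacking first.
sum_bp = np.bitwise_xor(a_bp, b_bp)
prod_bp = np.bitwise_and(a_bp, b_bp)

# Unpack with `count` so the zero padding is dropped again.
sum_unpacked = np.unpackbits(sum_bp, count=a.size)
prod_unpacked = np.unpackbits(prod_bp, count=a.size)
```

The padding bits stay zero under both XOR and AND, which is why these two operations need no special handling of the final byte.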
Thanks, @mhostetter. I'll work on incorporating this feedback. One thing I wanted to clarify, since there seems to be a misunderstanding.
That is my understanding of your current implementation, but also something I'm suggesting we change. To me, having an internal data conversion seems dubious. What is the motivation for doing the conversions internally? In my proposal, if a user wanted to start with an unpacked array, they would

My issue with the internal (mandatory) data conversion is that it is not idempotent. For instance, doing

    import numpy as np

    x = np.packbits([0, 0, 0, 1])
    print(x)
    # [16]
    x = np.packbits(x)
    print(x)
    # [128]
    x = np.packbits(x)
    print(x)
    # [128]
Also, I'll try to think about an easy way to plug into the test infrastructure.
To your question about whether it's necessary to track the unpacked shape: I think so, because otherwise when you unpack you won't be able to supply the
Yes, I agree with that. I was thinking we could populate a property like So the user has Later when the user wants to unpack it, calling I can help you implement the interception of
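A minimal sketch of why the unpacked length has to be remembered, using plain NumPy only. The helpers `pack_with_count` and `unpack_with_count` are hypothetical names invented for this illustration; they are not the property or interception discussed above.

```python
import numpy as np

def pack_with_count(bits, axis=-1):
    """Pack along `axis` and return the packed array plus the original
    length of that axis (hypothetical helper, not part of galois)."""
    bits = np.asarray(bits, dtype=np.uint8)
    return np.packbits(bits, axis=axis), bits.shape[axis]

def unpack_with_count(packed, count, axis=-1):
    """Unpack along `axis`, dropping the padding bits via `count`."""
    return np.unpackbits(packed, axis=axis, count=count)

a = np.random.default_rng(0).integers(0, 2, size=(3, 10), dtype=np.uint8)
packed, count = pack_with_count(a)

# Without the count, the padding bits come back as extra columns.
naive = np.unpackbits(packed, axis=-1)        # shape (3, 16)
restored = unpack_with_count(packed, count)   # shape (3, 10)

# Tracking the element count or the appended-zero count is equivalent:
pad = 8 * packed.shape[-1] - count
```

Either bookkeeping choice recovers the other from the packed shape, so the decision is mostly about which one is easier to propagate through operations.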
I've implemented
I'm now tracking the axis element count because it seemed more straightforward. Is there a specific reason why you wanted to track the appended zero count?
Coming back to this. I was trying to find a case of broadcasting where the output shape would not be the same as the input shape and couldn't. NumPy will complain about the broadcasting if the shapes aren't compatible, both for the unpacked and the packed case.

    import numpy as np
    from galois import GF2

    a = GF2.Random((10, 10))
    b = GF2.Random((1, 10))
    x = np.packbits(a)
    y = np.packbits(b)
    print((a + b).shape)
    print((x + y).shape)

If the shapes aren't broadcastable, then it will throw a

    a = GF2.Random((10, 10))
    b = GF2.Random((2, 10))
    x = np.packbits(a)
    y = np.packbits(b)
    print((b + a).shape)
    print((y + x).shape)

Was your concern with how I'm choosing the first input, which would cause an issue if the operands were switched (as in the last example)? If so, then I think a max over the inputs
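One order-independent way to derive the output's unpacked shape is to broadcast the tracked unpacked shapes of all inputs, rather than copying the first input's. The sketch below assumes NumPy 1.20 or later for `np.broadcast_shapes`; the helper name `broadcast_unpacked_shape` is hypothetical and this is not necessarily what the PR ends up doing.

```python
import numpy as np

def broadcast_unpacked_shape(*unpacked_shapes):
    """Combine the tracked unpacked shapes of all ufunc inputs
    (hypothetical helper; raises ValueError if they are incompatible)."""
    return np.broadcast_shapes(*unpacked_shapes)

# The result does not depend on operand order.
shape_ab = broadcast_unpacked_shape((10, 10), (1, 10))
shape_ba = broadcast_unpacked_shape((1, 10), (10, 10))

# Incompatible shapes are rejected, just as for unpacked arithmetic.
try:
    broadcast_unpacked_shape((10, 10), (2, 10))
    incompatible = False
except ValueError:
    incompatible = True
```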
Yes, that was a concern.
Using ufunc methods, like

    import numpy as np

    a = np.random.randint(0, 2, 10)
    b = np.random.randint(0, 2, 10)
    x = np.packbits(a)
    y = np.packbits(b)
    print((a * b).shape)
    print((x * y).shape)
    print(np.multiply.outer(a, b).shape)
    print(np.multiply.outer(x, y).shape)
    # (10,)
    # (2,)
    # (10, 10)
    # (2, 2)

Also, here's a crazy example of broadcasting over multiple axes.

    import numpy as np

    a = np.random.randint(0, 2, (1, 2, 3))
    b = np.random.randint(0, 2, (2, 2, 1))
    x = np.packbits(a)
    y = np.packbits(b)
    print((a * b).shape)
    print((x * y).shape)
    print(np.multiply.outer(a, b).shape)
    print(np.multiply.outer(x, y).shape)
    # (2, 2, 3)
    # (1,)
    # (1, 2, 3, 2, 2, 1)
    # (1, 1)

I think the question is, can we track the axis elements (or appended zeros) through all of the broadcasting and manipulation?
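As a rough illustration of where tracking gets tricky, the sketch below packs along the last axis (`axis=-1`) instead of flattening, using plain NumPy. It assumes that packing convention for the sake of the example. When an operand has length 1 on the packed axis, its single bit lands in the most significant position of a byte, so a bytewise AND no longer matches the broadcast elementwise product. Expanding that operand along the packed axis before packing restores agreement.

```python
import numpy as np

a = np.array([[[1, 0, 1], [1, 1, 1]]], dtype=np.uint8)  # shape (1, 2, 3)
b = np.ones((2, 2, 1), dtype=np.uint8)                   # shape (2, 2, 1)
expected = a * b                                         # shape (2, 2, 3)

# Naive: pack both along the last axis and AND the bytes.
a_bp = np.packbits(a, axis=-1)  # shape (1, 2, 1)
b_bp = np.packbits(b, axis=-1)  # shape (2, 2, 1), only the top bit is set
naive = np.unpackbits(a_bp & b_bp, axis=-1, count=3)

# Fix: broadcast b along the packed axis *before* packing.
b_full = np.ascontiguousarray(np.broadcast_to(b, (2, 2, 3)))
b_full_bp = np.packbits(b_full, axis=-1)
fixed = np.unpackbits(a_bp & b_full_bp, axis=-1, count=3)
```

Broadcasting over the leading (unpacked) axes works unchanged; only the packed axis needs this extra care.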
Yes, that is a good question. I'll explore that soon. Just wanted to give you an update on where I'm at. I started looking at making a bit-packed version of Currently, I've got index update rules to handle indexing without defaulting to unpacking the whole matrix. I think I can simplify these rules if I convert them into a canonical form first. There's a test file I had ChatGPT generate to provide some examples of all the ways one can index. My reason for going down this path is that I don't see much utility in the class unless at least some of the more common NumPy functions are supported in bit-packed form. For us, matrix inverses are one of those.
Finished a bitpacked implementation of

    # %%
    a = GF2.Random((1024, 1024), seed=4)
    x = np.packbits(a)
    %timeit np.linalg.inv(a)  # ~3s
    %timeit np.linalg.inv(x)  # ~734ms
    assert np.array_equal(a, np.unpackbits(x))

So, there's a 4.5x speedup with 1k matrices. I also ran some general tests to see whether unpacking portions of matrices was worthwhile over unpacking the full matrix (to validate having a custom

    # %%
    M, N = 1800, 1000
    a = np.random.randint(0, 2, size=(M, N), dtype=np.uint8)
    A = np.packbits(a, axis=-1)  # shape: (M, ceil(N/8))
    # ~434 us
    %timeit np.unpackbits(A, axis=-1, count=N)
    # %%
    # ~3 us
    %timeit np.unpackbits(A[np.random.randint(0, M)], axis=-1, count=N)

So, >100x speedup. I think it's worthwhile to keep this approach around. I think if one wants to get at the raw data, then they just perform a

I'll plan to look into broadcasting and clean up / normalize the indexing code when I'm back on Thursday.
    # 9. Using np.newaxis (reshaped array assignment)
    arr = GF([1, 0, 1, 1])
    arr = np.packbits(arr)
    reshaped = arr[:, np.newaxis]  # should this be using arr's data (as would be the case without packbits) or a new array?
Still not sure what to do with this case as indexing into a bitpacked array will return an unpacked array copy.
I've cleaned up the indexing routines and have started to look at broadcasting. I found a bug in my
I've added and expanded upon your broadcasting examples in
Initially, I started with an approach that defined bit-packing at ArrayMeta, so it could be used in Array and FieldArray. Then, I would handle implicit bit packing and unpacking as necessary depending on the operation. This soon turned into a game of whack-a-mole where I was needing to override a lot of numpy functions on top of ndarray methods.

This version is much simpler, smaller scope, and has limited functionality: basically arithmetic operations and matrix-vector/matrix-matrix multiplication. I figure let's start with this and then expand scope as needed.

Rather than introduce fields higher up in the class/meta hierarchy I've opted for overriding specific functionality in a custom GF2BP class. There are simple conversions for going between GF2 and GF2BP via .astype, which implies that you will receive a new array.

The numpy v2 and numba v0.60 bump was necessary, so that I could make use of np.bitwise_count. This allows a roughly 4x speed-up in matrix-matrix multiplication.

I started to work on tests and tried to introduce GF2BP under the FIELDS fixture construction, but there won't be an easy way to xfail the tests I know would fail. Ideally, I'd like to piggyback on the tests in test_arithmetic.py and test_linalg.py, but only in comparison to GF2 after unpacking. So, let me know if you have any good ideas for this without having to duplicate tests.
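For context on why a population count helps here, the following is a minimal plain-NumPy sketch of a GF(2) matrix-vector product on bit-packed rows: AND the packed row with the packed vector, count the set bits, and take the parity. It is not the PR's implementation. The function names `popcount` and `matvec_gf2_packed` are hypothetical, and the lookup-table fallback for NumPy 1.x is an assumption added so the sketch runs without `np.bitwise_count`.

```python
import numpy as np

def popcount(x):
    """Per-byte population count. Uses np.bitwise_count when available
    (NumPy >= 2.0), otherwise a 256-entry lookup table."""
    if hasattr(np, "bitwise_count"):
        return np.bitwise_count(x)
    table = np.unpackbits(np.arange(256, dtype=np.uint8)[:, None], axis=1).sum(axis=1)
    return table[x]

def matvec_gf2_packed(A_bp, v_bp):
    """GF(2) product A @ v where the rows of A and the vector v are
    bit-packed along the last axis (hypothetical helper)."""
    overlap = np.bitwise_and(A_bp, v_bp)                   # (M, ceil(N/8))
    ones = popcount(overlap).sum(axis=-1, dtype=np.int64)  # set bits per row
    return (ones & 1).astype(np.uint8)                     # parity = sum mod 2

rng = np.random.default_rng(0)
A = rng.integers(0, 2, size=(5, 13), dtype=np.uint8)
v = rng.integers(0, 2, size=13, dtype=np.uint8)

result = matvec_gf2_packed(np.packbits(A, axis=-1), np.packbits(v))
expected = (A.astype(np.int64) @ v.astype(np.int64)) % 2
```

Comparing against the unpacked product, as in the last two lines, is also the shape of check that a shared test against GF2 could take.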