-
Notifications
You must be signed in to change notification settings - Fork 41
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add big endian support #771
base: master
Are you sure you want to change the base?
Conversation
__builtin_bswapX usage seems justified, and if we ever need to port to a big-endian compiler that doesn't support them it's easy enough to do the ugly bitwise integer operations or add support for that hypothetical compiler's equivalent builtin.
|
Why do you sometimes introduce a temporary variable to swap, but other times you swap the value in-place? |
I think this is an N64_BUILD quirk specific to floats, but I didn't want to introduce the N64_BUILD in here. The problem is that doing a bswap on a float causes a floating point exception, the exception could be silenced but would need a libdragon change and would also silence any other float exceptions. |
This is a valid reason, maybe a comment to that effect would be a good idea to prevent overzealous optimization in future? |
float f = 0; | ||
sd_peek_buf(reader, (char *)&f, 4); | ||
#ifdef BIG_ENDIAN_BUILD | ||
f = __builtin_bswap16(f); | ||
#endif |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
__builtin_bswap16 does not take an argument of type float, it takes a uint16_t.
The __builtin_bswap16(f)
call is equivalent to __builtin_bswap16((uint16_t)f)
, which is why the N64 build was encountering floating point errors.
Also: bswap16 is the wrong number of bits, should be bswap32.
(same as other review comment: please implement sd_peek_float in terms of sd_peek_dword)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Well, the exceptions were definitely from doing float f = __buildin_bswap32((float)x);
The only reason this function is wrong is because nothing in the engine calls it, and I just didn't hit the issue.
#ifdef BIG_ENDIAN_BUILD | ||
int len = maxlen; | ||
for(int i = 0; i < len / 4; i += 1) { | ||
((int *)buffer)[i] = __builtin_bswap32(((int *)buffer)[i]); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
char *buffer
is insufficiently aligned to be a int*
same comment applies in sd_read_str.
please memcpy to a uint32_t local variable (and uint16_t local var for the final-2-bytes bytes-swap)
float memread_float(memreader *reader) { | ||
float r; | ||
#ifdef BIG_ENDIAN_BUILD | ||
uint32_t fl; | ||
memcpy(&fl, reader->buf + reader->pos, sizeof(fl)); | ||
fl = __builtin_bswap32(fl); | ||
reader->pos += sizeof(fl); | ||
#else | ||
memcpy(&r, reader->buf + reader->pos, sizeof(r)); | ||
reader->pos += sizeof(r); | ||
#endif | ||
return r; | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think memread_float (and similar: memwrite_float, sd_read_float, etc..)
should be implemented in terms of their respective integer read-write functions.
for memread_float, this would look like:
uint32_t u = memread_dword(reader);
float f;
memcpy(&f, &u, 4);
return f;
this way, they don't need to perform any byte swapping and can be focused on just the uint-to-float bit cast.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Oh,, I see your point.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
memread_dword performs a byteswap.
memread_float is currently only used to read some tournament mode pilot variables we ignore, unsure what you mean by tested and working
uint16_t r; | ||
#ifdef BIG_ENDIAN_BUILD | ||
r = __builtin_bswap16(*((uint16_t *)(reader->buf + reader->pos))); | ||
#else | ||
memcpy(&r, reader->buf + reader->pos, sizeof(r)); | ||
#endif | ||
reader->pos += sizeof(r); | ||
return r; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
buf+pos is not insufficiently aligned to be a (uint16_t *)
please use the memcpy from the existing code, and keep your #ifdef block as small as possible (your sd_peek_dword is a good example of how it should look).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I've thought about this, and it hasn't been an issue with the current game assets. I don't want to slow it down more than needed.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
on a platform with "free" unaligned reads (x86, N64?), memcpy'ing the four bytes should compile to equivalent assembly as the hazardous uint16 reads (which are innocuous on x86(/n64?) without ubsan).
Dereferencing an unaligned pointer is undefined behavior, and traps on more-or-less common architectures (the ARM family, which even supported big endian back in armv5).
Have you observed a pessimization when using memcpy in these functions, or a big difference in assembly? I would expect GCC to be very good at optimizing a four byte memcpy.
The byteswapping in sd_read_str is pure magic to me, as I'm not familiar with big endian systems. |
The big endian support assumes the GCC compiler and the presence of __builtin_bswapX.