Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add big endian support #771

Open
wants to merge 1 commit into
base: master
Choose a base branch
from
Open

Add big endian support #771

wants to merge 1 commit into from

Conversation

nopjne
Copy link

@nopjne nopjne commented Nov 18, 2024

The big endian support assumes the GCC compiler and the presence of __builtin_bswapX.

@Nopey
Copy link
Contributor

Nopey commented Nov 18, 2024

__builtin_bswapX usage seems justified, and if we ever need to port to a big-endian compiler that doesn't support them it's easy enough to do the ugly bitwise integer operations or add support for that hypothetical compiler's equivalent builtin.

reviewers: please ignore the MSVC build failure, all MSVC CI pipelines are expected to fail until PR 769 or an alternative is merged. EDIT: 769 is resolved

@Vagabond
Copy link
Member

Why do you sometimes introduce a temporary variable to swap, but other times you swap the value in-place?

@nopjne
Copy link
Author

nopjne commented Nov 18, 2024

Why do you sometimes introduce a temporary variable to swap, but other times you swap the value in-place?

I think this is an N64_BUILD quirk specific to floats, but I didn't want to introduce the N64_BUILD in here. The problem is that doing a bswap on a float causes a floating point exception, the exception could be silenced but would need a libdragon change and would also silence any other float exceptions.

@Vagabond
Copy link
Member

Why do you sometimes introduce a temporary variable to swap, but other times you swap the value in-place?

I think this is an N64_BUILD quirk specific to floats, but I didn't want to introduce the N64_BUILD in here. The problem is that doing a bswap on a float causes a floating point exception, the exception could be silenced but would need a libdragon change and would also silence any other float exceptions.

This is a valid reason, maybe a comment to that effect would be a good idea to prevent overzealous optimization in future?

Comment on lines 214 to +218
float f = 0;
sd_peek_buf(reader, (char *)&f, 4);
#ifdef BIG_ENDIAN_BUILD
f = __builtin_bswap16(f);
#endif
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

__builtin_bswap16 does not take an argument of type float, it takes a uint16_t.
The __builtin_bswap16(f) call is equivalent to __builtin_bswap16((uint16_t)f), which is why the N64 build was encountering floating point errors.

Also: bswap16 is the wrong number of bits, should be bswap32.

(same as other review comment: please implement sd_peek_float in terms of sd_peek_dword)

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Well, the exceptions were definitely from doing float f = __buildin_bswap32((float)x);
The only reason this function is wrong is because nothing in the engine calls it, and I just didn't hit the issue.

#ifdef BIG_ENDIAN_BUILD
int len = maxlen;
for(int i = 0; i < len / 4; i += 1) {
((int *)buffer)[i] = __builtin_bswap32(((int *)buffer)[i]);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

char *buffer is insufficiently aligned to be a int*
same comment applies in sd_read_str.

please memcpy to a uint32_t local variable (and uint16_t local var for the final-2-bytes bytes-swap)

Comment on lines 114 to 126
float memread_float(memreader *reader) {
float r;
#ifdef BIG_ENDIAN_BUILD
uint32_t fl;
memcpy(&fl, reader->buf + reader->pos, sizeof(fl));
fl = __builtin_bswap32(fl);
reader->pos += sizeof(fl);
#else
memcpy(&r, reader->buf + reader->pos, sizeof(r));
reader->pos += sizeof(r);
#endif
return r;
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think memread_float (and similar: memwrite_float, sd_read_float, etc..)
should be implemented in terms of their respective integer read-write functions.

for memread_float, this would look like:

   uint32_t u = memread_dword(reader);
   float f;
   memcpy(&f, &u, 4);
   return f;

this way, they don't need to perform any byte swapping and can be focused on just the uint-to-float bit cast.

Copy link
Author

@nopjne nopjne Nov 18, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh,, I see your point.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

memread_dword performs a byteswap.

memread_float is currently only used to read some tournament mode pilot variables we ignore, unsure what you mean by tested and working

Comment on lines 64 to 71
uint16_t r;
#ifdef BIG_ENDIAN_BUILD
r = __builtin_bswap16(*((uint16_t *)(reader->buf + reader->pos)));
#else
memcpy(&r, reader->buf + reader->pos, sizeof(r));
#endif
reader->pos += sizeof(r);
return r;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

buf+pos is not insufficiently aligned to be a (uint16_t *)

please use the memcpy from the existing code, and keep your #ifdef block as small as possible (your sd_peek_dword is a good example of how it should look).

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've thought about this, and it hasn't been an issue with the current game assets. I don't want to slow it down more than needed.

Copy link
Contributor

@Nopey Nopey Nov 18, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

on a platform with "free" unaligned reads (x86, N64?), memcpy'ing the four bytes should compile to equivalent assembly as the hazardous uint16 reads (which are innocuous on x86(/n64?) without ubsan).

Dereferencing an unaligned pointer is undefined behavior, and traps on more-or-less common architectures (the ARM family, which even supported big endian back in armv5).

Have you observed a pessimization when using memcpy in these functions, or a big difference in assembly? I would expect GCC to be very good at optimizing a four byte memcpy.

@Nopey
Copy link
Contributor

Nopey commented Nov 19, 2024

The byteswapping in sd_read_str is pure magic to me, as I'm not familiar with big endian systems.
I assume you've tested with and without the final 2-bytes bswap16 and found it necessary

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants