Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[p2app] Fix concurrency, undefined behavior, and UDP underflow + minor performance improvements #10

Closed

Conversation

dchristle
Copy link
Contributor

What changes are proposed in this PR?

  • Uses atomics & mutexes to enforce synchronization between threads (replacing existing binary semaphores).
  • Use memcpy within helper functions to prevent undefined behavior caused by dereferencing unaligned pointers.
  • Fix UDP data structure buffer underflow in OutgoingDDCIQ.
  • Compile with -O2 and -flto by default.
  • Try to reduce overhead from sys calls by increasing usleep durations, using pread/pwrite instead of separate lseek & read/write, simplified or SIMD-based memory copy operations, and other cleanups.

Why are these changes needed?

This PR incorporates several bug fixes, cleanups, and minor performance enhancements to p2app. The main aim is to fix various bugs - concurrency, undefined behavior, and an underflow in UDP - to improve p2app's stability/behavior. The undefined behavior is caused by dereferencing uint16_t and uint32_t-sized pointers at unaligned memory addresses. There are also global variables (usually prefixed with G or defined in threaddata.h) that aren't atomic or protected by a mutex, which means the compiler is free to reorder them & that changes (especially on ARM processors, which have a weak memory model) in a variable in one thread are not guaranteed to be reflected in other threads. I found the undefined behavior & underflow bugs can be detected by running -fsanitize=undefined and -fsanitize=address on the current main branch code.

Fixing these fixes the transmit power issue seen in #8 when using -O2 or -O3 in gcc.

These changes also seem to fix another issue I've observed that crops up after operating Saturn via Thetis for several hours, where the relay clicks heard when turning off transmit seem to get more spaced out in time, i.e. I can hear distinct clicks rather than having all clicks occur quickly enough that they sound like a single click. This is harder to measure since it happens only after a few hours, but perhaps some concurrency/ordering issue that was present is now fixed.

Finally, the CPU utilization appears of-order ~10% lower due to longer usleep & other minor perf changes. The CPU utilization is dominated by lots of sys calls & usleep mechanics, so these are a good area to focus on in future changes. If we can cut these down, it may solve the high CPU usage seen at high sampling rates in #5.

How were these changes tested?

Several hours worth of contacts on FT8 using Thetis v2.10.3.6 dev_3 seem to work similar to the old version. The spectrum seen momentarily when switching TX off looks visually OK. The signal spectrum as detected by remote WebSDR instances looks similar, too. Testing the mic & other functionalities like the G2 display panel should probably be done before merging this, as my tests are only on digital modes with VAC.

Copy link
Owner

@laurencebarker laurencebarker left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These changes seem OK. There are a few format changes that don't match the coding style for example if(condition) {
(my coding style has the brackets in vertical pairs, so I can see the structure)

Copy link
Owner

@laurencebarker laurencebarker left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think the change to outddciq.c on line 525 is an error (that will try to print an integer value as a string)

Copy link
Owner

@laurencebarker laurencebarker left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd not encountered poll() but this seems reasonable

Copy link
Owner

@laurencebarker laurencebarker left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

seems sensible

there are some formatting changes I will need to reset eg if(condition) { should have the { on the next line

Copy link
Owner

@laurencebarker laurencebarker left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

seems OK

Copy link
Owner

@laurencebarker laurencebarker left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

seems OK

Copy link
Owner

@laurencebarker laurencebarker left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

seem OK

Copy link
Owner

@laurencebarker laurencebarker left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

not sure why the FIFO read usleeps changed to 5ms. This seems remarkably long. I'd want to see some analysis of why these times are OK. Yes it reduces overhead - but transferring data is the only task the pi has to do.

There is a lot of debug output added around line 400 on outgoing ddciq.c.

Copy link
Owner

@laurencebarker laurencebarker left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

seems OK.

Copy link
Owner

@laurencebarker laurencebarker left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A lot of this commit seems to have reverted something close to my original code for reading the DDC packed data structure. And time delays seem to have been reset back to 500us.

Copy link
Owner

@laurencebarker laurencebarker left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

seems OK

Copy link
Owner

@laurencebarker laurencebarker left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't pretend to understand the SIMD instructions!

Copy link
Owner

@laurencebarker laurencebarker left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

seems OK

Copy link
Owner

@laurencebarker laurencebarker left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

OK

Copy link
Owner

@laurencebarker laurencebarker left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

seems oK

Copy link
Owner

@laurencebarker laurencebarker left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

seems OK

Copy link
Owner

@laurencebarker laurencebarker left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

seems ok

Reformat parts of the code & fix minor spellings.

Fix XOR 2^32, which is equal to 34, to be 1 << 32.

Misc. formatting.
Remove debugaids.o from link.

Fix format string for pointer type.

Make constant 1's unsigned, since left shift of 1 by 31 places cannot be represented in type int.
…th memcpy.

Add ShutdownFrontPanelHandler declaration to header.
Use correct endian conversion function for uint16 (short).

Simplify ReadFIFOMonitorChannel with bitshift, use memcpy to fix alignment issue, change Current to uint16_t to match actual data size.

Add general register read/write mutex to more register read/mutate functions.

Remove unneeded Address and Channel from SetupFIFOMonitorChannel.

Enable -O2 optimization level by default.

Cleanup old header and rateword check code.
* Remove some manual pointer arithmetic, e.g. on ThreadData, to ensure code is safer.

* Use pwrite instead of a separate lseek and write, to reduce syscall overhead.
…ay of sockaddr_in instead of sizeof(sockaddr_in). Remove debug statements.
…ead.

Ensure Current pointer for ReadFIFOMonitorChannel is properly casted.

Initialize PrevSDRActive and RegVal before use.

Convert some usages of unsigned int to size-specific uint32_t.

Add Raspberry Pi CM4 arch-specific flags to Makefile, and enable link-time optimization.

Use a 0.5ms usleep before the first FIFO check to improve likelihood of available data.

Move CodecRegisterWrite to saturndrivers so all register mutexes are alongside eachother.

Cleanup codecwrite files.

Safe fetching of uint16 values.

Prevent out-of-bounds read in DDC loop with an if check.

Fix size to ssize_t.

Use safe reads & local variables for IncomingDUC.

Cleanup unused imports.

Use a single memcpy to transfer src to dest in OutDDCIQ.
…as static. Add a check to prevent an array out-of-bounds access bug for an interleaved DDC with integer near the end of the range.
…ck for VCWKEYDOWN is redundant with the second one, so I've removed it.
…onization. Move the direct register operations out of version.c and into saturnregisters.c, for better encapsulation. Use facade functions to encapsulate the complex logic.

Simplify FIFO depth initializers, eliminating a branch in the FIFO read function.
…les_SIMD. Handle the edge case if VIQSAMPLESPERFRAME is ever odd, or 1.
@dchristle dchristle force-pushed the dchristle/p2_glob_mem_barriers branch from 3127c3c to 4bb89f2 Compare November 7, 2024 21:51
@dchristle dchristle closed this Dec 10, 2024
@dchristle dchristle deleted the dchristle/p2_glob_mem_barriers branch December 10, 2024 00:19
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants