Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix POSIX transport profiles #321

Open
wants to merge 4 commits into
base: develop
Choose a base branch
from

Conversation

jseldent
Copy link
Contributor

This pull requests adds proper handling of EINTR to the UDP and TCP transport profiles on POSIX platforms.

Some platforms, like the FreeRTOS simulator for Linux, generate a lot signals. This causes blocking I/O functions (like connect(), poll(), send() and recv()) to fail with errno == EINTR. Currently, Micro-XRCE-DDS-Client assumes this means the connection has been lost. Instead, it should restart the I/O operation and try again. This pull request implements the auto-restart behavior.

Additionally, #318 contains a bug in getaddrinfo() (the socket type for TCP should be SOCK_STREAM instead of SOCK_DGRAM). Also, it didn't register a signal handler for SIGPIPE, like the polling TCP transport does. This pull request fixes both of those issues.

I've tested these changes with the FreeRTOS simulator for POSIX and they fix the communication issues we had there.

@richiprosima
Copy link
Contributor

Build status:

  • Linux Build Status
  • Windows Build Status

@pablogs9 pablogs9 requested review from pablogs9 and Acuadros95 June 15, 2022 11:06
@jseldent
Copy link
Contributor Author

The proposed implementation restarts blocking operations like poll() and recv() (with SO_RCVTIMEO) with the same timeout after an EINTR interrupt. This could lead to uxr_read_udp_data_platform()/uxr_read_tcp_data_platform() waiting for longer than the specified timeout.

This could be prevented by setting the timeout to 0 after EINTR. But that could cause those functions to return early. Which would you prefer?

@pablogs9
Copy link
Member

We need uxr_read_*_data_platform to wait at maximum the specified time in the timeout if no data arrives.

@jseldent
Copy link
Contributor Author

Ok. I've pushed a commit that sets the timeout to 0 after the first occurrence of an EINTR interrupt.

@richiprosima
Copy link
Contributor

Build status:

  • Linux Build Status
  • Windows Build Status

@pablogs9
Copy link
Member

Could you explain the purpose of this last commit?

@Acuadros95
Copy link
Contributor

You can also recalculate the timeout as we do here:

int remaining_time = timeout_ms;
uxr_flash_output_streams(session);
do
{
listen_message_reliably(session, remaining_time);
remaining_time = timeout_ms - (int)(uxr_millis() - start_timestamp);
}
while (remaining_time > 0);

@jseldent
Copy link
Contributor Author

Let's say the timeout is 1 second. And after half a second a signal is received and poll() (or recv()) returns with errno == EINTR. If we then restart those functions with the same timeout, and no data is available, the uxr_read_*_data_platform() function will take 1.5 seconds. Which exceeds the timeout. By setting the timeout to 0 after the interrupt, the function returns after 0.5 seconds. Which is too soon, but at least not too late.

If you prefer, I can add calls to uxr_millis() to compute the remaining time in case of an interrupt. But since that already happens at the higher layers, I'm not sure how useful it would be to do it also in the low-level I/O functions.

@pablogs9
Copy link
Member

Yes please, make it block the required time.

@jseldent jseldent force-pushed the fix-posix-transport-profiles branch from 41b292f to 18b285a Compare June 15, 2022 15:32
@jseldent
Copy link
Contributor Author

I've pushed a fix to compute the remaining time.

@richiprosima
Copy link
Contributor

Build status:

  • Linux Build Status
  • Windows Build Status

@pablogs9
Copy link
Member

image

@jseldent jseldent force-pushed the fix-posix-transport-profiles branch from 18b285a to 705369b Compare June 16, 2022 16:43
@jseldent
Copy link
Contributor Author

I've included <uxr/client/util/time.h>. That should fix it.

@richiprosima
Copy link
Contributor

Build status:

  • Linux Build Status
  • Windows Build Status

@pablogs9
Copy link
Member

@Acuadros95 can we test this in integration tests?

@Acuadros95
Copy link
Contributor

Acuadros95 commented Jun 20, 2022

Integration tests running on eProsima/Micro-XRCE-DDS#144

Tests are OK!

@jseldent
Copy link
Contributor Author

I just pushed another commit to fix the non-polling versions of the UDP and TCP transport profiles. To behave identically to the polling versions, receive timeouts should not be treated as errors.

@richiprosima
Copy link
Contributor

Build status:

  • Linux Build Status
  • Windows Build Status

This commit fixes the socket type used with getaddrinfo(). It was SOCK_DGRAM,
but it should be SOCK_STREAM for TCP. Additionally, it adds a signal handler
for SIGPIPE on Linux, similar to the polling implementation.

Signed-off-by: J. S. Seldenthuis <[email protected]>
Blocking I/O functions, like connect(), poll(), send() and recv(), will return
-1 with errno == EINTR if the process/thread is interrupted by a signal. This
is not an I/O error, but generally means the operations should be restarted.

This commit implements the auto-restart behavior, which allows the transport
profiles to be used in environments that generate a lot of signals, like the
FreeRTOS simulator for Linux.

Signed-off-by: J. S. Seldenthuis <[email protected]>
This commit recomputes the timeout in uxr_read_*_data_platform() after each
occurrence of an EINTR interrupt. This ensures that the function will never
block longer than the specified timeout.

Signed-off-by: J. S. Seldenthuis <[email protected]>
@jseldent jseldent force-pushed the fix-posix-transport-profiles branch from c66ca5b to eb8fa97 Compare August 23, 2022 09:39
@richiprosima
Copy link
Contributor

Build status:

  • Linux Build Status
  • Windows Build Status

@jseldent
Copy link
Contributor Author

Build status:

* Linux [![Build Status](https://camo.githubusercontent.com/c15842e2450c554bdb272b8fe354acf9084186bd1edb598bcc3c2dde942d7ae0/687474703a2f2f6a656e6b696e732e6570726f73696d612e636f6d3a383038302f6275696c645374617475732f69636f6e3f6a6f623d4d6963726f2d585243452d4444532d436c69656e742532304d616e75616c2532304c696e7578266275696c643d353636)](http://jenkins.eprosima.com:8080/job/Micro-XRCE-DDS-Client%20Manual%20Linux/566/)

* Windows [![Build Status](https://camo.githubusercontent.com/24b8abbe75f147e4209df59fa6aede6d860c816e12e65bd42cd6bee158f698e0/687474703a2f2f6a656e6b696e732e6570726f73696d612e636f6d3a383038302f6275696c645374617475732f69636f6e3f6a6f623d4d6963726f2d585243452d4444532d436c69656e742532304d616e75616c25323057696e646f7773266275696c643d353834)](http://jenkins.eprosima.com:8080/job/Micro-XRCE-DDS-Client%20Manual%20Windows/584/)

How can I see the error here? I can't find why the build is failing.

@Acuadros95
Copy link
Contributor

Click the tag and go to the Console Output tag.

In this case uncrustify is failing on the following files:

FAIL: src/c/profile/transport/ip/tcp/tcp_transport_posix_nopoll.c (File size changed from 4050 to 4054)
FAIL: src/c/profile/transport/ip/udp/udp_transport_posix_nopoll.c (File size changed from 3620 to 3624)

You can use this uncrustify.cfg to fix the style: https://raw.githubusercontent.com/eProsima/cpp-style/master/uncrustify.cfg

The non-polling versions of the POSIX transport profiles shoud behave the same
as the polling versions. That means that a receive timeout, because no data was
available, is not an error.

Signed-off-by: J. S. Seldenthuis <[email protected]>
@jseldent jseldent force-pushed the fix-posix-transport-profiles branch from eb8fa97 to 73340ab Compare August 23, 2022 11:30
@richiprosima
Copy link
Contributor

Build status:

  • Linux Build Status
  • Windows Build Status

@jseldent
Copy link
Contributor Author

@Acuadros95, thanks. I've fixed the issues.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants