Implement the tokio version of sans-io #2

joshka · 2024-12-31T06:23:46Z

The purpose of this PR is to avoid reimplementing the state machine that is inherently provided by async rust.

Follows from the discussion at

The purpose of this is to avoid reimplementing the state machine that is inherently async rust. This follows from the discussion at <https://news.ycombinator.com/item?id=40872020#40879547> and then: <https://www.reddit.com/r/rust/comments/1hpqgbt/sansio_the_secret_to_effective_rust_for_network/>

The modification to support time after the receiving a binding is 4 LoC

thomaseizinger

I wasn't sure where you'd like to take this PR so I left a few comments that I'd like to have addressed for merging.

thomaseizinger · 2025-01-02T15:10:38Z

src/bin/stun_tokio_sans.rs

+        .find(|addr| addr.is_ipv4())
+        .context("Failed to resolve hostname")?;
+
+    let (sink, stream) = UdpFramed::new(socket, StunCodec).split();


I'd like to see a sketch on how multiplexing of multiple StunBindings to different servers over the same socket works with this design.

I know that the other example doesn't directly address this. However, I know from experience that it does support multiplexing well so if we want to have an equal comparison, there needs to be an idea on how to support this.

One thing I was careful to do in this was to try to keep the implementation fairly close to the existing code (with a tiny bit of leeway). I'd be happy to play with extending it a bit, but the resulting code designed for a more real world approach will look quite different, which may introduce some impedence mismatch into things.

To clarify my understanding, you might choose to make two STUN binding requests simultaneously to two servers, and receive responses back. When you do get responses your code will see both addresses and be able to record them. These requests / responses are multiplexed on the same socket.

One thing I was careful to do in this was to try to keep the implementation fairly close to the existing code (with a tiny bit of leeway). I'd be happy to play with extending it a bit, but the resulting code designed for a more real world approach will look quite different, which may introduce some impedence mismatch into things.

Yes, this makes sense! I think those (API) differences are interesting to dig into. The existing event-loop code can handle this without major design changes: You simply iterate over a list of StunBindings and pass the packet to it until one of them accepts it.

To clarify my understanding, you might choose to make two STUN binding requests simultaneously to two servers, and receive responses back. When you do get responses your code will see both addresses and be able to record them. These requests / responses are multiplexed on the same socket.

Yes. That is how you figure out, what kind of NAT the application is dealing with. If you receive the same observed address from both servers, the NAT is destination-independent and thus "hole-punch friendly". If you receive different responses (i.e. different port numbers) it is a symmetric NAT and hole-punching will fail.

Another reason is that you want to test both IPv4 and IPv6 connectivity. This might be to the "same" server but from the perspective of STUN, that doesn't matter.

thomaseizinger · 2025-01-02T15:14:41Z

src/bin/stun_tokio_sans_time.rs

+                    println!("Our public IP is: {address}");
+                }
+            }
+            tokio::time::sleep(tokio::time::Duration::from_secs(3)).await;


Similar to the other example, I'd like to see a sketch here how we would support other time-related requirements. For example, timeouts and re-transmits of messages where we didn't receive a response.

The re-transmits need to be bounded, i.e. we don't want to retry forever.

yup - this was just an attempt to go like for like with the same logic from the article. The protocol logic is a better source of what should be done here.

The difference is, adding more "timers" to the other implementation doesn't change its API. If we want to find an implementation that is equivalent, it also needs to support the same / similar ways of being extended.

thomaseizinger · 2025-01-02T15:18:10Z

src/bin/stun_tokio_sans_time.rs

+            if let Some(result) = self.stream.next().await {
+                let (message, _) = result?;
+                if let Some(address) = parse_binding_response(message) {
+                    println!("Our public IP is: {address}");


How does the caller learn the address? Printing should happen in main to showcase that one could do whatever with the response.

It doesn't in this particular approach - this was a like for like implementation. I'll come up with something to demonstrate something a bit more real world useful.

It doesn't in this particular approach - this was a like for like implementation.

The sans-IO code prints inside the event-loop though, not within StunBinding:

sans-io-blog-example/src/bin/stun_sans_io_time.rs

Lines 44 to 46 in a36d645

if let Some(address) = binding.public_address() {

println!("Our public IP is: {address}");

}

thomaseizinger · 2025-01-02T18:12:19Z

src/bin/stun_tokio_sans.rs

Can we extend the README to say that these are retro-actively community-contributed?

joshka · 2025-01-02T21:41:36Z

I wasn't sure where you'd like to take this PR so I left a few comments that I'd like to have addressed for merging.

It's mostly about closing the loop. I don't expect this to really need to be merged / resolved in any way. The discussion is the most valuable part of this.

thomaseizinger · 2025-01-03T11:04:59Z

src/bin/stun_tokio_sans_time.rs

+        }
+    }
+
+    async fn public_address(&mut self) -> anyhow::Result<()> {


Another thing I want to mention here, in case you are interested in continuing to iterate on this:

In our production app, we don't just perform a STUN binding. Instead, we implement an entire TURN client. The TURN client has background timers that send particular messages on an interval (and wait for the responses). In addition - whilst those are running - other parts of the code need occasional access to the state of the client to pro-actively send messages. In particular, they need to be able to make new channel bindings (i.e. just a request-response) handshake. Those newly bound channels then trigger a new timer that rebinds the channel on a 5min interval.

The exact domain doesn't matter. What is important is that what we have here - an async fn that captures &mut self won't work for that unless we start using Arc and Mutex in a some way. To simulate the same thing, we could add an additional method that proactively performs another STUN binding, separately to the 5s timer.

Yep. I think based purely on the RFC, my understanding is that multiple callers should have the ability to do STUN/ICE/TURN over the one UDP socket (or if really going all in on the spec, then across any of the supported transports).

Incidentally, as a supporting part of this part of things, I was under the impression that tokio / futures based streams already had a stream fork / partition approach, but it doesn't seem like they do. The filter method discards the non-filtered items. I'm looking at implementations for that in tokio-rs/tokio#7065.

My general idea on the design of how this might work is that:

on the STUN side of things, a Transaction can be created that known only about messages that are for that transaction. Implementing this seems simplest as something like:

let (stun_messages, non_stun_messages) = all_messages.partition(MultiplexedMessage::is_stun); // then later in the code for implementing a single Transaction: let (transaction_messages, non_transaction_messages) = stun_messages.partition(|message| message.transaction_id() == self.transaction_id)

At a higher level, the stun code sees a stream / sink of STUN messages, the TURN related code sees a stream / sink of TURN message, etc. There's no shared access to some &mut sender / Arc as the multiplex is handled by splitting the stream / sinks. I'm not sure how feasible this is as yet.

As an aside, my interest in this is purely about the technical challenge of understanding the technology better. I want to make sure that I communicate that your sans-io approach is pragmatic given the software you're building and it's very likely the best solution for your problem. All of this discussion is about whether this approach generalizes, or whether using async is a more ideal generalization of the sans-io problem. I think the shallow answer is that async could eventually be the better approach, but it's probably not quite there yet.

Also, I wanted to thank you for continuing the discussion on this. This is obviously your code for your $JOB which has settled on the sans-io approach alread, so this alternative discussion doesn't really change that. So I appreciate your time in giving feedback regardless, and I won't be offended if you have to duck out of the conversation at any point.

Yep. I think based purely on the RFC, my understanding is that multiple callers should have the ability to do STUN/ICE/TURN over the one UDP socket (or if really going all in on the spec, then across any of the supported transports).

For completeness: It would be good if we can also multiplex other things in addition (like QUIC or in Firezone's case WireGuard). However, I don't think that should be an issue if we figure out a pattern on how to generally de-multiplex messages from one socket.

As an aside, my interest in this is purely about the technical challenge of understanding the technology better. I want to make sure that I communicate that your sans-io approach is pragmatic given the software you're building and it's very likely the best solution for your problem. All of this discussion is about whether this approach generalizes, or whether using async is a more ideal generalization of the sans-io problem. I think the shallow answer is that async could eventually be the better approach, but it's probably not quite there yet.

Also, I wanted to thank you for continuing the discussion on this. This is obviously your code for your $JOB which has settled on the sans-io approach alread, so this alternative discussion doesn't really change that. So I appreciate your time in giving feedback regardless, and I won't be offended if you have to duck out of the conversation at any point.

Absolutely! I think it is a very interesting topic and definitely worth exploring. Thank you for spending time on it! I think identifying async patterns (and shortcomings) here is very valuable and might also serve as a good resource for async Rust WGs in what is still needed to make things better. I'll do my best to stay responsive :)

An interesting thought I had this morning is: The direction you are pushing in with your design is still sans-IO (modulo the timers perhaps) because it doesn't depend on IO sources directly but uses the Stream + Sink abstractions. So eventually, we should probably also spend some time thinking about more descriptive names? :)

An interesting thought I had this morning is: The direction you are pushing in with your design is still sans-IO (modulo the timers perhaps) because it doesn't depend on IO sources directly but uses the Stream + Sink abstractions. So eventually, we should probably also spend some time thinking about more descriptive names? :)

Yep. that's kind of similar to where I started in my hacker news comment:

This article / idea really refactors two things out of some IO code

the event loop

the state machine of data states that occur

But async rust is already a state machine, so the stun binding could be expressed as a 3 line async function that is fairly close to sans-io (if you don't consider relying on abstractions like Stream and Sink to be IO).

async fn stun( server: SocketAddr, mut socket: impl Sink<(BindingRequest, SocketAddr), Error = color_eyre::Report> + Stream<Item = Result<(BindingResponse, SocketAddr), color_eyre::Report>> + Unpin + Send + 'static, ) -> Result<SocketAddr> { socket.send((BindingRequest, server)).await?; let (message, _server) = socket.next().await.ok_or_eyre("No response")??; Ok(message.address) }

Obviously there's a lot more to it than just that... :), but the underlying idea that async code and sans-io are on opposing sides of things is definitely not the case if you accept that sans-io doesn't require push style interfaces that are externally driven, and are ok with them being async state machines which are driven by the async framework instead of your own code.

joshka added 2 commits December 30, 2024 22:22

Add tokio-sans-time

ae14749

The modification to support time after the receiving a binding is 4 LoC

thomaseizinger reviewed Jan 2, 2025

View reviewed changes

thomaseizinger reviewed Jan 3, 2025

View reviewed changes

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Implement the tokio version of sans-io #2

Implement the tokio version of sans-io #2

joshka commented Dec 31, 2024

thomaseizinger left a comment

thomaseizinger Jan 2, 2025

thomaseizinger Jan 2, 2025

joshka Jan 2, 2025

thomaseizinger Jan 3, 2025

thomaseizinger Jan 3, 2025

thomaseizinger Jan 2, 2025

thomaseizinger Jan 2, 2025

joshka Jan 2, 2025

thomaseizinger Jan 3, 2025

thomaseizinger Jan 2, 2025

joshka Jan 2, 2025

thomaseizinger Jan 3, 2025

thomaseizinger Jan 2, 2025

joshka Jan 2, 2025

joshka commented Jan 2, 2025

thomaseizinger Jan 3, 2025

joshka Jan 4, 2025 •

edited

Loading

thomaseizinger Jan 4, 2025 •

edited

Loading

joshka Jan 4, 2025

	if let Some(address) = binding.public_address() {
	println!("Our public IP is: {address}");
	}

Implement the tokio version of sans-io #2

Are you sure you want to change the base?

Implement the tokio version of sans-io #2

Conversation

joshka commented Dec 31, 2024

thomaseizinger left a comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

joshka commented Jan 2, 2025

Choose a reason for hiding this comment

joshka Jan 4, 2025 • edited Loading

Choose a reason for hiding this comment

thomaseizinger Jan 4, 2025 • edited Loading

Choose a reason for hiding this comment

Choose a reason for hiding this comment

joshka Jan 4, 2025 •

edited

Loading

thomaseizinger Jan 4, 2025 •

edited

Loading