Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Deadlocks due to CloseRead #429

Closed
creker opened this issue Dec 19, 2023 · 6 comments
Closed

Deadlocks due to CloseRead #429

creker opened this issue Dec 19, 2023 · 6 comments
Labels
Milestone

Comments

@creker
Copy link

creker commented Dec 19, 2023

nhooyr.io/websocket v1.8.10

The library uses defer defer wg.Wait() in multiple functions and that causes calls to deadlock instead of properly shutting down when connection is terminated.

One possible situation is this:

  1. The server calls CloseRead
  2. The Client sends some message
  3. Goroutine inCloseRead unblocks from Reader(ctx) and calls Close
  4. Close blocks on defer c.wg.Wait() but it can't succeed due to CloseRead calling wg.Add(1) and expecting it to be decremented with defer wg.Done() on exit from CloseRead. But CloseRead cannot exit due to Close blocking. We have a deadlock and a goroutine leak. Context is never cancelled either which means all the other code that interacts with websocket also can't properly exit.

Switching to manually calling Read in a separate goroutine and cancelling context from it solves this problem. Basically, you have to reimplement CloseRead but without blocking on waitGroup

@nhooyr
Copy link
Contributor

nhooyr commented Dec 19, 2023

Copy. Will rework the close code ASAP. See also #427

@nhooyr
Copy link
Contributor

nhooyr commented Dec 19, 2023

Actually can you provide a trace?

CloseRead done's the wait group before it closes. https://github.com/nhooyr/websocket/blob/e3a2d32f704fb06c439e56d2a85334de04b50d32/read.go#L69

But I will concede the use of a wait group in this way is certainly potentially panicky as the wg.Add(1) call could occur concurrently with a close and thus panic.

@creker
Copy link
Author

creker commented Dec 19, 2023

@nhooyr the problem is that c.wg.Done() is deferred and called on exit from CloseRead. But CloseRead cannot exit because it calls Close which blocks on c.wg.Wait(). Same with context cancellation.

@nhooyr
Copy link
Contributor

nhooyr commented Dec 19, 2023

Oh I'm blind great point! Confirmed. 🤦🏼

nhooyr added a commit to alixander/websocket that referenced this issue Apr 5, 2024
Far simpler now. Sorry this took a while.

Closes coder#427
Closes coder#429
Closes coder#434
Closes coder#436
Closes coder#437
@nhooyr
Copy link
Contributor

nhooyr commented Apr 5, 2024

Thanks for reporting. I've fixed this in #427. Please review and let me know if you see anything else.

nhooyr added a commit to alixander/websocket that referenced this issue Apr 5, 2024
Far simpler now. Sorry this took a while.

Closes coder#427
Closes coder#429
Closes coder#434
Closes coder#436
Closes coder#437
@nhooyr nhooyr added this to the v1.8.11 milestone Apr 5, 2024
@nhooyr
Copy link
Contributor

nhooyr commented Apr 7, 2024

Sorry for the delay. Fixed by #427.

@nhooyr nhooyr closed this as completed Apr 7, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

No branches or pull requests

2 participants