Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix: docker exec hangs indefinitely when reading from stdin #7896

Open
wants to merge 1 commit into
base: main
Choose a base branch
from

Conversation

bcxpro
Copy link

@bcxpro bcxpro commented Dec 8, 2024

This PR fixes the improper handling of half-closed connection behavior when attaching stdin to a container. The proper approach requires:

  • The docker client receives a half-close of the connection from client to daemon (stdin)
  • The connection from daemon to client (stdout, stderr) remains open until no more data is to be received

The first part of the issue is resolved by enabling MessageMode on Windows named pipes. This ensures that when stdin ends, an EOF is received on the reader, allowing proper half-closing of the connection.

The previous implementation used stdlib's httputil.DockerProxy to proxy from Windows named pipe to WSL hvsock. This proxy does not use CloseWrite on hijacked connections, preventing proper handling of this use case. By using Close() instead of CloseWrite(), both connection directions are closed simultaneously, causing the client to miss pending stdout/stderr content after stdin EOF.

This commit introduces a simpler custom util.ReverseProxy to replace httputil.DockerProxy, implementing proper handling of half-closed connections with explicit support for:

  • HTTP protocol upgrades
  • Half-close TCP connection management
  • Precise stream handling for Docker API interactions

Closes #2094

@bcxpro bcxpro force-pushed the 2094-closewrite-proxy branch 3 times, most recently from 8330069 to f4d9c09 Compare December 8, 2024 19:49
@jandubois jandubois closed this Dec 10, 2024
@jandubois jandubois reopened this Dec 10, 2024
@gunamata gunamata requested a review from Nino-K December 10, 2024 17:54
@Nino-K
Copy link
Member

Nino-K commented Dec 10, 2024

@bcxpro, thanks for your contribution. It looks like there are some issues with the "Test" step in the CI. Once they are addressed, I will gladly look at your PR.

Also, once you can run yarn install && yarn lint before pushing your PR, ensure you can address the lint issues. Thanks

@bcxpro bcxpro force-pushed the 2094-closewrite-proxy branch 2 times, most recently from 72b8d1c to 139becb Compare December 10, 2024 20:24
@bcxpro
Copy link
Author

bcxpro commented Dec 10, 2024

@bcxpro, thanks for your contribution. It looks like there are some issues with the "Test" step in the CI. Once they are addressed, I will gladly look at your PR.

Also, once you can run yarn install && yarn lint before pushing your PR, ensure you can address the lint issues. Thanks

@Nino-K I had missed the changes to a file in the commit. I just updated the PR with it. It should build now.

Regarding the linter, I fixed all the issues except one. Honestly, I cannot figure out exactly what response body is complaining about in that line.

@bcxpro
Copy link
Author

bcxpro commented Dec 10, 2024

I just have seen that this one may be indirectly fixing #3239

@matejkramny
Copy link

Hi, looks like we worked on the same problem simultaneously and solved it differently.

I'm curious why change the windows side? Attaching to containers from Windows didn't have this problem

this is my understanding from looking at how rancher runs:
mermaid-diagram-2024-12-11-161227

The issue seemed to be only at the linux proxy (red box in the diagram)

@bcxpro
Copy link
Author

bcxpro commented Dec 11, 2024

Hi, looks like we worked on the same problem simultaneously and solved it differently.

I'm curious why change the windows side? Attaching to containers from Windows didn't have this problem

Hi @matejkramny,

Your diagram is correct, at least as far as I understand it.

Initially, a couple of months ago, I tried to run kind on Rancher Desktop and it failed. Looking into it, I discovered that the problem was the kind tool passing configuration on Windows via stdin, and the docker commands would get stuck.
I found issue #2094, which exactly matched the problem I had on Windows. The issue can be reproduced with:

echo ok | docker run -i --rm alpine cat

When you run this, "ok" is echoed back to the terminal, but the command gets stuck instead of exiting and returning the correct exit code.

I started troubleshooting that part. The first problem was that the Windows named pipe was not signaling the EOF of the data received via stdin. As a result, the Docker CLI was stuck, always listening for more data. The data was echoed back, but that was it. Initially, I thought I had found the solution with this line in server_windows.go:

pipeConfig := &winio.PipeConfig{MessageMode: true}

Everything seemed to work—the "ok" was echoed back, and the program was no longer stuck.

However, kind still did not work. So I continued troubleshooting and found that the http.ReverseProxy used in wsl-helper serve to proxy from the Windows named pipe to the Hyper-V vsock did not properly half-close the connections. The Docker API for attach requires that the stdin end signals the end of stdin by half-closing the connection. But http.ReverseProxy just closes the connection in both directions.
This causes issues with commands like:

cat largefile.txt | docker run -i --rm alpine cat

The problem then is that the command's output is truncated. The command ends but it does not echo back everything.

The reason is that for the short period of time that the connection to the hvsock remains open traffic flows in all directions, and as soon as it is closed, the stdout/stderr data cannot flow to the client anymore. That is the reason why at first just with that like in server_windows.go the command echo ok | docker run -i --rm alpine cat seemed to work.

Then the problem was the proxy. I initially built a very simple L4-level proxy that just piped both connections properly. But the current solution needed to leverage the capabilities of hooks in http.ReverseProxy to modify requests and responses (mungers). A transparent proxy wouldn't suffice, so an L7 proxy was needed that could make proper use of CloseWrite to half-close connections.

In this PR, I wrote a minimalistic HTTP reverse proxy that can handle both half-closes and the "mungers".

With that, the solution to my problem was finished, and I submitted this PR a couple of days ago.

Yesterday, I saw #3239 related to WSL. Testing it with this PR applied, it looks like the issue is solved. This is most likely because the wsl-helper serve used to pipe from the /var/run/docker.sock in WSL to the /mnt/wsl/...sock uses the same code. The replacement of the proxy that I did to solve the problem in Windows also solves the WSL issue (specifically, the half-close problem).

@bcxpro bcxpro force-pushed the 2094-closewrite-proxy branch from 139becb to 7e98ebf Compare December 11, 2024 17:16
@matejkramny
Copy link

@bcxpro that's great, thank you for explaining!

Then the problem was the proxy. I initially built a very simple L4-level proxy that just piped both connections properly. But the current solution needed to leverage the capabilities of hooks in http.ReverseProxy to modify requests and responses (mungers). A transparent proxy wouldn't suffice, so an L7 proxy was needed that could make proper use of CloseWrite to half-close connections.

I think our solutions diverge here, at L4 I read ahead and match the request to /attach and conditionally continue piping at L4 or to the munger-capable L7 server.

For interest, go maintainers don't seem to have plans to fix the ReverseProxy

@jandubois
Copy link
Member

For interest, go maintainers don't seem to have plans to fix the ReverseProxy

Not sure this is true; that issue was just closed as a duplicate of golang/go#35892, which is still open, but doesn't seem to have any recent action on it either.

@matejkramny
Copy link

matejkramny commented Dec 11, 2024 via email

This commit fixes the improper handling of half-closed connection behavior when attaching stdin to a container.
The proper approach requires:
- The docker client receives a half-close of the connection from client to daemon (stdin)
- The connection from daemon to client (stdout, stderr) remains open until no more data is to be received

The first part of the issue is resolved by enabling MessageMode on Windows named pipes.
This ensures that when stdin ends, an EOF is received on the reader, allowing proper half-closing of the connection.

The previous implementation used stdlib's httputil.DockerProxy to proxy from Windows named pipe to WSL hvsock.
This proxy does not use CloseWrite on hijacked connections, preventing proper handling of this use case.
By using Close() instead of CloseWrite(), both connection directions are closed simultaneously,
causing the client to miss pending stdout/stderr content after stdin EOF.

This commit introduces a simpler custom `util.ReverseProxy` to replace httputil.DockerProxy, implementing proper handling
of half-closed connections with explicit support for:
- HTTP protocol upgrades
- Half-close TCP connection management
- Precise stream handling for Docker API interactions

Closes rancher-sandbox#2094

Signed-off-by: Carlos Barcenilla <[email protected]>
@bcxpro bcxpro force-pushed the 2094-closewrite-proxy branch from 7e98ebf to 6643586 Compare December 28, 2024 15:54
@MrWako
Copy link

MrWako commented Jan 3, 2025

Hi @jandubois, @matejkramny, @bcxpro - I've tried to pick up golang/go#35892 again. I couldn't see any particular objections to addressing the issue so hoping that with a bit of perseverance we can get a fix in and avoid having to maintain a custom proxy within the rancher desktop code.

@jandubois
Copy link
Member

I've tried to pick up golang/go#35892 again. I couldn't see any particular objections to addressing the issue so hoping that with a bit of perseverance we can get a fix in

@MrWako Thanks for picking this up!

and avoid having to maintain a custom proxy within the rancher desktop code.

I think it is fine to use a custom proxy in Rancher Desktop until this is fixed upstream because we don't know how long it would take, or if upstream will accept it at all.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

docker exec hangs indefinitely when reading from stdin
5 participants