shpool process hanging on ssh credentials timeout #155

Open
apeyser opened this issue Sep 25, 2024 · 3 comments

Comments

@apeyser

apeyser commented Sep 25, 2024

With both mosh and ssh + shpool, I've been finding the shpool process hanging. If the shpool process is killed directly, then detach + list work fine. But if I run shpool list while the process is still live, the daemon hangs and a systemctl --user restart shpool is needed, i.e., everything gets killed.

It appears to be connected to ssh credentials timing out; refreshing the credentials doesn't seem to solve the problem.

What I find in the logs is:

2024-09-25T08:27:57.783362Z ERROR ThreadId(274) handling new connection: writing version header

Caused by:
    0: serializing data
    1: invalid value write: error while writing multi-byte MessagePack value
    2: error while writing multi-byte MessagePack value
    3: Broken pipe (os error 32)

shpool version 0.7.0

@ethanpailes
Contributor

That error message isn't necessarily a sign of something going wrong. The shpool attach process probes the control socket to see whether a daemon is listening, in order to decide whether it needs to autodaemonize. The probe hangs up immediately, while the daemon is still trying to initiate the handshake, so this error gets written to the daemon logs even though nothing is actually wrong.
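To illustrate why such a probe produces this log line, here is a minimal Rust sketch (not shpool's actual code) that simulates it over a Unix domain socket: a client connects and drops the connection right away, then the "daemon" accepts it and tries to write a header. `probe_then_handshake` and `VERSION_HEADER` are hypothetical names, and the placeholder bytes stand in for the real MessagePack-encoded header. The sketch relies on Rust ignoring SIGPIPE by default, so the failed write comes back as an error instead of killing the process.

```rust
use std::io::{ErrorKind, Write};
use std::os::unix::net::{UnixListener, UnixStream};

/// Hypothetical placeholder for the daemon's version header; the real
/// shpool wire format (MessagePack-encoded) is not reproduced here.
const VERSION_HEADER: &[u8] = b"hypothetical-version-header\n";

/// Simulates a liveness probe: a client connects to the control socket and
/// hangs up immediately, then the "daemon" starts its handshake.
/// Returns the error kind the daemon's write failed with, if any.
fn probe_then_handshake() -> std::io::Result<Option<ErrorKind>> {
    let path = std::env::temp_dir().join(format!("probe-demo-{}.sock", std::process::id()));
    let _ = std::fs::remove_file(&path); // clear a stale socket file, if any
    let listener = UnixListener::bind(&path)?;

    // The "probe": connect() succeeds once the connection is queued on the
    // listener's backlog; dropping the stream hangs up without sending anything.
    let probe = UnixStream::connect(&path)?;
    drop(probe);

    // The "daemon" accepts the already-closed connection and writes its header.
    let (mut conn, _) = listener.accept()?;
    let mut failure = None;
    for _ in 0..16 {
        if let Err(e) = conn.write_all(VERSION_HEADER) {
            failure = Some(e.kind());
            break;
        }
    }
    std::fs::remove_file(&path)?;
    Ok(failure)
}

fn main() -> std::io::Result<()> {
    match probe_then_handshake()? {
        Some(kind) => println!("daemon write failed: {:?}", kind),
        None => println!("daemon write unexpectedly succeeded"),
    }
    Ok(())
}
```

On Linux the daemon-side write is expected to fail with `BrokenPipe`, the same "Broken pipe (os error 32)" that ends the error chain in the log: a harmless by-product of the probe rather than evidence of a hang.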

Can you post some step-by-step instructions for how to reproduce the issue? I've had ssh credential timeouts without seeing issues with shpool, so I'm not quite sure how to try to reproduce this.

@apeyser
Author

apeyser commented Sep 26, 2024

pkill ssh-agent is enough to trigger it for me (once), but it isn't necessary: the usual condition doesn't involve restarting ssh-agent, just letting the credentials go stale and/or (it's unclear which) letting the ssh control master time out. It is extremely flaky, though, so reproducing the failure is hard.

@ethanpailes
Contributor

shpool doesn't really know anything about ssh, so the problem probably isn't directly related to ssh. There is probably some way to reproduce the issue purely with shpool commands, though it might be hard to find.
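One way to hunt for a shpool-only reproduction is to run shpool list repeatedly and flag any invocation that does not exit in time, which matches the hang described in the original report. The Rust sketch below does this; it assumes a `shpool` binary on PATH with a daemon already running, and `run_with_timeout`, the 100 iterations, and the 5-second threshold are all hypothetical choices, not part of shpool.

```rust
use std::process::{Command, Stdio};
use std::thread;
use std::time::{Duration, Instant};

/// Hypothetical helper: runs `program args...` and waits at most `timeout`.
/// Returns Ok(Some(success)) if the process exited in time, or Ok(None) if it
/// was still running at the deadline (it is then killed and reaped).
fn run_with_timeout(program: &str, args: &[&str], timeout: Duration) -> std::io::Result<Option<bool>> {
    let mut child = Command::new(program)
        .args(args)
        .stdout(Stdio::null())
        .stderr(Stdio::null())
        .spawn()?;
    let deadline = Instant::now() + timeout;
    loop {
        // Poll without blocking so we can enforce the deadline ourselves.
        if let Some(status) = child.try_wait()? {
            return Ok(Some(status.success()));
        }
        if Instant::now() >= deadline {
            child.kill()?;
            child.wait()?; // reap the killed process to avoid a zombie
            return Ok(None);
        }
        thread::sleep(Duration::from_millis(20));
    }
}

fn main() {
    // Assumes `shpool` is on PATH and its daemon is running.
    for i in 0..100 {
        match run_with_timeout("shpool", &["list"], Duration::from_secs(5)) {
            Ok(Some(_)) => {}
            Ok(None) => {
                println!("iteration {i}: `shpool list` did not exit within 5s");
                return;
            }
            Err(e) => {
                eprintln!("could not run shpool: {e}");
                return;
            }
        }
    }
    println!("no hang observed in 100 runs");
}
```

Interleaving other shpool commands (attach, detach, kill) between the list calls would be a natural next step when searching for the trigger, since the report links the hang to a live attach process.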