Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Question: Is it okay to simply close a connection in the TCP_CLOSE state upon restore? #2286

Open
aegiryy opened this issue Oct 13, 2023 · 7 comments

Comments

@aegiryy
Copy link

aegiryy commented Oct 13, 2023

Description

We hit the following error when restoring a TCP connection:

(00.430332)    718: inet:       Restore: family AF_INET6   type SOCK_STREAM    proto IPPROTO_TCP      port 34622 state TCP_CLOSE        src_addr ::ffff:10.4.104.194
(00.430342)    718: tcp: Restoring TCP connection
(00.430347)    718: tcp: Restoring TCP connection id 17c8 ino 84d1d
(00.430375)    718: Error (soccr/soccr.c:501): Can't bind inet socket back (0.0.0.0): Cannot assign requested address
(00.430389)    718: Error (criu/files.c:1213): Unable to open fd=2053 id=0x17c8

What we observed is that the connection is already in the TCP_CLOSE state. So, instead of restoring it, can we simply close the connections upon restore? Would it leave any side effect that is noticeable to the restored application?

diff --git a/criu/sk-inet.c b/criu/sk-inet.c
index db11cfb64..1fe52321e 100644
--- a/criu/sk-inet.c
+++ b/criu/sk-inet.c
@@ -859,6 +859,13 @@ static int open_inet_sk(struct file_desc *d, int *new_fd)
                }
 
                mutex_lock(&ii->port->reuseaddr_lock);
+               if (ie->state == TCP_CLOSE) {
+                       if (shutdown(sk, SHUT_RDWR) && errno != ENOTCONN) {
+                               pr_perror("Unable to shutdown the socket id %x ino %x", ii->ie->id, ii->ie->ino);
+                       } else {
+                               pr_info("TCP_CLOSE connection is forcefully closed");
+                       }
+               }
                if (restore_one_tcp(sk, ii)) {
                        mutex_unlock(&ii->port->reuseaddr_lock);
                        goto err;

Steps to reproduce the issue:
1.
2.
3.

Describe the results you received:

Describe the results you expected:

Additional information you deem important (e.g. issue happens only occasionally):

CRIU logs and information:

CRIU full dump/restore logs:

Output of `criu --version`:

$ criu --version
Version: 3.17
GitID: v3.17-5-gea24496ac

Output of `criu check --all`:

Warn  (criu/kerndat.c:1453): CRIU was built without libnftables support
Warn  (criu/cr-check.c:1334): Nftables based locking requires libnftables and set concatenations support
Looks good but some kernel features are missing
which, depending on your process tree, may cause
dump or restore failure.

$ uname -r
5.15.0-1047-aws
@rst0git
Copy link
Member

rst0git commented Oct 15, 2023

@aegiryy Have you tried using the --tcp-close option?

criu/criu/sk-tcp.c

Lines 458 to 459 in 711775f

if (opts.tcp_close) {
if (shutdown(fd, SHUT_RDWR) && errno != ENOTCONN) {

@aegiryy
Copy link
Author

aegiryy commented Oct 16, 2023

@rst0git I am aware of this option, but my understanding is that it will close all TCP connections regardless of their states. In my scenario, I want to keep all the active TCP connections as they are all through the loopback device (127.0.0.1). For the connections in TCP_CLOSE state, we are okay to have them closed upon restore, hence the question.

@aegiryy
Copy link
Author

aegiryy commented Oct 16, 2023

Based on the comment below, it seems to be a bug when we even try to call bind(...) when restoring a TCP connection in the TCP_CLOSE state:

criu/criu/sk-inet.c

Lines 195 to 197 in 711775f

case TCP_CLOSE:
/* Trivial case, we just need to create a socket on restore */
break;

criu/soccr/soccr.c

Lines 498 to 501 in 711775f

if (bind(sk->fd, &sk->src_addr->sa, addr_size)) {
logerr("Can't bind inet socket back");
return -1;
}

Can you confirm if we expect a closed connection to ever go through bind on restore?

Copy link

A friendly reminder that this issue had no activity for 30 days.

@avagin
Copy link
Member

avagin commented Nov 30, 2023

Can you confirm if we expect a closed connection to ever go through bind on restore?
yes, it is expected. getsockname returns a bound address and it works for closes sockets too, so we need to restore it.

(00.430332)    718: inet:       Restore: family AF_INET6   type SOCK_STREAM    proto IPPROTO_TCP      port 34622 state TCP_CLOSE        src_addr ::ffff:10.4.104.194
(00.430342)    718: tcp: Restoring TCP connection
(00.430347)    718: tcp: Restoring TCP connection id 17c8 ino 84d1d
(00.430375)    718: Error (soccr/soccr.c:501): Can't bind inet socket back (0.0.0.0): Cannot assign requested address
(00.430389)    718: Error (criu/files.c:1213): Unable to open fd=2053 id=0x17c8

Here, we see the AF_INET6 socket, but the address 0.0.0.0 is ipv4. It may be a dual-stack socket.
And it is an ANY address that looks weird in this case too.

@avagin
Copy link
Member

avagin commented Nov 30, 2023

@aegiryy do you have any clue how this socket has been created?

Copy link

github-actions bot commented Jan 1, 2024

A friendly reminder that this issue had no activity for 30 days.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

3 participants