conntrack lookup removal in ipt_GLBREDIRECT breaks with network namespaces #111

Open
jstangroome opened this issue Sep 3, 2020 · 1 comment


jstangroome commented Sep 3, 2020

The change to ipt_GLBREDIRECT implemented in PR #67 and discussed in issue #50 breaks deployments where the listening socket is in a different network namespace to where the -j GLBREDIRECT iptables rule is installed.

The observed behaviour is that GUE-encapsulated TCP SYN packets are accepted but all subsequent GUE packets for the same TCP session are then forwarded to the next-hop specified in the GUE private data, instead of being accepted locally.

Taking current master (commit 5387908) and reverting only the PR #67 merge commit 5e1edd0 (i.e. git revert -m1 5e1edd0) corrects the behaviour. Configuring the GLB with a single backend also mitigates it, since there is then no next-hop to forward to, but that is not very useful in practice.

My working assumption is that the inet_lookup_established call only considers ESTABLISHED sockets in the host network namespace, and that with the conntrack lookup code deleted there is no longer anything to discover the conntrack entries created when the connection was directed to another network namespace.
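To make the suspected failure mode concrete, here is a minimal Python model of the behaviour described above. It is a sketch under the stated assumption (the socket lookup only sees one network namespace), not the module's actual code; `Packet`, `glbredirect_verdict`, the namespace names and the addresses are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Packet:
    flow: tuple      # (src_ip, src_port, dst_ip, dst_port)
    syn: bool        # True for the initial TCP SYN
    next_hop: str    # alternate server carried in the GUE private data


def glbredirect_verdict(pkt, established_by_netns, lookup_netns="host"):
    """Return "ACCEPT" or "FORWARD:<next_hop>" for one decapsulated packet."""
    if pkt.syn:
        # Observed behaviour: SYN packets are accepted locally.
        return "ACCEPT"
    # Stand-in for inet_lookup_established: only sockets that live in
    # lookup_netns are visible (this scoping is the assumption being modelled).
    if pkt.flow in established_by_netns.get(lookup_netns, set()):
        return "ACCEPT"
    # No local socket found, so the packet goes to the next hop.
    return "FORWARD:" + pkt.next_hop


flow = ("198.51.100.7", 40000, "203.0.113.10", 443)
# The ESTABLISHED socket exists, but only inside the Pod's namespace.
established = {"host": set(), "pod": {flow}}

print(glbredirect_verdict(Packet(flow, True, "10.0.0.2"), established))   # ACCEPT
print(glbredirect_verdict(Packet(flow, False, "10.0.0.2"), established))  # FORWARD:10.0.0.2
```

In this model the SYN is accepted, but every later packet of the same flow is forwarded because the host-namespace lookup never sees the Pod's socket, which matches the symptom reported here.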

One example where this occurs is a Kubernetes node with the ip fou tunnel and the GLBREDIRECT iptables rule configured in the host network namespace, while an nginx-ingress controller Pod listens on TCP ports 80 and 443 inside the Pod's network namespace and traffic is routed from the host to the Pod via DNAT iptables rules added by the Kubernetes CNI. I expect the same behaviour can be reproduced without Kubernetes, such as with a Docker container's network namespace, or even just with ip netns add, ip netns exec and appropriate NAT rules.
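As a starting point for a Kubernetes-free reproduction, the following Python sketch only builds and prints the commands for a host-to-namespace DNAT setup; it does not run them. It is untested against GLB, and the namespace name, veth interface names, addresses and port are all assumptions chosen for illustration. The fou tunnel and the GLBREDIRECT rule are assumed to be configured separately in the host namespace.

```python
import shlex


def netns_repro_commands(netns="glbtest", host_ip="10.200.0.1",
                         ns_ip="10.200.0.2", port=80):
    """Build (but do not run) commands for a host -> netns DNAT setup."""
    in_ns = ["ip", "netns", "exec", netns]
    return [
        # Create the namespace and a veth pair linking it to the host.
        ["ip", "netns", "add", netns],
        ["ip", "link", "add", "veth-host", "type", "veth",
         "peer", "name", "veth-ns"],
        ["ip", "link", "set", "veth-ns", "netns", netns],
        ["ip", "addr", "add", f"{host_ip}/24", "dev", "veth-host"],
        ["ip", "link", "set", "veth-host", "up"],
        in_ns + ["ip", "addr", "add", f"{ns_ip}/24", "dev", "veth-ns"],
        in_ns + ["ip", "link", "set", "veth-ns", "up"],
        in_ns + ["ip", "link", "set", "lo", "up"],
        in_ns + ["ip", "route", "add", "default", "via", host_ip],
        # Let the host route to the namespace, then DNAT the service port.
        ["sysctl", "-w", "net.ipv4.ip_forward=1"],
        ["iptables", "-t", "nat", "-A", "PREROUTING", "-p", "tcp",
         "--dport", str(port), "-j", "DNAT",
         "--to-destination", f"{ns_ip}:{port}"],
    ]


cmds = netns_repro_commands()
for cmd in cmds:
    print(shlex.join(cmd))
```

A listener would then be started inside the namespace with ip netns exec, so that the listening socket lives in a different namespace from the GLBREDIRECT rule.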

The problem was experienced on Ubuntu 18.04.5 with kernel 5.4.0-42-generic.

I have not confirmed this, but I suspect that configuring the fou tunnel and the GLBREDIRECT iptables rule inside the Pod network namespace would also resolve the fault. That approach is less maintainable in a Kubernetes ingress controller context, though.

Possible options to fix ipt_GLBREDIRECT:

  • Just revert PR #67 (Remove conntrack lookups).
  • Revert PR #67 (Remove conntrack lookups) and make the change either a conditional compilation option, a module parameter set at module load, or an additional iptables argument for -j GLBREDIRECT.
  • Introduce a module/iptables parameter to specify the network namespace to use for inet_lookup_established calls (not sure if feasible, or even friendly to use).
  • Other??

theojulienne (Contributor) commented

Thanks for reporting this! It's certainly an interesting issue.

I think this is essentially a new use case, one where a connection handled by iptables NAT is treated as a "locally established connection"; in that case it shouldn't really matter where the remote side is. You could imagine, for example, that the DNAT directs traffic off the local host (often the case with Kubernetes NodePorts); the connection then wouldn't appear established locally regardless of which namespace we looked in.

This leads me to think that the right answer is either to add a mode/option to the iptables module that consults conntrack so that NAT-only "sessions" can match, or simply to bring the function back while explicitly stating that the module supports it in order to keep NAT sessions functional.
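The opt-in mode suggested here could be modelled roughly as follows. This is an illustrative Python sketch of the decision logic only, not a proposed kernel implementation; the `use_conntrack` flag and the `verdict` function are hypothetical names, and the conntrack table is reduced to a plain set of flows.

```python
def verdict(flow, syn, next_hop, local_sockets, conntrack, use_conntrack=False):
    """Decide what to do with one decapsulated packet.

    local_sockets: flows with an ESTABLISHED socket visible to the lookup.
    conntrack:     flows with a conntrack entry (e.g. created by DNAT).
    use_conntrack: hypothetical opt-in mode that also honours NAT-only sessions.
    """
    if syn:
        return "ACCEPT"
    if flow in local_sockets:
        return "ACCEPT"
    if use_conntrack and flow in conntrack:
        # NAT-only "session": no local socket, but conntrack knows the flow.
        return "ACCEPT"
    return "FORWARD:" + next_hop


flow = ("198.51.100.7", 40000, "203.0.113.10", 443)
nat_sessions = {flow}  # the flow was DNATed to another namespace or host

print(verdict(flow, False, "10.0.0.2", set(), nat_sessions))                      # FORWARD:10.0.0.2
print(verdict(flow, False, "10.0.0.2", set(), nat_sessions, use_conntrack=True))  # ACCEPT
```

Keeping the conntrack check behind a flag would preserve the current behaviour by default, while deployments that rely on DNAT could opt in.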
