Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[release-4.18] OCPBUGS-48383: Add IdleCloseOnResponse field to IngressControllerSpec #2146

Draft
wants to merge 4 commits into
base: release-4.18
Choose a base branch
from

Conversation

frobware
Copy link
Contributor

@frobware frobware commented Jan 14, 2025

This is an manual cherry-pick of #2102.

Set IngressControllerConnectionTerminationPolicy default to Deferred for OCP <= 4.18

In OCP 4.19 (see #2102), the default for IngressControllerConnectionTerminationPolicy is Immediate. For OCP versions <= 4.18, set the default to Deferred to match the behaviour of older releases, where the HAProxy idle-close-on-response option is added unconditionally—where idle-close-on-response means Deferred.

Quoting from #2102

Introduce a new knob, IdleConnectionTerminationPolicy, in the IngressController configuration to control how idle connections are handled during router reloads.

Context
In OCPBUGS-32044, the idle-close-on-response option was unconditionally added to the HAProxy confuguration to address issues with incoming HTTP requests failing during router reloads. This issue primarily affected Apache HTTPClient versions prior to 5.0, which do not gracefully handle connection resets. Adding the option ensured that idle connections were left open to handle one final request before being closed.

Historically, HAProxy 2.2 maintained idle connections during router reloads by default, allowing requests on those connections to complete even when routing configuration changes were applied. Starting with HAProxy 2.4, the default behaviour changed to close idle connections immediately during soft reloads.

To accommodate existing clients dependent on the HAProxy 2.2 behaviour, the unconditional addition of idle-close-on-response restored the previous OpenShift status quo, where customers upgrading their OpenShift clusters experienced a behaviour change due to the jump from HAProxy 2.2 to 2.6, which altered the default handling of idle connections during router reloads.

However, unconditionally enabling idle-close-on-response has now led to issues (OCPBUGS-43745) with Route backend switching. When a Route switches its service backend, requests on persistent connections could continue being routed to the previously active backend due to HAProxy handling these connections in the old process. This behaviour occurs until the connection is closed, either by a new request, the expiration of the client keep-alive, or the expiration of the HAProxy timeout http-keep-alive 300s. While this behaviour is desirable in some cases (e.g., for clients sensitive to connection resets), it can lead to temporary inconsistencies and unexpected routing behaviour during backend switching.

This PR addresses these regressions by making the behaviour configurable through a new knob.

…(cherry-pick from 4.19)

This is a cherry-pick of commit openshift@39f913f

Add an `IdleConnectionTerminationPolicy` field to control whether
HAProxy keeps idle frontend connections open during a soft stop (router
reload). Allow users to prevent errors in clients or load balancers that
do not properly handle connection resets.
@openshift-ci-robot openshift-ci-robot added jira/severity-critical Referenced Jira bug's severity is critical for the branch this PR is targeting. jira/valid-reference Indicates that this PR references a valid Jira ticket of any type. jira/invalid-bug Indicates that a referenced Jira bug is invalid for the branch this PR is targeting. labels Jan 14, 2025
@openshift-ci-robot
Copy link

@frobware: This pull request references Jira Issue OCPBUGS-48383, which is invalid:

  • release note text must be set and not match the template OR release note type must be set to "Release Note Not Required". For more information you can reference the OpenShift Bug Process.
  • expected dependent Jira Issue OCPBUGS-43745 to be in one of the following states: MODIFIED, ON_QA, VERIFIED, but it is POST instead
  • expected dependent Jira Issue OCPBUGS-43745 to target a version in 4.19.0, but it targets "4.19" instead

Comment /jira refresh to re-evaluate validity if changes to the Jira bug are made, or edit the title of this pull request to link to a different bug.

The bug has been updated to refer to the pull request using the external bug tracker.

In response to this:

This is a cherry-pick of commit 39f913f

Add an IdleConnectionTerminationPolicy field to control whether
HAProxy keeps idle frontend connections open during a soft stop (router
reload). Allow users to prevent errors in clients or load balancers that
do not properly handle connection resets.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository.

Copy link
Contributor

openshift-ci bot commented Jan 14, 2025

Hello @frobware! Some important instructions when contributing to openshift/api:
API design plays an important part in the user experience of OpenShift and as such API PRs are subject to a high level of scrutiny to ensure they follow our best practices. If you haven't already done so, please review the OpenShift API Conventions and ensure that your proposed changes are compliant. Following these conventions will help expedite the api review process for your PR.

@openshift-ci openshift-ci bot added the size/L Denotes a PR that changes 100-499 lines, ignoring generated files. label Jan 14, 2025
@openshift-ci openshift-ci bot requested review from bparees and knobunc January 14, 2025 14:23
Copy link
Contributor

openshift-ci bot commented Jan 14, 2025

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by: frobware
Once this PR has been reviewed and has the lgtm label, please assign knobunc for approval. For more information see the Code Review Process.

The full list of commands accepted by this bot can be found here.

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

Copy link
Contributor

openshift-ci bot commented Jan 14, 2025

@frobware: all tests passed!

Full PR test history. Your PR dashboard.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. I understand the commands that are listed here.

@frobware
Copy link
Contributor Author

/hold

All 4.19 (master) changes not yet merged; see openshift/cluster-ingress-operator#1182 (or openshift/cluster-ingress-operator#1166).

@openshift-ci openshift-ci bot added the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label Jan 14, 2025
@frobware frobware marked this pull request as draft January 14, 2025 15:01
@openshift-ci openshift-ci bot added the do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. label Jan 14, 2025
frobware added a commit to frobware/cluster-ingress-operator that referenced this pull request Jan 14, 2025
Picking up: openshift/api#2146

Vendor steps:

$ go mod edit -replace github.com/openshift/api=github.com/frobware/api@2d7c828eaefe393b7908cc762b8b772ffd04f3db
$ go mod tidy
$ go mod vendor
$ make update
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. jira/invalid-bug Indicates that a referenced Jira bug is invalid for the branch this PR is targeting. jira/severity-critical Referenced Jira bug's severity is critical for the branch this PR is targeting. jira/valid-reference Indicates that this PR references a valid Jira ticket of any type. size/L Denotes a PR that changes 100-499 lines, ignoring generated files.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants