
[v0.11] Backport of Add jitter and resync to polling #3195

Closed
wants to merge 78 commits into from

@manno (Member) commented on Jan 9, 2025

Backport of #3151, refers to #3138

p-se and others added 30 commits November 6, 2024 17:21
… to 0.10.5 (#3053)

* chore: Update Fleet asset URL

* chore: Update Fleet CRD asset URL

Made with ❤️️ by updatecli

---------

Co-authored-by: fleet-bot <[email protected]>
* Improve namespace target customization tests

These tests now verify that the created namespace bears the expected
labels and annotations.
This commit also paves the way for additional tests with customizations
over unconfigured namespace labels and annotations, which currently
cause a panic.

* Initialise options maps when empty

This prevents panics when namespace labels or annotations are
configured as target customizations over nonexistent defaults.
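The panic mentioned above comes from Go's map semantics: reading from a nil map is safe, but assigning to one panics. Below is a minimal sketch of the "initialise when empty" pattern. `applyCustomizations` is a hypothetical helper, not Fleet's actual function.

```go
package main

import "fmt"

// applyCustomizations is a hypothetical helper: it merges target-specific
// labels into a namespace's label map. Writing to a nil map panics in Go,
// so the map is initialised first when no defaults were configured.
func applyCustomizations(labels map[string]string, overrides map[string]string) map[string]string {
	if labels == nil {
		labels = map[string]string{} // without this, the assignment below panics
	}
	for k, v := range overrides {
		labels[k] = v
	}
	return labels
}

func main() {
	var defaults map[string]string // nil: no default labels configured
	got := applyCustomizations(defaults, map[string]string{"team": "ops"})
	fmt.Println(got["team"]) // prints "ops"
}
```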

* Use main branch of `rancher/fleet-test-data`

The required changes made in that repo have been merged.
This fixes a linter error.
* Fix charts repo name population

This simplifies reuse of variables across steps and jobs by making use
of output variables, eliminating the need for additional environment
variables.

* Fix base and target branch uses

When reusing a variable computed in another step, it is now explicitly
sourced through outputs.
When releasing test Fleet charts, the test release workflow looks for
the latest existing Fleet release, to use it as a base before making a
few edits.

The previous logic used to find the latest available chart was buggy, in
that it would list releases in alphabetical order, which could differ
from semver. For instance, chart version `103.1.10+up0.9.11` would be
listed between versions `103.1.0+up0.9.0` and `103.1.2+up0.9.2`.

Instead, this commit simplifies resolution by first looking at the
`package.yaml` file, extracting the chart version from there and looking
for the corresponding Fleet version in the charts repository.
Resolution would then fail if no corresponding version is found in the
repository, but that is far less likely to happen than with the previous
logic and would typically be a symptom of a broken state of the charts
repository.
* Inject registered cluster name into multi-cluster tests

Depending on the context in which multi-cluster end-to-end tests are
run, there may not be any registered downstream cluster called `second`.
In such cases, such as when testing Fleet in Rancher, the CI workflow
computes the name of the registered cluster and exports it as an
environment variable, for the tests to use instead of `second`.

* Refer explicitly to Fleet API when labeling clusters

This prevents ambiguity by specifying the `fleet.cattle.io` domain when
labeling clusters in multi-cluster end-to-end tests, since `kubectl`
looks for `clusters.cluster.x-k8s.io` by default.

* Use non-managed downstream cluster in multi-cluster label tests

This makes the multi-cluster end-to-end test suite compatible with
setups in which only one, non-managed downstream cluster exists, such as
the test-Fleet-in-Rancher CI workflow.

* Use `dev-v2.10` as default base for test Fleet charts

This updates the default base test charts branch to match the current
state of Rancher releases.
* Improve sharding end-to-end error reporting

This adds expectations on existing pods and clearer error messages to
ease troubleshooting in case of failing or flaky tests.

* Run node selector check earlier

Sharding end-to-end tests exhibited flakiness, caused by the git job pod
not being present in the cluster by the time checks were run to validate
its node selector against that of the relevant shard.

To prevent this, tests on the node selector are now run first, which
incidentally also prevents `Eventually` from running more times awaiting
a config map to be deployed.
Co-authored-by: renovate-rancher[bot] <119870437+renovate-rancher[bot]@users.noreply.github.com>
* Apply defaults from gitrepo restrictions
* Add unit tests covering defaults auth and assign

This covers a few happy and error cases with unit tests, uncovering a
few typos in error messages in the process.

---------

Co-authored-by: Corentin Néau <[email protected]>
p-se and others added 27 commits December 16, 2024 12:59
Using the environment variable FLEET_E2E_DS_CLUSTER_COUNT, an arbitrary
number of downstream clusters can be spawned, e.g.:

```
FLEET_E2E_DS_CLUSTER_COUNT=4 ./dev/setup-multi-cluster
```

This environment variable affects:

- dev/setup-k3ds
- dev/import-images-k3d
- dev/setup-multi-cluster
This fixes error messages shown in case of namespace or release name
mismatch.
Add a new custom resource `HelmApp` (resource name open to debate) that describes a helm chart to be deployed.

The resource contains all the fields from the classic `fleet.yaml` file plus a few new ones from the `GitRepo`
resource.

`HelmApp` YAML example:

```yaml
apiVersion: fleet.cattle.io/v1alpha1
kind: HelmApp
metadata:
  name: sample1
  namespace: fleet-local
spec:
  helm:
    releaseName: testhelm
    repo: https://charts.bitnami.com/bitnami
    chart: postgresql
    version: 16.2.1
  insecureSkipTLSVerify: true
```

The implementation reuses as much as possible of a `Bundle` spec inside the new resource, because that makes it easier
to "transform" the `HelmApp` into a deployment (no conversion is needed for most of the spec).

The new controller is also split into two controllers (similar to what we did for the `GitRepo` controller). This lets us reuse most of the status handling code: the status display fields of the new resource are kept as close as possible to those of `GitRepo`, for a consistent user experience and so that the UI can integrate with it in the same way.

When a new `HelmApp` resource is applied it is transformed into a single `Bundle`, adding some extra fields to let the `Bundle` reconciler know that this is not a regular `Bundle` coming from a `GitRepo`.

As with OCI storage, the `Bundle` created from a `HelmApp` does not contain resources. The helm chart to be deployed is downloaded by the agent.

Code for downloading the helm chart is reused from gitops, so the same formats are supported.
Insecure TLS skipping was added to the ChartURL and LoadDirectory functions in order to support this for gitops and helmops.

If we need a secret to access the helm repository we can use the `helmSecretName` field. This secret will be cloned to secrets under the `BundleDeployment` namespace (same as we did for the OCI storage secret handling).

The PR includes unit tests, integration tests (which cover most of the code) and, so far, a single e2e test that exercises the whole feature together in a real cluster.

Note: This is an experimental feature. To activate `HelmApp` reconciling and `Bundle` deployment, you need to set the environment variable `EXPERIMENTAL_HELM_OPS=true`.

Refers to: #2962

* Add Insecure TLS option when downloading from OCI registry
* Upgrades zot to version 2.1.1 so that we can enable the UI and browse artifacts
* Adds metrics e2e tests for HelmApps
* Removes BundleSpecBase as it was not compatible when building rancher
* Add unit test case when getting an error retrieving the secret
* changes after 2nd review, fix flaky test

Signed-off-by: Xavi Garcia <[email protected]>
* remove init() for setting up test data per file, as test data is shared across
  files now.
* helpers don't Expect so missing resources are retried
This enables agent worker counts to be configured when installing the
`fleet` chart, which is easier than tweaking individual releases of the
`fleet-agent` chart.
This still needs work to enable worker count updates through `helm
upgrade --reuse-values` though, as this updates the `fleet-agent`
`StatefulSet` _twice_, the second time with default values (50 workers
per reconciler).
* Exports Metrics URLs to test with external IPs

It also deletes the HELM_PATH env variable as it is no longer used.

Fixes errors in metrics tests that tried to check for metrics while the
service exporting them was not yet fully up.

Adds an extra check in OCI e2e tests to verify that `CI_OCI_USERNAME` and
`CI_OCI_PASSWORD` are set.

Deletes the `GI_GIT_REPO_URL` env variable and uses the already existing `external_ip`.

---------

Signed-off-by: Xavi Garcia <[email protected]>
Co-authored-by: Mario Manno <[email protected]>
* k3d-act-clean cleanup

- Clean up downstream clusters if FLEET_E2E_DS_CLUSTER_COUNT is set
- Remove cleaning up any docker containers by name, including from
  nektos/act, therefore renaming k3d-act-clean to k3d-clean.

* Keep compatibility with previous version of setup-k3ds
3f61ba5 removed the usage of the HELM_PATH environment variable.
* Separate k3d dev scripts for upstream/downstream

* support PORT_OFFSET for upstream in both, to avoid conflicts with
  host ports
* for simplicity downstreams always have a number in their name

* fixup! Separate k3d dev scripts for upstream/downstream

* fixup! fixup! Separate k3d dev scripts for upstream/downstream
* Import v1alpha1 package as fleet

* Show bundle errors in Bundle and GitRepo

Refers to #2943

* Add E2E tests

Refers to #2943
Same tests run in e2e-nightly
If gitrepos are lost from the requeueAfter polling, resync should add them
again.
@manno manno requested a review from a team as a code owner January 9, 2025 14:50
@manno manno closed this Jan 9, 2025
@manno manno deleted the add-jitter-to-polling branch January 9, 2025 15:03

7 participants