production-ready, provider-independent & easily manageable k8s cloud setup
This project is intended to help beginner-to-intermediate Kubernetes hobbyists and freelancers with the mammoth task of setting up, maintaining and updating a production k8s cloud setup.
It builds on the idea that today you can find a tool for automating almost any given DevOps task. Therefore, the challenge lies less in learning to do any one of these tasks manually, and more in finding the correct automation tools for the task at hand, separating the good from the bad, and making them work in unison. Also, to be a functioning human in this automation loop, you should have a basic understanding of the underlying ideas and technologies at play.
While every DevOps engineer likely needs to put their own manual learning work into the latter point, the former one can definitely be outsourced into a pre-made toolbox / system to save all of us a ton of time. This is what this project aims to do.
As for the learning part: these docs point to a few third-party resources to get newcomers started on each of the required basics, but this is by no means the focus of this work.
To make proper use of this repository, you will need basic understanding of multiple IT domains and tools. Please have a look at the Prior Knowledge Reference for an overview of relevant topics with further links to get you started learning.
This repo contains the following components:
/roles
: Ansible roles for setting up a production-ready Kubernetes cluster.

/charts
: Helm charts, which can be deployed into the cluster.

/charts/system
: System-level charts (storage, DNS, backup, etc.), which are deployed as part of the cluster setup.

/charts/apps
: Helm charts for user applications (Nextcloud, Wordpress, etc.), which can be deployed on demand.

/charts/config
: Supporting charts, which are depended upon by other charts in this repo.

/setup.yaml
: Ansible playbook to set up a cluster on a given inventory of nodes.

/clusters
: Configuration for deployed clusters. The repo only contains one sub-directory as a template, which you need to copy to create your own cluster config.
The following system / infrastructure components can be deployed via `setup.yaml`. Some of these can be disabled via the cluster config.
- Kubernetes:
- k3s distribution
- CNI configured for dual-stack, wireguard-encrypted networking
- Alternatively: encrypted networking via Tailscale VPN to support nodes without static public IP
- HA via embedded etcd
- Kubernetes-Dashboard
- Other production config for k3s: Secrets encryption, metrics, OIDC auth, reserved resources, ...
- Storage:
- Longhorn as storage provider
- Multiple storage classes for different volume types
- Local volumes or cross-node replication
- Optional encryption via LUKS
- Backup to and restore from S3 storage
- Web UI for storage management
- Ingress:
- Ingress-NGINX as ingress controller
- Expose services via configurable pool of ingress nodes
- Provides default ingress class
- CertManager with preconfigured Letsencrypt ACME issuer to auto-provision (& renew) certificates
- External-DNS to auto-configure and continuously sync DNS records
- Authentication:
- Keycloak as identity provider
- Cluster-internal Single-Sign-On via OIDC or SAML
- OAuth2-Proxy for proxy header auth
- Web UI for user management etc.
- Telemetry System:
- Prometheus for metrics collection
- Loki for logs collection
- Grafana for dashboards and alerts
- Node-Exporter
- AlertManager
- Auto-provisioned dashboards and email alerts for common cases / faults
- Backups:
- Velero
- Nightly full-cluster backups of all API resources and PV contents
- Easy manual backing up and restoring
- GitOps System:
- FluxCD
- Continuous, rolling updates of deployed apps based on semver ranges
- Weave Gitops as Web UI
- Cluster Upgrades:
- System Upgrade Controller
- Automatic, non-disruptive upgrades from k3s stable channel
- Upgrades both master and worker nodes
- Virtual Clusters:
- VCluster
- Create virtual sub-clusters, which are constrained to a specific namespace and subset of nodes
- Each vcluster has a full k8s API and either reuses the infrastructure components of its host cluster (e.g. Longhorn) or deploys its own set internally
- Useful for test environments or providing multi-tenancy with limited resources
To use this setup, you will need:

- One or more Linux machines managed by systemd, to which you have root access
- A domain managed by one of the providers supported by external-dns, with API access
  - Both DigitalOcean and Cloudflare offer free DNS plans and have been tested with this setup.
- Credentials to an SMTP server to send automatic emails from
  - Strato offers very affordable mail packages.
- Credentials to an existing, empty S3 bucket
  - OVH offers low-priced S3-compatible storage.
- Optional: Free Tailscale account and credentials
The Ansible playbook code in here is meant to be run from a Linux workstation. On Windows you may use WSL.
These system dependencies are to be installed on your local machine.
Clone the repo, `cd` to the cloned folder, and run the following bash code to install all Python deps into a virtual environment:
poetry install
poetry shell
Then run this to install Ansible-specific dependencies:
ansible-galaxy install -r requirements.yaml
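As a quick sanity check (not an official setup step), you can confirm from inside the poetry shell that Ansible and the Galaxy dependencies were installed:

```bash
# Verify the Ansible CLI provided by the poetry environment
ansible --version

# List the roles and collections pulled in via requirements.yaml
ansible-galaxy role list
ansible-galaxy collection list
```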
Once you have gone through the setup steps below to create a host cluster, you may repeat the same steps with a subset of the original nodes and a different configuration to create a vcluster on top. Note that you have to set `cluster.virtual=true` in `./clusters/$CLUSTER_NAME/group_vars/cluster/configmap.yaml` for a vcluster to be created.
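For illustration, assuming the dotted option path maps to nested YAML keys in the configmap (check the template for the exact structure), the relevant excerpt would look roughly like this:

```yaml
# ./clusters/$CLUSTER_NAME/group_vars/cluster/configmap.yaml (excerpt; sketch only)
cluster:
  virtual: true   # create a vcluster instead of a host cluster
```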
⚠️ You have to include at least one of the host cluster control nodes in the subset and mark it with `control=true`. This is required because certain setup steps have to be performed via ssh on a control node.
Note that some config options are not available for vclusters. This is mentioned in the respective configmap template comments.
Storage is managed fully by the host cluster, including backups of PVs.
Copy the cluster inventory template `clusters/_example`:

cp -r ./clusters/_example ./clusters/$CLUSTER_NAME
Replace `$CLUSTER_NAME` with an arbitrary alphanumeric name for your cluster.
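If you want to paste the commands from this guide verbatim, you can export the name once in your shell (optional convenience; `mycluster` is just an example):

```bash
export CLUSTER_NAME=mycluster   # any alphanumeric name
ls ./clusters/$CLUSTER_NAME     # quick check that the copied inventory exists
```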
Edit the hosts file `./clusters/$CLUSTER_NAME/hosts.yaml` to add all host machines that are to be set up as the nodes of your cluster.
Make sure your workstation can connect and authenticate as a privileged user via ssh to all of the remote hosts.
Backbone hosts are node hosts which have a direct and strong connection to the internet backbone (e.g. machines in data centers) and thus have low latency and high bandwidth among them. This is especially important for distributed storage. Latency between all backbone nodes should be 100ms or less and bandwidth should be at least 1Gbps (up and down).
By default, nodes are assigned to the region `backbone` and the zone `default`. You may set `backbone=false` on a node in `hosts.yaml` to assign it to the `edge` region instead (disabling its participation in the HA control plane and default distributed storage). You may additionally define a custom `zone` for each node.
To include only a subset of hosts in the backbone, set `backbone=true` on that subset; all other nodes will be assigned `backbone=false` automatically.
By default, the first backbone host is taken as the sole control plane host. If you want to change this behavior, set `control=true` on a subset of hosts.
It is recommended to have either one or at least three control plane nodes. Make sure you store the hosts file somewhere safe. Please also set control nodes for vclusters: this has no effect on where the control plane pods run, but it tells Ansible on which machines it can execute administrative tasks via ssh.
By default, the first backbone host is taken as the sole ingress host. If you want to change this behavior, set `ingress=true` on a subset of hosts.
By default, all hosts are taken as storage hosts, but non-backbone hosts are excluded from distributed storage for performance reasons, so they can only host single-replica, local volumes. To exclude hosts from being used for storage at all, set `storage=false` on them; to include only a subset of hosts for storage, set `storage=true` on that subset.
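To tie the node attributes together, here is a minimal sketch of what a `hosts.yaml` might look like. This is illustrative only: host names and IPs are placeholders, and the exact group layout and variable placement may differ from the `_example` template, but the attributes (`backbone`, `control`, `ingress`, `storage`, `zone`) are the ones described above.

```yaml
# clusters/$CLUSTER_NAME/hosts.yaml - illustrative sketch, not the canonical template
all:
  hosts:
    node1:
      ansible_host: 203.0.113.10   # placeholder IP
      control: true                # control plane node + ssh target for admin tasks
      ingress: true                # part of the ingress node pool
    node2:
      ansible_host: 203.0.113.11
      zone: rack-2                 # custom zone label
    node3:
      ansible_host: 198.51.100.20
      backbone: false              # edge node: no HA control plane, no distributed storage
      storage: false               # exclude from storage entirely
```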
Copy the default cluster config file `./roles/cluster-config/configmap.yaml` to `./clusters/$CLUSTER_NAME/group_vars/cluster/configmap.yaml` and the cluster secrets template file `./roles/cluster-config/secrets.yaml` to `./clusters/$CLUSTER_NAME/group_vars/cluster/secrets.yaml`. Fill in the required values and change or delete the optional ones to your liking.
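In shell terms, that boils down to the following two copies (paths taken from the paragraph above):

```bash
cp ./roles/cluster-config/configmap.yaml ./clusters/$CLUSTER_NAME/group_vars/cluster/configmap.yaml
cp ./roles/cluster-config/secrets.yaml   ./clusters/$CLUSTER_NAME/group_vars/cluster/secrets.yaml
```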
Please either make sure manually that all your nodes are listed as trusted in `known_hosts`, or append `-e auto_trust_remotes=true` to the command below; otherwise you will have to type `yes` and hit Enter for each of your hosts at the beginning of the playbook run.
To set up all the hosts you provided as Kubernetes nodes and join them into a single cluster, run:
ansible-playbook setup.yaml -i clusters/$CLUSTER_NAME
If you recently rebuilt the OS on any of the hosts (and its SSH host key therefore changed), make sure to also update (or at least delete) its `known_hosts` entry, otherwise Ansible will throw an error. You can also append `-e clear_known_hosts=true` to the above command to delete the `known_hosts` entries for all hosts in the inventory before executing the setup.
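Put together, a first run that also handles `known_hosts` automatically might look like this (both extra variables are optional and were described above):

```bash
ansible-playbook setup.yaml -i clusters/$CLUSTER_NAME \
  -e auto_trust_remotes=true \
  -e clear_known_hosts=true
```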
As already mentioned above, this Kubernetes setup includes multiple web dashboards, which allow you to do various maintenance tasks and are available under different subdomains of the domain you supplied in the cluster config:
`id.yourdomain.org` - Keycloak Web UI
- Manage your Single-Sign-On users, groups and OIDC/SAML clients
- Manage your own admin credentials
`kubectl.yourdomain.org` - Kubernetes Dashboard
- List k8s API resources with their attributes, events and some metrics
- Manually create, mutate and delete any resource
- Deploy new GitOps resources
- View most recent logs of pods or shell into their containers
`gitops.yourdomain.org` - Weave GitOps Dashboard
- List all deployed GitOps resources with their attributes and state
- Sync, pause and resume resources
`telemetry.yourdomain.org` - Grafana Dashboard UI
- View and search logs of the past few days
- Query and visualize in-depth metrics
- Check the state of (preconfigured) alert rules
`longhorn.yourdomain.org` - Longhorn UI
- Manage and monitor persistent storage nodes, volumes and backups
- Only available in a host cluster
All these dashboards are secured via OIDC authentication.
For most cluster operations the Ansible playbook isn't required. You can instead use `kubectl` or specific CLI tools relying on `kubectl` (or the k8s API directly). These CLI tools are installed automatically on all control hosts.
It is best to just `ssh` into one of the control hosts and perform operations from the terminal there.
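For example (user and host are placeholders for one of your control nodes):

```bash
# Shell into a control node ...
ssh root@node1.yourdomain.org

# ... and inspect the cluster from there
kubectl get nodes -o wide
kubectl get pods --all-namespaces
```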
To remove a node from the cluster:

- **Via Longhorn Web UI**:
  - Request eviction of the associated Longhorn storage node.
  - Wait for all volumes to be evicted.
- **Via terminal on any control node**:
  - `kubectl drain` the k8s node to evict all running pods from it and disable scheduling. You will probably want to run `kubectl drain` with these flags: `--ignore-daemonsets --delete-emptydir-data`
  - Wait for all pods to be evicted.
- **Via terminal on the node to be removed**:
  - Execute the uninstall script:
    - For agents: `/usr/local/bin/k3s-agent-uninstall.sh`
    - For servers: `/usr/local/bin/k3s-uninstall.sh`
- **Via kubectl on a control node or the k8s-dashboard**:
  - Delete the `Node` resource object.
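Put together, the terminal part of a node removal might look like this (the node name is a placeholder; run the Longhorn eviction via the web UI first):

```bash
# On any control node: drain the node that is about to be removed
kubectl drain worker-3 --ignore-daemonsets --delete-emptydir-data

# On the node itself: uninstall k3s (use k3s-uninstall.sh on server nodes)
/usr/local/bin/k3s-agent-uninstall.sh

# Back on a control node: remove the Node object from the cluster
kubectl delete node worker-3
```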
To add new nodes, simply add the new node hosts to the end of your cluster's `hosts.yaml` and re-run the setup playbook.
⚠️ Adding new control nodes is currently untested and could leave your cluster in a failed state!
While you are free to deploy any containerized app into your cluster, a few select ones have been optimized to work well with the specific storage / networking / authentication infrastructure of this project. Concretely, these are custom helm charts (partly based on the official helm charts of these apps), which are contained in the folder `/charts/apps`. To deploy any of these custom helm charts, follow these steps:
Note that this is one way to do it. If you have experience with k8s and GitOps, feel free to use your own tools.
- Open up the Kubernetes Dashboard UI under `kubectl.yourdomain.org`.
- Open up the form for creating a new resource via the `+` button at the top right.
- Paste this template for a FluxCD helm release into the form:

  ```yaml
  apiVersion: helm.toolkit.fluxcd.io/v2beta1
  kind: HelmRelease
  metadata:
    name: ""       # Custom name of your release
    namespace: ""  # Name of an existing namespace (best create a new one)
  spec:
    chart:
      spec:
        chart: ""  # Name of the chart in /charts/apps
        sourceRef:
          kind: HelmRepository
          name: base-app-repo
          namespace: flux-system
        version: ""  # Semver version constraint (use the latest version)
    interval: 1h
    values: {}  # Custom values. See the chart's values.yaml file.
  ```

- Fill in the missing values and hit `Upload`.
- Monitor the release's progress via `gitops.yourdomain.org`.
You can also find a list of all deployed HelmReleases in the Kubernetes Dashboard.
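As a worked example, a filled-in release could look like the following. This is illustrative only: it assumes a chart named `nextcloud` exists in `/charts/apps` and that `1.x` matches a published chart version, and the `values` key shown is hypothetical; check the chart folder, its `Chart.yaml` and its `values.yaml` for the real names before applying.

```yaml
apiVersion: helm.toolkit.fluxcd.io/v2beta1
kind: HelmRelease
metadata:
  name: nextcloud
  namespace: nextcloud
spec:
  chart:
    spec:
      chart: nextcloud      # assumed chart name in /charts/apps
      sourceRef:
        kind: HelmRepository
        name: base-app-repo
        namespace: flux-system
      version: "1.x"        # hypothetical semver range
  interval: 1h
  values:
    ingress:
      host: cloud.yourdomain.org   # hypothetical value key; see the chart's values.yaml
```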
If a helm release is stuck or fails to update: most likely a previous helm operation was interrupted, leaving the release in an intermediate state. See this StackOverflow response for possible solutions.
If a PVC cannot be deleted: have a look at the field `metadata.finalizers` of the PVC in question. If it contains `snapshot.storage.kubernetes.io/pvc-as-source-protection`, then there exists an unfinished volume snapshot of this PVC. This could mean that a snapshot is being created right now, in which case the finalizer should be removed within the next few minutes (depending on volume size) and the PVC deleted afterwards. It could, however, also mean that a snapshot has failed, in which case k8s unfortunately leaves the finalizer in place indefinitely.
If you do not care about the integrity of the PVC's snapshots (because you don't want to keep backups), you can remove the finalizer entry manually and thereby trigger immediate deletion. Otherwise, it is best to wait for about an hour and only then remove the finalizer manually.
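If you do decide to remove the finalizer by hand, a typical way to do it is (names are placeholders):

```bash
# Show the PVC's current finalizers
kubectl get pvc my-pvc -n my-namespace -o jsonpath='{.metadata.finalizers}'

# Edit the PVC and delete the snapshot.storage.kubernetes.io/pvc-as-source-protection
# entry from metadata.finalizers, then save and exit
kubectl edit pvc my-pvc -n my-namespace
```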