Multi-AZ Volume Node Affinity Conflict #1329
Comments
@DreamingRaven, I think it sounds to me like you're doing the correct thing with I'm a bit surprised that this doesn't work, as both PVCs should be used for the first time by the pod created by the mover job. It looks like you're using a copyMethod of One thing I'm not sure of is how the clone works in your env, does it get provisioned immediately to the same availability zone as the source volume? According to the google docs link you pointed to, there is this:
Which makes me wonder if the clone PVC is the issue here - it may get pre-provisioned even with WaitForFirstConsumer - this is just me guessing however. Is this possible to test by creating a clone PVC yourself? If this is the case you could alternatively try to use copyMethod of
With Clone
@tesshuflower I have created 3 clone PVCs. Without a pod to mount them, they do indeed wait for first consumer using the default storage class:
I then bound each volume individually to different pods to test which availability zone they end up in. I expected them to end up in the same zone as the source volume: which is indeed the case for all volumes:
So the question I find myself asking is: why does the cache volume not also end up in the same zone, if both volumes are being provisioned for the same pod? I then tested by deleting the ReplicationSource and reinstating it to see the order in which the volumes are provisioned. It looks to me like the cache volume is provisioned almost instantly. I suspect that, since it is provisioned first, no consideration is given to the in-progress backup volume, which takes significantly longer to clone. I will shortly try with snapshots, which I hope will inform the volume placement earlier in the chain!
With Snapshot
I changed the .spec.restic.copyMethod of the ReplicationSource to Snapshot (as @tesshuflower recommended), which provisioned the snapshot first before other resources. (I also added the annotation However, at this stage I was concerned that the next backup would fail, since the cache volume now already exists, so I reduced the cron to run every 10 minutes to confirm. I found that the next tick did also complete successfully, although I have yet to confirm the backup with a restore, which is the next operation I want to check. Interestingly, however, the cache volume is still in europe-west2-a. I checked the provisioned volumes after the volume snapshot, and they too end up in europe-west2-a, the same as the cache volume. So it appears that the data is actually moving zones, since it originated in europe-west2-c in the ghost volume and passes through the snapshot to create the X-backup-src PV like so:
So this appears to work. I will restore from this data to confirm: since the volume is moving zones, I want to check that the data inside it has moved too, as this zone migration is surprising behaviour.
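The change described in this comment can be sketched as the fragment below. It is a minimal illustration rather than the manifest used in the thread: only the field that was changed is shown, and the rest of the ReplicationSource spec is assumed to stay as it was.

```yaml
# Hypothetical fragment of a ReplicationSource spec.
# Only the copy method changes; all other fields are left untouched.
spec:
  restic:
    copyMethod: Snapshot  # previously Clone, which hit the zone conflict
```

With Snapshot, the source data passes through a VolumeSnapshot before the mover's source PVC is created, which in this thread let that PVC land in the same zone as the cache volume.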
Ok I can confirm the backups work from the zone-migrated volumes, although cloned volumes still do not work because of the aforementioned issue. Seeing as this issue was geared towards solving the AZ issue, rather than specifically towards cloned volumes, I would say this is resolved. As an aside, I note that restored-to volumes do not get wiped on restore. This is my current restore as per the docs:
Is there any option in VolSync to wipe the volume before restoring, or are there any established setups / patterns for doing so? @tesshuflower thanks for your help, it is much appreciated.
@DreamingRaven thanks for the detailed information, this was an interesting one. Glad to hear that snapshots do seem to work for your use-case. Right now you'll get a new empty PVC if you provision a new one yourself rather than re-using, or use something like the volume populator to get a new PVC. There's a long discussion here about using the volume populator, in case the use-case mentioned is in any way similar to yours: #627 (comment) If your use-case is really about trying to synchronize data to a PVC on a remote cluster (i.e. a sync operation that you will run repeatedly at the destination), you could potentially look at using the rclone or rsync-tls movers.
OK, I will have a look. I am creating a staging environment that I want to allow some drift, then after a period of time it should be wiped and set to the same state as production. Thanks for your time @tesshuflower, it sounds like the volume populator is exactly what I need with a separate cron deletion! Then ArgoCD will recreate the resource and re-pull the backup, returning the staging environment to a production-like state.
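For readers following the volume populator suggestion, a rough sketch of what such a setup could look like is given below. It is not taken from this thread: all resource names, the capacity, and the trigger value are hypothetical, and it assumes a restic repository Secret already exists in the namespace.

```yaml
# Hypothetical ReplicationDestination that restores the latest backup
# into a snapshot, which the volume populator can then use as a source.
apiVersion: volsync.backube/v1alpha1
kind: ReplicationDestination
metadata:
  name: staging-restore            # hypothetical name
spec:
  trigger:
    manual: restore-once           # hypothetical trigger value
  restic:
    repository: x-restic-config    # hypothetical Secret with repository settings
    copyMethod: Snapshot
    accessModes: [ReadWriteOnce]
    capacity: 10Gi                 # hypothetical size
---
# Hypothetical PVC populated from the ReplicationDestination above.
# Deleting and recreating this PVC yields a freshly populated volume.
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: staging-data               # hypothetical name
spec:
  accessModes: [ReadWriteOnce]
  dataSourceRef:
    kind: ReplicationDestination
    apiGroup: volsync.backube
    name: staging-restore
  resources:
    requests:
      storage: 10Gi
```

Under this arrangement the "wipe" is done by deleting the PVC rather than by clearing its contents, which matches the cron-deletion plus ArgoCD recreation described above. Whether the recreated PVC picks up the newest backup depends on when the ReplicationDestination last ran, so that part would need checking.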
Describe the bug
There is a misalignment of volumes being provisioned in multi-AZ clusters. This causes VolSync job pods to be unschedulable.
On my non-multi-AZ cluster, VolSync pods are scheduled without incident, since neither volume has any AZ restriction for mounting. However, on my multi-AZ GKE cluster the two volumes, X-backup-cache and X-backup-src, cause the X-backup job to stall, since the pod cannot be scheduled:
9 node(s) had volume node affinity conflict.
The volumes are in different zones, so no node can satisfy the pod's requirements.
Steps to reproduce
Create a multi-AZ cluster in GKE.
Create any ReplicationSource resource, e.g.:
This will likely then create owned resources like so:
The pod will likely be unable to mount both volumes, and as such remains unscheduled permanently.
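For reference, a ReplicationSource of the kind these steps describe might look like the sketch below. This is not the manifest from the original report: the names, namespace, schedule, and retention values are hypothetical, and Clone is assumed because it is the copy method discussed in the comments.

```yaml
# Hypothetical restic ReplicationSource for reproducing the issue.
apiVersion: volsync.backube/v1alpha1
kind: ReplicationSource
metadata:
  name: x-backup                 # hypothetical name
  namespace: example             # hypothetical namespace
spec:
  sourcePVC: x-data              # hypothetical: the PVC being backed up
  trigger:
    schedule: "0 * * * *"        # hypothetical: run hourly
  restic:
    repository: x-restic-config  # hypothetical Secret with repository settings
    copyMethod: Clone            # the mover pod mounts a clone plus a cache volume
    pruneIntervalDays: 14        # hypothetical retention settings
    retain:
      daily: 7
```

The mover pod for this resource needs both the cloned source volume and the restic cache volume, which is why two zonal PVs in different zones leave it unschedulable.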
Expected behavior
I would expect both provisioned PVs to be allocated to the same availability zone as the PV being backed up.
Actual results
Since GKE assigns zones randomly unless specified (see https://cloud.google.com/kubernetes-engine/docs/concepts/persistent-volumes#pd-zones), you will end up with something like this:
When inspected, the two PVs created by VolSync will look something like this:
Additional context
I can foresee a few ways to solve this issue:
So it is currently unclear to me how one would force the zones to match, unless you completely remove the multiple volumes.
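One possible way to force the zones to match, offered here as an untested sketch rather than a confirmed fix, is a StorageClass restricted to a single zone and used for the cache volume. The class name and disk type are hypothetical, and the zone is simply the source volume's zone as reported in the comments. This assumes the GKE Persistent Disk CSI driver is in use.

```yaml
# Hypothetical StorageClass that only provisions disks in one zone.
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: zonal-europe-west2-c     # hypothetical name
provisioner: pd.csi.storage.gke.io
parameters:
  type: pd-balanced              # hypothetical disk type
volumeBindingMode: WaitForFirstConsumer
allowedTopologies:
  - matchLabelExpressions:
      - key: topology.gke.io/zone
        values:
          - europe-west2-c       # zone of the source volume in this thread
---
# Hypothetical fragment of a ReplicationSource spec: point the restic
# cache volume at the zonal class so it lands next to the source data.
spec:
  restic:
    cacheStorageClassName: zonal-europe-west2-c
```

The trade-off is that the backup job becomes tied to one zone, so this only helps when the source volume's zone is known in advance and does not change.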