We hit a failure scenario in which the pod had been terminated but was still within its grace period, waiting for the migration to complete. Networking on the pod had been shut down, but connections that were already open were permitted to continue. The copier kept running because it was re-using those existing connections, but checkpointing was failing.
This is a bit of a weird scenario, but I think the correct behavior is that if the checkpoints can't be written, the task should probably fail as well.
It won't affect us for much longer, since we will remove the code where the pod blocks waiting for a migration and just let it terminate. But maybe it is still worth fixing?
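To make the proposed behavior concrete, here is a minimal sketch of a task that stops copying as soon as a checkpoint write fails. It is not the project's actual code: `runTask`, `copyChunk`, `writeCheckpoint`, `errCheckpoint`, and `demo` are all hypothetical names, and the sketch assumes the copier and the checkpointer run concurrently and share one cancellable context (Go 1.20+ for `context.WithCancelCause`).

```go
package main

import (
	"context"
	"errors"
	"fmt"
	"time"
)

// errCheckpoint marks failures that come from persisting a checkpoint.
// (Hypothetical sentinel error, introduced for this sketch.)
var errCheckpoint = errors.New("checkpoint write failed")

// runTask copies chunks until copyChunk reports done, while a background
// goroutine writes a checkpoint every interval. If any checkpoint write
// fails, the shared context is cancelled with that error as its cause and
// runTask returns it, so the copier cannot keep running without checkpoints.
func runTask(
	ctx context.Context,
	copyChunk func(context.Context) (done bool, err error),
	writeCheckpoint func(context.Context) error,
	interval time.Duration,
) error {
	ctx, cancel := context.WithCancelCause(ctx)
	defer cancel(nil)

	go func() {
		ticker := time.NewTicker(interval)
		defer ticker.Stop()
		for {
			select {
			case <-ctx.Done():
				return
			case <-ticker.C:
				if err := writeCheckpoint(ctx); err != nil {
					// Fail the whole task, not just the checkpointer.
					cancel(fmt.Errorf("%w: %v", errCheckpoint, err))
					return
				}
			}
		}
	}()

	for {
		// Stop before the next chunk if the checkpointer has failed.
		if ctx.Err() != nil {
			return context.Cause(ctx)
		}
		done, err := copyChunk(ctx)
		if err != nil {
			// Prefer the checkpoint failure if that is what cancelled us.
			if cause := context.Cause(ctx); cause != nil {
				return cause
			}
			return err
		}
		if done {
			return nil
		}
	}
}

// demo simulates the scenario from this issue: the copier keeps working on
// its existing connections, but the third checkpoint write fails.
func demo() error {
	ctx, stop := context.WithTimeout(context.Background(), 2*time.Second)
	defer stop()

	copyChunk := func(context.Context) (bool, error) {
		time.Sleep(time.Millisecond) // pretend to copy one chunk
		return false, nil            // never finishes on its own
	}
	calls := 0 // only touched by the checkpoint goroutine
	writeCheckpoint := func(context.Context) error {
		calls++
		if calls >= 3 {
			return errors.New("network unreachable")
		}
		return nil
	}
	return runTask(ctx, copyChunk, writeCheckpoint, 5*time.Millisecond)
}

func main() {
	fmt.Println("task ended with:", demo())
}
```

Cancelling with a cause lets the caller tell a checkpoint failure apart from an ordinary shutdown, for example with `errors.Is(err, errCheckpoint)`. A real implementation might also tolerate a small number of transient checkpoint failures before giving up; that policy is left out here.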