Fail DDL if checkpoint can't be written #197

Open
morgo opened this issue Sep 20, 2023 · 0 comments
Labels
bug Something isn't working

Comments

Collaborator

morgo commented Sep 20, 2023

We had a failure scenario where the pod was terminated, but with a grace period to wait for the migration to complete. Networking on the pod had been shut down, but already-open connections were permitted to continue. The copier kept running because its connections were all being reused, but checkpointing was failing.

This is a bit of a weird scenario, but I think the correct answer is that if checkpoints can't be written, the task should probably fail as well.

This will stop affecting us soon, as we will remove the code where the pod blocks waiting for a migration and just let it terminate. But maybe it is still worth fixing?

@morgo morgo added the bug Something isn't working label Feb 5, 2024