fix: stale smb unmount issue when smb file share is deleted #694

Merged: 2 commits, Nov 24, 2023

@andyzhangx andyzhangx commented Nov 22, 2023

What type of PR is this?

/kind bug

What this PR does / why we need it:

fix: stale smb unmount issue when smb file share is deleted and then unmounted

Related upstream fix: kubernetes/kubernetes#121851

There is a behavior change starting with Linux kernel 5.15.0-1051-azure (or possibly an earlier version). The upstream mount-utils relies on its error-checking logic to decide whether an SMB mount is in a stale state. From that kernel version on, when the SMB file share is deleted, stat on the mount path returns a "resource temporarily unavailable" error instead of ErrNotExist, so the stale mount is no longer detected. This PR therefore adds a new check to the IsCorruptedMnt func in mount-utils. EWOULDBLOCK is the "resource temporarily unavailable" error:

{11, "EWOULDBLOCK", "resource temporarily unavailable"},
kubernetes.io/csi: Unmounter.TearDownAt failed: rpc error: code = Internal desc = failed to unmount target /var/lib/kubelet/pods/bf4554ca-de3d-43e3-8d93-f35f14406e4f/volumes/kubernetes.io~csi/volume1/mount: Error checking path: stat /var/lib/kubelet/pods/bf4554ca-de3d-43e3-8d93-f35f14406e4f/volumes/kubernetes.io~csi/volume1/mount: resource temporarily unavailable
  • How to reproduce this issue
  1. Mount an SMB file share using the CSI driver or a subPath volume.
  2. Delete the remote SMB file share.
  3. Delete the pod. The unmount gets stuck and the pod stays in Terminating state indefinitely.
  • Workaround
    Force delete the pod.

  • Impact

This issue is not limited to CSI drivers. It also affects pods with an SMB subPath volume, which fail with an error like the one below. Because subPath unmount does not go through the CSI driver, that case requires a kubelet patch release.

Error: error cleaning subPath mounts for volume "volume1" (UniqueName: "kubernetes.io/csi/644d3846-709a-4165-8b13-c1003409d588-volume1") pod "644d3846-709a-4165-8b13-c1003409d588" (UID: "644d3846-709a-4165-8b13-c1003409d588") : error processing /var/lib/kubelet/pods/644d3846-709a-4165-8b13-c1003409d588/volume-subpaths/volume1/helloworld-mount-subpath: error cleaning subpath mount /var/lib/kubelet/pods/644d3846-709a-4165-8b13-c1003409d588/volume-subpaths/volume1/helloworld-mount-subpath/0: Error checking path: stat /var/lib/kubelet/pods/644d3846-709a-4165-8b13-c1003409d588/volume-subpaths/volume1/helloworld-mount-subpath/0: resource temporarily unavailable

Which issue(s) this PR fixes:

Fixes #693

Special notes for your reviewer:

Does this PR introduce a user-facing change?

fix: stale smb mount issue when smb file share is deleted and then unmounted

Additional documentation e.g., KEPs (Kubernetes Enhancement Proposals), usage docs, etc.:

fix: stale smb mount issue when smb file share is deleted and then unmounted

@k8s-ci-robot k8s-ci-robot added release-note Denotes a PR that will be considered when it comes time to generate release notes. kind/bug Categorizes issue or PR as related to a bug. cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. size/XXL Denotes a PR that changes 1000+ lines, ignoring generated files. labels Nov 22, 2023
@k8s-ci-robot

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: andyzhangx

@k8s-ci-robot k8s-ci-robot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Nov 22, 2023
@andyzhangx andyzhangx merged commit e5467c9 into kubernetes-csi:master Nov 24, 2023
11 checks passed
Development

Successfully merging this pull request may close these issues.

Pods stuck in terminating or Init:0/2 state after fileshere went down and then up again