The interesting part is that the drone job has been actively removed:
RemoveReason = "via condor_rm (by user cobald)"
The time gap in the logs is explained by a crash due to #168 and subsequent service restart by Puppet.
So it seems that:
1. A drone is first in DrainingState with ResourceStatus.Running.
2. The COBalD/TARDIS service crashes (or is otherwise restarted).
3. Afterwards, the drone is found in DrainingState again, but the ResourceStatus is not defined / re-checked yet (note that it is missing from the log line).
4. At that point the resource appears to be deleted (i.e. condor_rm), even though it is not fully drained yet.
I'm not sure from the code: How are drone state changes handled when the ResourceStatus is not yet defined after startup?
Do you understand what is happening?
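To make the suspected sequence concrete, here is a toy sketch of how an undefined status could end up on the removal path. This is not TARDIS code and I have not confirmed that the real state machine works this way: the function names, the action strings and the `Stopped` member are hypothetical. Only `ResourceStatus.Running` and the `condor_rm` outcome come from the logs above.

```python
from enum import Enum, auto
from typing import Optional


class ResourceStatus(Enum):
    # Toy stand-in for illustration; only Running appears in the logs above.
    Running = auto()
    Stopped = auto()


def naive_next_action(resource_status: Optional[ResourceStatus]) -> str:
    """Hypothetical draining handler that treats 'not Running' as 'drained'."""
    if resource_status is ResourceStatus.Running:
        return "keep draining"
    # An undefined status (None) falls through here, just like a drained drone.
    return "condor_rm"


def guarded_next_action(resource_status: Optional[ResourceStatus]) -> str:
    """Same handler, but an undefined status triggers a re-check first."""
    if resource_status is None:
        return "re-check status"
    return naive_next_action(resource_status)
```

If the real handler behaves like `naive_next_action`, a drone whose status has not been re-read after the restart would be removed immediately, which would match what the logs show.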
I don't have a full understanding of what actually happens in this case, but the behaviour described above looks strange to me.
(pinging also @wiene )