-
Notifications
You must be signed in to change notification settings - Fork 14
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Improve handling of newly created threads in initial seize code #255
base: main
Are you sure you want to change the base?
Conversation
Codecov Report
@@ Coverage Diff @@
## main #255 +/- ##
==========================================
- Coverage 66.99% 66.81% -0.19%
==========================================
Files 99 99
Lines 10321 10348 +27
Branches 1709 1718 +9
==========================================
- Hits 6915 6914 -1
- Misses 2523 2549 +26
- Partials 883 885 +2
📣 We’re building smart automated test selection to slash your CI/CD build times. Learn more |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
🤷♂️
if (ptrace(PTRACE_SINGLESTEP, pid, 0, 0) == -1) { | ||
LOG(ERROR) << "Error in singlestep!"; | ||
LOG(ERROR) << "singleStepInst: ptrace single error: " << strerror(errno); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: this can be replaced with PLOG(ERROR) << "singleStepInst: ptrace single error: ";
similarly with the other cases that append strerror
.
@@ -64,7 +64,7 @@ class OIDebugger { | |||
OIDebugger::processTrapRet processTrap(pid_t, bool = true, bool = true); | |||
bool contTargetThread(bool detach = true) const; | |||
bool isGlobalDataProbeEnabled(void) const; | |||
static uint64_t singlestepInst(pid_t, struct user_regs_struct&); | |||
static uint64_t singleStepInst(pid_t, struct user_regs_struct&); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: should this return uintptr_t
? it's casted to that at one of the call sites and seems to make more sense. obviously they're the same type though.
@@ -968,7 +980,7 @@ OIDebugger::processTrapRet OIDebugger::processTrap(pid_t pid, | |||
siginfo_t info; | |||
|
|||
auto stopsig = WSTOPSIG(status); | |||
VLOG(4) << "Stop sig: " << std::dec << stopsig; | |||
VLOG(3) << "Stop sig: " << std::dec << stopsig; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
not sure if this was for debugging, but shouldn't this remain a 4?
Summary
Our initial thread seize code is very racy and it's super easy to end up with threads being ignored or hung for long periods waiting for stop states to be reaped. This change aims to handle the case where a thread that has been seized executes a sequence such that a new thread is created (e.g., a clone(2) syscall and the newborn thread shows up in the /proc namespace so the directory_iterator picks it up. This new thread will be traced by us already as the tracing sate is inherited from its parent thread so we can't seize it. It will however be in a signal-stop state so that should be reaped and the thread continued.
Test plan
make 'test' returns the same failures as before the change. However, it is unlikely to show any change as this code is handling an
EPERM
return fromptrace(2)
. Test in prod really. I have engineered highly controlled test cases to work through the failure scenarios and at a minimum I don't think we are any more exposed to failure with this code than without.