TlsRandomNumberGenerator causes memory corruption when the process is forked. #2408
I manage an open source project, https://github.com/hpcc-systems/HPCC-Platform, to which we are adding OpenTelemetry support using this library.
One of the components launches large numbers (100+) of child processes at the same time. Since adding the OpenTelemetry library, this is now causing the system to core when those child processes are started. Here is an example of a stack trace from gdb:
I believe the problem is caused by the following code:
where an onFork handler was added inside the random number generator in a change made in 2018.
Only a very restricted set of functions is valid to call at that point within the child process (https://man7.org/linux/man-pages/man3/pthread_atfork.3.html): they must be async-signal-safe, and they must not use any heap functions because that can cause memory corruption. This onFork call does not obey those restrictions. It also performs unnecessary work every time a child process is created (I suspect the child processes now take a long time to start because of the need to wait for sufficient entropy from the random number generator).
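For illustration, here is a minimal sketch of a fork handler that would stay within those restrictions. It is not the library's code: `g_fork_generation`, `t_seeded_generation`, `MarkForked`, `InstallForkHandler` and `NextRandom` are all hypothetical names, and it assumes `std::atomic<std::uint64_t>` is lock-free on the target platform. The child handler only bumps an atomic counter; the reseed (which touches the heap and the entropy source) is deferred until a thread actually asks for a random number.

```cpp
#include <pthread.h>

#include <atomic>
#include <cstdint>
#include <random>

// Hypothetical sketch; none of these names come from opentelemetry-cpp.

// Incremented in every forked child. Starts at 1 so that a thread whose
// last-seeded value is 0 seeds itself on first use.
static std::atomic<std::uint64_t> g_fork_generation{1};

// Generation this thread's engine was last seeded for (0 = never seeded).
static thread_local std::uint64_t t_seeded_generation = 0;

// Child-side atfork handler. It performs a single atomic increment:
// no heap allocation, no locks, no I/O. Assumes the atomic is lock-free.
extern "C" void MarkForked() {
  g_fork_generation.fetch_add(1, std::memory_order_relaxed);
}

// Registers the handler once per process; returns true on success.
bool InstallForkHandler() {
  static const bool ok = (pthread_atfork(nullptr, nullptr, MarkForked) == 0);
  return ok;
}

// Returns a random value, reseeding lazily if this thread has not been
// seeded yet or the process has forked since it was last seeded.
std::uint64_t NextRandom() {
  static thread_local std::mt19937_64 engine;
  InstallForkHandler();
  const std::uint64_t current =
      g_fork_generation.load(std::memory_order_relaxed);
  if (t_seeded_generation != current) {
    // Runs in normal context, long after fork() returned, so using the
    // entropy source and the heap is fine here.
    std::random_device device;
    std::seed_seq seed{device(), device(), device(), device()};
    engine.seed(seed);
    t_seeded_generation = current;
  }
  return engine();
}
```

With something along these lines, a child that never calls into the library pays only for one atomic increment at fork time, and a child that does use it reseeds on its first request.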
I believe the fix is to delete that call to onFork, and re-examine why that change was made. In particular, why would a process ever call the open telemetry functions inside the child of a fork()?