storage: consider improvements to on-by-default jemalloc observability #139067
Labels
A-storage
Relating to our storage engine (Pebble) on-disk storage.
C-enhancement
Solution expected to add code/behavior + preserve backward-compat (pg compat issues are exception)
T-storage
Storage Team
In escalations like cockroachlabs/support#3153, where we need observability into jemalloc behavior, we're currently forced to ask users to collect additional jemalloc stats by setting vmodule flags. This back-and-forth adds latency to our investigations and frustrates users. There's probably something we can do to improve the observability data we collect by default. If I recall correctly, we currently attempt to notice an imminent OOM death and dump a Go heap profile to the filesystem. Should we also dump detailed jemalloc stats alongside the heap profile? Are there stats that we can collect periodically without too much overhead?
Jira issue: CRDB-46489