Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Substantial memory leak on aarch64 and x86_64 architectures #1946

Open
kseistrup opened this issue Jan 9, 2024 · 1 comment
Open

Substantial memory leak on aarch64 and x86_64 architectures #1946

kseistrup opened this issue Jan 9, 2024 · 1 comment
Milestone

Comments

@kseistrup
Copy link

TL;DR: Profanity ate 1 GB swap in less than 15 days

I have profanity running on a RPi4 (aarch64 architecture) that is running ArchLinuxARM. Profanity receives notifications from e.g. cron jobs and casual interpersonal chat, altogether less than a couple of hundred messages daily — some days substantially less.

The RPi4 has 4 GB RAM + 1 GB swap. Usually no swap is used at all, so at first I didn't notice when free -tk reported that used and free swap had swapped places, so to speak: 100% used swap and 0% free. This was when the RPi4 and profanity had been up for slightly less than 15 days.

A ps ux revealed that the profanity process had gobbled up 2.5+ GB RAM, including all of the available 1 GB swap. Taking profanity down released all of the used swap, except ~40 MB.

Expected Behavior

Ideally, profanity should not keep gobbling up RAM.

Current Behavior

Launching profanity again, this is what I noticed (numbers taken from ps ux):

HH:MM     VSZ     RSS
-----··------··------
??:??  188208  163968
18:58  249260  225152
21:42  286228  262016
07:24  346020  321792

HH:MM is wallclock hours and minutes.

I tried the same thing on a x86_64 desktop pc, running ArchLinux:

HH:MM     VSZ     RSS
-----··------··------
??:??   55120   33976
18:13  280880   40264
22:13  316360   77128
07:21  360808  121544

Possible Solution

I haven't got a clue.

Steps to Reproduce (for bugs)

  1. Launch profanity in a terminal emulator
  2. Regularly run e.g. ps ux | rg pro[f]anity to see its increasing memory consumption

Context

N/A

Environment

  • Give us the version and build information output generated by profanity -v
$ profanity -v
Profanity, version 0.14.0
Copyright (C) 2012 - 2019 James Booth <[email protected]>.
Copyright (C) 2019 - 2023 Michael Vetter <[email protected]>.
License GPLv3+: GNU GPL version 3 or later <https://www.gnu.org/licenses/gpl.html>

This is free software; you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law.

Build information:
XMPP library: libstrophe
Desktop notification support: Enabled
OTR support: Enabled (libotr 4.1.1)
PGP support: Enabled (libgpgme 1.23.2)
OMEMO support: Enabled
C plugins: Enabled
Python plugins: Enabled (3.11.6)
GTK icons/clipboard: Disabled
GDK Pixbuf: Enabled
  • If you could not yet build profanity, mention the revision you try to build from
  • Operating System/Distribution: ArchLinux + ArchLinuxARM
  • curl: 8.5.0
  • expat: 2.5.0
  • glib2: 2.78.3
  • glibc: 2.38
  • libassuan: 2.5.6
  • libgcrypt: 1.10.3
  • libgpg-error: 1.47
  • libgpgme: 1.23.2
  • libnotify: 0.8.3
  • libotr: 4.1.1
  • libsignal-protocol-c: 2.3.3
  • libstrophe: 0.12.3
  • ncurses: 6.4_20230520
  • python: 3.11.6
  • readline: 8.2.007

Options used when building the package can be seen here:

Cheers.

@jubalh jubalh added this to the next milestone Jan 10, 2024
@jubalh jubalh mentioned this issue Jun 19, 2024
@jubalh
Copy link
Member

jubalh commented Jun 25, 2024

I run valgrind like this: G_DEBUG=gc-friendly G_SLICE=always-malloc valgrind --tool=memcheck --track-origins=yes --leak-check=full --leak-resolution=high --num-callers=30 --log-file=profval --show-reachable=yes --suppressions=prof.supp ./profanity

The output:
profanity-valgrind-output.txt

UI buffer

Probably the most frequent one:

==23236== 1 bytes in 1 blocks are still reachable in loss record 1 of 3,077
==23236==    at 0x4845794: malloc (in /usr/libexec/valgrind/vgpreload_memcheck-amd64-linux.so)
==23236==    by 0x530C75D: strdup (in /usr/lib64/libc.so.6)
==23236==    by 0x48A2B2: _create_entry (buffer.c:201)
==23236==    by 0x489E95: _buffer_add (buffer.c:107)
==23236==    by 0x489DDF: buffer_append (buffer.c:95)
==23236==    by 0x468067: win_println_va_internal (window.c:1539)
==23236==    by 0x46826F: win_println (window.c:1565)
==23236==    by 0x47C421: _cons_splash_logo (console.c:2706)
==23236==    by 0x475565: cons_about (console.c:388)
==23236==    by 0x46A622: ui_init (core.c:108)
==23236==    by 0x431BDD: _init (profanity.c:193)
==23236==    by 0x431966: prof_run (profanity.c:98)
==23236==    by 0x4EDAFC: main (main.c:174)

profanity/src/ui/buffer.c

Lines 201 to 205 in beeddda

e->display_from = STRDUP_OR_NULL(display_from);
e->from_jid = STRDUP_OR_NULL(from_jid);
e->message = STRDUP_OR_NULL(message);
e->receipt = receipt;
e->id = STRDUP_OR_NULL(id);

Should later get freed by buffer_free() in which we call g_slist_free_full(buffer->entries, (GDestroyNotify)_free_entry);.
The buffer/window/ui code is a bit hard to read/follow so I'm not yet 100% sure if we call the free function in all scenarios. And if not in one case we are assigning again. But lets not that is using glib data container function.

Maybe we have a mistake here so the more we "write" the more the needed memory will grow because somehow we don't free.

autocomplete

==23236== 3 bytes in 1 blocks are still reachable in loss record 64 of 3,077
==23236==    at 0x4845794: malloc (in /usr/libexec/valgrind/vgpreload_memcheck-amd64-linux.so)
==23236==    by 0x530C75D: strdup (in /usr/lib64/libc.so.6)
==23236==    by 0x4C5C33: autocomplete_add (autocomplete.c:175)
==23236==    by 0x4B3FD4: cmd_ac_init (cmd_ac.c:529)
==23236==    by 0x4966E4: cmd_init (cmd_defs.c:2851)
==23236==    by 0x431C3D: _init (profanity.c:200)
==23236==    by 0x431966: prof_run (profanity.c:98)
==23236==    by 0x4EDAFC: main (main.c:174)

void
autocomplete_add(Autocomplete ac, const char* item)
{
if (ac) {
char* item_cpy;
GList* curr = g_list_find_custom(ac->items, item, (GCompareFunc)strcmp);
// if item already exists
if (curr) {
return;
}
item_cpy = strdup(item);
ac->items = g_list_insert_sorted(ac->items, item_cpy, (GCompareFunc)strcmp);
}
}

In line 175 we again use strdup() and add it to a glib container.
It is freed via:

g_list_free_full(ac->items, free);

When reading the surrounding code it looks to me like the free function will be called in appropriate places. Again its a glib container.

ncurses

==23248== 9 bytes in 1 blocks are still reachable in loss record 729 of 3,099
==23248==    at 0x4845794: malloc (in /usr/libexec/valgrind/vgpreload_memcheck-amd64-linux.so)
==23248==    by 0x530C75D: strdup (in /usr/lib64/libc.so.6)
==23248==    by 0x5127295: ??? (in /usr/lib64/libtinfo.so.6.5)
==23248==    by 0x512BC4F: _nc_tiparm (in /usr/lib64/libtinfo.so.6.5)
==23248==    by 0x50F55A4: newterm_sp (in /usr/lib64/libncursesw.so.6.5)
==23248==    by 0x50F60A5: newterm (in /usr/lib64/libncursesw.so.6.5)
==23248==    by 0x50ED356: initscr (in /usr/lib64/libncursesw.so.6.5)
==23248==    by 0x46A5BD: ui_init (core.c:95)
==23248==    by 0x431BDD: _init (profanity.c:193)
==23248==    by 0x431966: prof_run (profanity.c:98)
==23248==    by 0x4EDAFC: main (main.c:174)

Which is confusing. We use the same setup/teardown for ncurses like https://gist.github.com/DesmondHayes/0c80af2654fc3ec602f9 for example.

I went on to get the ncurses tarball and see if they ship examples. They have a couple of tests and I ran valgrind on test/firstlast which gave me leaks as well?
Still need to follow up on this post.

libstrophe / ssl

==23248== 9 bytes in 1 blocks are still reachable in loss record 735 of 3,099
==23248==    at 0x4845794: malloc (in /usr/libexec/valgrind/vgpreload_memcheck-amd64-linux.so)
==23248==    by 0x5929EA5: CRYPTO_strdup (in /usr/lib64/glibc-hwcaps/x86-64-v3/libcrypto.so.3.1.4)
==23248==    by 0x589602C: ??? (in /usr/lib64/glibc-hwcaps/x86-64-v3/libcrypto.so.3.1.4)
==23248==    by 0x5896143: CONF_module_add (in /usr/lib64/glibc-hwcaps/x86-64-v3/libcrypto.so.3.1.4)
==23248==    by 0x58961A1: OPENSSL_load_builtin_modules (in /usr/lib64/glibc-hwcaps/x86-64-v3/libcrypto.so.3.1.4)
==23248==    by 0x58961F8: ??? (in /usr/lib64/glibc-hwcaps/x86-64-v3/libcrypto.so.3.1.4)
==23248==    by 0x52F9E7E: __pthread_once_slow (in /usr/lib64/libc.so.6)
==23248==    by 0x59341B8: CRYPTO_THREAD_run_once (in /usr/lib64/glibc-hwcaps/x86-64-v3/libcrypto.so.3.1.4)
==23248==    by 0x5896466: CONF_modules_load (in /usr/lib64/glibc-hwcaps/x86-64-v3/libcrypto.so.3.1.4)
==23248==    by 0x5896BE9: CONF_modules_load_file_ex (in /usr/lib64/glibc-hwcaps/x86-64-v3/libcrypto.so.3.1.4)
==23248==    by 0x592CAB3: ??? (in /usr/lib64/glibc-hwcaps/x86-64-v3/libcrypto.so.3.1.4)
==23248==    by 0x52F9E7E: __pthread_once_slow (in /usr/lib64/libc.so.6)
==23248==    by 0x59341B8: CRYPTO_THREAD_run_once (in /usr/lib64/glibc-hwcaps/x86-64-v3/libcrypto.so.3.1.4)
==23248==    by 0x59301F5: OPENSSL_init_crypto (in /usr/lib64/glibc-hwcaps/x86-64-v3/libcrypto.so.3.1.4)
==23248==    by 0x56C7340: OPENSSL_init_ssl (in /usr/lib64/glibc-hwcaps/x86-64-v3/libssl.so.3.1.4)
==23248==    by 0x5159D4F: xmpp_initialize (in /usr/lib64/libstrophe.so.0.13.1)
==23248==    by 0x439631: connection_init (connection.c:131)
==23248==    by 0x438514: session_init (session.c:106)
==23248==    by 0x431C38: _init (profanity.c:199)
==23248==    by 0x431966: prof_run (profanity.c:98)
==23248==    by 0x4EDAFC: main (main.c:174)
==23248== 16 bytes in 1 blocks are still reachable in loss record 1,075 of 3,099
==23248==    at 0x4845794: malloc (in /usr/libexec/valgrind/vgpreload_memcheck-amd64-linux.so)
==23248==    by 0x43958A: _xmalloc (connection.c:102)
==23248==    by 0x515DBD9: xmpp_conn_new (in /usr/lib64/libstrophe.so.0.13.1)
==23248==    by 0x43974E: connection_init (connection.c:153)
==23248==    by 0x438514: session_init (session.c:106)
==23248==    by 0x431C38: _init (profanity.c:199)
==23248==    by 0x431966: prof_run (profanity.c:98)
==23248==    by 0x4EDAFC: main (main.c:174)

Something for @sjaeckel ?

theme

==23248== 18 bytes in 1 blocks are still reachable in loss record 1,092 of 3,099
==23248==    at 0x4845794: malloc (in /usr/libexec/valgrind/vgpreload_memcheck-amd64-linux.so)
==23248==    by 0x530C75D: strdup (in /usr/lib64/libc.so.6)
==23248==    by 0x4D0C0E: theme_init (theme.c:84)
==23248==    by 0x431BCC: _init (profanity.c:190)
==23248==    by 0x431966: prof_run (profanity.c:98)
==23248==    by 0x4EDAFC: main (main.c:174)

defaults = g_hash_table_new_full(g_str_hash, g_str_equal, free, free);
// Set default colors
g_hash_table_insert(defaults, strdup("main.text"), strdup("default"));
g_hash_table_insert(defaults, strdup("main.text.history"), strdup("default"));

Again a glib container and we pass the relevant free functions to g_hash_table_new_full()

readline

==23248== 228 bytes in 140 blocks are still reachable in loss record 2,918 of 3,099
==23248==    at 0x4845794: malloc (in /usr/libexec/valgrind/vgpreload_memcheck-amd64-linux.so)
==23248==    by 0x50BA488: xmalloc (in /usr/lib64/libreadline.so.8.2)
==23248==    by 0x5098534: rl_macro_bind (in /usr/lib64/libreadline.so.8.2)
==23248==    by 0x50A8C3B: rl_parse_and_bind (in /usr/lib64/libreadline.so.8.2)
==23248==    by 0x50A8DDC: ??? (in /usr/lib64/libreadline.so.8.2)
==23248==    by 0x50A9099: ??? (in /usr/lib64/libreadline.so.8.2)
==23248==    by 0x50A8A22: rl_parse_and_bind (in /usr/lib64/libreadline.so.8.2)
==23248==    by 0x50A8DDC: ??? (in /usr/lib64/libreadline.so.8.2)
==23248==    by 0x50A8F88: rl_read_init_file (in /usr/lib64/libreadline.so.8.2)
==23248==    by 0x50A2CA9: rl_initialize (in /usr/lib64/libreadline.so.8.2)
==23248==    by 0x50B5D78: ??? (in /usr/lib64/libreadline.so.8.2)
==23248==    by 0x472625: create_input_window (inputwin.c:163)
==23248==    by 0x46A613: ui_init (core.c:105)
==23248==    by 0x431BDD: _init (profanity.c:193)
==23248==    by 0x431966: prof_run (profanity.c:98)
==23248==    by 0x4EDAFC: main (main.c:174)

Again I can't see where we use readline the wrong way.

omemo

==26071== 10,136 bytes in 1 blocks are still reachable in loss record 765 of 779
==26071==    at 0x4845794: malloc (in /usr/libexec/valgrind/vgpreload_memcheck-amd64-linux.so)
==26071==    by 0x43958A: _xmalloc (connection.c:102)
==26071==    by 0x515AD01: ??? (in /usr/lib64/libstrophe.so.0.13.1)
==26071==    by 0x515AE93: xmpp_send_raw_string (in /usr/lib64/libstrophe.so.0.13.1)
==26071==    by 0x441F0E: iq_send_stanza (iq.c:2572)
==26071==    by 0x4EBD20: omemo_bundle_publish (omemo.c:155)
==26071==    by 0x4E5C07: omemo_publish_crypto_materials (omemo.c:339)                                                
==26071==    by 0x460407: sv_ev_connection_features_received (server_events.c:202)
==26071==    by 0x43AA87: connection_features_received (connection.c:779)
==26071==    by 0x4418C9: _disco_info_response_id_handler_onconnect (iq.c:2423)
==26071==    by 0x43B9F1: _iq_handler (iq.c:241) 
==26071==    by 0x5163848: ??? (in /usr/lib64/libstrophe.so.0.13.1)
==26071==    by 0x516A224: ??? (in /usr/lib64/libstrophe.so.0.13.1)
==26071==    by 0x5E4FE43: ??? (in /usr/lib64/libxml2.so.2.12.8)
==26071==    by 0x5E54927: xmlParseChunk (in /usr/lib64/libxml2.so.2.12.8)
==26071==    by 0x5163450: xmpp_run_once (in /usr/lib64/libstrophe.so.0.13.1)
==26071==    by 0x439797: connection_check_events (connection.c:162)
==26071==    by 0x43894E: session_process_events (session.c:256)
==26071==    by 0x4319FF: prof_run (profanity.c:128)
==26071==    by 0x4EDAFC: main (main.c:174)

pgp

==26119== 32 bytes in 1 blocks are still reachable in loss record 2,541 of 3,077
==26119==    at 0x484CA2F: calloc (in /usr/libexec/valgrind/vgpreload_memcheck-amd64-linux.so)
==26119==    by 0x4B2D1D5: g_malloc0 (in /usr/lib64/libglib-2.0.so.0.8000.3)
==26119==    by 0x4B10DCE: ??? (in /usr/lib64/libglib-2.0.so.0.8000.3)
==26119==    by 0x4B11C35: g_hash_table_new_full (in /usr/lib64/libglib-2.0.so.0.8000.3)
==26119==    by 0x4E1A1C: p_gpg_init (gpg.c:121)                                                                      
==26119==    by 0x431C60: _init (profanity.c:209)
==26119==    by 0x431966: prof_run (profanity.c:98)
==26119==    by 0x4EDAFC: main (main.c:174)

Allocated:

pubkeys = g_hash_table_new_full(g_str_hash, g_str_equal, g_free, (GDestroyNotify)_p_gpg_free_pubkeyid);

And freed:

profanity/src/pgp/gpg.c

Lines 131 to 137 in beeddda

void
p_gpg_close(void)
{
if (pubkeys) {
g_hash_table_destroy(pubkeys);
pubkeys = NULL;
}

Again glib container

muc

==26119== 32 bytes in 1 blocks are still reachable in loss record 2,539 of 3,077
==26119==    at 0x484CA2F: calloc (in /usr/libexec/valgrind/vgpreload_memcheck-amd64-linux.so)
==26119==    by 0x4B2D1D5: g_malloc0 (in /usr/lib64/libglib-2.0.so.0.8000.3)
==26119==    by 0x4B10DCE: ??? (in /usr/lib64/libglib-2.0.so.0.8000.3)
==26119==    by 0x4B11C35: g_hash_table_new_full (in /usr/lib64/libglib-2.0.so.0.8000.3)
==26119==    by 0x4323BE: muc_init (muc.c:100)
==26119==    by 0x431C51: _init (profanity.c:202)
==26119==    by 0x431966: prof_run (profanity.c:98)
==26119==    by 0x4EDAFC: main (main.c:174)

profanity/src/xmpp/muc.c

Lines 94 to 101 in beeddda

void
muc_init(void)
{
invite_ac = autocomplete_new();
confservers_ac = autocomplete_new();
rooms = g_hash_table_new_full(g_str_hash, g_str_equal, g_free, (GDestroyNotify)_free_room);
invite_passwords = g_hash_table_new_full(g_str_hash, g_str_equal, g_free, g_free);
}

Again glib container with g_hash_table_new_full.

That's enough for now. Lets try to fix those and then continue.
What I'm wondering: do we really have so many mistakes in the code where we don't call the free functions? Or are we using the glib containers the wrong way and thus those free functions are never called? Does glib have memleaks (lets create a test program)?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants