Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix GH-17122: memory leak in regex #17132

Closed
wants to merge 3 commits into from
Closed

Conversation

nielsdos
Copy link
Member

@nielsdos nielsdos commented Dec 12, 2024

Because the subpattern names are persistent, and the fact that the symbol table destruction is skipped when using fast_shutdown, this means the refcounts will not be updated for the destruction of the arrays that hold the subpattern name keys.
To solve this, detect this situation and duplicate the strings.
An alternative is always destroying the symbol table, but that changes engine behaviour.
New solution: see below

@nielsdos nielsdos linked an issue Dec 12, 2024 that may be closed by this pull request
Because the subpattern names are persistent, and the fact that the
symbol table destruction is skipped when using fast_shutdown,
this means the refcounts will not be updated for the destruction of
the arrays that hold the subpattern name keys.
To solve this, detect this situation and duplicate the strings.
@nielsdos nielsdos marked this pull request as ready for review December 12, 2024 19:51
@nielsdos nielsdos requested a review from iluuu1994 December 12, 2024 19:57
@nielsdos
Copy link
Member Author

Actually a thought just occurred to me. Since the subpattern names are destroyed during mshutdown we could just use zend_string_free instead of zend_string_release and then we don't have to do this dance...

@bwoebi
Copy link
Member

bwoebi commented Dec 12, 2024

@nielsdos If I see this correctly, the subpatterns are actually never freed at runtime. So yes, this sounds like we just don't care about refcounting and can unconditionally release them later. I prefer that.

Copy link
Member

@iluuu1994 iluuu1994 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM!

@@ -506,10 +506,21 @@ static int pcre_clean_cache(zval *data, void *arg)
/* }}} */

static void free_subpats_table(zend_string **subpat_names, uint32_t num_subpats) {
bool destroy = EG(flags) & EG_FLAGS_IN_SHUTDOWN;
Copy link
Member Author

@nielsdos nielsdos Dec 18, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Bah I realised that this might not be safe.
Consider the following:
We are in a function called during shutdown (i.e. due to register_shutdown_function). Prior to the engine calling php_call_shutdown_functions(), the EG_FLAGS_IN_SHUTDOWN will already be set.
As a consequence, when during shutdown we prune an entry from the pcre cache table we will destroy the subpattern name permanently.
So if we have an array with the subpattern name in it, and then we make sure to prune the cache table, then we end up with a dangling subpattern name.

A solution to this would be to set a flag in ext/pcre's module shutdown that indicates we may destroy. This would be fairly simple and would avoid the above issue. WDYT

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Bonus: and even that is not enough.
We still may leak memory if:

  1. We create an array in global scope by matching some regex and using subpattern names
  2. Prune the cache, we decrement the refcount of the subpattern name string to 1
  3. The global scope gets cleaned up, except the symbol table doesn't and we end up with a leak

Furthermore, if the final string release happens on hash table cleanup, we can break ZendMM because it gets cleaned up in zend_array_destroy with zend_string_release_ex.

The proper solution to all of this is change how the subpattern names are cached: make the subpattern name cache per-request cache, and make it lazily populated.

@nielsdos nielsdos requested a review from iluuu1994 December 21, 2024 18:00
@nielsdos
Copy link
Member Author

@iluuu1994 I've re-requested your review because of the 2 comments I posted outlaying some additional problems I realized. Sorry for the double work!

@iluuu1994
Copy link
Member

@nielsdos No worries. I'll have another look in a day or two.

Copy link
Member

@iluuu1994 iluuu1994 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this makes sense. I believe when I looked at this last, non-interned persistent strings were never exposed to userland to begin with.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

memory leak in regex
3 participants