Fix GH-17122: memory leak in regex #17132
Because the subpattern names are persistent and symbol table destruction is skipped when using fast_shutdown, the refcounts are not updated for the destruction of the arrays that hold the subpattern name keys. To solve this, detect this situation and duplicate the strings.
Actually a thought just occurred to me. Since the subpattern names are destroyed during mshutdown we could just use zend_string_free instead of zend_string_release and then we don't have to do this dance...
@nielsdos If I see this correctly, the subpatterns are actually never freed at runtime. So yes, this sounds like we just don't care about refcounting and can unconditionally release them later. I prefer that.
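The refcount argument in this exchange can be illustrated with a toy model in plain C. This is only a sketch: `toy_string`, `toy_release`, `toy_free` and `simulate_shutdown` are hypothetical stand-ins invented for illustration, not the Zend API, and the model assumes a single cached name that is used as a key in one global array.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Toy stand-in for a refcounted persistent string (NOT zend_string). */
typedef struct {
    uint32_t refcount;
    char *val;
} toy_string;

static int toy_live_strings = 0; /* strings allocated and not yet freed */

static toy_string *toy_string_new(const char *s) {
    size_t len = strlen(s) + 1;
    toy_string *str = malloc(sizeof(*str));
    if (str == NULL) abort();
    str->val = malloc(len);
    if (str->val == NULL) abort();
    memcpy(str->val, s, len);
    str->refcount = 1;
    toy_live_strings++;
    return str;
}

/* Unconditional free that ignores the refcount. */
static void toy_free(toy_string *str) {
    free(str->val);
    free(str);
    toy_live_strings--;
}

/* Refcount-aware release: frees only when the last reference is dropped. */
static void toy_release(toy_string *str) {
    assert(str->refcount > 0);
    if (--str->refcount == 0) {
        toy_free(str);
    }
}

/* Returns how many strings are still allocated after "module shutdown". */
static int simulate_shutdown(bool fast_shutdown, bool use_free) {
    toy_live_strings = 0;
    toy_string *name = toy_string_new("year"); /* reference held by the cache */
    name->refcount++;                          /* key in a global array */

    if (!fast_shutdown) {
        toy_release(name); /* symbol table destruction drops the array's ref */
    }

    /* Module shutdown: the cache gives up its entry. */
    if (use_free) {
        toy_free(name);
    } else {
        toy_release(name);
    }

    int leaked = toy_live_strings;
    if (leaked > 0) {
        toy_free(name); /* tidy up so the demo itself does not leak */
    }
    return leaked;
}
```

In this model, a refcount-aware release after a skipped symbol table destruction leaves the string allocated, while an unconditional free at module shutdown does not depend on the stale refcount.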
LGTM!
ext/pcre/php_pcre.c
@@ -506,10 +506,21 @@ static int pcre_clean_cache(zval *data, void *arg)
/* }}} */

static void free_subpats_table(zend_string **subpat_names, uint32_t num_subpats) {
	bool destroy = EG(flags) & EG_FLAGS_IN_SHUTDOWN;
Bah I realised that this might not be safe.
Consider the following:
We are in a function called during shutdown (i.e. due to register_shutdown_function). Prior to the engine calling php_call_shutdown_functions(), the EG_FLAGS_IN_SHUTDOWN flag will already be set.
As a consequence, when during shutdown we prune an entry from the pcre cache table we will destroy the subpattern name permanently.
So if we have an array with the subpattern name in it, and then we make sure to prune the cache table, then we end up with a dangling subpattern name.
A solution to this would be to set a flag in ext/pcre's module shutdown that indicates we may destroy. This would be fairly simple and would avoid the above issue. WDYT
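The dangling-name scenario and the proposed module-shutdown flag can be sketched as a small simulation in plain C. Everything here is hypothetical and only models the reasoning: `toy_name`, `toy_state`, `module_shutting_down` and `toy_prune` are invented names, and the toy marks a name as destroyed instead of freeing it so the outcome can be inspected safely.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Toy subpattern name: "destroyed" is set instead of calling free(). */
typedef struct {
    uint32_t refcount;
    bool destroyed;
} toy_name;

typedef struct {
    bool engine_in_shutdown;   /* models EG_FLAGS_IN_SHUTDOWN */
    bool module_shutting_down; /* hypothetical flag set only in module shutdown */
} toy_state;

/* Prune one cache entry: destroy unconditionally, or release by refcount. */
static void toy_prune(toy_name *name, bool may_destroy) {
    assert(name->refcount > 0);
    if (may_destroy) {
        name->destroyed = true;
    } else if (--name->refcount == 0) {
        name->destroyed = true;
    }
}

/* Returns true if a live array is left with a destroyed key after the
 * cache is pruned from inside a shutdown function. */
static bool prune_in_shutdown_function_dangles(bool use_module_flag) {
    toy_name name = { .refcount = 1, .destroyed = false };
    /* A shutdown function is running: the engine flag is already set,
     * but the extension's module shutdown has not started yet. */
    toy_state st = { .engine_in_shutdown = true, .module_shutting_down = false };

    name.refcount++; /* a userland array holds the name as a key */

    bool may_destroy = use_module_flag ? st.module_shutting_down
                                       : st.engine_in_shutdown;
    toy_prune(&name, may_destroy);

    /* The array is still alive here, so a destroyed name is dangling. */
    return name.destroyed;
}
```

Keying the unconditional destroy on a flag that only module shutdown sets keeps the name alive in this model while userland code can still run.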
Bonus: and even that is not enough.
We still may leak memory if:
- We create an array in global scope by matching some regex and using subpattern names
- We prune the cache, which decrements the refcount of the subpattern name string to 1
- The global scope gets cleaned up, but the symbol table is not destroyed, so we end up with a leak
Furthermore, if the final string release happens on hash table cleanup, we can break ZendMM because it gets cleaned up in zend_array_destroy with zend_string_release_ex.
The proper solution to all of this is to change how the subpattern names are cached: make the subpattern name cache a per-request cache, and populate it lazily.
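A per-request, lazily populated cache could look roughly like the following plain C sketch. It is not the code from this PR: `toy_request_cache`, `toy_lookup` and `toy_request_end` are hypothetical names, the cache stores a single name per pattern for brevity, and plain `char *` copies stand in for request-allocated strings.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

#define TOY_MAX_PATTERNS 4

/* Hypothetical per-request cache: one request-local copy of a subpattern
 * name per compiled pattern (a real table would hold several names). */
typedef struct {
    char *names[TOY_MAX_PATTERNS];
    int copies_made; /* how many times a name had to be built */
} toy_request_cache;

/* Return the request-local name, building it on first use only. */
static const char *toy_lookup(toy_request_cache *cache, size_t pattern_id,
                              const char *persistent_name) {
    assert(persistent_name != NULL);
    if (pattern_id >= TOY_MAX_PATTERNS) {
        return NULL;
    }
    if (cache->names[pattern_id] == NULL) {
        size_t len = strlen(persistent_name) + 1;
        char *copy = malloc(len);
        if (copy == NULL) {
            return NULL;
        }
        memcpy(copy, persistent_name, len);
        cache->names[pattern_id] = copy;
        cache->copies_made++;
    }
    return cache->names[pattern_id];
}

/* End of request: every request-local copy is dropped together. */
static void toy_request_end(toy_request_cache *cache) {
    for (size_t i = 0; i < TOY_MAX_PATTERNS; i++) {
        free(cache->names[i]);
        cache->names[i] = NULL;
    }
}
```

Because the copies in this sketch live only for one request, the persistent data is never handed out directly and no refcount has to survive into module shutdown.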
@iluuu1994 I've re-requested your review because of the 2 comments I posted outlining some additional problems I realized. Sorry for the double work!
@nielsdos No worries. I'll have another look in a day or two.
I think this makes sense. I believe when I looked at this last, non-interned persistent strings were never exposed to userland to begin with.
Because the subpattern names are persistent and symbol table destruction is skipped when using fast_shutdown, the refcounts are not updated for the destruction of the arrays that hold the subpattern name keys.
To solve this, detect this situation and duplicate the strings.
New solution: see below.
An alternative is always destroying the symbol table, but that changes engine behaviour.