ERROR: scoreboard: failed to lock (already locked) #13437
I just checked and this happens when the scoreboard is already in use by another process. This could happen due to holding the lock longer (there was potentially one such change), but I think that was a bit earlier. What version did you update from? Another thing that could play a role is a high number of configured children. Could you provide your FPM config?
I have upgraded from 8.1.18-fpm-bullseye to 8.2.14-fpm-bookworm. FPM config:
Could you try to increase (significantly) or remove
You should generally not need this, and it seems it didn't work correctly before 8.2 (we had a couple of reports regarding that) - it means children didn't restart correctly, which was somehow fixed in 8.2.
I have removed the pm.max_requests option, but the error is still present even on the newest 8.2.17-fpm-bookworm.
I have been thinking about this and looking through the code and changes. I don't see any specific FPM change that should impact locking. The last fix in that area was also merged to 8.1. What most likely has the biggest impact on scoreboard locking is status collection. Do you have some external collector for the status? If so, how often does it run? If often, try to decrease the frequency or temporarily disable it just to see if that helps. Otherwise there are some other factors that can be the cause. Could you also try it with a minimal config just to see if that helps, and then potentially increase pm.max_children (up to 100) if it still happens:
If it disappears, try to add back the removed options one by one and see when it happens again.
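For the bisection test suggested above, a minimal pool configuration might look like the following. This is a hypothetical sketch with illustrative values, not a recommended production config:

```ini
; Hypothetical minimal www pool for isolating the scoreboard error
[www]
listen = 127.0.0.1:9000
pm = dynamic
pm.max_children = 20        ; raise toward 100 if the error persists
pm.start_servers = 5
pm.min_spare_servers = 2
pm.max_spare_servers = 10
pm.status_path = /status    ; keep only if the health check needs it
; pm.max_requests deliberately left unset for this test
```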
@bukka thank you for looking into the issue. I use a healthcheck which periodically calls the /status endpoint; it's set to 10s. Let me test the settings and I will let you know whether it helps.
@jkabat I'm using the same healthcheck, also with an interval set to
If you're running this in a k8s pod, this is probably caused by your liveness and readiness probes being executed at the exact same moment.
@realcnbs yes, I run it in k8s pods - what would be the solution? Do you have any?
Play with the probe delays and periods so they have a couple of seconds between them.
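A sketch of what "playing with delays and periods" could look like, assuming the exec-style php-fpm-healthcheck that comes up later in the thread; all values here are illustrative, not tested recommendations:

```yaml
# Illustrative only: offset the initial delays and pick co-prime periods
# (10 and 13) so the two probes rarely fire in the same instant.
livenessProbe:
  exec:
    command: ["php-fpm-healthcheck"]
  initialDelaySeconds: 5
  periodSeconds: 10
readinessProbe:
  exec:
    command: ["php-fpm-healthcheck"]
  initialDelaySeconds: 11
  periodSeconds: 13
```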
But then still, it should be able to handle both at once, right? My logs are being spammed with this (300 times in the last 24 hours).
It should, but apparently it doesn't.
I tried 5 seconds between readiness and liveness and I am still hitting the same issue. I also tried to reproduce it locally without any luck :/
It would be useful to get more info about the setup (including the k8s deployment spec and possibly a bit of info about the cluster) and the traffic where this happens. I'm building a new testing tool https://github.com/wstool/wst/ that will allow me to better test high-traffic issues, including k8s setups, so the more info I get, the better chance I have of recreating this...
I will send our Dockerfile tomorrow. In general it is not really related to high traffic, if I'm correct.
I can provide the k8s setup too; I agree with @pimjansen that it is not related to high traffic.
@bukka here is a gist for the Dockerfile and related config. If I'm correct, there is not much "special" in this use case except maybe the healthcheck: https://gist.github.com/pimjansen/5bc6b3222cf26f505b608207cf350e31 EDIT:
Try to disable one of the probes and see if that fixes it. If it does, then 5 seconds is not enough to guarantee the time gap. As an option, you can try using a startupProbe instead of readiness.
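The startupProbe swap suggested here could look like the following sketch, assuming an exec-style php-fpm-healthcheck command; the thresholds are illustrative:

```yaml
# Illustrative only: a startupProbe runs until it first succeeds and then
# stops, leaving the liveness probe as the only periodic status caller.
startupProbe:
  exec:
    command: ["php-fpm-healthcheck"]
  periodSeconds: 2
  failureThreshold: 30   # allow up to ~60s for the pool to come up
```

The trade-off is that without a readinessProbe, a pod that degrades after startup is not pulled out of the Service endpoints.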
When I commented out the readiness probe, it really helped... EDIT: per @realcnbs's suggestion I have replaced the readiness probe with a startupProbe and have seen only one occurrence so far. Not sure it's safe to get rid of the readinessProbe; I will keep this as a temporary solution until it's fixed. Thank you guys.
After upgrading from PHP 7.4.33 to PHP 8.0.30, I also started noticing such messages in the logs:
PHP is installed from the repository https://packages.sury.org/php/ and runs in a container. The config file looks something like this:
Metrics are collected using:
PHP Version: 8.0.30
Operating System: Debian 11 (bullseye)
I think the proper solution here is to fix the locking issue; this is also reproducible by spamming the fpm process with multiple concurrent requests. You'll eventually end up with the error described here.
Also, I think the readinessProbe should just use
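The collision pattern described above (several callers hitting the status endpoint in the same instant) can be simulated with a small script. This is a hedged sketch: the stub HTTP server below merely stands in for php-fpm's /status endpoint, and all names and counts are mine, not from the thread:

```python
import threading
import urllib.request
from http.server import BaseHTTPRequestHandler, HTTPServer

class StatusStub(BaseHTTPRequestHandler):
    """Stand-in for php-fpm's /status endpoint (illustrative only)."""
    def do_GET(self):
        self.send_response(200)
        self.end_headers()
        self.wfile.write(b"pool: www\n")
    def log_message(self, *args):
        pass  # silence per-request logging

# Serve on an ephemeral port in a background thread.
server = HTTPServer(("127.0.0.1", 0), StatusStub)
threading.Thread(target=server.serve_forever, daemon=True).start()
url = f"http://127.0.0.1:{server.server_address[1]}/status"

results = []
def probe():
    with urllib.request.urlopen(url) as resp:
        results.append(resp.status)

# Fire two "probes" concurrently, as liveness and readiness would.
threads = [threading.Thread(target=probe) for _ in range(2)]
for t in threads:
    t.start()
for t in threads:
    t.join()
server.shutdown()
print(results)  # both requests succeed against the stub: [200, 200]
```

Against real php-fpm, raising the concurrency of such probes is what eventually surfaces the scoreboard lock error.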
Seems fair, but wouldn't it be good for the PHP-FPM docker implementation to provide this out of the box? It would reduce these errors and give better control and a better user experience, since it is already embedded.
This changes locking for scoreboard to reduce contention between readers and adds retries for acquiring scoreboard for read.
So I finally managed to reliably recreate the issue with my new tool called WST, funded by the STF (Sovereign Tech Fund), which allows me to spin up multiple services and test various things in its pipeline - specifically, running a load test and checking the logs. The actual test for this can be seen here. After that I looked properly into the scoreboard locking and found a couple of issues. I have an initial fix in #15805, after which I no longer see the locking error, but it might be changing a bit too much, so I'm not sure whether I should go for a lighter version for PHP 8.2 and do the rest in master, as it might be a bit risky.
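The idea behind "retries for acquiring the scoreboard for read" can be sketched like this. Note this is an illustrative Python model with hypothetical names and retry counts; the actual fix lives in FPM's C scoreboard code:

```python
import random
import threading
import time

def acquire_for_read(try_lock, max_retries=5, base_delay=0.0001):
    """Attempt a non-blocking lock acquisition with jittered backoff.

    `try_lock` is any callable returning True once the lock is taken.
    The retry count and delays here are made up for illustration.
    """
    for attempt in range(max_retries):
        if try_lock():
            return True
        # Randomized, growing sleep so concurrent readers do not retry
        # in lockstep and hammer the lock at the same instant.
        time.sleep(base_delay * (2 ** attempt) * random.uniform(0.5, 1.5))
    return False

# Stand-in for the scoreboard lock: a plain non-blocking threading.Lock.
scoreboard_lock = threading.Lock()
print(acquire_for_read(lambda: scoreboard_lock.acquire(blocking=False)))  # True
scoreboard_lock.release()
```

The point is that a momentary collision between two status readers becomes a short wait instead of an immediate "failed to lock (already locked)" error.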
The same with PHP 8.3.10, using php-fpm-healthcheck with k8s.
This changes locking for scoreboard to reduce contention between readers and adds retries for acquiring scoreboard for read. Closes phpGH-15805
@bukka I see this is patched, which is great to start with. Do you know how far this will be backported?
@pimjansen It will be part of 8.3.16 and 8.4.3. The 8.2 branch is already closed for normal bug fixes and only accepts security fixes.
Fair! Thanks for the quick reply.
@bukka thanks for investing so much time to resolve the issue!!!
Description
After the recent upgrade of our application containers to PHP 8.2.14, I see plenty of error messages in the log:
Is there something I can do to prevent it from happening? End users seem unaffected as far as I can tell. These log messages are written randomly and only for the PHP-FPM containers; long-running PHP-CLI commands operating under Supervisor do not have these issues.
opcache: enabled
jit: disabled
Edit: worth mentioning that the errors occur randomly, even on a container that is barely used during the waiting period.
PHP Version
PHP 8.2.14
Operating System
Debian GNU/Linux 12 (bookworm)