ERROR: scoreboard: failed to lock (already locked) #13437

Closed · jkabat opened this issue Feb 20, 2024 · 29 comments

jkabat commented Feb 20, 2024

Description

After the recent upgrade of our application containers to PHP 8.2.14, I see plenty of error messages like these in the log:

ERROR: scoreboard: failed to lock (already locked)
ERROR: status: unable to find or access status shared memory

Is there something I can do to prevent it from happening? End users do not seem to be affected as far as I can tell. These log messages appear at random and only for the PHP-FPM containers. Long-running PHP CLI commands operating under Supervisor do not have these issues.

opcache: enabled
jit: disabled

Edit: worth mentioning that the errors occur at random even on a container that is barely used while it sits idle.

PHP Version

PHP 8.2.14

Operating System

Debian GNU/Linux 12 (bookworm)

bukka (Member) commented Feb 22, 2024

I just checked, and this happens when the scoreboard is already in use by another process. That could happen due to the lock being held longer (there was potentially one such change, though I think it was a bit earlier). What was the previous version you upgraded from?

Another thing that could play a role is a high number of configured children. Could you provide your FPM config?

jkabat (Author) commented Feb 22, 2024

I upgraded from 8.1.18-fpm-bullseye to 8.2.14-fpm-bookworm.

FPM config:

[global]
daemonize = no

log_level = notice

emergency_restart_threshold = 3
emergency_restart_interval = 1m
process_control_timeout = 10s

[www]
access.log = /dev/null

listen = 9000

request_terminate_timeout = 180s

pm = static
pm.max_children = 10
pm.max_requests = 200
pm.process_idle_timeout = 10s

pm.status_path = /status

bukka (Member) commented Mar 8, 2024

Could you try to increase (significantly) or remove pm.max_requests and see if that helps?
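
For illustration, a minimal sketch of that change against the pool config quoted above (the raised value is an arbitrary example, not a recommendation):

[www]
; either raise the limit significantly ...
pm.max_requests = 10000
; ... or comment it out entirely so children are never recycled
;pm.max_requests = 200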

bukka (Member) commented Mar 8, 2024

You should generally not need it, and it seems it didn't work correctly before 8.2 (we had a couple of reports about that): children did not restart correctly, which ended up being fixed in 8.2.

jkabat (Author) commented Mar 18, 2024

I have removed the pm.max_requests option, but the error is still present even on the newest 8.2.17-fpm-bookworm.
EDIT: could it have something to do with OPcache being enabled? Still, it only happens now and then.

Used memory | 83587760
Free memory | 84184400

bukka (Member) commented Mar 21, 2024

I have been thinking about this and looking through the code and changes. I don't see any specific FPM change that should impact locking; the last fix in that area was also merged to 8.1.

What most likely has the biggest impact on scoreboard locking is status collection. Do you have some external collector for the status page? If so, how often does it run? If it runs often, try to decrease the frequency or temporarily disable it, just to see if that helps.

Otherwise there are some other factors that could be the cause. Could you also try it with a minimal config just to see if that helps, and then potentially increase pm.max_children (up to 100) if it is still happening:

[global]
daemonize = no
log_level = notice

[www]
listen = 9000

pm = static
pm.max_children = 10

pm.status_path = /status

If the error disappears, add the removed options back one by one and see when it starts happening again.

jkabat (Author) commented Mar 26, 2024

@bukka thank you for looking into the issue.

I use a healthcheck that periodically calls the /status endpoint:
https://raw.githubusercontent.com/renatomefi/php-fpm-healthcheck/master/php-fpm-healthcheck

It runs every 10s. Let me test your settings and I will let you know if that helps.
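
Under the hood that script queries the FPM status page over FastCGI, so every check takes a scoreboard read. Roughly equivalent to (a simplification of what the script does, assuming FPM on localhost:9000 and pm.status_path = /status):

$ SCRIPT_NAME=/status SCRIPT_FILENAME=/status REQUEST_METHOD=GET \
    cgi-fcgi -bind -connect localhost:9000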

ToshY commented Apr 18, 2024

@jkabat I'm also using the same healthcheck with the interval set to 10s, and I'm getting the same messages in the log. Have you been able to resolve this?

@realcnbs

If you're running this in a k8s pod, it is probably caused by your liveness and readiness probes being executed at the exact same moment.

jkabat (Author) commented Jul 11, 2024

@realcnbs yes, I run it in k8s pods. What would be the solution? Do you have one?

@realcnbs

Play with the probes' delays and periods so there are a couple of seconds between them.
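
A minimal sketch of such staggering, reusing the php-fpm-healthcheck command from this thread; the numbers are arbitrary examples, and k8s probe timing has some jitter, so occasional overlap is still possible:

          # same period, offset start: the probes fire roughly 5s apart
          livenessProbe:
            exec:
              command:
                - php-fpm-healthcheck
            initialDelaySeconds: 0
            periodSeconds: 10
          readinessProbe:
            exec:
              command:
                - php-fpm-healthcheck
            initialDelaySeconds: 5
            periodSeconds: 10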

pimjansen commented Jul 11, 2024

But still, it should be able to handle both at once, right? My logs are being spammed with this (300 times in the last 24 hours).

@realcnbs

It should, but apparently it doesn't.

@adamkral12

I tried 5 seconds between readiness and liveness and I am still hitting the same issue. I also tried to reproduce it locally without any luck :/

bukka (Member) commented Jul 21, 2024

It would be useful to get more info about the setup (including the k8s deployment spec and possibly a bit of info about the cluster) and the traffic where this happens. I'm building a new testing tool, https://github.com/wstool/wst/, that will allow me to better test high-traffic issues, including k8s setups, so the more info I get, the better chance I have of recreating it...

@pimjansen

> It would be useful to get more info about the setup (including the k8s deployment spec and possibly a bit of info about the cluster) [...]

I will send our Dockerfile tomorrow. In general it is not really related to high traffic, if I'm correct.

jkabat (Author) commented Jul 22, 2024

I can provide the k8s setup too. I agree with @pimjansen that it is not related to high traffic.

pimjansen commented Jul 22, 2024

@bukka here is a gist with the Dockerfile and related config. If I'm correct, there is not much "special" in this use case except maybe the healthcheck:

https://gist.github.com/pimjansen/5bc6b3222cf26f505b608207cf350e31

EDIT: and the deployment probe settings:

          livenessProbe:
            exec:
              command:
                - php-fpm-healthcheck
                - --listen-queue=10
            initialDelaySeconds: 0
            periodSeconds: 10
          readinessProbe:
            exec:
              command:
                - php-fpm-healthcheck
            initialDelaySeconds: 1
            periodSeconds: 5

realcnbs commented Jul 22, 2024

> I tried 5 seconds between readiness and liveness and I am still hitting the same issue. I also tried to reproduce it locally without any luck :/

Try disabling one of the probes and see if that fixes it. If it does, then 5 seconds is not enough to guarantee the time gap. As an option, you can try using a startupProbe instead of the readiness probe.
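
A minimal sketch of that swap, under the same assumptions as the probe config above; a startupProbe runs only until it first succeeds and then never again, so it cannot keep colliding with the liveness probe once the pod is up:

          startupProbe:
            exec:
              command:
                - php-fpm-healthcheck
            periodSeconds: 5
            failureThreshold: 12   # allow up to ~60s for FPM to come up
          livenessProbe:
            exec:
              command:
                - php-fpm-healthcheck
            periodSeconds: 10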

jkabat (Author) commented Jul 22, 2024

> Try disabling one of the probes and see if that fixes it. If it does, then 5 seconds is not enough to guarantee the time gap. As an option, you can try using a startupProbe instead of the readiness probe.

When I commented out the readiness probe, it really helped...

EDIT: as per @realcnbs's suggestion I replaced the readiness probe with a startupProbe, and I have seen only one occurrence so far. I'm not sure it's safe to get rid of the readinessProbe entirely, so I will keep this as a temporary solution until the bug is fixed. Thank you, guys.

@karabanov

After upgrading from PHP 7.4.33 to PHP 8.0.30, I also started noticing messages like these in the logs:

[11-Aug-2024 17:13:42] WARNING: [pool big_app_prod] child 667 said into stderr: "ERROR: scoreboard: failed to lock (already locked)"
[11-Aug-2024 17:13:42] WARNING: [pool big_app_prod] child 667 said into stderr: "ERROR: status: unable to find or access status shared memory"
[11-Aug-2024 18:12:01] NOTICE: error log file re-opened
[11-Aug-2024 18:57:01] WARNING: [pool big_app_prod] child 461 said into stderr: "ERROR: scoreboard: failed to lock (already locked)"
[11-Aug-2024 18:57:01] WARNING: [pool big_app_prod] child 461 said into stderr: "ERROR: status: unable to find or access status shared memory"

PHP is installed from the repository https://packages.sury.org/php/ and runs in a container.

The config file looks something like this:

[big_app_prod]
; Standard
listen = /www/php_sockets/php-fpm-docker-big_app_prod.sock
listen.backlog = 4096
listen.group = www-data
listen.mode = 0660
rlimit_files = 10240
chdir = /
catch_workers_output = yes
security.limit_extensions = .php
 
; Deployment-specific
pm = static
pm.max_children = 1500
pm.status_path = /fpm-status

php_admin_flag[log_errors] = on
php_admin_flag[display_errors] = off
php_admin_value[error_log] = /var/log/php/$pool/$pool.log
php_admin_value[sendmail_path] = /usr/bin/msmtp -C /etc/msmtprc -t --read-envelope-from

; Project-specific
php_admin_value[precision] = 12
php_admin_flag[ignore_user_abort] = on
php_admin_flag[expose_php] = off
php_admin_value[max_input_nesting_level] = 8
php_admin_value[max_execution_time] = 600
php_admin_value[memory_limit] = 300M
php_admin_value[error_reporting] = E_ALL & ~E_NOTICE & ~E_DEPRECATED & ~E_STRICT
php_admin_flag[html_errors] = off
php_admin_value[post_max_size] = 32M
php_admin_value[upload_max_filesize] = 32M
php_admin_flag[file_uploads] = True
php_admin_value[cgi.fix_pathinfo] = 0
php_admin_value[session.name] = big_app_prod
php_admin_value[session.save_handler] = redis
php_admin_value[session.save_path] = "tcp://127.0.0.1:6379?timeout=10&prefix=big_app_prod_SESS_"
php_admin_value[session.cookie_httponly] = On
php_admin_value[apc.shm_size] = 2048M
php_admin_value[max_input_vars] = 1000
php_admin_value[session.gc_maxlifetime] = 1440

Metrics are collected using:

# https://github.com/Lusitaniae/phpfpm_exporter
prometheus-php-fpm-exporter \
    --web.listen-address=127.0.0.1:5584 \
    --phpfpm.socket-paths=/www/php_sockets/php-fpm-docker-big_app_prod.sock \
    --phpfpm.status-path=/fpm-status \
    --phpfpm.script-collector-paths=/var/lib/prometheus/php_opcache_exporter.php
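
Whatever scrape interval Prometheus uses for this exporter sets how often /fpm-status (and therefore the scoreboard lock) is hit. A hypothetical scrape config for illustration; the interval shown is an assumption, not taken from the setup above:

scrape_configs:
  - job_name: 'php-fpm'
    scrape_interval: 15s   # each scrape triggers one /fpm-status request
    static_configs:
      - targets: ['127.0.0.1:5584']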

PHP Version

8.0.30

Operating System

Debian 11 (bullseye)

@verfriemelt-dot-org

I think the proper solution here is to fix the locking issue. This is also reproducible by spamming the FPM process with multiple concurrent

$ SCRIPT_NAME=/status SCRIPT_FILENAME=/status QUERY_STRING=full REQUEST_METHOD=GET cgi-fcgi -bind -connect 127.0.0.1:9000

requests. You'll eventually end up with the error described here:

127.0.0.1 -  21/Aug/2024:08:36:16 +0000 "GET /status" 200
127.0.0.1 -  21/Aug/2024:08:36:16 +0000 "GET /status" 500
ERROR: scoreboard: failed to lock (already locked)
ERROR: status: unable to find or access status shared memory

Also, I think the readiness probe should just use /ping instead of the full /status. The script mentioned here (https://raw.githubusercontent.com/renatomefi/php-fpm-healthcheck/master/php-fpm-healthcheck) should be adapted to use /ping for its ping mode, i.e. when called without arguments, to avoid this altogether. That would be a viable workaround for the time being.
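
A rough reproduction sketch based on the command above, assuming cgi-fcgi (from libfcgi) is installed and FPM listens on 127.0.0.1:9000; the /ping check additionally assumes ping.path = /ping is set in the pool config:

# fire 20 concurrent /status requests; with enough parallel readers some
# return 500 and FPM logs "scoreboard: failed to lock (already locked)"
for i in $(seq 1 20); do
  SCRIPT_NAME=/status SCRIPT_FILENAME=/status QUERY_STRING=full \
    REQUEST_METHOD=GET cgi-fcgi -bind -connect 127.0.0.1:9000 >/dev/null &
done
wait

# a /ping check does not read the scoreboard, so it avoids the lock
SCRIPT_NAME=/ping SCRIPT_FILENAME=/ping REQUEST_METHOD=GET \
  cgi-fcgi -bind -connect 127.0.0.1:9000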

@pimjansen

> I think the proper solution here is to fix the locking issue. [...] The script mentioned here should be adapted to use /ping for its ping mode [...] a viable workaround for the time being.

Seems fair, but wouldn't it be good for the official PHP-FPM Docker images to provide this out of the box? It would reduce these errors and give better control and a better user experience, since it is already embedded.

bukka added a commit to bukka/php-src that referenced this issue Sep 8, 2024
This changes locking for scoreboard to reduce contention between readers
and adds retries for acquiring scoreboard for read.
bukka (Member) commented Sep 8, 2024

So I finally managed to reliably recreate the issue with my new tool called WST, funded by the STF (Sovereign Tech Fund), which allows me to spin up multiple services and test various things in its pipeline, specifically running a load test and checking the logs. The actual test for this can be seen here.

After that I looked properly into the scoreboard locking and found a couple of issues. I have an initial fix in #15805, after which I no longer see the locking error. It might be changing a bit too much, though, so I'm not sure whether I should go for a lighter version for PHP 8.2 and do the rest in master, as it might be a bit risky.
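
A schematic sketch of the "retries for acquiring scoreboard for read" idea from the commit message, not the actual php-src implementation; try_lock() stands in for FPM's non-blocking scoreboard lock primitive:

/* hypothetical illustration of retry-on-read, not php-src code */
#include <stdbool.h>
#include <unistd.h>

#define SCOREBOARD_LOCK_RETRIES 10

extern bool try_lock(void *scoreboard);  /* assumed non-blocking primitive */

int scoreboard_acquire_for_read(void *scoreboard)
{
    for (int i = 0; i < SCOREBOARD_LOCK_RETRIES; i++) {
        if (try_lock(scoreboard)) {
            return 0;       /* lock acquired; caller may read the scoreboard */
        }
        usleep(1000);       /* brief back-off before the next attempt */
    }
    return -1;  /* still locked; caller logs "failed to lock (already locked)" */
}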

@goffyara

The same happens with PHP 8.3.10, using php-fpm-healthcheck on k8s.

bukka added a commit to bukka/php-src that referenced this issue Nov 23, 2024
This changes locking for scoreboard to reduce contention between readers
and adds retries for acquiring scoreboard for read.

Closes phpGH-15805
bukka added a commit to bukka/php-src that referenced this issue Dec 7, 2024
This changes locking for scoreboard to reduce contention between readers
and adds retries for acquiring scoreboard for read.

Closes phpGH-15805
bukka closed this as completed in 3490ac0 on Dec 14, 2024
@pimjansen

@bukka I see this is patched, which is great to start with. Do you know how far back this will be backported?

bukka (Member) commented Dec 14, 2024

@pimjansen It will be part of 8.3.16 and 8.4.3. The 8.2 branch is already closed for normal bug fixes and only accepts security fixes.

@pimjansen

> @pimjansen It will be part of 8.3.16 and 8.4.3. The 8.2 branch is already closed for normal bug fixes and only accepts security fixes.

Fair! Thanks for the quick reply.

jkabat (Author) commented Dec 19, 2024

@bukka thanks for investing so much time in resolving the issue!!!
