varnishstat reports "cold" VCLs after panic #3358
Comments
Do you get something from |
In the docker variant it says:
And same when using -QUIT on RHEL machine.
So -QUIT does not panic it, it just core-dumps/signal-kills it. But this bug report is not really about the panicking; it is about varnishstat showing "too much" info once a child process has been restarted. |
Collecting the panic output could have helped us figure how you ended up in that situation, and maybe reproduce the bug. We'll probably have a look anyway. |
Good to know, I was not aware of that. On the real machines I've enabled core dumps (ulimit) and am now waiting for the next crash (it has happened on 6 of 6 machines so far, all getting an equal share of traffic, so I'd be surprised if it does not occur again). Will use that command as well whenever it happens! |
I think this might be related to #3362 |
Pretty sure this has a different root cause (fix for 3362 is coming), will look |
For some reason (not yet determined, and not really related to this report), we have had a panic in some Varnish child processes. After this, prometheus_varnish_exporter started to choke on duplicate metrics, as it does not support different VCL instances being present in varnishstat.
After some digging it was discovered that varnishstat, which normally only gives out metrics for warm VCLs, was reporting cold VCLs as well. Then it was noticed that the child process had panicked/been restarted.
In docker below, the -QUIT is not really registered as a panic but just as a "child process died (signal)". On my real RHEL7 systems, a -QUIT triggers a "Child process core dumped" and I think it was "Child process panic".
Reproduced in 6.3.1, 6.3.2, 6.4.0 from CentOS7 RPMs, and as shown below, in the latest varnish docker image.
Expected Behavior
Cold VCLs should not be reported in varnishstat after a child panic, i.e. the same behaviour as before the panic.
Current Behavior
Cold VCLs appear in varnishstat after a child panic has occurred, but not before.
Possible Solution
No idea.
Steps to Reproduce (for bugs)
Using varnish docker image:
As seen above, we have one cold and one warm VCL, and only the warm one is exposed in varnishstat.
Now, find the child pid and force a panic:
Varnishstat reports the cold VCL too
Context
After at least one reload + one panic has occurred, prometheus_varnish_exporter fails. That itself is imperfect behaviour, tracked in jonnenauha/prometheus_varnish_exporter#57, but this still seems like a bug in Varnish.
On the good side, this made me catch the fact that the varnish child had panicked. Root cause for that is TBD; core dumps are enabled now.
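To check a machine for this condition without going through the exporter, the counter names reported by varnishstat can be grouped by VCL. The following is only a minimal sketch: it assumes backend counters are named `VBE.<vcl>.<backend>.<field>`, and the function names and sample counter names are hypothetical, not taken from varnishstat or the exporter.

```python
from collections import defaultdict


def group_backend_counters_by_vcl(counter_names):
    """Map each VCL name to the set of 'backend.field' suffixes it exposes.

    Assumption: backend counters look like 'VBE.<vcl>.<backend>.<field>'.
    Anything else (e.g. 'MAIN.uptime') is ignored.
    """
    by_vcl = defaultdict(set)
    for name in counter_names:
        parts = name.split(".")
        if len(parts) < 4 or parts[0] != "VBE":
            continue
        by_vcl[parts[1]].add(".".join(parts[2:]))
    return dict(by_vcl)


def duplicated_backend_metrics(counter_names):
    """Return the 'backend.field' suffixes reported under more than one VCL."""
    vcls_per_suffix = defaultdict(set)
    for vcl, suffixes in group_backend_counters_by_vcl(counter_names).items():
        for suffix in suffixes:
            vcls_per_suffix[suffix].add(vcl)
    return {s: sorted(v) for s, v in vcls_per_suffix.items() if len(v) > 1}


if __name__ == "__main__":
    # Hypothetical counter names: one boot VCL and one reloaded VCL.
    sample = [
        "MAIN.uptime",
        "VBE.boot.default.happy",
        "VBE.reload_1.default.happy",
    ]
    print(duplicated_backend_metrics(sample))
```

If the result is non-empty, more than one VCL is exposing the same backend counters, which is the situation that trips up the exporter. The names could be taken from the first column of `varnishstat -1`; that extraction step is left out here.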
Your Environment