varnish-exporter resource consumption issues #81
Sorry for the late response. The varnish exporter is a third-party binary that, TBH, I'm not particularly familiar with. From other Prometheus exporters, I know that metrics churn (i.e. lots of new time series added over time) is often a problem, and I suspect a similar issue in this case (especially since backend names are updated/rotated frequently with most VCL configurations). Could you have a look at the metrics exported by the exporter (especially after it's been running a while and memory consumption has risen by a significant degree)?
@martin-helmich I did a quick check: many labels get created by the exporter, as the backend names are constantly updated by the config-reloader. So I guess the amount of labels could indeed be a problem here (not only for the exporter, but also for Prometheus), what do you think? As a possible solution, can we maybe change the names that the config-reloader puts into the config so that they are not unique on every config reload?
Yes -- memory consumption of the exporter will increase (approximately) linearly with the number of metrics. AFAIK, Prometheus itself is nowadays a bit better optimized for high metric churn than it used to be.
I'm just going to dump some thoughts that come to mind: a possible "solution" might be to re-use previous VCL config names (however, off the top of my head, I don't know whether Varnish allows re-loading a configuration under an existing name). However, Pod names will also change frequently and still cause metrics churn, even if the VCL names stay constant. This might (!) be solved by naming VCL backends deterministically (maybe just using indices instead of their names) -- see the sketch below:
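For illustration, here is a minimal sketch of index-based backend naming. This is not the project's actual template model -- the `Backend` fields and template variables are assumptions -- but it shows how Go's `text/template` (which renders the VCL template) could emit stable backend names that don't change when Pods are replaced:

```go
package main

// Sketch only: render VCL backend definitions using the list index instead of
// the Pod name, so the generated backend names (and exporter labels derived
// from them) stay stable across Pod restarts. Field names are illustrative.
import (
	"os"
	"text/template"
)

type Backend struct {
	Host string
	Port string
}

const vclBackends = `
{{- range $i, $b := .Backends }}
backend be_{{ $i }} {
    .host = "{{ $b.Host }}";
    .port = "{{ $b.Port }}";
}
{{- end }}
`

func main() {
	tmpl := template.Must(template.New("backends").Parse(vclBackends))
	data := struct{ Backends []Backend }{
		Backends: []Backend{
			{Host: "10.0.0.12", Port: "8080"},
			{Host: "10.0.0.34", Port: "8080"},
		},
	}
	if err := tmpl.Execute(os.Stdout, data); err != nil {
		panic(err)
	}
}
```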
I'll give this issue some more thought, but cannot make any promises as to when I'll find the time to actually put some work into this. In the meantime, any help is welcome. 🙂
From the looks of your logs, the VCL template is updated quite frequently, which seems to trigger a configuration reload. IIRC, template changes are determined by watching the respective file using inotify. Can you check if there are actual changes being made to said file? Alternatively, it might be worth adding a check to the controller to only trigger a config reload if the actual contents of the VCL template changed.
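As a rough illustration of that last idea, here is a minimal, hypothetical sketch of a content-based check (the type and function names are made up, not part of the project): the template is hashed on every read and an update is only propagated when the contents actually changed:

```go
package watcher

// Hypothetical helper: only report the template as changed when its contents
// differ from the previously observed version, regardless of which filesystem
// event triggered the read.
import (
	"crypto/sha256"
	"io/ioutil"
)

type templateDeduper struct {
	lastHash [sha256.Size]byte
	seen     bool
}

// readIfChanged returns the file contents and true only if they differ from
// the last version this deduper has seen.
func (d *templateDeduper) readIfChanged(filename string) ([]byte, bool, error) {
	content, err := ioutil.ReadFile(filename)
	if err != nil {
		return nil, false, err
	}
	h := sha256.Sum256(content)
	if d.seen && h == d.lastHash {
		return nil, false, nil
	}
	d.lastHash = h
	d.seen = true
	return content, true, nil
}
```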
@martin-helmich I assume that in the code, there should be a check for the kind of event that was observed. The current implementation is (https://github.com/mittwald/kube-httpcache/blob/master/pkg/watcher/template_watch.go#L17):

```go
func (t *fsnotifyTemplateWatcher) watch(updates chan []byte, errors chan error) {
	for ev := range t.watcher.Events {
		glog.V(6).Infof("observed %s event on %s", ev.String(), ev.Name)
		content, err := ioutil.ReadFile(t.filename)
		if err != nil {
			glog.Warningf("error while reading file %s: %s", t.filename, err.Error())
			errors <- err
			continue
		}
		updates <- content
	}
}
```

So no matter what the event is, the file gets reloaded. This can cause issues if multiple processes access the file and, for example, the access time gets updated. Maybe it would make sense to add a check like this (from the fsnotify example):

```go
if ev.Op&fsnotify.Write == fsnotify.Write {
	log.Println("modified file:", ev.Name)
}
```

For our case, I switched to your poll implementation (by adding `-varnish-vcl-template-poll`).
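For reference, here is a minimal sketch of what the watch loop could look like with such an event check in place. The struct shape is assumed from the snippet above, and whether to also react to `Create` events (as the later fix does) is a judgment call:

```go
package watcher

// Sketch only, not the project's actual implementation: the template is only
// re-read when the event indicates a content change, so chmod/atime updates
// from other processes no longer trigger a VCL reload.
import (
	"io/ioutil"

	"github.com/fsnotify/fsnotify"
	"github.com/golang/glog"
)

// Assumed struct shape, mirroring the snippet above.
type fsnotifyTemplateWatcher struct {
	watcher  *fsnotify.Watcher
	filename string
}

func (t *fsnotifyTemplateWatcher) watch(updates chan []byte, errors chan error) {
	for ev := range t.watcher.Events {
		glog.V(6).Infof("observed %s event on %s", ev.String(), ev.Name)

		// Skip events that do not indicate a content change.
		if ev.Op&(fsnotify.Write|fsnotify.Create) == 0 {
			continue
		}

		content, err := ioutil.ReadFile(t.filename)
		if err != nil {
			glog.Warningf("error while reading file %s: %s", t.filename, err.Error())
			errors <- err
			continue
		}

		updates <- content
	}
}
```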
And regarding the metrics churn, here is what I found out so far: the exporter can automatically discard metrics belonging to old VCL configurations, but only if the configuration names follow the `reload_<timestamp>` scheme used by `varnishreload`. However, to get this running, we cloned your repo and changed https://github.com/mittwald/kube-httpcache/blob/master/pkg/controller/watch.go#L80 to

```go
configname := fmt.Sprintf("reload_%d", time.Now().UnixNano())
```

This will then correctly be discarded by the exporter.

In addition to that, defining a lot of inline VCLs can be memory-consuming, as Varnish keeps the compiled VCLs and never discards them. You can inspect this by running `varnishadm vcl.list`. So I changed the run command of the STS container to:

```yaml
containers:
  - name: varnish
    image: registry.staffbase.com/private/kube-httpcache:0.0.3
    imagePullPolicy: IfNotPresent
    command:
      - /bin/bash
      - -cxe
      - |
        set -eo pipefail
        # Background loop: every 60s, list the loaded VCLs, extract their
        # names and discard everything except the most recent ones.
        while true; do
          varnishadm vcl.list \
            | tr -s ' ' \
            | cut -d ' ' -f 4 \
            | head -n -6 \
            | while read in; do varnishadm vcl.discard "$in" &>/dev/null; done
          sleep 60;
        done &
        ./kube-httpcache \
          -admin-addr=0.0.0.0 \
          -admin-port=6083 \
          -signaller-enable \
          -signaller-port=8090 \
          -frontend-watch \
          -frontend-namespace=$(NAMESPACE) \
          -frontend-service=varnish-headless \
          -frontend-port=8080 \
          -backend-watch \
          -backend-namespace=$(NAMESPACE) \
          -backend-service=frontend-cache-service \
          -varnish-secret-file=/etc/varnish/k8s-secret/secret \
          -varnish-vcl-template=/etc/varnish/tmpl/default.vcl.tmpl \
          -varnish-vcl-template-poll \
          -varnish-storage=malloc,256M
    ports:
      - containerPort: 8080
        name: http
      - containerPort: 8090
        name: signaller
```

This launches a background loop that discards every VCL older than the most recent five. Let's see if those steps already resolve our resource issue. 🤞🏼
Hey Tim, thanks for your research in this matter -- that's tremendously helpful and much appreciated. 👏 I think renaming the VCL configs to match the exporter's automatic cleanup mechanism should be a possible change to make without any nasty side effects. And maybe we can also adjust the controller to regularly discard old configurations... 🤔
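Purely as an illustration of that last idea, here is a hypothetical sketch of periodic VCL cleanup that shells out to `varnishadm`, mirroring the bash loop shown earlier. The real controller talks to the Varnish admin port directly; the name parsing, the keep count, and the assumption that `vcl.list` prints configurations oldest-first are illustrative only:

```go
package main

// Hypothetical periodic cleanup of old VCL configurations via varnishadm.
import (
	"fmt"
	"os/exec"
	"strings"
	"time"
)

const keepLatest = 5 // how many non-active VCLs to keep around (assumed)

func discardOldVCLs() error {
	out, err := exec.Command("varnishadm", "vcl.list").Output()
	if err != nil {
		return fmt.Errorf("vcl.list failed: %w", err)
	}

	var names []string
	for _, line := range strings.Split(strings.TrimSpace(string(out)), "\n") {
		fields := strings.Fields(line)
		if len(fields) == 0 {
			continue
		}
		// Assume the VCL name is the last column; never touch the active
		// configuration or the built-in boot config.
		name := fields[len(fields)-1]
		if fields[0] == "active" || name == "boot" {
			continue
		}
		names = append(names, name)
	}

	if len(names) <= keepLatest {
		return nil
	}
	// Discard the oldest entries, keeping the most recent keepLatest ones.
	for _, name := range names[:len(names)-keepLatest] {
		if err := exec.Command("varnishadm", "vcl.discard", name).Run(); err != nil {
			return fmt.Errorf("discarding %s: %w", name, err)
		}
	}
	return nil
}

func main() {
	for range time.Tick(60 * time.Second) {
		if err := discardOldVCLs(); err != nil {
			fmt.Println("warning:", err)
		}
	}
}
```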
Awesome! I'm in to help you with the changes if you want 🚀 Just some last concerns before we can start: in case you still want to bundle the metrics exporter in the Docker image, we have to consider an issue that I came across: jonnenauha/prometheus_varnish_exporter#57. It has been fixed on their master (which is why I had to fork the repo and build an image for us by hand), so if we make the change so that the exporter auto-discards the old configurations, we'd need an exporter build that already contains that fix. What is your suggested way of handling this? Extend the documentation of your service with a section on how to run it with the exporter as a sidecar (and require users to build their own image from its master), or build it from source in the Dockerfile? Or do you have an alternative option in mind?
* Explicitly check for write events in fsnotify watcher (#81)
* Add Create event to watcher check

Co-authored-by: Martin Helmich <[email protected]>
Co-authored-by: Martin Helmich <[email protected]>
Hey everyone! Since the Helm option to include the exporter is publicly available, do you believe @martin-helmich it's possible to publish a new Docker image version that contains the exporter? For now, enabling the exporter results in the following error:

```
Warning  Failed  0s (x3 over 23s)  kubelet  Error: failed to create containerd task: OCI runtime create failed: container_linux.go:370: starting container process caused: exec: "/exporter/prometheus_varnish_exporter": stat /exporter/prometheus_varnish_exporter: no such file or directory: unknown
```

Indeed, it's not in the image:

```
docker run --entrypoint=bash -it quay.io/mittwald/kube-httpcache:stable
root@c14a649e7891:/# ls /
bin  boot  dev  etc  home  kube-httpcache  lib  lib64  media  mnt  opt  proc  root  run  sbin  srv  sys  tmp  usr  var
```
I wouldn't recommend publishing the current state of the Docker image due to the issues I mentioned above. Those should be addressed before releasing the cache + exporter combo into the wild, as I don't consider the current implementation production-ready.
While I share your concerns about introducing a feature that isn't production-ready, following your earlier message I wonder what would be the best way to move forward on this?
@thylong I think finding a solution to the question I had above would enable us to move forward here. Do you have any thoughts on this?
As stated by @timkante in issue mittwald#81, Varnish keeps all previous VCL versions after reloads. This increases Varnish's memory consumption over time and can be deadly if the configuration gets reloaded repeatedly (for example because of autoscaling-related behavior). This commit intends to keep the VCL history to a single entry.
**VCL reload naming fix**

IMHO, we should use your suggested fix for [watch.go, line 80](https://github.com/mittwald/kube-httpcache/blob/master/pkg/controller/watch.go#L80):

```go
configname := fmt.Sprintf("reload_%d", time.Now().UnixNano())
```

It keeps things simple and doesn't alter the current behavior of the VCL list growing over time. I tend to agree with @martin-helmich on the discard. Here is a PR suggesting a fix: #95

**Prometheus_varnish_exporter outdated**

I was about to suggest reaching out to the maintainer of prometheus_varnish_exporter to request a release that includes jonnenauha/prometheus_varnish_exporter#57, but it seems that has already been done a while ago with no success... 😞 A second option would be to fork the original repo ourselves and publish a release until the maintainer does so.

**Preferred solutions**

IMHO, a fork would be the way to go. If we don't pick this option, I tend to prefer building the exporter from source, as it will ensure the fix is included. WDYT @timkante @martin-helmich?
Hey @thylong, thanks for sharing your thoughts on this ✌🏼 I agree with both fixes that target this project's source (the config renaming and wiring the VCL removal into the application code). Regarding the exporter: I think we should give reaching out to the owner of the exporter repo another shot. If there is no reaction, I'd agree with you and prefer a mittwald fork of the exporter -- backed by a from-source build if the owners don't like this idea 💯
Hey all! 👋 Thanks for all your thoughts on this issue -- it took me a while to catch up on this one. 😉 Regarding the VCL naming fix (using `reload_<timestamp>`-style names): that sounds like a sensible change. Regarding the missing exporter binary: the regular Dockerfile does include it:
```dockerfile
ADD --chown=varnish https://github.com/jonnenauha/prometheus_varnish_exporter/releases/download/${EXPORTER_VERSION}/prometheus_varnish_exporter-${EXPORTER_VERSION}.linux-amd64.tar.gz /tmp
RUN cd /exporter && \
    tar -xzf /tmp/prometheus_varnish_exporter-${EXPORTER_VERSION}.linux-amd64.tar.gz && \
    ln -sf /exporter/prometheus_varnish_exporter-${EXPORTER_VERSION}.linux-amd64/prometheus_varnish_exporter prometheus_varnish_exporter
```
It's only the Dockerfile used for Goreleaser that's missing the exporter binary (I'm going to have to refresh my memory on why we're using two different Dockerfiles in the first place 🤔). I have created #96 as a follow-up issue for adjusting that Dockerfile accordingly (or better, to see if we can get away with using a single Dockerfile instead of two separate ones).
🕺 We've got a new release: https://github.com/jonnenauha/prometheus_varnish_exporter/releases/tag/1.6.1
* Use varnishreload-compatible vcl names #81 (comment)
* Honor varnishd max_vcl https://varnish-cache.org/docs/6.0/reference/varnishd.html#max-vcl
Describe the bug
We are running your latest image as a StatefulSet (without Helm) and manually rebuilt the current master state that includes the varnish-exporter.
We did the following steps to get there: we rebuilt kube-httpcache and the exporter image ourselves (see config below).
Everything is working seemingly fine, but the memory and CPU consumption of the exporter container is steadily rising.
We increased its resource limits several times without success; after a couple of days it always reaches its limits and stops working.
It also does not produce any suspicious logs.
CPU Consumption (we had a short metrics outage on the system, and the drop at the end is a restart of the STS):
Memory Consumption:
Expected behavior
A reasonably constant usage of resources.
Configuration
Our StatefulSet config:
The exact Docker image we built for the exporter:
Kubernetes Version: 1.18.15
Thanks for looking into this 🤝 🚀