revisit hash_always_miss and hash_ignore_busy vs. stale objects #3670

Open
nigoroll opened this issue Aug 17, 2021 · 11 comments
@nigoroll
Member

nigoroll commented Aug 17, 2021

For ordinary requests, we (try to) replace stale cache objects when we fetch new ones:

  • in HSH_Lookup(), we return the "best object" via the ocp argument
  • in cnt_lookup(), we either pass this object directly to VBF_Fetch() as the oldoc argument for a bgfetch, or indirectly by saving it in req->stale_oc for the miss case.
  • in vbf_stp_fetchend(), when we have a new cache object, we kill a single stale object which was saved before, if any
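The flow above can be sketched as follows (illustrative C only; `cache_lookup()` and `fetch_end()` are simplified stand-ins for `HSH_Lookup()` and `vbf_stp_fetchend()`, not the real signatures or data structures):

```c
/*
 * Illustrative sketch of the stale-object replacement flow; the names
 * cache_lookup() and fetch_end() are simplified stand-ins for
 * HSH_Lookup() and vbf_stp_fetchend(), not the real Varnish code.
 */
#include <assert.h>
#include <stddef.h>

struct oc {
	int alive;
	int stale;
};

/* Return a fresh oc if one exists; otherwise report the best stale
 * candidate via *ocp, mirroring HSH_Lookup()'s ocp argument. */
static struct oc *
cache_lookup(struct oc *objs, size_t n, struct oc **ocp)
{
	size_t i;

	*ocp = NULL;
	for (i = 0; i < n; i++) {
		if (!objs[i].alive)
			continue;
		if (!objs[i].stale)
			return (&objs[i]);	/* cache hit */
		if (*ocp == NULL)
			*ocp = &objs[i];	/* best stale candidate */
	}
	return (NULL);				/* miss */
}

/* On fetch completion, the new object goes live and the single saved
 * stale oc (req->stale_oc / oldoc) is killed, if any. */
static void
fetch_end(struct oc *new_oc, struct oc *stale_oc)
{
	new_oc->alive = 1;
	new_oc->stale = 0;
	if (stale_oc != NULL)
		stale_oc->alive = 0;
}
```

Note that at most one stale oc is killed per fetch, which is the crux of the accumulation discussed here: with hash_always_miss no stale_oc is tracked at all, and with hash_ignore_busy there are several inserts but only one removal each.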

Now this mechanism clearly does not work with hash_always_miss, as previously documented in #2945: when this mode is active, we insert a new object without ever cleaning up old stale objects (because we do not track them).

A different but seemingly similar issue exists with hash_ignore_busy: For concurrent requests, we insert multiple objects, but only remove one.

This ticket is to question whether this behavior is in fact what we want. I see the following issues:

  • For the short ttl + long grace case, we can end up with a relevant number of unused cache objects
  • This behavior quite likely violates the principle of least astonishment (POLA)
  • Removing the surplus objects requires a PURGE or a BAN

On the other hand, I could imagine cases where these facilities are used to deliberately create fallback objects in cache, and I wonder whether use cases relying on this property of inserting additional objects exist.


Demo

vcl:

vcl 4.1;

import vtc;

backend alwaysdown none;

sub vcl_recv {
        if (req.url == "/ham") {
                set req.hash_always_miss = true;
        } else if (req.url == "/hib") {
                set req.hash_ignore_busy = true;
        }
        set req.url = "/";
        return (hash);
}

sub vcl_backend_error {
        vtc.sleep(10s);
        set beresp.status = 200;
        set beresp.ttl = 1s;
        set beresp.grace = 1h;
        synthetic("");
        return (deliver);
}
  • hash_always_miss
slink@haggis21:~$ /tmp/bin/varnishstat -1 -f MAIN.n_object -f MAIN.n_objectcore
MAIN.n_object                7          .   object structs made
MAIN.n_objectcore            7          .   objectcore structs made
slink@haggis21:~$ curl -I localhost:8080/ham &curl -I localhost:8080/ham &
[1] 27050
[2] 27051
slink@haggis21:~$ /tmp/bin/varnishstat -1 -f MAIN.n_object -f MAIN.n_objectcore
MAIN.n_object                7          .   object structs made
MAIN.n_objectcore            7          .   objectcore structs made
slink@haggis21:~$ # waHTTP/1.1 200 OK
HTTP/1.1 200 OK
Date: Tue, 17 Aug 2021 13:15:22 GMT
Date: Tue, 17 Aug 2021 13:15:22 GMT
Server: Varnish
Server: Varnish
X-Varnish: 131080
X-Varnish: 65547
Age: 10
Age: 10
Via: 1.1 varnish (Varnish/6.6)
Via: 1.1 varnish (Varnish/6.6)
Accept-Ranges: bytes
Accept-Ranges: bytes
Content-Length: 0
Content-Length: 0
Connection: keep-alive
Connection: keep-alive



[1]-  Done                    curl -I localhost:8080/ham
[2]+  Done                    curl -I localhost:8080/ham
slink@haggis21:~$ /tmp/bin/varnishstat -1 -f MAIN.n_object -f MAIN.n_objectcore
MAIN.n_object                9          .   object structs made
MAIN.n_objectcore            9          .   objectcore structs made
  • hash_ignore_busy
slink@haggis21:~$ /tmp/bin/varnishstat -1 -f MAIN.n_object -f MAIN.n_objectcore
MAIN.n_object                9          .   object structs made
MAIN.n_objectcore            9          .   objectcore structs made
slink@haggis21:~$ curl -I localhost:8080/hib &curl -I localhost:8080/hib &
[1] 27158
[2] 27159
slink@haggis21:~$ HTTP/1.1 200 OK
HTTP/1.1 200 OK
Date: Tue, 17 Aug 2021 13:15:22 GMT
Date: Tue, 17 Aug 2021 13:15:22 GMT
Server: Varnish
Server: Varnish
X-Varnish: 65550 131081
X-Varnish: 131083 131081
Age: 58
Age: 58
Via: 1.1 varnish (Varnish/6.6)
Via: 1.1 varnish (Varnish/6.6)
Accept-Ranges: bytes
Accept-Ranges: bytes
Content-Length: 0
Content-Length: 0
Connection: keep-alive

Connection: keep-alive


[1]-  Done                    curl -I localhost:8080/hib
[2]+  Done                    curl -I localhost:8080/hib
slink@haggis21:~$ /tmp/bin/varnishstat -1 -f MAIN.n_object -f MAIN.n_objectcore
MAIN.n_object                9          .   object structs made
MAIN.n_objectcore           11          .   objectcore structs made
slink@haggis21:~$ /tmp/bin/varnishstat -1 -f MAIN.n_object -f MAIN.n_objectcore
MAIN.n_object               10          .   object structs made
MAIN.n_objectcore           10          .   objectcore structs made
@nigoroll
Member Author

FTR: This has been discussed during bugwash with no clear decision. Some notes:

  • A related question is whether keep should be active at all for non-304-able objects. It probably should not.
  • Vary can also cause a high number of objcores per objhead, but this is a different case
  • LRU would eventually limit the number of objects in cache, but it has some limitations in varnish-cache

Regarding this ticket, one solution which was discussed is to

  • add a hash_backup_objects parameter, defaulting to 0, to configure the number of "surplus" objects besides the freshest one
  • and make this parameter work equally for a "normal" fetch and the cases brought up here
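The proposed parameter could work roughly like this (a minimal sketch; `hash_backup_objects` and `prune_surplus()` are hypothetical names from this discussion, not existing Varnish code):

```c
/*
 * Sketch of the discussed (hypothetical) hash_backup_objects
 * parameter: after a successful insert, keep at most `backup` surplus
 * objects per objhead besides the freshest one and prune the rest.
 * The age-array representation is illustrative only.
 */
#include <assert.h>
#include <stddef.h>

/* age[] holds the ages of the surplus (stale) objects on one objhead,
 * sorted newest first; pruned entries are set to -1. Returns the
 * number of objects pruned. */
static size_t
prune_surplus(int *age, size_t n, size_t backup)
{
	size_t i, killed = 0;

	for (i = backup; i < n; i++) {
		if (age[i] >= 0) {
			age[i] = -1;
			killed++;
		}
	}
	return (killed);
}
```

With the proposed default of 0, every surplus object would be pruned, which would make "normal" fetches and the hash_always_miss / hash_ignore_busy cases behave alike.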

@bsdphk
Contributor

bsdphk commented Aug 24, 2021

I think a crucial insight is that by definition we only need to prune when we finish fetching a new objcore.

That means it is something the backend worker can do after getting out of the critical path.

Likewise it is theoretically enough to prune a single object.

However, that leaves no margin for VCL or parameter changes, so two should probably be attempted.

The problem is finding an objcore to prune, because they may be stuck in the middle of the objhdr list, surrounded by a LOT of other objcores with incompatible Vary:.

In many cases (but not all: hash_always_miss) we will have walked the entire objhdr list before resorting to a fetch, so maybe we can cache an "if this fetch succeeds, you should probably kill this objcore" reference?

Anyway, it sounds like it might be a good idea to start simulating the dynamic behaviour of this area...

@nigoroll
Member Author

This ticket has been discussed between @bsdphk and myself. Brief summary:

We did not come up with a particularly nice solution; all ideas for cleaning the object list retrospectively seem overly complicated.

We discussed the following idea as a likely "best" (reasonably simple) compromise:

For hash_always_miss, if the "cleanup" mode is activated, a cache lookup would be conducted for the sole purpose of finding a stale_oc, which would then be removed once the new object gets unbusied.

This approach is not a complete solution, as it would only remove at most one "surplus" object per insert.

Noted after the discussion: we might want to consider moving stale ocs found during always_miss / ignore_busy to the end of the object list, so that the chances increase for other old objects to be cleaned up.

side notes:

hash_ignore_busy could probably do conditional requests (not checked whether it already does)

@bsdphk
Contributor

bsdphk commented Nov 14, 2022

I'm not sure I would do an extra lookup at the end; I would probably force hash_always_miss through the initial lookup, so it comes in with a stale_oc like everybody else.

@nigoroll
Member Author

an extra lookup at the end

... is not what I meant. I meant to move a found stale_oc to the end of the oc list such that another stale_oc has a chance to be picked up by the next lookup.

@nigoroll
Member Author

nigoroll commented Apr 22, 2024

This has been discussed during bugwash with no clear result; please see the edit versions if interested.

@dridi
Member

dridi commented Apr 22, 2024

My suggestion from the bugwash discussion.

For a lookup with hash_always_miss raised:

  • always loop through the entire objhead
  • grab the first compatible fresh oc as the expired oc
    • fall back to the first stale oc
    • kill any other compatible oc
  • always disable bereq revalidation (IMS and INM)
  • kill the expired oc upon success
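A minimal sketch of that lookup (illustrative only; the real HSH_Lookup() is far more involved, and the `compatible` flag stands in for a Vary match):

```c
/*
 * Illustrative sketch of the suggested hash_always_miss lookup; not
 * the real HSH_Lookup(), and `compatible` stands in for a Vary match.
 */
#include <assert.h>
#include <stddef.h>

struct oc {
	int alive;
	int stale;
	int compatible;
};

/*
 * Walk the entire objhead: keep one compatible oc (fresh preferred,
 * falling back to the first stale one) as the object to replace, and
 * kill every other compatible oc. The returned "expired" oc is killed
 * once the fetch succeeds; bereq revalidation (IMS/INM) stays off.
 */
static struct oc *
ham_lookup(struct oc *objs, size_t n)
{
	struct oc *expired = NULL;
	size_t i;

	for (i = 0; i < n; i++) {
		if (!objs[i].alive || !objs[i].compatible)
			continue;
		if (expired == NULL) {
			expired = &objs[i];
		} else if (expired->stale && !objs[i].stale) {
			expired->alive = 0;	/* fresher candidate found */
			expired = &objs[i];
		} else {
			objs[i].alive = 0;	/* kill other compatible oc */
		}
	}
	return (expired);
}
```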

@bsdphk
Contributor

bsdphk commented Apr 22, 2024

I have a hard time seeing how there could be more than a single compatible oc, if we implement this, but other than that: Yes.

@dridi
Member

dridi commented Apr 23, 2024

I have a hard time seeing how there could be more than a single compatible oc, if we implement this, but other than that: Yes.

If you start two fetch tasks for the same hash, same variant, you can have two fresh compatible ocs once you cleared OC_F_BUSY on both.

@bsdphk
Contributor

bsdphk commented Apr 29, 2024

I think the pseudo-IMS will go a long way to reduce the problem; I'd prefer not to add expensive machinery to the objhead until we know it is (still) needed.

@nigoroll
Member Author

bugwash: It seems we have come close to the second-to-last edit of #3670 (comment):

  • yes, we should probably do a cache lookup for hash_always_miss
  • the second matching object can be removed right away
  • if that does not exist, the first matching object should be remembered as the stale_oc, with IMS/INM logic disabled

should be revisited after #4073
