revisit hash_always_miss and hash_ignore_busy vs. stale objects #3670

Open
nigoroll opened this issue Aug 17, 2021 · 11 comments
@nigoroll
Member

nigoroll commented Aug 17, 2021

For ordinary requests, we (try to) replace stale cache objects when we fetch new ones:

  • in HSH_Lookup(), we return the "best object" via the ocp argument
  • in cnt_lookup(), we either pass this object directly to VBF_Fetch() as the oldoc argument for a bgfetch, or indirectly by saving it in req->stale_oc for the miss case.
  • in vbf_stp_fetchend(), when we have a new cache object, we kill a single stale object which was saved before, if any
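The flow above can be sketched as follows (illustrative C only; `cache_lookup()` and `fetch_end()` are simplified stand-ins for `HSH_Lookup()` and `vbf_stp_fetchend()`, not the real signatures or data structures):

```c
/*
 * Illustrative sketch of the stale-object replacement flow; the names
 * cache_lookup() and fetch_end() are simplified stand-ins for
 * HSH_Lookup() and vbf_stp_fetchend(), not the real Varnish code.
 */
#include <assert.h>
#include <stddef.h>

struct oc {
	int alive;
	int stale;
};

/* Return a fresh oc if one exists; otherwise report the best stale
 * candidate via *ocp, mirroring HSH_Lookup()'s ocp argument. */
static struct oc *
cache_lookup(struct oc *objs, size_t n, struct oc **ocp)
{
	size_t i;

	*ocp = NULL;
	for (i = 0; i < n; i++) {
		if (!objs[i].alive)
			continue;
		if (!objs[i].stale)
			return (&objs[i]);	/* cache hit */
		if (*ocp == NULL)
			*ocp = &objs[i];	/* best stale candidate */
	}
	return (NULL);				/* miss */
}

/* On fetch completion, the new object goes live and the single saved
 * stale oc (req->stale_oc / oldoc) is killed, if any. */
static void
fetch_end(struct oc *new_oc, struct oc *stale_oc)
{
	new_oc->alive = 1;
	new_oc->stale = 0;
	if (stale_oc != NULL)
		stale_oc->alive = 0;
}
```

Note that at most one stale oc is killed per fetch, which is the crux of the accumulation discussed here: with hash_always_miss no stale_oc is tracked at all, and with hash_ignore_busy there are several inserts but only one removal each.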

Now this mechanism clearly does not work with hash_always_miss, as previously documented in #2945: when this mode is active, we insert a new object without ever cleaning up old stale objects (because we do not track them).

A different but seemingly similar issue exists with hash_ignore_busy: For concurrent requests, we insert multiple objects, but only remove one.

This ticket is to question whether this behavior is in fact what we want. I see the following issues:

  • For the short ttl + long grace case, we can end up with a relevant number of unused cache objects
  • This behavior quite likely violates the principle of least astonishment (POLA)
  • Removing the surplus objects requires a PURGE or a BAN

On the other hand, I could imagine cases where these facilities are used to deliberately create fallback objects in cache, and I wonder whether use cases relying on this property of inserting additional objects exist.


Demo

vcl:

vcl 4.1;

import vtc;

backend alwaysdown none;

sub vcl_recv {
        if (req.url == "/ham") {
                set req.hash_always_miss = true;
        } else if (req.url == "/hib") {
                set req.hash_ignore_busy = true;
        }
        set req.url = "/";
        return (hash);
}

sub vcl_backend_error {
        vtc.sleep(10s);
        set beresp.status = 200;
        set beresp.ttl = 1s;
        set beresp.grace = 1h;
        synthetic("");
        return (deliver);
}
  • hash_always_miss
slink@haggis21:~$ /tmp/bin/varnishstat -1 -f MAIN.n_object -f MAIN.n_objectcore
MAIN.n_object                7          .   object structs made
MAIN.n_objectcore            7          .   objectcore structs made
slink@haggis21:~$ curl -I localhost:8080/ham &curl -I localhost:8080/ham &
[1] 27050
[2] 27051
slink@haggis21:~$ /tmp/bin/varnishstat -1 -f MAIN.n_object -f MAIN.n_objectcore
MAIN.n_object                7          .   object structs made
MAIN.n_objectcore            7          .   objectcore structs made
slink@haggis21:~$ # waHTTP/1.1 200 OK
HTTP/1.1 200 OK
Date: Tue, 17 Aug 2021 13:15:22 GMT
Date: Tue, 17 Aug 2021 13:15:22 GMT
Server: Varnish
Server: Varnish
X-Varnish: 131080
X-Varnish: 65547
Age: 10
Age: 10
Via: 1.1 varnish (Varnish/6.6)
Via: 1.1 varnish (Varnish/6.6)
Accept-Ranges: bytes
Accept-Ranges: bytes
Content-Length: 0
Content-Length: 0
Connection: keep-alive
Connection: keep-alive



[1]-  Done                    curl -I localhost:8080/ham
[2]+  Done                    curl -I localhost:8080/ham
slink@haggis21:~$ /tmp/bin/varnishstat -1 -f MAIN.n_object -f MAIN.n_objectcore
MAIN.n_object                9          .   object structs made
MAIN.n_objectcore            9          .   objectcore structs made
  • hash_ignore_busy
slink@haggis21:~$ /tmp/bin/varnishstat -1 -f MAIN.n_object -f MAIN.n_objectcore
MAIN.n_object                9          .   object structs made
MAIN.n_objectcore            9          .   objectcore structs made
slink@haggis21:~$ curl -I localhost:8080/hib &curl -I localhost:8080/hib &
[1] 27158
[2] 27159
slink@haggis21:~$ HTTP/1.1 200 OK
HTTP/1.1 200 OK
Date: Tue, 17 Aug 2021 13:15:22 GMT
Date: Tue, 17 Aug 2021 13:15:22 GMT
Server: Varnish
Server: Varnish
X-Varnish: 65550 131081
X-Varnish: 131083 131081
Age: 58
Age: 58
Via: 1.1 varnish (Varnish/6.6)
Via: 1.1 varnish (Varnish/6.6)
Accept-Ranges: bytes
Accept-Ranges: bytes
Content-Length: 0
Content-Length: 0
Connection: keep-alive

Connection: keep-alive


[1]-  Done                    curl -I localhost:8080/hib
[2]+  Done                    curl -I localhost:8080/hib
slink@haggis21:~$ /tmp/bin/varnishstat -1 -f MAIN.n_object -f MAIN.n_objectcore
MAIN.n_object                9          .   object structs made
MAIN.n_objectcore           11          .   objectcore structs made
slink@haggis21:~$ /tmp/bin/varnishstat -1 -f MAIN.n_object -f MAIN.n_objectcore
MAIN.n_object               10          .   object structs made
MAIN.n_objectcore           10          .   objectcore structs made
@nigoroll
Member Author

FTR: This has been discussed during bugwash with no clear decision. Some notes:

  • A related question is whether keep should be active at all for non-304-able objects. It probably should not.
  • Vary can also cause a high number of objcores per objhead, but this is a different case
  • LRU would eventually limit the number of objects in cache, but it has some limitations in varnish-cache

Regarding this ticket, one solution which was discussed is to

  • add a hash_backup_objects parameter, defaulting to 0, to configure the number of "surplus" objects besides the freshest one
  • and make this parameter work equally for a "normal" fetch and the cases brought up here
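The proposed parameter could work roughly like this (a minimal sketch; `hash_backup_objects` and `prune_surplus()` are hypothetical names from this discussion, not existing Varnish code):

```c
/*
 * Sketch of the discussed (hypothetical) hash_backup_objects
 * parameter: after a successful insert, keep at most `backup` surplus
 * objects per objhead besides the freshest one and prune the rest.
 * The age-array representation is illustrative only.
 */
#include <assert.h>
#include <stddef.h>

/* age[] holds the ages of the surplus (stale) objects on one objhead,
 * sorted newest first; pruned entries are set to -1. Returns the
 * number of objects pruned. */
static size_t
prune_surplus(int *age, size_t n, size_t backup)
{
	size_t i, killed = 0;

	for (i = backup; i < n; i++) {
		if (age[i] >= 0) {
			age[i] = -1;
			killed++;
		}
	}
	return (killed);
}
```

With the proposed default of 0, every surplus object would be pruned, which would make "normal" fetches and the hash_always_miss / hash_ignore_busy cases behave alike.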

@bsdphk
Contributor

bsdphk commented Aug 24, 2021

I think a crucial insight is that by definition we only need to prune when we finish fetching a new objcore.

That means it is something the backend worker can do after getting out of the critical path.

Likewise it is theoretically enough to prune a single object.

However, that leaves no margin for VCL or parameter changes, so two should probably be attempted.

The problem is finding an objcore to prune, because they may be stuck in the middle of the objhdr list, surrounded by a LOT of other objcores with incompatible Vary:.

In many cases (but not all: hash_always_miss) we will have walked the entire objhdr list before resorting to a fetch, so maybe we can cache an "if this fetch succeeds, you should probably kill this objcore" reference?

Anyway, it sounds like it might be a good idea to start simulating the dynamic behaviour of this area...

@nigoroll
Member Author

This ticket has been discussed between @bsdphk and myself. Brief summary:

We did not come up with a particularly nice solution; all ideas for cleaning the object list retrospectively seem overly complicated.

We discussed the following idea as a likely "best" (reasonably simple) compromise:

For hash_always_miss, if the "cleanup" mode is activated, a cache lookup would be conducted for the sole purpose of finding a stale_oc, which would then be removed once the new object gets unbusied.

This approach is not a complete solution, as it would only remove at most one "surplus" object per insert.

Noted after the discussion: we might want to consider moving stale ocs found during always_miss / ignore_busy to the end of the object list, so that the chances increase for other old objects to be cleaned up.

side notes:

hash_ignore_busy could probably do conditional requests (not checked whether it already does)

@bsdphk
Contributor

bsdphk commented Nov 14, 2022

I'm not sure I would do an extra lookup at the end; I would probably force hash_always_miss through the initial lookup, so it comes in with a stale_oc like everybody else.

@nigoroll
Member Author

an extra lookup at the end

... is not what I meant. I meant to move a found stale_oc to the end of the oc list such that another stale_oc has a chance to be picked up by the next lookup.

@nigoroll
Member Author

nigoroll commented Apr 22, 2024

This has been discussed during bugwash with no clear result; please see the edit versions if interested.

@dridi
Member

dridi commented Apr 22, 2024

My suggestion from the bugwash discussion.

For a lookup with hash_always_miss raised:

  • always loop through the entire objhead
  • grab the first compatible fresh oc as the expired oc
    • fall back to the first stale oc
    • kill any other compatible oc
  • always disable bereq revalidation (IMS and INM)
  • kill the expired oc upon success
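A minimal sketch of that lookup (illustrative only; the real HSH_Lookup() is far more involved, and the `compatible` flag stands in for a Vary match):

```c
/*
 * Illustrative sketch of the suggested hash_always_miss lookup; not
 * the real HSH_Lookup(), and `compatible` stands in for a Vary match.
 */
#include <assert.h>
#include <stddef.h>

struct oc {
	int alive;
	int stale;
	int compatible;
};

/*
 * Walk the entire objhead: keep one compatible oc (fresh preferred,
 * falling back to the first stale one) as the object to replace, and
 * kill every other compatible oc. The returned "expired" oc is killed
 * once the fetch succeeds; bereq revalidation (IMS/INM) stays off.
 */
static struct oc *
ham_lookup(struct oc *objs, size_t n)
{
	struct oc *expired = NULL;
	size_t i;

	for (i = 0; i < n; i++) {
		if (!objs[i].alive || !objs[i].compatible)
			continue;
		if (expired == NULL) {
			expired = &objs[i];
		} else if (expired->stale && !objs[i].stale) {
			expired->alive = 0;	/* fresher candidate found */
			expired = &objs[i];
		} else {
			objs[i].alive = 0;	/* kill other compatible oc */
		}
	}
	return (expired);
}
```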

@bsdphk
Contributor

bsdphk commented Apr 22, 2024

I have a hard time seeing how there could be more than a single compatible oc, if we implement this, but other than that: Yes.

@dridi
Member

dridi commented Apr 23, 2024

I have a hard time seeing how there could be more than a single compatible oc, if we implement this, but other than that: Yes.

If you start two fetch tasks for the same hash, same variant, you can have two fresh compatible ocs once you cleared OC_F_BUSY on both.

@bsdphk
Contributor

bsdphk commented Apr 29, 2024

I think the pseudo-IMS will go a long way to reduce the problem; I'd prefer not to add expensive machinery to the objhead until we know it is (still) needed.

@nigoroll
Member Author

bugwash: It seems we have come close to the second-to-last edit of #3670 (comment):

  • yes, we should probably do a cache lookup for hash_always_miss
  • the second matching object can be removed right away
  • if that does not exist, the first matching object should be remembered as the stale_oc, with IMS/INM logic disabled

should be revisited after #4073
