Re: Maybe we should reduce SKIP_PAGES_THRESHOLD a bit? - Mailing list pgsql-hackers
| From | Peter Geoghegan |
|---|---|
| Subject | Re: Maybe we should reduce SKIP_PAGES_THRESHOLD a bit? |
| Date | |
| Msg-id | CAH2-WznFeGRpXAxBM2+4SNWfwzd45eFC2EcLJpgd32MNx_JupA@mail.gmail.com |
| In response to | Re: Maybe we should reduce SKIP_PAGES_THRESHOLD a bit? (Melanie Plageman <melanieplageman@gmail.com>) |
| Responses | Re: Maybe we should reduce SKIP_PAGES_THRESHOLD a bit?; Re: Maybe we should reduce SKIP_PAGES_THRESHOLD a bit? |
| List | pgsql-hackers |
On Mon, Dec 16, 2024 at 1:50 PM Melanie Plageman
<melanieplageman@gmail.com> wrote:
> Then in the next vacuum, we end up scanning those all-frozen pages again because the
> ranges of frozen pages are smaller than SKIP_PAGES_THRESHOLD. This is
> mostly going to happen for an insert-only workload. I'm not saying
> freezing the pages is bad, I'm saying that causing these pockets of
> frozen pages leads to scanning all-frozen pages on future vacuums.

I guess that it doesn't necessarily matter which component you define as
being at fault here, but FWIW I think that the problem is primarily
SKIP_PAGES_THRESHOLD itself. After all, SKIP_PAGES_THRESHOLD tries to
encourage certain desirable outcomes (namely readahead and non-aggressive
relfrozenxid advancement) without ever really verifying that that's
working out. If VACUUM doesn't truly get readahead (quite likely), it
still pays quite a high cost in CPU cycles. Similarly, if VACUUM skips
even one all-visible page, we can't expect much of any benefit from
having not skipped other all-visible pages (whether or not we can safely
advance relfrozenxid at all is an all-or-nothing thing).

> That's an interesting idea. And it seems like a much more effective
> way of getting some relfrozenxid advancement than hoping that the
> pages you scan due to SKIP_PAGES_THRESHOLD end up being enough to have
> scanned all unfrozen tuples.

Sometimes this can be truly perverse. It's possible for a SELECT FOR
SHARE to unset the all-frozen bit, without unsetting the all-visible bit.
So all that it takes is one such SELECT FOR SHARE against one random
tuple in an originally-all-frozen heap page inside a large grouping of
all-frozen heap pages. That's almost certainly enough to obstruct
non-aggressive relfrozenxid advancement _for the entire table_.

ISTM that it's just criminal to miss out on non-aggressive relfrozenxid
advancement because of some tiny issue such as that. And so
SKIP_PAGES_THRESHOLD should be replaced by something that specifically
treats relfrozenxid advancement as a goal when deciding whether to read
all-visible pages. Just how well this works out in practice will be very
workload dependent. But there are workloads/tables where it works quite
well:

https://wiki.postgresql.org/wiki/Freezing/skipping_strategies_patch:_motivating_examples#Mixed_inserts_and_deletes

> > I think that aggressive mode is a bad idea more generally. The
> > behavior around waiting for a cleanup lock (the second
> > aggressive-mode-influenced behavior) is also a lot more brittle than
> > it really needs to be, simply because we're not weighing costs and
> > benefits. There's a bunch of relevant information that could be
> > applied when deciding what to do (at the level of each individual heap
> > page that cannot be cleanup locked right away), but we make no effort
> > to apply that information -- we only care about the static choice of
> > aggressive vs. non-aggressive there.
>
> What kind of information? Could you say more?

Waiting or not waiting doesn't have to be an either/or choice. For
example, we could retry a few times, with a backoff, if and only if it
seemed to make sense. Testing that I did a few years ago showed that that
worked rather well (see the sketch further down).

Why should we necessarily have to advance relfrozenxid exactly up to
FreezeLimit during every aggressive VACUUM? Surely the picture over time
and across multiple VACUUM operations is what matters most?
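To make the retry-with-backoff idea concrete, here is a minimal sketch
(purely illustrative, not proposed code: the function name, retry count,
and backoff intervals are all made up, and a real patch would presumably
live in vacuumlazy.c and use WaitLatch() rather than pg_usleep() so that
the sleep stays interruptible):

```c
#include "postgres.h"
#include "miscadmin.h"          /* pg_usleep() */
#include "storage/bufmgr.h"     /* ConditionalLockBufferForCleanup() */

/*
 * Illustrative sketch only: "try quite hard" to get a cleanup lock,
 * somewhere in between giving up immediately and waiting indefinitely
 * in LockBufferForCleanup().  Retry count and backoff are placeholders.
 */
static bool
cleanup_lock_with_backoff(Buffer buf)
{
    for (int attempt = 0; attempt < 3; attempt++)
    {
        if (ConditionalLockBufferForCleanup(buf))
            return true;        /* got the cleanup lock */

        if (attempt < 2)
            pg_usleep(10000L * (attempt + 1));  /* 10ms, then 20ms */
    }

    /* give up: caller processes the page without pruning/freezing it */
    return false;
}
```

The point is just that there is a whole continuum between
ConditionalLockBufferForCleanup() and LockBufferForCleanup(), and right
now VACUUM only ever picks one of the two extremes, based on the static
aggressive vs. non-aggressive choice.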
At the very least, we should have an independent XID cutoff for "must
advance relfrozenxid up to here, no matter what" -- we shouldn't just
reuse FreezeLimit to control that behavior. We might very well "try quite
hard to advance relfrozenxid to a value >= FreezeLimit" -- we just don't
have to do it no matter what the cost is. There is a huge practical
difference between "try quite hard" (e.g., retry the cleanup lock
acquisition 3 times, with a sleep between each attempt) and "try
infinitely hard" (i.e., wait for a cleanup lock indefinitely).

VACUUM is the only thing that freezes. Right now we'll sometimes wait for
a cleanup lock forever, because "we have to freeze". But waiting like
that necessarily implies that we cannot freeze any other page in the
table in the meantime. That might actually be the least-worst thing that
VACUUM can do, but only in the extreme case where we've really put it off
for as long as we possibly could (by not waiting forever in prior VACUUM
operations).

There is actually a lot of potential benefit in "kicking the can down the
road", if we go about it intelligently. Just waiting forever is a pretty
horrible strategy, and so it should only be the strategy of last resort
(we might reasonably be able to dump a log report about it in a world
where it's truly the strategy of last resort).

> Andres mentioned the other day that we could set pages all-visible in
> the VM even if we don't get the cleanup lock (lazy_scan_noprune())
> case. That seems like a good idea.

Maybe it's a good idea, but right now it poses a similar risk to my
scenario involving a random, isolated SELECT FOR SHARE that happens to
affect some arbitrary tuple on a cold/frozen page. Of course, this
wouldn't be all that hard to fix.
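To spell the risk out: the all-visible and all-frozen VM bits are
independent, and only all-frozen pages can be skipped without giving up
on relfrozenxid advancement. A rough illustration (the helper function is
hypothetical, for explanation only, though visibilitymap_get_status() and
the flag bits are the real interface; the caller is assumed to manage the
vmbuffer pin):

```c
#include "postgres.h"
#include "access/visibilitymap.h"   /* visibilitymap_get_status(), VM flags */

/*
 * Hypothetical helper, for illustration only.  A SELECT FOR SHARE sets a
 * locker xmax on one tuple, which clears VISIBILITYMAP_ALL_FROZEN for its
 * heap page while leaving VISIBILITYMAP_ALL_VISIBLE set.
 */
static bool
skippable_without_losing_relfrozenxid(Relation rel, BlockNumber blkno,
                                      Buffer *vmbuffer)
{
    uint8       vmstatus = visibilitymap_get_status(rel, blkno, vmbuffer);

    /*
     * An all-frozen page contains no XIDs that can hold back relfrozenxid,
     * so skipping it costs nothing on that front.  A page that is merely
     * all-visible can still be skipped to save I/O, but doing so means
     * that this VACUUM cannot advance relfrozenxid at all.
     */
    return (vmstatus & VISIBILITYMAP_ALL_FROZEN) != 0;
}
```

Setting only the all-visible bit from the lazy_scan_noprune() path would
create exactly the kind of page that this returns false for, which is why
it carries the same relfrozenxid hazard as the SELECT FOR SHARE scenario.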
--
Peter Geoghegan