Re: Maybe we should reduce SKIP_PAGES_THRESHOLD a bit? - Mailing list pgsql-hackers

From Peter Geoghegan
Subject Re: Maybe we should reduce SKIP_PAGES_THRESHOLD a bit?
Date
Msg-id CAH2-WznFeGRpXAxBM2+4SNWfwzd45eFC2EcLJpgd32MNx_JupA@mail.gmail.com
Whole thread Raw
In response to Re: Maybe we should reduce SKIP_PAGES_THRESHOLD a bit?  (Melanie Plageman <melanieplageman@gmail.com>)
Responses Re: Maybe we should reduce SKIP_PAGES_THRESHOLD a bit?
Re: Maybe we should reduce SKIP_PAGES_THRESHOLD a bit?
List pgsql-hackers
On Mon, Dec 16, 2024 at 1:50 PM Melanie Plageman
<melanieplageman@gmail.com> wrote:
> Then in the next vacuum, we end up scanning those all-frozen pages again because the
> ranges of frozen pages are smaller than SKIP_PAGES_THRESHOLD. This is
> mostly going to happen for an insert-only workload. I'm not saying
> freezing the pages is bad, I'm saying that causing these pockets of
> frozen pages leads to scanning all-frozen pages on future vacuums.

I guess that it doesn't necessarily matter what component you define
as being at fault here, but FWIW I think that the problem is primarily
SKIP_PAGES_THRESHOLD itself. After all, SKIP_PAGES_THRESHOLD tries to
encourage certain desirable outcomes (namely readahead and
non-aggressive relfrozenxid advancement) without ever really verifying
that that's working out. If VACUUM doesn't truly get readahead (quite
likely), it stills pay quite a high cost in CPU cycles. Similarly, if
VACUUM skips even one all-visible page, we can't expect much of any
benefit for having not skipped any other all-visible pages (whether or
not we can safely advance relfrozenxid at all is an all or nothing
thing).

> That's an interesting idea. And it seems like a much more effective
> way of getting some relfrozenxid advancement than hoping that the
> pages you scan due to SKIP_PAGES_THRESHOLD end up being enough to have
> scanned all unfrozen tuples.

Sometimes this can be truly perverse. It's possible for a SELECT FOR
SHARE to unset the all-frozen bit, without unsetting the all-visible
bit. So all that it takes is one such SELECT FOR SHARE against one
random tuple in an originally-all-frozen heap page inside a large
grouping of all-frozen heap pages. That's almost certainly enough to
obstruct non-aggressive relfrozenxid advancement _for the entire
table_. ISTM that it's just criminal to miss out on non-aggressive
relfrozenxid advancement because of some tiny issue such as that. And
so SKIP_PAGES_THRESHOLD should be replaced by something that
specifically has relfrozenxid as a goal of reading all-visible pages.

Just how well this works out in practice will be very workload
dependent. But there are workloads/tables where it works quite well:

https://wiki.postgresql.org/wiki/Freezing/skipping_strategies_patch:_motivating_examples#Mixed_inserts_and_deletes

> > I think that aggressive mode is a bad idea more generally. The
> > behavior around waiting for a cleanup lock (the second
> > aggressive-mode-influenced behavior) is also a lot more brittle than
> > it really needs to be, simply because we're not weighing costs and
> > benefits. There's a bunch of relevant information that could be
> > applied when deciding what to do (at the level of each individual heap
> > page that cannot be cleanup locked right away), but we make no effort
> > to apply that information -- we only care about the static choice of
> > aggressive vs. non-aggressive there.
>
> What kind of information? Could you say more?

Waiting or not waiting doesn't have to be an either/or choice. For
example, we could retry a few times, with a backoff, if and only if it
seemed to make sense. Testing that I did a few years ago showed that
that worked rather well.

Why should we necessarily have to advance relfrozenxid exactly up to
FreezeLimit during every aggressive VACUUM? Surely the picture over
time and across multiple VACUUM operations is what matters most? At
the very least, we should have an independent XID cutoff for "must
advance relfrozenxid up to here, no matter what" -- we should just
reuse FreezeLimit to control that behavior. We might very well "try
quite hard to advance relfrozenxid to a value >= FreezeLimit" -- we
just don't have to do it no matter what the cost is. There is a huge
practical difference between "try quite hard" (e.g., retry the cleanup
lock acquisition 3 times, with a sleep between each) and "try
infinitely hard" (i.e., wait for a cleanup lock indefinitely).

VACUUM is the only thing that freezes. Right now we'll sometimes wait
for a cleanup lock forever, because "we have to freeze". But waiting
like that necessarily implies that we cannot freeze any other page in
the table. That might actually be the least worst thing that VACUUM
can do, but only in the extreme case where we've really put it off for
as long as we possibly could (by not waiting forever in prior VACUUM
operations). There is actually a lot of potential benefit from
"kicking the can down the road", if we go about it intelligently. Just
waiting forever is a pretty horrible strategy, and so it should only
be the strategy of last resort (we might reasonably be able to dump a
log report about it in a world where it's truly the strategy of last
resort).

> Andres mentioned the other day that we could set pages all-visible in
> the VM even if we don't get the cleanup lock (lazy_scan_noprune())
> case. That seems like a good idea.

Maybe it's a good idea, but right now it poses a similar risk to my
scenario involving a random, isolated SELECT FOR SHARE that happens to
affect some random tuple on a cold/frozen page. Of course, this
wouldn't be all that hard to fix.

--
Peter Geoghegan



pgsql-hackers by date:

Previous
From: Andreas Karlsson
Date:
Subject: Re: IANA timezone abbreviations versus timezone_abbreviations
Next
From: Noah Misch
Date:
Subject: Re: Windows UTF8 system locale