Re: Maybe we should reduce SKIP_PAGES_THRESHOLD a bit? - Mailing list pgsql-hackers

From Peter Geoghegan
Subject Re: Maybe we should reduce SKIP_PAGES_THRESHOLD a bit?
Msg-id CAH2-Wzky0C7v1i2TGkPQEaVUrZNhwdZn9nnOtNnzNceZkLjY3Q@mail.gmail.com
In response to Re: Maybe we should reduce SKIP_PAGES_THRESHOLD a bit?  (Melanie Plageman <melanieplageman@gmail.com>)
List pgsql-hackers
On Mon, Dec 16, 2024 at 10:37 AM Melanie Plageman
<melanieplageman@gmail.com> wrote:
> On a related note, the other day I noticed another negative effect
> caused in part by SKIP_PAGES_THRESHOLD. SKIP_PAGES_THRESHOLD interacts
> with the opportunistic freeze heuristic [1] causing lots of all-frozen
> pages to be scanned when checksums are enabled. You can easily end up
> with a table that has very fragmented ranges of frozen, all-visible,
> and modified pages. In this case, the opportunistic freeze heuristic
> bears most of the blame.

Bears most of the blame for what? Significantly reducing the total
amount of WAL written?

> However, we are not close to coming up with a
> replacement heuristic, so removing SKIP_PAGES_THRESHOLD would help.
> This wouldn't have affected your results, but it is worth considering
> more generally.

One of the reasons why we have SKIP_PAGES_THRESHOLD is that it makes
it more likely that non-aggressive VACUUMs will advance relfrozenxid.
Granted, it's probably not doing a particularly good job at that right
now. But any effort to replace it should account for that.
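To make the SKIP_PAGES_THRESHOLD interaction concrete, here is a simplified simulation of how a VACUUM might choose pages from the visibility map. It assumes the threshold value of 32 from vacuumlazy.c and models only the rule that a run of skippable pages is skipped when it is at least SKIP_PAGES_THRESHOLD long. It ignores several details of the real code (for instance, the last heap page is never skipped). The names `VMState`, `skippable`, and `plan_scan` are hypothetical. The sketch shows both effects discussed above: short runs of all-frozen pages get scanned anyway (Melanie's point), and short runs of all-visible pages get scanned too, which keeps relfrozenxid advancement possible in a non-aggressive VACUUM.

```python
from enum import Enum

# Assumed to match the value defined in vacuumlazy.c.
SKIP_PAGES_THRESHOLD = 32


class VMState(Enum):
    NOT_ALL_VISIBLE = 0  # page must be scanned
    ALL_VISIBLE = 1      # all-visible but not all-frozen
    ALL_FROZEN = 2       # all-visible and all-frozen


def skippable(state: VMState, aggressive: bool) -> bool:
    # Aggressive VACUUM may only skip all-frozen pages; a non-aggressive
    # VACUUM may skip any all-visible page.
    if aggressive:
        return state is VMState.ALL_FROZEN
    return state is not VMState.NOT_ALL_VISIBLE


def plan_scan(vm: list[VMState], aggressive: bool) -> tuple[list[int], bool]:
    """Simplified model: return (blocks scanned, can relfrozenxid advance)."""
    scanned: list[int] = []
    skipped_all_visible = False
    blkno, nblocks = 0, len(vm)
    while blkno < nblocks:
        if not skippable(vm[blkno], aggressive):
            scanned.append(blkno)
            blkno += 1
            continue
        # Find the end of this run of skippable pages.
        end = blkno
        while end < nblocks and skippable(vm[end], aggressive):
            end += 1
        if end - blkno >= SKIP_PAGES_THRESHOLD:
            # Long run: skip it. Skipping an all-visible (not all-frozen)
            # page leaves unfrozen XIDs unseen, blocking relfrozenxid.
            if any(vm[b] is VMState.ALL_VISIBLE for b in range(blkno, end)):
                skipped_all_visible = True
        else:
            # Short run: scanned anyway, even if every page is all-frozen.
            scanned.extend(range(blkno, end))
        blkno = end
    return scanned, not skipped_all_visible
```

For example, a table made of five runs of 10 all-frozen pages, each followed by one modified page, has every one of its 55 pages scanned by a non-aggressive VACUUM, while a single run of 40 all-visible pages is skipped at the cost of not advancing relfrozenxid.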

This is possible by making VACUUM consider the cost of scanning extra
heap pages up-front. If the number of "extra heap pages to be scanned"
to advance relfrozenxid happens to not be very high (or not so high
*relative to the current age(relfrozenxid)*), then pay that cost now,
in the current VACUUM operation. Even if age(relfrozenxid) is pretty
far from the threshold for aggressive mode, if the added cost of
advancing relfrozenxid is still not too high, why wouldn't we just do
it?
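One way to picture the up-front cost check described above is a budget for extra pages that grows as age(relfrozenxid) approaches the aggressive-mode threshold. The following is a hypothetical sketch, not anything in PostgreSQL: `should_pay_now`, `min_budget`, and the linear scaling are invented for illustration. It assumes `freeze_table_age` plays the role of vacuum_freeze_table_age (the age at which VACUUM becomes aggressive).

```python
def should_pay_now(extra_pages: int, rel_pages: int,
                   relfrozenxid_age: int, freeze_table_age: int,
                   min_budget: float = 0.05) -> bool:
    """Hypothetical rule: scan the extra all-visible pages now (so this
    non-aggressive VACUUM can advance relfrozenxid) if the extra work is
    small relative to how old relfrozenxid already is."""
    if extra_pages <= 0 or rel_pages <= 0:
        return True  # nothing extra to pay
    extra_fraction = extra_pages / rel_pages
    # 0.0 right after relfrozenxid advanced, 1.0 at the aggressive threshold.
    if freeze_table_age <= 0:
        urgency = 1.0
    else:
        urgency = min(relfrozenxid_age / freeze_table_age, 1.0)
    # Tolerated extra work grows linearly from min_budget to the whole table.
    budget = min_budget + (1.0 - min_budget) * urgency
    return extra_fraction <= budget
```

With a 10,000-page table and relfrozenxid aged 10 million XIDs against a 150 million threshold, scanning 100 extra pages passes this check while scanning 5,000 does not; at the threshold itself, any amount passes, which is roughly what aggressive mode does today. A linear budget is just the simplest choice; any monotone function of the age would express the same idea.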

I think that aggressive mode is a bad idea more generally. The
behavior around waiting for a cleanup lock (the second
aggressive-mode-influenced behavior) is also a lot more brittle than
it really needs to be, simply because we're not weighing costs and
benefits. There's a bunch of relevant information that could be
applied when deciding what to do (at the level of each individual heap
page that cannot be cleanup locked right away), but we make no effort
to apply that information -- we only care about the static choice of
aggressive vs. non-aggressive there.
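The contrast between a static rule and a per-page cost/benefit rule for cleanup-lock waits can be sketched as follows. Everything here is hypothetical: `static_should_wait` reduces the email's characterization to its bare form (the real code has more conditions than the aggressive flag alone), and `weighted_should_wait`, `LockedOutPage`, and `max_wait_ms` are invented names. The sketch assumes VACUUM can learn the age of the page's oldest unfrozen XID without holding the cleanup lock.

```python
from dataclasses import dataclass


@dataclass
class LockedOutPage:
    oldest_xid_age: int  # age of the oldest unfrozen XID on the page (assumed known)
    waited_ms: float     # time already spent waiting for the cleanup lock


def static_should_wait(aggressive: bool) -> bool:
    # Bare form of the rule as the email describes it: only the mode matters.
    return aggressive


def weighted_should_wait(page: LockedOutPage, freeze_table_age: int,
                         max_wait_ms: float = 1000.0) -> bool:
    """Hypothetical per-page rule: keep waiting while the benefit of
    processing this page now exceeds the share of the wait budget spent."""
    benefit = min(page.oldest_xid_age / freeze_table_age, 1.0)
    if benefit >= 1.0:
        # The page's XIDs are past the aggressive threshold: always wait.
        return True
    cost = page.waited_ms / max_wait_ms
    return benefit > cost
```

Under this rule, a page whose oldest XID is 140 million XIDs old justifies a longer wait than one whose oldest XID is 1 million XIDs old, regardless of whether the VACUUM happens to be aggressive. Note that a page with no wait time yet always gets at least a brief attempt, since its cost starts at zero.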

--
Peter Geoghegan