Re: Emit fewer vacuum records by reaping removable tuples during pruning - Mailing list pgsql-hackers

From Peter Geoghegan
Subject Re: Emit fewer vacuum records by reaping removable tuples during pruning
Date
Msg-id CAH2-Wzmd0OyGpqOer8MUTBzAb60kKuosoFsg7AsWuYcocczT9Q@mail.gmail.com
Whole thread Raw
In response to Re: Emit fewer vacuum records by reaping removable tuples during pruning  (Robert Haas <robertmhaas@gmail.com>)
Responses Re: Emit fewer vacuum records by reaping removable tuples during pruning
List pgsql-hackers
On Thu, Jan 18, 2024 at 8:52 AM Robert Haas <robertmhaas@gmail.com> wrote:
> But I also said one more thing that I'd still like to hear your
> thoughts about, which is: why is it right to update the FSM after the
> second heap pass rather than the first one? I can't help but suspect
> this is an algorithmic holdover from pre-HOT days, when VACUUM's first
> heap pass was read-only and all the work happened in the second pass.
> Now, nearly all of the free space that will ever become free becomes
> free in the first pass, so why not advertise it then, instead of
> waiting?

I don't think that doing everything FSM-related in the first heap pass
is a bad idea -- especially not if it buys you something elsewhere.

The problem with your justification for moving things in that
direction (if any) is that it is occasionally not quite true: there
are at least some cases where line pointer truncation after making a
page's LP_DEAD items -> LP_UNUSED will actually matter. Plus
PageGetHeapFreeSpace() will return 0 if and when
"PageGetMaxOffsetNumber(page) > MaxHeapTuplesPerPage &&
!PageHasFreeLinePointers(page)". Of course, nothing stops you from
compensating for this by anticipating what will happen later on, and
assuming that the page already has that much free space.

It might even be okay to just not try to compensate for anything,
PageGetHeapFreeSpace-wise -- just do all FSM stuff in the first heap
pass, and ignore all this. I happen to believe that a FSM_CATEGORIES
of 256 is way too much granularity to be useful in practice -- I just
don't have any faith in the idea that that kind of granularity is
useful (it's quite the opposite).

A further justification might be what we already do in the heapam.c
REDO routines: the way that we use XLogRecordPageWithFreeSpace already
operates with far less precision that corresponding code from
vacuumlazy.c. heap_xlog_prune() already has recovery do what you
propose to do during original execution; it doesn't try to avoid
duplicating an anticipated call to XLogRecordPageWithFreeSpace that'll
take place when heap_xlog_vacuum() runs against the same page a bit
later on.

You'd likely prefer a simpler argument for doing this -- an argument
that doesn't require abandoning/discrediting the idea that a high
degree of FSM_CATEGORIES-wise precision is a valuable thing. Not sure
that that's possible -- the current design is at least correct on its
own terms. And what you propose to do will probably be less correct on
those same terms, silly though they are.

--
Peter Geoghegan



pgsql-hackers by date:

Previous
From: Peter Eisentraut
Date:
Subject: Re: Sequence Access Methods, round two
Next
From: Robert Haas
Date:
Subject: Re: Emit fewer vacuum records by reaping removable tuples during pruning