Re: Emit fewer vacuum records by reaping removable tuples during pruning - Mailing list pgsql-hackers

From Melanie Plageman
Subject Re: Emit fewer vacuum records by reaping removable tuples during pruning
Date
Msg-id CAAKRu_ZDdWMLw9_DfQBntOWkL8RbSTuPkBrw-8bkpdYGv=x+DA@mail.gmail.com
Whole thread Raw
In response to Re: Emit fewer vacuum records by reaping removable tuples during pruning  (Robert Haas <robertmhaas@gmail.com>)
Responses Re: Emit fewer vacuum records by reaping removable tuples during pruning
List pgsql-hackers
On Thu, Jan 4, 2024 at 12:31 PM Robert Haas <robertmhaas@gmail.com> wrote:
>
> On Wed, Dec 27, 2023 at 11:27 AM Melanie Plageman
> <melanieplageman@gmail.com> wrote:
> > Do you have specific concerns about its correctness? I understand it
> > is an area where we have to be sure we are correct. But, to be fair,
> > that is true of all the pruning and vacuuming code.
>
> I'm kind of concerned that 0002 might be a performance regression. It
> pushes more branches down into the heap-pruning code, which I think
> can sometimes be quite hot, for the sake of a case that rarely occurs
> in practice. I take your point about it improving things when there
> are no indexes, but what about when there are? And even if there are
> no adverse performance consequences, is it really worth complicating
> the logic at such a low level?

Regarding the additional code complexity, I think the extra call to
lazy_vacuum_heap_page() in lazy_scan_heap() actually represents a fair
amount of code complexity. It is a special case of page-level
processing that should be handled by heap_page_prune() and not
lazy_scan_heap().

lazy_scan_heap() is responsible for three main things -- loop through
the blocks in a relation and process each page (pruning, freezing,
etc), invoke index vacuuming, invoke functions to loop through
dead_items and vacuum pages. The logic to do the per-page processing
is spread across three places, though.

When a single page is being processed, page pruning happens in
heap_page_prune(). Freezing, dead items recording, and visibility
checks happen in lazy_scan_prune(). Visibility map updates and
freespace map updates happen back in lazy_scan_heap(). Except, if the
table has no indexes, in which case, lazy_scan_heap() also invokes
lazy_vacuum_heap_page() to set dead line pointers unused and do
another separate visibility check and VM update. I maintain that all
page-level processing should be done in the page-level processing
functions (like lazy_scan_prune()). And lazy_scan_heap() shouldn't be
directly responsible for special case page-level processing.

> Also, I find "pronto_reap" to be a poor choice of name. "pronto" is an
> informal word that seems to have no advantage over something like
> "immediate" or "now," and I don't think "reap" has a precise,
> universally-understood meaning. You could call this "mark_unused_now"
> or "immediately_mark_unused" or something and it would be far more
> self-documenting, IMHO.

Yes, I see how pronto is unnecessarily informal. If there are no cases
other than when the table has no indexes that we would consider
immediately marking LPs unused, then perhaps it is better to call it
"no_indexes" (per andres' suggestion)?

- Melanie



pgsql-hackers by date:

Previous
From: Andres Freund
Date:
Subject: Re: the s_lock_stuck on perform_spin_delay
Next
From: Andres Freund
Date:
Subject: Re: the s_lock_stuck on perform_spin_delay