Re: Eliminate redundant tuple visibility check in vacuum - Mailing list pgsql-hackers

From Robert Haas
Subject Re: Eliminate redundant tuple visibility check in vacuum
Date
Msg-id CA+Tgmobh6pP=sUrgPMdV6Gu5CDBtv6DAgGo-dyj-ZaA_O-XcJA@mail.gmail.com
Whole thread Raw
In response to Eliminate redundant tuple visibility check in vacuum  (Melanie Plageman <melanieplageman@gmail.com>)
Responses Re: Eliminate redundant tuple visibility check in vacuum
List pgsql-hackers
On Mon, Aug 28, 2023 at 7:49 PM Melanie Plageman
<melanieplageman@gmail.com> wrote:
> While working on a set of patches to combine the freeze and visibility
> map WAL records into the prune record, I wrote the attached patches
> reusing the tuple visibility information collected in heap_page_prune()
> back in lazy_scan_prune().
>
> heap_page_prune() collects the HTSV_Result for every tuple on a page
> and saves it in an array used by heap_prune_chain(). If we make that
> array available to lazy_scan_prune(), it can use it when collecting
> stats for vacuum and determining whether or not to freeze tuples.
> This avoids calling HeapTupleSatisfiesVacuum() again on every tuple in
> the page.
>
> It also gets rid of the retry loop in lazy_scan_prune().

In general, I like these patches. I think it's a clever approach, and
I don't really see any down side. It should just be straight-up better
than what we have now, and if it's not better, it still shouldn't be
any worse.

I have a few suggestions:

- Rather than removing the rather large comment block at the top of
lazy_scan_prune() I'd like to see it rewritten to explain how we now
deal with the problem. I'd suggest leaving the first paragraph ("Prior
to...") just as it is and replace all the words following "The
approach we take now is" with a description of the approach that this
patch takes to the problem.

- I'm not a huge fan of the caller of heap_page_prune() having to know
how to initialize the PruneResult. Can heap_page_prune() itself do
that work, so that presult is an out parameter rather than an in-out
parameter? Or, if not, can it be moved to a new helper function, like
heap_page_prune_init(), rather than having that logic in 2+ places?

- int ndeleted,
- nnewlpdead;
-
- ndeleted = heap_page_prune(relation, buffer, vistest, limited_xmin,
-    limited_ts, &nnewlpdead, NULL);
+ int ndeleted = heap_page_prune(relation, buffer, vistest, limited_xmin,
+    limited_ts, &presult, NULL);

- I don't particularly like merging the declaration with the
assignment unless the call is narrow enough that we don't need a line
break in there, which is not the case here.

- I haven't thoroughly investigated yet, but I wonder if there are any
other places where comments need updating. As a general note, I find
it desirable for a function's header comment to mention whether any
pointer parameters are in parameters, out parameters, or in-out
parameters, and what the contract between caller and callee actually
is.

--
Robert Haas
EDB: http://www.enterprisedb.com



pgsql-hackers by date:

Previous
From: Денис Смирнов
Date:
Subject: Re: Use virtual tuple slot for Unique node
Next
From: Muhammad Malik
Date:
Subject: Re: Improve heapgetpage() performance, overhead from serializable