Re: Combine Prune and Freeze records emitted by vacuum - Mailing list pgsql-hackers

From Melanie Plageman
Subject Re: Combine Prune and Freeze records emitted by vacuum
Date
Msg-id CAAKRu_abm2tHhrc0QSQa==sHe=VA1=oz1dJMQYUOKuHmu+9Xrg@mail.gmail.com
In response to Re: Combine Prune and Freeze records emitted by vacuum  (Robert Haas <robertmhaas@gmail.com>)
List pgsql-hackers
On Sat, Mar 30, 2024 at 8:00 AM Robert Haas <robertmhaas@gmail.com> wrote:
>
> On Sat, Mar 30, 2024 at 1:57 AM Melanie Plageman
> <melanieplageman@gmail.com> wrote:
> > I think that we are actually successfully removing more RECENTLY_DEAD
> > HOT tuples than in master with heap_page_prune()'s new approach, and I
> > think it is correct; but let me know if I am missing something.
>
> /me blinks.
>
> Isn't zero the only correct number of RECENTLY_DEAD tuples to remove?

The comment at the top of heap_prune_chain() in master says:

 * If the item is an index-referenced tuple (i.e. not a heap-only tuple),
 * the HOT chain is pruned by removing all DEAD tuples at the start of the HOT
 * chain.  We also prune any RECENTLY_DEAD tuples preceding a DEAD tuple.
 * This is OK because a RECENTLY_DEAD tuple preceding a DEAD tuple is really
 * DEAD, our visibility test is just too coarse to detect it.
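
To make that rule concrete, here is a toy sketch of it in C. It is not PostgreSQL code: the ToyStatus enum and toy_removable_prefix() are hypothetical names invented for illustration, and the model ignores xmin/xmax matching, redirects, and in-progress states. It only shows that RECENTLY_DEAD members are removed exactly when a DEAD member follows them in the chain.

```c
#include <assert.h>
#include <stddef.h>

/* Toy visibility states; the names echo HeapTupleSatisfiesVacuum results
 * but this enum is an illustration, not the PostgreSQL type. */
typedef enum { TOY_LIVE, TOY_RECENTLY_DEAD, TOY_DEAD } ToyStatus;

/*
 * Given the statuses of a chain's members in chain order (root first),
 * return how many leading members can be removed: everything up to and
 * including the last DEAD member seen before a LIVE member ends the walk.
 */
size_t toy_removable_prefix(const ToyStatus *chain, size_t len)
{
    size_t removable = 0;

    assert(chain != NULL || len == 0);

    for (size_t i = 0; i < len; i++)
    {
        if (chain[i] == TOY_DEAD)
            removable = i + 1;  /* all earlier members are really dead too */
        else if (chain[i] == TOY_LIVE)
            break;              /* nothing at or past a live tuple goes away */
        /* TOY_RECENTLY_DEAD: keep walking, a later DEAD may cover it */
    }
    return removable;
}
```

In this model the chain RECENTLY_DEAD -> DEAD yields 2, while RECENTLY_DEAD -> LIVE and RECENTLY_DEAD -> RECENTLY_DEAD both yield 0, so a RECENTLY_DEAD tuple is never removed on its own.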

Heikki had added a comment in one of his patches to the fast path for
HOT tuples at the top of heap_prune_chain():

             * Note that we might first arrive at a dead heap-only tuple
             * either while following a chain or here (in the fast path).
             * Whichever path gets there first will mark the tuple unused.
             *
             * Whether we arrive at the dead HOT tuple first here or while
             * following a chain above affects whether preceding RECENTLY_DEAD
             * tuples in the chain can be removed or not.  Imagine that you
             * have a chain with two tuples: RECENTLY_DEAD -> DEAD.  If we
             * reach the RECENTLY_DEAD tuple first, the chain-following logic
             * will find the DEAD tuple and conclude that both tuples are in
             * fact dead and can be removed.  But if we reach the DEAD tuple
             * at the end of the chain first, when we reach the RECENTLY_DEAD
             * tuple later, we will not follow the chain because the DEAD
             * TUPLE is already 'marked', and will not remove the
             * RECENTLY_DEAD tuple.  This is not a correctness issue, and the
             * RECENTLY_DEAD tuple will be removed by a later VACUUM.

My patch splits the tuples into HOT and non-HOT while gathering their
visibility information. It first calls heap_prune_chain() on the
non-HOT tuples and then processes the HOT tuples that are still
unmarked in a separate loop afterward. This follows every chain to
completion and also processes any HOT tuples that are not reachable
from a valid chain. The fast path contains a special check to ensure
that line pointers for DEAD HOT tuples that were not themselves
HOT-updated (dead orphaned tuples from aborted HOT updates) are still
marked LP_UNUSED even though they are not reachable from a valid HOT
chain. Because that check now runs only after the chains have been
followed, it can no longer prevent us from following a chain to its
end.
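
The difference in ordering can be sketched with a toy model in C. This is only an illustration under simplifying assumptions, not the patch itself: ToyItem, toy_prune_chain(), toy_fast_path(), toy_prune_single_pass() and toy_prune_two_pass() are hypothetical names, array indexes stand in for offset numbers, and chains are assumed to be at most 16 members long.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

typedef enum { TOY_LIVE, TOY_RECENTLY_DEAD, TOY_DEAD } ToyStatus;

typedef struct
{
    ToyStatus status;
    bool      heap_only;  /* true for a HOT (heap-only) tuple */
    int       next;       /* index of the next chain member, or -1 */
    bool      removed;    /* set once pruning decides to remove the item */
} ToyItem;

/* Walk the chain from root; remove everything up to the last DEAD member. */
static void toy_prune_chain(ToyItem *items, int root)
{
    int chain[16];
    int nchain = 0;
    int ndead = 0;        /* number of leading chain members to remove */

    for (int cur = root; cur >= 0; cur = items[cur].next)
    {
        if (items[cur].removed || items[cur].status == TOY_LIVE)
            break;        /* already marked, or the chain is live from here */
        assert(nchain < 16);
        chain[nchain++] = cur;
        if (items[cur].status == TOY_DEAD)
            ndead = nchain;
    }
    for (int i = 0; i < ndead; i++)
        items[chain[i]].removed = true;
}

/* Fast path: a DEAD heap-only tuple that was never HOT-updated goes directly. */
static void toy_fast_path(ToyItem *items, int i)
{
    if (items[i].status == TOY_DEAD && items[i].next < 0)
        items[i].removed = true;
}

static size_t toy_count_removed(const ToyItem *items, int n)
{
    size_t count = 0;

    for (int i = 0; i < n; i++)
        if (items[i].removed)
            count++;
    return count;
}

/* One loop in offset order: the fast path can run before the chain walk. */
size_t toy_prune_single_pass(ToyItem *items, int n)
{
    for (int i = 0; i < n; i++)
    {
        if (items[i].removed)
            continue;
        if (items[i].heap_only)
            toy_fast_path(items, i);
        else
            toy_prune_chain(items, i);
    }
    return toy_count_removed(items, n);
}

/* Two loops: follow all chains from non-HOT items first, then leftovers. */
size_t toy_prune_two_pass(ToyItem *items, int n)
{
    for (int i = 0; i < n; i++)
        if (!items[i].heap_only && !items[i].removed)
            toy_prune_chain(items, i);
    for (int i = 0; i < n; i++)
        if (items[i].heap_only && !items[i].removed)
            toy_fast_path(items, i);
    return toy_count_removed(items, n);
}
```

Take a two-item page where item 0 is a DEAD heap-only tuple and item 1 is the RECENTLY_DEAD root pointing at it. In this model toy_prune_single_pass() removes only item 0, because the fast path marks it before the chain walk from item 1 reaches it, while toy_prune_two_pass() removes both. A DEAD heap-only tuple that no chain points to is still removed by the second loop.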

- Melanie


