Re: Emit fewer vacuum records by reaping removable tuples during pruning - Mailing list pgsql-hackers

From Melanie Plageman
Subject Re: Emit fewer vacuum records by reaping removable tuples during pruning
Date
Msg-id CAAKRu_YQRSBeLCb2db_r4uaAUE-zuqtajFJ2WFi7kbQ0oJyj0g@mail.gmail.com
Whole thread Raw
In response to Re: Emit fewer vacuum records by reaping removable tuples during pruning  (Andres Freund <andres@anarazel.de>)
Responses Re: Emit fewer vacuum records by reaping removable tuples during pruning
List pgsql-hackers
Thanks for the review!

On Thu, Jan 4, 2024 at 3:03 PM Andres Freund <andres@anarazel.de> wrote:
>
> On 2023-11-17 18:12:08 -0500, Melanie Plageman wrote:
> >               Assert(ItemIdIsNormal(lp));
> >               htup = (HeapTupleHeader) PageGetItem(dp, lp);
> > @@ -715,7 +733,17 @@ heap_prune_chain(Buffer buffer, OffsetNumber rootoffnum,
> >                * redirect the root to the correct chain member.
> >                */
> >               if (i >= nchain)
> > -                     heap_prune_record_dead(prstate, rootoffnum);
> > +             {
> > +                     /*
> > +                      * If the relation has no indexes, we can remove dead tuples
> > +                      * during pruning instead of marking their line pointers dead. Set
> > +                      * this tuple's line pointer LP_UNUSED.
> > +                      */
> > +                     if (prstate->pronto_reap)
> > +                             heap_prune_record_unused(prstate, rootoffnum);
> > +                     else
> > +                             heap_prune_record_dead(prstate, rootoffnum);
> > +             }
> >               else
> >                       heap_prune_record_redirect(prstate, rootoffnum, chainitems[i]);
> >       }
> > @@ -726,9 +754,12 @@ heap_prune_chain(Buffer buffer, OffsetNumber rootoffnum,
> >                * item.  This can happen if the loop in heap_page_prune caused us to
> >                * visit the dead successor of a redirect item before visiting the
> >                * redirect item.  We can clean up by setting the redirect item to
> > -              * DEAD state.
> > +              * DEAD state. If pronto_reap is true, we can set it LP_UNUSED now.
> >                */
> > -             heap_prune_record_dead(prstate, rootoffnum);
> > +             if (prstate->pronto_reap)
> > +                     heap_prune_record_unused(prstate, rootoffnum);
> > +             else
> > +                     heap_prune_record_dead(prstate, rootoffnum);
> >       }
> >
> >       return ndeleted;
>
> There's three new calls to heap_prune_record_unused() and the logic got more
> nested. Is there a way to get to a nicer end result?

So, I could do another loop through the line pointers in
heap_page_prune() (after the loop calling heap_prune_chain()) and, if
pronto_reap is true, set dead line pointers LP_UNUSED. Then, when
constructing the WAL record, I would just not add the prstate.nowdead
that I saved from heap_prune_chain() to the prune WAL record.

This would eliminate the extra if statements from heap_prune_chain().
It will be more performant than sticking with the original (master)
call to lazy_vacuum_heap_page(). However, I'm not convinced that the
extra loop to set line pointers LP_DEAD -> LP_UNUSED is less confusing
than keeping the if pronto_reap test in heap_prune_chain().
heap_prune_chain() is where line pointers' new values are decided. It
seems weird to pick one new value for a line pointer in
heap_prune_chain() and then pick another, different new value in a
loop after heap_prune_chain(). I don't see any way to eliminate the if
pronto_reap tests without a separate loop setting LP_DEAD->LP_UNUSED,
though.

> > From 608658f2cbc0acde55aac815c0fdb523ec24c452 Mon Sep 17 00:00:00 2001
> > From: Melanie Plageman <melanieplageman@gmail.com>
> > Date: Mon, 13 Nov 2023 16:47:08 -0500
> > Subject: [PATCH v2 1/2] Indicate rel truncation unsafe in lazy_scan[no]prune
> >
> > Both lazy_scan_prune() and lazy_scan_noprune() must determine whether or
> > not there are tuples on the page making rel truncation unsafe.
> > LVRelState->nonempty_pages is updated to reflect this. Previously, both
> > functions set an output parameter or output parameter member, hastup, to
> > indicate that nonempty_pages should be updated to reflect the latest
> > non-removable page. There doesn't seem to be any reason to wait until
> > lazy_scan_[no]prune() returns to update nonempty_pages. Plenty of other
> > counters in the LVRelState are updated in lazy_scan_[no]prune().
> > This allows us to get rid of the output parameter hastup.
>
>
> > @@ -972,20 +970,21 @@ lazy_scan_heap(LVRelState *vacrel)
> >                               continue;
> >                       }
> >
> > -                     /* Collect LP_DEAD items in dead_items array, count tuples */
> > -                     if (lazy_scan_noprune(vacrel, buf, blkno, page, &hastup,
> > +                     /*
> > +                      * Collect LP_DEAD items in dead_items array, count tuples,
> > +                      * determine if rel truncation is safe
> > +                      */
> > +                     if (lazy_scan_noprune(vacrel, buf, blkno, page,
> >                                                                 &recordfreespace))
> >                       {
> >                               Size            freespace = 0;
> >
> >                               /*
> >                                * Processed page successfully (without cleanup lock) -- just
> > -                              * need to perform rel truncation and FSM steps, much like the
> > -                              * lazy_scan_prune case.  Don't bother trying to match its
> > -                              * visibility map setting steps, though.
> > +                              * need to update the FSM, much like the lazy_scan_prune case.
> > +                              * Don't bother trying to match its visibility map setting
> > +                              * steps, though.
> >                                */
> > -                             if (hastup)
> > -                                     vacrel->nonempty_pages = blkno + 1;
> >                               if (recordfreespace)
> >                                       freespace = PageGetHeapFreeSpace(page);
> >                               UnlockReleaseBuffer(buf);
>
> The comment continues to say that we "determine if rel truncation is safe" -
> but I don't see that?  Oh, I see, it's done inside lazy_scan_noprune(). This
> doesn't seem like a clear improvement to me. Particularly because it's only
> set if lazy_scan_noprune() actually does something.

I don't get what the last sentence means ("Particularly because...").
The new location of the hastup test in lazy_scan_noprune() is above an
unconditional return true, so it is also only set if
lazy_scan_noprune() actually does something. I think the
lazy_scan[]prune() functions shouldn't try to export the hastup
information to lazy_scan_heap(). It's confusing. We should be moving
all of the page-specific processing into the individual functions
instead of in the body of lazy_scan_heap().

> I don't like the existing code in lazy_scan_heap(). But this kinda seems like
> tinkering around the edges, without getting to the heart of the issue. I think
> we should
>
> 1) Move everything after ReadBufferExtended() and the end of the loop into its
>    own function
>
> 2) All the code in the loop body after the call to lazy_scan_prune() is
>    specific to the lazy_scan_prune() path, it doesn't make sense that it's at
>    the same level as the the calls to lazy_scan_noprune(),
>    lazy_scan_new_or_empty() or lazy_scan_prune(). Either it should be in
>    lazy_scan_prune() or a new wrapper function.
>
> 3) It's imo wrong that we have UnlockReleaseBuffer() (there are 6 different
>    places unlocking if I didn't miscount!) and RecordPageWithFreeSpace() calls
>    in this many places. I think this is largely a consequence of the previous
>    points. Once those are addressed, we can have one common place.

I have other patches that do versions of all of the above, but they
didn't seem to really fit with this patch set. I am taking a step to
move code out of lazy_scan_heap() that doesn't belong there. That fact
that other code should also be moved from there seems more like a "yes
and" than a "no but". That being said, do you think I should introduce
patches doing further refactoring of lazy_scan_heap() (like what you
suggest above) into this thread?

- Melanie



pgsql-hackers by date:

Previous
From: Nathan Bossart
Date:
Subject: Re: Adding facility for injection points (or probe points?) for more advanced tests
Next
From: Nathan Bossart
Date:
Subject: Re: Hide exposed impl detail of wchar.c