Re: Combine Prune and Freeze records emitted by vacuum - Mailing list pgsql-hackers

From Melanie Plageman
Subject Re: Combine Prune and Freeze records emitted by vacuum
Date
Msg-id 20240320013602.6sypr4cx6sefpemg@liskov
Whole thread Raw
In response to Re: Combine Prune and Freeze records emitted by vacuum  (Heikki Linnakangas <hlinnaka@iki.fi>)
Responses Re: Combine Prune and Freeze records emitted by vacuum
Re: Combine Prune and Freeze records emitted by vacuum
List pgsql-hackers
On Mon, Mar 18, 2024 at 01:15:21AM +0200, Heikki Linnakangas wrote:
> On 15/03/2024 02:56, Melanie Plageman wrote:
> > Okay, so I was going to start using xl_heap_prune for vacuum here too,
> > but I realized it would be bigger because of the
> > snapshotConflictHorizon. Do you think there is a non-terrible way to
> > make the snapshotConflictHorizon optional? Like with a flag?
> 
> Yeah, another flag would do the trick.

Okay, I've done this in attached v4 (including removing
XLOG_HEAP2_VACUUM). I had to put the snapshot conflict horizon in the
"main chunk" of data available at replay regardless of whether or not
the record ended up including an FPI.

I made it its own sub-record (xlhp_conflict_horizon) less to help with
alignment (though we can use all the help we can get there) and more to
keep it from getting lost. When you look at heapam_xlog.h, you can see
what a XLOG_HEAP2_PRUNE record will contain starting with the
xl_heap_prune struct and then all the sub-record types.

> > > I introduced a few sub-record types similar to what you suggested --
> > > they help a bit with alignment, so I think they are worth keeping. There
> > > are comments around them, but perhaps a larger diagram of the layout of
> > > a the new XLOG_HEAP2_PRUNE record would be helpful.
> > 
> > I started doing this, but I can't find a way of laying out the diagram
> > that pgindent doesn't destroy. I thought I remember other diagrams in
> > the source code showing the layout of something (something with pages
> > somewhere?) that don't get messed up by pgindent, but I couldn't find
> > them.
> 
> See src/tools/pgindent/README, section "Cleaning up in case of failure or
> ugly output":
> 
>     /*----------
>      * Text here will not be touched by pgindent.
>      */
> 

Cool. This version doesn't include the spiffy drawing I promised yet.

> Note that we don't provide WAL compatibility across major versions. You can
> fully remove the old xl_heap_freeze_page format. (We should bump
> XLOG_PAGE_MAGIC when this is committed though)

I've removed the xl_heap_freeze (and xl_heap_prune). I didn't bump
XLOG_PAGE_MAGIC.

> > > On the point of removing the freeze-only code path from
> > > heap_page_prune() (now heap_page_prune_and_freeze()): while doing this,
> > > I realized that heap_pre_freeze_checks() was not being called in the
> > > case that we decide to freeze because we emitted an FPI while setting
> > > the hint bit. I've fixed that, however, I've done so by moving
> > > heap_pre_freeze_checks() into the critical section. I think that is not
> > > okay? I could move it earlier and not do call it when the hint bit FPI
> > > leads us to freeze tuples. But, I think that would lead to us doing a
> > > lot less validation of tuples being frozen when checksums are enabled.
> > > Or, I could make two critical sections?
> > 
> > I found another approach and just do the pre-freeze checks if we are
> > considering freezing except for the FPI criteria.
> 
> Hmm, I think you can make this simpler if you use
> XLogCheckBufferNeedsBackup() to check if the hint-bit update will generate a
> full-page-image before entering the critical section. Like you did to check
> if pruning will generate a full-page-image. I included that change in the
> attached patches.

I used your proposed structure. You had XLogCheckBufferNeedsBackup()
twice in your proposed version a few lines apart. I don't think there is
any point in checking it twice. If we are going to rely on
XLogCheckBufferNeedsBackup() to tell us whether or not setting the hint
bit is *likely* to emit an FPI, then we might as well just call
XLogCheckBufferNeedsBackup() once.

> I don't fully understand this:
> 
> >     /*
> >      * If we will freeze tuples on the page or they were all already frozen
> >      * on the page, if we will set the page all-frozen in the visibility map,
> >      * we can advance relfrozenxid and relminmxid to the values in
> >      * pagefrz->FreezePageRelfrozenXid and pagefrz->FreezePageRelminMxid.
> >      */
> >     if (presult->all_frozen || presult->nfrozen > 0)
> >     {
> >         presult->new_relfrozenxid = pagefrz->FreezePageRelfrozenXid;
> >         presult->new_relminmxid = pagefrz->FreezePageRelminMxid;
> >     }
> >     else
> >     {
> >         presult->new_relfrozenxid = pagefrz->NoFreezePageRelfrozenXid;
> >         presult->new_relminmxid = pagefrz->NoFreezePageRelminMxid;
> >     }
> 
> Firstly, the comment is outdated, because we have already done the freezing
> at this point. But more importantly, I don't understand the logic here.
> Could we check just presult->nfrozen > 0 here and get the same result?

I've updated the comment. The reason I had
if (presult->all_frozen ||  presult->nfrozen > 0) was because of this
comment in heapam.h in the HeapPageFreeze struct

     * "Freeze" NewRelfrozenXid/NewRelminMxid trackers.
     *
     * Trackers used when heap_freeze_execute_prepared freezes, or when there
     * are zero freeze plans for a page.  It is always valid for vacuumlazy.c
     * to freeze any page, by definition.  This even includes pages that have
     * no tuples with storage to consider in the first place.  That way the
     * 'totally_frozen' results from heap_prepare_freeze_tuple can always be
     * used in the same way, even when no freeze plans need to be executed to
     * "freeze the page".  Only the "freeze" path needs to consider the need
     * to set pages all-frozen in the visibility map under this scheme.
     *
     * When we freeze a page, we generally freeze all XIDs < OldestXmin, only
     * leaving behind XIDs that are ineligible for freezing, if any.  And so
     * you might wonder why these trackers are necessary at all; why should
     * _any_ page that VACUUM freezes _ever_ be left with XIDs/MXIDs that
     * ratchet back the top-level NewRelfrozenXid/NewRelminMxid trackers?
     *
     * It is useful to use a definition of "freeze the page" that does not
     * overspecify how MultiXacts are affected.  heap_prepare_freeze_tuple
     * generally prefers to remove Multis eagerly, but lazy processing is used
     * in cases where laziness allows VACUUM to avoid allocating a new Multi.
     * The "freeze the page" trackers enable this flexibility.
     */

So, I don't really know if it is right to just check presult->nfrozen >
0 when updating relminmxid. I have changed it to the way you suggested.
But we can change it back.

> Here are some patches on top of your patches for some further refactorings.
> Some notable changes in heap_page_prune_and_freeze():
> 
> - I moved the heap_prepare_freeze_tuple() work from the 2nd loop to the 1st
> one. It seems more clear and efficient that way.

cool. I kept this.

> - I extracted the code to generate the WAL record to a separate function.

cool. kept this.

> These patches are in a very rough WIP state, but I wanted to share early. I
> haven't done much testing, and I'm not wedded to these changes, but I think
> they make it more readable. Please include / squash in the patch set if you
> agree with them.

I've squashed the changes into and across my nineteen patches :)

I cleaned up your sugestions a bit and made a few stylistic choices.

In this version, I also broke up the last couple commits which
streamlined the WAL record and eliminated XLOG_HEAP2_FREEZE/VACUUM and
redistributed those changes in a way that I thought made sense.

Now, the progression is that in one commit we merge the prune and freeze
record, eliminating the XLOG_HEAP2_FREEZE record. Then, in another
commit, we eliminate the XLOG_HEAP2_VACUUM record. Then a later commit
streamlines the new mega xl_heap_prune struct into the variable size
structure based on which modifications it includes.

> From 622620a7875ae8c1626e9cd118156e0c734d44ed Mon Sep 17 00:00:00 2001
> From: Heikki Linnakangas <heikki.linnakangas@iki.fi>
> Date: Sun, 17 Mar 2024 22:52:28 +0200
> Subject: [PATCH v3heikki 1/4] Inline heap_frz_conflict_horizon() to the
>  caller.
> 
> FIXME: This frz_conflict_horizon business looks fishy to me. We have:
> - local frz_conflict_horizon variable,
> - presult->frz_conflict_horizon, and
> - prstate.snapshotConflictHorizon

Yea, this is a mistake I made when I was rebasing some changes in. The
local variable is gone now.


> should we really have all three, and what are the differences?

We do need both the prstate.snapshotConflictHorizon and the
presult->frz_conflict_horizon because the youngest freezable xmin will
often be different than the oldest removable xmax, so we have to track
both and take the most conservative one if we are both freezing and
pruning.

The third (local variable) one was an oops.

> From 0219842487931f899abcf183c863c43270c098f0 Mon Sep 17 00:00:00 2001
> From: Heikki Linnakangas <heikki.linnakangas@iki.fi>
> Date: Sun, 17 Mar 2024 23:05:31 +0200
> Subject: [PATCH v3heikki 2/4] Misc cleanup
> ---
>  src/backend/access/heap/pruneheap.c  | 16 +++++++---------
>  src/backend/access/heap/vacuumlazy.c |  7 ++++++-
>  2 files changed, 13 insertions(+), 10 deletions(-)
> diff --git a/src/backend/access/heap/vacuumlazy.c b/src/backend/access/heap/vacuumlazy.c
> index 8d3723faf3a..d3df7029667 100644
> --- a/src/backend/access/heap/vacuumlazy.c
> +++ b/src/backend/access/heap/vacuumlazy.c
> @@ -433,7 +433,9 @@ heap_vacuum_rel(Relation rel, VacuumParams *params,
>       * heap_page_prune_and_freeze(). We expect vistest will always make
>       * heap_page_prune_and_freeze() remove any deleted tuple whose xmax is <
>       * OldestXmin.  lazy_scan_prune must never become confused about whether a
> -     * tuple should be frozen or removed.  (In the future we might want to
> +     * tuple should be frozen or removed.
> +     * HEIKKI: Is such confusion possible anymore?
> +     * (In the future we might want to
>       * teach lazy_scan_prune to recompute vistest from time to time, to
>       * increase the number of dead tuples it can prune away.)

TBH, I don't really know what this comment is even saying. But
lazy_scan_prune() doesn't do any freezing anymore, so I removed this
sentence.

>       */
> @@ -1387,6 +1389,9 @@ lazy_scan_new_or_empty(LVRelState *vacrel, Buffer buf, BlockNumber blkno,
>   * so we could deal with tuples that were DEAD to VACUUM, but nevertheless were
>   * left with storage after pruning.
>   *
> + * HEIKKI: does the above paragraph still make sense? We don't call
> + * HeapTupleSatisfiesVacuum() here anymore
> + *

Yea this whole comment definitely doesn't belong here anymore. Even
though we are calling HeapTupleSatisfiesVacuum() (from inside
heap_prune_satisifes_vacuum()) inside heap_page_prune_and_freeze(), the
comment really doesn't fit anywhere in there either. The comment is
describing a situation that is no longer possible. So describing a
situation that is no longer possible in a part of the code that it never
could have been possible does not make sense to me. I've removed the
comment.

>   * As of Postgres 17, we circumvent this problem altogether by reusing the
>   * result of heap_page_prune_and_freeze()'s visibility check. Without the
>   * second call to HeapTupleSatisfiesVacuum(), there is no new HTSV_Result and
> -- 
> 2.39.2
> 

> From d72cebf13f9866112309883f72a382fc2cb57e17 Mon Sep 17 00:00:00 2001
> From: Heikki Linnakangas <heikki.linnakangas@iki.fi>
> Date: Sun, 17 Mar 2024 23:08:42 +0200
> Subject: [PATCH v3heikki 3/4] Move work to the first loop
>  src/backend/access/heap/pruneheap.c | 141 ++++++++++------------------
>  1 file changed, 52 insertions(+), 89 deletions(-)
> 
> diff --git a/src/backend/access/heap/pruneheap.c b/src/backend/access/heap/pruneheap.c
> index b3573bb628b..3541628799a 100644
> --- a/src/backend/access/heap/pruneheap.c
> +++ b/src/backend/access/heap/pruneheap.c
> @@ -78,9 +78,6 @@ static int    heap_prune_chain(Buffer buffer,
>                               PruneState *prstate, PruneFreezeResult *presult);
>  
>  static inline HTSV_Result htsv_get_valid_status(int status);
> -static void prune_prepare_freeze_tuple(Page page, OffsetNumber offnum, PruneState *prstate,
> -                                       HeapPageFreeze *pagefrz, HeapTupleFreeze *frozen,
> -                                       PruneFreezeResult *presult);
>  static void heap_prune_record_prunable(PruneState *prstate, TransactionId xid);
>  static void heap_prune_record_redirect(PruneState *prstate,
>                                         OffsetNumber offnum, OffsetNumber rdoffnum,
> @@ -322,6 +319,16 @@ heap_page_prune_and_freeze(Relation relation, Buffer buffer,
>      /* for recovery conflicts */
>      presult->frz_conflict_horizon = InvalidTransactionId;
>  
> +    /*
> +     * We will update the VM after pruning, collecting LP_DEAD items, and
> +     * freezing tuples. Keep track of whether or not the page is all_visible
> +     * and all_frozen and use this information to update the VM. all_visible
> +     * implies lpdead_items == 0, but don't trust all_frozen result unless
> +     * all_visible is also set to true.
> +     */
> +    /* HEIKKI: the caller updates the VM? not us */

I've updated this comment.

Other questions and notes on v4:

xl_heap_prune->flags is a uint8, but we are already using 7 of the bits.
Should we make it a unit16?

Eventually, I would like to avoid emitting a separate XLOG_HEAP2_VISIBLE
record for vacuum's first and second passes and just include the VM
update flags in the xl_heap_prune record. xl_heap_visible->flags is a
uint8. If we made xl_heap_prune->flags uint16, we could probably combine
them (though maybe we want other bits available). Also vacuum's second
pass doesn't set a snapshotConflictHorizon, so if we combined
xl_heap_visible and xl_heap_prune for vacuum we would end up saving even
more space (since vacuum sets xl_heap_visible->snapshotConflictHorizon
to InvalidXLogRecPtr).

A note on sub-record naming: I kept xl_heap_freeze_plan's name but
prefixed the other sub-records with xlhp. Do you think it is worth
renaming it (to xlhp_freeze_plan)? Also, should I change xlhp_freeze to
xlhp_freeze_page?

- Melanie

Attachment

pgsql-hackers by date:

Previous
From: David Rowley
Date:
Subject: Re: pg16: XX000: could not find pathkey item to sort
Next
From: Amit Kapila
Date:
Subject: Re: Introduce XID age and inactive timeout based replication slot invalidation