Re: eliminate xl_heap_visible to reduce WAL (and eventually set VM on-access) - Mailing list pgsql-hackers

From Andres Freund
Subject Re: eliminate xl_heap_visible to reduce WAL (and eventually set VM on-access)
Date
Msg-id mhf4vkmh3j57zx7vuxp4jagtdzwhu3573pgfpmnjwqa6i6yj5y@sy4ymcdtdklo
In response to Re: eliminate xl_heap_visible to reduce WAL (and eventually set VM on-access)  (Melanie Plageman <melanieplageman@gmail.com>)
List pgsql-hackers
Hi,

On 2025-11-20 12:19:58 -0500, Melanie Plageman wrote:
> From 363f0e4ac9ac7699a6d9c2a267a2ad60825407c8 Mon Sep 17 00:00:00 2001
> From: Melanie Plageman <melanieplageman@gmail.com>
> Date: Mon, 17 Nov 2025 15:11:27 -0500
> Subject: [PATCH v22 1/9] Split heap_page_prune_and_freeze() into helpers
>
> Refactor the setup and planning phases of pruning and freezing into
> helpers. This streamlines heap_page_prune_and_freeze() and makes it more
> clear when the examination of tuples ends and page modifications begin.

I think this is a considerable improvement.

I didn't review this with a lot of detail, given that it's mostly moving
code.

One minor thing: It's slightly odd that prune_freeze_plan() gets an Oid
argument, prune_freeze_setup() gets the entire prstate, and
heap_page_will_freeze() gets the Relation. It's what they need, but still a
bit odd.


FWIW, I found the diff generated by
  git show --diff-algorithm=minimal --color-moved-ws=allow-indentation-change

useful for viewing this change; it showed much more clearly how little the
code actually changed.



> From 8ebaf434af5afaebcf71550116c59355b3bf15c1 Mon Sep 17 00:00:00 2001
> From: Melanie Plageman <melanieplageman@gmail.com>
> Date: Wed, 8 Oct 2025 15:39:01 -0400
> Subject: [PATCH v22 2/9] Eliminate XLOG_HEAP2_VISIBLE from vacuum phase I
>  prune/freeze
>
> Vacuum no longer emits a separate WAL record for each page set
> all-visible or all-frozen during phase I. Instead, visibility map
> updates are now included in the XLOG_HEAP2_PRUNE_VACUUM_SCAN record that
> is already emitted for pruning and freezing.
>
> Previously, heap_page_prune_and_freeze() determined whether a page was
> all-visible, but the corresponding VM bits were only set later in
> lazy_scan_prune(). Now the VM is updated immediately in
> heap_page_prune_and_freeze(), at the same time as the heap
> modifications.
>
> This change applies only to vacuum phase I, not to pruning performed
> during normal page access.

Hm. This change makes sense, but unfortunately I find it somewhat hard to
review. There are a lot of changes that don't obviously work towards one
goal in this commit.

>@@ -238,6 +239,16 @@ typedef struct PruneFreezeParams
>     Relation    relation;       /* relation containing buffer to be pruned */
>     Buffer      buffer;         /* buffer to be pruned */
> 
>+    /*
>+     *
>+     * vmbuffer is the buffer that must already contain contain the required
>+     * block of the visibility map if we are to update it. blk_known_av is the
>+     * visibility status of the heap block as of the last call to
>+     * find_next_unskippable_block().
>+     */
>+    Buffer      vmbuffer;
>+    bool        blk_known_av;
>+
>     /*
>      * The reason pruning was performed.  It is used to set the WAL record
>      * opcode which is used for debugging and analysis purposes.

What is blk_known_av set to if the block is known to not be all visible?
Compared to the case where we did not yet determine the visibility status of
the block?


>@@ -250,8 +261,10 @@ typedef struct PruneFreezeParams
>      * HEAP_PAGE_PRUNE_MARK_UNUSED_NOW indicates that dead items can be set
>      * LP_UNUSED during pruning.
>      *
>-     * HEAP_PAGE_PRUNE_FREEZE indicates that we will also freeze tuples, and
>-     * will return 'all_visible', 'all_frozen' flags to the caller.
>+     * HEAP_PAGE_PRUNE_FREEZE indicates that we will also freeze tuples
>+     *
>+     * HEAP_PAGE_PRUNE_UPDATE_VIS indicates that we will set the page's status
>+     * in the VM.
>      */
>     int         options;

nit^2: The previous version and the other paragraphs end in a .


> @@ -157,17 +159,36 @@ heap_xlog_prune_freeze(XLogReaderState *record)
>          /* There should be no more data */
>          Assert((char *) frz_offsets == dataptr + datalen);
>
> -        if (vmflags & VISIBILITYMAP_VALID_BITS)
> -            PageSetAllVisible(page);
> -
> -        MarkBufferDirty(buffer);
> +        if (do_prune || nplans > 0)
> +            mark_buffer_dirty = set_lsn = true;
>
>          /*
> -         * See log_heap_prune_and_freeze() for commentary on when we set the
> -         * heap page LSN.
> +         * The critical integrity requirement here is that we must never end
> +         * up with with the visibility map bit set and the page-level
> +         * PD_ALL_VISIBLE bit clear.  If that were to occur, a subsequent page

s/clear/unset/ would be a tad clearer.


> +         * modification would fail to clear the visibility map bit.
> +         *
> +         * vmflags may be nonzero with PD_ALL_VISIBLE already set (e.g. when
> +         * marking an all-visible page all-frozen). If only the VM is updated,
> +         * the heap page need not be dirtied.
>           */
> -        if (do_prune || nplans > 0 ||
> -            ((vmflags & VISIBILITYMAP_VALID_BITS) && XLogHintBitIsNeeded()))
> +        if ((vmflags & VISIBILITYMAP_VALID_BITS) && !PageIsAllVisible(page))
> +        {
> +            PageSetAllVisible(page);
> +            mark_buffer_dirty = true;
> +
> +            /*
> +             * See log_heap_prune_and_freeze() for commentary on when we set
> +             * the heap page LSN.
> +             */
> +            if (XLogHintBitIsNeeded())
> +                set_lsn = true;
> +        }

Maybe worth adding something like Assert(!set_lsn || mark_buffer_dirty)?


> +/*
> + * Decide whether to set the visibility map bits for heap_blk, using
> + * information from PruneState and blk_known_av. Some callers may already
> + * have examined this page’s VM bits (e.g., VACUUM in the previous
> + * heap_vac_scan_next_block() call) and can pass that along.

That's not entirely trivial to follow, tbh. As mentioned above, it's not clear
to me how the state of a block for which we *did* determine that it is *not*
all-visible is represented.


> + * Returns true if one or both VM bits should be set, along with the desired
> + * flags in *vmflags. Also indicates via do_set_pd_vis whether PD_ALL_VISIBLE
> + * should be set on the heap page.
> + */
> +static bool
> +heap_page_will_set_vis(Relation relation,
> +                       BlockNumber heap_blk,
> +                       Buffer heap_buf,
> +                       Buffer vmbuffer,
> +                       bool blk_known_av,
> +                       const PruneState *prstate,
> +                       uint8 *vmflags,
> +                       bool *do_set_pd_vis)
> +{
> +    Page        heap_page = BufferGetPage(heap_buf);
> +    bool        do_set_vm = false;
> +
> +    *do_set_pd_vis = false;
> +
> +
> +    /*
> +     * Now handle two potential corruption cases:
> +     *
> +     * These do not need to happen in a critical section and are not
> +     * WAL-logged.
> +     *
> +     * As of PostgreSQL 9.2, the visibility map bit should never be set if the
> +     * page-level bit is clear.  However, it's possible that in vacuum the bit
> +     * got cleared after heap_vac_scan_next_block() was called, so we must
> +     * recheck with buffer lock before concluding that the VM is corrupt.
> +     */
> +    else if (blk_known_av && !PageIsAllVisible(heap_page) &&
> +             visibilitymap_get_status(relation, heap_blk, &vmbuffer) != 0)
> +    {
> +        ereport(WARNING,
> +                (errcode(ERRCODE_DATA_CORRUPTED),
> +                 errmsg("page is not marked all-visible but visibility map bit is set in relation \"%s\" page %u",
> +                        RelationGetRelationName(relation), heap_blk)));
> +
> +        visibilitymap_clear(relation, heap_blk, vmbuffer,
> +                            VISIBILITYMAP_VALID_BITS);

Wait, why is it ok to perform this check iff blk_known_av is set?


> +            old_vmbits = visibilitymap_set_vmbits(blockno,
> +                                                  vmbuffer, new_vmbits,
> +                                                  params->relation->rd_locator);
> +            if (old_vmbits == new_vmbits)
> +            {
> +                LockBuffer(vmbuffer, BUFFER_LOCK_UNLOCK);
> +                /* Unset so we don't emit WAL since no change occurred */
> +                do_set_vm = false;
> +            }
> +        }

What can lead to this path being reached? Doesn't this mean that something
changed the state of the VM while we were holding an exclusive lock on the
heap buffer?


> +        /*
> +         * Emit a WAL XLOG_HEAP2_PRUNE* record showing what we did. If we were
> +         * only updating the VM and it turns out it was already set, we will
> +         * have unset do_set_vm earlier. As such, check it again before
> +         * emitting the record.
> +         */
> +        if (RelationNeedsWAL(params->relation) &&
> +            (do_prune || do_freeze || do_set_vm))
> +        {
>              log_heap_prune_and_freeze(params->relation, buffer,
> -                                      InvalidBuffer,    /* vmbuffer */
> -                                      0,    /* vmflags */
> +                                      do_set_vm ? vmbuffer : InvalidBuffer,
> +                                      do_set_vm ? new_vmbits : 0,
>                                        conflict_xid,
> -                                      true, params->reason,
> +                                      true, /* cleanup lock */
> +                                      do_set_pd_vis,
> +                                      params->reason,
>                                        prstate.frozen, prstate.nfrozen,
>                                        prstate.redirected, prstate.nredirected,
>                                        prstate.nowdead, prstate.ndead,

This function is now taking 16 parameters :/


> @@ -959,28 +1148,47 @@ heap_page_prune_and_freeze(PruneFreezeParams *params,
>
>      END_CRIT_SECTION();
>
> +    if (do_set_vm)
> +        LockBuffer(vmbuffer, BUFFER_LOCK_UNLOCK);
> +
> +    /*
> +     * During its second pass over the heap, VACUUM calls
> +     * heap_page_would_be_all_visible() to determine whether a page is
> +     * all-visible and all-frozen. The logic here is similar. After completing
> +     * pruning and freezing, use an assertion to verify that our results
> +     * remain consistent with heap_page_would_be_all_visible().
> +     */
> +#ifdef USE_ASSERT_CHECKING
> +    if (prstate.all_visible)
> +    {
> +        TransactionId debug_cutoff;
> +        bool        debug_all_frozen;
> +
> +        Assert(prstate.lpdead_items == 0);
> +        Assert(prstate.cutoffs);
> +
> +        if (!heap_page_is_all_visible(params->relation, buffer,
> +                                      prstate.cutoffs->OldestXmin,
> +                                      &debug_all_frozen,
> +                                      &debug_cutoff, off_loc))
> +            Assert(false);

I don't love Assert(false), because the message for the assert failure is
pretty much meaningless. Sometimes it's hard to avoid, but here you have an
if () whose only body is Assert(false)? Just Assert the expression directly.


> From 34f0009570e117d7d48b560cd097ee25c6cdcc7c Mon Sep 17 00:00:00 2001
> From: Melanie Plageman <melanieplageman@gmail.com>
> Date: Sat, 27 Sep 2025 11:55:21 -0400
> Subject: [PATCH v22 3/9] Eliminate XLOG_HEAP2_VISIBLE from empty-page vacuum
>
> As part of removing XLOG_HEAP2_VISIBLE records, phase I of VACUUM now
> marks empty pages all-visible in a XLOG_HEAP2_PRUNE_VACUUM_SCAN record.

This whole business of treating empty pages as all-visible continues to not
make any sense to me. Particularly in combination with a non-crash-safe FSM it
just seems ... unhelpful. It also means that there's a decent chance of
extra WAL when bulk extending. But that's not the fault of this change.


> From 0d6a06d4533cfe153440d301c3d20915ba07892f Mon Sep 17 00:00:00 2001
> From: Melanie Plageman <melanieplageman@gmail.com>
> Date: Sat, 27 Sep 2025 11:55:36 -0400
> Subject: [PATCH v22 4/9] Remove XLOG_HEAP2_VISIBLE entirely
>
> As no remaining users emit XLOG_HEAP2_VISIBLE records.
> This includes deleting the xl_heap_visible struct and all functions
> responsible for emitting or replaying XLOG_HEAP2_VISIBLE records.

Probably worth mentioning that this changes the VM API.


> @@ -2396,14 +2396,18 @@ get_conflict_xid(bool do_prune, bool do_freeze, bool do_set_vm,
>   *
>   * This is used for several different page maintenance operations:
>   *
> - * - Page pruning, in VACUUM's 1st pass or on access: Some items are
> + * - Page pruning, in vacuum phase I or on-access: Some items are
>   *   redirected, some marked dead, and some removed altogether.
>   *
> - * - Freezing: Items are marked as 'frozen'.
> + * - Freezing: During vacuum phase I, items are marked as 'frozen'
>   *
> - * - Vacuum, 2nd pass: Items that are already LP_DEAD are marked as unused.
> + * - Reaping: During vacuum phase III, items that are already LP_DEAD are
> + *   marked as unused.
>   *
> - * They have enough commonalities that we use a single WAL record for them
> + * - VM updates: After vacuum phases I and III, the heap page may be marked
> + *   all-visible and all-frozen.
> + *
> + * These changes all happen together, so we use a single WAL record for them
>   * all.
>   *
>   * If replaying the record requires a cleanup lock, pass cleanup_lock =
>   true.

How's that related to the commit's subject?


> From fd0455230968fd919999a5c035f3830d310f0e49 Mon Sep 17 00:00:00 2001
> From: Melanie Plageman <melanieplageman@gmail.com>
> Date: Fri, 18 Jul 2025 16:30:04 -0400
> Subject: [PATCH v22 5/9] Rename GlobalVisTestIsRemovableXid() to
>  GlobalVisXidVisibleToAll()
> MIME-Version: 1.0
> Content-Type: text/plain; charset=UTF-8
> Content-Transfer-Encoding: 8bit
>
> The function is currently only used to check whether a tuple’s xmax is
> visible to all transactions (and thus removable). Upcoming changes will
> also use it to test whether a tuple’s xmin is visible to all to
> decide if a page can be marked all-visible in the visibility map.
>
> The new name, GlobalVisXidVisibleToAll(), better reflects this broader
> purpose.

If we want this - and I'm not convinced we do - I think it needs to go further
and change the other uses of removable in
procarray.c. ComputeXidHorizonsResult has a lot of related fields.

There's also GetOldestNonRemovableTransactionId(),
GlobalVisCheckRemovableXid(), GlobalVisCheckRemovableFullXid() that weren't
included in the renaming.


> From 565014e31aa117fb43993ee2e64da38ffb74f372 Mon Sep 17 00:00:00 2001
> From: Melanie Plageman <melanieplageman@gmail.com>
> Date: Tue, 29 Jul 2025 14:38:24 -0400
> Subject: [PATCH v22 6/9] Use GlobalVisState in vacuum to determine page level
>  visibility
> MIME-Version: 1.0
> Content-Type: text/plain; charset=UTF-8
> Content-Transfer-Encoding: 8bit
>
> During vacuum's first and third phases, we examine tuples' visibility
> to determine if we can set the page all-visible in the visibility map.
>
> Previously, this check compared tuple xmins against a single XID chosen at
> the start of vacuum (OldestXmin). We now use GlobalVisState, which also
> enables future work to set the VM during on-access pruning, since ordinary
> queries have access to GlobalVisState but not OldestXmin.
>
> This also benefits vacuum directly: in some cases, GlobalVisState may
> advance during a vacuum, allowing more pages to become considered
> all-visible. And, in the future, we could easily add a heuristic to
> update GlobalVisState more frequently during vacuums of large tables. In
> the rare case that the GlobalVisState moves backward, vacuum falls back
> to OldestXmin to ensure we don’t attempt to freeze a dead tuple that
> wasn’t yet prunable according to the GlobalVisState.

I think it may be better to make sure that the GlobalVisState can't move
backward.


> From bced81f6df3d303679fac2a1414d42f0db401232 Mon Sep 17 00:00:00 2001
> From: Melanie Plageman <melanieplageman@gmail.com>
> Date: Tue, 29 Jul 2025 14:34:30 -0400
> Subject: [PATCH v22 8/9] Allow on-access pruning to set pages all-visible
>
> Many queries do not modify the underlying relation. For such queries, if
> on-access pruning occurs during the scan, we can check whether the page
> has become all-visible and update the visibility map accordingly.
> Previously, only vacuum and COPY FREEZE marked pages as all-visible or
> all-frozen.

> Supporting this requires passing information about whether the relation
> is modified from the executor down to the scan descriptor.

I think it'd be good to split this part into a separate commit. The set of
folks who should review it is distinct from (and broader than) the set
looking at heapam internals.


Greetings,

Andres Freund


