Re: eliminate xl_heap_visible to reduce WAL (and eventually set VM on-access) - Mailing list pgsql-hackers

From Melanie Plageman
Subject Re: eliminate xl_heap_visible to reduce WAL (and eventually set VM on-access)
Date
Msg-id CAAKRu_aMTPcu2uXqdWPCKEo5ePqoD+B=o9HtFAHdMsz2Jb3MaA@mail.gmail.com
Whole thread Raw
In response to Re: eliminate xl_heap_visible to reduce WAL (and eventually set VM on-access)  (Chao Li <li.evan.chao@gmail.com>)
List pgsql-hackers
Hi,

Attached v26 includes a new patch, 0002, which gets rid of
all_visible_according_to_vm in lazy_scan_prune(). We've kept this
cached copy of the all-visible bit since the VM was added way back in
608195a3a365. Back then, the VM wasn't pinned unless
all_visible_according_to_vm was false. Now that we unconditionally
have the VM page pinned, there isn't much performance benefit to using
that cached value. I did some testing of the worst possible case and
saw no difference in timing. By removing that, we simplify heap vacuum
code now.  And we improve clarity once the VM update is combined into
the prune/freeze WAL record and when the VM is set on-access.

I think 0001 and 0002 (and maybe 0003) are worthwhile clarity
improvements on their own.

On Wed, Dec 10, 2025 at 11:07 PM Chao Li <li.evan.chao@gmail.com> wrote:
>
> A few more small comments. Sorry for keeping come out new comments. Actually I learned a lot about vacuum from
reviewingthis patch. 

Thanks for the continued review. Your feedback is improving the patchset.

> The last vacuum is expected to set vm bits, but the test doesn’t verify that. Should we verify that like:
> ```
> evantest=# SELECT blkno, all_visible, all_frozen FROM pg_visibility_map('test_vac_unmodified_heap');
>  blkno | all_visible | all_frozen
> -------+-------------+------------
>      0 | t           | t
> (1 row)

I've done this. I've actually added three such verifications -- one
after each step where the VM is expected to change. It shouldn't be
very expensive, so I think it is okay. The way the test would fail if
the buffer wasn't correctly dirtied is that it would assert out -- so
the visibility map test wouldn't even have a chance to fail. But, I
think it is also okay to confirm that the expected things are
happening with the VM -- it just gives us extra coverage.

>                 if (presult.all_frozen)
>                 {
> +                       /*
> +                        * We can pass InvalidTransactionId as our cutoff_xid, since a
> +                        * snapshotConflictHorizon sufficient to make everything safe for
> +                        * REDO was logged when the page's tuples were frozen.
> +                        */
>                         Assert(!TransactionIdIsValid(presult.vm_conflict_horizon));
> -                       flags |= VISIBILITYMAP_ALL_FROZEN;
> +                       new_vmbits |= VISIBILITYMAP_ALL_FROZEN;
>                 }
>
> The comment here is a little confusing. In the old code, the Assert() as immediately above the call
visibilitymap_set(),and cutoff_xid is a parameter to the call. But the new code moves the Assert() as well as the
commentfar away from the call visibilitymap_set(), so I think the comment should stay together with the call of
visibilitymap_set().

Good point. I've moved it closer to visibilitymap_set() and modified
and moved the assert so that it is together with the comment. I think
the comment makes little sense without the assertion.

>  * If it finds that the page-level visibility hint or VM is corrupted, it will
> * fix them by clearing the VM bits and visibility page hint. This does not
>
> In the second line, “visibility page hint” is understandable but feels not quite good. I know it’s actually
“page-levelvisibility hint”, so how about just “visibility hint”. 

I've changed this.

>         /*
> -        * As of PostgreSQL 9.2, the visibility map bit should never be set if the
> -        * page-level bit is clear.  However, it's possible that the bit got
> -        * cleared after heap_vac_scan_next_block() was called, so we must recheck
> -        * with buffer lock before concluding that the VM is corrupt.
> +        * For the purposes of logging, count whether or not the page was newly
> +        * set all-visible and, potentially, all-frozen.
>          */
> -       else if (all_visible_according_to_vm && !PageIsAllVisible(page) &&
> -                        visibilitymap_get_status(vacrel->rel, blkno, &vmbuffer) != 0)
> +       if ((old_vmbits & VISIBILITYMAP_ALL_VISIBLE) == 0 &&
> +               (new_vmbits & VISIBILITYMAP_ALL_VISIBLE) != 0)
>         {
> ```
>
> Without do_set_vm==true, old_vmbits will only be 0, thus this “if-elseif” that uses old_vmbits should be moved into
“if(do_set_vm)”. From this perspective, if not do_set_vm, this function can return early, like: 

Good point. I've actually gone ahead in 0002 and refactored this whole
section a bit (I got rid of all_visible_according_to_vm). 0002 is a
new patch in this attached v26, and it needs review. I think this
refactoring makes the code quite a bit clearer -- especially once we
start setting the VM on-access. It does, amongst other things, return
early if all_visible is false, like you suggested.

> + * Returns true if one or both VM bits should be set, along with returning the
> + * desired what bits should be set in the VM in *new_vmbits.
> ```
>
> Looks like a typo: “returning the desired what bits should be set”, maybe change to “returning the desired bits to be
set”.

Fixed.

- Melanie

Attachment

pgsql-hackers by date:

Previous
From: Melanie Plageman
Date:
Subject: Re: eliminate xl_heap_visible to reduce WAL (and eventually set VM on-access)
Next
From: Nathan Bossart
Date:
Subject: Re: [PATCH] Fix ARM64/MSVC atomic memory ordering issues on Win11 by adding explicit DMB ?barriers