Re: eliminate xl_heap_visible to reduce WAL (and eventually set VM on-access) - Mailing list pgsql-hackers

From Chao Li
Subject Re: eliminate xl_heap_visible to reduce WAL (and eventually set VM on-access)
Date
Msg-id 81DAFACF-7D55-4A84-ACB0-0425D1669DB4@gmail.com
Whole thread Raw
In response to Re: eliminate xl_heap_visible to reduce WAL (and eventually set VM on-access)  (Chao Li <li.evan.chao@gmail.com>)
List pgsql-hackers

> On Mar 4, 2026, at 16:59, Chao Li <li.evan.chao@gmail.com> wrote:
>
>
>
>> On Mar 3, 2026, at 23:52, Melanie Plageman <melanieplageman@gmail.com> wrote:
>>
>>
>>> Otherwise, if prstate->pagefrz.FreezePageConflictXid is still possibly be InvalidTransactionId, then the Assert
shouldbe changed to something like: 
>>>
>>> Assert(prstate->pagefrz.FreezePageConflictXid == InvalidTransactionId ||
>>> TransactionIdPrecedesOrEquals(prstate->pagefrz.FreezePageConflictXid, prstate->cutoffs->OldestXmin)
>>
>> This is covered by TransactionIdPrecedesOrEquals because
>> InvalidTransactionId is 0. We assume that in many places throughout
>> the code.
>>
>
> I understood that TransactionIdPrecedesOrEquals(InvalidTransactionId, prstate->cutoffs->OldestXmin) is true, but that
wouldleave an impression to code readers that prstate->pagefrz.FreezePageConflictXid could not be InvalidTransactionId.
ThusI think my version explicitly tells that prstate->pagefrz.FreezePageConflictXid could be InvalidTransactionId at
thepoint. 
>
>
>>> I will continue with 0005 tomorrow.
>>
>
> 4 - 0005
> ```
>  * Caller must have pin on the buffer, and must *not* have a lock on it.
>  */
> void
> -heap_page_prune_opt(Relation relation, Buffer buffer)
> +heap_page_prune_opt(Relation relation, Buffer buffer, Buffer *vmbuffer)
> ```
>
> I don’t see why vmbuffer has to be of pointer type. Buffer type is underlying int, I checked the last commit,
vmbufferonly passes in data into the function without passing out anything. 
>
> As we add the new parameter vmbuffer, though it’s not used in this commit, I think it’d be better to update the
headercommit to explain what this parameter will do. 
>
> 5  - 0006
> ```
> + *
> + * heap_fix_vm_corruption() makes changes to the VM and, potentially, the heap
> + * page, but it does not need to be done in a critical section because
> + * clearing the VM is not WAL-logged.
> + */
> +static void
> +heap_fix_vm_corruption(PruneState *prstate, OffsetNumber offnum)
> ```
>
> Nit: why the last paragraph of the header comments uses the function name instead of “this function”? Looks like a
copy-pasto.
>
> 6 - 0006
> ```
> + if (prstate->lpdead_items > 0)
> + {
> + ereport(WARNING,
> + (errcode(ERRCODE_DATA_CORRUPTED),
> + errmsg("LP_DEAD item found on page marked as all-visible"),
> + errdetail("relation \"%s\", page %u, tuple %u",
> +   RelationGetRelationName(prstate->relation),
> +   prstate->block, offnum)));
> + }
> + else
> + {
> + ereport(WARNING,
> + (errcode(ERRCODE_DATA_CORRUPTED),
> + errmsg("tuple not visible to all found on page marked as all-visible"),
> + errdetail("relation \"%s\", page %u, tuple %u",
> +   RelationGetRelationName(prstate->relation),
> +   prstate->block, offnum)));
> + }
> ```
>
> I recently just learned that a detail message should use complete sentences, and end each with a period, and
capitalizethe first word of sentences. See https://www.postgresql.org/docs/current/error-style-guide.html. 
>
> 7 - 0006
> ```
> + else if (prstate->vmbits & VISIBILITYMAP_VALID_BITS)
> + {
> + /*
> + * As of PostgreSQL 9.2, the visibility map bit should never be set if
> + * the page-level bit is clear.  However, it's possible that the bit
> + * got cleared after heap_vac_scan_next_block() was called, so we must
> + * recheck with buffer lock before concluding that the VM is corrupt.
> + */
> + ereport(WARNING,
> + (errcode(ERRCODE_DATA_CORRUPTED),
> + errmsg("page %u in \"%s\" is not marked all-visible but visibility map bit is set",
> + prstate->block,
> + RelationGetRelationName(prstate->relation))));
> + }
> ```
>
> The comment says “we must recheck with buffer lock before…”, but it only log a warning message. Is the comment stale?
>
> 8 - 0007
> ```
> +static void
> +heap_page_bypass_prune_freeze(PruneState *prstate, PruneFreezeResult *presult)
> +{
> + OffsetNumber maxoff = PageGetMaxOffsetNumber(prstate->page);
> + Page page = prstate->page;
> +
> + Assert(prstate->vmbits & VISIBILITYMAP_ALL_FROZEN ||
> +   (prstate->vmbits & VISIBILITYMAP_ALL_VISIBLE &&
> + !prstate->attempt_freeze));
> +
> + /* We'll fill in presult for the caller */
> + memset(presult, 0, sizeof(PruneFreezeResult));
> +
> + /*
> + * Since the page is all-visible, a count of the normal ItemIds on the
> + * page should be sufficient for vacuum's live tuple count.
> + */
> + for (OffsetNumber off = FirstOffsetNumber;
> + off <= maxoff;
> + off = OffsetNumberNext(off))
> + {
> + if (ItemIdIsNormal(PageGetItemId(page, off)))
> + prstate->live_tuples++;
> + }
> +
> + presult->live_tuples = prstate->live_tuples;
> +
> + /* Clear any stale prune hint */
> + if (TransactionIdIsValid(PageGetPruneXid(page)))
> + {
> + PageClearPrunable(page);
> + MarkBufferDirtyHint(prstate->buffer, true);
> + }
> +
> + presult->vmbits = prstate->vmbits;
> +
> + if (!PageIsEmpty(page))
> + presult->hastup = true;
> +}
> ```
>
> * Given this function has done PageIsEmpty(page), that that is true, we don’t need to count live_tuples, right? That
couldbe a tiny optimization. 
> * I see heap_page_bypass_prune_freeze() is only called in one place and immediately after prune_freeze_setup() and
heap_fix_vm_corruption(),so prstate->vmbits must be 0, so do we need to do presult->vmbits = prstate->vmbits;? 
> * Do we need to set all_visible and all_frozen to presult?
>
> 0008 LGTM
>
> I will continue with 0009 tomorrow.
>

9 - 0009
···
+     * Currently, only VACUUM performs freezing, but other callers may in the
+     * future. Other callers must initialize prstate.all_frozen to false,
···

Nit: prstate.all_frozen -> prstate.set_all_frozen

I saw you have fixed this in 0010, but I think it’s better also fix it here.

10 - 0010
```
+     * Whether or not the page was newly set all-visible and all-frozen during
+     * phase I of vacuuming.
      */
-    uint8        vmbits;
+    BlockNumber new_all_visible_pages;
+    BlockNumber new_all_visible_frozen_pages;
+    BlockNumber new_all_frozen_pages;
```

These 3 fields are actually counts rather than pointers to blocks, using type BlockNumber are quite confusing, though
underlyingBlockNumber is uint32. I think they can be just int type. 

11 - 0010
```
+ BlockNumber new_all_visible_pages;
+ BlockNumber new_all_visible_frozen_pages;
+ BlockNumber new_all_frozen_pages;
```

I don’t see where these 3 fields are initialized. In lazy_scan_prune(), presult is defined as:
```
    PruneFreezeResult presult;
```
So, those fields will hold random values.

12 - 0010
```
+     * conflict would ahve been handled in reaction to the WAL record freezing
```

Nit: ahve -> have

0011 LGTM

13 - 0012 - bufmask.c
```
+     * we don't mark the page all-visible. See heap_xlog_prune_and_freeze()
+     * for more details.
```

I don’t find a function named heap_xlog_prune_and_freeze().

14 - 0012 - heapam_xlog.c
```
+     * same approach is taken when replaying XLOG_HEAP2_PRUNE* records (see
+     * heap_xlog_prune_and_freeze()).
```

Same as 13.

0013 LGTM

I will try to finish the rest 5 commits tomorrow.

Best regards,
--
Chao Li (Evan)
HighGo Software Co., Ltd.
https://www.highgo.com/







pgsql-hackers by date:

Previous
From: Anthonin Bonnefoy
Date:
Subject: Re: Shutdown indefinitely stuck due to unflushed FPI_FOR_HINT record
Next
From: Jeff Davis
Date:
Subject: Re: [19] CREATE SUBSCRIPTION ... SERVER