Re: Add 64-bit XIDs into PostgreSQL 15 - Mailing list pgsql-hackers

From Yura Sokolov
Subject Re: Add 64-bit XIDs into PostgreSQL 15
Msg-id 6dff672e-b8b1-4d44-bbe2-cbd2eb96a2bf@postgrespro.ru
In response to Re: Add 64-bit XIDs into PostgreSQL 15  (Evgeny Voropaev <evorop.wiki@gmail.com>)
List pgsql-hackers
11.06.2025 09:00, Evgeny Voropaev wrote:
> 2) About repairing fragmentation.
> 
> The original approach implemented in PG18 assumes that fragmentation 
> occurs during every `prune_freeze` operation. It happens because the 
> logic of the "redo"-function `heap_xlog_prune_freeze` assumes that 
> fragmentation has to be done by `heap_page_prune_execute`.


> Attempting to 
> omit fragmentation can result in page inconsistencies on the "redo"-side 
> (i.e. on a secondary node, or during the recovery process on primary 
> one).

No, because the patch uses a flag in the WAL record to instruct the "redo"
side to omit the fragmentation repair as well, if needed.
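
To illustrate the idea only: the sketch below is a toy model, not code from
the patch. The flag name `TOY_PRUNE_SKIP_DEFRAG`, the record struct and both
functions are hypothetical, since the thread does not name the actual flag.
It shows the primary recording its decision in the record and the "redo" side
repeating the same decision.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical flag bit: the real name and value used by the patch are
 * not given in this thread. */
#define TOY_PRUNE_SKIP_DEFRAG 0x01

/* Toy stand-in for a prune/freeze WAL record: only the flags matter here. */
typedef struct ToyPruneRecord
{
    uint8_t flags;
} ToyPruneRecord;

/* Primary side: store the decision that was actually taken on the page. */
static ToyPruneRecord
toy_log_prune(bool repair_fragmentation)
{
    ToyPruneRecord rec = {0};

    if (!repair_fragmentation)
        rec.flags |= TOY_PRUNE_SKIP_DEFRAG;
    return rec;
}

/* Redo side: repeat the primary's decision instead of always compacting. */
static bool
toy_redo_should_defragment(const ToyPruneRecord *rec)
{
    return (rec->flags & TOY_PRUNE_SKIP_DEFRAG) == 0;
}
```

With this shape, a replayed page goes through the same steps as on the
primary, whichever value of `repairFragmentation` was used there.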

> So, implementation of optional repairing of fragmentation 
> conflicts with the basic assumption about "necessity of fragmentation". 
> In order to prevent inconsistency xid64v62 patch invokes 
> `heap_page_prune_and_freeze` with `repairFragmentation` equal to true 
> from everywhere in the patch code except from 
> `heap_page_prepare_for_xid` which uses `repairFragmentation=false`.
> 
> So, why must we perform a `heap_page_prune_execute` without a 
> fragmentation during the preparation of a page for xid?
> 
> What exactly would break if we did invoke `heap_page_prune_execute` with 
> `repairFragmentation=true` during performing of `heap_page_prepare_for_xid`?

Short answer:
- The `repairFragmentation` parameter was added after investigating real
production issues with earlier patch versions.

Long answer:

How does SELECT work with tuples on a page?
It:
- PINS the page
- takes the CONTENT LOCK in SHARED mode
- collects HeapTuples, which POINT INTO THE RAW PAGE through t_data
(e.g. t_data->t_choice.t_heap)
- RELEASES the content lock
- may use those HeapTuples for an indefinitely long time, relying only on the
PIN of the page.
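
The steps above can be sketched with a toy model. This is not PostgreSQL
code: `ToyBuffer`, `ToyTuple` and the functions are hypothetical names, and
the counters only stand in for the real pin and content-lock machinery.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/* Toy model of a shared buffer; all names here are hypothetical. */
typedef struct ToyBuffer
{
    int  pin_count;      /* number of pins currently held on the page */
    int  shared_lockers; /* holders of the content lock in SHARED mode */
    bool exclusive;      /* content lock held in EXCLUSIVE mode */
    char page[64];       /* raw page bytes */
} ToyBuffer;

/* Toy tuple handle: t_data points INTO the page, nothing is copied. */
typedef struct ToyTuple
{
    const char *t_data;
    int         t_len;
} ToyTuple;

/* Pin, lock shared, collect the tuple pointer, unlock, keep the pin. */
static ToyTuple
toy_select_fetch(ToyBuffer *buf, int off, int len)
{
    ToyTuple tup;

    buf->pin_count++;           /* PIN the page */
    assert(!buf->exclusive);    /* a real backend would wait here */
    buf->shared_lockers++;      /* take the content lock in SHARED mode */

    tup.t_data = buf->page + off;   /* pointer into the raw page */
    tup.t_len = len;

    buf->shared_lockers--;      /* RELEASE the content lock */
    return tup;                 /* the pin is still held by the caller */
}

/* Drop the pin once the tuple pointer is no longer used. */
static void
toy_release(ToyBuffer *buf)
{
    assert(buf->pin_count > 0);
    buf->pin_count--;
}
```

After `toy_select_fetch` returns, the only thing protecting `t_data` is the
pin, which is exactly the situation described above.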

I.e. SELECT relies on the fact that, while a page is pinned, tuples on the
page stay at the same positions in memory.

That is why LockBufferForCleanup and ConditionalLockBufferForCleanup check
that there is only a single PIN on the page: only the backend which will
perform the cleanup is allowed to hold a PIN on the page.
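
A minimal sketch of that check, again as a toy model with hypothetical names
(`PinModel`, `model_conditional_cleanup_lock`). The real functions work on
shared buffer headers and are more involved; only the "exactly one pin, and
it is mine" rule is modelled here.

```c
#include <stdbool.h>

/* Toy buffer state; names are hypothetical, not PostgreSQL's. */
typedef struct PinModel
{
    int  pin_count;  /* pins held by all backends, including the caller */
    bool exclusive;  /* content lock held in EXCLUSIVE mode */
} PinModel;

/*
 * Try to get a cleanup lock without waiting.  The caller is assumed to
 * already hold its own pin, so pin_count == 1 means nobody else can be
 * holding pointers into the page.
 */
static bool
model_conditional_cleanup_lock(PinModel *buf)
{
    if (buf->exclusive)
        return false;       /* content lock is busy */
    if (buf->pin_count != 1)
        return false;       /* another backend may still read tuples */

    buf->exclusive = true;  /* safe to move tuples now */
    return true;
}
```

A second pin from any backend is enough to make the attempt fail, even if
that backend holds no content lock at all.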

UPDATE/INSERT/DELETE take the CONTENT LOCK in EXCLUSIVE mode because they may
add new tuples. But they are not allowed to move tuples, because concurrent
backends are allowed to read tuples from the page at exactly the same moment.
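
The difference between adding a tuple and moving tuples can be shown on a toy
slotted page. The layout and all names (`SlotPage`, `slot_page_add`,
`slot_page_compact`) are hypothetical simplifications and not the real heap
page format. Adding or killing a tuple leaves existing tuple bytes in place;
compaction does not.

```c
#include <string.h>

#define SLOT_PAGE_SIZE 32
#define SLOT_MAX_ITEMS 4

/* Toy slotted page: tuple data grows from the end of the page downward. */
typedef struct SlotPage
{
    int  upper;                 /* offset of the lowest tuple byte in use */
    int  nitems;                /* number of line pointers in use */
    int  off[SLOT_MAX_ITEMS];   /* tuple offset per line pointer */
    int  len[SLOT_MAX_ITEMS];   /* tuple length, 0 = dead */
    char data[SLOT_PAGE_SIZE];
} SlotPage;

static void
slot_page_init(SlotPage *p)
{
    memset(p, 0, sizeof(*p));
    p->upper = SLOT_PAGE_SIZE;
}

/* Add a tuple into free space; existing tuples are not touched. */
static int
slot_page_add(SlotPage *p, const char *bytes, int len)
{
    if (p->nitems == SLOT_MAX_ITEMS || len > p->upper)
        return -1;
    p->upper -= len;
    memcpy(p->data + p->upper, bytes, len);
    p->off[p->nitems] = p->upper;
    p->len[p->nitems] = len;
    return p->nitems++;
}

/* Mark a tuple dead; its bytes and its neighbours stay where they are. */
static void
slot_page_kill(SlotPage *p, int item)
{
    p->len[item] = 0;
}

/* Repair fragmentation: repack live tuples, which MOVES them. */
static void
slot_page_compact(SlotPage *p)
{
    char tmp[SLOT_PAGE_SIZE];
    int  upper = SLOT_PAGE_SIZE;

    memcpy(tmp, p->data, SLOT_PAGE_SIZE);
    for (int i = 0; i < p->nitems; i++)
    {
        if (p->len[i] == 0)
            continue;
        upper -= p->len[i];
        memcpy(p->data + upper, tmp + p->off[i], p->len[i]);
        p->off[i] = upper;
    }
    p->upper = upper;
}
```

A reader that saved a raw pointer to a tuple before `slot_page_compact` runs
would afterwards see the bytes of a different tuple at that address, which is
why moving tuples requires that no other backend holds a pin.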

-- 
regards
Yura Sokolov aka funny-falcon


