Re: Add 64-bit XIDs into PostgreSQL 15 - Mailing list pgsql-hackers

From Evgeny Voropaev
Subject Re: Add 64-bit XIDs into PostgreSQL 15
Date
Msg-id 4eb56320-744e-49ba-b766-702bc2fb61a8@tantorlabs.com
In response to Re: Add 64-bit XIDs into PostgreSQL 15  (Aleksander Alekseev <aleksander@timescale.com>)
List pgsql-hackers
Hello, hackers!

Unfortunately, the inconsistency caused by running prune_freeze with 
repairFragmentation=false is not limited to the content of dead and 
unused tuples; it can also make the locations of live tuples 
inconsistent.

This case appears in the logic of heap_insert. See the attached figure. 
When heap_insert determines that the new tuple is the only one on the 
page, it sets XLOG_HEAP_INIT_PAGE in the WAL record. As a result, the 
“redo” side initializes a new page and inserts the new tuple into it 
instead of inserting the new tuple into the existing page.

So, we have the following situation in the xid64 patch.

Do-side:
1. We have page ABC with several tuples.
2. Insertion of a new tuple starts.
    2.1. If xid_base is inappropriate, we try to fit the base.
        2.1.1. Tuples are frozen and pruned without subsequently 
        repairing fragmentation.
        2.1.2. All tuples have been pruned (from this moment there are 
        no live tuples on the page).
3. The new tuple is inserted and XLOG_HEAP_INIT_PAGE is set, on the 
assumption that it is the only tuple and is located at the bottom of the 
page (that is, assuming that fragmentation has been repaired).


Result: We have the ABC page with the new tuple inserted somewhere in 
the MIDDLE of the page and surrounded by garbage from dead and unused 
tuples. At the same time, we have an xlog record carrying the 
XLOG_HEAP_INIT_PAGE bit.

Redo-side:
1. Observing XLOG_HEAP_INIT_PAGE
2. Creating a new page and inserting the new tuple into the first 
position of the page.

Result: We have the ABC page with the new tuple inserted at the BOTTOM 
of the page.

This example of inconsistency is not about the content of the tuple but 
about the tuple's location on the page. And tuple offsets are not 
subject to masking by the standard masking procedure.
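
To make the divergence described above concrete, here is a minimal toy 
model of it. It is only a sketch, not PostgreSQL code: the Page class, 
do_side and redo_side are hypothetical names, the page has no header or 
special area, tuple alignment is ignored, and pruning is assumed to 
leave an empty line pointer array, as in the scenario above.

```python
PAGE_SIZE = 8192  # toy page size; header and special area are ignored


class Page:
    """Toy heap page: tuple data grows from the end of the page downward."""

    def __init__(self):
        self.upper = PAGE_SIZE  # lowest byte used by tuple data
        self.items = []         # line pointers as (offset, length)

    def add_tuple(self, length):
        self.upper -= length
        self.items.append((self.upper, length))
        return len(self.items)  # 1-based offset number of the new tuple

    def prune_all(self, repair_fragmentation):
        self.items = []  # assumption: every tuple is pruned away
        if repair_fragmentation:
            self.upper = PAGE_SIZE  # compaction reclaims the tuple area


def do_side(repair_fragmentation, tuple_len=100):
    page = Page()
    for _ in range(3):
        page.add_tuple(tuple_len)
    page.prune_all(repair_fragmentation)
    offnum = page.add_tuple(tuple_len)
    # Same idea as the heap_insert check: the new tuple is the only one.
    init_page = offnum == 1 and len(page.items) == 1
    return page, init_page


def redo_side(tuple_len=100):
    # The init-page flag makes redo start from a freshly initialized page.
    page = Page()
    page.add_tuple(tuple_len)
    return page


do_page, init_page = do_side(repair_fragmentation=False)
redo_page = redo_side()
print(init_page, do_page.items, redo_page.items)
```

In this model the do-side line pointer is (7792, 100) while the 
redo-side one is (8092, 100), although the init-page flag was set. With 
repair_fragmentation=True both sides end up with the same line pointer.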

A possible fix could look like the one in the attachment. But what I’m 
really suggesting is to adhere to the original PG implementation: 
perform prune_freeze only under a buffer cleanup lock and fully exclude 
repairFragmentation=false as harmful!
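
For illustration only, one conceivable guard on the do-side is sketched 
below. This is not necessarily what the attached patch does, and it is 
not the approach recommended above. The function may_set_init_page and 
all of its parameters are hypothetical; data_end stands for the end of 
the tuple data area (the start of the special area, if the page has 
one).

```python
def may_set_init_page(offnum, max_offnum, tuple_off, tuple_len, data_end):
    """Allow the init-page flag only if redo would rebuild the same layout."""
    # The new tuple must be the only line pointer on the page ...
    only_tuple = offnum == 1 and max_offnum == 1
    # ... and it must sit exactly where a freshly initialized page would
    # place its first tuple, i.e. flush against the end of the data area.
    at_bottom = tuple_off + tuple_len == data_end
    return only_tuple and at_bottom


print(may_set_init_page(1, 1, 8092, 100, 8192))  # tuple at the bottom
print(may_set_init_page(1, 1, 7792, 100, 8192))  # tuple in the middle
```

Such a check only avoids emitting a misleading flag; it does nothing 
about the garbage left on the page by pruning without repairing 
fragmentation.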

Best regards,
Evgeny Voropaev,
Tantor Labs, LLC.
Attachment
