Hello, hackers!
Unfortunately, the inconsistency problem that arises when prune_freeze
is used with repairFragmentation=false does not pertain only to the
content of dead and unused tuples; it can also make the locations of
live tuples inconsistent.
This case arises in the logic of heap_insert (see the attached figure).
When heap_insert determines that a new tuple is the only one on a page,
it sets the XLOG_HEAP_INIT_PAGE flag and, as a result, the “redo” side
initializes a new page and inserts the new tuple into this new page
instead of inserting it into the existing page.
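To make the two rules concrete, here is a minimal toy model in C. It is not PostgreSQL code: `ToyPage`, `toy_wants_init_page`, `toy_redo_insert` and the sizes are hypothetical. The predicate follows, as I understand it, the check in heap_insert (the new tuple got FirstOffsetNumber, and that is also the page's maximum offset). The redo function shows that, once the flag is set, whatever the old page held is ignored and the tuple is placed right at the end of a fresh page.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define TOY_PAGE_SIZE   8192   /* hypothetical page size */
#define TOY_HEADER_SIZE 24     /* hypothetical page header size */
#define TOY_LP_SIZE     4      /* hypothetical size of one line pointer */

/* Only the page fields that matter for this decision. */
typedef struct ToyPage
{
    uint16_t pd_lower;   /* end of the line pointer array */
    uint16_t pd_upper;   /* start of tuple data */
    uint16_t nitems;     /* max offset number on the page */
} ToyPage;

/* Do side: the new tuple got offset 1 and offset 1 is also the page's max
 * offset, i.e. the tuple is the only item on the page. */
static bool toy_wants_init_page(const ToyPage *page, uint16_t new_offnum)
{
    return new_offnum == 1 && page->nitems == 1;
}

/* Redo side: with init_page set, the existing page image is discarded and a
 * fresh page is built first; otherwise the tuple is added to the page as it
 * is. Returns the offset at which the tuple is stored. */
static uint16_t toy_redo_insert(ToyPage *page, bool init_page, uint16_t len)
{
    if (init_page)
    {
        page->pd_lower = TOY_HEADER_SIZE;
        page->pd_upper = TOY_PAGE_SIZE;
        page->nitems = 0;
    }
    assert(page->pd_upper - len >= page->pd_lower + TOY_LP_SIZE);
    page->pd_upper -= len;
    page->pd_lower += TOY_LP_SIZE;
    page->nitems++;
    return page->pd_upper;
}
```

Without the flag, replaying a 64-byte insert on a page whose pd_upper is 7552 yields offset 7488; with the flag, the same call yields 8128 no matter what the page held before.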
So, in the xid64 patch we get the following situation.
Do-side:
1. We have page ABC with several tuples.
2. We start inserting a new tuple.
2.1. If the page's xid_base is inappropriate, we try to fit it:
2.1.1. Tuples are frozen and pruned without subsequently repairing
fragmentation.
2.1.2. All tuples have been pruned (from this moment on, there are no
live tuples on the page).
3. The new tuple is inserted and XLOG_HEAP_INIT_PAGE is set, on the
assumption that the tuple is the only one on the page and is located at
the bottom of the page (i.e., assuming that fragmentation has been
repaired).
Result: We have page ABC with the new tuple inserted somewhere in the
MIDDLE of the page and surrounded by garbage left over from dead and
unused tuples. At the same time, we have an xlog record carrying the
XLOG_HEAP_INIT_PAGE bit.
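The do-side steps can be replayed on a toy page that also has tuple bytes and a line pointer array. This is a sketch under stated assumptions: the names are hypothetical, tuples are equal-sized, and pruning without repair is modeled as emptying the line pointer array while leaving pd_upper and the dead tuple bytes untouched, so the new tuple gets offset 1 and is the only item, as in the case above.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

#define TOY_PAGE_SIZE   8192   /* hypothetical page size */
#define TOY_HEADER_SIZE 24     /* hypothetical page header size */
#define TOY_LP_SIZE     4      /* hypothetical size of one line pointer */
#define TOY_MAX_ITEMS   64

typedef struct ToyPage
{
    uint16_t pd_lower;                 /* end of the line pointer array */
    uint16_t pd_upper;                 /* start of tuple data */
    uint16_t nitems;                   /* number of line pointers (max offset) */
    uint16_t lp_off[TOY_MAX_ITEMS];    /* tuple offsets; 0 marks unused */
    unsigned char data[TOY_PAGE_SIZE]; /* tuple bytes, garbage included */
} ToyPage;

/* Append a tuple just below pd_upper; returns its 1-based offset number. */
static uint16_t toy_add_item(ToyPage *p, unsigned char fill, uint16_t len)
{
    assert(p->nitems < TOY_MAX_ITEMS);
    assert(p->pd_upper - len >= p->pd_lower + TOY_LP_SIZE);
    p->pd_upper -= len;
    memset(p->data + p->pd_upper, fill, len);
    p->lp_off[p->nitems++] = p->pd_upper;
    p->pd_lower += TOY_LP_SIZE;
    return p->nitems;
}

/* Steps 2.1.1-2.1.2: all tuples are dead. Pruning without repairing
 * fragmentation empties the line pointer array but leaves pd_upper and the
 * dead tuple bytes in place (toy assumption). */
static void toy_prune_all_without_repair(ToyPage *p)
{
    memset(p->lp_off, 0, sizeof(p->lp_off));
    p->nitems = 0;
    p->pd_lower = TOY_HEADER_SIZE;
}

/* Do side, steps 1-3. Returns the new tuple's offset in the page and reports
 * whether the XLOG_HEAP_INIT_PAGE condition holds for it. */
static uint16_t toy_do_side(ToyPage *p, int ntuples, uint16_t len, bool *init_page)
{
    memset(p, 0, sizeof(*p));
    p->pd_lower = TOY_HEADER_SIZE;
    p->pd_upper = TOY_PAGE_SIZE;
    for (int i = 0; i < ntuples; i++)               /* step 1: page ABC */
        toy_add_item(p, (unsigned char) ('A' + i % 26), len);
    toy_prune_all_without_repair(p);                /* steps 2.1.1-2.1.2 */
    uint16_t offnum = toy_add_item(p, 'N', len);    /* step 3 */
    *init_page = (offnum == 1 && p->nitems == 1);
    return p->lp_off[offnum - 1];
}
```

With ten 64-byte tuples, `toy_do_side(&page, 10, 64, &init_page)` places the new tuple at offset 7488, just below 640 bytes of leftover dead-tuple data, while `init_page` still comes out true.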
Redo-side:
1. XLOG_HEAP_INIT_PAGE is observed.
2. A new page is created, and the new tuple is inserted into the first
position of the page.
Result: We have page ABC with the new tuple inserted at the BOTTOM of
the page.
This inconsistency concerns not the content of the tuple but its
location on the page, and tuple offsets are not subject to masking by
the standard masking procedure.
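The masking point can be illustrated with a deliberately generous toy mask: it zeroes every byte outside the live tuple, dead-tuple garbage included, but leaves pd_upper and the line pointer offset alone. This is a hypothetical stand-in, not the actual heap masking routine; `ToyPage`, `toy_build`, `toy_mask` and `toy_equal` are illustration names. Two pages that differ only in garbage compare equal after masking, while a page with the tuple in the middle and one with it at the bottom still differ.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

#define TOY_PAGE_SIZE   8192   /* hypothetical page size */
#define TOY_HEADER_SIZE 24     /* hypothetical page header size */
#define TOY_LP_SIZE     4      /* hypothetical size of one line pointer */

/* Toy page holding exactly one live tuple (the case in question). */
typedef struct ToyPage
{
    uint16_t pd_lower;
    uint16_t pd_upper;
    uint16_t lp_off;                   /* offset of the single live tuple */
    uint16_t lp_len;
    unsigned char data[TOY_PAGE_SIZE];
} ToyPage;

/* Build a page whose only live tuple starts at tuple_off. Bytes between the
 * tuple and the end of the page are filled with `garbage` to mimic dead
 * tuples left behind by pruning without repair. */
static void toy_build(ToyPage *p, uint16_t tuple_off, uint16_t len, unsigned char garbage)
{
    memset(p, 0, sizeof(*p));
    p->pd_lower = TOY_HEADER_SIZE + TOY_LP_SIZE;
    p->pd_upper = tuple_off;
    p->lp_off = tuple_off;
    p->lp_len = len;
    memset(p->data + tuple_off, 'N', len);
    if (tuple_off + len < TOY_PAGE_SIZE)
        memset(p->data + tuple_off + len, garbage, TOY_PAGE_SIZE - tuple_off - len);
}

/* Content-only mask: zero every data byte that is not part of the live
 * tuple. Header fields and the line pointer offset are not masked. */
static void toy_mask(ToyPage *p)
{
    for (int i = 0; i < TOY_PAGE_SIZE; i++)
        if (i < p->lp_off || i >= p->lp_off + p->lp_len)
            p->data[i] = 0;
}

/* Compare two (masked) pages field by field. */
static bool toy_equal(const ToyPage *a, const ToyPage *b)
{
    return a->pd_lower == b->pd_lower && a->pd_upper == b->pd_upper &&
           a->lp_off == b->lp_off && a->lp_len == b->lp_len &&
           memcmp(a->data, b->data, TOY_PAGE_SIZE) == 0;
}
```

Even if such a mask could hide all the garbage, the 64-byte tuple at 7488 on one side and at 8128 on the other keeps the pages different.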
A possible fix could look like the one in the attachment. But what I am
really suggesting is to adhere to the original PostgreSQL implementation:
perform prune_freeze only under a buffer cleanup lock, and eliminate
repairFragmentation=false entirely as a flawed practice!
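The argument for always repairing fragmentation can be checked with plain offset arithmetic. In this toy model (hypothetical names, a fixed 8192-byte page, equal-sized tuples, line pointers ignored), repairing fragmentation on a page with no live tuples simply resets pd_upper to the end of the page, so the do-side tuple lands where an XLOG_HEAP_INIT_PAGE replay puts it. It models neither the buffer cleanup lock nor the fix in the attachment; it only shows the location argument.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define TOY_PAGE_SIZE 8192   /* hypothetical page size */

/* Toy do side: ndead dead tuples of len bytes are pruned, optionally with
 * fragmentation repair, then one new tuple of len bytes is inserted.
 * Returns the new tuple's offset in the page. */
static uint16_t toy_do_side_offset(int ndead, uint16_t len, bool repair)
{
    assert(ndead >= 0 && (ndead + 1) * len <= TOY_PAGE_SIZE);
    /* without repair, the dead tuples' bytes still occupy the page's end */
    uint16_t pd_upper = (uint16_t) (TOY_PAGE_SIZE - ndead * len);
    if (repair)
        pd_upper = TOY_PAGE_SIZE;   /* no live tuples: all space reclaimed */
    return (uint16_t) (pd_upper - len);
}

/* Toy redo side for a record carrying XLOG_HEAP_INIT_PAGE: fresh page. */
static uint16_t toy_redo_offset(uint16_t len)
{
    return (uint16_t) (TOY_PAGE_SIZE - len);
}
```

With no dead tuples the two sides agree even without repair; the mismatch appears only when pruning leaves garbage behind and pd_upper is not reset.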
Best regards,
Evgeny Voropaev,
Tantor Labs, LLC.