Re: Add 64-bit XIDs into PostgreSQL 15 - Mailing list pgsql-hackers
From: Yura Sokolov
Subject: Re: Add 64-bit XIDs into PostgreSQL 15
Date:
Msg-id: 19b9a166-dfa5-4c18-8a42-c60c45d4d5f5@postgrespro.ru
In response to: Re: Add 64-bit XIDs into PostgreSQL 15 (Aleksander Alekseev <aleksander@timescale.com>)
List: pgsql-hackers
07.07.2025 11:17, Evgeny Voropaev wrote:
> Hello, hackers!
>
> Unfortunately, the problem of inconsistency when using prune_freeze
> with repairFragmentation=false does not only pertain to the content of
> dead and unused tuples; it can also bring about inconsistency in the
> locations of live tuples.
>
> This case appears in the logic of heap_insert. See the attached figure.
> When heap_insert determines that a new tuple is the only one on a page,
> it sets XLOG_HEAP_INIT_PAGE and, as a result, the redo side
> initializes a new page and inserts the new tuple onto that new page
> instead of inserting it onto the existing page.
>
> So, we have the following situation in the xid64 patch.
>
> Do-side:
> 1. Having page ABC with several tuples.
> 2. Starting to perform insertion of a new tuple.
> 2.1. In the case of an inappropriate xid_base, trying to fit the base.
> 2.1.1. Freezing and pruning tuples without subsequently repairing
>        fragmentation.
> 2.1.2. All tuples have been pruned (no live tuples on the page
>        since this moment).
> 3. Inserting a new tuple and setting XLOG_HEAP_INIT_PAGE, assuming that
>    the only tuple is located at the bottom of the page (assuming that
>    defragmentation has been performed).
>
> Result: We have the ABC page with the new tuple inserted somewhere in
> the MIDDLE of the page and surrounded by garbage from dead and unused
> tuples. At the same time we have an xlog record carrying the
> XLOG_HEAP_INIT_PAGE bit.
>
> Redo-side:
> 1. Observing XLOG_HEAP_INIT_PAGE.
> 2. Creating a new page and inserting the new tuple into the first
>    position of the page.
>
> Result: We have the ABC page with the new tuple inserted at the BOTTOM
> of the page.
>
> This example of inconsistency is not about the content of the tuple but
> about the tuples' locations on the page. And tuple offsets are not
> subject to masking by the standard masking procedure.

Wow, it is a really great bug.

> The possible fix can be like the one in the attachment.
> But what I'm trying to suggest is adhering to the original realization
> of PG, performing prune_freeze only under a buffer cleanup lock, and
> fully excluding repairFragmentation=false as a vice!

Not setting XLOG_HEAP_INIT_PAGE, like in your patch, looks like the right
way to me.

"Only under a buffer cleanup lock" means no other backend may have a pin
on this page. This greatly increases the rate of failure on update or
delete of a tuple on this page. For insert there could be a workaround,
but still it would be quite trivial. Therefore, "fully excluding
repairFragmentation=false" is not wise.

But... probably we could try to acquire the cleanup lock in order to
perform prune_freeze with repairFragmentation=true, and fall back to
repairFragmentation=false if the acquisition fails.

Still, I repeat: excluding repairFragmentation=false completely doesn't
look feasible to me.

--
regards
Yura Sokolov aka funny-falcon