Re: XID formatting and SLRU refactorings (was: Add 64-bit XIDs into PostgreSQL 15) - Mailing list pgsql-hackers
From | Aleksander Alekseev |
---|---|
Subject | Re: XID formatting and SLRU refactorings (was: Add 64-bit XIDs into PostgreSQL 15) |
Date | |
Msg-id | CAJ7c6TNik98PyeYgdFBxujbA8jJXPxnHYy4qWLovc9X-MFT5bA@mail.gmail.com Whole thread Raw |
In response to | Re: XID formatting and SLRU refactorings (was: Add 64-bit XIDs into PostgreSQL 15) (Andrey Borodin <amborodin86@gmail.com>) |
List | pgsql-hackers |
Hi Andrey, > Hi! I think that 64-bit xids are a very important feature and I want > to help advance it. That's why I want to try to understand a patch > better. Thanks for your interest to the patchset! > Do I get it right that the proposed v51 patchset only changes the SLRU > filenames and type of pageno representation? Is SLRU wraparound still > exactly there after 0xFFFFFFFF byte? OK, let me give some background then. I suspect you already know this, but this can be useful for other reviewers. Additionally we have two rather large threads on our hands and it's easy to lose track of things. SLRU is basically a general-purpose LRU implementation with ReadPage() / WritePage() interface with the only exception that instead of something like Page* object it operates slot numbers (array indexes). SLRU is used as an underlying container for several internal PostgreSQL structures, most importantly CLOG. Despite the name CLOG is not a log (journal) but rather a large bit array. For every transaction it stores two bits that reflect the status of the transaction (more detail in clog.c / clog.h). Currently SLRU operates 32-bit page numbers. What we currently agreed on [1] and what we are trying to achieve in this thread is to make SLRU pages 64-bit. The rest of the 64-bit XIDs is discussed in another thread [2]. As Robert Haas put it: > Specifically, there are a couple of patches in here that > have to do with making SLRUs indexed by 64-bit integers rather than by > 32-bit integers. We've had repeated bugs in the area of handling SLRU > wraparound in the past, some of which have caused data loss. Just by > chance, I ran across a situation just yesterday where an SLRU wrapped > around on disk for reasons that I don't really understand yet and > chaos ensued. Switching to an indexing system for SLRUs that does not > ever wrap around would probably enable us to get rid of a whole bunch > of crufty code, and would also likely improve the general reliability > of the system in situations where wraparound is threatened. It seems > like a really, really good idea. So our goal here is to eliminate wrap-around for SLRU. It means that if I save something to the page 0x0000000012345678 it will stay there forever. Other parts of the system however have to form proper 64-bit page numbers in order to make it work. If they don't the wrap-around is possible for these particular subsystems (but not SLRU per se). > Also, I do not understand what is the reason for splitting 1st and 2nd > steps. Certainly, there must be some logic behind it, but I just can't > grasp it... 0001 patch changes the SLRU internals without affecting the callers. It also preserves the short SLRU filenames which means nothing changes for an outside observer. All it changes is PostgreSQL binary. It can be merged any time and even backported to the previous versions if we want to. The 0002 patch makes changes to the callers and also enlarges SLRU filenames. For sure we could do everything at once, but it would complicate testing and more importantly code review. Personally I believe Maxim did a great job here. Both patches were easy to read and understand (relatively, of course). > And the purpose of the 3rd step with pg_upgrade changes is a complete > mystery for me. 0001 and 0002 will work fine for new PostgreSQL instances. But if you have an instance that already has on-disk state we have to move the SLRU segments accordingly. This is what 0003 does. That's the theory at least. Personally I still have to meditate a bit more on the code in order to get a good understanding of it, especially the parts that deal with transaction epochs because this is something I have limited experience with. Also I wouldn't exclude the possibility of bugs. Particularly this part of 0003: ``` + oldseg.segno = pageno / SLRU_PAGES_PER_SEGMENT; + oldseg.pageno = pageno % SLRU_PAGES_PER_SEGMENT; + + newseg.segno = pageno / SLRU_PAGES_PER_SEGMENT; + newseg.pageno = pageno % SLRU_PAGES_PER_SEGMENT; ``` looks suspicious to me. I agree that adding a couple of additional comments could be appropriate, especially when it comes to epochs. [1]: https://www.postgresql.org/message-id/CA%2BTgmoZFmTGjgkmjgkcm2-vQq3_TzcoMKmVimvQLx9oJLbye0Q%40mail.gmail.com [2]: https://www.postgresql.org/message-id/CACG%3DezZe1NQSCnfHOr78AtAZxJZeCvxrts0ygrxYwe%3DpyyjVWA%40mail.gmail.com -- Best regards, Aleksander Alekseev
pgsql-hackers by date: