Re: IPC/MultixactCreation on the Standby server - Mailing list pgsql-hackers
| From | Andrey Borodin |
|---|---|
| Subject | Re: IPC/MultixactCreation on the Standby server |
| Date | |
| Msg-id | BCE7AEBD-3737-4459-B6CE-4271F3108CB6@yandex-team.ru Whole thread Raw |
| In response to | Re: IPC/MultixactCreation on the Standby server (Heikki Linnakangas <hlinnaka@iki.fi>) |
| Responses |
Re: IPC/MultixactCreation on the Standby server
|
| List | pgsql-hackers |
> On 28 Nov 2025, at 21:21, Heikki Linnakangas <hlinnaka@iki.fi> wrote: > > On 27/11/2025 20:25, Andrey Borodin wrote: >>> On 27 Nov 2025, at 01:59, Heikki Linnakangas <hlinnaka@iki.fi> wrote: >>> >>> This is because the WAL, created with old version, contains records like this: >>> >>> lsn: 0/05561030, prev 0/05561008, desc: CREATE_ID 2047 offset 4093 nmembers 2: 2830 (keysh) 2831 (keysh) >>> lsn: 0/055611A8, prev 0/05561180, desc: ZERO_OFF_PAGE 1 >>> lsn: 0/055611D0, prev 0/055611A8, desc: CREATE_ID 2048 offset 4095 nmembers 2: 2831 (keysh) 2832 (keysh) >> That's an interesting case. I don't see how SLRU interface can be >> used to test if SLRU page is initialized correctly. We need a >> version of SimpleLruReadPage() that can avoid failure if page does >> not exist yet, and this must not be more expensive than current >> SimpleLruReadPage(). Alternatively we need new >> XLOG_MULTIXACT_CREATE_ID_2 in back branches. > > There is SimpleLruDoesPhysicalPageExist() to check if a page has been initialized. There's also the 'latest_page_number'field which tracks what is the latest page that has been initialized. During working on v8 of this patch I observed races between SimpleLruDoesPhysicalPageExist() and ExtendMultiXactOffset().Because ExtendMultiXactOffset() may initialize page only in a buffer, and file system call will notobserve it. latest_page_number looks great! though I'm slightly worried by this commend: /* * During XLOG replay, latest_page_number isn't necessarily set up * yet; insert a suitable value to bypass the sanity test in * SimpleLruTruncate. */ and by /* * latest_page_number is the page number of the current end of the log; * this is __not critical data__, since we use it only to avoid swapping out * the latest page. */ If we use it in startup process to prevent failure, now it is critical data. But I think the comments are off, latest_page_number is updated precisely. The page initialization dance is only needed in back branches. And we inevitable will have conflicts with SLRU refactoringin 18 and banking in 17. Conceptually v12 looks good to me, I can prepare backport versions. You definitely beat LLM in commit message clarity. Though, s/coudl/could/g. > On 28 Nov 2025, at 21:21, Heikki Linnakangas <hlinnaka@iki.fi> wrote: > >>> - I reverted the changes to ExtendMultiXactOffset(), so that it >>> deals with wraparound and FirstMultiXactId the same way as before. >>> The caller never passes FirstMultiXactId, but the changed comments >>> and the assertion were confusing, so I felt it's best to just >>> leave it alone >> Maybe move a decision (to extend or not to extend) out of this >> function? Now its purpose is "MaybeExtendMultiXactOffset". And >> there's just one caller. > > Perhaps. Let's leave that for a separate non-backported patch though. OK, we just have to be sure that we do not emit 2 ZERO_OFF_PAGE records during wraparound. The problem seems to be introduced by 1986ca5 as a fix for a problem that might be there even longer. Perhaps, since multitransactionsexisted. Interestingly the only know to me report was by Thom Brown in 2024 (besides this thread). https://www.postgresql.org/message-id/CAA-aLv7AcTh%2BO9rnoPVbck%3DFw8atMOYrUttk5G9ctFhXOsqQbw%40mail.gmail.com Thanks! Best regards, Andrey Borodin.
pgsql-hackers by date: