Re: Improve read_local_xlog_page_guts by replacing polling with latch-based waiting - Mailing list pgsql-hackers
From | Xuneng Zhou |
---|---|
Subject | Re: Improve read_local_xlog_page_guts by replacing polling with latch-based waiting |
Date | |
Msg-id | CABPTF7WAQhp+=+JXOKGmy_MAz0WG1N9A+xCu2z83ZrmFoSrqpQ@mail.gmail.com Whole thread Raw |
In response to | Re: Improve read_local_xlog_page_guts by replacing polling with latch-based waiting (Michael Paquier <michael@paquier.xyz>) |
List | pgsql-hackers |
Hi Michael, Thanks again for your insightful review. On Mon, Oct 6, 2025 at 10:43 AM Michael Paquier <michael@paquier.xyz> wrote: > > On Sat, Oct 04, 2025 at 03:21:07PM +0800, Xuneng Zhou wrote: > > v6 refactors and extends the infrastructure from the WAIT FOR command > > patch, applying it to read_local_xlog_page_guts. I'm also thinking of > > creating a standalone patch/commit for the extended > > infra in xlogwait, so it can be reused in different threads. > > Yes, I think that you should split your patch where you think that it > can make review easier, because your change touches a very sensitive > area of the code base: > - First patch tointroduce what you consider is the basic > infrastructure required for your patch, that can be shared between > multiple pieces. I doubt that you really need to have everything's > that in waitlsn.c to achieve what you want here. > - Second patch to introduce your actual feature, to make the callback > more responsive. > - Then, potentially have a third patch, that adds pieces of > infrastructure to waitlsn.c that you did not need in the first patch, > still are required for the waitlsn.c thread. It would be optionally > possible to rebase the waitlsn patch to use patches 1 and 3, then. > > I'd even try to consider the problem from the angle of looking for > independent pieces that could be extracted from the first patch and > split as other patches, to ease even again the review. There is a > limit to this idea because you need a push/pull/reporting facility for > a flush LSN and a replay LSN depending on if you are on a primary, on > a standby, and even another case where you are dealing with a promoted > standby where you decide to loop back *inside* the callback (which I > suspect may not be always the right thing to do depending on the new > TLI selected), so there is a limit in what could be treated as an > independent piece. At least the bits about pairingheap_initialize() > may be worth considering. +1 for further split to smooth the review process. The timeout in wait for patch is not needed for the polling problem in the current thread. I'll remove other unused mechanisms as well. Yeh, just looping back *inside* the callback could be problematic if the wait for LSNs don't exist on the current timeline. I'll add a check for it. The new patch set will look like this per your suggestion: Patch 0: pairingheap infrastructure (independent) src/backend/lib/pairingheap.c | +14 -4 src/include/lib/pairingheap.h | +3 Adds pairingheap_initialize() for shared memory usage. Patch 1: Minimal LSN waiting infrastructure src/backend/access/transam/xlogwait.c | (simplified, no timeout…) src/include/access/xlogwait.h | +80 src/backend/storage/ipc/ipci.c | +3 src/include/storage/lwlocklist.h | +1 src/backend/utils/activity/wait_event... | +3 src/backend/access/transam/xact.c | +6 src/backend/storage/lmgr/proc.c | +6 Provides WaitForLSNReplay() and WaitForLSNFlush() for internal WAL consumers. Patch 2: Replace polling in read_local_xlog_page_guts src/backend/access/transam/xlogutils.c | +40 -5 src/backend/access/transam/xlog.c | +10 src/backend/access/transam/xlogrecovery.c | +6 src/backend/replication/walsender.c | -4 Uses Patch 1 infrastructure to eliminate busy-waiting. Patch 3 Extend LSN waiting infrastructure that WAIT FOR needs Patch Wait for command based on Patch 1 and 3 SQL interface, full error handling. > > + if (waitLSNState && > + (XLogRecoveryCtl->lastReplayedEndRecPtr >= > + pg_atomic_read_u64(&waitLSNState->minWaitedReplayLSN))) > + WaitLSNWakeupReplay(XLogRecoveryCtl->lastReplayedEndRecPtr); > > This code pattern looks like a copy-paste of what's done in > synchronous replication. Has some c between syncrep.c and > this kind of facility ever been considered? In terms of queues, waits > and wakeups, the requirements are pretty similar, still your patch has > zero changes related to syncrep.c or syncrep.h. > I'm not aware of this before; they do share some basic requirements here. I'll explore the possibility of consolidating them. > As far as I can see based on your patch, you are repeating some of the > mistakes of the wait LSN patch, where I've complained about > WaitForLSNReplay() and the duplication it had. One thing you have > decided to pull for example is duplicated calls to > GetXLogReplayRecPtr(). > -- Will refactor this. Best, Xuneng
pgsql-hackers by date: