Re: Improve read_local_xlog_page_guts by replacing polling with latch-based waiting - Mailing list pgsql-hackers
From | Xuneng Zhou |
---|---|
Subject | Re: Improve read_local_xlog_page_guts by replacing polling with latch-based waiting |
Date | |
Msg-id | CABPTF7Vr99gZ5GM_ZYbYnd9MMnoVW3pukBEviVoHKRvJW-dE3g@mail.gmail.com Whole thread Raw |
In response to | Improve read_local_xlog_page_guts by replacing polling with latch-based waiting (Xuneng Zhou <xunengzhou@gmail.com>) |
Responses |
Re: Improve read_local_xlog_page_guts by replacing polling with latch-based waiting
|
List | pgsql-hackers |
Hi, Attached the wrong patch v1-0001-Improve-read_local_xlog_page_guts-by-replacing-po.patch. The correct one is attached again. On Wed, Aug 27, 2025 at 11:23 PM Xuneng Zhou <xunengzhou@gmail.com> wrote: > > Hi hackers, > > During a performance run [1], I observed heavy polling in > read_local_xlog_page_guts(). Heikki’s comment from a few months ago > suggests replacing the current check–sleep–repeat loop with the > condition-variable (CV) infrastructure used by the walsender: > > 1) Problem and Background > /* > * Loop waiting for xlog to be available if necessary > * > * TODO: The walsender has its own version of this function, which uses a > * condition variable to wake up whenever WAL is flushed. We could use the > * same infrastructure here, instead of the check/sleep/repeat style of > * loop. > */ > > Because read_local_xlog_page_guts() waits for a specific flush or > replay LSN, polling becomes inefficient when waits are long. I built a > POC patch that swaps polling for CVs, but a single global CV (or even > separate “flush” and “replay” CVs) isn’t ideal: > • The wake-up routines don’t know which LSN each waiter cares about, > so they would need to broadcast on every flush/replay. > > • Caching the minimum outstanding target LSN could reduce spurious > wake-ups but won’t eliminate them when multiple backends wait for > different LSNs simultaneously. > > • The walsender accepts some broadcast overhead via two CVs for > different waiters. A more precise approach would require a request > queue that maps waiters to target LSNs and issues targeted > wake-ups—adding complexity. > > 2) Proposal > I came across the thread “Implement waiting for WAL LSN replay: > reloaded” [2] by Alexander. The “Implement WAIT FOR” patch in that > thread provides a well-established infrastructure for waiting on WAL > replay in backends. With modest adjustments, it could be generalized. > > Main changes in patch v1 Improve read_local_xlog_page_guts by replacing polling > with latch-based waiting: > • Introduce WaitForLSNFlush, analogous to WaitForLSNReplay from the > “WAIT FOR” work. > > • Replace the busy-wait in read_local_xlog_page_guts() with > WaitForLSNFlush and WaitForLSNReplay. > > • Add wake-up calls in XLogFlush and XLogBackgroundFlush. > > Edge Case: Timeline Switch During Wait > /* > * Check which timeline to get the record from. > * > * We have to do it each time through the loop because if we're in > * recovery as a cascading standby, the current timeline might've > * become historical. We can't rely on RecoveryInProgress() because in > * a standby configuration like > * > * A => B => C > * > * if we're a logical decoding session on C, and B gets promoted, our > * timeline will change while we remain in recovery. > * > * We can't just keep reading from the old timeline as the last WAL > * archive in the timeline will get renamed to .partial by > * StartupXLOG(). > > read_local_xlog_page_guts() re-evaluates the active timeline on each > loop iteration because, on a cascading standby, the current timeline > can become historical. Once that happens, there’s no need to keep > waiting for that timeline. A timeline switch could therefore render an > in-progress wait unnecessary. > > One option is to add a wake-up at the point where the timeline switch > occurs, so waiting processes exit promptly. The current approach > chooses not to do this, given that most waits are short and timeline > changes in cascading standby are rare. Supporting timeline-switch > wake-ups would also require additional handling in both > WaitForLSNFlush and WaitForLSNReplay, increasing complexity. > > Comments and suggestions are welcome. > > [1] https://www.postgresql.org/message-id/CABPTF7VuFYm9TtA9vY8ZtS77qsT+yL_HtSDxUFnW3XsdB5b9ew@mail.gmail.com > [2] https://www.postgresql.org/message-id/flat/CAPpHfdsjtZLVzxjGT8rJHCYbM0D5dwkO%2BBBjcirozJ6nYbOW8Q%40mail.gmail.com > > Best, > Xuneng
Attachment
pgsql-hackers by date: