Re: Improve read_local_xlog_page_guts by replacing polling with latch-based waiting - Mailing list pgsql-hackers

From Xuneng Zhou
Subject Re: Improve read_local_xlog_page_guts by replacing polling with latch-based waiting
Msg-id CABPTF7W+tRUpLyYrNG6wr5urZTXBRYQ+i6Q06x9nrPj=i-+LgA@mail.gmail.com
In response to Re: Improve read_local_xlog_page_guts by replacing polling with latch-based waiting  (Xuneng Zhou <xunengzhou@gmail.com>)
List pgsql-hackers
Hi,

On Sun, Sep 28, 2025 at 9:47 PM Xuneng Zhou <xunengzhou@gmail.com> wrote:
>
> Hi,
>
> On Thu, Aug 28, 2025 at 4:22 PM Xuneng Zhou <xunengzhou@gmail.com> wrote:
> >
> > Hi,
> >
> > Some changes in v3:
> > 1) Update the note of xlogwait.c to reflect the extending of its use
> > for flush waiting and internal use for both flush and replay waiting.
> > 2) Update the comment above logical_read_xlog_page which describes the
> > prior-change behavior of read_local_xlog_page.
>
> In an off-list discussion, Alexander pointed out potential issues with
> the current single-heap design for replay and flush when promotion
> occurs concurrently with WAIT FOR. The following is a simple example
> illustrating the problem:
>
> During promotion, there's a window where we can have mixed waiter
> types in the same heap:
>
>   T1: Process A calls read_local_xlog_page_guts on standby
>   T2: RecoveryInProgress() = TRUE, adds to heap as replay waiter
>   T3: Promotion begins
>   T4: EndRecovery() calls WaitLSNWakeup(InvalidXLogRecPtr)
>   T5: SharedRecoveryState = RECOVERY_STATE_DONE
>   T6: Process B calls read_local_xlog_page_guts
>   T7: RecoveryInProgress() = FALSE, adds to SAME heap as flush waiter
>
> The problem is that replay LSNs and flush LSNs represent different
> positions in the WAL stream. Having both types in the same heap can
> lead to:
>   - Incorrect wakeup logic (comparing incomparable LSNs)
>   - Processes waiting forever
>   - Wrong waiters being woken up
>
> To avoid this problem, patch v4 is updated to utilize two separate
> heaps for flush and replay like Alexander suggested earlier.  It also
> introduces a new separate min LSN tracking field for flushing.
>

v5-0002 splits the waitlsn_cmp() comparator into two distinct
functions, waitlsn_replay_cmp() and waitlsn_flush_cmp(), used by the
replay and flush heaps, respectively.
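
For anyone following along, the effect of keeping replay and flush
waiters apart can be sketched roughly as below. This is a standalone
illustration and not code from the patch: WaitKind, WaitQueue,
wait_queue_add(), wait_queue_min_lsn() and wait_queue_wakeup() are
hypothetical names, XLogRecPtr is a local stand-in typedef, and a small
sorted array stands in for the real heap. The point is only that each
kind of waiter has its own ordering and its own minimum LSN, so a flush
wakeup never compares against a replay LSN.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Stand-in for PostgreSQL's XLogRecPtr (assumption for this sketch). */
typedef uint64_t XLogRecPtr;

/* Hypothetical waiter kinds: one queue per kind, never mixed. */
typedef enum { WAIT_REPLAY = 0, WAIT_FLUSH = 1, WAIT_NKINDS } WaitKind;

#define MAX_WAITERS 16

typedef struct
{
    int        pid;   /* illustrative process identifier */
    XLogRecPtr lsn;   /* LSN this process is waiting for */
} Waiter;

typedef struct
{
    Waiter items[MAX_WAITERS];  /* kept sorted, smallest LSN first */
    size_t count;
} WaitQueue;

/* One independent queue per waiter kind. */
static WaitQueue queues[WAIT_NKINDS];

/* Insert a waiter into the queue of its own kind, keeping it sorted. */
void
wait_queue_add(WaitKind kind, int pid, XLogRecPtr lsn)
{
    WaitQueue *q = &queues[kind];
    size_t     i;

    assert(q->count < MAX_WAITERS);
    i = q->count++;
    while (i > 0 && q->items[i - 1].lsn > lsn)
    {
        q->items[i] = q->items[i - 1];
        i--;
    }
    q->items[i].pid = pid;
    q->items[i].lsn = lsn;
}

/* Smallest awaited LSN for one kind, or UINT64_MAX if nobody waits. */
XLogRecPtr
wait_queue_min_lsn(WaitKind kind)
{
    const WaitQueue *q = &queues[kind];

    return q->count > 0 ? q->items[0].lsn : UINT64_MAX;
}

/*
 * Remove every waiter of the given kind whose LSN has been reached and
 * return how many were removed. Waiters of the other kind are untouched.
 */
size_t
wait_queue_wakeup(WaitKind kind, XLogRecPtr reached)
{
    WaitQueue *q = &queues[kind];
    size_t     woken = 0;
    size_t     i;

    while (woken < q->count && q->items[woken].lsn <= reached)
        woken++;
    for (i = woken; i < q->count; i++)
        q->items[i - woken] = q->items[i];
    q->count -= woken;
    return woken;
}
```

With this layout, a call such as wait_queue_wakeup(WAIT_FLUSH, lsn)
only ever inspects flush waiters, and the per-kind minimum returned by
wait_queue_min_lsn() plays the role of the separate min LSN tracking
mentioned above.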

Best,
Xuneng
