Re: Implement waiting for wal lsn replay: reloaded - Mailing list pgsql-hackers
| From | Xuneng Zhou |
|---|---|
| Subject | Re: Implement waiting for wal lsn replay: reloaded |
| Date | |
| Msg-id | CABPTF7U2cYN=bMZirqj93Zv-aqBdw4f=wPRwovTzWKP=adYhDg@mail.gmail.com |
| In response to | Re: Implement waiting for wal lsn replay: reloaded (Xuneng Zhou <xunengzhou@gmail.com>) |
| List | pgsql-hackers |
Hi,
On Tue, Dec 2, 2025 at 11:08 AM Xuneng Zhou <xunengzhou@gmail.com> wrote:
>
> Hi,
>
> On Mon, Dec 1, 2025 at 12:33 PM Xuneng Zhou <xunengzhou@gmail.com> wrote:
> >
> > Hi hackers,
> >
> > On Tue, Nov 25, 2025 at 7:51 PM Xuneng Zhou <xunengzhou@gmail.com> wrote:
> > >
> > > Hi!
> > >
> > > > > > > At the moment, the WAIT FOR LSN command supports only the replay mode.
> > > > > > > If we intend to extend its functionality more broadly, one option is
> > > > > > > to add a mode option or something similar. Are users expected to wait
> > > > > > > > for flush (or others) completion in such cases? If not, and the TAP
> > > > > > > test is the only intended use, this approach might be a bit of an
> > > > > > > overkill.
> > > > > >
> > > > > > > I would say that adding a mode parameter seems to be a pretty natural
> > > > > > > extension of what we have at the moment. I can imagine some
> > > > > > > clustering solution using it to wait for a certain transaction to be
> > > > > > > flushed at the replica (without delaying the commit at the primary).
> > > > > >
> > > > > > ------
> > > > > > Regards,
> > > > > > Alexander Korotkov
> > > > > > Supabase
> > > > >
> > > > > Makes sense. I'll play with it and try to prepare a follow-up patch.
> > > > >
> > > > > --
> > > > > Best,
> > > > > Xuneng
> > > >
> > > > In terms of extending the functionality of the command, I see two
> > > > possible approaches here. One is to keep mode as a mandatory keyword,
> > > > and the other is to introduce it as an option in the WITH clause.
> > > >
> > > > Syntax Option A: Mode in the WITH Clause
> > > >
> > > > WAIT FOR LSN '0/12345' WITH (mode = 'replay');
> > > > WAIT FOR LSN '0/12345' WITH (mode = 'flush');
> > > > WAIT FOR LSN '0/12345' WITH (mode = 'write');
> > > >
> > > > With this option, we can keep "replay" as the default mode. That means
> > > > existing TAP tests won’t need to be refactored unless they explicitly
> > > > want a different mode.
> > > >
> > > > Syntax Option B: Mode as Part of the Main Command
> > > >
> > > > WAIT FOR LSN '0/12345' MODE 'replay';
> > > > WAIT FOR LSN '0/12345' MODE 'flush';
> > > > WAIT FOR LSN '0/12345' MODE 'write';
> > > >
> > > > Or a more concise variant using keywords:
> > > >
> > > > WAIT FOR LSN '0/12345' REPLAY;
> > > > WAIT FOR LSN '0/12345' FLUSH;
> > > > WAIT FOR LSN '0/12345' WRITE;
> > > >
> > > > This option produces a cleaner syntax if the intent is simply to wait
> > > > for a particular LSN type, without specifying additional options like
> > > > timeout or no_throw.
> > > >
> > > > I don’t have a clear preference among them. I’d be interested to hear
> > > > what you or others think is the better direction.
> > > >
> > >
> > > I've implemented a patch that adds MODE support to WAIT FOR LSN.
> > >
> > > The new grammar looks like:
> > >
> > > -------
> > > WAIT FOR LSN '<lsn>' [MODE { REPLAY | WRITE | FLUSH }] [WITH (...)]
> > > -------
> > >
> > > Two modes added: flush and write
> > >
> > > Design decisions:
> > >
> > > 1. MODE as a separate keyword (not in the WITH clause) - This follows the
> > > pattern used by the LOCK command. It also makes the common case more
> > > concise.
> > >
> > > 2. REPLAY as the default - When MODE is not specified, it defaults to REPLAY.
> > >
> > > 3. Keywords rather than strings - Using `MODE WRITE` rather than `MODE 'write'`
> > >
> > > The patch set includes:
> > > -------
> > > 0001 - Extend xlogwait infrastructure with write and flush wait types
> > >
> > > Adds WAIT_LSN_TYPE_WRITE and WAIT_LSN_TYPE_FLUSH to WaitLSNType enum,
> > > along with corresponding wait events and pairing heaps. Introduces
> > > GetCurrentLSNForWaitType() to retrieve the appropriate LSN based on
> > > wait type, and adds wakeup calls in walreceiver for write/flush
> > > events.
> > >
> > > -------
> > > 0002 - Add pg_last_wal_write_lsn() SQL function
> > >
> > > Adds a new SQL function that returns the current WAL write position on
> > > a standby using GetWalRcvWriteRecPtr(). This complements existing
> > > pg_last_wal_receive_lsn() (flush) and pg_last_wal_replay_lsn()
> > > functions, enabling verification of WAIT FOR LSN MODE WRITE in TAP
> > > tests.
> > >
> > > -------
> > > 0003 - Add MODE parameter to WAIT FOR LSN command
> > >
> > > Extends the parser and executor to support the optional MODE
> > > parameter. Updates documentation with new syntax and mode
> > > descriptions. Adds TAP tests covering all three modes including
> > > mixed-mode concurrent waiters.
> > >
> > > -------
> > > 0004 - Add tab completion for WAIT FOR LSN MODE parameter
> > >
> > > Adds psql tab completion support: completes MODE after LSN value,
> > > completes REPLAY/WRITE/FLUSH after MODE keyword, and completes WITH
> > > after mode selection.
> > >
> > > -------
> > > 0005 - Use WAIT FOR LSN in PostgreSQL::Test::Cluster::wait_for_catchup()
> > >
> > > Replaces polling-based wait_for_catchup() with WAIT FOR LSN when the
> > > target is a standby in recovery, improving test efficiency by avoiding
> > > repeated queries.
> > >
> > > The WRITE and FLUSH modes enable scenarios where applications need to
> > > ensure WAL has been received or persisted on the standby without
> > > waiting for replay to complete.
> > >
> > > Feedback welcome.
> > >
> >
> > Here is the updated v2 patch set. Most of the updates are in patch 3.
> >
> > Changes from v1:
> >
> > Patch 1 (Extend wait types in xlogwait infra)
> > - Renamed enum values for consistency (WAIT_LSN_TYPE_REPLAY →
> > WAIT_LSN_TYPE_REPLAY_STANDBY, etc.)
> >
> > Patch 2 (pg_last_wal_write_lsn):
> > - Clarified documentation and comment
> > - Improved pg_proc.dat description
> >
> > Patch 3 (MODE parameter):
> > - Replaced direct cast with explicit switch statement for WaitLSNMode
> > → WaitLSNType conversion
> > - Improved FLUSH/WRITE mode documentation with verification function references
> > - TAP tests (7b, 7c, 7d): Added walreceiver control for concurrency,
> > explicit blocking verification via poll_query_until, and log-based
> > completion verification via wait_for_log
> > - Fixed a timing issue in TAP test 8 when waiting for all three sessions
> > to report their errors after promotion.
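
A minimal sketch of the switch-based WaitLSNMode → WaitLSNType conversion
mentioned in the patch 3 notes above. This is not the code from the patch.
The WAIT_LSN_MODE_* member names, the helper name wait_lsn_mode_to_type(),
and the WAIT_LSN_TYPE_WRITE_STANDBY / WAIT_LSN_TYPE_FLUSH_STANDBY names are
assumptions made for illustration; only WaitLSNMode, WaitLSNType and
WAIT_LSN_TYPE_REPLAY_STANDBY appear in this thread. The two enums are
deliberately declared in different orders to show why a direct cast is
fragile:

```c
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical parser-level mode enum; member names are assumptions. */
typedef enum WaitLSNMode
{
	WAIT_LSN_MODE_REPLAY,
	WAIT_LSN_MODE_WRITE,
	WAIT_LSN_MODE_FLUSH
} WaitLSNMode;

/*
 * Wait types used by the wait infrastructure.  Only the REPLAY_STANDBY
 * name comes from the thread; the other two are assumed by analogy.
 * The order intentionally differs from WaitLSNMode.
 */
typedef enum WaitLSNType
{
	WAIT_LSN_TYPE_REPLAY_STANDBY,
	WAIT_LSN_TYPE_FLUSH_STANDBY,
	WAIT_LSN_TYPE_WRITE_STANDBY
} WaitLSNType;

/*
 * Map a user-visible mode to an internal wait type with an explicit
 * switch, so the result does not depend on the numeric values of
 * either enum.
 */
static WaitLSNType
wait_lsn_mode_to_type(WaitLSNMode mode)
{
	switch (mode)
	{
		case WAIT_LSN_MODE_REPLAY:
			return WAIT_LSN_TYPE_REPLAY_STANDBY;
		case WAIT_LSN_MODE_WRITE:
			return WAIT_LSN_TYPE_WRITE_STANDBY;
		case WAIT_LSN_MODE_FLUSH:
			return WAIT_LSN_TYPE_FLUSH_STANDBY;
	}

	/* Only reached if an out-of-range value was passed in. */
	fprintf(stderr, "unrecognized wait mode: %d\n", (int) mode);
	abort();
}
```

With these declarations, a direct cast such as (WaitLSNType) WAIT_LSN_MODE_WRITE
would yield the FLUSH wait type, while the switch keeps the mapping correct
regardless of how either enum is ordered.
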
> >
> > --
> > Best,
> > Xuneng
>
> Here is the updated v3. The changes are made to patch 3:
>
> - Refactor duplicated TAP test code by extracting helper routines for
> starting and stopping walreceiver.
> - Increase the number of concurrent WRITE and FLUSH waiters in tests
> 7b and 7c from three to five, matching the number in test 7a.
>
> --
> Best,
> Xuneng
I just realized that patch 2 from the earlier emails can be dropped for
simplicity. Since the write LSN can be read directly from
pg_stat_wal_receiver, the TAP test in patch 3 does not need a
separate SQL function just for this purpose.
--
Best,
Xuneng