Re: Implement waiting for wal lsn replay: reloaded - Mailing list pgsql-hackers

From Xuneng Zhou
Subject Re: Implement waiting for wal lsn replay: reloaded
Date
Msg-id CABPTF7WDzTs2EEB6c5QmsV-svF4rKSTLOS1pWM+7ydUN6tnFOQ@mail.gmail.com
Whole thread Raw
In response to Re: Implement waiting for wal lsn replay: reloaded  (Chao Li <li.evan.chao@gmail.com>)
List pgsql-hackers
Hi Chao,

Thanks a lot for your review!

On Fri, Dec 26, 2025 at 4:25 PM Chao Li <li.evan.chao@gmail.com> wrote:
>
>
>
> > On Dec 19, 2025, at 10:49, Xuneng Zhou <xunengzhou@gmail.com> wrote:
> >
> > Hi,
> >
> > On Thu, Dec 18, 2025 at 8:25 PM Alexander Korotkov <aekorotkov@gmail.com> wrote:
> >>
> >> On Thu, Dec 18, 2025 at 2:24 PM Xuneng Zhou <xunengzhou@gmail.com> wrote:
> >>> On Thu, Dec 18, 2025 at 6:38 PM Alexander Korotkov <aekorotkov@gmail.com> wrote:
> >>>>
> >>>> Hi, Xuneng!
> >>>>
> >>>> On Tue, Dec 16, 2025 at 6:46 AM Xuneng Zhou <xunengzhou@gmail.com> wrote:
> >>>>> Remove the erroneous WAIT_LSN_TYPE_COUNT case from the switch
> >>>>> statement in v5 patch 1.
> >>>>
> >>>> Thank you for your work on this patchset.  Generally, it looks like
> >>>> good and quite straightforward extension of the current functionality.
> >>>> But this patch adds 4 new unreserved keywords to our grammar.  Do you
> >>>> think we can put mode into with options clause?
> >>>>
> >>>
> >>> Thanks for pointing this out. Yeah, 4 unreserved keywords add
> >>> complexity to the parser and it may not be worthwhile since replay is
> >>> expected to be the common use scenario. Maybe we can do something like
> >>> this:
> >>>
> >>> -- Default (REPLAY mode)
> >>> WAIT FOR LSN '0/306EE20' WITH (TIMEOUT '1s');
> >>>
> >>> -- Explicit REPLAY mode
> >>> WAIT FOR LSN '0/306EE20' WITH (MODE 'replay', TIMEOUT '1s');
> >>>
> >>> -- WRITE mode
> >>> WAIT FOR LSN '0/306EE20' WITH (MODE 'write', TIMEOUT '1s');
> >>>
> >>> If no mode is set explicitly in the options clause, it defaults to
> >>> replay. I'll update the patch per your suggestion.
> >>
> >> This is exactly what I meant.  Please, go ahead.
> >>
> >
> > Here is the updated patch set (v7). Please check.
> >
> > --
> > Best,
> > Xuneng
> >
<v7-0001-Extend-xlogwait-infrastructure-with-write-and-flu.patch><v7-0004-Use-WAIT-FOR-LSN-in.patch><v7-0003-Add-tab-completion-for-WAIT-FOR-LSN-MODE-option.patch><v7-0002-Add-MODE-option-to-WAIT-FOR-LSN-command.patch>
>
> Hi Xuneng,
>
> A solid patch! Just a few small comments:
>
> 1 - 0001
> ```
> +XLogRecPtr
> +GetCurrentLSNForWaitType(WaitLSNType lsnType)
> +{
> +       switch (lsnType)
> +       {
> +               case WAIT_LSN_TYPE_STANDBY_REPLAY:
> +                       return GetXLogReplayRecPtr(NULL);
> +
> +               case WAIT_LSN_TYPE_STANDBY_WRITE:
> +                       return GetWalRcvWriteRecPtr();
> +
> +               case WAIT_LSN_TYPE_STANDBY_FLUSH:
> +                       return GetWalRcvFlushRecPtr(NULL, NULL);
> +
> +               case WAIT_LSN_TYPE_PRIMARY_FLUSH:
> +                       return GetFlushRecPtr(NULL);
> +       }
> +
> +       elog(ERROR, "invalid LSN wait type: %d", lsnType);
> +       pg_unreachable();
> +}
> ```
>
> As you add pg_unreachable() in the new function GetCurrentLSNForWaitType(), I’m thinking if we should just do an
Assert(),I saw every existing related function has done such an assert, for example addLSNWaiter(), it does “Assert(i
>=0 && i < WAIT_LSN_TYPE_COUNT);”. I guess we can just following the current mechanism to verify lsnType. So, for
GetCurrentLSNForWaitType(),we can just add a default clause and Assert(false). 

My take is that Assert(false) alone might not be enough here, since
assertions vanish in non-assert builds. An unexpected lsnType is a
real bug even in production, so keeping a hard error plus
pg_unreachable() seems to be a safer pattern. It also acts as a
guardrail for future extensions — if new wait types are added without
updating this code, we’ll fail loudly rather than silently returning
an incorrect LSN. Assert(i >= 0 && i < WAIT_LSN_TYPE_COUNT) was added
to the top of the function.

> 2 - 0002
> ```
> +                       else
> +                               ereport(ERROR,
> +                                               (errcode(ERRCODE_INVALID_PARAMETER_VALUE),
> +                                                errmsg("unrecognized value for WAIT option \"%s\": \"%s\"",
> +                                                               "MODE", mode_str),
> ```
>
> I wonder why don’t we directly put MODE into the error message?

Yeah, putting MODE into the error message is cleaner. It's done in v8.

> 3 - 0002
> ```
>                 case WAIT_LSN_RESULT_NOT_IN_RECOVERY:
>                         if (throw)
>                         {
> +                               const           WaitLSNTypeDesc *desc = &WaitLSNTypeDescs[lsnType];
> +                               XLogRecPtr      currentLSN = GetCurrentLSNForWaitType(lsnType);
> +
>                                 if (PromoteIsTriggered())
>                                 {
>                                         ereport(ERROR,
>                                                         errcode(ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE),
>                                                         errmsg("recovery is not in progress"),
> -                                                       errdetail("Recovery ended before replaying target LSN
%X/%08X;last replay LSN %X/%08X.", 
> +                                                       errdetail("Recovery ended before target LSN %X/%08X was %s;
last%s LSN %X/%08X.", 
>                                                                           LSN_FORMAT_ARGS(lsn),
> -
LSN_FORMAT_ARGS(GetXLogReplayRecPtr(NULL))));
> +                                                                         desc->verb,
> +                                                                         desc->noun,
> +                                                                         LSN_FORMAT_ARGS(currentLSN)));
>                                 }
>                                 else
>                                         ereport(ERROR,
>                                                         errcode(ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE),
>                                                         errmsg("recovery is not in progress"),
> -                                                       errhint("Waiting for the replay LSN can only be executed
duringrecovery.")); 
> +                                                       errhint("Waiting for the %s LSN can only be executed during
recovery.",
> +                                                                       desc->noun));
>                         }
> ```
>
> currentLSN is only used in the if clause, thus it can be defined inside the if clause.

+ 1.

> 3 - 0002
> ```
> +       /*
> +        * If we wrote an LSN that someone was waiting for then walk over the
> +        * shared memory array and set latches to notify the waiters.
> +        */
> +       if (waitLSNState &&
> +               (LogstreamResult.Write >=
> +                pg_atomic_read_u64(&waitLSNState->minWaitedLSN[WAIT_LSN_TYPE_STANDBY_WRITE])))
> +               WaitLSNWakeup(WAIT_LSN_TYPE_STANDBY_WRITE, LogstreamResult.Write);
> ```
>
> Do we need to mention "walk over the shared memory array and set latches” in the comment? The logic belongs to
WaitLSNWakeup().What about if the wake up logic changes in future, then this comment would become stale. So I think we
onlyneed to mention “notify the waiters”. 
>

It makes sense to me. They are incorporated into v8.

>
> 4 - 0003
> ```
> +       /*
> +        * Handle parenthesized option list.  This fires when we're in an
> +        * unfinished parenthesized option list.  get_previous_words treats a
> +        * completed parenthesized option list as one word, so the above test is
> +        * correct.  mode takes a string value ('replay', 'write', 'flush'),
> +        * timeout takes a string value, no_throw takes no value.
> +        */
>         else if (HeadMatches("WAIT", "FOR", "LSN", MatchAny, "WITH", "(*") &&
>                          !HeadMatches("WAIT", "FOR", "LSN", MatchAny, "WITH", "(*)"))
>         {
> -               /*
> -                * This fires if we're in an unfinished parenthesized option list.
> -                * get_previous_words treats a completed parenthesized option list as
> -                * one word, so the above test is correct.
> -                */
>                 if (ends_with(prev_wd, '(') || ends_with(prev_wd, ','))
> -                       COMPLETE_WITH("timeout", "no_throw");
> -
> -               /*
> -                * timeout takes a string value, no_throw takes no value. We don't
> -                * offer completions for these values.
> -                */
> ```
>
> The new comment has lost the meaning of “We don’t offer completions for these values (timeout and no_throw)”, to be
explicit,I feel we can retain the sentence. 

 The sentence is retained.

> 5 - 0004
> ```
> +       my $isrecovery =
> +         $self->safe_psql('postgres', "SELECT pg_is_in_recovery()");
> +       chomp($isrecovery);
>         croak "unknown mode $mode for 'wait_for_catchup', valid modes are "
>           . join(', ', keys(%valid_modes))
>           unless exists($valid_modes{$mode});
> @@ -3347,9 +3350,6 @@ sub wait_for_catchup
>         }
>         if (!defined($target_lsn))
>         {
> -               my $isrecovery =
> -                 $self->safe_psql('postgres', "SELECT pg_is_in_recovery()");
> -               chomp($isrecovery);
> ```
>
> I wonder why pull up pg_is_in_recovery to an early place and unconditionally call it?
>

This seems unnecessary. I also realized that my earlier approach in
patch 4 may have been semantically incorrect — it could end up waiting
for the LSN to replay/write/flush on the node itself, rather than on
the downstream standby, which defeats the purpose of
wait_for_catchup(). Patch 4 attempts to address this by running WAIT
FOR LSN on the standby itself.

Support for primary-flush waiting and the refactoring of existing
modes have been also incorporated in v8 following Alexander’s
feedback. The major updates are in patches 2 and 4. Please check.

--
Best,
Xuneng

Attachment

pgsql-hackers by date:

Previous
From: vignesh C
Date:
Subject: Re: Proposal: Conflict log history table for Logical Replication
Next
From: Xuneng Zhou
Date:
Subject: Re: Streamify more code paths