Thread: pgsql: Implement pg_wal_replay_wait() stored procedure

pgsql: Implement pg_wal_replay_wait() stored procedure

From

Alexander Korotkov

Date:

02 August 2024, 18:22:21

Implement pg_wal_replay_wait() stored procedure

pg_wal_replay_wait() is to be used on standby and specifies waiting for
the specific WAL location to be replayed.  This option is useful when
the user makes some data changes on primary and needs a guarantee to see
these changes are on standby.

The queue of waiters is stored in the shared memory as an LSN-ordered pairing
heap, where the waiter with the nearest LSN stays on the top.  During
the replay of WAL, waiters whose LSNs have already been replayed are deleted
from the shared memory pairing heap and woken up by setting their latches.

pg_wal_replay_wait() needs to wait without any snapshot held.  Otherwise,
the snapshot could prevent the replay of WAL records, implying a kind of
self-deadlock.  This is why it is only possible to implement
pg_wal_replay_wait() as a procedure working without an active snapshot,
not a function.

Catversion is bumped.

Discussion: https://postgr.es/m/eb12f9b03851bb2583adab5df9579b4b%40postgrespro.ru
Author: Kartyshov Ivan, Alexander Korotkov
Reviewed-by: Michael Paquier, Peter Eisentraut, Dilip Kumar, Amit Kapila
Reviewed-by: Alexander Lakhin, Bharath Rupireddy, Euler Taveira
Reviewed-by: Heikki Linnakangas, Kyotaro Horiguchi

Branch
------
master

Details
-------
https://git.postgresql.org/pg/commitdiff/3c5db1d6b01642bcd8dbf5e34b68f034365747bb

Modified Files
--------------
doc/src/sgml/func.sgml                          | 117 ++++++++
src/backend/access/transam/xact.c               |   6 +
src/backend/access/transam/xlog.c               |   7 +
src/backend/access/transam/xlogrecovery.c       |  11 +
src/backend/catalog/system_functions.sql        |   3 +
src/backend/commands/Makefile                   |   3 +-
src/backend/commands/meson.build                |   1 +
src/backend/commands/waitlsn.c                  | 363 ++++++++++++++++++++++++
src/backend/lib/pairingheap.c                   |  18 +-
src/backend/storage/ipc/ipci.c                  |   3 +
src/backend/storage/lmgr/proc.c                 |   6 +
src/backend/tcop/pquery.c                       |   9 +-
src/backend/utils/activity/wait_event_names.txt |   2 +
src/include/catalog/catversion.h                |   2 +-
src/include/catalog/pg_proc.dat                 |   6 +
src/include/commands/waitlsn.h                  |  80 ++++++
src/include/lib/pairingheap.h                   |   3 +
src/include/storage/lwlocklist.h                |   1 +
src/test/recovery/meson.build                   |   1 +
src/test/recovery/t/043_wal_replay_wait.pl      | 150 ++++++++++
src/tools/pgindent/typedefs.list                |   2 +
21 files changed, 786 insertions(+), 8 deletions(-)

Re: pgsql: Implement pg_wal_replay_wait() stored procedure

From

Peter Eisentraut

Date:

30 August 2024, 22:42:20

On 02.08.24 20:22, Alexander Korotkov wrote:
> Implement pg_wal_replay_wait() stored procedure

Why is this under src/backend/command/?  Wouldn't it belong under 
src/backend/utils/adt/?

> pg_wal_replay_wait() is to be used on standby and specifies waiting for
> the specific WAL location to be replayed.  This option is useful when
> the user makes some data changes on primary and needs a guarantee to see
> these changes are on standby.
> 
> The queue of waiters is stored in the shared memory as an LSN-ordered pairing
> heap, where the waiter with the nearest LSN stays on the top.  During
> the replay of WAL, waiters whose LSNs have already been replayed are deleted
> from the shared memory pairing heap and woken up by setting their latches.
> 
> pg_wal_replay_wait() needs to wait without any snapshot held.  Otherwise,
> the snapshot could prevent the replay of WAL records, implying a kind of
> self-deadlock.  This is why it is only possible to implement
> pg_wal_replay_wait() as a procedure working without an active snapshot,
> not a function.
> 
> Catversion is bumped.
> 
> Discussion: https://postgr.es/m/eb12f9b03851bb2583adab5df9579b4b%40postgrespro.ru
> Author: Kartyshov Ivan, Alexander Korotkov
> Reviewed-by: Michael Paquier, Peter Eisentraut, Dilip Kumar, Amit Kapila
> Reviewed-by: Alexander Lakhin, Bharath Rupireddy, Euler Taveira
> Reviewed-by: Heikki Linnakangas, Kyotaro Horiguchi
> 
> Branch
> ------
> master
> 
> Details
> -------
> https://git.postgresql.org/pg/commitdiff/3c5db1d6b01642bcd8dbf5e34b68f034365747bb
> 
> Modified Files
> --------------
> doc/src/sgml/func.sgml                          | 117 ++++++++
> src/backend/access/transam/xact.c               |   6 +
> src/backend/access/transam/xlog.c               |   7 +
> src/backend/access/transam/xlogrecovery.c       |  11 +
> src/backend/catalog/system_functions.sql        |   3 +
> src/backend/commands/Makefile                   |   3 +-
> src/backend/commands/meson.build                |   1 +
> src/backend/commands/waitlsn.c                  | 363 ++++++++++++++++++++++++
> src/backend/lib/pairingheap.c                   |  18 +-
> src/backend/storage/ipc/ipci.c                  |   3 +
> src/backend/storage/lmgr/proc.c                 |   6 +
> src/backend/tcop/pquery.c                       |   9 +-
> src/backend/utils/activity/wait_event_names.txt |   2 +
> src/include/catalog/catversion.h                |   2 +-
> src/include/catalog/pg_proc.dat                 |   6 +
> src/include/commands/waitlsn.h                  |  80 ++++++
> src/include/lib/pairingheap.h                   |   3 +
> src/include/storage/lwlocklist.h                |   1 +
> src/test/recovery/meson.build                   |   1 +
> src/test/recovery/t/043_wal_replay_wait.pl      | 150 ++++++++++
> src/tools/pgindent/typedefs.list                |   2 +
> 21 files changed, 786 insertions(+), 8 deletions(-)
>

Re: pgsql: Implement pg_wal_replay_wait() stored procedure

From

Alexander Korotkov

Date:

01 September 2024, 22:35:27

On Fri, Aug 30, 2024 at 10:42 PM Peter Eisentraut <peter@eisentraut.org> wrote:
> On 02.08.24 20:22, Alexander Korotkov wrote:
> > Implement pg_wal_replay_wait() stored procedure
>
> Why is this under src/backend/command/?  Wouldn't it belong under
> src/backend/utils/adt/?

This path hasn't changes since the patch revision when it was a
utility command.  I agree that this doesn't look like proper path for
stored procedure.  But I don't think src/backend/utils/adt is
appropriate path either, because it's not really about data type.
pg_wal_replay_wait() looks a good neighbor for
pg_wal_replay_pause()/pg_wal_replay_resume() and other WAL-related
functions.  So, what about moving it to src/backend/access/transam?

------
Regards,
Alexander Korotkov
Supabase

Re: pgsql: Implement pg_wal_replay_wait() stored procedure

From

Alexander Korotkov

Date:

03 September 2024, 16:07:11

Hi, Michael!

On Mon, Sep 2, 2024 at 3:25 AM Michael Paquier <michael@paquier.xyz> wrote:
>
> On Mon, Sep 02, 2024 at 02:55:50AM +0300, Alexander Korotkov wrote:
> > Could you, please, check the attached patch?
>
> The patch moving the code looks correct at quick glance.

Thank you for your feedback.

> Now, I've been staring at this line, wondering why this is required
> while WaitForLSNReplay() does not return any status:
> +   (void) WaitForLSNReplay(target_lsn, timeout);
>
> And this makes me question whether you have the right semantics
> regarding the SQL function itself.  Could it be more useful for
> WaitForLSNReplay() to report an enum state that tells you the reason
> why a wait may not have happened with a text or a tuple returned by
> the function?  There are quite a few reasons why a wait may or may not
> happen:
> - Recovery has ended, target LSN has been reached.
> - We're not in recovery anymore.  Is an error the most useful thing
> here actually?
> - Timeout may have been reached, where an error is also generated.
> ERRCODE_QUERY_CANCELED is not a correct error state.
> - Perhaps more of these in the future?
>
> My point is: this is a feature that can be used for monitoring as far
> as I know, still it does not stick as a feature that could be most
> useful for such applications.  Wouldn't it be more useful for client
> applications to deal with a state returned by the SQL function rather
> than having to parse error strings to know what happened?  What is
> returned by pg_wal_replay_wal() reflects on the design of
> WaitForLSNReplay() itself.

By design, pg_wal_replay_wal() should be followed up with read-only
query to standby.  The read-only query gets guarantee that it's
executed after the replay of LSN specified as pg_wal_replay_wal()
design.  Returning the status from pg_wal_replay_wal() would require
additional cycle of its processing.  But one of the main objectives of
this feature was reducing roundtrips during waiting for LSN.

On the other hand, I see that returning status could make sense for
certain use cases.  I think I could write two patches to provide that.
1. Make WaitForLSNReplay() return status, and make pg_wal_replay_wal()
be responsible for throwing all the errors.
2. New procedure pg_wal_replay_wal_status() (or some better name?),
which returns status to the user instead of throwing errors.

If no objections, I will push the patch moving code then go ahead
writing the two patches above.

------
Regards,
Alexander Korotkov
Supabase

Re: pgsql: Implement pg_wal_replay_wait() stored procedure

From

Alexander Korotkov

Date:

26 September 2024, 18:41:18

On Thu, Sep 26, 2024 at 11:19 AM Michael Paquier <michael@paquier.xyz> wrote:
> On Fri, Aug 02, 2024 at 06:22:21PM +0000, Alexander Korotkov wrote:
> > Implement pg_wal_replay_wait() stored procedure
> >
> > pg_wal_replay_wait() is to be used on standby and specifies waiting for
> > the specific WAL location to be replayed.  This option is useful when
> > the user makes some data changes on primary and needs a guarantee to see
> > these changes are on standby.
> >
> > The queue of waiters is stored in the shared memory as an LSN-ordered pairing
> > heap, where the waiter with the nearest LSN stays on the top.  During
> > the replay of WAL, waiters whose LSNs have already been replayed are deleted
> > from the shared memory pairing heap and woken up by setting their latches.
> >
> > pg_wal_replay_wait() needs to wait without any snapshot held.  Otherwise,
> > the snapshot could prevent the replay of WAL records, implying a kind of
> > self-deadlock.  This is why it is only possible to implement
> > pg_wal_replay_wait() as a procedure working without an active snapshot,
> > not a function.
> >
> > Catversion is bumped.
>
> I've spotted that this patch uses an OID that should not be used
> during the development cycle:
> +{ oid => '111',
> +  descr => 'wait for the target LSN to be replayed on standby with an optional timeout',
>
> Please use something in the 8000-9999 range, as required by
> 98eab30b93d5.

Fixed, sorry for messing this up.
I would appreciate if you have time to look at [0] to check if it
meets your expectations.

Links.
1. https://www.postgresql.org/message-id/CAPpHfdsw9oq62Fvt65JApHJf1auUirdGJV7%3DnRyVnDL3M8z5xA%40mail.gmail.com

------
Regards,
Alexander Korotkov
Supabase

Re: pgsql: Implement pg_wal_replay_wait() stored procedure

From

Alexander Korotkov

Date:

27 September 2024, 17:07:09

On Fri, Sep 27, 2024 at 7:57 AM Michael Paquier <michael@paquier.xyz> wrote:
> On Fri, Sep 27, 2024 at 01:35:23PM +0900, Michael Paquier wrote:
> > I would suggest to keep things simple and have one single function
> > rather than introduce two more pg_proc entries with slight differences
> > in their error reporting, making the original function return a text
> > about the reason of the failure when waiting (be it a timeout,
> > success, promotion or outside recovery).
>
> More to the point here.  I am not sure what's the benefit of having a
> procedure, while we have been using SQL functions for similar purposes
> in xlogfuncs.c for years.  And what you have here can also be coupled
> with more complex logic in SQL queries.

As I multiple times pointed in the thread [1] [2], this couldn't be
done in SQL function.  SQL-function should run within snapshot, which
could prevent WAL from being replayed.  In turn, depending on timeout
settings that could lead to hidden deadlock (function waits for LSN to
be replayed, replay process wait snapshot to be released) or an error.

In the stored procedure, we're releasing active and catalog snapshots
(similarly to VACUUM statement).  Waiting without holding a snapshots
allows us to workaround the problem described above.  We can't do this
in the SQL function, because that would leave the rest of SQL query
without a snapshot.

Links.
1. https://www.postgresql.org/message-id/CAPpHfduBSN8j5j5Ynn5x%3DThD%3D8ypNd53D608VXGweBsPzxPvqA%40mail.gmail.com
2. https://www.postgresql.org/message-id/CAPpHfdtiGgn0iS1KbW2HTam-1%2BoK%2BvhXZDAcnX9hKaA7Oe%3DF-A%40mail.gmail.com

------
Regards,
Alexander Korotkov
Supabase

Re: pgsql: Implement pg_wal_replay_wait() stored procedure

From

Peter Eisentraut

Date:

16 October 2024, 22:35:05

On 02.09.24 01:55, Alexander Korotkov wrote:
> On Mon, Sep 2, 2024 at 2:28 AM Michael Paquier <michael@paquier.xyz> wrote:
>> On Sun, Sep 01, 2024 at 10:35:27PM +0300, Alexander Korotkov wrote:
>>> This path hasn't changes since the patch revision when it was a
>>> utility command.  I agree that this doesn't look like proper path for
>>> stored procedure.  But I don't think src/backend/utils/adt is
>>> appropriate path either, because it's not really about data type.
>>> pg_wal_replay_wait() looks a good neighbor for
>>> pg_wal_replay_pause()/pg_wal_replay_resume() and other WAL-related
>>> functions.  So, what about moving it to src/backend/access/transam?
>>
>> Moving the new function to xlogfuncs.c while publishing
>> WaitForLSNReplay() makes sense to me.
> 
> Thank you for proposal.  I like this.
> 
> Could you, please, check the attached patch?

We still have stuff in src/backend/commands/waitlsn.c that is nothing 
like a "command".  You have moved some stuff elsewhere, but what are you 
planning to do with the rest?

Re: pgsql: Implement pg_wal_replay_wait() stored procedure

From

Alexander Korotkov

Date:

16 October 2024, 23:20:49

On Wed, Oct 16, 2024 at 10:35 PM Peter Eisentraut <peter@eisentraut.org> wrote:
> On 02.09.24 01:55, Alexander Korotkov wrote:
> > On Mon, Sep 2, 2024 at 2:28 AM Michael Paquier <michael@paquier.xyz> wrote:
> >> On Sun, Sep 01, 2024 at 10:35:27PM +0300, Alexander Korotkov wrote:
> >>> This path hasn't changes since the patch revision when it was a
> >>> utility command.  I agree that this doesn't look like proper path for
> >>> stored procedure.  But I don't think src/backend/utils/adt is
> >>> appropriate path either, because it's not really about data type.
> >>> pg_wal_replay_wait() looks a good neighbor for
> >>> pg_wal_replay_pause()/pg_wal_replay_resume() and other WAL-related
> >>> functions.  So, what about moving it to src/backend/access/transam?
> >>
> >> Moving the new function to xlogfuncs.c while publishing
> >> WaitForLSNReplay() makes sense to me.
> >
> > Thank you for proposal.  I like this.
> >
> > Could you, please, check the attached patch?
>
> We still have stuff in src/backend/commands/waitlsn.c that is nothing
> like a "command".  You have moved some stuff elsewhere, but what are you
> planning to do with the rest?

Thank you for spotting this another time.  What about moving that
somewhere like src/backend/access/transam/xlogwait.c ?

------
Regards,
Alexander Korotkov
Supabase

Re: pgsql: Implement pg_wal_replay_wait() stored procedure

From

Pavel Borisov

Date:

22 October 2024, 16:30:20

Hi, Alexander!

On Tue, 22 Oct 2024 at 13:26, Alexander Korotkov <aekorotkov@gmail.com> wrote:

On Wed, Oct 16, 2024 at 11:20 PM Alexander Korotkov
<aekorotkov@gmail.com> wrote:
>
> On Wed, Oct 16, 2024 at 10:35 PM Peter Eisentraut <peter@eisentraut.org> wrote:
> > On 02.09.24 01:55, Alexander Korotkov wrote:
> > > On Mon, Sep 2, 2024 at 2:28 AM Michael Paquier <michael@paquier.xyz> wrote:
> > >> On Sun, Sep 01, 2024 at 10:35:27PM +0300, Alexander Korotkov wrote:
> > >>> This path hasn't changes since the patch revision when it was a
> > >>> utility command. I agree that this doesn't look like proper path for
> > >>> stored procedure. But I don't think src/backend/utils/adt is
> > >>> appropriate path either, because it's not really about data type.
> > >>> pg_wal_replay_wait() looks a good neighbor for
> > >>> pg_wal_replay_pause()/pg_wal_replay_resume() and other WAL-related
> > >>> functions. So, what about moving it to src/backend/access/transam?
> > >>
> > >> Moving the new function to xlogfuncs.c while publishing
> > >> WaitForLSNReplay() makes sense to me.
> > >
> > > Thank you for proposal. I like this.
> > >
> > > Could you, please, check the attached patch?
> >
> > We still have stuff in src/backend/commands/waitlsn.c that is nothing
> > like a "command". You have moved some stuff elsewhere, but what are you
> > planning to do with the rest?
>
> Thank you for spotting this another time. What about moving that
> somewhere like src/backend/access/transam/xlogwait.c ?

Implemented this as a separate patch (0001). Also rebased other
pending patches on that. 0004 now revises header comment of
xlogwait.c with new procedure signature.

I've looked at v5 of a patchset.

0001:

Looks completely ok.

0002:

As stated in latch.c

- WL_POSTMASTER_DEATH: Wait for postmaster to die

- WL_EXIT_ON_PM_DEATH: Exit immediately if the postmaster dies

* wakeEvents must include either WL_EXIT_ON_PM_DEATH for automatic exit
* if the postmaster dies or WL_POSTMASTER_DEATH for a flag set in the
* return value if the postmaster dies

It's not completely clear to me if these comments need some clarification (not related to the patchset), or if we should look for WL_EXIT_ON_PM_DEATH for immediately FATAL. Or waiting for postmaster to die on WL_POSTMASTER_DEATH instead of just fatal immediately?

0003:

Besides refactor it looks that deleteLSNWaiter() is added in WaitForLSNReplay. Maybe it's worth mentioning in the commit message.

Maybe refactoring for loop for assigning result variable and breaking a loop instead of immediate return would look better and lead to natural call of after the loop before returning.

0004:

Comment:

+ * Waits until recovery replays the target LSN with optional timeout. Throw

+ * an error on failure.

may need mentioning "Unless no_error provided throws an error on failure"

Otherwise the patch looks good.

Regards,

Pavel Borisov

Supabase

Re: pgsql: Implement pg_wal_replay_wait() stored procedure

From

Pavel Borisov

Date:

22 October 2024, 16:32:51

Fix a typo of myself:

Maybe refactoring for loop for assigning result variable and breaking a loop instead of immediate return would look better and lead to natural call of after the loop before returning.

Maybe refactoring for loop for assigning result variable and breaking a loop instead of immediate return would look better and lead to natural call of deleteLSNWaiter() after the loop before returning.

Pavel.

Re: pgsql: Implement pg_wal_replay_wait() stored procedure

From

Pavel Borisov

Date:

23 October 2024, 09:01:49

Hi, Alexander!

On Wed, 23 Oct 2024 at 00:12, Alexander Korotkov <aekorotkov@gmail.com> wrote:

Hi, Pavel!

Thank you for your review.

On Tue, Oct 22, 2024 at 4:30 PM Pavel Borisov <pashkin.elfe@gmail.com> wrote:
> On Tue, 22 Oct 2024 at 13:26, Alexander Korotkov <aekorotkov@gmail.com> wrote:
>>
>> On Wed, Oct 16, 2024 at 11:20 PM Alexander Korotkov
>> <aekorotkov@gmail.com> wrote:
>> >
>> > On Wed, Oct 16, 2024 at 10:35 PM Peter Eisentraut <peter@eisentraut.org> wrote:
>> > > On 02.09.24 01:55, Alexander Korotkov wrote:
>> > > > On Mon, Sep 2, 2024 at 2:28 AM Michael Paquier <michael@paquier.xyz> wrote:
>> > > >> On Sun, Sep 01, 2024 at 10:35:27PM +0300, Alexander Korotkov wrote:
>> > > >>> This path hasn't changes since the patch revision when it was a
>> > > >>> utility command. I agree that this doesn't look like proper path for
>> > > >>> stored procedure. But I don't think src/backend/utils/adt is
>> > > >>> appropriate path either, because it's not really about data type.
>> > > >>> pg_wal_replay_wait() looks a good neighbor for
>> > > >>> pg_wal_replay_pause()/pg_wal_replay_resume() and other WAL-related
>> > > >>> functions. So, what about moving it to src/backend/access/transam?
>> > > >>
>> > > >> Moving the new function to xlogfuncs.c while publishing
>> > > >> WaitForLSNReplay() makes sense to me.
>> > > >
>> > > > Thank you for proposal. I like this.
>> > > >
>> > > > Could you, please, check the attached patch?
>> > >
>> > > We still have stuff in src/backend/commands/waitlsn.c that is nothing
>> > > like a "command". You have moved some stuff elsewhere, but what are you
>> > > planning to do with the rest?
>> >
>> > Thank you for spotting this another time. What about moving that
>> > somewhere like src/backend/access/transam/xlogwait.c ?
>>
>> Implemented this as a separate patch (0001). Also rebased other
>> pending patches on that. 0004 now revises header comment of
>> xlogwait.c with new procedure signature.
>
>
> I've looked at v5 of a patchset.

> 0002:
>
> As stated in latch.c
>
> - WL_POSTMASTER_DEATH: Wait for postmaster to die
> - WL_EXIT_ON_PM_DEATH: Exit immediately if the postmaster dies
>
> * wakeEvents must include either WL_EXIT_ON_PM_DEATH for automatic exit
> * if the postmaster dies or WL_POSTMASTER_DEATH for a flag set in the
> * return value if the postmaster dies
>
> It's not completely clear to me if these comments need some clarification (not related to the patchset), or if we should look for WL_EXIT_ON_PM_DEATH for immediately FATAL. Or waiting for postmaster to die on WL_POSTMASTER_DEATH instead of just fatal immediately?

As I get from the code, WL_EXIT_ON_PM_DEATH cause process to just
proc_exit(1) without throwing FATAL. So, in the most of situations we
do throw FATAL after seeing WL_POSTMASTER_DEATH event. So, it's
reasonable to do the same here. But indeed, this is a question (not
related to this patch) whether WL_EXIT_ON_PM_DEATH should cause
process to throw FATAL.

Libpq ends up with FATAL on WL_POSTMASTER_DEATH.

In a backend code on WL_POSTMASTER_DEATH: SyncRepWaitForLSN() sets ProcDiePending but don't FATAL. Walsender exits proc_exit(1).

I suppose WL_POSTMASTER_DEATH expected behavior is "Do whatever you want: wait for postmaster to die or end up immediately".

I think path 0002 is good.

I looked through patches v6 and I think they're all good now.

Regards,

Pavel Borisov

Supabase.

Re: pgsql: Implement pg_wal_replay_wait() stored procedure

From

Alexander Korotkov

Date:

23 October 2024, 19:21:02

On Wed, Oct 23, 2024 at 9:02 AM Pavel Borisov <pashkin.elfe@gmail.com> wrote:
> On Wed, 23 Oct 2024 at 00:12, Alexander Korotkov <aekorotkov@gmail.com> wrote:
>> Thank you for your review.
>>
>> On Tue, Oct 22, 2024 at 4:30 PM Pavel Borisov <pashkin.elfe@gmail.com> wrote:
>> > On Tue, 22 Oct 2024 at 13:26, Alexander Korotkov <aekorotkov@gmail.com> wrote:
>> >>
>> >> On Wed, Oct 16, 2024 at 11:20 PM Alexander Korotkov
>> >> <aekorotkov@gmail.com> wrote:
>> >> >
>> >> > On Wed, Oct 16, 2024 at 10:35 PM Peter Eisentraut <peter@eisentraut.org> wrote:
>> >> > > On 02.09.24 01:55, Alexander Korotkov wrote:
>> >> > > > On Mon, Sep 2, 2024 at 2:28 AM Michael Paquier <michael@paquier.xyz> wrote:
>> >> > > >> On Sun, Sep 01, 2024 at 10:35:27PM +0300, Alexander Korotkov wrote:
>> >> > > >>> This path hasn't changes since the patch revision when it was a
>> >> > > >>> utility command.  I agree that this doesn't look like proper path for
>> >> > > >>> stored procedure.  But I don't think src/backend/utils/adt is
>> >> > > >>> appropriate path either, because it's not really about data type.
>> >> > > >>> pg_wal_replay_wait() looks a good neighbor for
>> >> > > >>> pg_wal_replay_pause()/pg_wal_replay_resume() and other WAL-related
>> >> > > >>> functions.  So, what about moving it to src/backend/access/transam?
>> >> > > >>
>> >> > > >> Moving the new function to xlogfuncs.c while publishing
>> >> > > >> WaitForLSNReplay() makes sense to me.
>> >> > > >
>> >> > > > Thank you for proposal.  I like this.
>> >> > > >
>> >> > > > Could you, please, check the attached patch?
>> >> > >
>> >> > > We still have stuff in src/backend/commands/waitlsn.c that is nothing
>> >> > > like a "command".  You have moved some stuff elsewhere, but what are you
>> >> > > planning to do with the rest?
>> >> >
>> >> > Thank you for spotting this another time.  What about moving that
>> >> > somewhere like src/backend/access/transam/xlogwait.c ?
>> >>
>> >> Implemented this as a separate patch (0001).  Also rebased other
>> >> pending patches on that.  0004 now revises header comment of
>> >> xlogwait.c with new procedure signature.
>> >
>> >
>> > I've looked at v5 of a patchset.
>>
>> > 0002:
>> >
>> > As stated in latch.c
>> >
>> > - WL_POSTMASTER_DEATH: Wait for postmaster to die
>> > - WL_EXIT_ON_PM_DEATH: Exit immediately if the postmaster dies
>> >
>> >  * wakeEvents must include either WL_EXIT_ON_PM_DEATH for automatic exit
>> >  * if the postmaster dies or WL_POSTMASTER_DEATH for a flag set in the
>> >  * return value if the postmaster dies
>> >
>> > It's not completely clear to me if these comments need some clarification (not related to the patchset), or if we
shouldlook for WL_EXIT_ON_PM_DEATH for immediately FATAL. Or waiting for postmaster to die on WL_POSTMASTER_DEATH
insteadof just fatal immediately? 
>>
>> As I get from the code, WL_EXIT_ON_PM_DEATH cause process to just
>> proc_exit(1) without throwing FATAL.  So, in the most of situations we
>> do throw FATAL after seeing WL_POSTMASTER_DEATH event.  So, it's
>> reasonable to do the same here.  But indeed, this is a question (not
>> related to this patch) whether WL_EXIT_ON_PM_DEATH should cause
>> process to throw FATAL.
>
>
> Libpq ends up with FATAL on WL_POSTMASTER_DEATH.
> In a backend code on WL_POSTMASTER_DEATH: SyncRepWaitForLSN() sets ProcDiePending but don't FATAL. Walsender exits
proc_exit(1).
> I suppose WL_POSTMASTER_DEATH expected behavior is "Do whatever you want: wait for postmaster to die or end up
immediately".
> I think path 0002 is good.
>
> I looked through patches v6 and I think they're all good now.

Thank you.  I will push this patchset if now objections.

------
Regards,
Alexander Korotkov
Supabase

Re: pgsql: Implement pg_wal_replay_wait() stored procedure

From

Heikki Linnakangas

Date:

25 October 2024, 09:06:10

If you call this procedure on a stand-alone server, you get:

postgres=# call pg_wal_replay_wait('1234/0');
ERROR:  recovery is not in progress
DETAIL:  Recovery ended before replaying target LSN 1234/0; last replay 
LSN 0/0.

The DETAIL seems a bit misleading. Recovery never ended, because it 
never started in the first place. Last replay LSN is indeed 0/0, but 
that's not helpful.

If a standby server has been promoted and you pass an LSN that's earlier 
than the last replay LSN, it returns successfully. That makes sense I 
guess; if you connect to a standby and wait for it to replay a commit 
that you made in the primary, and the standby gets promoted, that seems 
correct. But it's a little inconsistent: If the standby crashes 
immediately after promotion, and you call pg_wal_replay_wait() after 
recovery, it returns success. However, if you shut down the promoted 
server and restart it, then last replay LSN is 0/0, and the call will 
fail because no recovery happened.

What is the use case for the 'no_error' argument? Why would you ever 
want to pass no_error=true ? The separate pg_wal_replay_wait_status() 
function feels weird to me. Also it surely shouldn't be marked IMMUTABLE 
nor parallel safe.

This would benefit from more documentation, explaining how you would use 
this in practice. I believe the use case is that you want "read your 
writes" consistency between a primary and a standby. You commit a 
transaction in the primary, and you then want to run read-only queries 
in a standby, and you want to make sure that you see your own commit, 
but you're ok with slightly delayed results otherwise. It would be good 
to add a chapter somewhere in the docs to show how to do that in 
practice with these functions.

-- 
Heikki Linnakangas
Neon (https://neon.tech)

Re: pgsql: Implement pg_wal_replay_wait() stored procedure

From

Alexander Korotkov

Date:

25 October 2024, 14:56:36

Hi, Heikki!

On Fri, Oct 25, 2024 at 9:06 AM Heikki Linnakangas <hlinnaka@iki.fi> wrote:
> If you call this procedure on a stand-alone server, you get:
>
> postgres=# call pg_wal_replay_wait('1234/0');
> ERROR:  recovery is not in progress
> DETAIL:  Recovery ended before replaying target LSN 1234/0; last replay
> LSN 0/0.
>
> The DETAIL seems a bit misleading. Recovery never ended, because it
> never started in the first place. Last replay LSN is indeed 0/0, but
> that's not helpful.
>
> If a standby server has been promoted and you pass an LSN that's earlier
> than the last replay LSN, it returns successfully. That makes sense I
> guess; if you connect to a standby and wait for it to replay a commit
> that you made in the primary, and the standby gets promoted, that seems
> correct. But it's a little inconsistent: If the standby crashes
> immediately after promotion, and you call pg_wal_replay_wait() after
> recovery, it returns success. However, if you shut down the promoted
> server and restart it, then last replay LSN is 0/0, and the call will
> fail because no recovery happened.
>
> What is the use case for the 'no_error' argument? Why would you ever
> want to pass no_error=true ? The separate pg_wal_replay_wait_status()
> function feels weird to me. Also it surely shouldn't be marked IMMUTABLE
> nor parallel safe.
>
> This would benefit from more documentation, explaining how you would use
> this in practice. I believe the use case is that you want "read your
> writes" consistency between a primary and a standby. You commit a
> transaction in the primary, and you then want to run read-only queries
> in a standby, and you want to make sure that you see your own commit,
> but you're ok with slightly delayed results otherwise. It would be good
> to add a chapter somewhere in the docs to show how to do that in
> practice with these functions.

Thank you for your feedback!

I do agree that error reporting for "not in recovery" case needs to be
improved, as well, as the documentation.

I see that pg_wal_replay_wait_status() might look weird, but it seems
to me like the best of feasible solutions.  Given that
pg_wal_replay_wait() procedure can't work concurrently to a query
involving pg_wal_replay_wait_status() function, I think
pg_wal_replay_wait_status() should be stable and parallel safe.

This is the brief answer.  I will be able to come back with more
details on Monday.

------
Regards,
Alexander Korotkov
Supabase

Re: pgsql: Implement pg_wal_replay_wait() stored procedure

From

Heikki Linnakangas

Date:

28 October 2024, 12:36:43

On 25/10/2024 14:56, Alexander Korotkov wrote:
> I see that pg_wal_replay_wait_status() might look weird, but it seems
> to me like the best of feasible solutions. 

I haven't written many procedures, but our docs say:

 > Procedures do not return a function value; hence CREATE PROCEDURE 
lacks a RETURNS clause. However, procedures can instead return data to 
their callers via output parameters.

Did you consider using an output parameter?

> Given that
> pg_wal_replay_wait() procedure can't work concurrently to a query
> involving pg_wal_replay_wait_status() function, I think
> pg_wal_replay_wait_status() should be stable and parallel safe.

If you call pg_wal_replay_wait() in the backend process, and 
pg_wal_replay_wait_status() in a parallel worker process, it won't 
return the result of the wait. Probably not what you'd expect. So I'd 
argue that it should be parallel unsafe.

> This is the brief answer.  I will be able to come back with more
> details on Monday.

Thanks. A few more minor issues I spotted while playing with this:

- If you pass a very high value as the timeout, e.g. INT_MAX-1, it wraps 
around and doesn't wait at all
- You can pass NULLs as arguments. That should probably not be allowed, 
or we need to document what it means.

This is disappointing:

> postgres=# set default_transaction_isolation ='repeatable read';
> SET
> postgres=# call pg_wal_replay_wait('0/55DA24F');
> ERROR:  pg_wal_replay_wait() must be only called without an active or registered snapshot
> DETAIL:  Make sure pg_wal_replay_wait() isn't called within a transaction with an isolation level higher than READ
COMMITTED,another procedure, or a function.
 

Is there any way we could make that work? Otherwise, the feature just 
basically doesn't work if you use repeatable read.

-- 
Heikki Linnakangas
Neon (https://neon.tech)

Re: pgsql: Implement pg_wal_replay_wait() stored procedure

From

Robert Haas

Date:

28 October 2024, 18:47:19

Point of order:

Discussions of commits and potential followup commits are best
redirected to -hackers rather than continuing to use -committers.

...Robert

Re: pgsql: Implement pg_wal_replay_wait() stored procedure

From

Alexander Korotkov

Date:

28 October 2024, 21:35:02

On Mon, Oct 28, 2024 at 5:47 PM Robert Haas <robertmhaas@gmail.com> wrote:
> Point of order:
>
> Discussions of commits and potential followup commits are best
> redirected to -hackers rather than continuing to use -committers.

Thank you for pointing this.
My response to Heikki's last message will be redirected to -hackers.

------
Regards,
Alexander Korotkov
Supabase