Thread: pgsql: Implement pg_wal_replay_wait() stored procedure
Implement pg_wal_replay_wait() stored procedure pg_wal_replay_wait() is to be used on standby and specifies waiting for the specific WAL location to be replayed. This option is useful when the user makes some data changes on primary and needs a guarantee to see these changes are on standby. The queue of waiters is stored in the shared memory as an LSN-ordered pairing heap, where the waiter with the nearest LSN stays on the top. During the replay of WAL, waiters whose LSNs have already been replayed are deleted from the shared memory pairing heap and woken up by setting their latches. pg_wal_replay_wait() needs to wait without any snapshot held. Otherwise, the snapshot could prevent the replay of WAL records, implying a kind of self-deadlock. This is why it is only possible to implement pg_wal_replay_wait() as a procedure working without an active snapshot, not a function. Catversion is bumped. Discussion: https://postgr.es/m/eb12f9b03851bb2583adab5df9579b4b%40postgrespro.ru Author: Kartyshov Ivan, Alexander Korotkov Reviewed-by: Michael Paquier, Peter Eisentraut, Dilip Kumar, Amit Kapila Reviewed-by: Alexander Lakhin, Bharath Rupireddy, Euler Taveira Reviewed-by: Heikki Linnakangas, Kyotaro Horiguchi Branch ------ master Details ------- https://git.postgresql.org/pg/commitdiff/3c5db1d6b01642bcd8dbf5e34b68f034365747bb Modified Files -------------- doc/src/sgml/func.sgml | 117 ++++++++ src/backend/access/transam/xact.c | 6 + src/backend/access/transam/xlog.c | 7 + src/backend/access/transam/xlogrecovery.c | 11 + src/backend/catalog/system_functions.sql | 3 + src/backend/commands/Makefile | 3 +- src/backend/commands/meson.build | 1 + src/backend/commands/waitlsn.c | 363 ++++++++++++++++++++++++ src/backend/lib/pairingheap.c | 18 +- src/backend/storage/ipc/ipci.c | 3 + src/backend/storage/lmgr/proc.c | 6 + src/backend/tcop/pquery.c | 9 +- src/backend/utils/activity/wait_event_names.txt | 2 + src/include/catalog/catversion.h | 2 +- src/include/catalog/pg_proc.dat | 6 + src/include/commands/waitlsn.h | 80 ++++++ src/include/lib/pairingheap.h | 3 + src/include/storage/lwlocklist.h | 1 + src/test/recovery/meson.build | 1 + src/test/recovery/t/043_wal_replay_wait.pl | 150 ++++++++++ src/tools/pgindent/typedefs.list | 2 + 21 files changed, 786 insertions(+), 8 deletions(-)
On 02.08.24 20:22, Alexander Korotkov wrote: > Implement pg_wal_replay_wait() stored procedure Why is this under src/backend/command/? Wouldn't it belong under src/backend/utils/adt/? > pg_wal_replay_wait() is to be used on standby and specifies waiting for > the specific WAL location to be replayed. This option is useful when > the user makes some data changes on primary and needs a guarantee to see > these changes are on standby. > > The queue of waiters is stored in the shared memory as an LSN-ordered pairing > heap, where the waiter with the nearest LSN stays on the top. During > the replay of WAL, waiters whose LSNs have already been replayed are deleted > from the shared memory pairing heap and woken up by setting their latches. > > pg_wal_replay_wait() needs to wait without any snapshot held. Otherwise, > the snapshot could prevent the replay of WAL records, implying a kind of > self-deadlock. This is why it is only possible to implement > pg_wal_replay_wait() as a procedure working without an active snapshot, > not a function. > > Catversion is bumped. > > Discussion: https://postgr.es/m/eb12f9b03851bb2583adab5df9579b4b%40postgrespro.ru > Author: Kartyshov Ivan, Alexander Korotkov > Reviewed-by: Michael Paquier, Peter Eisentraut, Dilip Kumar, Amit Kapila > Reviewed-by: Alexander Lakhin, Bharath Rupireddy, Euler Taveira > Reviewed-by: Heikki Linnakangas, Kyotaro Horiguchi > > Branch > ------ > master > > Details > ------- > https://git.postgresql.org/pg/commitdiff/3c5db1d6b01642bcd8dbf5e34b68f034365747bb > > Modified Files > -------------- > doc/src/sgml/func.sgml | 117 ++++++++ > src/backend/access/transam/xact.c | 6 + > src/backend/access/transam/xlog.c | 7 + > src/backend/access/transam/xlogrecovery.c | 11 + > src/backend/catalog/system_functions.sql | 3 + > src/backend/commands/Makefile | 3 +- > src/backend/commands/meson.build | 1 + > src/backend/commands/waitlsn.c | 363 ++++++++++++++++++++++++ > src/backend/lib/pairingheap.c | 18 +- > src/backend/storage/ipc/ipci.c | 3 + > src/backend/storage/lmgr/proc.c | 6 + > src/backend/tcop/pquery.c | 9 +- > src/backend/utils/activity/wait_event_names.txt | 2 + > src/include/catalog/catversion.h | 2 +- > src/include/catalog/pg_proc.dat | 6 + > src/include/commands/waitlsn.h | 80 ++++++ > src/include/lib/pairingheap.h | 3 + > src/include/storage/lwlocklist.h | 1 + > src/test/recovery/meson.build | 1 + > src/test/recovery/t/043_wal_replay_wait.pl | 150 ++++++++++ > src/tools/pgindent/typedefs.list | 2 + > 21 files changed, 786 insertions(+), 8 deletions(-) >
On Fri, Aug 30, 2024 at 10:42 PM Peter Eisentraut <peter@eisentraut.org> wrote: > On 02.08.24 20:22, Alexander Korotkov wrote: > > Implement pg_wal_replay_wait() stored procedure > > Why is this under src/backend/command/? Wouldn't it belong under > src/backend/utils/adt/? This path hasn't changes since the patch revision when it was a utility command. I agree that this doesn't look like proper path for stored procedure. But I don't think src/backend/utils/adt is appropriate path either, because it's not really about data type. pg_wal_replay_wait() looks a good neighbor for pg_wal_replay_pause()/pg_wal_replay_resume() and other WAL-related functions. So, what about moving it to src/backend/access/transam? ------ Regards, Alexander Korotkov Supabase
Hi, Michael! On Mon, Sep 2, 2024 at 3:25 AM Michael Paquier <michael@paquier.xyz> wrote: > > On Mon, Sep 02, 2024 at 02:55:50AM +0300, Alexander Korotkov wrote: > > Could you, please, check the attached patch? > > The patch moving the code looks correct at quick glance. Thank you for your feedback. > Now, I've been staring at this line, wondering why this is required > while WaitForLSNReplay() does not return any status: > + (void) WaitForLSNReplay(target_lsn, timeout); > > And this makes me question whether you have the right semantics > regarding the SQL function itself. Could it be more useful for > WaitForLSNReplay() to report an enum state that tells you the reason > why a wait may not have happened with a text or a tuple returned by > the function? There are quite a few reasons why a wait may or may not > happen: > - Recovery has ended, target LSN has been reached. > - We're not in recovery anymore. Is an error the most useful thing > here actually? > - Timeout may have been reached, where an error is also generated. > ERRCODE_QUERY_CANCELED is not a correct error state. > - Perhaps more of these in the future? > > My point is: this is a feature that can be used for monitoring as far > as I know, still it does not stick as a feature that could be most > useful for such applications. Wouldn't it be more useful for client > applications to deal with a state returned by the SQL function rather > than having to parse error strings to know what happened? What is > returned by pg_wal_replay_wal() reflects on the design of > WaitForLSNReplay() itself. By design, pg_wal_replay_wal() should be followed up with read-only query to standby. The read-only query gets guarantee that it's executed after the replay of LSN specified as pg_wal_replay_wal() design. Returning the status from pg_wal_replay_wal() would require additional cycle of its processing. But one of the main objectives of this feature was reducing roundtrips during waiting for LSN. On the other hand, I see that returning status could make sense for certain use cases. I think I could write two patches to provide that. 1. Make WaitForLSNReplay() return status, and make pg_wal_replay_wal() be responsible for throwing all the errors. 2. New procedure pg_wal_replay_wal_status() (or some better name?), which returns status to the user instead of throwing errors. If no objections, I will push the patch moving code then go ahead writing the two patches above. ------ Regards, Alexander Korotkov Supabase
On Thu, Sep 26, 2024 at 11:19 AM Michael Paquier <michael@paquier.xyz> wrote: > On Fri, Aug 02, 2024 at 06:22:21PM +0000, Alexander Korotkov wrote: > > Implement pg_wal_replay_wait() stored procedure > > > > pg_wal_replay_wait() is to be used on standby and specifies waiting for > > the specific WAL location to be replayed. This option is useful when > > the user makes some data changes on primary and needs a guarantee to see > > these changes are on standby. > > > > The queue of waiters is stored in the shared memory as an LSN-ordered pairing > > heap, where the waiter with the nearest LSN stays on the top. During > > the replay of WAL, waiters whose LSNs have already been replayed are deleted > > from the shared memory pairing heap and woken up by setting their latches. > > > > pg_wal_replay_wait() needs to wait without any snapshot held. Otherwise, > > the snapshot could prevent the replay of WAL records, implying a kind of > > self-deadlock. This is why it is only possible to implement > > pg_wal_replay_wait() as a procedure working without an active snapshot, > > not a function. > > > > Catversion is bumped. > > I've spotted that this patch uses an OID that should not be used > during the development cycle: > +{ oid => '111', > + descr => 'wait for the target LSN to be replayed on standby with an optional timeout', > > Please use something in the 8000-9999 range, as required by > 98eab30b93d5. Fixed, sorry for messing this up. I would appreciate if you have time to look at [0] to check if it meets your expectations. Links. 1. https://www.postgresql.org/message-id/CAPpHfdsw9oq62Fvt65JApHJf1auUirdGJV7%3DnRyVnDL3M8z5xA%40mail.gmail.com ------ Regards, Alexander Korotkov Supabase
On Fri, Sep 27, 2024 at 7:57 AM Michael Paquier <michael@paquier.xyz> wrote: > On Fri, Sep 27, 2024 at 01:35:23PM +0900, Michael Paquier wrote: > > I would suggest to keep things simple and have one single function > > rather than introduce two more pg_proc entries with slight differences > > in their error reporting, making the original function return a text > > about the reason of the failure when waiting (be it a timeout, > > success, promotion or outside recovery). > > More to the point here. I am not sure what's the benefit of having a > procedure, while we have been using SQL functions for similar purposes > in xlogfuncs.c for years. And what you have here can also be coupled > with more complex logic in SQL queries. As I multiple times pointed in the thread [1] [2], this couldn't be done in SQL function. SQL-function should run within snapshot, which could prevent WAL from being replayed. In turn, depending on timeout settings that could lead to hidden deadlock (function waits for LSN to be replayed, replay process wait snapshot to be released) or an error. In the stored procedure, we're releasing active and catalog snapshots (similarly to VACUUM statement). Waiting without holding a snapshots allows us to workaround the problem described above. We can't do this in the SQL function, because that would leave the rest of SQL query without a snapshot. Links. 1. https://www.postgresql.org/message-id/CAPpHfduBSN8j5j5Ynn5x%3DThD%3D8ypNd53D608VXGweBsPzxPvqA%40mail.gmail.com 2. https://www.postgresql.org/message-id/CAPpHfdtiGgn0iS1KbW2HTam-1%2BoK%2BvhXZDAcnX9hKaA7Oe%3DF-A%40mail.gmail.com ------ Regards, Alexander Korotkov Supabase
On 02.09.24 01:55, Alexander Korotkov wrote: > On Mon, Sep 2, 2024 at 2:28 AM Michael Paquier <michael@paquier.xyz> wrote: >> On Sun, Sep 01, 2024 at 10:35:27PM +0300, Alexander Korotkov wrote: >>> This path hasn't changes since the patch revision when it was a >>> utility command. I agree that this doesn't look like proper path for >>> stored procedure. But I don't think src/backend/utils/adt is >>> appropriate path either, because it's not really about data type. >>> pg_wal_replay_wait() looks a good neighbor for >>> pg_wal_replay_pause()/pg_wal_replay_resume() and other WAL-related >>> functions. So, what about moving it to src/backend/access/transam? >> >> Moving the new function to xlogfuncs.c while publishing >> WaitForLSNReplay() makes sense to me. > > Thank you for proposal. I like this. > > Could you, please, check the attached patch? We still have stuff in src/backend/commands/waitlsn.c that is nothing like a "command". You have moved some stuff elsewhere, but what are you planning to do with the rest?
On Wed, Oct 16, 2024 at 10:35 PM Peter Eisentraut <peter@eisentraut.org> wrote: > On 02.09.24 01:55, Alexander Korotkov wrote: > > On Mon, Sep 2, 2024 at 2:28 AM Michael Paquier <michael@paquier.xyz> wrote: > >> On Sun, Sep 01, 2024 at 10:35:27PM +0300, Alexander Korotkov wrote: > >>> This path hasn't changes since the patch revision when it was a > >>> utility command. I agree that this doesn't look like proper path for > >>> stored procedure. But I don't think src/backend/utils/adt is > >>> appropriate path either, because it's not really about data type. > >>> pg_wal_replay_wait() looks a good neighbor for > >>> pg_wal_replay_pause()/pg_wal_replay_resume() and other WAL-related > >>> functions. So, what about moving it to src/backend/access/transam? > >> > >> Moving the new function to xlogfuncs.c while publishing > >> WaitForLSNReplay() makes sense to me. > > > > Thank you for proposal. I like this. > > > > Could you, please, check the attached patch? > > We still have stuff in src/backend/commands/waitlsn.c that is nothing > like a "command". You have moved some stuff elsewhere, but what are you > planning to do with the rest? Thank you for spotting this another time. What about moving that somewhere like src/backend/access/transam/xlogwait.c ? ------ Regards, Alexander Korotkov Supabase
Hi, Alexander!
On Tue, 22 Oct 2024 at 13:26, Alexander Korotkov <aekorotkov@gmail.com> wrote:
On Wed, Oct 16, 2024 at 11:20 PM Alexander Korotkov
<aekorotkov@gmail.com> wrote:
>
> On Wed, Oct 16, 2024 at 10:35 PM Peter Eisentraut <peter@eisentraut.org> wrote:
> > On 02.09.24 01:55, Alexander Korotkov wrote:
> > > On Mon, Sep 2, 2024 at 2:28 AM Michael Paquier <michael@paquier.xyz> wrote:
> > >> On Sun, Sep 01, 2024 at 10:35:27PM +0300, Alexander Korotkov wrote:
> > >>> This path hasn't changes since the patch revision when it was a
> > >>> utility command. I agree that this doesn't look like proper path for
> > >>> stored procedure. But I don't think src/backend/utils/adt is
> > >>> appropriate path either, because it's not really about data type.
> > >>> pg_wal_replay_wait() looks a good neighbor for
> > >>> pg_wal_replay_pause()/pg_wal_replay_resume() and other WAL-related
> > >>> functions. So, what about moving it to src/backend/access/transam?
> > >>
> > >> Moving the new function to xlogfuncs.c while publishing
> > >> WaitForLSNReplay() makes sense to me.
> > >
> > > Thank you for proposal. I like this.
> > >
> > > Could you, please, check the attached patch?
> >
> > We still have stuff in src/backend/commands/waitlsn.c that is nothing
> > like a "command". You have moved some stuff elsewhere, but what are you
> > planning to do with the rest?
>
> Thank you for spotting this another time. What about moving that
> somewhere like src/backend/access/transam/xlogwait.c ?
Implemented this as a separate patch (0001). Also rebased other
pending patches on that. 0004 now revises header comment of
xlogwait.c with new procedure signature.
I've looked at v5 of a patchset.
0001:
Looks completely ok.
0002:
As stated in latch.c
- WL_POSTMASTER_DEATH: Wait for postmaster to die
- WL_EXIT_ON_PM_DEATH: Exit immediately if the postmaster dies
* wakeEvents must include either WL_EXIT_ON_PM_DEATH for automatic exit
* if the postmaster dies or WL_POSTMASTER_DEATH for a flag set in the
* return value if the postmaster dies
* if the postmaster dies or WL_POSTMASTER_DEATH for a flag set in the
* return value if the postmaster dies
It's not completely clear to me if these comments need some clarification (not related to the patchset), or if we should look for WL_EXIT_ON_PM_DEATH for immediately FATAL. Or waiting for postmaster to die on WL_POSTMASTER_DEATH instead of just fatal immediately?
0003:
Besides refactor it looks that deleteLSNWaiter() is added in WaitForLSNReplay. Maybe it's worth mentioning in the commit message.
Maybe refactoring for loop for assigning result variable and breaking a loop instead of immediate return would look better and lead to natural call of after the loop before returning.
0004:
Comment:
+ * Waits until recovery replays the target LSN with optional timeout. Throw
+ * an error on failure.may need mentioning "Unless no_error provided throws an error on failure"
Otherwise the patch looks good.
Regards,
Pavel Borisov
Supabase
Fix a typo of myself:
Maybe refactoring for loop for assigning result variable and breaking a loop instead of immediate return would look better and lead to natural call of after the loop before returning.
Maybe refactoring for loop for assigning result variable and breaking a loop instead of immediate return would look better and lead to natural call of deleteLSNWaiter() after the loop before returning.
Pavel.
Hi, Alexander!
On Wed, 23 Oct 2024 at 00:12, Alexander Korotkov <aekorotkov@gmail.com> wrote:
Hi, Pavel!
Thank you for your review.
On Tue, Oct 22, 2024 at 4:30 PM Pavel Borisov <pashkin.elfe@gmail.com> wrote:
> On Tue, 22 Oct 2024 at 13:26, Alexander Korotkov <aekorotkov@gmail.com> wrote:
>>
>> On Wed, Oct 16, 2024 at 11:20 PM Alexander Korotkov
>> <aekorotkov@gmail.com> wrote:
>> >
>> > On Wed, Oct 16, 2024 at 10:35 PM Peter Eisentraut <peter@eisentraut.org> wrote:
>> > > On 02.09.24 01:55, Alexander Korotkov wrote:
>> > > > On Mon, Sep 2, 2024 at 2:28 AM Michael Paquier <michael@paquier.xyz> wrote:
>> > > >> On Sun, Sep 01, 2024 at 10:35:27PM +0300, Alexander Korotkov wrote:
>> > > >>> This path hasn't changes since the patch revision when it was a
>> > > >>> utility command. I agree that this doesn't look like proper path for
>> > > >>> stored procedure. But I don't think src/backend/utils/adt is
>> > > >>> appropriate path either, because it's not really about data type.
>> > > >>> pg_wal_replay_wait() looks a good neighbor for
>> > > >>> pg_wal_replay_pause()/pg_wal_replay_resume() and other WAL-related
>> > > >>> functions. So, what about moving it to src/backend/access/transam?
>> > > >>
>> > > >> Moving the new function to xlogfuncs.c while publishing
>> > > >> WaitForLSNReplay() makes sense to me.
>> > > >
>> > > > Thank you for proposal. I like this.
>> > > >
>> > > > Could you, please, check the attached patch?
>> > >
>> > > We still have stuff in src/backend/commands/waitlsn.c that is nothing
>> > > like a "command". You have moved some stuff elsewhere, but what are you
>> > > planning to do with the rest?
>> >
>> > Thank you for spotting this another time. What about moving that
>> > somewhere like src/backend/access/transam/xlogwait.c ?
>>
>> Implemented this as a separate patch (0001). Also rebased other
>> pending patches on that. 0004 now revises header comment of
>> xlogwait.c with new procedure signature.
>
>
> I've looked at v5 of a patchset.
> 0002:
>
> As stated in latch.c
>
> - WL_POSTMASTER_DEATH: Wait for postmaster to die
> - WL_EXIT_ON_PM_DEATH: Exit immediately if the postmaster dies
>
> * wakeEvents must include either WL_EXIT_ON_PM_DEATH for automatic exit
> * if the postmaster dies or WL_POSTMASTER_DEATH for a flag set in the
> * return value if the postmaster dies
>
> It's not completely clear to me if these comments need some clarification (not related to the patchset), or if we should look for WL_EXIT_ON_PM_DEATH for immediately FATAL. Or waiting for postmaster to die on WL_POSTMASTER_DEATH instead of just fatal immediately?
As I get from the code, WL_EXIT_ON_PM_DEATH cause process to just
proc_exit(1) without throwing FATAL. So, in the most of situations we
do throw FATAL after seeing WL_POSTMASTER_DEATH event. So, it's
reasonable to do the same here. But indeed, this is a question (not
related to this patch) whether WL_EXIT_ON_PM_DEATH should cause
process to throw FATAL.
Libpq ends up with FATAL on WL_POSTMASTER_DEATH.
In a backend code on WL_POSTMASTER_DEATH: SyncRepWaitForLSN() sets ProcDiePending but don't FATAL. Walsender exits proc_exit(1).
I suppose WL_POSTMASTER_DEATH expected behavior is "Do whatever you want: wait for postmaster to die or end up immediately".
I think path 0002 is good.
I looked through patches v6 and I think they're all good now.
Regards,
Pavel Borisov
Supabase.
On Wed, Oct 23, 2024 at 9:02 AM Pavel Borisov <pashkin.elfe@gmail.com> wrote: > On Wed, 23 Oct 2024 at 00:12, Alexander Korotkov <aekorotkov@gmail.com> wrote: >> Thank you for your review. >> >> On Tue, Oct 22, 2024 at 4:30 PM Pavel Borisov <pashkin.elfe@gmail.com> wrote: >> > On Tue, 22 Oct 2024 at 13:26, Alexander Korotkov <aekorotkov@gmail.com> wrote: >> >> >> >> On Wed, Oct 16, 2024 at 11:20 PM Alexander Korotkov >> >> <aekorotkov@gmail.com> wrote: >> >> > >> >> > On Wed, Oct 16, 2024 at 10:35 PM Peter Eisentraut <peter@eisentraut.org> wrote: >> >> > > On 02.09.24 01:55, Alexander Korotkov wrote: >> >> > > > On Mon, Sep 2, 2024 at 2:28 AM Michael Paquier <michael@paquier.xyz> wrote: >> >> > > >> On Sun, Sep 01, 2024 at 10:35:27PM +0300, Alexander Korotkov wrote: >> >> > > >>> This path hasn't changes since the patch revision when it was a >> >> > > >>> utility command. I agree that this doesn't look like proper path for >> >> > > >>> stored procedure. But I don't think src/backend/utils/adt is >> >> > > >>> appropriate path either, because it's not really about data type. >> >> > > >>> pg_wal_replay_wait() looks a good neighbor for >> >> > > >>> pg_wal_replay_pause()/pg_wal_replay_resume() and other WAL-related >> >> > > >>> functions. So, what about moving it to src/backend/access/transam? >> >> > > >> >> >> > > >> Moving the new function to xlogfuncs.c while publishing >> >> > > >> WaitForLSNReplay() makes sense to me. >> >> > > > >> >> > > > Thank you for proposal. I like this. >> >> > > > >> >> > > > Could you, please, check the attached patch? >> >> > > >> >> > > We still have stuff in src/backend/commands/waitlsn.c that is nothing >> >> > > like a "command". You have moved some stuff elsewhere, but what are you >> >> > > planning to do with the rest? >> >> > >> >> > Thank you for spotting this another time. What about moving that >> >> > somewhere like src/backend/access/transam/xlogwait.c ? >> >> >> >> Implemented this as a separate patch (0001). Also rebased other >> >> pending patches on that. 0004 now revises header comment of >> >> xlogwait.c with new procedure signature. >> > >> > >> > I've looked at v5 of a patchset. >> >> > 0002: >> > >> > As stated in latch.c >> > >> > - WL_POSTMASTER_DEATH: Wait for postmaster to die >> > - WL_EXIT_ON_PM_DEATH: Exit immediately if the postmaster dies >> > >> > * wakeEvents must include either WL_EXIT_ON_PM_DEATH for automatic exit >> > * if the postmaster dies or WL_POSTMASTER_DEATH for a flag set in the >> > * return value if the postmaster dies >> > >> > It's not completely clear to me if these comments need some clarification (not related to the patchset), or if we shouldlook for WL_EXIT_ON_PM_DEATH for immediately FATAL. Or waiting for postmaster to die on WL_POSTMASTER_DEATH insteadof just fatal immediately? >> >> As I get from the code, WL_EXIT_ON_PM_DEATH cause process to just >> proc_exit(1) without throwing FATAL. So, in the most of situations we >> do throw FATAL after seeing WL_POSTMASTER_DEATH event. So, it's >> reasonable to do the same here. But indeed, this is a question (not >> related to this patch) whether WL_EXIT_ON_PM_DEATH should cause >> process to throw FATAL. > > > Libpq ends up with FATAL on WL_POSTMASTER_DEATH. > In a backend code on WL_POSTMASTER_DEATH: SyncRepWaitForLSN() sets ProcDiePending but don't FATAL. Walsender exits proc_exit(1). > I suppose WL_POSTMASTER_DEATH expected behavior is "Do whatever you want: wait for postmaster to die or end up immediately". > I think path 0002 is good. > > I looked through patches v6 and I think they're all good now. Thank you. I will push this patchset if now objections. ------ Regards, Alexander Korotkov Supabase
If you call this procedure on a stand-alone server, you get: postgres=# call pg_wal_replay_wait('1234/0'); ERROR: recovery is not in progress DETAIL: Recovery ended before replaying target LSN 1234/0; last replay LSN 0/0. The DETAIL seems a bit misleading. Recovery never ended, because it never started in the first place. Last replay LSN is indeed 0/0, but that's not helpful. If a standby server has been promoted and you pass an LSN that's earlier than the last replay LSN, it returns successfully. That makes sense I guess; if you connect to a standby and wait for it to replay a commit that you made in the primary, and the standby gets promoted, that seems correct. But it's a little inconsistent: If the standby crashes immediately after promotion, and you call pg_wal_replay_wait() after recovery, it returns success. However, if you shut down the promoted server and restart it, then last replay LSN is 0/0, and the call will fail because no recovery happened. What is the use case for the 'no_error' argument? Why would you ever want to pass no_error=true ? The separate pg_wal_replay_wait_status() function feels weird to me. Also it surely shouldn't be marked IMMUTABLE nor parallel safe. This would benefit from more documentation, explaining how you would use this in practice. I believe the use case is that you want "read your writes" consistency between a primary and a standby. You commit a transaction in the primary, and you then want to run read-only queries in a standby, and you want to make sure that you see your own commit, but you're ok with slightly delayed results otherwise. It would be good to add a chapter somewhere in the docs to show how to do that in practice with these functions. -- Heikki Linnakangas Neon (https://neon.tech)
Hi, Heikki! On Fri, Oct 25, 2024 at 9:06 AM Heikki Linnakangas <hlinnaka@iki.fi> wrote: > If you call this procedure on a stand-alone server, you get: > > postgres=# call pg_wal_replay_wait('1234/0'); > ERROR: recovery is not in progress > DETAIL: Recovery ended before replaying target LSN 1234/0; last replay > LSN 0/0. > > The DETAIL seems a bit misleading. Recovery never ended, because it > never started in the first place. Last replay LSN is indeed 0/0, but > that's not helpful. > > If a standby server has been promoted and you pass an LSN that's earlier > than the last replay LSN, it returns successfully. That makes sense I > guess; if you connect to a standby and wait for it to replay a commit > that you made in the primary, and the standby gets promoted, that seems > correct. But it's a little inconsistent: If the standby crashes > immediately after promotion, and you call pg_wal_replay_wait() after > recovery, it returns success. However, if you shut down the promoted > server and restart it, then last replay LSN is 0/0, and the call will > fail because no recovery happened. > > What is the use case for the 'no_error' argument? Why would you ever > want to pass no_error=true ? The separate pg_wal_replay_wait_status() > function feels weird to me. Also it surely shouldn't be marked IMMUTABLE > nor parallel safe. > > This would benefit from more documentation, explaining how you would use > this in practice. I believe the use case is that you want "read your > writes" consistency between a primary and a standby. You commit a > transaction in the primary, and you then want to run read-only queries > in a standby, and you want to make sure that you see your own commit, > but you're ok with slightly delayed results otherwise. It would be good > to add a chapter somewhere in the docs to show how to do that in > practice with these functions. Thank you for your feedback! I do agree that error reporting for "not in recovery" case needs to be improved, as well, as the documentation. I see that pg_wal_replay_wait_status() might look weird, but it seems to me like the best of feasible solutions. Given that pg_wal_replay_wait() procedure can't work concurrently to a query involving pg_wal_replay_wait_status() function, I think pg_wal_replay_wait_status() should be stable and parallel safe. This is the brief answer. I will be able to come back with more details on Monday. ------ Regards, Alexander Korotkov Supabase
On 25/10/2024 14:56, Alexander Korotkov wrote: > I see that pg_wal_replay_wait_status() might look weird, but it seems > to me like the best of feasible solutions. I haven't written many procedures, but our docs say: > Procedures do not return a function value; hence CREATE PROCEDURE lacks a RETURNS clause. However, procedures can instead return data to their callers via output parameters. Did you consider using an output parameter? > Given that > pg_wal_replay_wait() procedure can't work concurrently to a query > involving pg_wal_replay_wait_status() function, I think > pg_wal_replay_wait_status() should be stable and parallel safe. If you call pg_wal_replay_wait() in the backend process, and pg_wal_replay_wait_status() in a parallel worker process, it won't return the result of the wait. Probably not what you'd expect. So I'd argue that it should be parallel unsafe. > This is the brief answer. I will be able to come back with more > details on Monday. Thanks. A few more minor issues I spotted while playing with this: - If you pass a very high value as the timeout, e.g. INT_MAX-1, it wraps around and doesn't wait at all - You can pass NULLs as arguments. That should probably not be allowed, or we need to document what it means. This is disappointing: > postgres=# set default_transaction_isolation ='repeatable read'; > SET > postgres=# call pg_wal_replay_wait('0/55DA24F'); > ERROR: pg_wal_replay_wait() must be only called without an active or registered snapshot > DETAIL: Make sure pg_wal_replay_wait() isn't called within a transaction with an isolation level higher than READ COMMITTED,another procedure, or a function. Is there any way we could make that work? Otherwise, the feature just basically doesn't work if you use repeatable read. -- Heikki Linnakangas Neon (https://neon.tech)
Point of order: Discussions of commits and potential followup commits are best redirected to -hackers rather than continuing to use -committers. ...Robert
On Mon, Oct 28, 2024 at 5:47 PM Robert Haas <robertmhaas@gmail.com> wrote: > Point of order: > > Discussions of commits and potential followup commits are best > redirected to -hackers rather than continuing to use -committers. Thank you for pointing this. My response to Heikki's last message will be redirected to -hackers. ------ Regards, Alexander Korotkov Supabase