Home > mailing lists

Re: [HACKERS] make async slave to wait for lsn to be replayed - Mailing list pgsql-hackers

From	Alexander Korotkov
Subject	Re: [HACKERS] make async slave to wait for lsn to be replayed
Date	June 14 15:46:15
Msg-id	CAPpHfdvGRssjqwX1+idm5Tu-eWsTcx6DthB2LhGqA1tZ29jJaw@mail.gmail.com Whole thread Raw
In response to	Re: [HACKERS] make async slave to wait for lsn to be replayed (Kartyshov Ivan <i.kartyshov@postgrespro.ru>)
Responses	Re: [HACKERS] make async slave to wait for lsn to be replayed
List	pgsql-hackers

Tree view

Hi, Ivan!

On Wed, Jun 12, 2024 at 11:36 AM Kartyshov Ivan
<i.kartyshov@postgrespro.ru> wrote:
>
> Hi, Alexander, Here, I made some improvements according to your
> discussion with Heikki.
>
> On 2024-04-11 18:09, Alexander Korotkov wrote:
> > On Thu, Apr 11, 2024 at 1:46 AM Heikki Linnakangas <hlinnaka@iki.fi>
> > wrote:
> >> In a nutshell, it's possible for the loop in WaitForLSN to exit
> >> without
> >> cleaning up the process from the heap. I was able to hit that by
> >> adding
> >> a delay after the addLSNWaiter() call:
> >>
> >> > TRAP: failed Assert("!procInfo->inHeap"), File: "../src/backend/commands/waitlsn.c", Line: 114, PID: 1936152
> >> > postgres: heikki postgres [local] CALL(ExceptionalCondition+0xab)[0x55da1f68787b]
> >> > postgres: heikki postgres [local] CALL(+0x331ec8)[0x55da1f204ec8]
> >> > postgres: heikki postgres [local] CALL(WaitForLSN+0x139)[0x55da1f2052cc]
> >> > postgres: heikki postgres [local] CALL(pg_wal_replay_wait+0x18b)[0x55da1f2056e5]
> >> > postgres: heikki postgres [local] CALL(ExecuteCallStmt+0x46e)[0x55da1f18031a]
> >> > postgres: heikki postgres [local] CALL(standard_ProcessUtility+0x8cf)[0x55da1f4b26c9]
> >>
> >> I think there's a similar race condition if the timeout is reached at
> >> the same time that the startup process wakes up the process.
> >
> > Thank you for catching this.  I think WaitForLSN() just needs to call
> > deleteLSNWaiter() unconditionally after exit from the loop.
>
> Fix and add injection point test on this race condition.
>
> > On Thu, Apr 11, 2024 at 1:46 AM Heikki Linnakangas <hlinnaka@iki.fi>
> > wrote:
> >> The docs could use some-copy-editing, but just to point out one issue:
> >>
> >> > There are also procedures to control the progress of recovery.
> >>
> >> That's copy-pasted from an earlier sentence at the table that lists
> >> functions like pg_promote(), pg_wal_replay_pause(), and
> >> pg_is_wal_replay_paused(). The pg_wal_replay_wait() doesn't control
> >> the
> >> progress of recovery like those functions do, it only causes the
> >> calling
> >> backend to wait.
>
> Fix documentation and add extra tests on multi-standby replication
> and cascade replication.

Thank you for the revised patch.

I see couple of items which are not addressed in this revision.
 * As Heikki pointed, that it's currently not possible in one round
trip to call call pg_wal_replay_wait() and do other job.  The attached
patch addresses this.  It milds the requirement.  Now, it's not
necessary to be in atomic context.  It's only necessary to have no
active snapshot.  This new requirement works for me so far.  I
appreciate a feedback on this.
 * As Alvaro pointed, multiple waiters case isn't covered by the test
suite.  That leads to no coverage of some code paths.  The attached
patch addresses this by adding a test case with multiple waiters.

The rest looks good to me.

Links.
1. https://www.postgresql.org/message-id/d1303584-b763-446c-9409-f4516118219f%40iki.fi
2.https://www.postgresql.org/message-id/202404051815.eri4u5q6oj26%40alvherre.pgsql

------
Regards,
Alexander Korotkov
Supabase

Attachment

v19-0001-Implement-pg_wal_replay_wait-stored-procedure.patch

pgsql-hackers by date:

From: Tomas Vondra
Date: 14 June, 15:27:50
Subject: Re: 回复: An implementation of multi-key sort

From: Kohei KaiGai
Date: 14 June, 15:54:34
Subject: assertion failure at cost_memoize_rescan()

Re: [HACKERS] make async slave to wait for lsn to be replayed - Mailing list pgsql-hackers

Attachment

Previous

Next