RE: Synchronizing slots from primary to standby - Mailing list pgsql-hackers

From	Zhijie Hou (Fujitsu)
Subject	RE: Synchronizing slots from primary to standby
Date	February 6, 2024 13:49:17
Msg-id	OS0PR01MB5716923268FE208003DC784094462@OS0PR01MB5716.jpnprd01.prod.outlook.com
In response to	Re: Synchronizing slots from primary to standby (Masahiko Sawada <sawada.mshk@gmail.com>)
Responses	Re: Synchronizing slots from primary to standby Re: Synchronizing slots from primary to standby
List	pgsql-hackers

On Tuesday, February 6, 2024 3:39 PM Masahiko Sawada <sawada.mshk@gmail.com> wrote:
> 
> On Tue, Feb 6, 2024 at 3:19 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
> >
> > On Mon, Feb 5, 2024 at 7:56 PM Masahiko Sawada
> <sawada.mshk@gmail.com> wrote:
> > >
> > > ---
> > > Since two processes (e.g. the slotsync worker and
> > > pg_sync_replication_slots()) concurrently fetch and update the slot
> > > information, there is a race condition where slot's
> > > confirmed_flush_lsn goes backward.
> > >
> >
> > Right, this is possible, though there shouldn't be a problem because
> > anyway, slotsync is an async process. Till we hold restart_lsn, the
> > required WAL won't be removed. Having said that, I can think of two
> > ways to avoid it: (a) We can have some flag in shared memory using
> > which we can detect whether any other process is doing slot
> > synchronization and then either error out at that time or simply wait
> > or may take nowait kind of parameter from user to decide what to do?
> > If this is feasible, we can simply error out for the first version and
> > extend it later if we see any use cases for the same (b) similar to
> > restart_lsn, if confirmed_flush_lsn is getting moved back, raise an
> > error, this is good for now but in future we may still have another
> > similar issue, so I would prefer (a) among these but I am fine if you
> > prefer (b) or have some other ideas like just note down in comments
> > that this is a harmless case and can happen only very rarely.
> 
> Thank you for sharing the ideas. I would prefer (a). For (b), the same issue still
> happens for other fields.
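
For reference, option (b) in the quoted text amounts to a monotonicity check
when remote values are applied to the local slot. The following is only a
minimal standalone sketch of that idea, not code from the patch: `Lsn`,
`LocalSlot` and `slot_apply_remote` are hypothetical names, and `Lsn` stands
in for PostgreSQL's 64-bit `XLogRecPtr`.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for XLogRecPtr, a 64-bit WAL position. */
typedef uint64_t Lsn;

/* Hypothetical, reduced view of a locally synced slot. */
typedef struct LocalSlot
{
    Lsn restart_lsn;
    Lsn confirmed_flush_lsn;
} LocalSlot;

/*
 * Apply values fetched from the primary, refusing any update that would
 * move either LSN backward.  Returns false when the update is rejected so
 * that the caller can raise an error; the slot is left unchanged then.
 */
static bool
slot_apply_remote(LocalSlot *slot, Lsn remote_restart, Lsn remote_confirmed)
{
    if (remote_restart < slot->restart_lsn ||
        remote_confirmed < slot->confirmed_flush_lsn)
        return false;

    slot->restart_lsn = remote_restart;
    slot->confirmed_flush_lsn = remote_confirmed;
    return true;
}
```

The sketch guards only the two LSN fields. Every other synced field would
need its own check, which is the objection raised against (b) above.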

Attached is the V79 patch, which includes the following changes. (Note that
only 0001 is sent in this version; we will send the later patches after
rebasing.)

1. Address all the comments from Amit[1], all the comments from Peter[2], and
   some of the comments from Sawada-san[3].
2. Use a flag in shared memory to restrict concurrent slot sync.
3. Add more TAP tests for the pg_sync_replication_slots function.

[1] https://www.postgresql.org/message-id/CAA4eK1KGHT9S-Bst_G1CUNQvRep%3DipMs5aTBNRQFVi6TogbJ9w%40mail.gmail.com
[2] https://www.postgresql.org/message-id/CAHut%2BPtyoRf3adoLoTrbL6momzkhXAFKz656Vv9YRu4cp%3D6Yig%40mail.gmail.com
[3] https://www.postgresql.org/message-id/CAD21AoCEkcTaPb%2BGdOhSQE49_mKJG6D64quHcioJGx6RCqMv%2BQ%40mail.gmail.com
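
To illustrate item 2, here is a minimal standalone sketch of the "flag in
shared memory" idea, i.e. option (a) from the quoted discussion with the
simple error-out behavior. It is not the code in the patch: `SyncState`,
`sync_try_begin` and `sync_end` are hypothetical names, and a C11 atomic
stands in for whatever locking the patch actually uses around the flag.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical stand-in for a structure placed in shared memory. */
typedef struct SyncState
{
    atomic_bool syncing;    /* true while some process is syncing slots */
} SyncState;

/*
 * Try to become the only process performing slot synchronization.
 * Returns true if the flag was claimed, false if another process already
 * holds it; in the latter case the caller would report an error.
 */
static bool
sync_try_begin(SyncState *state)
{
    bool expected = false;

    /* Atomically flip false -> true; fails if the flag is already set. */
    return atomic_compare_exchange_strong(&state->syncing, &expected, true);
}

/* Release the flag; must only be called by the process that claimed it. */
static void
sync_end(SyncState *state)
{
    assert(atomic_load(&state->syncing));
    atomic_store(&state->syncing, false);
}
```

With this shape, a second caller of `sync_try_begin()` gets `false` until the
first caller has run `sync_end()`, so two processes never fetch and update
slot information at the same time. A real implementation would also have to
clear the flag on error exit, which the sketch does not cover.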

Best Regards,
Hou zj

Attachment: v79-0001-Add-a-slot-synchronization-function.patch
