Re: Synchronizing slots from primary to standby - Mailing list pgsql-hackers
From | shveta malik |
---|---|
Subject | Re: Synchronizing slots from primary to standby |
Date | |
Msg-id | CAJpy0uA2M0DmUMRJ6VZkcuPWdgnwd6m5jGqfiBG4Y6Nm6dumiw@mail.gmail.com |
In response to | Re: Synchronizing slots from primary to standby ("Drouvot, Bertrand" <bertranddrouvot.pg@gmail.com>) |
Responses | Re: Synchronizing slots from primary to standby |
List | pgsql-hackers |
On Wed, Oct 25, 2023 at 3:15 PM Drouvot, Bertrand
<bertranddrouvot.pg@gmail.com> wrote:
>
> Hi,
>
> On 10/25/23 5:00 AM, shveta malik wrote:
> > On Tue, Oct 24, 2023 at 11:54 AM Drouvot, Bertrand
> > <bertranddrouvot.pg@gmail.com> wrote:
> >>
> >> Hi,
> >>
> >> On 10/23/23 2:56 PM, shveta malik wrote:
> >>> On Mon, Oct 23, 2023 at 5:52 PM Drouvot, Bertrand
> >>> <bertranddrouvot.pg@gmail.com> wrote:
> >>
> >>>> We are waiting for DEFAULT_NAPTIME_PER_CYCLE (3 minutes) before checking if there
> >>>> is new synced slot(s) to be created on the standby. Do we want to keep this behavior
> >>>> for V1?
> >>>>
> >>>
> >>> I think for the slotsync workers case, we should reduce the naptime in
> >>> the launcher to say 30sec and retain the default one of 3mins for
> >>> subscription apply workers. Thoughts?
> >>>
> >>
> >> Another option could be to keep DEFAULT_NAPTIME_PER_CYCLE and create a new
> >> API on the standby that would refresh the list of sync slot at wish, thoughts?
> >>
> >
> > Do you mean API to refresh list of DBIDs rather than sync-slots?
> > As per current design, launcher gets DBID lists for all the failover
> > slots from the primary at intervals of DEFAULT_NAPTIME_PER_CYCLE.
>
> I mean an API to get a newly created slot on the primary being created/synced on
> the standby at wish.
>
> Also let's imagine this scenario:
>
> - create logical_slot1 on the primary (and don't start using it)
>
> Then on the standby we'll get things like:
>
> 2023-10-25 08:33:36.897 UTC [740298] LOG: waiting for remote slot "logical_slot1" LSN (0/C00316A0) and catalog xmin (752) to pass local slot LSN (0/C0049530) and and catalog xmin (754)
>
> That's expected and due to the fact that ReplicationSlotReserveWal() does set the slot
> restart_lsn to a value < at the corresponding restart_lsn slot on the primary.
>
> - create logical_slot2 on the primary (and start using it)
>
> Then logical_slot2 won't be created/synced on the standby until there is activity on logical_slot1 on the primary
> that would produce things like:
>
> 2023-10-25 08:41:35.508 UTC [740298] LOG: wait over for remote slot "logical_slot1" as its LSN (0/C005FFD8) and catalog xmin (756) has now passed local slot LSN (0/C0049530) and catalog xmin (754)
>
> With this new dedicated API, it will be:
>
> - clear that the API call is "hanging" until there is some activity on the newly created slot
> (currently there is "waiting for remote slot " message in the logfile as mentioned above but
> I'm not sure that's enough)
>
> - be possible to create/sync logical_slot2 in the example above without waiting for activity
> on logical_slot1.
>
> Maybe we should change our current algorithm during slot creation so that a newly created inactive
> slot on the primary does not block other newly created "active" slots on the primary to be created
> on the standby? Depending on how we implement that, the new API may not be needed at all.
>
> Thoughts?
>

I discussed this with my colleague Hou-San, and we think one
possibility could be to somehow accelerate the advancement of
restart_lsn on the primary. This can be achieved by connecting to the
remote and executing pg_log_standby_snapshot() at reasonable intervals
while waiting on the standby during slot creation. This may speed
things up to a reasonable extent without having to wait for the user
or bgwriter to do the same for us. The current logical decoding code
uses a similar approach to speed up slot creation; I am referring to
the usage of LogStandbySnapshot() in SnapBuildWaitSnapshot() and
ReplicationSlotReserveWal().

Thoughts?

thanks
Shveta
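
To make the idea above concrete, here is a minimal Python sketch of the wait loop, not the patch's actual C code. The helper names (parse_lsn, remote_has_caught_up, wait_for_remote_slot) and the callables passed in are hypothetical. The "caught up" test is inferred from the log messages quoted above and uses plain integer comparison for catalog xmin, ignoring XID wraparound. In a real setup the log_standby_snapshot callable would run "SELECT pg_log_standby_snapshot();" on the primary; whether that alone is enough to advance an inactive slot is exactly what is being asked here.

```python
import time
from typing import Callable, Tuple

# A slot position as (restart_lsn text, catalog_xmin), e.g. ("0/C00316A0", 752).
SlotPosition = Tuple[str, int]


def parse_lsn(text: str) -> int:
    """Convert an LSN such as '0/C00316A0' into a 64-bit integer."""
    high, low = text.split("/")
    return (int(high, 16) << 32) | int(low, 16)


def remote_has_caught_up(remote: SlotPosition, local: SlotPosition) -> bool:
    """True once the remote slot's LSN and catalog xmin have reached the local ones.

    Assumption: xmin is compared as a plain integer (no wraparound handling).
    """
    return parse_lsn(remote[0]) >= parse_lsn(local[0]) and remote[1] >= local[1]


def wait_for_remote_slot(
    fetch_remote: Callable[[], SlotPosition],
    local: SlotPosition,
    log_standby_snapshot: Callable[[], None],
    max_attempts: int = 10,
    interval_sec: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
) -> bool:
    """Poll the remote slot, nudging the primary between polls.

    fetch_remote and log_standby_snapshot are hypothetical hooks; the latter
    stands in for executing pg_log_standby_snapshot() on the primary.
    """
    for _ in range(max_attempts):
        if remote_has_caught_up(fetch_remote(), local):
            return True
        log_standby_snapshot()  # ask the primary to log a standby snapshot
        sleep(interval_sec)
    return remote_has_caught_up(fetch_remote(), local)
```

With the values from the two log lines above, the first poll sees the remote slot behind the local one (0/C00316A0, 752 versus 0/C0049530, 754) and triggers one snapshot request; the second poll (0/C005FFD8, 756) ends the wait.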