Re: Synchronizing slots from primary to standby - Mailing list pgsql-hackers
From | Masahiko Sawada |
---|---|
Subject | Re: Synchronizing slots from primary to standby |
Date | |
Msg-id | CAD21AoBgzONdt3o5mzbQ4MtqAE=WseiXUOq0LMqne-nWGjZBsA@mail.gmail.com Whole thread Raw |
In response to | Re: Synchronizing slots from primary to standby (shveta malik <shveta.malik@gmail.com>) |
Responses |
Re: Synchronizing slots from primary to standby
|
List | pgsql-hackers |
On Wed, Jan 17, 2024 at 7:30 PM shveta malik <shveta.malik@gmail.com> wrote: > > On Wed, Jan 17, 2024 at 3:08 PM Bertrand Drouvot > <bertranddrouvot.pg@gmail.com> wrote: > > > > Hi, > > > > On Tue, Jan 16, 2024 at 05:27:05PM +0530, shveta malik wrote: > > > PFA v62. Details: > > > > Thanks! > > > > > v62-003: > > > It is a new patch which attempts to implement slot-sync worker as a > > > special process which is neither a bgworker nor an Auxiliary process. > > > Here we get the benefit of converting enable_syncslot to a PGC_SIGHUP > > > Guc rather than PGC_POSTMASTER. We launch the slot-sync worker only if > > > it is hot-standby and 'enable_syncslot' is ON. > > > > The implementation looks reasonable to me (from what I can see some parts is > > copy/paste from an already existing "special" process and some parts are > > "sync slot" specific) which makes fully sense. > > > > A few remarks: > > > > 1 === > > + * Was it the slot sycn worker? > > > > Typo: sycn > > > > 2 === > > + * ones), and no walwriter, autovac launcher or bgwriter or slot sync > > > > Instead? "* ones), and no walwriter, autovac launcher, bgwriter or slot sync" > > > > 3 === > > + * restarting slot slyc worker. If stopSignaled is set, the worker will > > > > Typo: slyc > > > > 4 === > > +/* Flag to tell if we are in an slot sync worker process */ > > > > s/an/a/ ? > > > > 5 === (coming from v62-0002) > > + Assert(tuplestore_tuple_count(res->tuplestore) == 1); > > > > Is it even possible for the related query to not return only one row? (I think the > > "count" ensures it). > > > > 6 === > > if (conninfo_changed || > > primary_slotname_changed || > > + old_enable_syncslot != enable_syncslot || > > (old_hot_standby_feedback != hot_standby_feedback)) > > { > > ereport(LOG, > > errmsg("slot sync worker will restart because of" > > " a parameter change")); > > > > I don't think "slot sync worker will restart" is true if one change enable_syncslot > > from on to off. > > > > IMHO, v62-003 is in good shape and could be merged in v62-002 (that would ease > > the review). But let's wait to see if others think differently. > > > > Regards, > > > > -- > > Bertrand Drouvot > > PostgreSQL Contributors Team > > RDS Open Source Databases > > Amazon Web Services: https://aws.amazon.com > > > PFA v63. > > --It addresses comments by Peter given in [1], [2], comment by Nisha > given in [3], comments by Bertrand given in [4] > --It also moves race-condition fix from patch003 to patch002 as > suggested by Swada-san offlist. Race-condition is mentioned in [5] > Thank you for updating the patch. I have some comments: --- + latestWalEnd = GetWalRcvLatestWalEnd(); + if (remote_slot->confirmed_lsn > latestWalEnd) + { + elog(ERROR, "exiting from slot synchronization as the received slot sync" + " LSN %X/%X for slot \"%s\" is ahead of the standby position %X/%X", + LSN_FORMAT_ARGS(remote_slot->confirmed_lsn), + remote_slot->name, + LSN_FORMAT_ARGS(latestWalEnd)); + } IIUC GetWalRcvLatestWalEnd () returns walrcv->latestWalEnd, which is typically the primary server's flush position and doesn't mean the LSN where the walreceiver received/flushed up to. Does it really happen that the slot's confirmed_flush_lsn is higher than the primary's flush lsn? --- After dropping a database on the primary, I got the following LOG (PID 2978463 is the slotsync worker on the standby): LOG: still waiting for backend with PID 2978463 to accept ProcSignalBarrier CONTEXT: WAL redo at 0/301CE00 for Database/DROP: dir 1663/16384 Regards, -- Masahiko Sawada Amazon Web Services: https://aws.amazon.com
pgsql-hackers by date: