Re: Synchronizing slots from primary to standby - Mailing list pgsql-hackers
| From | Masahiko Sawada |
|---|---|
| Subject | Re: Synchronizing slots from primary to standby |
| Date | |
| Msg-id | CAD21AoBgzONdt3o5mzbQ4MtqAE=WseiXUOq0LMqne-nWGjZBsA@mail.gmail.com |
| In response to | Re: Synchronizing slots from primary to standby (shveta malik <shveta.malik@gmail.com>) |
| Responses | Re: Synchronizing slots from primary to standby |
| List | pgsql-hackers |
On Wed, Jan 17, 2024 at 7:30 PM shveta malik <shveta.malik@gmail.com> wrote:
>
> On Wed, Jan 17, 2024 at 3:08 PM Bertrand Drouvot
> <bertranddrouvot.pg@gmail.com> wrote:
> >
> > Hi,
> >
> > On Tue, Jan 16, 2024 at 05:27:05PM +0530, shveta malik wrote:
> > > PFA v62. Details:
> >
> > Thanks!
> >
> > > v62-003:
> > > It is a new patch which attempts to implement slot-sync worker as a
> > > special process which is neither a bgworker nor an Auxiliary process.
> > > Here we get the benefit of converting enable_syncslot to a PGC_SIGHUP
> > > Guc rather than PGC_POSTMASTER. We launch the slot-sync worker only if
> > > it is hot-standby and 'enable_syncslot' is ON.
> >
> > The implementation looks reasonable to me (from what I can see some parts is
> > copy/paste from an already existing "special" process and some parts are
> > "sync slot" specific) which makes fully sense.
> >
> > A few remarks:
> >
> > 1 ===
> > + * Was it the slot sycn worker?
> >
> > Typo: sycn
> >
> > 2 ===
> > + * ones), and no walwriter, autovac launcher or bgwriter or slot sync
> >
> > Instead? "* ones), and no walwriter, autovac launcher, bgwriter or slot sync"
> >
> > 3 ===
> > + * restarting slot slyc worker. If stopSignaled is set, the worker will
> >
> > Typo: slyc
> >
> > 4 ===
> > +/* Flag to tell if we are in an slot sync worker process */
> >
> > s/an/a/ ?
> >
> > 5 === (coming from v62-0002)
> > + Assert(tuplestore_tuple_count(res->tuplestore) == 1);
> >
> > Is it even possible for the related query to not return only one row? (I think the
> > "count" ensures it).
> >
> > 6 ===
> > if (conninfo_changed ||
> > primary_slotname_changed ||
> > + old_enable_syncslot != enable_syncslot ||
> > (old_hot_standby_feedback != hot_standby_feedback))
> > {
> > ereport(LOG,
> > errmsg("slot sync worker will restart because of"
> > " a parameter change"));
> >
> > I don't think "slot sync worker will restart" is true if one change enable_syncslot
> > from on to off.
> >
> > IMHO, v62-003 is in good shape and could be merged in v62-002 (that would ease
> > the review). But let's wait to see if others think differently.
> >
> > Regards,
> >
> > --
> > Bertrand Drouvot
> > PostgreSQL Contributors Team
> > RDS Open Source Databases
> > Amazon Web Services: https://aws.amazon.com
>
>
> PFA v63.
>
> --It addresses comments by Peter given in [1], [2], comment by Nisha
> given in [3], comments by Bertrand given in [4]
> --It also moves race-condition fix from patch003 to patch002 as
> suggested by Sawada-san offlist. Race-condition is mentioned in [5]
>
Thank you for updating the patch. I have some comments:
---
+ latestWalEnd = GetWalRcvLatestWalEnd();
+ if (remote_slot->confirmed_lsn > latestWalEnd)
+ {
+ elog(ERROR, "exiting from slot synchronization as the received slot sync"
+ " LSN %X/%X for slot \"%s\" is ahead of the standby position %X/%X",
+ LSN_FORMAT_ARGS(remote_slot->confirmed_lsn),
+ remote_slot->name,
+ LSN_FORMAT_ARGS(latestWalEnd));
+ }
IIUC GetWalRcvLatestWalEnd() returns walrcv->latestWalEnd, which is
typically the primary server's flush position, not the LSN up to which
the walreceiver has received/flushed WAL. Can it really happen that the
slot's confirmed_flush_lsn is higher than the primary's flush LSN?
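
To illustrate the distinction, here is a minimal standalone sketch. It is
not PostgreSQL code: ModelWalRcv and both helper functions are made-up names
for illustration, and only the field names mirror the ones discussed above.
It just shows that a slot LSN can be behind the primary's reported WAL end
while still being ahead of what the standby has flushed locally.

```c
#include <stdbool.h>
#include <stdint.h>

/* Stand-in for PostgreSQL's XLogRecPtr (a 64-bit WAL position). */
typedef uint64_t XLogRecPtr;

/* Hypothetical, simplified model of the walreceiver's shared state. */
typedef struct ModelWalRcv
{
    XLogRecPtr flushedUpto;   /* WAL the standby has flushed locally */
    XLogRecPtr latestWalEnd;  /* WAL end last reported by the primary */
} ModelWalRcv;

/* The comparison as written in the patch: against the reported WAL end. */
static bool
slot_ahead_of_latest_wal_end(const ModelWalRcv *rcv, XLogRecPtr confirmed_lsn)
{
    return confirmed_lsn > rcv->latestWalEnd;
}

/* An alternative comparison (assumption): against the standby's flushed WAL. */
static bool
slot_ahead_of_flushed(const ModelWalRcv *rcv, XLogRecPtr confirmed_lsn)
{
    return confirmed_lsn > rcv->flushedUpto;
}
```

With flushedUpto = 0/3000000, latestWalEnd = 0/3020000 and a slot LSN of
0/3010000, the first check does not fire while the second one does, which is
why it matters which of the two positions the error message calls "the
standby position".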
---
After dropping a database on the primary, I got the following LOG (PID
2978463 is the slotsync worker on the standby):
LOG: still waiting for backend with PID 2978463 to accept ProcSignalBarrier
CONTEXT: WAL redo at 0/301CE00 for Database/DROP: dir 1663/16384
Regards,
--
Masahiko Sawada
Amazon Web Services: https://aws.amazon.com