Re: Synchronizing slots from primary to standby - Mailing list pgsql-hackers

From Masahiko Sawada
Subject Re: Synchronizing slots from primary to standby
Date
Msg-id CAD21AoBgzONdt3o5mzbQ4MtqAE=WseiXUOq0LMqne-nWGjZBsA@mail.gmail.com
In response to Re: Synchronizing slots from primary to standby  (shveta malik <shveta.malik@gmail.com>)
Responses Re: Synchronizing slots from primary to standby
List pgsql-hackers
On Wed, Jan 17, 2024 at 7:30 PM shveta malik <shveta.malik@gmail.com> wrote:
>
> On Wed, Jan 17, 2024 at 3:08 PM Bertrand Drouvot
> <bertranddrouvot.pg@gmail.com> wrote:
> >
> > Hi,
> >
> > On Tue, Jan 16, 2024 at 05:27:05PM +0530, shveta malik wrote:
> > > PFA v62. Details:
> >
> > Thanks!
> >
> > > v62-003:
> > > It is a new patch which attempts to implement slot-sync worker as a
> > > special process which is neither a bgworker nor an Auxiliary process.
> > > Here we get the benefit of converting enable_syncslot to a PGC_SIGHUP
> > > Guc rather than PGC_POSTMASTER. We launch the slot-sync worker only if
> > > it is hot-standby and 'enable_syncslot' is ON.
> >
> > The implementation looks reasonable to me (from what I can see, some parts are
> > copy/paste from an already existing "special" process and some parts are
> > "sync slot" specific), which makes complete sense.
> >
> > A few remarks:
> >
> > 1 ===
> > +                * Was it the slot sycn worker?
> >
> > Typo: sycn
> >
> > 2 ===
> > +                * ones), and no walwriter, autovac launcher or bgwriter or slot sync
> >
> > Instead? "* ones), and no walwriter, autovac launcher, bgwriter or slot sync"
> >
> > 3 ===
> > + * restarting slot slyc worker. If stopSignaled is set, the worker will
> >
> > Typo: slyc
> >
> > 4 ===
> > +/* Flag to tell if we are in an slot sync worker process */
> >
> > s/an/a/ ?
> >
> > 5 === (coming from v62-0002)
> > +       Assert(tuplestore_tuple_count(res->tuplestore) == 1);
> >
> > Is it even possible for the related query to not return only one row? (I think the
> > "count" ensures it).
> >
> > 6 ===
> >         if (conninfo_changed ||
> >                 primary_slotname_changed ||
> > +               old_enable_syncslot != enable_syncslot ||
> >                 (old_hot_standby_feedback != hot_standby_feedback))
> >         {
> >                 ereport(LOG,
> >                                 errmsg("slot sync worker will restart because of"
> >                                            " a parameter change"));
> >
> > I don't think "slot sync worker will restart" is true if one changes enable_syncslot
> > from on to off.
> >
> > IMHO, v62-003 is in good shape and could be merged in v62-002 (that would ease
> > the review). But let's wait to see if others think differently.
> >
> > Regards,
> >
> > --
> > Bertrand Drouvot
> > PostgreSQL Contributors Team
> > RDS Open Source Databases
> > Amazon Web Services: https://aws.amazon.com
>
>
> PFA v63.
>
> --It addresses comments by Peter given in [1], [2], comment by Nisha
> given in [3], comments by Bertrand given in [4]
> --It also moves race-condition fix from patch003 to patch002 as
> suggested by Sawada-san offlist. Race-condition is mentioned in [5]
>

Thank you for updating the patch. I have some comments:

---
+        latestWalEnd = GetWalRcvLatestWalEnd();
+        if (remote_slot->confirmed_lsn > latestWalEnd)
+        {
+                elog(ERROR, "exiting from slot synchronization as the received slot sync"
+                         " LSN %X/%X for slot \"%s\" is ahead of the standby position %X/%X",
+                         LSN_FORMAT_ARGS(remote_slot->confirmed_lsn),
+                         remote_slot->name,
+                         LSN_FORMAT_ARGS(latestWalEnd));
+        }

IIUC GetWalRcvLatestWalEnd() returns walrcv->latestWalEnd, which is
typically the primary server's flush position, not the LSN up to which
the walreceiver has received/flushed WAL. Can the slot's
confirmed_flush_lsn really be higher than the primary's flush LSN?
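
To illustrate the distinction, here is a minimal standalone sketch. It is
not PostgreSQL code: WalRcvModel, its flushedUpto field, and both helper
functions are hypothetical names used only for illustration, and XLogRecPtr
is a plain 64-bit stand-in. It only shows that a slot LSN can pass a check
against the primary's reported WAL end while still being ahead of what the
standby has actually received and flushed.

```c
#include <stdbool.h>
#include <stdint.h>

/* Stand-in for PostgreSQL's XLogRecPtr (assumption: a plain 64-bit LSN). */
typedef uint64_t XLogRecPtr;

/* Hypothetical, simplified model of the walreceiver state discussed above. */
typedef struct WalRcvModel
{
    XLogRecPtr latestWalEnd;   /* WAL end position reported by the primary */
    XLogRecPtr flushedUpto;    /* WAL received and flushed on the standby */
} WalRcvModel;

/* Mirrors the shape of the check in the patch: compare with latestWalEnd. */
static bool
slot_lsn_within_latest_wal_end(const WalRcvModel *rcv, XLogRecPtr confirmed_lsn)
{
    return confirmed_lsn <= rcv->latestWalEnd;
}

/* Alternative check: compare with what the standby has actually flushed. */
static bool
slot_lsn_within_flushed_wal(const WalRcvModel *rcv, XLogRecPtr confirmed_lsn)
{
    return confirmed_lsn <= rcv->flushedUpto;
}
```

With latestWalEnd = 0x3000000 and flushedUpto = 0x2000000, a slot LSN of
0x2800000 passes the first check but not the second, so under this model
the first check would only fail if the slot were ahead of the primary's
own reported position.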

---
After dropping a database on the primary, I got the following LOG (PID
2978463 is the slotsync worker on the standby):

LOG:  still waiting for backend with PID 2978463 to accept ProcSignalBarrier
CONTEXT:  WAL redo at 0/301CE00 for Database/DROP: dir 1663/16384

Regards,

--
Masahiko Sawada
Amazon Web Services: https://aws.amazon.com


