Re: Synchronizing slots from primary to standby - Mailing list pgsql-hackers

From shveta malik
Subject Re: Synchronizing slots from primary to standby
Msg-id CAJpy0uCJ7MKgZnKLAACd-AXN0VbWM5gVJ+GRJ1Za_A2UmF3R0A@mail.gmail.com
In response to Re: Synchronizing slots from primary to standby  (shveta malik <shveta.malik@gmail.com>)
Responses Re: Synchronizing slots from primary to standby
List pgsql-hackers
On Tue, Jan 16, 2024 at 3:10 PM shveta malik <shveta.malik@gmail.com> wrote:
>
> On Tue, Jan 16, 2024 at 12:59 PM Masahiko Sawada <sawada.mshk@gmail.com> wrote:
> >
> > On Tue, Jan 16, 2024 at 1:07 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
> > >
> > > On Tue, Jan 16, 2024 at 9:03 AM shveta malik <shveta.malik@gmail.com> wrote:
> > > >
> > > > On Sat, Jan 13, 2024 at 12:54 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
> > > > >
> > > > > On Fri, Jan 12, 2024 at 5:50 PM shveta malik <shveta.malik@gmail.com> wrote:
> > > > > >
> > > > > > There are multiple approaches discussed and tried when it comes to
> > > > > > starting a slot-sync worker. I am summarizing all here:
> > > > > >
> > > > > >  1) Make slotsync worker as an Auxiliary Process (like checkpointer,
> > > > > > walwriter, walreceiver etc). The benefit this approach provides is, it
> > > > > > can control begin and stop in a more flexible way as each auxiliary
> > > > > > process could have different checks before starting and can have
> > > > > > different stop conditions. But it needs code duplication for process
> > > > > > management(start, stop, crash handling, signals etc) and currently it
> > > > > > does not support db-connection smoothly (none of the auxiliary process
> > > > > > has one so far)
> > > > > >
> > > > >
> > > > > As slotsync worker needs to perform transactions and access syscache,
> > > > > we can't make it an auxiliary process as that doesn't initialize the
> > > > > required stuff like syscache. Also, see the comment "Auxiliary
> > > > > processes don't run transactions ..." in AuxiliaryProcessMain() which
> > > > > means this is not an option.
> > > > >
> > > > > >
> > > > > > 2) Make slotsync worker as a 'special' process like AutoVacLauncher
> > > > > > which is neither an Auxiliary process nor a bgworker one. It allows
> > > > > > db-connection and also provides flexibility to have start and stop
> > > > > > conditions for a process.
> > > > > >
> > > > >
> > > > > Yeah, due to these reasons, I think this option is worth considering
> > > > > and another plus point is that this allows us to make enable_syncslot
> > > > > a PGC_SIGHUP GUC rather than a PGC_POSTMASTER.
> > > > >
> > > > > >
> > > > > > > 3) Make slotsync worker a bgworker. Here we just need to register our
> > > > > > process as a bgworker (RegisterBackgroundWorker()) by providing a
> > > > > > relevant start_time and restart_time and then the process management
> > > > > > is well taken care of. It does not need any code-duplication and
> > > > > > allows db-connection smoothly in registered process. The only thing it
> > > > > > lacks is that it does not provide flexibility of having
> > > > > > start-condition which then makes us to have 'enable_syncslot' as
> > > > > > PGC_POSTMASTER parameter rather than PGC_SIGHUP. Having said this, I
> > > > > > feel enable_syncslot is something which will not be changed frequently
> > > > > > and with the benefits provided by bgworker infra, it seems a
> > > > > > reasonably good option to choose this approach.
> > > > > >
> > > > >
> > > > > I agree but it may be better to make it a PGC_SIGHUP parameter.
> > > > >
> > > > > > 4) Another option is to have Logical Replication Launcher(or a new
> > > > > > process) to launch slot-sync worker. But going by the current design
> > > > > > where we have only 1 slotsync worker, it may be an overhead to have an
> > > > > > additional manager process maintained.
> > > > > >
> > > > >
> > > > > I don't see any good reason to have an additional launcher process here.
> > > > >
> > > > > >
> > > > > > Thus weighing pros and cons of all these options, we have currently
> > > > > > implemented the bgworker approach (approach 3).  Any feedback is
> > > > > > welcome.
> > > > > >
> > > > >
> > > > > I vote to go for (2) unless we face difficulties in doing so but (3)
> > > > > is also okay especially if others also think so.
> > > >
> > > > I am not against any of the approaches but I still feel that when we
> > > > have a standard way of doing things (bgworker) we should not keep
> > > > adding code to do things in a special way unless there is a strong
> > > > reason to do so. Now we need to decide if 'enable_syncslot' being
> > > > PGC_POSTMASTER is a strong reason to go the non-standard way?
> > > >
> > >
> > > Agreed and as said earlier I think it is better to make it a
> > > PGC_SIGHUP. Also, not sure we can say it is a non-standard way as
> > > the autovacuum launcher is already handled in the same way. One more
> > > minor thing is that it will save us from having a new bgworker state
> > > BgWorkerStart_ConsistentState_HotStandby as introduced by this patch.
> >
> > Why do we need to add a new BgWorkerStart_ConsistentState_HotStandby
> > for the slotsync worker? Isn't it sufficient that the slotsync worker
> > exits if not in hot standby mode?
>
> It is doable, but that will mean starting slot-sync worker even on
> primary on every server restart which does not seem like a good idea.
> We wanted to have a way where-in it does not start itself in
> non-standby mode.
>
> > Is there any technical difficulty or obstacle to make the slotsync
> > worker start using bgworker after reloading the config file?
>
> When we register slotsync worker as bgworker, we can only register the
> bgworker before initializing shared memory, we cannot register
> dynamically in the cycle of ServerLoop and thus we do not have
> flexibility of registering/deregistering the bgworker  (or controlling
> the bgworker start) based on config parameters each time they change.
> We can always start slot-sync worker and let it check if
> enable_syncslot is ON. If not, exit and retry the next time when
> postmaster will restart it after restart_time(60sec). The downside of
> this approach is, even if any user does not want slot-sync
> functionality and thus has permanently disabled 'enable_syncslot', it
> will keep on restarting and exiting there.
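
To put a rough number on the downside described above, here is a minimal
sketch in plain C. It is not PostgreSQL code: futile_starts() is a
hypothetical helper, and it assumes the worker exits immediately when
enable_syncslot is off and is restarted exactly every restart_time
seconds, with the first start at time 0.

```c
#include <assert.h>

/*
 * Hypothetical helper (not a PostgreSQL function): count how many times
 * the worker is started within [0, window_sec), assuming it exits at
 * once and the postmaster restarts it every restart_time_sec seconds.
 */
static long
futile_starts(long window_sec, long restart_time_sec)
{
    assert(window_sec >= 0);
    assert(restart_time_sec > 0);

    /* Starts happen at t = 0, restart_time, 2 * restart_time, ... */
    return (window_sec + restart_time_sec - 1) / restart_time_sec;
}
```

Under these assumptions, a restart_time of 60 seconds gives 60 start/exit
cycles per hour on a server where the feature is permanently disabled.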


PFA v62. Details:

v62-001: No change.

v62-002:
1) Addressed the slotsync.c-related comments from Peter in [1].
2) Fixed a CFBot failure: a crash in the 32-bit environment in a
DatumGetLSN call.
3) Fixed another CFBot failure where the test
'050_standby_failover_slots_sync.pl' was hanging. Thanks to Hou-San for
this fix.

v62-003:
This is a new patch which attempts to implement the slot-sync worker as
a special process that is neither a bgworker nor an auxiliary process.
The benefit is that enable_syncslot can become a PGC_SIGHUP GUC rather
than PGC_POSTMASTER. The slot-sync worker is launched only if the server
is in hot standby and 'enable_syncslot' is ON.
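
The launch condition can be sketched as below. This is a standalone
illustration, not the code in the patch: SlotSyncLaunchState and
slotsync_should_launch() are hypothetical names, and the check that no
worker is already running is my assumption about how such a condition
would normally be guarded.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical snapshot of the state consulted before launching. */
typedef struct SlotSyncLaunchState
{
    bool in_hot_standby;   /* server is currently in hot standby */
    bool enable_syncslot;  /* current value of the GUC, re-read on SIGHUP */
    bool worker_running;   /* a slot-sync worker is already alive (assumed) */
} SlotSyncLaunchState;

/* Return true when a new slot-sync worker should be started. */
static bool
slotsync_should_launch(const SlotSyncLaunchState *st)
{
    assert(st != NULL);

    return st->in_hot_standby &&
           st->enable_syncslot &&
           !st->worker_running;
}
```

Because the condition is evaluated against the current GUC value, a
config reload that turns enable_syncslot ON flips the result from false
to true without a server restart, which is what PGC_SIGHUP buys us here.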

v62-004:
Small change in the documentation.

v62-005: No change.

v62-006:
Moved the failover-ready validation steps (previously part of v61-002
and v61-003) into this separate doc patch. Also addressed some of
Peter's doc comments in [1].
Thanks to Hou-San for providing this patch.

[1]: https://www.postgresql.org/message-id/CAHut%2BPteZVNx1jQ6Hs3mEdoC%3DDNALVpJJ2mZDYim7sU-04tiaw%40mail.gmail.com

thanks
Shveta
