Re: Synchronizing slots from primary to standby - Mailing list pgsql-hackers
From | shveta malik |
---|---|
Subject | Re: Synchronizing slots from primary to standby |
Date | |
Msg-id | CAJpy0uCJ7MKgZnKLAACd-AXN0VbWM5gVJ+GRJ1Za_A2UmF3R0A@mail.gmail.com |
In response to | Re: Synchronizing slots from primary to standby (shveta malik <shveta.malik@gmail.com>) |
Responses | Re: Synchronizing slots from primary to standby (four replies) |
List | pgsql-hackers |
On Tue, Jan 16, 2024 at 3:10 PM shveta malik <shveta.malik@gmail.com> wrote:
>
> On Tue, Jan 16, 2024 at 12:59 PM Masahiko Sawada <sawada.mshk@gmail.com> wrote:
> >
> > On Tue, Jan 16, 2024 at 1:07 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
> > >
> > > On Tue, Jan 16, 2024 at 9:03 AM shveta malik <shveta.malik@gmail.com> wrote:
> > > >
> > > > On Sat, Jan 13, 2024 at 12:54 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
> > > > >
> > > > > On Fri, Jan 12, 2024 at 5:50 PM shveta malik <shveta.malik@gmail.com> wrote:
> > > > > >
> > > > > > There are multiple approaches discussed and tried when it comes to
> > > > > > starting a slot-sync worker. I am summarizing all here:
> > > > > >
> > > > > > 1) Make slotsync worker as an Auxiliary Process (like checkpointer,
> > > > > > walwriter, walreceiver etc). The benefit this approach provides is, it
> > > > > > can control begin and stop in a more flexible way as each auxiliary
> > > > > > process could have different checks before starting and can have
> > > > > > different stop conditions. But it needs code duplication for process
> > > > > > management(start, stop, crash handling, signals etc) and currently it
> > > > > > does not support db-connection smoothly (none of the auxiliary process
> > > > > > has one so far)
> > > > > >
> > > > >
> > > > > As slotsync worker needs to perform transactions and access syscache,
> > > > > we can't make it an auxiliary process as that doesn't initialize the
> > > > > required stuff like syscache. Also, see the comment "Auxiliary
> > > > > processes don't run transactions ..." in AuxiliaryProcessMain() which
> > > > > means this is not an option.
> > > > >
> > > > > >
> > > > > > 2) Make slotsync worker as a 'special' process like AutoVacLauncher
> > > > > > which is neither an Auxiliary process nor a bgworker one. It allows
> > > > > > db-connection and also provides flexibility to have start and stop
> > > > > > conditions for a process.
> > > > > >
> > > > >
> > > > > Yeah, due to these reasons, I think this option is worth considering
> > > > > and another plus point is that this allows us to make enable_syncslot
> > > > > a PGC_SIGHUP GUC rather than a PGC_POSTMASTER.
> > > > >
> > > > > >
> > > > > > 3) Make slotsync worker a bgworker. Here we just need to register our
> > > > > > process as a bgworker (RegisterBackgroundWorker()) by providing a
> > > > > > relevant start_time and restart_time and then the process management
> > > > > > is well taken care of. It does not need any code-duplication and
> > > > > > allows db-connection smoothly in registered process. The only thing it
> > > > > > lacks is that it does not provide flexibility of having
> > > > > > start-condition which then makes us to have 'enable_syncslot' as
> > > > > > PGC_POSTMASTER parameter rather than PGC_SIGHUP. Having said this, I
> > > > > > feel enable_syncslot is something which will not be changed frequently
> > > > > > and with the benefits provided by bgworker infra, it seems a
> > > > > > reasonably good option to choose this approach.
> > > > > >
> > > > >
> > > > > I agree but it may be better to make it a PGC_SIGHUP parameter.
> > > > >
> > > > > > 4) Another option is to have Logical Replication Launcher(or a new
> > > > > > process) to launch slot-sync worker. But going by the current design
> > > > > > where we have only 1 slotsync worker, it may be an overhead to have an
> > > > > > additional manager process maintained.
> > > > > >
> > > > >
> > > > > I don't see any good reason to have an additional launcher process here.
> > > > >
> > > > > >
> > > > > > Thus weighing pros and cons of all these options, we have currently
> > > > > > implemented the bgworker approach (approach 3). Any feedback is
> > > > > > welcome.
> > > > > >
> > > > >
> > > > > I vote to go for (2) unless we face difficulties in doing so but (3)
> > > > > is also okay especially if others also think so.
> > > >
> > > > I am not against any of the approaches but I still feel that when we
> > > > have a standard way of doing things (bgworker) we should not keep
> > > > adding code to do things in a special way unless there is a strong
> > > > reason to do so. Now we need to decide if 'enable_syncslot' being
> > > > PGC_POSTMASTER is a strong reason to go the non-standard way?
> > > >
> > >
> > > Agreed and as said earlier I think it is better to make it a
> > > PGC_SIGHUP. Also, not sure we can say it is a non-standard way as
> > > already autovacuum launcher is handled in the same way. One more minor
> > > thing is it will save us for having a new bgworker state
> > > BgWorkerStart_ConsistentState_HotStandby as introduced by this patch.
> >
> > Why do we need to add a new BgWorkerStart_ConsistentState_HotStandby
> > for the slotsync worker? Isn't it sufficient that the slotsync worker
> > exits if not in hot standby mode?
>
> It is doable, but that will mean starting slot-sync worker even on
> primary on every server restart which does not seem like a good idea.
> We wanted to have a way where-in it does not start itself in
> non-standby mode.
>
> > Is there any technical difficulty or obstacle to make the slotsync
> > worker start using bgworker after reloading the config file?
>
> When we register slotsync worker as bgworker, we can only register the
> bgworker before initializing shared memory, we cannot register
> dynamically in the cycle of ServerLoop and thus we do not have
> flexibility of registering/deregistering the bgworker (or controlling
> the bgworker start) based on config parameters each time they change.
> We can always start slot-sync worker and let it check if
> enable_syncslot is ON. If not, exit and retry the next time when
> postmaster will restart it after restart_time(60sec).
> The downside of this approach is, even if any user does not want slot-sync
> functionality and thus has permanently disabled 'enable_syncslot', it
> will keep on restarting and exiting there.

PFA v62. Details:

v62-001: No change.

v62-002:
1) Addressed slotsync.c related comments by Peter in [1].
2) Addressed CFBot failure where there was a crash in a 32-bit
environment while accessing DatumGetLSN.
3) Addressed another CFBot failure where the test for
'050_standby_failover_slots_sync.pl' was hanging. Thanks Hou-San for
this fix.

v62-003:
It is a new patch which attempts to implement slot-sync worker as a
special process which is neither a bgworker nor an Auxiliary process.
Here we get the benefit of converting enable_syncslot to a PGC_SIGHUP
GUC rather than PGC_POSTMASTER. We launch the slot-sync worker only if
it is hot-standby and 'enable_syncslot' is ON.

v62-004: Small change in document.

v62-005: No change.

v62-006:
Separated the failover-ready validation steps into this separate
doc-patch (which were earlier present in v61-002 and v61-003). Also
addressed some of the doc comments by Peter in [1]. Thanks Hou-San for
providing this patch.

[1]: https://www.postgresql.org/message-id/CAHut%2BPteZVNx1jQ6Hs3mEdoC%3DDNALVpJJ2mZDYim7sU-04tiaw%40mail.gmail.com

thanks
Shveta
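The trade-off discussed in this message can be sketched as a small toy model in C. It is not PostgreSQL code and is not taken from the patches: `ServerState`, `special_process_should_start()` and `bgworker_wasted_launches()` are hypothetical names, and the model assumes a worker that finds nothing to do exits instantly. It contrasts a start condition evaluated by the postmaster (hot standby and 'enable_syncslot' ON) with an always-started bgworker that checks the setting itself and is relaunched after its restart time.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical toy model; none of these names exist in PostgreSQL. */
typedef struct ServerState
{
    bool in_hot_standby;   /* server is running as a hot standby */
    bool enable_syncslot;  /* current value of the proposed setting */
} ServerState;

/*
 * Special-process style: the start condition is re-evaluated each time the
 * postmaster considers launching the worker, so a config reload is enough.
 */
static bool
special_process_should_start(const ServerState *s)
{
    return s->in_hot_standby && s->enable_syncslot;
}

/*
 * Bgworker style with a worker-side check: the worker is always launched,
 * exits immediately when it has nothing to do, and is relaunched after
 * restart_time_sec. Returns how many launches exit immediately within
 * elapsed_sec seconds, counting the first launch at t = 0.
 */
static int
bgworker_wasted_launches(const ServerState *s, int elapsed_sec,
                         int restart_time_sec)
{
    assert(elapsed_sec >= 0 && restart_time_sec > 0);

    if (s->in_hot_standby && s->enable_syncslot)
        return 0;           /* worker stays up and does real work */

    return 1 + elapsed_sec / restart_time_sec;
}
```

Under these assumptions, a standby with the setting permanently off and a 60-second restart time sees 61 pointless launches per hour in the bgworker model, while the start-condition model never launches the worker at all.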
Attachment
- v62-0004-Allow-logical-walsenders-to-wait-for-the-physica.patch
- v62-0005-Non-replication-connection-and-app_name-change.patch
- v62-0002-Add-logical-slot-sync-capability-to-the-physical.patch
- v62-0001-Enable-setting-failover-property-for-a-slot-thro.patch
- v62-0003-Slot-sync-worker-as-a-special-process.patch
- v62-0006-Document-the-steps-to-check-if-the-standby-is-re.patch