Re: Synchronizing slots from primary to standby - Mailing list pgsql-hackers
| From | shveta malik |
| --- | --- |
| Subject | Re: Synchronizing slots from primary to standby |
| Date | |
| Msg-id | CAJpy0uCHGhH+a4f3ikBwL=H-Yws_EiBhPRGRPeLbOchMHf60Cw@mail.gmail.com |
| In response to | Re: Synchronizing slots from primary to standby (shveta malik <shveta.malik@gmail.com>) |
| List | pgsql-hackers |
On Tue, Aug 1, 2023 at 4:52 PM shveta malik <shveta.malik@gmail.com> wrote:
>
> On Thu, Jul 27, 2023 at 12:13 PM shveta malik <shveta.malik@gmail.com> wrote:
> >
> > On Thu, Jul 27, 2023 at 10:55 AM Amit Kapila <amit.kapila16@gmail.com> wrote:
> > >
> > > On Wed, Jul 26, 2023 at 10:31 AM shveta malik <shveta.malik@gmail.com> wrote:
> > > >
> > > > On Mon, Jul 24, 2023 at 9:00 AM Amit Kapila <amit.kapila16@gmail.com> wrote:
> > > > >
> > > > > On Mon, Jul 24, 2023 at 8:03 AM Bharath Rupireddy
> > > > > <bharath.rupireddyforpostgres@gmail.com> wrote:
> > > > > >
> > > > > > Is having one (or a few more - not necessarily one for each logical
> > > > > > slot) worker for all logical slots enough?
> > > > >
> > > > > I guess for a large number of slots there is a possibility of a large
> > > > > gap in syncing the slots, which probably means we need to retain the
> > > > > corresponding WAL for a much longer time on the primary. If we can
> > > > > prove that the gap won't be large enough to matter then this would
> > > > > probably be worth considering; otherwise, I think we should find a
> > > > > way to scale the number of workers to avoid the large gap.
> > > >
> > > > How about this:
> > > >
> > > > 1) On standby, spawn 1 worker per database at the start (as it is
> > > > doing currently).
> > > >
> > > > 2) Maintain statistics on activity against each primary database on
> > > > the standby, for example by maintaining 'last_synced_time' and
> > > > 'last_activity_seen_time'. The 'last_synced_time' is updated every
> > > > time we sync/recheck slots for that particular database. The
> > > > 'last_activity_seen_time' changes only if we find any slot on that
> > > > database whose confirmed_flush (or restart_lsn) has changed from what
> > > > was maintained already.
> > > >
> > > > 3) If at any moment we find that 'last_synced_time' -
> > > > 'last_activity_seen_time' goes beyond a threshold, that means the DB
> > > > is not active currently. Add it to the list of inactive DBs.
> > >
> > > I think we should also increase the next_sync_time if there is no
> > > update in the current sync.
> >
> > +1
> >
> > > > 4) The launcher, on the other hand, is always checking whether it
> > > > needs to spawn any extra worker for a new DB. It will additionally
> > > > check whether the number of inactive databases (maintained on the
> > > > standby) has gone above some threshold; if so, it brings down the
> > > > workers for those and starts a common worker which takes care of all
> > > > such inactive databases (i.e. merges them into 1), while workers for
> > > > active databases remain as they are (i.e. one per db). Each worker
> > > > maintains the list of DBs it is responsible for.
> > > >
> > > > 5) If, in the list of these inactive databases, we again find any
> > > > active database using the above logic, then the launcher will spawn a
> > > > separate worker for that.
> > >
> > > I wonder if we anyway need some sort of design like this, because we
> > > shouldn't allow spawning as many workers as the number of databases.
> > > There has to be some existing or new GUC like max_sync_slot_workers
> > > which decides the number of workers.
> >
> > Currently it does not have any such GUC for sync-slot workers. It
> > mainly uses the logical-rep-worker framework for the sync-slot worker
> > part and thus relies on the 'max_logical_replication_workers' GUC. It
> > also errors out if 'max_replication_slots' is set to zero.
> > I think that is not the correct way of doing things for sync-slot. We
> > can have a new GUC (max_sync_slot_workers) as you suggested, and if
> > the number of databases < max_sync_slot_workers, then we can start 1
> > worker per dbid, else divide the work equally among the max
> > sync-workers possible. And for the inactive-database case, we can
> > increase the next_sync_time rather than starting a special worker to
> > handle all the inactive databases. Thoughts?
> >
>
> Attaching the PoC patch (0003) which attempts to implement the basic
> infrastructure for the suggested design. Rebased the existing patches
> (0001 and 0002) as well.
>
> This patch adds a new GUC max_slot_sync_workers; the default and max
> values are kept at 2 and 50 respectively for this PoC patch. Now the
> replication launcher divides the work equally among these many
> slot-sync workers. Let us say there are multiple slots on the primary
> belonging to 10 DBs and the new GUC on the standby is set at the
> default value of 2; then each worker on the standby will manage 5 dbs
> individually and will keep on syncing the slots for them. If a new DB
> is found by the replication launcher, it will assign this new db to
> the worker handling the minimum number of dbs currently (or the first
> worker in case of an equal count), and that worker will pick up the
> new db the next time it tries to sync the slots.
>
> I have kept the changes in a separate patch (0003) for ease of review.
> Since this is just a PoC patch, many things are yet to be done
> appropriately; those will be covered in the next versions.
>

Attaching a new set of patches which attempt to implement the below changes:

1) The logical replication launcher now gets only the list of unique dbids
belonging to slots in 'synchronize_slot_names' instead of getting all the
slots' data. This has been implemented using the new command
LIST_DBID_FOR_LOGICAL_SLOTS.

2) The launcher assigns the DBs to slot-sync workers. Each worker has its
own dbids list. Since the upper limit of this dbid count is not known, it
is now allocated using dsm. The launcher initially allocates memory to
hold 100 dbids for each worker. If this limit is exhausted, it reallocates
this memory with the size incremented by 100 and relaunches the worker.
The relaunched worker continues to manage the existing set of DBs it was
handling earlier plus the new DB.

Both these changes are in patch v11-0002. The earlier patch v10-0003 is
now merged into 0002 itself. More on the standby-side design of this PoC
patch can be found in the commit message of v11-0002.

Thanks Ajin for working on 1).

thanks
Shveta
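To make the inactive-database handling discussed in the quoted thread easier to follow, here is a minimal standalone sketch in plain C. It is not code from the patches: the SlotSyncDbState struct, the threshold, and the interval constants are invented for illustration. It shows a worker tracking last_synced_time and last_activity_seen_time per database and backing off the next sync once a database has been quiet beyond a threshold.

```c
/*
 * Illustrative sketch of the inactive-DB backoff heuristic discussed above.
 * Plain, standalone C; the names and constants are hypothetical, not
 * PostgreSQL or patch identifiers.
 */
#include <stdbool.h>
#include <stdio.h>
#include <time.h>

#define INACTIVITY_THRESHOLD_SECS 60    /* assumed threshold (small for the demo) */
#define BASE_SYNC_INTERVAL_SECS   10    /* normal recheck interval */
#define MAX_SYNC_INTERVAL_SECS    600   /* cap on the backoff */

typedef struct SlotSyncDbState
{
    time_t last_synced_time;        /* last time slots of this DB were rechecked */
    time_t last_activity_seen_time; /* last time confirmed_flush/restart_lsn moved */
    time_t sync_interval;           /* current recheck interval */
    time_t next_sync_time;          /* when to look at this DB again */
} SlotSyncDbState;

static bool
db_is_inactive(const SlotSyncDbState *db)
{
    return (db->last_synced_time - db->last_activity_seen_time) >
           INACTIVITY_THRESHOLD_SECS;
}

/*
 * Called after each sync pass for a database.  If some slot's LSN moved we
 * reset to the base interval; if the DB has been quiet beyond the threshold
 * we back off (doubling, capped) instead of polling the primary at full rate.
 */
static void
schedule_next_sync(SlotSyncDbState *db, bool activity_seen, time_t now)
{
    db->last_synced_time = now;

    if (activity_seen)
    {
        db->last_activity_seen_time = now;
        db->sync_interval = BASE_SYNC_INTERVAL_SECS;
    }
    else if (db_is_inactive(db))
    {
        db->sync_interval *= 2;
        if (db->sync_interval > MAX_SYNC_INTERVAL_SECS)
            db->sync_interval = MAX_SYNC_INTERVAL_SECS;
    }

    db->next_sync_time = now + db->sync_interval;
}

int
main(void)
{
    time_t          now = time(NULL);
    SlotSyncDbState db = {now, now, BASE_SYNC_INTERVAL_SECS, now};

    /* Simulate ten quiet sync passes: once the DB counts as inactive,
     * the recheck interval starts growing. */
    for (int i = 0; i < 10; i++)
    {
        now += db.sync_interval;
        schedule_next_sync(&db, false, now);
        printf("pass %d: inactive=%d, next sync in %ld s\n",
               i, db_is_inactive(&db), (long) db.sync_interval);
    }
    return 0;
}
```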
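Similarly, here is a hedged sketch of the launcher-side assignment described for the v11 patches: per-worker dbid lists grown in chunks of 100, with a newly discovered database going to the worker that currently manages the fewest databases (the first such worker on a tie). Plain malloc/realloc stands in for the dsm segment that the PoC actually uses, and the type and function names are invented for the example.

```c
/*
 * Standalone illustration of the launcher-side DB assignment described
 * above.  Uses malloc/realloc where the actual PoC uses a dsm segment;
 * the types and function names here are hypothetical.
 */
#include <stdio.h>
#include <stdlib.h>

#define DBIDS_PER_CHUNK 100     /* launcher initially allocates room for 100 dbids */

typedef unsigned int Oid;       /* stand-in for PostgreSQL's Oid */

typedef struct SlotSyncWorker
{
    Oid    *dbids;              /* databases this worker is responsible for */
    int     ndbids;
    int     capacity;           /* grown in DBIDS_PER_CHUNK increments */
} SlotSyncWorker;

/*
 * Grow the dbid array by another chunk.  In the PoC this is the point where
 * the shared memory is resized by 100 entries and the worker is relaunched.
 */
static void
worker_grow_dbids(SlotSyncWorker *w)
{
    w->capacity += DBIDS_PER_CHUNK;
    w->dbids = realloc(w->dbids, w->capacity * sizeof(Oid));
    if (w->dbids == NULL)
    {
        perror("realloc");
        exit(1);
    }
}

/*
 * Assign a newly discovered database to the worker with the fewest dbids,
 * picking the first worker on a tie.
 */
static void
launcher_assign_db(SlotSyncWorker *workers, int nworkers, Oid dbid)
{
    int     target = 0;

    for (int i = 1; i < nworkers; i++)
        if (workers[i].ndbids < workers[target].ndbids)
            target = i;

    if (workers[target].ndbids == workers[target].capacity)
        worker_grow_dbids(&workers[target]);

    workers[target].dbids[workers[target].ndbids++] = dbid;
}

int
main(void)
{
    SlotSyncWorker workers[2] = {{0}};  /* max_slot_sync_workers = 2 (the default) */

    /* Ten databases with slots on the primary: each worker ends up with five. */
    for (Oid dbid = 16384; dbid < 16394; dbid++)
        launcher_assign_db(workers, 2, dbid);

    for (int i = 0; i < 2; i++)
        printf("worker %d manages %d database(s)\n", i, workers[i].ndbids);
    return 0;
}
```

Growing the list in fixed chunks keeps reallocations (and, in the PoC, the corresponding worker relaunches when the dsm segment is resized) rare, while still placing no hard upper limit on the number of databases a worker can manage.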