Home > mailing lists

Re: Synchronizing slots from primary to standby - Mailing list pgsql-hackers

From	Dilip Kumar
Subject	Re: Synchronizing slots from primary to standby
Date	January 20, 2024 05:21:49
Msg-id	CAFiTN-vNyO879VfvgcxD=bUYNjCRKQR8DGGHD8zHycv8uD1QeQ@mail.gmail.com Whole thread
In response to	Re: Synchronizing slots from primary to standby (Amit Kapila <amit.kapila16@gmail.com>)
Responses	Re: Synchronizing slots from primary to standby
List	pgsql-hackers

Tree view

On Fri, Jan 19, 2024 at 5:24 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
>
> On Wed, Jan 17, 2024 at 4:00 PM shveta malik <shveta.malik@gmail.com> wrote:
> >
>
> I had some off-list discussions with Sawada-San, Hou-San, and Shveta
> on the topic of extending replication commands instead of using the
> current model where we fetch the required slot information via SQL
> using a database connection. I would like to summarize the discussion
> and would like to know the thoughts of others on this topic.
>
> In the current patch, we launch the slotsync worker on physical
> standby which connects to the specified database (currently we let
> users specify the required dbname in primary_conninfo) on the primary.
> It then fetches the required information for failover marked slots
> from the primary and also does some primitive checks on the upstream
> node via SQL (the additional checks are like whether the upstream node
> has a specified physical slot or whether the upstream node is a
> primary node or a standby node). To fetch the required information it
> uses a libpqwalreciever API which is mostly apt for this purpose as it
> supports SQL execution but for this patch, we don't need a replication
> connection, so we extend the libpqwalreciever connect API.

What sort of extension we have done to 'libpqwalreciever'? Is it
something like by default this supports replication connections so we
have done an extension to the API so that we can provide an option
whether to create a replication connection or a normal connection?

> Now, the concerns related to this could be that users would probably
> need to change existing mechanisms/tools to update priamry_conninfo
> and one of the alternatives proposed is to have an additional GUC like
> slot_sync_dbname. Users won't be able to drop the database this worker
> is connected to aka whatever is specified in slot_sync_dbname but as
> the user herself sets up the configuration it shouldn't be a big deal.

Yeah for this purpose users may use template1 or so which they
generally don't plan to drop.  So in case the user wants to drop that
database user needs to turn off the slot syncing option and then it
can be done?

> Then we also discussed whether extending libpqwalreceiver's connect
> API is a good idea and whether we need to further extend it in the
> future. As far as I can see, slotsync worker's primary requirement is
> to execute SQL queries which the current API is sufficient, and don't
> see something that needs any drastic change in this API. Note that
> tablesync worker that executes SQL also uses these APIs, so we may
> need something in the future for either of those. Then finally we need
> a slotsync worker to also connect to a database to use SQL and fetch
> results.

While looking into the patch v64-0002 I could not exactly point out
what sort of extensions are there in libpqwalreceiver.c, I just saw
one extra API for fetching the dbname from connection info?

> Now, let us consider if we extend the replication commands like
> READ_REPLICATION_SLOT and or introduce a new set of replication
> commands to fetch the required information then we don't need a DB
> connection with primary or a connection in slotsync worker. As per my
> current understanding, it is quite doable but I think we will slowly
> go in the direction of making replication commands something like SQL
> because today we need to extend it to fetch all slots info that have
> failover marked as true, the existence of a particular replication,
> etc. Then tomorrow, if we want to extend this work to have multiple
> slotsync workers say workers perdb then we have to extend the
> replication command to fetch per-database failover marked slots. To
> me, it sounds more like we are slowly adding SQL-like features to
> replication commands.
>
> Apart from this when we are reading per-db replication slots without
> connecting to a database, we probably need some additional protection
> mechanism so that the database won't get dropped.

Something like locking the database only while fetching the slots?

> Considering all this it seems that for now probably extending
> replication commands can simplify a few things like mentioned above
> but using SQL's with db-connection is more extendable.

Even I have similar thoughts.

--
Regards,
Dilip Kumar
EnterpriseDB: http://www.enterprisedb.com

pgsql-hackers by date:

From: Tom Lane
Date: 20 January 2024, 04:34:21
Subject: Re: Optimize duplicate code and fix memory leak in function fetch_remote_table_info()

From: John Naylor
Date: 20 January 2024, 05:50:47
Subject: Re: Change GUC hashtable to use simplehash?

Re: Synchronizing slots from primary to standby - Mailing list pgsql-hackers

Previous

Next