Re: Synchronizing slots from primary to standby - Mailing list pgsql-hackers
From | Amit Kapila |
---|---|
Subject | Re: Synchronizing slots from primary to standby |
Date | |
Msg-id | CAA4eK1L92BC3uaFeZJWC5fUJXZLSkFbnoa_ijmXoXWjP9FB6RA@mail.gmail.com Whole thread Raw |
In response to | Re: Synchronizing slots from primary to standby (Masahiko Sawada <sawada.mshk@gmail.com>) |
Responses |
Re: Synchronizing slots from primary to standby
|
List | pgsql-hackers |
On Mon, Jan 22, 2024 at 5:28 PM Masahiko Sawada <sawada.mshk@gmail.com> wrote: > > On Sat, Jan 20, 2024 at 7:44 PM Amit Kapila <amit.kapila16@gmail.com> wrote: > > > > On Sat, Jan 20, 2024 at 10:52 AM Dilip Kumar <dilipbalaut@gmail.com> wrote: > > > > > > On Fri, Jan 19, 2024 at 5:24 PM Amit Kapila <amit.kapila16@gmail.com> wrote: > > > > > > > > On Wed, Jan 17, 2024 at 4:00 PM shveta malik <shveta.malik@gmail.com> wrote: > > > > > > > > > > > > > I had some off-list discussions with Sawada-San, Hou-San, and Shveta > > > > on the topic of extending replication commands instead of using the > > > > current model where we fetch the required slot information via SQL > > > > using a database connection. I would like to summarize the discussion > > > > and would like to know the thoughts of others on this topic. > > > > > > > > In the current patch, we launch the slotsync worker on physical > > > > standby which connects to the specified database (currently we let > > > > users specify the required dbname in primary_conninfo) on the primary. > > > > It then fetches the required information for failover marked slots > > > > from the primary and also does some primitive checks on the upstream > > > > node via SQL (the additional checks are like whether the upstream node > > > > has a specified physical slot or whether the upstream node is a > > > > primary node or a standby node). To fetch the required information it > > > > uses a libpqwalreciever API which is mostly apt for this purpose as it > > > > supports SQL execution but for this patch, we don't need a replication > > > > connection, so we extend the libpqwalreciever connect API. > > > > > > What sort of extension we have done to 'libpqwalreciever'? Is it > > > something like by default this supports replication connections so we > > > have done an extension to the API so that we can provide an option > > > whether to create a replication connection or a normal connection? > > > > > > > Yeah and in the future there could be more as well. The other function > > added walrcv_get_dbname_from_conninfo doesn't appear to be a problem > > either for now. > > > > > > Now, the concerns related to this could be that users would probably > > > > need to change existing mechanisms/tools to update priamry_conninfo > > I'm concerned about this. In fact, a primary_conninfo value generated > by pg_basebackup does not work with enable_syncslot. > Right, but if we want can't we extend pg_basebackup to do that? It is just that I am not sure that it is a good idea to extend pg_basebackup in the first version. > > > > and one of the alternatives proposed is to have an additional GUC like > > > > slot_sync_dbname. Users won't be able to drop the database this worker > > > > is connected to aka whatever is specified in slot_sync_dbname but as > > > > the user herself sets up the configuration it shouldn't be a big deal. > > > > > > Yeah for this purpose users may use template1 or so which they > > > generally don't plan to drop. > > > > > > > Using template1 has other problems like users won't be able to create > > a new database. See [2] (point number 2.2) > > > > > > > > So in case the user wants to drop that > > > database user needs to turn off the slot syncing option and then it > > > can be done? > > > > > > > Right. > > If the user wants to continue using slot syncing, they need to switch > the database to connect. Which requires modifying primary_conninfo and > reloading the configuration file. Which further leads to restarting > the physical replication. If they use synchronous replication, it > means the application temporarily stops during that. > Yes, that would be an inconvenience but the point is we don't expect this to change often. > > > > > > Then we also discussed whether extending libpqwalreceiver's connect > > > > API is a good idea and whether we need to further extend it in the > > > > future. As far as I can see, slotsync worker's primary requirement is > > > > to execute SQL queries which the current API is sufficient, and don't > > > > see something that needs any drastic change in this API. Note that > > > > tablesync worker that executes SQL also uses these APIs, so we may > > > > need something in the future for either of those. Then finally we need > > > > a slotsync worker to also connect to a database to use SQL and fetch > > > > results. > > > > > > While looking into the patch v64-0002 I could not exactly point out > > > what sort of extensions are there in libpqwalreceiver.c, I just saw > > > one extra API for fetching the dbname from connection info? > > > > > > > Right, the worry was that we may need it in the future. > > Yes. IIUC the slotsync worker uses libpqwalreceiver to establish a > non-replication connection and to execute SQL query. But neither of > them are relevant with replication. > But we are already using libpqwalreceiver to execute SQL queries via tablesync worker. I'm a bit concerned that when we > need to extend the slotsync feature in the future we will end up > extending libpqwalreceiver, even if the new feature is not also > relevant with replication. > > > > > > > Now, let us consider if we extend the replication commands like > > > > READ_REPLICATION_SLOT and or introduce a new set of replication > > > > commands to fetch the required information then we don't need a DB > > > > connection with primary or a connection in slotsync worker. As per my > > > > current understanding, it is quite doable but I think we will slowly > > > > go in the direction of making replication commands something like SQL > > > > because today we need to extend it to fetch all slots info that have > > > > failover marked as true, the existence of a particular replication, > > > > etc. Then tomorrow, if we want to extend this work to have multiple > > > > slotsync workers say workers perdb then we have to extend the > > > > replication command to fetch per-database failover marked slots. To > > > > me, it sounds more like we are slowly adding SQL-like features to > > > > replication commands. > > Right. How about filtering slots on the standby side? That is, for > example, the LIST_SLOT command returns all slots and the slotsync > worker filters out non-failover slots. > Yeah, we can do that but it could be unnecessary network overhead when there are very few failover slots. And it may not be just one time, we need to fetch slot information periodically. -- With Regards, Amit Kapila.
pgsql-hackers by date: