Re: Synchronizing slots from primary to standby - Mailing list pgsql-hackers
From | shveta malik |
---|---|
Subject | Re: Synchronizing slots from primary to standby |
Date | |
Msg-id | CAJpy0uBhPx1MDHh903XpFAhpBH23KzVXyg_4VjH2zXk81oGi1w@mail.gmail.com Whole thread Raw |
In response to | RE: Synchronizing slots from primary to standby ("Zhijie Hou (Fujitsu)" <houzj.fnst@fujitsu.com>) |
Responses |
Re: Synchronizing slots from primary to standby
|
List | pgsql-hackers |
On Tue, Dec 26, 2023 at 4:41 PM Zhijie Hou (Fujitsu) <houzj.fnst@fujitsu.com> wrote: > > On Wednesday, December 20, 2023 7:37 PM Amit Kapila <amit.kapila16@gmail.com> wrote: > > > > On Wed, Dec 20, 2023 at 3:29 PM shveta malik <shveta.malik@gmail.com> > > wrote: > > > > > > On Wed, Dec 20, 2023 at 9:12 AM Amit Kapila <amit.kapila16@gmail.com> > > wrote: > > > > > > > > On Tue, Dec 19, 2023 at 5:30 PM shveta malik <shveta.malik@gmail.com> > > wrote: > > > > > > > > > > Thanks for reviewing. I have addressed these in v50. > > > > > > > > > > > > > I was looking at this patch to see if something smaller could be > > > > independently committable. I think we can extract > > > > pg_get_slot_invalidation_cause() and commit it as that function > > > > could be independently useful as well. What do you think? > > > > > > > > > > Sure, forked another thread [1] > > > [1]: > > > > > https://www.postgresql.org/message-id/CAJpy0uBpr0ym12%2B0mXpjcRFA6 > > N%3D > > > anX%2BYk9aGU4EJhHNu%3DfWykQ%40mail.gmail.com > > > > > > > Thanks, thinking more, we can split the patch into the following three patches > > which can be committed separately (a) Allowing the failover property to be set > > for a slot via SQL API and subscription commands > > (b) sync slot worker infrastructure (c) GUC standby_slot_names and the the > > corresponding wait logic in server-side. > > > > Thoughts? > > I agree. Here is the V54 patch set which was split based on the suggestion. > The commit message in each patch is also improved. > I would like to revisit the current dependency of slotsync worker on dbname used in 002 patch. Currently we accept dbname in primary_conninfo and thus the user has to make sure to provide one (by manually altering it) even in case of a conf file auto-generated by "pg_basebackup -R". Thus I would like to discuss if there are better ways to do it. Complete background is as follow: We need dbname for 2 purposes: 1) to connect to remote db in order to run SELECT queries to fetch the info needed by slotsync worker. 2) to make connection in slot-sync worker itself in order to be able to use libpq APIs for 1) We run 3 kind of select queries in slot-sync worker currently: a) To fetch all failover slots (logical slots) info at once in synchronize_slots(). b) To fetch a particular slot info during wait_for_primary_slot_catchup() logic (logical slot). c) To validate primary slot (physical one) and also to distinguish between standby and cascading standby by running pg_is_in_recovery(). 1) One approach to avoid dependency on dbname is using commands instead of SELECT. This will need implementing LIST_SLOTS command for a), and for b) we can use LIST_SLOTS and fetch everything (even though it is not needed) or have LIST_SLOTS with a filter on slot-name or extend READ_REPLICATION_SLOT, and for c) we can have some other command to get pg_is_in_recovery() info. But, I feel by relying on commands we will be making the extension of the slot-sync feature difficult. In future, if there is some more requirement to fetch any other info, then there too we have to implement a command. I am not sure if it is good and extensible approach. 2) Another way to avoid asking for a dbname in primary_conninfo is to use the default dbname internally. This brings us to two questions: 'How' and 'Which default db'? 2.1) To answer 'How': Using default dbname is simpler for the purpose of slot-sync worker having its own db-connection, but is a little tricky for the purpose of connection to remote_db. This is because we have to inject this dbname internally in our connection-info. 2.1.1) Say we use primary_conninfo (i.e. original one w/o dbname), then currently it could have 2 formats: a) The simple "=" format for key-value pairs, example: 'user=replication host=127.0.0.1 port=5433 dbname=postgres'. b) URI format, example: postgresql://other@localhost/otherdb?connect_timeout=10&application_name=myapp We can distinguish between the 2 formats using 'uri_prefix_length' but injecting the dbname part will be messy specially for URI format. If we want to do it w/o injecting and only by changing libpq interfaces to accept dbname separately apart from conninfo, then there is no current simpler way available. It will need a good amount of changes in libpq. 2.1.2) Another way is to not rely on primary_conninfo directly but rely on 'WalRcv->conninfo' in order to connect to remote_db. This is because the latter is never URI format, it is some parsed format and appending may work. As an example, primary_conninfo = 'postgresql://replication@localhost:5433', WalRcv->conninfo loaded internally is: "user=replication passfile=/home/shveta/.pgpass channel_binding=prefer dbname=replication host=localhost port=5433 fallback_application_name=walreceiver sslmode=prefer sslcompression=0 sslcertmode=allow sslsni=1 ssl_min_protocol_version=TLSv1.2 gssencmode=disable krbsrvname=postgres gssdelegation=0 target_session_attrs=any load_balance_hosts=disable", '\000' So we can try appending our default dbname to this. But all the defaults loaded in WalRcv->conninfo need some careful analysis to figure out if they work for slot-sync worker case. 2.2) Now coming to 'Which default db': 2.2.1) If we use 'template1' as default db, it may block 'create db' operations on primary for the time when the slot-sync worker is connected to remote using this dbname. Example: postgres=# create database newdb1; ERROR: source database "template1" is being accessed by other users DETAIL: There is 1 other session using the database. 2.2.2) If we use 'postgres' as default db, there are chances that it can be dropped as unlike 'template1', it is allowed to be dropped by user, and if slotsync worker is connected to it, user may see: newdb1=# drop database postgres; ERROR: database "postgres" is being accessed by other users DETAIL: There is 1 other session using the database. But once the slot-sync worker or standby goes down, user can always drop this and next time slot-sync worker may not be able to come up. ================ As explained, there is no clean approach to avoid dbname dependency and thus making us implement it this way where we ask dbname in primary_conninfo. It will be good to know what others think on this and if there are better ways to do it. thanks Shveta
pgsql-hackers by date: