
From Euler Taveira
Subject Re: speed up a logical replica setup
Date
Msg-id bb7ab653-0b9f-4247-8a6c-d3f113343702@app.fastmail.com
In response to Re: speed up a logical replica setup  (Shlok Kyal <shlok.kyal.oss@gmail.com>)
Responses Re: speed up a logical replica setup
List pgsql-hackers
On Wed, Feb 21, 2024, at 5:00 AM, Shlok Kyal wrote:
I found some issues and fixed those issues with top up patches
v23-0012 and v23-0013
1.
Suppose there is a cascading physical replication setup node1->node2->node3.
Now if we run pg_createsubscriber with node1 as primary and node2 as
standby, pg_createsubscriber will succeed, but the connection
between node2 and node3 will not be retained and the log of node3 will
show this error:
2024-02-20 12:32:12.340 IST [277664] FATAL:  database system identifier differs between the primary and standby
2024-02-20 12:32:12.340 IST [277664] DETAIL:  The primary's identifier is 7337575856950914038, the standby's identifier is 7337575783125171076.
2024-02-20 12:32:12.341 IST [277491] LOG:  waiting for WAL to become available at 0/3000F10

To fix this, I prevent pg_createsubscriber from running if the standby
node is the primary of any other server.
The change is in the v23-0012 patch.

IIRC we already discussed the cascading replication scenario. Of course,
breaking a node is not good; that's why you proposed v23-0012. However,
preventing pg_createsubscriber from running if there are standbys attached to
the target server is also annoying. If you don't have access to these hosts,
you need to (a) kill the walsender (very fragile / unstable), (b) start the
target server with max_wal_senders = 0, or (c) add a firewall rule to prevent
these hosts from establishing a connection to the target server. I wouldn't
like to include the patch as-is. IMO we need at least one message explaining
the situation to the user, I mean, add a hint message. I'm resistant to a new
option, but a --force option is probably an answer. There is no test coverage
for it. I adjusted this patch (didn't include the --force option) and added a
test case.
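
To make the "error plus hint" idea concrete, here is a minimal sketch of how such a check could report the situation. It is not the code from v23-0012: the function name, the message wording, and the num_attached parameter (assumed to be the number of standbys currently streaming from the target server, e.g. a row count taken from pg_stat_replication on the target) are all hypothetical, and plain fprintf stands in for the real logging routines.

```c
#include <stdbool.h>
#include <stdio.h>

/*
 * Hypothetical sketch, not the code in v23-0012.
 *
 * num_attached is assumed to be the number of standbys currently streaming
 * from the target server. Returns true when it is safe to proceed.
 */
bool
target_has_no_standbys(int num_attached)
{
    if (num_attached > 0)
    {
        /* Explain what is wrong ... */
        fprintf(stderr,
                "error: target server has %d attached standby(s)\n",
                num_attached);
        /* ... and hint at a workaround mentioned in this thread. */
        fprintf(stderr,
                "hint: Stop the attached standbys, or start the target "
                "server with max_wal_senders = 0.\n");
        return false;
    }
    return true;
}
```

A caller would bail out before touching the target when this returns false, so node3 in the example above is never left pointing at a server whose system identifier has changed.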

2.
While checking 'max_replication_slots' in 'check_publisher' function,
we are not considering the temporary slot in the check:
+   if (max_repslots - cur_repslots < num_dbs)
+   {
+       pg_log_error("publisher requires %d replication slots, but only %d remain",
+                    num_dbs, max_repslots - cur_repslots);
+       pg_log_error_hint("Consider increasing max_replication_slots to at least %d.",
+                         cur_repslots + num_dbs);
+       return false;
+   }
Fixed this in v23-0013
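
For illustration, the adjusted check might look roughly like the sketch below. It is not the actual v23-0013 change: it assumes exactly one extra temporary slot is needed on top of one slot per database, the function name is hypothetical, and fprintf stands in for pg_log_error and pg_log_error_hint so the fragment compiles on its own.

```c
#include <stdbool.h>
#include <stdio.h>

/*
 * Hypothetical sketch, not the code in v23-0013.
 *
 * Assumption: the publisher needs one replication slot per database plus
 * one temporary slot, so the requirement is num_dbs + 1.
 */
bool
has_enough_repslots(int max_repslots, int cur_repslots, int num_dbs)
{
    int needed = num_dbs + 1;                    /* + 1: temporary slot */
    int remaining = max_repslots - cur_repslots;

    if (remaining < needed)
    {
        fprintf(stderr,
                "error: publisher requires %d replication slots, but only %d remain\n",
                needed, remaining);
        fprintf(stderr,
                "hint: Consider increasing max_replication_slots to at least %d.\n",
                cur_repslots + needed);
        return false;
    }
    return true;
}
```

With max_repslots = 10, cur_repslots = 8 and num_dbs = 2, the original condition passes (2 slots remain for 2 databases), while this version fails because the temporary slot would be the third.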

Good catch!

Both are included in the next patch.


--
Euler Taveira
