On Wed, Feb 21, 2024, at 5:00 AM, Shlok Kyal wrote:
I found some issues and fixed those issues with top up patches
v23-0012 and v23-0013
1.
Suppose there is a cascade physical replication node1->node2->node3.
Now if we run pg_createsubscriber with node1 as primary and node2 as
standby, pg_createsubscriber will be successful but the connection
between node2 and node3 will not be retained and log og node3 will
give error:
2024-02-20 12:32:12.340 IST [277664] FATAL: database system
identifier differs between the primary and standby
2024-02-20 12:32:12.340 IST [277664] DETAIL: The primary's identifier
is 7337575856950914038, the standby's identifier is
7337575783125171076.
2024-02-20 12:32:12.341 IST [277491] LOG: waiting for WAL to become
available at 0/3000F10
To fix this I am avoiding pg_createsubscriber to run if the standby
node is primary to any other server.
Made the change in v23-0012 patch
IIRC we already discussed the cascading replication scenario. Of course,
breaking a node is not good that's why you proposed v23-0012. However,
preventing pg_createsubscriber to run if there are standbys attached to it is
also annoying. If you don't access to these hosts you need to (a) kill
walsender (very fragile / unstable), (b) start with max_wal_senders = 0 or (3)
add a firewall rule to prevent that these hosts do not establish a connection
to the target server. I wouldn't like to include the patch as-is. IMO we need
at least one message explaining the situation to the user, I mean, add a hint
message. I'm resistant to a new option but probably a --force option is an
answer. There is no test coverage for it. I adjusted this patch (didn't include
the --force option) and add a test case.
2.
While checking 'max_replication_slots' in 'check_publisher' function,
we are not considering the temporary slot in the check:
+ if (max_repslots - cur_repslots < num_dbs)
+ {
+ pg_log_error("publisher requires %d replication slots, but
only %d remain",
+ num_dbs, max_repslots - cur_repslots);
+ pg_log_error_hint("Consider increasing max_replication_slots
to at least %d.",
+ cur_repslots + num_dbs);
+ return false;
+ }
Fixed this in v23-0013
Good catch!
Both are included in the next patch.