Let's say node pg12 in a cluster needs to be removed because it has
serious problems. I remove it by running this command on another node in
the cluster:
SELECT bdr.bdr_part_by_node_names('{pg12}');
On pg12, I run this:
BEGIN;
SET LOCAL bdr.permit_unsafe_ddl_commands = true;
SET LOCAL bdr.skip_ddl_locking = true;
SECURITY LABEL FOR 'bdr' ON DATABASE pgmirror IS '{"bdr": false}';
COMMIT;
I repair the broken node: drop the existing database, fix whatever is
wrong with the node, and re-create the database (empty). It's basically
a new node. Then I try to re-join it to the cluster under the same old
name:
SELECT bdr.bdr_group_join(
    local_node_name := 'pg12',
    node_external_dsn := 'host=pg12 dbname=pgmirror',
    join_using_dsn := 'host=pg11 dbname=pgmirror'
);
SELECT bdr.bdr_node_join_wait_for_ready();
The problem is that bdr_node_join_wait_for_ready() never returns; it
just waits forever. If I run SELECT * FROM bdr.bdr_nodes on pg11, I see
pg12 listed twice, with node_status k and i, respectively. On pg11 I
also see this in the logs:
"System identification mismatch between connection and slot","Connection
for bdr (6211167104388615363,1,16387,) resulted in slot on node bdr
(6211167104388615363,1,17163,) instead of expected node",,,,,,,,"bdr
(6211167104388615363,1,17163,): perdb"
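In case it helps with diagnosis, here is a rough sketch of the checks
that go with the symptoms above. Two things in it are assumptions on my
part: the bdr.bdr_nodes column names (taken from what I believe the
0.9.x catalog looks like), and my reading that the third field of the
identifiers in the log (16387 vs. 17163) is the database OID. If that
reading is right, the two values would differ simply because dropping
and re-creating pgmirror gives the database a new OID.

```sql
-- On pg11: list every entry recorded for pg12, including the stale one.
-- Column names are assumed from the BDR 0.9.x bdr.bdr_nodes catalog.
SELECT node_sysid, node_timeline, node_dboid, node_status, node_name
FROM bdr.bdr_nodes
WHERE node_name = 'pg12';

-- On pg11: replication slots known to this server
-- (pg_replication_slots is a standard PostgreSQL view).
SELECT slot_name, plugin, database, active
FROM pg_replication_slots;

-- On pg12: OID of the re-created database, to compare against the
-- third field of the identifiers in the log message (assumed meaning).
SELECT oid, datname
FROM pg_database
WHERE datname = 'pgmirror';
```

These queries only read catalog data; they do not change anything on
either node.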
How can I re-join an old node to the cluster after rebuilding it from
scratch, under the old name?
Do I have to change the name every time I re-join a node?
--
Florin Andrei
http://florin.myip.org/