Thread: BDR table level replication questions
Hello,
I am trying to setup table level replication with BDR 0.9.0 but I am running into some errors. I believe I must be missing a step or two in the process.
I currently have a two node group that I want to build.
1) On node1 I have a database built with the tables I want to replicate; table1, table2, table3.
2) On node2 I have created the database to hold the tables (just issued a create database statement) but no other objects have been added on node2.
3) On node1 I have issued the following commands without error:
SELECT bdr.bdr_group_create(
local_node_name := 'node1',
node_external_dsn := 'port=5432 dbname=apimgtdb user=postgres password=XXXX host=10.0.0.0.101',
replication_sets := ARRAY['test_rep_set']
);
SELECT bdr.bdr_node_join_wait_for_ready();
4) On node2 I then issue the following to join the group with out error:
SELECT bdr.bdr_group_join(
local_node_name := 'node2',
node_external_dsn := 'port=5432 dbname=apimgtdb user=postgres password=XXXX host=10.0.0.102',
join_using_dsn:= 'port=5432 dbname=apimgtdb user=postgres password=PostAlign host=10.0.0.101',
replication_sets := ARRAY[’test_rep_set']
);
5) On node2, if I then issue;
SELECT bdr.bdr_node_join_wait_for_ready();
the command just hangs. So I killed the command
6) Back on node1, I issue the command and receive an error:
select bdr.table_set_replication_sets(’table1', ARRAY[’test_rep_set'])
;
ERROR: No peer nodes or peer node count unknown, cannot acquire DDL lock
HINT: BDR is probably still starting up, wait a while
CONTEXT: SQL statement "SECURITY LABEL FOR bdr ON TABLE table1 IS '{ "sets" : [“test_rep_set"] }'"
PL/pgSQL function bdr.table_set_replication_sets(regclass,text[]) line 30 at EXECUTE statement
What am I missing? How are the steps different from setting database replication?
Thanks
On 15 May 2015 at 04:26, Dennis <dennisr@visi.com> wrote:
What am I missing? How are the steps different from setting database replication?
Please show the log output from both nodes, and the contents of "SELECT * FROM bdr.bdr_nodes" and "SELECT * FROM bdr.bdr_connections" on each node.
On node1:
apimgtdb=# SELECT * FROM bdr.bdr_nodes
apimgtdb-# ;
node_sysid | node_timeline | node_dboid | node_status | node_name | node_local_dsn
| node_init_from_dsn
---------------------+---------------+------------+-------------+-----------+----------------------------------------------------------------------------
--+--------------------
6145502501690488441 | 2 | 16424 | r | node1 | port=5432 dbname=apimgtdb user=postgres password=XXXX host=10.0.0.101
|
(1 row)
apimgtdb=# SELECT * FROM bdr.bdr_connections
apimgtdb-# ;
conn_sysid | conn_timeline | conn_dboid | conn_origin_sysid | conn_origin_timeline | conn_origin_dboid | conn_is_unidirectional |
conn_dsn | conn_apply_delay | conn_replication_sets
---------------------+---------------+------------+-------------------+----------------------+-------------------+------------------------+--------------
----------------------------------------------------------------+------------------+------------------------
6145502501690488441 | 2 | 16424 | 0 | 0 | 0 | f | port=5432 dbn
ame=apimgtdb user=postgres password=XXXXX host=10.0.0.101 | | {test_rep_set}
(1 row)
< 2015-05-15 05:47:47.070 PDT >ERROR: No peer nodes or peer node count unknown, cannot acquire DDL lock
< 2015-05-15 05:47:47.070 PDT >HINT: BDR is probably still starting up, wait a while
< 2015-05-15 05:47:47.070 PDT >CONTEXT: SQL statement "SECURITY LABEL FOR bdr ON TABLE idn_oauth2_access_token IS '{ "sets" : [“test_rep_set"] }'"
PL/pgSQL function bdr.table_set_replication_sets(regclass,text[]) line 30 at EXECUTE statement
< 2015-05-15 05:47:47.070 PDT >STATEMENT: select bdr.table_set_replication_sets(’table1', ARRAY[’test_rep_set']);
On node2:
apimgtdb=# SELECT * FROM bdr.bdr_nodes;
node_sysid | node_timeline | node_dboid | node_status | node_name | node_local_dsn
| node_init_from_dsn
---------------------+---------------+------------+-------------+-----------+-------------------------------------------------------------------------
----+------------------------------------------------------------------------------
6148366974419236867 | 1 | 16386 | i | node2 | port=5432 dbname=apimgtdb user=postgres password=XXXX host=10.0.0.102
| port=5432 dbname=apimgtdb user=postgres password=XXXX host=10.0.0.101
(1 row)
apimgtdb=# SELECT * FROM bdr.bdr_connections;
conn_sysid | conn_timeline | conn_dboid | conn_origin_sysid | conn_origin_timeline | conn_origin_dboid | conn_is_unidirectional |
conn_dsn | conn_apply_delay | conn_replication_sets
---------------------+---------------+------------+-------------------+----------------------+-------------------+------------------------+-----------
------------------------------------------------------------------+------------------+------------------------
6148366974419236867 | 1 | 16386 | 0 | 0 | 0 | f | port=5432
dbname=apimgtdb user=postgres password=XXXX host=10.0.0.102 | | {test_rep_set}
(1 row)
The following lines repeat in the logs on node2:
< 2015-05-15 09:04:25.874 EDT >LOG: worker process: bdr db: apimgtdb (PID 16330) exited with exit code 1
< 2015-05-15 09:04:30.880 EDT >LOG: starting background worker process "bdr db: apimgtdb"
< 2015-05-15 09:04:30.915 EDT >ERROR: previous init failed, manual cleanup is required
< 2015-05-15 09:04:30.915 EDT >DETAIL: Found bdr.bdr_nodes entry for bdr (6148366974419236867,1,16386,) with state=i in remote bdr.bdr_nodes
< 2015-05-15 09:04:30.915 EDT >HINT: Remove all replication identifiers and slots corresponding to this node from the init target node then drop and recreate this database and try again
< 2015-05-15 09:04:30.916 EDT >LOG: worker process: bdr db: apimgtdb (PID 16338) exited with exit code 1
On May 15, 2015, at 2:30 AM, Craig Ringer <craig@2ndquadrant.com> wrote:SELECT * FROM bdr.bdr_connections
I think I spotted the problem today, I am missing a role on node2.
Is there a pointer to or can you provide list of steps to take for the manual cleanup mentioned the log file. I am assuming I just need to remove the relevant entries in the bdr tables just on node2 in my case. Is that correct?
On May 15, 2015, at 8:07 AM, Dennis <dennisr@visi.com> wrote:On node1:apimgtdb=# SELECT * FROM bdr.bdr_nodesapimgtdb-# ;node_sysid | node_timeline | node_dboid | node_status | node_name | node_local_dsn| node_init_from_dsn---------------------+---------------+------------+-------------+-----------+------------------------------------------------------------------------------+--------------------6145502501690488441 | 2 | 16424 | r | node1 | port=5432 dbname=apimgtdb user=postgres password=XXXX host=10.0.0.101|(1 row)apimgtdb=# SELECT * FROM bdr.bdr_connectionsapimgtdb-# ;conn_sysid | conn_timeline | conn_dboid | conn_origin_sysid | conn_origin_timeline | conn_origin_dboid | conn_is_unidirectional |conn_dsn | conn_apply_delay | conn_replication_sets---------------------+---------------+------------+-------------------+----------------------+-------------------+------------------------+------------------------------------------------------------------------------+------------------+------------------------6145502501690488441 | 2 | 16424 | 0 | 0 | 0 | f | port=5432 dbname=apimgtdb user=postgres password=XXXXX host=10.0.0.101 | | {test_rep_set}(1 row)< 2015-05-15 05:47:47.070 PDT >ERROR: No peer nodes or peer node count unknown, cannot acquire DDL lock< 2015-05-15 05:47:47.070 PDT >HINT: BDR is probably still starting up, wait a while< 2015-05-15 05:47:47.070 PDT >CONTEXT: SQL statement "SECURITY LABEL FOR bdr ON TABLE idn_oauth2_access_token IS '{ "sets" : [“test_rep_set"] }'"PL/pgSQL function bdr.table_set_replication_sets(regclass,text[]) line 30 at EXECUTE statement< 2015-05-15 05:47:47.070 PDT >STATEMENT: select bdr.table_set_replication_sets(’table1', ARRAY[’test_rep_set']);On node2:apimgtdb=# SELECT * FROM bdr.bdr_nodes;node_sysid | node_timeline | node_dboid | node_status | node_name | node_local_dsn| node_init_from_dsn---------------------+---------------+------------+-------------+-----------+-----------------------------------------------------------------------------+------------------------------------------------------------------------------6148366974419236867 | 1 | 16386 | i | node2 | port=5432 dbname=apimgtdb user=postgres password=XXXX host=10.0.0.102| port=5432 dbname=apimgtdb user=postgres password=XXXX host=10.0.0.101(1 row)apimgtdb=# SELECT * FROM bdr.bdr_connections;conn_sysid | conn_timeline | conn_dboid | conn_origin_sysid | conn_origin_timeline | conn_origin_dboid | conn_is_unidirectional |conn_dsn | conn_apply_delay | conn_replication_sets---------------------+---------------+------------+-------------------+----------------------+-------------------+------------------------+-----------------------------------------------------------------------------+------------------+------------------------6148366974419236867 | 1 | 16386 | 0 | 0 | 0 | f | port=5432dbname=apimgtdb user=postgres password=XXXX host=10.0.0.102 | | {test_rep_set}(1 row)The following lines repeat in the logs on node2:< 2015-05-15 09:04:25.874 EDT >LOG: worker process: bdr db: apimgtdb (PID 16330) exited with exit code 1< 2015-05-15 09:04:30.880 EDT >LOG: starting background worker process "bdr db: apimgtdb"< 2015-05-15 09:04:30.915 EDT >ERROR: previous init failed, manual cleanup is required< 2015-05-15 09:04:30.915 EDT >DETAIL: Found bdr.bdr_nodes entry for bdr (6148366974419236867,1,16386,) with state=i in remote bdr.bdr_nodes< 2015-05-15 09:04:30.915 EDT >HINT: Remove all replication identifiers and slots corresponding to this node from the init target node then drop and recreate this database and try again< 2015-05-15 09:04:30.916 EDT >LOG: worker process: bdr db: apimgtdb (PID 16338) exited with exit code 1On May 15, 2015, at 2:30 AM, Craig Ringer <craig@2ndquadrant.com> wrote:SELECT * FROM bdr.bdr_connections