Thread: BDR table level replication questions

BDR table level replication questions

From
Dennis
Date:
Hello,

I am trying to setup table level replication with BDR 0.9.0 but I am running into some errors.  I believe I must be missing a step or two in the process.

I currently have a two node group that I want to build.  

1)  On node1 I have a database built with the tables I want to replicate;  table1,  table2, table3.  
2)  On node2 I have created the database to hold the tables (just issued a create database statement) but no other objects have been added on node2.
3)  On node1 I have issued the following commands without error: 

SELECT bdr.bdr_group_create(
      local_node_name := 'node1',
      node_external_dsn := 'port=5432 dbname=apimgtdb user=postgres password=XXXX host=10.0.0.0.101',
      replication_sets := ARRAY['test_rep_set']
);

SELECT bdr.bdr_node_join_wait_for_ready();

4) On node2 I then issue the following to join the group with out error:

SELECT bdr.bdr_group_join(
      local_node_name := 'node2',
      node_external_dsn := 'port=5432 dbname=apimgtdb user=postgres password=XXXX host=10.0.0.102',
      join_using_dsn:= 'port=5432 dbname=apimgtdb user=postgres password=PostAlign host=10.0.0.101',
      replication_sets := ARRAY[’test_rep_set']
);

5) On node2, if I then issue;

SELECT bdr.bdr_node_join_wait_for_ready(); 
the command just hangs.  So I killed the command

6) Back on node1, I issue the command and receive an error:

select bdr.table_set_replication_sets(’table1', ARRAY[’test_rep_set'])
;
ERROR:  No peer nodes or peer node count unknown, cannot acquire DDL lock
HINT:  BDR is probably still starting up, wait a while
CONTEXT:  SQL statement "SECURITY LABEL FOR bdr ON TABLE table1 IS '{ "sets" : [“test_rep_set"] }'"
PL/pgSQL function bdr.table_set_replication_sets(regclass,text[]) line 30 at EXECUTE statement


What am I missing?  How are the steps different from setting database replication?

Thanks


Re: BDR table level replication questions

From
Craig Ringer
Date:

On 15 May 2015 at 04:26, Dennis <dennisr@visi.com> wrote:

What am I missing?  How are the steps different from setting database replication?


Please show the log output from both nodes, and the contents of "SELECT * FROM bdr.bdr_nodes" and "SELECT * FROM bdr.bdr_connections" on each node. 


--
 Craig Ringer                   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services

Re: BDR table level replication questions

From
Dennis
Date:
On node1:

apimgtdb=# SELECT * FROM bdr.bdr_nodes
apimgtdb-# ;
     node_sysid      | node_timeline | node_dboid | node_status | node_name |                                node_local_dsn                              
  | node_init_from_dsn 
---------------------+---------------+------------+-------------+-----------+----------------------------------------------------------------------------
--+--------------------
 6145502501690488441 |             2 |      16424 | r           | node1     | port=5432 dbname=apimgtdb user=postgres password=XXXX host=10.0.0.101
 | 
(1 row)

apimgtdb=# SELECT * FROM bdr.bdr_connections
apimgtdb-# ;
     conn_sysid      | conn_timeline | conn_dboid | conn_origin_sysid | conn_origin_timeline | conn_origin_dboid | conn_is_unidirectional |              
                     conn_dsn                                   | conn_apply_delay | conn_replication_sets  
---------------------+---------------+------------+-------------------+----------------------+-------------------+------------------------+--------------
----------------------------------------------------------------+------------------+------------------------
 6145502501690488441 |             2 |      16424 | 0                 |                    0 |                 0 | f                      | port=5432 dbn
ame=apimgtdb user=postgres password=XXXXX host=10.0.0.101 |                  | {test_rep_set}
(1 row)


< 2015-05-15 05:47:47.070 PDT >ERROR:  No peer nodes or peer node count unknown, cannot acquire DDL lock
< 2015-05-15 05:47:47.070 PDT >HINT:  BDR is probably still starting up, wait a while
< 2015-05-15 05:47:47.070 PDT >CONTEXT:  SQL statement "SECURITY LABEL FOR bdr ON TABLE idn_oauth2_access_token IS '{ "sets" : [“test_rep_set"] }'"
        PL/pgSQL function bdr.table_set_replication_sets(regclass,text[]) line 30 at EXECUTE statement
< 2015-05-15 05:47:47.070 PDT >STATEMENT:  select bdr.table_set_replication_sets(’table1', ARRAY[’test_rep_set']);



On node2:

apimgtdb=# SELECT * FROM bdr.bdr_nodes;
     node_sysid      | node_timeline | node_dboid | node_status | node_name |                               node_local_dsn                            
    |                              node_init_from_dsn                              
---------------------+---------------+------------+-------------+-----------+-------------------------------------------------------------------------
----+------------------------------------------------------------------------------
 6148366974419236867 |             1 |      16386 | i           | node2     | port=5432 dbname=apimgtdb user=postgres password=XXXX host=10.0.0.102
 | port=5432 dbname=apimgtdb user=postgres password=XXXX host=10.0.0.101
(1 row)


apimgtdb=# SELECT * FROM bdr.bdr_connections;
     conn_sysid      | conn_timeline | conn_dboid | conn_origin_sysid | conn_origin_timeline | conn_origin_dboid | conn_is_unidirectional |           
                       conn_dsn                                   | conn_apply_delay | conn_replication_sets  
---------------------+---------------+------------+-------------------+----------------------+-------------------+------------------------+-----------
------------------------------------------------------------------+------------------+------------------------
 6148366974419236867 |             1 |      16386 | 0                 |                    0 |                 0 | f                      | port=5432 
dbname=apimgtdb user=postgres password=XXXX host=10.0.0.102 |                  | {test_rep_set}
(1 row)

The following lines repeat in the logs on node2:

< 2015-05-15 09:04:25.874 EDT >LOG:  worker process: bdr db: apimgtdb (PID 16330) exited with exit code 1
< 2015-05-15 09:04:30.880 EDT >LOG:  starting background worker process "bdr db: apimgtdb"
< 2015-05-15 09:04:30.915 EDT >ERROR:  previous init failed, manual cleanup is required
< 2015-05-15 09:04:30.915 EDT >DETAIL:  Found bdr.bdr_nodes entry for bdr (6148366974419236867,1,16386,) with state=i in remote bdr.bdr_nodes
< 2015-05-15 09:04:30.915 EDT >HINT:  Remove all replication identifiers and slots corresponding to this node from the init target node then drop and recreate this database and try again
< 2015-05-15 09:04:30.916 EDT >LOG:  worker process: bdr db: apimgtdb (PID 16338) exited with exit code 1


On May 15, 2015, at 2:30 AM, Craig Ringer <craig@2ndquadrant.com> wrote:

SELECT * FROM bdr.bdr_connections

Re: BDR table level replication questions

From
Dennis
Date:
I think I spotted the problem today,  I am missing a role on node2.

Is there a pointer to or can you provide list of steps to take for the manual cleanup mentioned the log file.  I am assuming I just need to remove the relevant entries in the bdr tables just on node2 in my case.  Is that correct?


On May 15, 2015, at 8:07 AM, Dennis <dennisr@visi.com> wrote:

On node1:

apimgtdb=# SELECT * FROM bdr.bdr_nodes
apimgtdb-# ;
     node_sysid      | node_timeline | node_dboid | node_status | node_name |                                node_local_dsn                              
  | node_init_from_dsn 
---------------------+---------------+------------+-------------+-----------+----------------------------------------------------------------------------
--+--------------------
 6145502501690488441 |             2 |      16424 | r           | node1     | port=5432 dbname=apimgtdb user=postgres password=XXXX host=10.0.0.101
 | 
(1 row)

apimgtdb=# SELECT * FROM bdr.bdr_connections
apimgtdb-# ;
     conn_sysid      | conn_timeline | conn_dboid | conn_origin_sysid | conn_origin_timeline | conn_origin_dboid | conn_is_unidirectional |              
                     conn_dsn                                   | conn_apply_delay | conn_replication_sets  
---------------------+---------------+------------+-------------------+----------------------+-------------------+------------------------+--------------
----------------------------------------------------------------+------------------+------------------------
 6145502501690488441 |             2 |      16424 | 0                 |                    0 |                 0 | f                      | port=5432 dbn
ame=apimgtdb user=postgres password=XXXXX host=10.0.0.101 |                  | {test_rep_set}
(1 row)


< 2015-05-15 05:47:47.070 PDT >ERROR:  No peer nodes or peer node count unknown, cannot acquire DDL lock
< 2015-05-15 05:47:47.070 PDT >HINT:  BDR is probably still starting up, wait a while
< 2015-05-15 05:47:47.070 PDT >CONTEXT:  SQL statement "SECURITY LABEL FOR bdr ON TABLE idn_oauth2_access_token IS '{ "sets" : [“test_rep_set"] }'"
        PL/pgSQL function bdr.table_set_replication_sets(regclass,text[]) line 30 at EXECUTE statement
< 2015-05-15 05:47:47.070 PDT >STATEMENT:  select bdr.table_set_replication_sets(’table1', ARRAY[’test_rep_set']);



On node2:

apimgtdb=# SELECT * FROM bdr.bdr_nodes;
     node_sysid      | node_timeline | node_dboid | node_status | node_name |                               node_local_dsn                            
    |                              node_init_from_dsn                              
---------------------+---------------+------------+-------------+-----------+-------------------------------------------------------------------------
----+------------------------------------------------------------------------------
 6148366974419236867 |             1 |      16386 | i           | node2     | port=5432 dbname=apimgtdb user=postgres password=XXXX host=10.0.0.102
 | port=5432 dbname=apimgtdb user=postgres password=XXXX host=10.0.0.101
(1 row)


apimgtdb=# SELECT * FROM bdr.bdr_connections;
     conn_sysid      | conn_timeline | conn_dboid | conn_origin_sysid | conn_origin_timeline | conn_origin_dboid | conn_is_unidirectional |           
                       conn_dsn                                   | conn_apply_delay | conn_replication_sets  
---------------------+---------------+------------+-------------------+----------------------+-------------------+------------------------+-----------
------------------------------------------------------------------+------------------+------------------------
 6148366974419236867 |             1 |      16386 | 0                 |                    0 |                 0 | f                      | port=5432 
dbname=apimgtdb user=postgres password=XXXX host=10.0.0.102 |                  | {test_rep_set}
(1 row)

The following lines repeat in the logs on node2:

< 2015-05-15 09:04:25.874 EDT >LOG:  worker process: bdr db: apimgtdb (PID 16330) exited with exit code 1
< 2015-05-15 09:04:30.880 EDT >LOG:  starting background worker process "bdr db: apimgtdb"
< 2015-05-15 09:04:30.915 EDT >ERROR:  previous init failed, manual cleanup is required
< 2015-05-15 09:04:30.915 EDT >DETAIL:  Found bdr.bdr_nodes entry for bdr (6148366974419236867,1,16386,) with state=i in remote bdr.bdr_nodes
< 2015-05-15 09:04:30.915 EDT >HINT:  Remove all replication identifiers and slots corresponding to this node from the init target node then drop and recreate this database and try again
< 2015-05-15 09:04:30.916 EDT >LOG:  worker process: bdr db: apimgtdb (PID 16338) exited with exit code 1


On May 15, 2015, at 2:30 AM, Craig Ringer <craig@2ndquadrant.com> wrote:

SELECT * FROM bdr.bdr_connections