bdr_init_copy fails when starting 2nd BDR node - Mailing list pgsql-general

From John Casey
Subject bdr_init_copy fails when starting 2nd BDR node
Msg-id 007701d023ec$38ebdce0$aac396a0$@icloud.com
Responses Re: bdr_init_copy fails when starting 2nd BDR node  (Andres Freund <andres@2ndquadrant.com>)
Re: bdr_init_copy fails when starting 2nd BDR node  (Craig Ringer <craig@2ndquadrant.com>)
List pgsql-general

I’ve been having trouble getting BDR replication started. If I set up the main node and then run bdr_init_copy on the second node, it always fails, as shown below.

postgres$ rm -Rf $PGDATA
postgres$ echo db_password | pg_basebackup -X stream -h main_node_ip -p 5432 -U username -D $PGDATA
postgres$ cp $HOME/backup/postgresql.conf $PGDATA
postgres$ bdr_init_copy -U username -D $PGDATA
bdr_init_copy: starting...
Assigning new system identifier: 6098464173726284030...
Creating primary replication slots...
Creating restore point...
Could not connect to the remote server: could not connect to server: No such file or directory
        Is the server running locally and accepting
        connections on Unix domain socket "/tmp/.s.PGSQL.5432"?
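
For reference, the last message in that output means the connection was attempted over a local Unix-domain socket rather than TCP, which is libpq's default on Unix-like systems when no host is given. A minimal sketch for checking that path follows. The function names are hypothetical helpers, and the socket directory /tmp and port 5432 are assumptions taken from the error text.

```shell
#!/bin/sh
# Sketch only: helpers for inspecting the local socket named in the error.
# Assumptions: default socket directory /tmp and port 5432, both taken from
# the error message; the function names are hypothetical.

# Print the Unix-socket path libpq would use for a port and socket directory.
socket_for_port() {
    port="$1"
    dir="${2:-/tmp}"
    printf '%s/.s.PGSQL.%s\n' "$dir" "$port"
}

# Report whether a server socket file exists for the given port.
local_socket_state() {
    sock="$(socket_for_port "$1" "${2:-/tmp}")"
    if [ -S "$sock" ]; then
        echo "present: $sock"
    else
        echo "missing: $sock"
    fi
}
```

For example, running `local_socket_state 5432` on the second node reports whether a server is listening locally, and `pg_isready -h main_node_ip -p 5432` checks the main node over TCP.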

If I instead start both servers with pg_ctl, using a configuration set up for replication, I get the following error on the main node:

LOG:  starting background worker process "bdr (6098483684958107256,1,16384,): dr: apply"
CONTEXT:  slot "bdr_16384_6098483684958107256_1_16384__", output plugin "bdr", in the startup callback
ERROR:  data stream ended
LOG:  worker process: bdr (6098483684958107256,1,16384,): dr: apply (PID 6294) exited with exit code 1

… and I get the following error on the second node:

ERROR:  bdr output plugin: slot creation rejected, bdr.bdr_nodes entry for local node (sysid=6098483778037269710, timelineid=1, dboid=16384): status='i', bdr still starting up: applying initial dump of remote node
HINT:  Monitor pg_stat_activity and the logs, wait until the node has caught up
CONTEXT:  slot "bdr_16384_6098483684958107256_1_16384__", output plugin "bdr", in the startup callback
LOG:  could not receive data from client: Connection reset by peer

These errors keep cycling indefinitely.
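
The HINT in the second node's log points at pg_stat_activity and at the node status recorded in bdr.bdr_nodes. A minimal sketch of how both could be queried with psql is below. The function names are hypothetical, and the host, port 5432, user "username", and database name are placeholders in the style of the commands earlier in this message.

```shell
#!/bin/sh
# Sketch only: query the two places the HINT and the error refer to.
# Assumptions: psql is on PATH; host, user and database are placeholders;
# the function names are hypothetical.

# SQL listing the sessions currently visible on the node.
activity_query() {
    echo "SELECT pid, state, query FROM pg_stat_activity;"
}

# SQL showing BDR's own node table (the table named in the error).
nodes_query() {
    echo "SELECT * FROM bdr.bdr_nodes;"
}

# Run both queries against one node. Arguments: host, database name.
check_node() {
    host="$1"
    db="$2"
    psql -h "$host" -p 5432 -U username -d "$db" -c "$(activity_query)"
    psql -h "$host" -p 5432 -U username -d "$db" -c "$(nodes_query)"
}
```

Something like `check_node main_node_ip dbname` could be run against each node in turn while the errors are cycling.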

I have gotten this working off and on, but I keep running into this issue. I am on CentOS 6.5. Both servers can run psql against the databases on the other node when not configured for replication, so it is not a connectivity or firewall issue. I have installed from the beta2 RPM and have also built rc1 (bdr stable) from source.

Any ideas?

 
