Thread: BDR: cascading setup

BDR: cascading setup

From
Oleksii Kliukin
Date:
Hi,

We are evaluating BDR for multi-master cross-datacenter replication, with 2 masters communicating across datacenters, supplemented by local in-datacenter replicas to provide HA.

Basically, something like 

M <----> M
|        |
S        S

I could run all nodes as multi-masters, but I don't want many-to-many cross-datacenter communications, primarily because of the latency issues, and also to avoid locking too many nodes on DDL changes.

As far as I can see, I cannot make a promoted physical replica a member of the multi-master group without re-joining the group after promotion (which means re-transferring the whole database over the network from the surviving master), see https://github.com/2ndQuadrant/bdr/issues/98. So running multi-master with physical datacenter-local replicas is tough. Is there a better alternative at the moment, e.g.:

- BDR multi-master being part of more than one replication group (I could create one group for cross-DS multi-masters, and another for DS-local communications)?
- BDR multi-master being also master with several UDR replicas attached (so that DS-local nodes will be running as UDR replicas of a master, that at the same time communicates via BDR to another master in another DS), and allowing the UDR replica to join the BDR group if the master dies. 

Kind regards,
--
Oleksii

Re: BDR: cascading setup

From
Craig Ringer
Date:
On 11 January 2016 at 18:55, Oleksii Kliukin <alexk@hintbits.com> wrote:
 
We are evaluating BDR for multi-master cross-datacenter replication, with 2 masters communicating across datacenters, supplemented by local in-datacenter replicas to provide HA.

This is strongly desirable ... but not currently supported.

 
As far as I can see, I cannot make a promoted physical replica a member of the multi-master group without re-joining the group after promotion (which means re-transferring the whole database over the network from the surviving master), see https://github.com/2ndQuadrant/bdr/issues/98

Correct. 

The fundamental issue is that logical replication slots are not copied by pg_basebackup or replicated via physical WAL-based replication. Without this a promoted replica has no knowledge of the replay position of its peers, nor does it know how much extra WAL to retain to allow them to catch up on replication.
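As a toy illustration of the point above (plain Python, not PostgreSQL or BDR code), here is a rough sketch of why a promoted physical replica is stuck. The `Node` class, the peer names, and the integer "WAL positions" are all made up for the example; the only thing taken from the explanation above is that slot state stays behind on the original node when it is copied physically.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    # Hypothetical stand-ins: an integer WAL position and a map of
    # peer name -> position that peer has confirmed replaying.
    wal_end: int = 0
    slots: dict = field(default_factory=dict)

    def retention_floor(self):
        """Oldest WAL position still needed by any peer, or None if no slots exist."""
        return min(self.slots.values()) if self.slots else None


def physical_copy(primary):
    """Model a base backup / physical standby: data is copied, slots are not."""
    return Node(wal_end=primary.wal_end)


primary = Node(wal_end=1000, slots={"peer_a": 900, "peer_b": 750})
standby = physical_copy(primary)

primary_floor = primary.retention_floor()
standby_floor = standby.retention_floor()
print(primary_floor, standby_floor)
```

In this model the primary knows it must keep WAL back to the slowest peer's position, while the copy has no slot information at all, so after promotion it can neither tell where its peers are nor decide how much WAL to retain for them.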

This will hopefully be fixed in 9.6 as there are patches in the queue to address these issues.

- BDR multi-master being part of more than one replication group (I could create one group for cross-DS multi-masters, and another for DS-local communications)?

It won't help. BDR always replicates in a mesh and doesn't do cascading. Replication sets won't change that.
 
- BDR multi-master being also master with several UDR replicas attached (so that DS-local nodes will be running as UDR replicas of a master, that at the same time communicates via BDR to another master in another DS), and allowing the UDR replica to join the BDR group if the master dies. 

Again this won't work. You can't promote a UDR replica to a full BDR peer. For now you're stuck with the extra replication traffic of the two local nodes speaking directly to their remote peers.
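To put a rough number on that extra traffic, here is a small sketch that counts cross-datacenter links for a four-node full mesh versus the topology in the original diagram. The node names and datacenter labels are hypothetical, and this only counts node pairs; it says nothing about actual bandwidth.

```python
from itertools import combinations


def cross_dc_links(nodes, edges):
    """Count replication links whose endpoints sit in different datacenters."""
    return sum(1 for a, b in edges if nodes[a] != nodes[b])


# Hypothetical node names mapped to hypothetical datacenter labels.
nodes = {"m1": "dc1", "s1": "dc1", "m2": "dc2", "s2": "dc2"}

# Full mesh: every pair of nodes is connected to each other.
mesh = list(combinations(nodes, 2))

# Topology from the original question: only the two masters talk across
# datacenters, and each replica talks only to its local master.
cascade = [("m1", "m2"), ("m1", "s1"), ("m2", "s2")]

mesh_links = cross_dc_links(nodes, mesh)
cascade_links = cross_dc_links(nodes, cascade)
print(mesh_links, cascade_links)
```

With two nodes per datacenter the mesh has four cross-datacenter pairs where the cascading layout would have one, and the gap grows with every local node added on either side.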

What you need is non-mesh topologies and support for selective changeset forwarding, and/or support for promotion of physical replicas to replace failed nodes.

Both of those are in the pipeline.

pglogical has a more flexible model of replication topology and forwarding, and we plan to rewrap BDR around pglogical for 9.6. This should (time permitting) allow for non-mesh topologies and cascading. To make it work well we'll need logical decoding support for logical slots, though, and that may not make it into 9.6. There's no practical way to add this to 9.4bdr since it relies heavily on (sysid,timeline,dboid) tuples to identify nodes, so it'll be 9.6-only.

If the pg_basebackup and replication patches to replicate slots are accepted into 9.6 then we'll be able to have physical standbys of pglogical/bdr nodes. It may be possible to backport this to 9.4bdr but I'm not aware of any plans to do so and available time/resources are mainly focused on driving 9.6/pglogical forward. Get in touch if you think this is something you could use more urgently.

I realise this isn't quite the answer you hoped for, but at least there's improvement on the horizon.

--
 Craig Ringer                   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services