On 2015-07-02 11:50:44 -0700, Josh Berkus wrote:
> So there's two parts to this:
>
> 1. I need to ensure that data is replicated to X places.
>
> 2. I need to *know* which places data was synchronously replicated to
> when the master goes down.
>
> My entire point is that (1) alone is useless unless you also have (2).

I think there's a good set of use cases where that's really not the case.

> And do note that I'm talking about information on the replica, not on
> the master, since in any failure situation we don't have the old
> master around to check.

How would you, even theoretically, synchronize that knowledge to all the
replicas? Even when they're temporarily disconnected?

> Say you take this case:
>
> "2" : { "local_replica", "london_server", "nyc_server" }
>
> ... which should ensure that any data which is replicated is replicated
> to at least two places, so that even if you lose the entire local
> datacenter, you have the data on at least one remote data center.
> EXCEPT: say you lose both the local datacenter and communication with
> the london server at the same time (due to transatlantic cable issues, a
> huge DDOS, or whatever). You'd like to promote the NYC server to be the
> new master, but only if it was in sync at the time its communication
> with the original master was lost ... except that you have no way of
> knowing that.

Pick up the phone, compare the LSNs, done.

> Given that, we haven't really reduced our data loss potential or
> improved availability from the current 1-redundant synch rep. We still
> need to wait to get the London server back to figure out if we want to
> promote or not.
>
> Now, this configuration would reduce the data loss window:
>
> "3" : { "local_replica", "london_server", "nyc_server" }
>
> As would this one:
>
> "2" : { "local_replica", "nyc_server" }
>
> ... because we would know definitively which servers were in sync. So
> maybe that's the use case we should be supporting?

If you want automated failover you need a leader election amongst the
surviving nodes. The replay position is all they need to elect the node
that's furthest ahead, and that information exists today.
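
As a rough sketch of that election step (not anything that exists in
PostgreSQL itself): assume each surviving node reports its replay position
in the usual textual LSN form, two hex numbers separated by a slash. The
function names, node names and positions below are made up for
illustration, and how the nodes exchange their positions is left out
entirely.

```python
def parse_lsn(text: str) -> int:
    """Convert an LSN in 'hi/lo' hex text form to a 64-bit integer.

    LSNs must be compared numerically; comparing the text form directly
    gives wrong answers (e.g. '0/9000000' sorts after '0/10000000').
    """
    hi, lo = text.split("/")
    return (int(hi, 16) << 32) | int(lo, 16)


def elect_leader(replay_positions: dict[str, str]) -> str:
    """Return the surviving node with the highest replay position.

    replay_positions maps a (hypothetical) node name to the replay LSN
    that node reported. Ties are broken by node name so that every
    participant arrives at the same winner.
    """
    if not replay_positions:
        raise ValueError("no surviving nodes to elect from")
    return max(
        replay_positions,
        key=lambda name: (parse_lsn(replay_positions[name]), name),
    )


if __name__ == "__main__":
    # Hypothetical positions reported by the two remote nodes.
    survivors = {"nyc_server": "0/3000148", "london_server": "0/3000060"}
    print(elect_leader(survivors))
```

Note this only picks the node that is furthest ahead among the ones that
can still talk to each other; it says nothing about whether that node had
everything the old master acknowledged.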

Greetings,

Andres Freund