On Thu, Sep 3, 2015 at 03:40:40PM +0200, Tomas Vondra wrote:
> Not really, the idea is that you don't need to create the replica
> immediately. The system recognizes that primary shard location is
> unavailable and redirects the tasks to the "replicas." So the time
> to recreate the failed node is not that critical.
>
> It needs to be done in a smart way to prevent some typical issues
> like suddenly doubling the load on replicas due to failure of the
> primary location. By using a different group of nodes for each "data
> segment" you can eliminate this, because the group of nodes to
> handle the additional load will be larger.
>
> The other issue then of course is that the groups of nodes must not
> be entirely random, otherwise the cluster would suffer data loss in
> case of an outage of an arbitrary group of K nodes (where K is the number
> of replicas for each piece of data).
>
> It's also non-trivial to do this when you have to consider racks,
> data centers etc.
>
> With regular slaves you can't do any of this - no matter what you
> do, you have to load balance the additional load only on the slaves.
Yes, and imagine doing this with FDWs, updating the catalog table
location of the FDW as part of the failover process --- interesting.
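
To make the load argument above concrete, here is a rough toy model,
not anything that exists in Postgres or the FDW code. It assumes six
nodes, five segments per primary, and two copies of each segment; the
names failover_load and fatal_pairs and both layouts are made up for
illustration. It compares mirrored pairs (roughly the regular-slave
case) with replicas spread over a different node per segment.

```python
from collections import Counter
from itertools import combinations

NODES = range(6)          # hypothetical cluster size
SEGMENTS_PER_NODE = 5     # hypothetical segments per primary

# Mirrored pairs: every segment of node p has its replica on p's partner.
mirrored = {(p, i): [p, p ^ 1]
            for p in NODES for i in range(SEGMENTS_PER_NODE)}

# Spread: each segment of node p has its replica on a different other node.
spread = {(p, i): [p, (p + 1 + i) % len(NODES)]
          for p in NODES for i in range(SEGMENTS_PER_NODE)}


def failover_load(placement, failed):
    """Segments served per node once `failed` is down.

    Each segment is served by the first live node in its list.
    """
    load = Counter()
    for nodes in placement.values():
        live = [n for n in nodes if n != failed]
        if live:
            load[live[0]] += 1
    return load


def fatal_pairs(placement, nodes):
    """Node pairs whose simultaneous loss destroys some segment."""
    groups = {frozenset(ns) for ns in placement.values()}
    return [pair for pair in combinations(nodes, 2)
            if frozenset(pair) in groups]


mirrored_peak = max(failover_load(mirrored, failed=0).values())
spread_peak = max(failover_load(spread, failed=0).values())
mirrored_fatal = len(fatal_pairs(mirrored, NODES))
spread_fatal = len(fatal_pairs(spread, NODES))
```

In this toy setup the mirrored layout doubles the load on the failed
node's partner (5 to 10 segments), while the spread layout adds one
segment to each survivor (5 to 6). The price is the one Tomas points
out: with mirroring only 3 of the 15 possible two-node outages lose
data, with the fully spread layout all 15 do, so the groups have to be
chosen somewhere in between.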

-- 
  Bruce Momjian  <bruce@momjian.us>        http://momjian.us
  EnterpriseDB                             http://enterprisedb.com

  + Everyone has their own god. +