Re: Horizontal scalability/sharding - Mailing list pgsql-hackers

From Josh Berkus
Subject Re: Horizontal scalability/sharding
Date
Msg-id 55E5FA4C.2050207@agliodbs.com
Whole thread Raw
In response to Horizontal scalability/sharding  (Bruce Momjian <bruce@momjian.us>)
Responses Re: Horizontal scalability/sharding  (Robert Haas <robertmhaas@gmail.com>)
Re: Horizontal scalability/sharding  (Tomas Vondra <tomas.vondra@2ndquadrant.com>)
Re: Horizontal scalability/sharding  (Ashutosh Bapat <ashutosh.bapat@enterprisedb.com>)
List pgsql-hackers
On 09/01/2015 11:36 AM, Tomas Vondra wrote:
>> We want multiple copies of shards created by the sharding system itself.
>>   Having a separate, and completely orthagonal, redundancy system to the
>> sharding system is overly burdensome on the DBA and makes low-data-loss
>> HA impossible.
> 
> IMHO it'd be quite unfortunate if the design would make it impossible to
> combine those two features (e.g. creating standbys for shards and
> failing over to them).
> 
> It's true that solving HA at the sharding level (by keeping multiple
> copies of a each shard) may be simpler than combining sharding and
> standbys, but I don't see why it makes low-data-loss HA impossible.

Other way around, that is, having replication standbys as the only
method of redundancy requires either high data loss or high latency for
all writes.

In the case of async rep, every time we fail over a node, the entire
cluser would need to roll back to the last common known-good replay
point, hence high data loss.

In the case of sync rep, we are required to wait for at least double
network lag time in order to do a single write ... making
write-scalability quite difficult.

Futher, if using replication the sharding system would have no way to
(a) find out immediately if a copy was bad and (b) fail over quickly to
a copy of the shard if the first requested copy was not responding.
With async replication, we also can't use multiple copies of the same
shard as a way to balance read workloads.

If we write to multiple copies as a part of the sharding feature, then
that can be parallelized, so that we are waiting only as long as the
slowest write (or in failure cases, as long as the shard timeout).
Further, we can check for shard-copy health and update shard
availability data with each user request, so that the ability to see
stale/bad data is minimized.

There are obvious problems with multiplexing writes, which you can
figure out if you knock pg_shard around a bit.  But I really think that
solving those problems is the only way to go.

Mind you, I see a strong place for binary replication and BDR for
multi-region redundancy; you really don't want that to be part of the
sharding system if you're aiming for write scalability.

-- 
Josh Berkus
PostgreSQL Experts Inc.
http://pgexperts.com



pgsql-hackers by date:

Previous
From: Tomas Vondra
Date:
Subject: Re: Horizontal scalability/sharding
Next
From: Robert Haas
Date:
Subject: Re: 9.4 broken on alpha