Re: Horizontal scalability/sharding - Mailing list pgsql-hackers

From Tomas Vondra
Subject Re: Horizontal scalability/sharding
Msg-id 55E5F013.1020803@2ndquadrant.com
In response to Re: Horizontal scalability/sharding  (Josh Berkus <josh@agliodbs.com>)
List pgsql-hackers
Hi,

On 08/31/2015 10:16 PM, Josh Berkus wrote:
> It's also important to recognize that there are three major use-cases
> for write-scalable clustering:
>
> * OLTP: small-medium cluster, absolute ACID consistency,
>    bottlenecked on small writes per second
> * DW: small-large cluster, ACID optional,
>    bottlenecked on bulk reads/writes
> * Web: medium to very large cluster, ACID optional,
>    bottlenecked on # of connections
>
> We cannot possibly solve all of the above at once, but to the extent
> that we recognize all 3 use cases, we can build core features which
> can be adapted to all of them.

It would be good to have a discussion about use cases first - each of
us is mostly concerned with the use cases we deal with ourselves, and
with the bottlenecks specific to our own environments. These three
basic use cases seem like a good start, but some of the details
certainly don't match my experience ...

For example, I can't see how ACID can be optional for the DW use case,
but maybe there's a good explanation - I can imagine sacrificing
various ACID properties at the node level, but I can't really imagine
sacrificing any of them for the cluster as a whole. So this deserves
some explanation.
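To make the distinction concrete, here's a rough sketch of what
cluster-level atomicity typically boils down to - an atomic commit
protocol (two-phase commit in this sketch) driven by a coordinator
across the shards. The table and transaction names are invented and
the nodes need max_prepared_transactions > 0, but the commands are
plain PostgreSQL:

    -- on shard A (in its own session)
    BEGIN;
    INSERT INTO orders_part_1 VALUES (1001, now());
    PREPARE TRANSACTION 'dist_tx_42';

    -- on shard B (in its own session)
    BEGIN;
    UPDATE stock_part_2 SET qty = qty - 1 WHERE item_id = 7;
    PREPARE TRANSACTION 'dist_tx_42';

    -- only once every shard has prepared successfully does the
    -- coordinator issue, on each shard:
    COMMIT PREPARED 'dist_tx_42';

Giving up atomicity at the cluster level (i.e. just committing on each
shard independently) means a failure between the per-shard commits can
leave some shards committed and others rolled back, with no way for
the cluster to resolve that automatically - which is why I don't see
how it can be optional even for DW.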

I also don't share the view that write scalability is the only (or
even the main) issue we should aim to solve. For the
business-intelligence use cases I've been working on recently,
handling complex read-only ad-hoc queries is often much more
important, and in those cases the bottleneck is often CPU and/or RAM.

>
> I'm also going to pontificate that, for a future solution, we should
> not focus on write *IO*, but rather on CPU and RAM. The reason for
> this thinking is that, with the latest improvements in hardware and
> 9.5 improvements, it's increasingly rare for machines to be
> bottlenecked on writes to the transaction log (or the heap). This has
> some implications for system design. For example, solutions which
> require all connections to go through a single master node do not
> scale sufficiently to be worth bothering with.

+1

> On some other questions from Mason:
>
>> Do we want multiple copies of shards, like the pg_shard approach? Or
>> keep things simpler and leave it up to the DBA to add standbys?
>
> We want multiple copies of shards created by the sharding system itself.
> Having a separate, and completely orthogonal, redundancy system to the
> sharding system is overly burdensome on the DBA and makes low-data-loss
> HA impossible.

IMHO it'd be quite unfortunate if the design made it impossible to 
combine those two features (e.g. creating standbys for individual 
shards and failing over to them).

It's true that solving HA at the sharding level (by keeping multiple 
copies of each shard) may be simpler than combining sharding with 
standbys, but I don't see why it makes low-data-loss HA impossible.
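Just to sketch what I have in mind (shard and standby names are
invented, and I'm hand-waving over how the shard map learns about a
promotion): each shard is simply a primary with a synchronous standby,
which is what gives you the low-data-loss part. On the primary of each
shard, something like:

    -- make the standby (connecting with
    -- application_name=shard01_standby) synchronous, so a commit is
    -- not acknowledged until the standby has flushed its WAL
    ALTER SYSTEM SET synchronous_standby_names = 'shard01_standby';
    ALTER SYSTEM SET synchronous_commit = 'on';
    SELECT pg_reload_conf();

    -- verify the standby is streaming and marked as sync
    SELECT application_name, state, sync_state
      FROM pg_stat_replication;

Failing over then means promoting shard01_standby and updating the
shard metadata to point at it - more moving parts than built-in shard
copies, sure, but I don't see where the "impossible" comes from.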


regards

--
Tomas Vondra                  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


