Re: Horizontal scalability/sharding - Mailing list pgsql-hackers

From: Robert Haas
Subject: Re: Horizontal scalability/sharding
Date:
Msg-id: CA+TgmobY7bmpPSryoT2Emd58CVFAci5wDkE2expuv+DDxNS1rA@mail.gmail.com
In response to: Re: Horizontal scalability/sharding (Bruce Momjian <bruce@momjian.us>)
Responses: Re: Horizontal scalability/sharding (Bruce Momjian <bruce@momjian.us>)
List: pgsql-hackers
On Wed, Sep 2, 2015 at 6:56 PM, Bruce Momjian <bruce@momjian.us> wrote:
> On Wed, Sep  2, 2015 at 02:41:46PM -0400, Robert Haas wrote:
>> 4. Therefore, I think that we should instead use logical replication,
>> which might be either synchronous or asynchronous.  When you modify
>> one copy of the data, that change will then be replicated to all other
>> nodes.  If you are OK with eventual consistency, this replication can
>> be asynchronous, and nodes that are off-line will catch up when they
>> are on-line.  If you are not OK with that, then you must replicate
>> synchronously to every node before transaction commit; or at least you
>> must replicate synchronously to every node that is currently on-line.
>> This presents some challenges: logical decoding currently can't
>> replicate transactions that are still in process - replication starts
>> when the transaction commits.  Also, we don't have any way for
>> synchronous replication to wait for multiple nodes.  But in theory
>> those seem like limitations that can be lifted.  Also, the GTM needs
>> to be aware that this stuff is happening, or it will DTWT.  That too
>> seems like a problem that can be solved.
>
> Can you explain why logical replication is better than binary
> replication for this use-case?

Uh, well, for the same reasons it is better in many other cases.  In
particular, you probably don't want to replicate all of the data on
machine A to machine B, just some of it.

Typically, sharding solutions store multiple copies of each piece of
data.  So let's say you have 4 machines and divide the data into 12
chunks.  Each machine is the write-master for 3 of those chunks, and
also has secondary copies of 3 others.  So maybe things start out like
this:

machine #1: master for chunks 1, 2, 3; also has copies of chunks 4, 7, 10
machine #2: master for chunks 4, 5, 6; also has copies of chunks 1, 8, 11
machine #3: master for chunks 7, 8, 9; also has copies of chunks 2, 5, 12
machine #4: master for chunks 10, 11, 12; also has copies of chunks 3, 6, 9
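
To make the logical-replication angle concrete, here is a rough sketch
of what the per-chunk setup might look like, using a hypothetical
publication/subscription-style interface on top of logical decoding
(all of the names and syntax below are illustrative, not something we
have today):

    -- On machine #1, the write-master for chunk 1:
    CREATE PUBLICATION chunk1_pub FOR TABLE chunk1;

    -- On machine #2, which keeps a secondary copy of chunk 1:
    CREATE SUBSCRIPTION chunk1_sub
        CONNECTION 'host=machine1 dbname=shard user=replicator'
        PUBLICATION chunk1_pub;

    -- And so on for the other chunks: machine #3 subscribes to
    -- chunk2_pub, machine #4 subscribes to chunk3_pub, etc.

The point is just that each machine replicates only the chunks it
hosts, rather than a byte-for-byte copy of some other machine.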

If machine #1 is run over by a rabid triceratops, you can make machine
#2 the master for chunk 1, machine #3 the master for chunk 2, and
machine #4 the master for chunk 3.  The write load therefore remains
evenly divided.  If you can only copy entire machines, you can't
achieve that in this situation.
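
Just to spell it out, after that reshuffle each surviving machine is
the write-master for exactly four chunks:

machine #2: master for chunks 1, 4, 5, 6; also has copies of chunks 8, 11
machine #3: master for chunks 2, 7, 8, 9; also has copies of chunks 5, 12
machine #4: master for chunks 3, 10, 11, 12; also has copies of chunks 6, 9

(Chunks 4, 7, and 10 lose their secondary copies along with machine #1,
so those would need to be re-replicated somewhere.)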

I'm not saying that the above is exactly what we're going to end up
with, or even necessarily close.  But a big part of the point of
sharding is that the machines don't all have the same data - otherwise
you are not write scaling.  At the same time, it will frequently be the
case, for various reasons, that they have *overlapping* sets of data.
Logical replication can handle that; physical replication can't.

In Postgres-XC, every table is either sharded (part of the table is
present on each node) or replicated (all of the table is present on
every node).  Clearly, there's no way to cover that layout with
physical replication unless you are OK with keeping a full second copy
of every node.  But that's not a very good solution.
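
For anyone who hasn't played with XC, the DDL looks roughly like this
(going from memory, so the syntax may be slightly off):

    -- Sharded: each node stores a slice of the rows.
    CREATE TABLE measurements (device_id int, reading float8)
        DISTRIBUTE BY HASH (device_id);

    -- Replicated: every node stores the whole table.
    CREATE TABLE reference_data (code text PRIMARY KEY, label text)
        DISTRIBUTE BY REPLICATION;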

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


