Re: Horizontal scalability/sharding - Mailing list pgsql-hackers

From: Josh Berkus
Subject: Re: Horizontal scalability/sharding
Msg-id: 55E76BDE.4060505@agliodbs.com
In response to: Horizontal scalability/sharding (Bruce Momjian <bruce@momjian.us>)
Responses: Re: Horizontal scalability/sharding (Kevin Grittner <kgrittn@ymail.com>)
List: pgsql-hackers
On 09/02/2015 12:30 PM, Robert Haas wrote:
> On Wed, Sep 2, 2015 at 3:03 PM, Josh Berkus <josh@agliodbs.com> wrote:
>>> 4. Therefore, I think that we should instead use logical replication,
>>> which might be either synchronous or asynchronous.  When you modify
>>> one copy of the data, that change will then be replicated to all other
>>> nodes.  If you are OK with eventual consistency, this replication can
>>> be asynchronous, and nodes that are off-line will catch up when they
>>> are on-line.  If you are not OK with that, then you must replicate
>>> synchronously to every node before transaction commit; or at least you
>>> must replicate synchronously to every node that is currently on-line.
>>> This presents some challenges: logical decoding currently can't
>>> replicate transactions that are still in process - replication starts
>>> when the transaction commits.  Also, we don't have any way for
>>> synchronous replication to wait for multiple nodes.
>>
>> Well, there is a WIP patch for that, which IMHO would be much improved
>> by having a concrete use-case like this one.  What nobody is working on
>> -- and we've vetoed in the past -- is a way of automatically failing and
>> removing from replication any node which repeatedly fails to sync, which
>> would be a requirement for this model.
> 
> Yep.  It's clear to me we need that in general, not just for sharding.
> To me, the key is to make sure there's a way for the cluster-ware to
> know about the state transitions.  Currently, when the synchronous
> standby changes, PostgreSQL doesn't tell anyone.  That's a problem.

There are many parts of our replication which are still effectively
unmonitorable. For example, there's still no way to tell from the
replica that it's lost contact with the master except by tailing the
log.  If we try to build bigger systems on top of these components,
we'll find that we need to add a lot of instrumentation.
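(To make the gap concrete: absent a server-side signal, external tooling is
reduced to heuristics over the few timestamps a replica does expose, e.g.
pg_last_xact_replay_timestamp(). The sketch below is hypothetical cluster-ware
logic, not anything PostgreSQL provides; the timeout value is an assumption.)

```python
from dataclasses import dataclass
from datetime import datetime, timedelta

@dataclass
class ReplicaStatus:
    last_replay: datetime   # e.g. from pg_last_xact_replay_timestamp()
    now: datetime

def probably_disconnected(status: ReplicaStatus,
                          timeout: timedelta = timedelta(seconds=30)) -> bool:
    """Guess that the replica lost contact if nothing has been replayed
    within `timeout`.  This is exactly the weakness described above:
    an idle master looks the same as a severed connection (false
    positive), and a replica draining already-received WAL looks
    healthy (false negative).  Heuristic only."""
    return status.now - status.last_replay > timeout
```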

> 
>> You'd also need a way to let the connection nodes know when a replica
>> has fallen behind so that they can be taken out of
>> load-balancing/sharding for read queries.  For the synchronous model,
>> that would be "fallen behind at all"; for asynchronous it would be
>> "fallen more than ### behind".
> 
> How is that different from the previous thing?  Just that we'd treat
> "lagging" as "down" beyond some threshold?  That doesn't seem like a
> mandatory feature.

It's a mandatory feature if you want to load-balance reads.  We have to
know which nodes not to send reads to because they are out of sync.
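(A sketch of what the connection node would do with that information,
assuming it can obtain a per-node lag figure at all; the function name and
the 16MB default threshold are mine, purely illustrative.)

```python
# Hypothetical connection-node logic: pick which replicas may serve reads.
# Synchronous model: a node is excluded if it has fallen behind at all.
# Asynchronous model: excluded only once lag exceeds the "###" threshold.
# Lag is measured here in bytes of WAL; a time-based measure works the same.

def read_candidates(lag_bytes: dict[str, int],
                    synchronous: bool,
                    max_lag_bytes: int = 16 * 1024 * 1024) -> list[str]:
    limit = 0 if synchronous else max_lag_bytes
    return sorted(node for node, lag in lag_bytes.items() if lag <= limit)
```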

>> Yeah?  I'd assume that a GTM would be antithetical to two-stage copying.
> 
> I don't think so.  If transaction A writes data on X which is
> replicated to Y and then commits, a new snapshot which shows A as
> committed can't be used on Y until A's changes have been replicated
> there.  That could be enforced by having the commit of A wait for
> replication, or by having an attempt by a later transaction to use the
> snapshot on Y wait until replication completes, or some even more
> sophisticated strategy that considers whether the replication backlog
> touches the same data that the new transaction will read.  It's
> complicated, but it doesn't seem intractable.

I need to see this on a chalkboard to understand it.
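(For what it's worth, the invariant as I read it can be chalked up as a rule
over commit and replay positions. This is my simplification of the quoted
paragraph, not an actual design; the LSNs here are just integers.)

```python
# The rule: a snapshot that shows transaction A as committed may only be
# used on node Y once Y has replayed WAL at least up to A's commit position.
# Option 1 from the quoted mail makes A's commit wait for replication;
# option 2 makes the later snapshot user wait instead.

def snapshot_usable_on(node_replay_lsn: int, snapshot_commit_lsn: int) -> bool:
    """Safe only once the node has replayed past the commit the snapshot saw."""
    return node_replay_lsn >= snapshot_commit_lsn

def wait_for_replay(get_replay_lsn, target_lsn: int, poll=lambda: None) -> None:
    # Option 2: block the reader on Y, not the committer on X.
    while get_replay_lsn() < target_lsn:
        poll()
```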

>>  I'm not a big fan of a GTM at all, frankly; it makes clusters much
>> harder to set up, and becomes a SPOF.
> 
> I partially agree.  I think it's very important that the GTM is an
> optional feature of whatever we end up with, rather than an
> indispensable component.  People who don't want it shouldn't have to
> pay the price in performance and administrative complexity.  But at
> the same time, I think a lot of people will want it, because without
> it, the fact that sharding is in use is much less transparent to the
> application.

If it can be optional, then we're pretty close to covering most use
cases with one general infrastructure.  That would be nice.

-- 
Josh Berkus
PostgreSQL Experts Inc.
http://pgexperts.com


