Re: Horizontal scalability/sharding - Mailing list pgsql-hackers

From Josh Berkus
Subject Re: Horizontal scalability/sharding
Date
Msg-id 55E4ECE3.7070409@agliodbs.com
Whole thread Raw
In response to Horizontal scalability/sharding  (Bruce Momjian <bruce@momjian.us>)
Responses Re: Horizontal scalability/sharding  (Bruce Momjian <bruce@momjian.us>)
List pgsql-hackers
On 08/31/2015 02:47 PM, Robert Haas wrote:
> On Mon, Aug 31, 2015 at 4:16 PM, Josh Berkus <josh@agliodbs.com> wrote:
>> First, let me put out there that I think the horizontal scaling project
>> which has buy-in from the community and we're working on is infinitely
>> better than the one we're not working on or is an underresourced fork.
>> So we're in agreement on that.  However, I think there's a lot of room
>> for discussion; I feel like the FDW approach was decided in exclusive
>> meetings involving a very small number of people.  The FDW approach
>> *may* be the right approach, but I'd like to see some rigorous
>> questioning of that before it's final.
> 
> It seems to me that sharding consists of (1) breaking your data set up
> into shards, (2) possibly replicating some of those shards onto
> multiple machines, and then (3) being able to access the remote data
> from local queries.   As far as (1) is concerned, we need declarative
> partitioning, which is being worked on by Amit Langote.  As far as (2)
> is concerned, I hope and expect BDR, or technology derived therefrom,
> to eventually fill that need.  

Well, maybe.  If you look at pg_shard, you'll see that it works by
multiplexing writes to all copies.  There's a good reason to do that; it
allows you to have a tight feedback loop between the success of writes
and the availability of "good" nodes.  If you're depending on a separate
replication system to handle getting row copies from one shard to
another, then you need a different way to deal with bad nodes and with
inconsistency between copies of shards.  That's why the existing
multinode non-relational databases don't separate replication from
writes, either.

For that matter, if what you want is transactional fully ACID sharding,
I really don't see a way to do it via BDR, since BDR is purely
asynchronous replication, as far as I know.

Also, if we want BDR to do this, that's pretty far afield of what BDR is
currently capable of, so someone will need to put serious work into it
rather than just assuming functionality will show up.

> As far as (3) is concerned, why
> wouldn't we use the foreign data wrapper interface, and specifically
> postgres_fdw?  That interface was designed for the explicit purpose of
> allowing access to remote data sources, and a lot of work has been put
> into it, so it would be highly surprising if we decided to throw that
> away and develop something completely new from the ground up.

Well, query hooks are also a capability which we already have, and is
mature.  Citus has already posted about why they chose to use them instead.

As long as you recognize that the FDW API (not just the existing fdws)
will need to expand to make this work, it's a viable path.

Also consider that (3) includes both reads and writes.

> I think it's abundantly clear that we need a logical replication
> solution as part of any horizontal scalability story.  People will
> want to do things like have 10 machines with each piece of data on 3
> of them, and there won't be any reasonable way of doing that without
> logical replication.  I assume that BDR, or some technology derived
> from it, will end up in core and solve that problem.  I had actually
> hoped we were going to get that in 9.5, but it didn't happen that way.
> Still, I think that getting first single-master, and then eventually
> multi-master, logical replication in core is absolutely critical.  And
> not just for sharding specifically: replicating your whole database to
> several nodes and load-balancing your clients across them isn't
> sharding, but it does give you read scalability and is a good fit for
> people with geographically dispersed data with good geographical
> locality.  I think a lot of people will want that.

Well, the latter thing is something which BDR is designed for, so all
that needs to happen with that is getting the rest of the plumbing into
core.  Also documentation, packaging, productization, etc.  But the
heavy lifting has already been done.

However, integrating BDR with sharding has some major design issues
which aren't trivial and may be unresolvable, per above.

> I'm not quite sure yet how we can marry declarative partitioning and
> better FDW-pushdown and logical replication into one seamless, easy to
> deploy solution that requires very low administrator effort.  But I am
> sure that each of those things, taken individually, is very useful,
> and that being able to construct a solution from those building blocks
> would be a big improvement over what we have today.  I can't imagine
> that trying to do one monolithic project that provides all of those
> things, but only if you combine them in the specific way that the
> designer had in mind, is ever going to be successful.  People _will_
> want access to each of those features in an unbundled fashion.  And,
> trying to do them altogether leads to trying to solve too many
> problems at once.  I think the history of Postgres-XC is a cautionary
> tale there.

Yes.  It's also a cautionary tale about not skipping over major design
elements (like HA and DR) until after version 1.0, which is one of the
reasons I'm harping on certain things here.  I don't want us to repeat
those mistakes.

> I don't really understand how pg_shard fits into this equation.  It
> looks to me like it does some interesting things but, for example, it
> doesn't support JOIN pushdown, and suggests that you use the
> proprietary CitusDB engine if you need that.  But I think JOIN
> pushdown is something we want to have in core, not something where we
> want to point people to proprietary alternatives.  And it has some
> restrictions on INSERT statements - they have to contain only values
> which are constants or which can be folded to constants.  I'm just
> guessing, but I bet that's probably due to some limitation which
> pg_shard, being out of core, has difficulty overcoming, but we can do
> better in core.  Basically I guess I expect much of what pg_shard does
> to be subsumed as we improve FDWs, but maybe not all of it.

pg_shard provides an alternate implementation based on planner hooks
instead of FDWs.  Even if you pursue an FDW-based design, you should
look at (a) why the Citus team found FDWs to be unworkable and (b) what
got implemented in planner hooks.  Otherwise you're liable to repeat the
exact same "learning experience".

-- 
Josh Berkus
PostgreSQL Experts Inc.
http://pgexperts.com



pgsql-hackers by date:

Previous
From: "David E. Wheeler"
Date:
Subject: Re: pg_upgrade + Extensions
Next
From: Marc Munro
Date:
Subject: Re: Horizontal scalability/sharding