Re: Horizontal scalability/sharding - Mailing list pgsql-hackers

From Robert Haas
Subject Re: Horizontal scalability/sharding
Date
Msg-id CA+TgmoYMFg5tA4Tr0giFWBYHswgTRrB40cK1Gva2CEFmCp6CTQ@mail.gmail.com
Whole thread Raw
In response to Re: Horizontal scalability/sharding  (Bruce Momjian <bruce@momjian.us>)
Responses Re: Horizontal scalability/sharding  (Bruce Momjian <bruce@momjian.us>)
Re: Horizontal scalability/sharding  (Ozgun Erdogan <ozgun@citusdata.com>)
List pgsql-hackers
On Tue, Sep 1, 2015 at 6:55 AM, Bruce Momjian <bruce@momjian.us> wrote:
> I assumed these queries were going to be solved by sending as digested
> data as possible to the coordinator, and having the coordinator complete
> any remaining processing.  I think we are going to need to decide if
> such "sending data back to shards" is something we are ever going to
> implement.  I can see FDWs _not_ working well for that use-case.

I do think we are going to want to support that.  All the people I've
talked to about parallel and distributed query processing agree that
you need to do that sort of thing to get really good and scalable
performance.  I think that we could make a lot of headway as compared
with the status quo just by implementing more pushdown optimizations
than we have today.  Right now, SELECT COUNT(*) FROM table will suck
back the whole remote table and count the rows locally, and that's
stupid.  We can fix that case with better pushdown logic.  We can also
fix the case of N-way join nests where the joins are either on the
partitioning key or to replicated tables.  But suppose you have a join
between two tables which are sharded across the cluster but not on the
partitioning key.  There's no way to push the join down, so all the
work comes back to the coordinator, which is possibly OK if such
queries are rare, but not so hot if they are frequent.

To leverage the full CPU power of the cluster in such a case, you need
to be able to shuffle data around between the nodes.  You pick one of
the two tables being joined, and based on the partitioning key of that
table, each node examines the other table and, for each row, sends it
to the machine where it MIGHT have one or more join partners.  Then
each node can join its shard of the first table against the rows from
the second table that were sent to it.

Now the question is, where should the code that does all of this live?postgres_fdw?  Some new, sharding-specific FDW?
Incore?  I don't
 
know for sure, but what I do know is that we could make a lot of
progress over where we are today by just improving postgres_fdw, and I
don't think those improvements are even all that difficult.  If we
decide we need to implement something new, it's going to be a huge
project that will take years to complete, with uncertain results.  I'd
rather have a postgres_fdw-based implementation that is imperfect and
can't handle some kinds of queries in 9.6 than a promise that by 9.9
we'll have something really great that handles MPP perfectly.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



pgsql-hackers by date:

Previous
From: "Joshua D. Drake"
Date:
Subject: Re: Horizontal scalability/sharding
Next
From: Robert Haas
Date:
Subject: Re: Horizontal scalability/sharding