Re: Horizontal scalability/sharding - Mailing list pgsql-hackers

From Bruce Momjian
Subject Re: Horizontal scalability/sharding
Date
Msg-id 20150901221145.GB27332@momjian.us
In response to Re: Horizontal scalability/sharding  (Robert Haas <robertmhaas@gmail.com>)
Responses Re: Horizontal scalability/sharding  (Bruce Momjian <bruce@momjian.us>)
List pgsql-hackers
On Tue, Sep  1, 2015 at 12:40:40PM -0400, Robert Haas wrote:
> On Tue, Sep 1, 2015 at 6:55 AM, Bruce Momjian <bruce@momjian.us> wrote:
> > I assumed these queries were going to be solved by sending as digested
> > data as possible to the coordinator, and having the coordinator complete
> > any remaining processing.  I think we are going to need to decide if
> > such "sending data back to shards" is something we are ever going to
> > implement.  I can see FDWs _not_ working well for that use-case.
> 
> I do think we are going to want to support that.  All the people I've
> talked to about parallel and distributed query processing agree that
> you need to do that sort of thing to get really good and scalable
> performance.  I think that we could make a lot of headway as compared
> with the status quo just by implementing more pushdown optimizations
> than we have today.  Right now, SELECT COUNT(*) FROM table will suck
> back the whole remote table and count the rows locally, and that's
> stupid.  We can fix that case with better pushdown logic.  We can also
> fix the case of N-way join nests where the joins are either on the
> partitioning key or to replicated tables.  But suppose you have a join
> between two tables which are sharded across the cluster but not on the
> partitioning key.  There's no way to push the join down, so all the
> work comes back to the coordinator, which is possibly OK if such
> queries are rare, but not so hot if they are frequent.
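
To make the pushdown contrast concrete, here is a minimal sketch in
Python, with shards simulated as in-memory row lists rather than real
FDW connections (the names are illustrative, not postgres_fdw
internals):

    # Hedged sketch: shards simulated as in-memory row lists, not
    # real FDW connections.
    shards = {
        "shard1": [("a", 1), ("b", 2)],
        "shard2": [("c", 3), ("d", 4)],
    }

    def naive_count(shards):
        # Today's behavior for SELECT COUNT(*) over an FDW: every row
        # is pulled back to the coordinator, which counts locally.
        pulled = [row for rows in shards.values() for row in rows]
        return len(pulled)

    def pushed_down_count(shards):
        # With aggregate pushdown, each shard runs COUNT(*) itself and
        # ships back one integer; the coordinator just sums them.
        return sum(len(rows) for rows in shards.values())

    assert naive_count(shards) == pushed_down_count(shards) == 4

The second plan ships one integer per shard instead of every row,
which is the whole point of pushdown.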

Let me be clearer about what the Citus Data paper shows.  I said
originally that the data was sent to the coordinator, sorted, then
resent to the shards, but the document:

https://www.citusdata.com/blog/114-how-to-build-your-distributed-database

has the shards create the groups, and the groups are then sent to the
other shards.  For example, to do COUNT(DISTINCT) with three shards,
each shard breaks its data into three buckets (each roughly 1/3 in
size), then the first bucket from each of the three shards goes to the
first shard, the second bucket goes to the second shard, etc.
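
In rough Python, with shards as in-memory lists (the bucket math is
illustrative, not the actual Citus implementation):

    NUM_SHARDS = 3

    # Each shard's local values for the column being counted.
    shards = [
        [1, 4, 7, 4],
        [2, 4, 8, 2],
        [3, 7, 9, 1],
    ]

    # Map: every shard splits its data into NUM_SHARDS hash buckets.
    buckets = [[[] for _ in range(NUM_SHARDS)] for _ in shards]
    for s, values in enumerate(shards):
        for v in values:
            buckets[s][hash(v) % NUM_SHARDS].append(v)

    # Shuffle: bucket i from every shard is shipped to shard i, so
    # each distinct value lands on exactly one shard.
    received = [[v for s in range(NUM_SHARDS) for v in buckets[s][i]]
                for i in range(NUM_SHARDS)]

    # Reduce: each shard de-duplicates locally; the coordinator only
    # has to sum three small integers.
    print(sum(len(set(vals)) for vals in received))   # prints 7

Because hashing sends equal values to the same shard, each shard can
count its distinct values independently and no global sort is needed.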

Basically, they are doing map-reduce, and the shards are creating
additional batches that get shipped to other shards.  I can see FDWs not
working well in that case as you are really creating a new data layout
just for the query.  This explains why the XC/XL people are saying they
would have used FDWs if they had existed when they started development,
while the Citus Data people are saying they couldn't use FDWs as they
currently exist.  They probably both needed FDW improvements, but I
think the Citus Data features would need a lot more.

--
  Bruce Momjian  <bruce@momjian.us>        http://momjian.us
  EnterpriseDB                             http://enterprisedb.com

  + Everyone has their own god. +


