Re: Horizontal scalability/sharding - Mailing list pgsql-hackers

From Bruce Momjian
Subject Re: Horizontal scalability/sharding
Date
Msg-id 20150901105527.GA22330@momjian.us
Whole thread Raw
In response to Re: Horizontal scalability/sharding  (Alvaro Herrera <alvherre@2ndquadrant.com>)
Responses Re: Horizontal scalability/sharding  (Mason S <masonlists@gmail.com>)
Re: Horizontal scalability/sharding  (Robert Haas <robertmhaas@gmail.com>)
Re: Horizontal scalability/sharding  (Amit Kapila <amit.kapila16@gmail.com>)
List pgsql-hackers
On Mon, Aug 31, 2015 at 11:23:58PM -0300, Alvaro Herrera wrote:
> Bruce Momjian wrote:
> 
> > My hope is that many FDW improvements will benefit sharding and
> > non-sharding workloads, but I bet some improvements are going to be
> > sharding-specific.  I would say we are still in the exploratory stage,
> > but based on the number of people who care about this feature and want
> > to be involved, I think we are off to a very good start.  :-)
> 
> Having lots of interested people doesn't help with some problems,
> though.  The Citus document says:
> 
>     And the issue with these four limitations wasn't with foreign
>     data wrappers. We wrote mongo_fdw and cstore_fdw, and we're
>     quite happy with the contract FDWs provide. The problem was that
>     we were trying to retrofit an API for something that it was
>     fundamentally not designed to do.

I had a chance to review the Citus Data document just now:
https://goo.gl/vJWF85

Particularly, it links to this document, which is clearer about the
issues they are trying to solve:
https://www.citusdata.com/blog/114-how-to-build-your-distributed-database

The document opens a big question --- when queries can't be processed in
a traditional top/down fashion, Citus has the goal of sending groups of
results up the the coordinator, reordering them, then sending them back
to the shards for further processing, basically using the shards as
compute engines because the shards are no longer using local data to do
their computations.  The two examples they give are COUNT(DISTINCT) and
a join across two sharded tables ("CANADA").

I assumed these queries were going to be solved by sending as digested
data as possible to the coordinator, and having the coordinator complete
any remaining processing.  I think we are going to need to decide if
such "sending data back to shards" is something we are ever going to
implement.  I can see FDWs _not_ working well for that use-case.

--  Bruce Momjian  <bruce@momjian.us>        http://momjian.us EnterpriseDB
http://enterprisedb.com
 + Everyone has their own god. +



pgsql-hackers by date:

Previous
From: Mason S
Date:
Subject: Re: Horizontal scalability/sharding
Next
From: Mason S
Date:
Subject: Re: Horizontal scalability/sharding