Re: Horizontal scalability/sharding - Mailing list pgsql-hackers

From Pavan Deolasee
Subject Re: Horizontal scalability/sharding
Msg-id CABOikdO=faPUP67-cTmDXdVwPFp-vvtiXBOPL=M1haaDrbySSg@mail.gmail.com
In response to Re: Horizontal scalability/sharding  (Robert Haas <robertmhaas@gmail.com>)
Responses Re: Horizontal scalability/sharding  (Bruce Momjian <bruce@momjian.us>)
Re: Horizontal scalability/sharding  (Robert Haas <robertmhaas@gmail.com>)
List pgsql-hackers


On Tue, Sep 1, 2015 at 3:17 AM, Robert Haas <robertmhaas@gmail.com> wrote:


> It seems to me that sharding consists of (1) breaking your data set up
> into shards, (2) possibly replicating some of those shards onto
> multiple machines, and then (3) being able to access the remote data
> from local queries.   As far as (1) is concerned, we need declarative
> partitioning, which is being worked on by Amit Langote.  As far as (2)
> is concerned, I hope and expect BDR, or technology derived therefrom,
> to eventually fill that need.  As far as (3) is concerned, why
> wouldn't we use the foreign data wrapper interface, and specifically
> postgres_fdw?  That interface was designed for the explicit purpose of
> allowing access to remote data sources, and a lot of work has been put
> into it, so it would be highly surprising if we decided to throw that
> away and develop something completely new from the ground up.
>
> It's true that postgres_fdw doesn't do everything we need yet.  The
> new join pushdown hooks aren't used by postgres_fdw yet, and the API
> itself has some bugs with EvalPlanQual handling.  Aggregate pushdown
> is waiting on upper planner path-ification.   DML pushdown doesn't
> exist yet, and the hooks that would enable pushdown of ORDER BY
> clauses to the remote side aren't being used by postgres_fdw.  But all
> of these things have been worked on.  Patches for many of them have
> already been posted.  They have suffered from a certain amount of
> neglect by senior hackers, and perhaps also from a shortage of time on
> the part of the authors.  But an awful lot of the work that is needed
> here has already been done, if only we could get it committed.
> Aggregate pushdown is a notable exception, but abandoning the foreign
> data wrapper approach in favor of something else won't fix that.
>
> Postgres-XC developed a purpose-built system for talking to other
> nodes instead of using the FDW interface, for the very good reason
> that the FDW interface did not yet exist at the time that Postgres-XC
> was created.  But several people associated with the XC project have
> said, including one on this thread, that if it had existed, they
> probably would have used it.  And it's hard to see why you wouldn't:
> with XC's approach, the remote data source is presumed to be
> PostgreSQL (or Postgres-XC/XL/X2/whatever); and you can only use the
> facility as part of a sharding solution.  The FDW interface can talk
> to anything, and it can be used for stuff other than sharding, like
> making one remote table appear local because you just happen to want
> that for some reason.  This makes the XC approach look rather brittle
> by comparison.  I don't blame the XC folks for taking the shortest
> path between two points, but FDWs are better, and we ought to try to
> leverage that.


In my discussions on this topic with various folks, including Robert, I've conceded that if FDW had been available when XC was first written, in all likelihood we would have used and extended that interface. But that wasn't the case, and we did what we thought was the best solution at that time, given the resources and the schedule. To be honest, when the XC project was started, I was quite skeptical about the whole thing, given that the goal was to build something that could replace Oracle RAC with maybe less than 1% of the resources Oracle must have invested in building RAC. The lack of resources at the start of the project keeps showing up in the quality issues that users report from time to time. Having said that, I am quite satisfied with what we have been able to build within the constraints.

But FDW is just one part of the story. There is the entire global consistency problem, which would require something like GTM to give out XIDs and snapshots; atomicity, which would require managing transactions across multiple shards; join pushdown when not all of the data is available locally, something that XL is attempting to solve with datanode-to-datanode exchange of information; other global state such as sequences; replicating some part of the data to multiple shards for efficient operations; the ability to add and remove shards with the least disruption; and globally consistent backup and restore. XC/XL has attempted to solve each of them to some extent. I don't claim that they are completely solved and that there are no corner cases left, but we have made fairly good progress on each of them.
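
As a rough illustration of the first item, here is a minimal sketch of a single authority that hands out XIDs and snapshots so that every shard agrees on which transactions are visible. It is a toy, in-process model and not the actual GTM from XC/XL; the names `ToyGTM`, `Snapshot`, and `sees` are hypothetical, and it ignores aborts, networking, and durability.

```python
import threading
from dataclasses import dataclass


@dataclass(frozen=True)
class Snapshot:
    xmin: int            # oldest XID still running when the snapshot was taken
    xmax: int            # first XID not yet assigned when the snapshot was taken
    running: frozenset   # XIDs that were in progress at snapshot time

    def sees(self, xid: int, committed: set) -> bool:
        """Return True if the changes made by xid are visible to this snapshot."""
        if xid >= self.xmax or xid in self.running:
            # Started after the snapshot, or still in progress when it was taken.
            return False
        return xid in committed


class ToyGTM:
    """Hypothetical single source of XIDs and snapshots (assumption, not XC/XL code)."""

    def __init__(self):
        self._lock = threading.Lock()
        self._next_xid = 1
        self._running = set()
        self.committed = set()

    def begin(self) -> int:
        # Every transaction, on any shard, gets its XID from this one counter.
        with self._lock:
            xid = self._next_xid
            self._next_xid += 1
            self._running.add(xid)
            return xid

    def commit(self, xid: int) -> None:
        with self._lock:
            self._running.discard(xid)
            self.committed.add(xid)

    def snapshot(self) -> Snapshot:
        # The snapshot is global: all shards evaluate visibility against it.
        with self._lock:
            running = frozenset(self._running)
            xmin = min(running, default=self._next_xid)
            return Snapshot(xmin, self._next_xid, running)
```

A transaction that was still running when the snapshot was taken stays invisible to that snapshot even after it commits, on every shard, which is the property a purely per-node XID counter cannot give.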
 
My worry is that if we start implementing them again from scratch, it will take a few years before we get them into a usable state. What XC/XL lacked is probably a Robert Haas or a Tom Lane who could look at the work and suggest major edits. If that had happened, the quality of the product could have been much better today. I don't mean to disparage the developers who worked on XC/XL, but there is no harm in accepting that if someone with a much better understanding of the whole system had been part of the team, it would have positively impacted the project. Is that an angle worth exploring? Does it make sense to commit some more resources to, say, XC or XL and try to improve the quality of the product even further? To be honest, XL is in far, far better shape (I haven't really tried XC in a while), and some more QA and polishing could make it production-ready much sooner.

Yet another possibility is to rework the design such that only the coordinator needs to be a fork of PostgreSQL, while the shards are all PostgreSQL instances queried using standard client APIs. That would reduce the code that needs to go into core to build the entire scalable system and would also shorten the timeline considerably.
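
To make the shape of that last design concrete, here is a minimal sketch of a coordinator that holds all of the routing logic while the shards are unmodified databases reached through a standard client API. Everything in it is an assumption for illustration: in-memory SQLite connections stand in for the PostgreSQL shards so the example is self-contained, the SQL uses SQLite syntax, and the `Coordinator` class, the `kv` table, and the CRC32-modulo routing are hypothetical choices, not anything from XC/XL.

```python
import sqlite3
import zlib


class Coordinator:
    """Hypothetical coordinator: routes by key hash, shards are plain DB-API connections."""

    def __init__(self, shards):
        self.shards = shards
        for conn in shards:
            conn.execute("CREATE TABLE IF NOT EXISTS kv (k TEXT PRIMARY KEY, v TEXT)")

    def _shard_for(self, key):
        # crc32 is stable across processes, unlike Python's built-in hash() for str.
        return self.shards[zlib.crc32(key.encode("utf-8")) % len(self.shards)]

    def put(self, key, value):
        conn = self._shard_for(key)
        conn.execute("INSERT OR REPLACE INTO kv (k, v) VALUES (?, ?)", (key, value))
        conn.commit()

    def get(self, key):
        cur = self._shard_for(key).execute("SELECT v FROM kv WHERE k = ?", (key,))
        row = cur.fetchone()
        return row[0] if row else None

    def count(self):
        # Scatter-gather: run the same aggregate on every shard, then combine locally.
        return sum(c.execute("SELECT count(*) FROM kv").fetchone()[0] for c in self.shards)


def make_demo_coordinator(n_shards=3):
    # Stand-in shards; a real deployment would open connections to PostgreSQL instead.
    return Coordinator([sqlite3.connect(":memory:") for _ in range(n_shards)])
```

The sketch deliberately leaves out cross-shard transactions and consistent snapshots, so the coordinator in this design would still need an answer to the global consistency and atomicity problems listed earlier.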

Thanks,
Pavan

--
 Pavan Deolasee                   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services
