Re: Agenda for the Vienna cluster meeting - Mailing list pgsql-cluster-hackers

From Bruce Momjian
Subject Re: Agenda for the Vienna cluster meeting
Msg-id 20151013170657.GA7842@momjian.us
In response to Re: Agenda for the Vienna cluster meeting  (Oleg Bartunov <obartunov@gmail.com>)
List pgsql-cluster-hackers
On Sat, Oct 10, 2015 at 11:32:14PM +0300, Oleg Bartunov wrote:
> What is the goal of this summit, and the expected result?
>
> We have the XC/XL/X2, Citus DB, and EDB groups, and I'd certainly be
> interested to know the state of the art of their sharding design and
> implementation. We'll present our proposal for a DTM and its implementation,
> with examples of integration with FDW, pg_shard, and probably XL. Our goal
> is to discuss the API with all groups and eventually convince the community
> to accept it for 9.6. That would make development of different approaches
> easier.

The goal of the meeting is to discuss the possibility of adding built-in
sharding to Postgres.  I think this is similar to how we added built-in
replication --- we first implemented external replication solutions, but
once PITR was sufficiently developed, we enhanced it to implement
streaming replication.  It took us a few years to get PITR fully
developed, and then a few years to get streaming replication fully
developed --- built-in sharding will probably follow a similar path.

I think with FDWs and parallelism, we are nearing a point where built-in
sharding is a viable approach.  It is only viable if the backend changes
and additions are minimal.  (There is little community desire to add
tons of new code just to implement sharding.)  With FDWs and
parallelism, we can get sharding by enhancing them.  We already know we
need a better user partitioning API, so that could benefit sharding too.
A distributed transaction manager is another missing piece, but that
could benefit FDWs too, and other sharding implementations, as you
mentioned.
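
To make the distributed transaction manager point concrete, here is a
minimal sketch of the guarantee it has to provide: a write that spans
several shards commits on all of them or on none. This is a toy two-phase
commit in Python, not Postgres code. The names Shard, commit_everywhere,
and fail_on_prepare are hypothetical, and the sketch ignores crash
recovery and cross-shard snapshot visibility, which a real transaction
manager would also have to handle.

```python
from dataclasses import dataclass, field


@dataclass
class Shard:
    """Hypothetical stand-in for one shard (e.g. one foreign server)."""
    name: str
    fail_on_prepare: bool = False  # assumption: simulates a shard voting "no"
    committed: dict = field(default_factory=dict)
    pending: dict = field(default_factory=dict)

    def prepare(self, txid, writes):
        # Phase 1: stage the writes and vote; nothing is visible yet.
        if self.fail_on_prepare:
            return False
        self.pending[txid] = dict(writes)
        return True

    def commit(self, txid):
        # Phase 2: make the staged writes visible.
        self.committed.update(self.pending.pop(txid))

    def rollback(self, txid):
        # Discard staged writes; harmless if prepare never succeeded.
        self.pending.pop(txid, None)


def commit_everywhere(txid, writes_by_shard):
    """Commit on every shard or on none (toy two-phase commit).

    writes_by_shard is a list of (Shard, dict) pairs.
    """
    prepared = []
    for shard, writes in writes_by_shard:
        if shard.prepare(txid, writes):
            prepared.append(shard)
        else:
            # One "no" vote aborts the whole transaction.
            for p in prepared:
                p.rollback(txid)
            return False
    for shard in prepared:
        shard.commit(txid)
    return True
```

For example, with a = Shard("a") and b = Shard("b", fail_on_prepare=True),
commit_everywhere("tx1", [(a, {"k": 1}), (b, {"k": 2})]) returns False and
leaves a.committed empty, so the shards never disagree about the outcome.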

Streaming replication didn't make external replication solutions
disappear, but for the majority of users built-in replication was the
best approach.  I think the same will happen with sharding.  I don't
think maintaining a sharding patch set on top of Postgres is a viable
long-term approach, though it has short-term advantages.

Please don't label this as an EDB approach --- I think that is just
divisive.  Yes, EDB has customers who want this, and EDB and NTT are
funding some of the development, but many large Postgres users need this
too, and many Postgres service providers have customers who need this.

If you want to label it, call it my approach.  I was crazy enough to
lead the Windows port, and crazy enough to get pg_upgrade to production
quality --- hopefully I am crazy enough to get this done too.  ;-)  This
approach is different from all the ones you listed above because it has
a realistic chance of enabling _built-in_ sharding, and I think built-in
sharding is the only long-term viable mass-adoption solution, just like
streaming replication was for replication.

I have created a wiki agenda that we can all adjust before the meeting:

    https://wiki.postgresql.org/wiki/PG-EU_2015_Cluster_Summit

--
  Bruce Momjian  <bruce@momjian.us>        http://momjian.us
  EnterpriseDB                             http://enterprisedb.com

+ As you are, so once was I. As I am, so you will be. +
+ Roman grave inscription                             +

