Re: Horizontal scalability/sharding - Mailing list pgsql-hackers

From Bruce Momjian
Subject Re: Horizontal scalability/sharding
Date
Msg-id 20150830222943.GA32295@momjian.us
In response to Re: Horizontal scalability/sharding  (Simon Riggs <simon@2ndQuadrant.com>)
Responses Re: Horizontal scalability/sharding  (Michael Paquier <michael.paquier@gmail.com>)
List pgsql-hackers
On Sun, Aug 30, 2015 at 03:31:10PM +0100, Simon Riggs wrote:
> On 30 August 2015 at 03:17, Bruce Momjian <bruce@momjian.us> wrote:
> 
>     I have recently increased my public statements about the idea of adding
>     horizontal scaling/sharding to Postgres.
>
> Glad to see it. Many people have been pushing such things for years, so it is
> good to finally see some debate about this on Hackers.

Agreed.  Right now, in our community, we are only seeing users who are
happy with what Postgres offers but think they might need massive
horizontal scalability in the future.  I think there is a larger group
that cares about massive horizontal scalability, but those people are
using other software right now, so we don't see them yet.

Without a roadmap for built-in massive horizontal scalability, I think
Postgres adoption will eventually suffer.

>     I wanted to share with hackers
>     a timeline of how we got here, and where I think we are going in the
>     short term:
> 
>     2012-2013:  As part of writing my scaling talk
>     (http://momjian.us/main/presentations/overview.html#scaling), studying
>     Oracle RAC, and talking to users, it became clear that an XC-like
>     architecture (sharding) was the only architecture that was going to allow
>     for write scaling.
> 
> 
> What other architectures were discussed? Where was that discussion?

That was mostly my conclusion.  I explained it to small groups at
conferences and Postgres user groups.  No one said I was wrong, but that
is about the level of debate I had.

>     2014:  I started to shop around the idea that we could use FDWs,
>     parallelism, and a transaction/snapshot manager to get XC features
>     as built-in to Postgres.  (I don't remember where the original idea
>     came from.)  It was clear that having separate forks of the source code
>     in XC and XL was never going to achieve critical mass --- there just
>     aren't enough people who need high write scale right now, and the fork
>     maintenance overhead is a huge burden.
> 
> 
> I personally support the view that we should put scalability features into
> Postgres core, rather than run separate forks.

Good, I do think it is time, but as I stated above, there is limited
interest in our current community, so the tolerance for additional
community code to accomplish this is also limited.  This is the big
thing that had me excited about using FDWs --- FDW improvements can get
us closer to sharding without requiring community acceptance of
sharding-only features.

>     I realized that we would never get community acceptance to dump the XC
>     (or XL) code needed for sharding into community Postgres
> 
> 
> How or why did you realize that? There has never been any such discussion,
> AFAIK. Surely it can be possible to move required subsystems across?

Well, I have had many such discussions with XC/XL folks, and that was my
opinion.  I have seen almost no public discussion about this because the
idea had almost no chance of success.  If it were possible, someone would
have already suggested it on this list.

>     , but with FDWs,
>     we could add the features as _part_ of improving FDWs, which would benefit
>     FDWs _and_ would be useful for sharding.  (We already see some of those
>     FDW features in 9.5.)
> 
> 
> That is a huge presumption. Not discussed or technically analyzed in any way
> with the community.

True.  It seemed pretty obvious to me.

>     October, 2014:  EDB and NTT started working together in the community
>     to start improving FDWs as a basis for an FDW-based sharding solution.
>     Many of the 9.5 FDW improvements that also benefit sharding were developed
>     by a combined EDB/NTT team.  The features improved FDWs independent of
>     sharding, so they didn't need community buy-in on sharding to get them
>     accepted.
> 
>     June, 2015:  I attended the PGCon sharding unconference session and
>     there was a huge discussion about where we should go with sharding.
>     I think the big take-away was that most people liked the FDW approach,
>     but had business/customer reasons for wanting to work on XC or XL because
>     those would be production-ready faster.
> 
> 
> Cough, cough. You must surely be joking that "most people liked the FDW
> approach"? How did we measure the acceptance of this approach? 

Well, I didn't have my audience-meter with me at the time.  ;-)

The discussion was mostly in the hallway after the unconference session,
"Future of PostgreSQL shared-nothing cluster" by Konstantin Knizhnik,
Alexander Korotkov, and Oleg Bartunov.  Again, when I explained the
ability to use FDWs to get sharding into Postgres with minimal
additional code, no one said the idea was crazy, which I took as a big
thumbs-up!  When I asked why people would continue with XC/XL, I was told
those were more mature and more customer-ready, which is true.  I will not
quote people from the hallway discussion for privacy reasons.

> What actually is the FDW approach? Since it's not been written down anywhere, or
> even explained verbally, how can anyone actually agree to it?

Well, my sharding talk just has the outlines of an approach.  I think
there are five broad segments:

*  FDW push-down of joins, sorts, aggregates
*  ability to send FDW requests in parallel
*  transaction/snapshot manager to allow ACID transactions on shards
*  simpler user partitioning API
*  infrastructure to manage shards, including replicated tables used for joins
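
To make the first two items concrete, here is a toy sketch of aggregate
push-down combined with parallel shard requests.  It is not Postgres or
FDW code: the in-memory SHARDS list and the function names are
hypothetical stand-ins for remote servers and the work a coordinator
would ask them to do.  The point is only that each shard returns a small
partial result instead of its raw rows.

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical in-memory "shards"; in the design discussed here each one
# would be a remote Postgres server reached through an FDW.
SHARDS = [
    [("alice", 10), ("bob", 5)],
    [("carol", 7)],
    [("dave", 3), ("erin", 8), ("frank", 2)],
]


def partial_aggregate(rows):
    """Work pushed down to one shard: return (sum, count), not raw rows."""
    return sum(value for _, value in rows), len(rows)


def pushed_down_average(shards):
    """Coordinator: ask every shard in parallel, then combine the partials."""
    # max(1, ...) avoids an error from the executor when there are no shards.
    with ThreadPoolExecutor(max_workers=max(1, len(shards))) as pool:
        partials = list(pool.map(partial_aggregate, shards))
    total = sum(s for s, _ in partials)
    count = sum(c for _, c in partials)
    return total / count if count else None
```

Without push-down the coordinator would have to fetch all six rows and
average them itself; with it, only three (sum, count) pairs cross the
wire.  The transaction/snapshot manager and shard management items are
not modeled here.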

>     July, 2015:  Oleg Bartunov and his new company Postgres Professional (PP)
>     started to think about joining the FDW approach, rather than working on
>     XL, as they had stated at PGCon in June.  A joint NTT/EDB/PP phone-in
>     meeting is scheduled for September 1.
>
>     August, 2015:  While speaking at SFPUG, Citus Data approached me about
>     joining the FDW sharding team.  They have been invited to the September
>     1 meeting, as have the XC and XL people.
> 
> 
> 2ndQuadrant is working in this area, specifically bringing XL 9.5 forwards.

Yes, I saw the blog post about that:
http://blog.2ndquadrant.com/working-towards-postgres-xl-9-5/

> Please can invites be posted to myself, Pavan Deolasee and Petr Jelinek also?
> I'll pass on to others also.

OK, I will send you a separate email and you can then supply their email
addresses.

> Koichi Suzuki is arranging a meeting in Hong Kong for XC/XL discussions.
> Presumably EDB is invited also? If Koichi is a leading organizer of this, why
> are there two meetings?

I certainly have heard nothing about it, except third-hand people
telling me a meeting is happening.  I assumed those meetings were
XC/XL-specific.

>     that the XC approach is the only reasonable way to do it,
>     and that FDWs are the cleanest way to get it into community
>     Postgres.
> 
> Those two things aren't at all obvious to me.
>
> Please don't presume my opposition. If the technical information were made
> public, I might understand and agree with "the FDW approach", perhaps others
> also.

Well, the beauty of my approach is that we didn't need any technical
direction or buy-in on sharding from the community to improve FDWs.  I
think now is the right time to try to get that buy-in, or adjust our
approach.

There isn't really much more to my _analysis_ than I presented.  There
is certainly a lot more work to do to even decide this is the right
approach.  Some of the groups already involved have more experience in
trying this, e.g. Citus Data.

> 2ndQuadrant is certainly happy to become involved in any team aiming to
> add features to Postgres core, as long as that makes sense. There may be areas
> we can all agree upon even if the full architecture remains in doubt.

Right.

> Before the community commits to a long term venture together we should see the
> plan. Like all IT projects, expensive failure is possible and the lack of a
> design is a huge flashing red warning light for me at present. If that requires
> a meeting of all Developers, why are the meetings for this specifically not
> happening at the agreed Developer meetings?

Well, what meetings should it be at?  I don't think there was clear
enough direction for the June 2015 PGCon meeting.  Is there an
unconference in Vienna?  One thing I saw at the last PGCon is that this
is a big topic, so I think having a dedicated room and 3-hour slot for
it is nice.

-- 
  Bruce Momjian  <bruce@momjian.us>        http://momjian.us
  EnterpriseDB                             http://enterprisedb.com

  + Everyone has their own god. +


