Re: I'd like to discuss scaleout at PGCon - Mailing list pgsql-hackers

From Merlin Moncure
Subject Re: I'd like to discuss scaleout at PGCon
Date
Msg-id CAHyXU0xzrVfSVJTrU6g6dBhV25oM7k1m+tT75TJOK_hb4Mu2sg@mail.gmail.com
In response to Re: I'd like to discuss scaleout at PGCon  (Bruce Momjian <bruce@momjian.us>)
List pgsql-hackers
On Fri, Jun 22, 2018 at 12:34 PM Bruce Momjian <bruce@momjian.us> wrote:
>
> On Fri, Jun  1, 2018 at 11:29:43AM -0500, Merlin Moncure wrote:
> > FWIW, Distributed analytical queries is the right market to be in.
> > This is the field in which I work, and this is where the action is at.
> > I am very, very, sure about this.  My view is that many of the
> > existing solutions to this problem (in particular hadoop class
> > solutions) have major architectural downsides that make them
> > inappropriate in use cases that postgres really shines at; direct
> > hookups to low latency applications for example.  postgres is
> > fundamentally a more capable 'node' with its multiple man-millennia of
> > engineering behind it.  Unlimited vertical scaling (RAC etc) is
> > interesting too, but this is not the way the market is moving as
> > hardware advancements have reduced or eliminated the need for that in
> > many spheres.
> >
> > The direction of the project is sound and we are on the cusp of the
> > point where multiple independent coalescing features (FDW, logical
> > replication, parallel query, executor enhancements) will open new
> > scaling avenues that will not require trading off the many other
> > benefits of SQL that competing contemporary solutions might.  The
> > broader development market is starting to realize this and that is a
> > major driver of the recent upswing in popularity.  This is benefiting
> > me tremendously personally due to having gone 'all-in' with postgres
> > almost 20 years ago :-D. (Time sure flies)    These are truly
> > wonderful times for the community.
>
> While I am glad people know a lot about how other projects handle
> sharding, these can be only guides to how Postgres will handle such
> workloads.  I think we need to get to a point where we have all of the
> minimal sharding-specific code features done, at least as
> proof-of-concept, and then test Postgres with various workloads like
> OLTP/OLAP and read-write/read-only.  This will tell us where
> sharding-specific code will have the greatest impact.
>
> What we don't want to do is to add a bunch of sharding-specific code
> without knowing which workloads it benefits, and how many of our users
> will actually use sharding.  Some projects have done that, and it
> didn't end well since they then had a lot of product complexity with
> little user value.

Key features from my perspective:
*) FDW in parallel.  How do I do it today?  Crudely hand-rolled parallel
queries with asynchronous dblink

*) column store

*) automatic partition management through shards

probably some more, gotta run :-)

merlin
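
For readers unfamiliar with the first item: running a query "in parallel" over several servers usually means scatter/gather. The same partial query is sent to every shard at once, and the partial results are combined locally. With asynchronous dblink this is typically done by calling dblink_send_query on each connection and then collecting with dblink_get_result. The sketch below is not that setup. It is a rough, self-contained illustration of the pattern only, assuming in-memory SQLite databases as stand-ins for remote PostgreSQL servers and a thread pool in place of asynchronous connections. The names SHARDS, partial_aggregate, scatter_gather and the measurements table are hypothetical.

```python
import sqlite3
from concurrent.futures import ThreadPoolExecutor

# Hypothetical shard contents. In a real deployment each entry would be a
# remote PostgreSQL server reached through dblink or postgres_fdw; here the
# rows are loaded into throwaway in-memory SQLite databases instead.
SHARDS = {
    "shard_a": [10, 20, 30],
    "shard_b": [5, 15],
    "shard_c": [40],
}


def partial_aggregate(rows):
    """Run the per-shard part of the query and return (sum, count)."""
    # Each worker opens its own connection, since sqlite3 connections
    # should not be shared across threads by default.
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute("CREATE TABLE measurements (value INTEGER)")
        conn.executemany(
            "INSERT INTO measurements VALUES (?)", [(v,) for v in rows]
        )
        total, count = conn.execute(
            "SELECT COALESCE(SUM(value), 0), COUNT(*) FROM measurements"
        ).fetchone()
        return total, count
    finally:
        conn.close()


def scatter_gather(shards):
    """Send the partial query to every shard at once, then combine."""
    # Scatter: one worker per shard, so the shards are queried concurrently.
    with ThreadPoolExecutor(max_workers=max(1, len(shards))) as pool:
        partials = list(pool.map(partial_aggregate, shards.values()))
    # Gather: an average cannot be averaged again, so each shard returns
    # (sum, count) and the final average is computed from the totals.
    total = sum(t for t, _ in partials)
    count = sum(c for _, c in partials)
    average = total / count if count else None
    return total, count, average


if __name__ == "__main__":
    print(scatter_gather(SHARDS))
```

Returning (sum, count) per shard rather than a per-shard average is the design point worth noting: the combining step has to work on partial aggregates that merge correctly.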

