Re: Horizontal scalability/sharding - Mailing list pgsql-hackers

From Tomas Vondra
Subject Re: Horizontal scalability/sharding
Date
Msg-id 55E757DB.1030506@2ndquadrant.com
In response to Re: Horizontal scalability/sharding  (Robert Haas <robertmhaas@gmail.com>)
List pgsql-hackers

On 09/02/2015 08:27 PM, Robert Haas wrote:
> On Wed, Sep 2, 2015 at 1:59 PM, Merlin Moncure <mmoncure@gmail.com>
> wrote:
>>
>> This strikes me as a bit of a conflict of interest with FDW which
>> seems to want to hide the fact that it's foreign; the FDW
>> implementation makes its own optimization decisions which might
>> make sense for single table queries but breaks down in the face of
>> joins.

+1 to these concerns

> Well, I don't think that ALL of the logic should go into the FDW.

Then maybe we shouldn't call this "FDW-based sharding" (or "FDW 
approach" or whatever was used in this thread so far) because that kinda 
implies that the proposal is to build on FDW.

In my mind, FDW is a wonderful tool to integrate PostgreSQL with 
external data sources, and it's nicely shaped for this purpose, which 
implies the abstractions and assumptions in the code.

The truth however is that many current uses of the FDW API are actually 
using it for different purposes because there's no other way to do that, 
not because FDWs are the "right way". And this includes the attempts to 
build sharding on FDW, I think.

Situations like this result in "improvements" of the API that seem to 
improve the API for the second group, but make life harder for the 
original FDW API audience by making the API needlessly complex. And I 
say "seem to improve" because the second group eventually runs into the 
fundamental abstractions and assumptions the API is based on anyway.

And based on the discussions at PGCon, I think this is the main reason 
why people cringe when they hear "FDW" and "sharding" in the same sentence.

I'm not opposed to reusing the FDW infrastructure, of course.

> In particular, in this example, parallel aggregate needs the same
> query rewrite, so the logic for that should live in core so that
> both parallel and distributed queries get the benefit.

I'm not sure the parallel query is a great example here - maybe I'm 
wrong but I think it's a fairly isolated piece of code, and we have 
a pretty clear idea of the two use cases.

I'm sure it's non-trivial to design it well for both cases, but I think 
the questions for FDW/sharding will be much more about abstract concepts 
than particular technical solutions.

regards

--
Tomas Vondra                  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
