Re: Horizontal scalability/sharding - Mailing list pgsql-hackers

From Robert Haas
Subject Re: Horizontal scalability/sharding
Date
Msg-id CA+TgmoaiOkwyck3r1hP=TV3s9KXgm=xeR_KWpy_kXScChKAtKg@mail.gmail.com
Whole thread Raw
In response to Re: Horizontal scalability/sharding  (Merlin Moncure <mmoncure@gmail.com>)
Responses Re: Horizontal scalability/sharding  (Tomas Vondra <tomas.vondra@2ndquadrant.com>)
List pgsql-hackers
On Wed, Sep 2, 2015 at 1:59 PM, Merlin Moncure <mmoncure@gmail.com> wrote:
> On Tue, Sep 1, 2015 at 11:18 AM, Robert Haas <robertmhaas@gmail.com> wrote:
>> It would be a bad idea to cling blindly to the FDW infrastructure if
>> it's fundamentally inadequate to do what we want.  On the other hand,
>> it would also be a bad idea to set about recreating it without a
>> really good reason, and - just to take one example - the fact that it
>> doesn't currently push down DML operations to the remote side is not a
>> really good reason to rewrite the whole thing.  On the contrary, it's
>> a reason to put some energy into the already-written patch which
>> implements that optimization.
>
> The problem with FDW for these purposes as I see it is that too much
> intelligence is relegated to the implementer of the API.  There needs
> to be a mechanism so that the planner can rewrite the remote query and
> then do some after the fact processing.  This exactly what citus does;
> if you send out AVG(foo) it rewrites that to SUM(foo) and COUNT(foo)
> so that aggregation can be properly weighted to the result.   To do
> this (distributed OLAP-type processing) right, the planner needs to
> *know* that this table is in fact distributed and also know that it
> can make SQL compatible adjustments to the query.
>
> This strikes me as a bit of a conflict of interest with FDW which
> seems to want to hide the fact that it's foreign; the FDW
> implementation makes it's own optimization decisions which might make
> sense for single table queries but breaks down in the face of joins.

Well, I don't think that ALL of the logic should go into the FDW.  In
particular, in this example, parallel aggregate needs the same query
rewrite, so the logic for that should live in core so that both
parallel and distributed queries get the benefit.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



pgsql-hackers by date:

Previous
From: Fabien COELHO
Date:
Subject: Re: pgbench stats per script & other stuff
Next
From: Robert Haas
Date:
Subject: Re: pgbench stats per script & other stuff