Re: pass-through queries to foreign servers - Mailing list pgsql-hackers

From David Gudeman
Subject Re: pass-through queries to foreign servers
Date
Msg-id CAE4Ysygubt3t-dLYU=qX5tcY-YRXZFJmMc1YYprinSHuR7Fp2Q@mail.gmail.com
Whole thread Raw
In response to Re: pass-through queries to foreign servers  (Tom Lane <tgl@sss.pgh.pa.us>)
List pgsql-hackers
On Mon, Aug 5, 2013 at 12:21 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> David Gudeman <dave.gudeman@gmail.com> writes:
>> For those who don't want to go to the link to see what I'm talking
>> about with query rewrites, I thought I'd give a brief description.
>> Foreign data wrappers currently do all of their work in the planning
>> phase but I claim that isn't the right place to optimize foreign
>> queries with aggregates and GROUP BY because optimizing those things
>> would involve collapsing multiple plan node back into a single node
>> for a foreign call.
>
> I'm not sure what the best implementation for that is, but what you
> propose here would still involve such collapsing, so this argument
> seems rather empty.
>
>> I propose to do these optimizations as query
>> rewrites instead. So for example suppose t is a foreign table on the
>> foreign server named fs. Then the query
>
>>   SELECT count(*) FROM t
>
>> is rewritten to
>
>>   SELECT count FROM fs('select count(*) from t') fs(count bigint)
>
>> where ts() is the pass-through query function for the server fs. To
>> implement this optimization as a query rewrite, all of the elements of
>> the result have to be real source-language constructs so the
>> pass-through query has to be available in Postgresql SQL.
>
> I don't believe in any part of that design, starting with the "pass
> through query function".  For one thing, it seems narrowly targeted to the
> assumption that the FDW is a frontend for a foreign server that speaks
> SQL.  If the FDW's infrastructure doesn't include some kind of textual
> query language, this isn't going to be useful for it at all.  For another,
> a query rewrite system is unlikely to be able to cost out the alternatives
> and decide whether pushing the aggregation across is actually a win or
> not.
>
> The direction I think we ought to be heading is to generate explicit Paths
> representing the various ways in which aggregation can be implemented.
> The logic in grouping_planner is already overly complex, and hard to
> extend, because it's all hard-wired comparisons of alternatives.  We'd be
> better off with something more like the add_path infrastructure.  Once
> that's been done, maybe we can allow FDWs to add Paths representing remote
> aggregation.
>
>                         regards, tom lane

Well, I will have to defer to your greater knowledge of the Posgres
planner. I'm surprised that it can do that since in my experience, the
planner did a cost-based search to find the best join order but the
rest of the plan structure was generated independently of the tables.
If the Postgres planner can look at the FROM clause and generate a
completely different structure based on the types of tables it finds
there, then that doesn't sound unreasonable.

However it still seems a bit like overkill. The purpose of a
cost-based heuristic search is to deal with the fact that there are
lots and lots of possible join orders (as the number of tables grows)
and it makes a huge difference which order you chose. By contrast,
this decision is fairly easy. Either you send the computation to the
data or you bring the data to the computation. Since in the case of
SQL aggregation the computation generally reduces the amount of data
that has to be  moved, the first choice is almost always the right
one.

Also, this means that you won't be able to use the code that I've
written since it's all based on query rewriting. I'm disappointed
because I wanted to contribute something back, but that's for you guys
to decide.

Regards,
David Gudeman
http://unobtainabol.blogspot.com



pgsql-hackers by date:

Previous
From: Andrew Tipton
Date:
Subject: Re: BackgroundWorkerInitializeConnection(NULL, ...) doesn't work
Next
From: Noah Misch
Date:
Subject: Re: killing pg_dump leaves backend process