Re: pass-through queries to foreign servers - Mailing list pgsql-hackers

From Ashutosh Bapat
Subject Re: pass-through queries to foreign servers
Date
Msg-id CAFjFpRei2A+bm0Wr5Kcg0L6EUwNjF+6dZVehNbABiShsOA3vUg@mail.gmail.com
In response to Re: pass-through queries to foreign servers  (Tom Lane <tgl@sss.pgh.pa.us>)
List pgsql-hackers



On Tue, Aug 6, 2013 at 12:51 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
David Gudeman <dave.gudeman@gmail.com> writes:
> For those who don't want to go to the link to see what I'm talking
> about with query rewrites, I thought I'd give a brief description.
> Foreign data wrappers currently do all of their work in the planning
> phase but I claim that isn't the right place to optimize foreign
> queries with aggregates and GROUP BY because optimizing those things
> would involve collapsing multiple plan nodes back into a single node
> for a foreign call.

I'm not sure what the best implementation for that is, but what you
propose here would still involve such collapsing, so this argument
seems rather empty.

> I propose to do these optimizations as query
> rewrites instead. So for example suppose t is a foreign table on the
> foreign server named fs. Then the query

>   SELECT count(*) FROM t

> is rewritten to

>   SELECT count FROM fs('select count(*) from t') fs(count bigint)

> where fs() is the pass-through query function for the server fs. To
> implement this optimization as a query rewrite, all of the elements of
> the result have to be real source-language constructs so the
> pass-through query has to be available in PostgreSQL SQL.

I don't believe in any part of that design, starting with the "pass
through query function".  For one thing, it seems narrowly targeted to the
assumption that the FDW is a frontend for a foreign server that speaks
SQL.  If the FDW's infrastructure doesn't include some kind of textual
query language, this isn't going to be useful for it at all.  For another,
a query rewrite system is unlikely to be able to cost out the alternatives
and decide whether pushing the aggregation across is actually a win or
not.

The direction I think we ought to be heading is to generate explicit Paths
representing the various ways in which aggregation can be implemented.
The logic in grouping_planner is already overly complex, and hard to
extend, because it's all hard-wired comparisons of alternatives.  We'd be
better off with something more like the add_path infrastructure.  Once
that's been done, maybe we can allow FDWs to add Paths representing remote
aggregation.


Postgres-XC has extended the PostgreSQL planner to find the largest subset of the join tree that can be evaluated on the server where the data resides (called a Datanode in XC jargon). If it finds that the whole join tree can be evaluated on the Datanode(s), it also attempts to evaluate the grouped aggregates there (sometimes only partially). The same applies to ORDER BY and LIMIT clauses. An alternative method, called fast-query-shipping, skips planning and passes the entire query to the Datanode(s) when the query can be evaluated completely there. These two techniques eliminate the need for pass-through syntax in XC.
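
To make the "largest subset of the join tree" idea concrete, here is a minimal Python sketch. It is not XC code: the Node class, the server field, and both function names are hypothetical, and it assumes a binary join tree in which a subtree is shippable only when every base relation under it lives on the same server.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Node:
    """Toy plan-tree node (hypothetical, not a PostgreSQL or XC structure)."""
    label: str
    server: Optional[str] = None      # set on base relations only; None means local
    left: Optional["Node"] = None
    right: Optional["Node"] = None


def shippable_server(node: Node) -> Optional[str]:
    """Return the single server that can evaluate the whole subtree, or None."""
    if node.left is None and node.right is None:
        return node.server
    left = shippable_server(node.left)
    right = shippable_server(node.right)
    # A join is shippable only if both inputs are shippable to the same server.
    return left if left is not None and left == right else None


def largest_shippable(node: Node) -> List[Node]:
    """Collect the maximal subtrees that could be sent to one server each."""
    if shippable_server(node) is not None:
        return [node]
    if node.left is None and node.right is None:
        return []                     # a local base relation: nothing to ship
    return largest_shippable(node.left) + largest_shippable(node.right)


if __name__ == "__main__":
    t1 = Node("t1", server="dn1")
    t2 = Node("t2", server="dn1")
    loc = Node("loc")                 # local table
    tree = Node("(t1 JOIN t2) JOIN loc",
                left=Node("t1 JOIN t2", left=t1, right=t2),
                right=loc)
    print([n.label for n in largest_shippable(tree)])
```

When the function returns the root itself, the whole join tree is shippable, which is the case where aggregates, ORDER BY and LIMIT become candidates for shipping too.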

However, the XC planner's extensions currently have two limitations: 1. they do not use real cost estimates (XC assumes that a query performs better when evaluated on the Datanodes, which is not true in some cases); 2. they work only for PostgreSQL (but could easily be extended to all SQL-based databases).
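
As a rough illustration of why the missing cost estimates matter, the toy Python sketch below builds two alternative paths for a grouped aggregate and keeps the cheaper one. The cost formula, the parameter names, and the Path class are all assumptions made up for this example; they are not taken from the PostgreSQL or XC planner.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Path:
    """Toy stand-in for a planner path (hypothetical)."""
    name: str
    cost: float


def aggregate_paths(rows: int, groups: int, transfer_per_row: float,
                    local_cpu_per_row: float,
                    remote_cpu_per_row: float) -> List[Path]:
    """Build the two alternatives under a deliberately simple cost model."""
    # Fetch every row, then aggregate on the local server.
    local = Path("aggregate locally",
                 rows * transfer_per_row + rows * local_cpu_per_row)
    # Aggregate on the remote server, then fetch one row per group.
    remote = Path("aggregate remotely",
                  rows * remote_cpu_per_row + groups * transfer_per_row)
    return [local, remote]


def cheapest(paths: List[Path]) -> Path:
    """Pick the lowest-cost alternative."""
    return min(paths, key=lambda p: p.cost)


if __name__ == "__main__":
    # Few groups: shipping the aggregate saves most of the transfer cost.
    print(cheapest(aggregate_paths(1_000_000, 10, 0.01, 0.001, 0.001)).name)
    # Nearly one group per row and a slow remote server: local wins.
    print(cheapest(aggregate_paths(1_000, 1_000, 0.01, 0.001, 0.05)).name)
```

With such a comparison in place, the "always push to the Datanode" assumption becomes just one of the alternatives being costed.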

It might be worth looking at the XC planner and picking up the pieces of work that fit in PostgreSQL.

                        regards, tom lane





--
Best Wishes,
Ashutosh Bapat
EnterpriseDB Corporation
The Postgres Database Company
