Re: pass-through queries to foreign servers - Mailing list pgsql-hackers

From David Gudeman
Subject Re: pass-through queries to foreign servers
Date
Msg-id CAE4YsyhnedHU6__95H+FzyNTTvKt0ucN3z8W60ztg4gAx7_+yA@mail.gmail.com
Whole thread Raw
In response to Re: pass-through queries to foreign servers  (Tom Lane <tgl@sss.pgh.pa.us>)
Responses Re: pass-through queries to foreign servers  (David Gudeman <dave.gudeman@gmail.com>)
List pgsql-hackers
On Tue, Jul 30, 2013 at 10:22 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> David Fetter <david@fetter.org> writes:
>> On Tue, Jul 30, 2013 at 04:40:38PM -0700, David Gudeman wrote:
>>> When you write an application involving foreign tables, you frequently
>>> end up with queries that are just too inefficient because they bring
>>> too much data over from the foreign server. For a trivial example,
>>> consider "SELECT count(*) FROM t" where t is a foreign table. This
>>> will pull the entire table over the network just to count up the rows.
>
>> Yes, and this case is a known limitation of our planner
>> infrastructure.   Aggregates are "special" when it comes to
>> generating paths for the planner to evaluate, so there's no current
>> way a FDW could supply such info to the planner, and hence no API in
>> our FDW code for having FDWs supply that info.  That's probably a
>> "should fix" but I don't know whether a project that size could be
>> done by 9.4.
>
> Yeah.  There's a lot left to be done in the FDW infrastructure.
> But not this:
>
>> All that said, my DBI-Link, back in the bad old days, provided two
>> important functions: remote_select(), which returned SETOF RECORD and
>> remote_execute(), which returned nothing.  It also provided ways to
>> control connections to the remote host, introspect remote schemas,
>> etc., etc.  We need capabilities like that in the FDW API, I believe
>> we could have them by 9.4.
>
> I would argue we *don't* want that.  If you want pass-through queries
> or explicit connection control, your needs are already met by dblink or
> dbi-link.  The whole point of FDW is that it's at a higher level of
> abstraction than that; which offers greater ease of use and will
> eventually offer better optimization than what you can get from dblink
> et al.  If we start trying to shoehorn things like passthrough queries
> into FDW, we'll be crippling the technology.  As an example, somebody
> on planet postgresql was just recently surprised to find that postgres_fdw
> honors transaction rollback.  Well, it can do that because users can't
> disconnect the connection underneath it, nor issue passthrough
> commit/rollback commands.  You don't get to have it both ways.
>
>                         regards, tom lane

Tom, you have a good point about transaction management, but I think
we _can_ have it both ways. There are several things that the author
of the foreign data wrapper can do to prevent bad things from being
done since he has ultimate control of everything that gets sent to the
foreign server. For many foreign servers it is enough to check that
the string being sent to the foreign server begins with "select ". Or
he can prevent pass-through queries when there is an on-going
transaction on the foreign server. Or the author of a particular
foreign data wrapper can prevent pass-through queries entirely.

The point is that this is only a concern for some kinds of foreign
servers and even then only for those foreign data wrappers that care
about transactions. If they don't implement update/insert/delete, for
example, then it doesn't matter. Since there are many other kinds of
foreign servers where this could be useful, it should be available, at
least as an option.

The reason I want it is to do use it for some of the things that David
Fetter was talking about --optimizing queries with aggregates and
GROUP BY. I have code that currently optimizes these sorts of queries
for a particular database engine. I did this several years ago before
there were foreign data wrappers so I had to roll my own using table
functions. My implementation query rewrite rather than plan
optimization (it seemed to me to be too hard to do in the panning
phase). See http://unobtainabol.blogspot.com/2013/04/daves-foreign-data-translating-foreign_24.html
for a description of what it does.

My plan was to generalize my current code to generic SQL databases and
to make it work with foreign data wrappers. If there is any interest
from the PG community I'll try to get my company to let me contribute
this back. But the first thing I need is to implement pass-through
queries for foreign servers or I have to duplicate all of the
functionality for managing foreign servers and tables.

Regards,
David Gudeman
http://unobtainabol.blogspot.com



pgsql-hackers by date:

Previous
From: Stephen Frost
Date:
Subject: Re: Re: ALTER SYSTEM SET command to change postgresql.conf parameters (RE: Proposal for Allow postgresql.conf values to be changed via SQL [review])
Next
From: Michael Paquier
Date:
Subject: Re: Regarding BGworkers