Thread: pgsql_fdw, FDW for PostgreSQL server

pgsql_fdw, FDW for PostgreSQL server

From
Shigeru Hanada
Date:
I'd like to propose pgsql_fdw, FDW for external PostgreSQL server, as a
contrib module.  I think that this module would be the basis of further
SQL/MED development for core, e.g. join-push-down and ANALYZE support.

I attached three patches for this new FDW.  They should be applied in
the order below.  I separated these patches so that the first (or first
two) can be committed separately.

* fdw_helper_doc.patch documents the FDW helper functions that already
exist in 9.1, so it can be back-patched.
* fdw_helper_funcs.patch provides additional helper functions which
make manipulation of FDW options easier.
* pgsql_fdw.patch provides the new FDW for external PG servers.

Here are details of pgsql_fdw.

Name of the wrapper
===================
I used the name "pgsql_fdw" for the wrapper and its derivatives.  I
think it would be better to leave contrib/dblink and the built-in
postgresql_fdw_validator in place for backward compatibility, and use a
new name for the new wrapper.

Alternatively, it might be OK to rename postgresql_fdw_validator to
dblink_validator or something, or to change dblink to use the new
wrapper's validator.  I'm not sure whether dblink should stand alone or
be integrated with pgsql_fdw...

Connection management
=====================
pgsql_fdw establishes a new connection when a foreign server is first
accessed in the local session.  The established connection is shared
among all foreign scans in the local query, and even among scans in
subsequent queries.  Connections are discarded when the current
transaction aborts, so that an unexpected failure won't cause a
connection leak.  This is implemented with the resource owner mechanism.

Users can see active connections via the pgsql_fdw_connections view,
and discard an arbitrary connection via the pgsql_fdw_disconnect()
function.  Both can be done only from the same local session.
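
A sketch of this from psql (the view's columns and the exact argument
form of pgsql_fdw_disconnect() are assumptions; the proposal names only
the view and function):

```sql
-- Show connections pooled by this session (column list is illustrative).
SELECT * FROM pgsql_fdw_connections;

-- Discard the pooled connection for a particular foreign server
-- (the argument form is an assumption based on the proposal).
SELECT pgsql_fdw_disconnect('remote_server');
```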

If the local role has been changed via SET ROLE or SET SESSION
AUTHORIZATION, pgsql_fdw ignores the old role's connections and looks up
an appropriate connection for the new role in the pool.  If there is no
suitable one, pgsql_fdw establishes a new connection.  When the local
role is changed back to the old role, the pooled connection is used
again.

Unlike contrib/dblink, one foreign server can have only one connection
at a time per local role, because pgsql_fdw doesn't support named
connections.

Cost estimation
===============
pgsql_fdw executes an EXPLAIN command on the remote side for each
PlanForeignScan call.  The returned costs and rows are used as the local
estimates for the Path, with connection costs and data transfer costs
added.
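
A sketch of the idea, assuming the remote query text has already been
built (the plan line and cost numbers are purely illustrative):

```sql
-- pgsql_fdw runs something like this on the remote side:
EXPLAIN SELECT id, NULL, NULL FROM t WHERE (id < 100);
--  Seq Scan on t  (cost=0.00..35.50 rows=99 width=4)
-- The remote costs and rows feed the local Path estimate, with
-- connection setup and per-row transfer costs added on top.
```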

SELECT optimization
===================
To reduce the amount of data transferred from the remote server,
references to unnecessary columns are replaced with NULL literals in the
remote query.
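
As an illustration (the exact query text pgsql_fdw generates may
differ), for a remote table t(a, b, c) where the local query reads only
column a:

```sql
-- Local query against the foreign table:
SELECT a FROM ft;

-- Remote query sent by pgsql_fdw: unreferenced columns b and c are
-- replaced with NULL literals, so their data is never transferred.
SELECT a, NULL, NULL FROM t;
```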

WHERE clause push-down
======================
Some kinds of qualifiers in the WHERE clause are pushed down to the
remote server so that the query result can be reduced.  Currently,
qualifiers which include any volatile or stable element can't be pushed
down.  Even with this limitation, most qualifiers would be pushed down
in usual cases.
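
For instance (illustrative): an immutable comparison against a constant
can be sent to the remote server, while a qual containing the stable
function now() must be evaluated locally:

```sql
-- Pushed down: only immutable operators and constants are involved.
SELECT * FROM ft WHERE id = 10;

-- Not pushed down: now() is stable, so rows passing any remaining
-- remote quals are fetched and this qual is checked locally.
SELECT * FROM ft WHERE ts > now();
```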

Cursor mode
===========
pgsql_fdw switches the way result records are retrieved according to
the estimated number of result rows: it uses a simple SELECT for small
results, and a cursor with DECLARE/FETCH statements for large results.
The threshold defaults to 1000, and is configurable with the FDW option
"min_cursor_rows".  In cursor mode, the number of rows fetched at once
can be controlled by the FDW option "fetch_count".
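
In cursor mode, the statements sent to the remote server would look
roughly like this (the cursor name is illustrative; the fetch size comes
from the "fetch_count" option):

```sql
DECLARE pgsql_fdw_c1 CURSOR FOR SELECT a, b FROM t;
FETCH 1000 FROM pgsql_fdw_c1;   -- repeated until it returns no rows
CLOSE pgsql_fdw_c1;
```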

EXPLAIN output
==============
pgsql_fdw shows the remote query used for each foreign scan node in
the output of the EXPLAIN command, under the title "Remote SQL".  If
pgsql_fdw decided to use a cursor for the scan, the DECLARE statement is
shown.
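
So an EXPLAIN of a foreign scan might look roughly like this (plan
shape, cursor name, and costs are illustrative):

```sql
EXPLAIN SELECT a FROM ft WHERE id = 10;
--  Foreign Scan on ft  (cost=100.00..135.50 rows=10 width=4)
--    Remote SQL: DECLARE pgsql_fdw_c1 CURSOR FOR
--                SELECT a, NULL, NULL FROM t WHERE (id = 10)
```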

Regards,
--
Shigeru Hanada

Attachment

Re: pgsql_fdw, FDW for PostgreSQL server

From
Magnus Hagander
Date:
2011/10/25 Shigeru Hanada <shigeru.hanada@gmail.com>:
> I'd like to propose pgsql_fdw, FDW for external PostgreSQL server, as a
> contrib module.  I think that this module would be the basis of further
> SQL/MED development for core, e.g. join-push-down and ANALYZE support.

I have not looked at the code itself, but I wonder if we shouldn't
consider making this a part of core-proper, not just a contrib module.
The fact that it isn't *already* available in core surprises a lot of
people...

--
 Magnus Hagander
 Me: http://www.hagander.net/
 Work: http://www.redpill-linpro.com/


Re: pgsql_fdw, FDW for PostgreSQL server

From
Tom Lane
Date:
Magnus Hagander <magnus@hagander.net> writes:
> 2011/10/25 Shigeru Hanada <shigeru.hanada@gmail.com>:
>> I'd like to propose pgsql_fdw, FDW for external PostgreSQL server, as a
>> contrib module.  I think that this module would be the basis of further
>> SQL/MED development for core, e.g. join-push-down and ANALYZE support.

> I have not looked at the code itself, but I wonder if we shouldn't
> consider making this a part of core-proper, not just a contrib module.
> The fact that it isn't *already* available in core surprises a lot of
> people...

We've just spent a whole lot of blood and sweat on making the extension
mechanism work nicely.  I don't understand this urge to not use it.

ATM I'm not sure it's even a good idea to push pgsql_fdw into contrib.
Once we do that its release schedule will get locked to core's ---
wouldn't it be better to keep flexibility for now, while it's in such
active development?
        regards, tom lane


Re: pgsql_fdw, FDW for PostgreSQL server

From
Magnus Hagander
Date:
On Tue, Oct 25, 2011 at 14:08, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Magnus Hagander <magnus@hagander.net> writes:
>> 2011/10/25 Shigeru Hanada <shigeru.hanada@gmail.com>:
>>> I'd like to propose pgsql_fdw, FDW for external PostgreSQL server, as a
>>> contrib module.  I think that this module would be the basis of further
>>> SQL/MED development for core, e.g. join-push-down and ANALYZE support.
>
>> I have not looked at the code itself, but I wonder if we shouldn't
>> consider making this a part of core-proper, not just a contrib module.
>> The fact that it isn't *already* available in core surprises a lot of
>> people...
>
> We've just spent a whole lot of blood and sweat on making the extension
> mechanism work nicely.  I don't understand this urge to not use it.

We're back to the old discussion, I guess... I'm happy to see it as an
extension, but I think it should be included with the standard
installation, like we do with, for example, pl/pgsql (which I realize
has a dependency on the backend anyway, so it can't easily be done
another way) and pl/perl (which doesn't, AFAIK, so it's a better
example).


> ATM I'm not sure it's even a good idea to push pgsql_fdw into contrib.
> Once we do that its release schedule will get locked to core's ---
> wouldn't it be better to keep flexibility for now, while it's in such
> active development?

I would be happy to keep it outside, and integrate it in the final CF
for example :)

--
 Magnus Hagander
 Me: http://www.hagander.net/
 Work: http://www.redpill-linpro.com/


Re: pgsql_fdw, FDW for PostgreSQL server

From
Kohei KaiGai
Date:
>> ATM I'm not sure it's even a good idea to push pgsql_fdw into contrib.
>> Once we do that its release schedule will get locked to core's ---
>> wouldn't it be better to keep flexibility for now, while it's in such
>> active development?
>
> I would be happy to keep it outside, and integrate it in the final CF
> for example :)
>
Right now, file_fdw is the only FDW module that we have in core;
however, it is inadequate to prove new FDW concepts that utilize an
external RDBMS, such as join push-down of foreign tables.

I think the pgsql_fdw module should also be included in the core
distribution as a basis for future enhancement, unless we decide we
don't need any working module when an enhancement of FDW is proposed.

Thanks,
-- 
KaiGai Kohei <kaigai@kaigai.gr.jp>


Re: pgsql_fdw, FDW for PostgreSQL server

From
Stephen Frost
Date:
* Kohei KaiGai (kaigai@kaigai.gr.jp) wrote:
> Right now, file_fdw is the only FDW module that we have in the core,

Erm, guess I'm a bit confused why we've got that in core while not
putting pgsql_fdw in core.  This all gets back to previous discussions
around 'recommended' contrib modules (which should really be installed
by default on the filesystem through the distros, ala Debian's
"recommends:" approach) and 'other' contrib modules.

I'm in favor of making that distinction.  I would still have pgsql_fdw,
file_fdw, etc. be packaged more-or-less the same way and still use the
CREATE EXTENSION framework, of course.

It would be nice if we didn't have to lock the release schedule of those
recommended modules to the core release schedule, or even to each other,
but that's a separate issue, imv.
Thanks,
    Stephen

Re: pgsql_fdw, FDW for PostgreSQL server

From
Marko Kreen
Date:
On Tue, Oct 25, 2011 at 3:08 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Magnus Hagander <magnus@hagander.net> writes:
>> 2011/10/25 Shigeru Hanada <shigeru.hanada@gmail.com>:
>>> I'd like to propose pgsql_fdw, FDW for external PostgreSQL server, as a
>>> contrib module.  I think that this module would be the basis of further
>>> SQL/MED development for core, e.g. join-push-down and ANALYZE support.
>
>> I have not looked at the code itself, but I wonder if we shouldn't
>> consider making this a part of core-proper, not just a contrib module.
>> The fact that it isn't *already* available in core surprises a lot of
>> people...
>
> We've just spent a whole lot of blood and sweat on making the extension
> mechanism work nicely.  I don't understand this urge to not use it.
>
> ATM I'm not sure it's even a good idea to push pgsql_fdw into contrib.
> Once we do that its release schedule will get locked to core's ---
> wouldn't it be better to keep flexibility for now, while it's in such
> active development?

Simple question - do FDW internals need work?

--
marko


Re: pgsql_fdw, FDW for PostgreSQL server

From
Shigeru Hanada
Date:
(2011/10/25 19:15), Magnus Hagander wrote:
> 2011/10/25 Shigeru Hanada<shigeru.hanada@gmail.com>:
>> I'd like to propose pgsql_fdw, FDW for external PostgreSQL server, as a
>> contrib module.  I think that this module would be the basis of further
>> SQL/MED development for core, e.g. join-push-down and ANALYZE support.
> 
> I have not looked at the code itself, but I wonder if we shouldn't
> consider making this a part of core-proper, not just a contrib module.
> The fact that it isn't *already* available in core surprises a lot of
> people...

Do you mean that pgsql_fdw should be a built-in extension like plpgsql
so that it's available just after initdb?  It would be accomplished with
some more changes:

* Move pgsql_fdw into core, say src/backend/foreign/libpgsql_fdw, and
install dynamically loadable module during "make install" for core.  The
pgsql_fdw_handler function can't be included into core binary because we
must avoid linking libpq with the server binary directly.  This method is
also used for libwalreceiver of replication module.
* Create pgsql_fdw extension during initdb invocation, like plpgsql.

These are not trivial, but not that difficult.  However, I think
contrib would be the appropriate place for pgsql_fdw because it's
a (relatively) special feature.
-- 
Shigeru Hanada


Re: pgsql_fdw, FDW for PostgreSQL server

From
Tom Lane
Date:
Shigeru Hanada <shigeru.hanada@gmail.com> writes:
> (2011/10/25 19:15), Magnus Hagander wrote:
>> I have not looked at the code itself, but I wonder if we shouldn't
>> consider making this a part of core-proper, not just a contrib module.
>> The fact that it isn't *already* available in core surprises a lot of
>> people...

> Do you mean that pgsql_fdw should be a built-in extension like plpgsql
> so that it's available just after initdb?

If that was what he meant, I'd vote against it.  There are way too many
people who will *not* want their databases configured to be able to
reach out onto the net.  This feature should be something that has to be
installed by explicit user action.
        regards, tom lane


Re: pgsql_fdw, FDW for PostgreSQL server

From
Robert Haas
Date:
2011/10/26 Shigeru Hanada <shigeru.hanada@gmail.com>:
> (2011/10/25 19:15), Magnus Hagander wrote:
>> 2011/10/25 Shigeru Hanada<shigeru.hanada@gmail.com>:
>>> I'd like to propose pgsql_fdw, FDW for external PostgreSQL server, as a
>>> contrib module.  I think that this module would be the basis of further
>>> SQL/MED development for core, e.g. join-push-down and ANALYZE support.
>>
>> I have not looked at the code itself, but I wonder if we shouldn't
>> consider making this a part of core-proper, not just a contrib module.
>> The fact that it isn't *already* available in core surprises a lot of
>> people...
>
> Do you mean that pgsql_fdw should be a built-in extension like plpgsql
> so that it's available just after initdb?  It would be accomplished with
> some more changes:
>
> * Move pgsql_fdw into core, say src/backend/foreign/libpgsql_fdw, and
> install dynamically loadable module during "make install" for core.  The
> pgsql_fdw_handler function can't be included into core binary because we
> must avoid linking libpq with the server binary directly.  This method is
> also used for libwalreceiver of replication module.
> * Create pgsql_fdw extension during initdb invocation, like plpgsql.
>
> These are not trivial, but not difficult so much.  However, I think
> contrib would be the appropriate place for pgsql_fdw because it's
> (relatively) special feature.

I agree.  pgsql_fdw will be a nice feature, but there's no reason to
think that everyone will want it installed by default, and there are
some security reasons to think that they might not.  On the flip side,
pushing it out of contrib and onto pgfoundry or whatever makes it
unnecessarily difficult to install, and not as many people will
benefit from it.  So contrib seems exactly right to me.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


Re: pgsql_fdw, FDW for PostgreSQL server

From
Dimitri Fontaine
Date:
Stephen Frost <sfrost@snowman.net> writes:
> I'm in favor of making that distinction.  I would still have pgsql_fdw,
> file_fdw, etc, be packaged more-or-less the same way and still use the
> CREATE EXTENSION framework, of course.

We called that idea “core extension” at the latest hackers meeting, and
Greg Smith had a patch with a first selection of extensions to package
this way.

Regards,
--
Dimitri Fontaine
http://2ndQuadrant.fr     PostgreSQL : Expertise, Formation et Support


Re: pgsql_fdw, FDW for PostgreSQL server

From
Kohei KaiGai
Date:
2011/10/26 Robert Haas <robertmhaas@gmail.com>:
> 2011/10/26 Shigeru Hanada <shigeru.hanada@gmail.com>:
>> (2011/10/25 19:15), Magnus Hagander wrote:
>>> 2011/10/25 Shigeru Hanada<shigeru.hanada@gmail.com>:
>>>> I'd like to propose pgsql_fdw, FDW for external PostgreSQL server, as a
>>>> contrib module.  I think that this module would be the basis of further
>>>> SQL/MED development for core, e.g. join-push-down and ANALYZE support.
>>>
>>> I have not looked at the code itself, but I wonder if we shouldn't
>>> consider making this a part of core-proper, not just a contrib module.
>>> The fact that it isn't *already* available in core surprises a lot of
>>> people...
>>
>> Do you mean that pgsql_fdw should be a built-in extension like plpgsql
>> so that it's available just after initdb?  It would be accomplished with
>> some more changes:
>>
>> * Move pgsql_fdw into core, say src/backend/foreign/libpgsql_fdw, and
>> install dynamically loadable module during "make install" for core.  The
>> pgsql_fdw_handler function can't be included into core binary because we
>> must avoid linking libpq with the server binary directly.  This method is
>> also used for libwalreceiver of replication module.
>> * Create pgsql_fdw extension during initdb invocation, like plpgsql.
>>
>> These are not trivial, but not difficult so much.  However, I think
>> contrib would be the appropriate place for pgsql_fdw because it's
>> (relatively) special feature.
>
> I agree.  pgsql_fdw will be a nice feature, but there's no reason to
> think that everyone will want it installed by default, and there are
> some security reasons to think that they might not.  On the flip side,
> pushing it out of contrib and onto pgfoundry or whatever makes it
> unnecessarily difficult to install, and not as many people will
> benefit from it.  So contrib seems exactly right to me.
>
I also agree.  pgsql_fdw will be worth having in the main tree as a
contrib module.  It will give us a clear opportunity to test new FDW
features that use RDBMS characteristics, such as join push-down.
However, whether it should be installed by default is a separate
discussion.

Thanks,
--
KaiGai Kohei <kaigai@kaigai.gr.jp>


Re: pgsql_fdw, FDW for PostgreSQL server

From
Magnus Hagander
Date:
On Wed, Oct 26, 2011 at 16:37, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Shigeru Hanada <shigeru.hanada@gmail.com> writes:
>> (2011/10/25 19:15), Magnus Hagander wrote:
>>> I have not looked at the code itself, but I wonder if we shouldn't
>>> consider making this a part of core-proper, not just a contrib module.
>>> The fact that it isn't *already* available in core surprises a lot of
>>> people...
>
>> Do you mean that pgsql_fdw should be a built-in extension like plpgsql
>> so that it's available just after initdb?
>
> If that was what he meant, I'd vote against it.  There are way too many
> people who will *not* want their databases configured to be able to
> reach out onto the net.  This feature should be something that has to be
> installed by explicit user action.

That is not what I meant.

I meant installing the shared library by default, but still requiring
CREATE EXTENSION.

--
 Magnus Hagander
 Me: http://www.hagander.net/
 Work: http://www.redpill-linpro.com/


Re: pgsql_fdw, FDW for PostgreSQL server

From
Andrew Dunstan
Date:

On 10/26/2011 12:47 PM, Magnus Hagander wrote:
>>
>> If that was what he meant, I'd vote against it.  There are way too many
>> people who will *not* want their databases configured to be able to
>> reach out onto the net.  This feature should be something that has to be
>> installed by explicit user action.
> That is not what I meant.
>
> I meant installing the shared library by default, but still requiring
> CREATE EXTENSION.
>

I don't see why it should be different from other standard modules, such
as citext or hstore, both of which have pretty wide use and fewer
potential security implications than this.

cheers

andrew


Re: pgsql_fdw, FDW for PostgreSQL server

From
Magnus Hagander
Date:
On Wed, Oct 26, 2011 at 19:25, Andrew Dunstan <andrew@dunslane.net> wrote:
>
> On 10/26/2011 12:47 PM, Magnus Hagander wrote:
>>>
>>> If that was what he meant, I'd vote against it.  There are way too many
>>> people who will *not* want their databases configured to be able to
>>> reach out onto the net.  This feature should be something that has to be
>>> installed by explicit user action.
>>
>> That is not what I meant.
>>
>> I meant installing the shared library by default, but still requiring
>> CREATE EXTENSION.
>>
>
> I don't see why it should be different from other standard modules, such as
> citext or hstore, both of which have pretty wide use, and less possible
> security implications than this.

As I stated earlier, it's really back to the old discussion of
splitting up contrib. This would be the "additional module" part, but
not the "example of how to do things" part of that...

--
 Magnus Hagander
 Me: http://www.hagander.net/
 Work: http://www.redpill-linpro.com/


Re: pgsql_fdw, FDW for PostgreSQL server

From
Tom Lane
Date:
Magnus Hagander <magnus@hagander.net> writes:
> On Wed, Oct 26, 2011 at 16:37, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>> If that was what he meant, I'd vote against it.  There are way too many
>> people who will *not* want their databases configured to be able to
>> reach out onto the net.  This feature should be something that has to be
>> installed by explicit user action.

> That is not what I meant.

> I meant installing the shared library by default, but still requiring
> CREATE EXTENSION.

Whether the shlib is installed by default is a decision for packagers to
make, not us.  At best we could make a recommendation.
        regards, tom lane


Re: pgsql_fdw, FDW for PostgreSQL server

From
Shigeru Hanada
Date:
(2011/10/26 23:57), Kohei KaiGai wrote:
> 2011/10/26 Robert Haas<robertmhaas@gmail.com>:
>> I agree.  pgsql_fdw will be a nice feature, but there's no reason to
>> think that everyone will want it installed by default, and there are
>> some security reasons to think that they might not.  On the flip side,
>> pushing it out of contrib and onto pgfoundry or whatever makes it
>> unnecessarily difficult to install, and not as many people will
>> benefit from it.  So contrib seems exactly right to me.
>>
> I also agree. The pgsql_fdw will be worthful to locate in the main tree
> as a contrib module. It will give us clear opportunity to test new
> features of FDW using RDBMS characteristics; such as join-push-down.
> However, it should be a separated discussion whether it shall be installed
> by the default.

There seems to be approval for pushing pgsql_fdw into the main tree
(contrib or core extension, or something else) rather than keeping it
as an external module.  There are still some debatable issues, but they
would be moot unless pgsql_fdw qualifies as a contrib module.  So I'd
like to continue developing pgsql_fdw as a contrib module, at least
for a while.

Please find attached a patch for pgsql_fdw.  This patch needs the first
two patches attached to the OP[1] to be applied.  (Sorry, gathering
patches from another post must be bothersome.  Should I create new CF
items for the fundamental patches?)

[1] http://archives.postgresql.org/pgsql-hackers/2011-10/msg01329.php

Changes since the last post:
* added colname FDW option support
* allowed some libpq options (authtype and tty) to be specified as
server FDW options

--
Shigeru Hanada

Attachment

Re: pgsql_fdw, FDW for PostgreSQL server

From
Hitoshi Harada
Date:
2011/10/25 Shigeru Hanada <shigeru.hanada@gmail.com>:
>
> Connection management
> =====================
> The pgsql_fdw establishes a new connection when a foreign server is
> accessed first for the local session.  Established connection is shared
> between all foreign scans in the local query, and shared between even
> scans in following queries.  Connections are discarded when the current
> transaction aborts so that unexpected failure won't cause connection
> leak.  This is implemented with resource owner mechanism.
>

I have a doubt here about sharing a connection per server.  What if
there are simultaneous scans in the same plan?  Say,

-> Nested Loop
  -> Foreign Scan to table T1 on server A
  -> Foreign Scan to table T2 on server A

Okay, you are thinking about foreign joins, so the example above is too
simple.  But it is always possible to execute such a query if the
foreign scan nodes are far apart, isn't it?  As far as I can see from
your explanation, scan T1 and scan T2 share the same connection.  Now
the join node scans one row from the left (T1) while asking for rows
from the right (T2) without fetching all the rows from the left.  If T2
sends a request to server A, the connection's result (for T1) is
discarded.  Do I understand correctly?

Regards,
--
Hitoshi Harada


Re: pgsql_fdw, FDW for PostgreSQL server

From
Martijn van Oosterhout
Date:
On Sat, Oct 29, 2011 at 12:25:46AM -0700, Hitoshi Harada wrote:
> I have a doubt here, on sharing connection for each server. What if
> there are simultaneous scan on the same plan? Say,
>
> -> Nested Loop
>   -> Foreign Scan to table T1 on server A
>   -> Foreign Scan to table T2 on server A
>
> Okay, you are thinking about Foreign Join, so example above is too
> simple. But it is always possible to execute such a query if foreign
> scan nodes are separated far, isn't it? As far as I see from your
> explanation, scan T1 and scan T2 share the same connection. Now join
> node scans one row from left (T1) while asking rows from right (T2)
> without fetching all the rows from left. If T2 requests to server A,
> the connection's result (of T1) is discarded. Am I understand
> correctly?

This would need to be factored into the cost calculations.  For remote
servers there is an overhead per tuple transmitted, so in the above
case it might actually be quicker to do the nested loop locally.

To handle the parallel case you might need to materialise in the inner
loop; that would avoid the double scan.  Or we could fix the protocol so
you can stream multiple queries at once.

Actually, you can already do this if you use DECLARE CURSOR for all the
queries upfront and then FETCH as needed.  That way you can do it all
over one connection.
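
A sketch of that approach over a single connection (table and cursor
names are illustrative):

```sql
BEGIN;
-- Open both cursors up front on the one shared connection.
DECLARE c1 CURSOR FOR SELECT * FROM t1;
DECLARE c2 CURSOR FOR SELECT * FROM t2;
-- Interleave fetches; each cursor keeps its own position.
FETCH 100 FROM c1;
FETCH 100 FROM c2;   -- does not disturb c1's position
FETCH 100 FROM c1;   -- resumes where c1 left off
COMMIT;
```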

Have a nice day,
--
Martijn van Oosterhout   <kleptog@svana.org>   http://svana.org/kleptog/
> He who writes carelessly confesses thereby at the very outset that he does
> not attach much importance to his own thoughts.  -- Arthur Schopenhauer

Re: pgsql_fdw, FDW for PostgreSQL server

From
Tom Lane
Date:
Hitoshi Harada <umi.tanuki@gmail.com> writes:
> I have a doubt here, on sharing connection for each server. What if
> there are simultaneous scan on the same plan? Say,

> -> Nested Loop
>   -> Foreign Scan to table T1 on server A
>   -> Foreign Scan to table T2 on server A

> Okay, you are thinking about Foreign Join, so example above is too
> simple. But it is always possible to execute such a query if foreign
> scan nodes are separated far, isn't it? As far as I see from your
> explanation, scan T1 and scan T2 share the same connection. Now join
> node scans one row from left (T1) while asking rows from right (T2)
> without fetching all the rows from left. If T2 requests to server A,
> the connection's result (of T1) is discarded. Am I understand
> correctly?

I have not looked at the code, but ISTM the way that this has to work is
that you set up a portal for each active scan.  Then you can fetch a few
rows at a time from any one of them.

If you're doing this through libpq, it'd be necessary to implement each
scan using a cursor.  I'm not sure whether it'd be worth our time to
add more functions to libpq to allow more-direct access to the protocol
portal feature.
        regards, tom lane


Re: pgsql_fdw, FDW for PostgreSQL server

From
Hitoshi Harada
Date:
On Sat, Oct 29, 2011 at 8:13 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Hitoshi Harada <umi.tanuki@gmail.com> writes:
>> I have a doubt here, on sharing connection for each server. What if
>> there are simultaneous scan on the same plan? Say,
>
>> -> Nested Loop
>>   -> Foreign Scan to table T1 on server A
>>   -> Foreign Scan to table T2 on server A
>
>> Okay, you are thinking about Foreign Join, so example above is too
>> simple. But it is always possible to execute such a query if foreign
>> scan nodes are separated far, isn't it? As far as I see from your
>> explanation, scan T1 and scan T2 share the same connection. Now join
>> node scans one row from left (T1) while asking rows from right (T2)
>> without fetching all the rows from left. If T2 requests to server A,
>> the connection's result (of T1) is discarded. Am I understand
>> correctly?
>
> I have not looked at the code, but ISTM the way that this has to work is
> that you set up a portal for each active scan.  Then you can fetch a few
> rows at a time from any one of them.

Hmm, true. Looking back at the original proposal (I didn't look at
the code either), there seems to be a cursor mode.  ISTM it is hard for
the FDW to know what the whole plan tree looks like, so do we
consequently always use a cursor regardless of the estimated row count?
I haven't had much experience with cursors myself, but are they as
efficient as non-cursor retrieval?

Regards,
--
Hitoshi Harada


Re: pgsql_fdw, FDW for PostgreSQL server

From
Tom Lane
Date:
Hitoshi Harada <umi.tanuki@gmail.com> writes:
> On Sat, Oct 29, 2011 at 8:13 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>> I have not looked at the code, but ISTM the way that this has to work is
>> that you set up a portal for each active scan.  Then you can fetch a few
>> rows at a time from any one of them.

> Hmm, true. Looking back at the original proposal (neither did I look
> at the code,) there seems to be a cursor mode. ISTM it is hard for fdw
> to know how the whole plan tree looks, so consequently do we always
> cursor regardless of estimated row numbers?

I think we have to.  Even if we estimate that a given scan will return
only a few rows, what happens if we're wrong?  We don't want to blow out
memory on the local server by retrieving gigabytes in one go.

> I haven't had much experiences around cursor myself, but is it as
> efficient as non-cursor?

No, but if you need max efficiency you shouldn't be using foreign tables
in the first place; they're always going to be expensive to access.

It's likely that making use of native protocol portals (instead of
executing a lot of FETCH commands) would help.  But I think we'd be well
advised to do the first pass with just the existing libpq facilities,
and then measure to see where to improve performance.
        regards, tom lane


Re: pgsql_fdw, FDW for PostgreSQL server

From
Shigeru Hanada
Date:
2011/10/29 Hitoshi Harada <umi.tanuki@gmail.com>:
> I have a doubt here, on sharing connection for each server. What if
> there are simultaneous scan on the same plan? Say,
>
> -> Nested Loop
>  -> Foreign Scan to table T1 on server A
>  -> Foreign Scan to table T2 on server A
>
> Okay, you are thinking about Foreign Join, so example above is too
> simple. But it is always possible to execute such a query if foreign
> scan nodes are separated far, isn't it? As far as I see from your
> explanation, scan T1 and scan T2 share the same connection. Now join
> node scans one row from left (T1) while asking rows from right (T2)
> without fetching all the rows from left. If T2 requests to server A,
> the connection's result (of T1) is discarded. Am I understand
> correctly?

I think that sharing a connection doesn't cause any problem.

In cursor mode, using multiple cursors concurrently through one
connection is OK.  In SELECT mode, pgsql_fdw executes the SELECT
statement with PQexecParams and retrieves the whole result *inside* the
first Iterate call for an outer tuple, so the libpq connection is
already available when another scan needs to call its Iterate function.

--
Shigeru Hanada


Re: pgsql_fdw, FDW for PostgreSQL server

From
Shigeru Hanada
Date:
2011/10/30 Tom Lane <tgl@sss.pgh.pa.us>:
> Hitoshi Harada <umi.tanuki@gmail.com> writes:
>> On Sat, Oct 29, 2011 at 8:13 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>>> I have not looked at the code, but ISTM the way that this has to work is
>>> that you set up a portal for each active scan.  Then you can fetch a few
>>> rows at a time from any one of them.
>
>> Hmm, true. Looking back at the original proposal (neither did I look
>> at the code,) there seems to be a cursor mode. ISTM it is hard for fdw
>> to know how the whole plan tree looks, so consequently do we always
>> cursor regardless of estimated row numbers?
>
> I think we have to.  Even if we estimate that a given scan will return
> only a few rows, what happens if we're wrong?  We don't want to blow out
> memory on the local server by retrieving gigabytes in one go.

Oh, I overlooked the possibility of wrong estimation.  Old PostgreSQL
versions use 1000 as the default estimate, so a big table which has not
been analyzed may crash the backend.

To make data retrieval safe, we would need the actual size of the
result, maybe by executing SELECT COUNT(*) in the planning phase.  That
sounds too heavy to do for every scan, and it still lacks the actual
width.

One possible idea is to change the default value of the min_cursor_rows
option to 0 so that pgsql_fdw uses a CURSOR by default, but that seems
insufficient.  I'll drop the simple SELECT mode from the first version
of pgsql_fdw for safety.

>> I haven't had much experiences around cursor myself, but is it as
>> efficient as non-cursor?
>
> No, but if you need max efficiency you shouldn't be using foreign tables
> in the first place; they're always going to be expensive to access.
>
> It's likely that making use of native protocol portals (instead of
> executing a lot of FETCH commands) would help.  But I think we'd be well
> advised to do the first pass with just the existing libpq facilities,
> and then measure to see where to improve performance.

I long for a protocol-level cursor. :)

--
Shigeru Hanada


Re: pgsql_fdw, FDW for PostgreSQL server

From
Shigeru Hanada
Date:
(2011/10/30 11:34), Shigeru Hanada wrote:
> 2011/10/30 Tom Lane<tgl@sss.pgh.pa.us>:
>> I think we have to.  Even if we estimate that a given scan will return
>> only a few rows, what happens if we're wrong?  We don't want to blow out
>> memory on the local server by retrieving gigabytes in one go.
>
> Oh, I overlooked the possibility of wrong estimation.  Old PostgreSQL uses
> 1000 as default estimation, so big table which has not been analyzed may
> crashes the backend.
>
> To ensure the data retrieving safe, we need to get actual amount of result,
> maybe by executing SELECT COUNT(*) in planning phase.  It sounds too heavy
> to do for every scan, and it still lacks actual width.
>
> One possible idea is to change default value of min_cursur_rows option to 0
> so that pgsql_fdw uses CURSOR by default, but it seems not enough.  I'll
> drop simple SELECT mode from first version of pgsql_fdw for safety.

I removed the simple SELECT mode from pgsql_fdw, and consequently also
removed the min_cursor_rows FDW option.  This fix avoids possible memory
exhaustion due to a wrong estimate obtained from the remote side.

Once libpq gains the capability to retrieve an arbitrary number of rows
at a time from a remote portal without a server-side cursor, we will be
able to revive the simple SELECT mode.  It would then be safe enough even
without knowing the actual data size, and (maybe) faster than cursor mode
because it reduces the number of SQL commands.  Of course, the
performance advantage should be demonstrated before such development.

--
Shigeru Hanada

Attachment

Re: pgsql_fdw, FDW for PostgreSQL server

From
Pavel Stehule
Date:
2011/10/31 Shigeru Hanada <shigeru.hanada@gmail.com>:
> (2011/10/30 11:34), Shigeru Hanada wrote:
>> 2011/10/30 Tom Lane<tgl@sss.pgh.pa.us>:
>>> I think we have to.  Even if we estimate that a given scan will return
>>> only a few rows, what happens if we're wrong?  We don't want to blow out
>>> memory on the local server by retrieving gigabytes in one go.
>>
>> Oh, I overlooked the possibility of wrong estimation.  Old PostgreSQL uses
>> 1000 as default estimation, so big table which has not been analyzed may
>> crashes the backend.
>>
>> To ensure the data retrieving safe, we need to get actual amount of result,
>> maybe by executing SELECT COUNT(*) in planning phase.  It sounds too heavy
>> to do for every scan, and it still lacks actual width.
>>
>> One possible idea is to change default value of min_cursur_rows option to 0
>> so that pgsql_fdw uses CURSOR by default, but it seems not enough.  I'll
>> drop simple SELECT mode from first version of pgsql_fdw for safety.
>
> I removed simple SELECT mode from pgsql_fdw, and consequently also
> removed min_cursor_rows FDW option.  This fix avoids possible memory
> exhaustion due to wrong estimation gotten from remote side.
>
> Once libpq has had capability to retrieve arbitrary number of rows from
> remote portal at a time without server-side cursor in future, then we
> will be able to revive simple SELECT.  Then it's enough safe even if we
> don't have actual data size, but (maybe) faster than cursor mode because
> we can reduce # of SQL commands.  Though of course proof of performance
> advantage should be shown before such development.

If you need fewer SQL commands, you can increase the fetch_count
parameter - the default of 1000 is maybe too small; perhaps 10000 rows
as the default (not more).

For more complex queries it can be interesting to set cursor_tuple_fraction.
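The setting Pavel mentions is a remote-side knob; a minimal sketch of how
an FDW (or a user, for testing) could apply it on the remote session:

```sql
-- cursor_tuple_fraction tells the planner what fraction of a cursor's
-- rows will actually be fetched.  Raising it from the default of 0.1
-- makes the remote server optimize for total runtime rather than fast
-- startup, which suits an FDW that always reads the whole result.
SET cursor_tuple_fraction = 1.0;
```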

Pavel

>
> --
> Shigeru Hanada
>
>
> --
> Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
> To make changes to your subscription:
> http://www.postgresql.org/mailpref/pgsql-hackers
>
>


Re: pgsql_fdw, FDW for PostgreSQL server

From
Shigeru Hanada
Date:
Hi,

Attached are revised versions of the pgsql_fdw patches.

fdw_helper_funcs_v2.patch provides some functions which would be useful
for implementing FDWs, plus documentation about the FDW helper functions,
including those which already exist in 9.1.  They are not specific to
pgsql_fdw, so I separated them from the pgsql_fdw patch.

pgsql_fdw_v4.patch provides an FDW for PostgreSQL.  This patch requires
that fdw_helper_funcs_v2.patch be applied first.  Changes since the last
version:

* The default of the fetch_count option is increased to 10000, as
suggested by Pavel Stehule.
* Removed an unnecessary NULL check before PQresultStatus.
* Evaluate all conditions on the local side even if some of them have
been pushed down to the remote side, to ensure that results are correct.
* Use fixed costs for queries which contain external parameters, because
such queries can't be used in an EXPLAIN command.

Regards,
--
Shigeru Hanada

Attachment

Re: pgsql_fdw, FDW for PostgreSQL server

From
Kohei KaiGai
Date:
Hanada-san,

I'm still reviewing your patch, so these comments are not comprehensive; sorry.

I'm not sure whether the logic of is_foreign_expr() is appropriate.
It checks the OID of the function within FuncExpr and OpExpr in order to
disallow pushing down user-defined functions.  However, if a user-defined
operator is implemented on top of a built-in function with a different
meaning, is it really suitable to push down?  E.g., it is possible to
define an '=' operator using int4ne, even though that is quite
nonsensical.  So I'd like to suggest checking the OID of the operator
itself: whether it is built-in or not.
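The distinction can be seen in the catalogs: built-in objects have OIDs
below FirstNormalObjectId (16384), so the operator's own OID reveals that
it is user-defined even when its underlying function is built-in.  A
hedged sketch, using a hypothetical '===' operator in place of the
nonsensical redefinition of '=' from the example:

```sql
-- A user-defined operator built on the built-in int4ne: the function OID
-- is built-in, but the operator OID is not, so checking the operator OID
-- catches it.  16384 is FirstNormalObjectId, the first OID assigned to
-- user-defined objects.
CREATE OPERATOR === (LEFTARG = int4, RIGHTARG = int4, PROCEDURE = int4ne);

SELECT oprname,
       oid < 16384          AS operator_is_builtin,   -- false for '==='
       oprcode::oid < 16384 AS function_is_builtin    -- true (int4ne)
FROM pg_operator
WHERE oprname = '===';
```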

On the other hand, this hard-wired restriction may work against the
purpose of this module, which is to enable handling a query on multiple
nodes in parallel.  I'm not sure whether it is the right design for
is_foreign_expr() to return false when the supplied expression contains
mutable functions.

There are probably two perspectives here.  One wants to make sure that
functions work the same way on all nodes; the other wants to distribute
query processing as much as possible.  There is a trade-off between these
two perspectives, so how about adding a GUC variable to control the
criteria for pushing down?

Thanks,

2011/11/15 Shigeru Hanada <shigeru.hanada@gmail.com>:
> Hi,
>
> Attached are revised version of pgsql_fdw patches.
>
> fdw_helper_funcs_v2.patch provides some functions which would be useful
> to implement FDW, and document about FDW helper functions including
> those which exist in 9.1.  They are not specific to pgsql_fdw, so I
> separated it from pgsql_fdw patch.
>
> pgsql_fdw_v4.patch provides a FDW for PostgreSQL.  This patch requires
> fdw_helper_funcs_v2.patch has been applied.  Changes done since last
> version are:
>
> * Default of fetch_count option is increased to 10000, as suggested by
> Pavel Stehule.
> * Remove unnecessary NULL check before PQresultStatus.
> * Evaluate all conditions on local side even if some of them has been
> pushed down to remote side, to ensure that results are correct.
> * Use fixed costs for queries which contain external parameter, because
> such query can't be used in EXPLAIN command.
>
> Regards,
> --
> Shigeru Hanada
>



-- 
KaiGai Kohei <kaigai@kaigai.gr.jp>


Re: pgsql_fdw, FDW for PostgreSQL server

From
Etsuro Fujita
Date:
Hi Hanada-san,

(2011/11/16 1:55), Shigeru Hanada wrote:
> Attached are revised version of pgsql_fdw patches.

I'm still reviewing, so the following is not everything.  I'm sorry.

estimate_costs() has been implemented to ask a remote PostgreSQL server
for the result of EXPLAIN on a remote query, to get costs such as
startup_cost and total_cost.  I think this approach is the most accurate
way to get those costs.  However, I think it would be rather costly, and
I'm afraid it might work only for pgsql_fdw.  Even if we could obtain
such cost information by EXPLAINing a remote query on a remote server
running a DBMS other than PostgreSQL, it might be difficult to
incorporate it into the PostgreSQL cost model, because the cost
information provided by the other DBMS's EXPLAIN command might have
different meanings (or different scales) from that provided by
PostgreSQL's EXPLAIN command.

So I think it might be better for pgsql_fdw to estimate such costs
itself, without EXPLAINing, on the assumption that the remote PostgreSQL
server has the same query-optimization abilities.  That is less costly
and more widely applicable to other DBMSs, though of course it only works
once we have statistics and/or index information for foreign tables.  But
AFAIK we eventually want to have those, so I'd like to propose using the
EXPLAIN-based approach until that time.

Best regards,
Etsuro Fujita


Re: pgsql_fdw, FDW for PostgreSQL server

From
Shigeru Hanada
Date:
Hi Kaigai-san,

(2011/11/20 2:42), Kohei KaiGai wrote:
> I'm still under reviewing of your patch, so the comment is not overall, sorry.

Thanks for the review!

> I'm not sure whether the logic of is_foreign_expr() is appropriate.
> It checks oid of the function within FuncExpr and OpExpr to disallow to push
> down user-defined functions.
> However, if a user-defined operator is implemented based on built-in functions
> with different meaning, is it really suitable to push-down?
> E.g) It is available to define '=' operator using int4ne, even though
> quite nonsense.
> So, I'd like to suggest to check oid of the operator; whether it is a
> built-in, or not.
> 
> On the other hand, this hard-wired restriction may damage to the purpose of
> this module; that enables to handle a query on multiple nodes in parallel.
> I'm not sure whether it is right design that is_foreign_expr() returns false
> when the supplied expr contains mutable functions.
> 
> Probably, here are two perspectives. The one want to make sure functions works
> with same manner in all nodes. The other want to distribute processing
> of queries
> as possible as we can. Here is a trade-off between these two perspectives.
> So, how about an idea to add a guc variable to control the criteria of
> pushing-down.

I agree that allowing users to control which functions/operators should
be pushed down is useful, but a GUC seems too coarse as the unit for
switching the behavior.  "Routine Mapping", a mechanism defined in the
SQL/MED standard, would be the answer to this issue.  It can be used to
map a local routine (a procedure or a function) to something on a foreign
server.  It is like a user mapping, but it has a mapping name.  It would
probably have these attributes:

pg_catalog.pg_routine_mapping
    rmname              name
    rmprocid            regproc
    rmserverid          oid
    rmfdwoptions        text[]

If we had routine mappings, FDW authors could provide default mappings
with the extension installation, and users could customize them.  FDWs
will probably want to push down only functions/operators which have
routine mapping entries, so providing a common routine which returns the
mapping information for a given function/operator, say
GetRoutineMapping(procid, serverid), would be useful.

Since we don't have routine mappings at the moment, I'll fix pgsql_fdw so
that it pushes down only built-in operators, including scalar-array
operators.

Regards,
-- 
Shigeru Hanada


Re: pgsql_fdw, FDW for PostgreSQL server

From
Shigeru Hanada
Date:
Hi Fujita-san,

(2011/11/25 17:27), Etsuro Fujita wrote:
> I'm still under reviewing, so the following is not all.  I'm sorry.
> estimate_costs() have been implemented to ask a remote postgres server
> for the result of EXPLAIN for a remote query to get its costs such as
> startup_cost and total_cost.  I think this approach is the most accurate
> way to get its costs.  However, I think it would be rather costly.  And
> I'm afraid of that it might work only for pgsql_fdw.

Indeed.  In addition, this approach assumes that the cost factors of the
target PostgreSQL server are the same as the local ones.  pgsql_fdw might
have to expose cost factors as FDW options of the foreign server.

>                                                      Because, even if we
> are able to obtain such a cost information by EXPLAINing a remote query
> at a remote server where a DBMS different from postgres runs, it might
> be difficult to incorporate such a cost information with the postgres
> cost model due to their possible inconsistency that such a cost
> information provided by the EXPLAIN command in the other DBMS might have
> different meanings (or different scales) from that provided by the
> EXPLAIN command in postgres.

Yes, so implementing accurate cost estimation for other DBMSs would be
very difficult, but AFAICS the row estimate is the most important factor,
so a reasonable row count and a relatively high startup cost would
produce a not-so-bad plan.

>                               So, I think it might be better to estimate
> such costs by pgsql_fdw itself without EXPLAINing on the assumption that
> a remote postgres server has the same abilities for query optimization,
> which is less costly and widely applicable to the other DBMSs, while it,
> of course, only works once we have statistics and/or index information
> for foreign tables.  But AFAIK we eventually want to have those, so I'd
> like to propose to use the proposed approach until that time.

Knowledge of foreign indexes also provides information about sort order.
With such information, the planner will be able to consider a merge join
without a local sort.  Without foreign indexes, we would have to
enumerate possible sort keys by brute force to get the same result, as
Itagaki-san mentioned before.

Regards,
-- 
Shigeru Hanada


Re: pgsql_fdw, FDW for PostgreSQL server

From
Robert Haas
Date:
2011/11/28 Shigeru Hanada <shigeru.hanada@gmail.com>:
> I agree that allowing users to control which function/operator should be
> pushed down is useful, but GUC seems too large as unit of switching
> behavior.  "Routine Mapping", a mechanism which is defined in SQL/MED
> standard, would be the answer for this issue.  It can be used to map a
> local routine (a procedure or a function) to something on a foreign
> server.  It is like user mapping, but it has mapping name.  Probably it
> would have these attributes:
>
> pg_catalog.pg_routine_mapping
>    rmname              name
>    rmprocid            regproc
>    rmserverid          oid
>    rmfdwoptions        text[]
>
> If we have routine mapping, FDW authors can provide default mappings
> within extension installation, and users can customize them.  Maybe FDWs
> will want to push down only functions/operators which have routine
> mapping entries, so providing common routine which returns mapping
> information of given function/operator, say GetRoutineMapping(procid,
> serverid), is useful.
>
> Unfortunately we don't have it at the moment, I'll fix pgsql_fdw so that
> it pushes down only built-in operators, including scalar-array operators.

One difficulty here is that even very simple operators don't
necessarily mean the same thing on both sides.  In my last job we had
a Microsoft SQL database where string equality was case insensitive,
and a PostgreSQL database where it wasn't.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


Re: pgsql_fdw, FDW for PostgreSQL server

From
Etsuro Fujita
Date:
(2011/11/28 20:50), Shigeru Hanada wrote:
> (2011/11/25 17:27), Etsuro Fujita wrote:
>>                                So, I think it might be better to estimate
>> such costs by pgsql_fdw itself without EXPLAINing on the assumption that
>> a remote postgres server has the same abilities for query optimization,
>> which is less costly and widely applicable to the other DBMSs, while it,
>> of course, only works once we have statistics and/or index information
>> for foreign tables.  But AFAIK we eventually want to have those, so I'd
>> like to propose to use the proposed approach until that time.
> 
> Knowledge of foreign indexes also provide information of sort order.
> Planner will be able to consider merge join without local sort with such
> information.  Without foreign index, we have to enumerate possible sort
> keys with Blute-Force approach for same result, as mentioned by
> Itagaki-san before.

Yes, with the knowledge of foreign indexes, I think we can take the
approach of thinking multiple plans for a foreign table; the cheapest
unordered plan and the cheapest plan with a given sort order.  In
addition, it would be also possible to support
nestloop-with-inner-foreign-indexscans on a foreign table as pointed out
as future work by Tom Lane at PGCon 2011[1].

[1] http://www.pgcon.org/2011/schedule/attachments/188_Planner%20talk.pdf

Best regards,
Etsuro Fujita


Re: pgsql_fdw, FDW for PostgreSQL server

From
"Albe Laurenz"
Date:
Robert Haas wrote:
> 2011/11/28 Shigeru Hanada <shigeru.hanada@gmail.com>:
>> I agree that allowing users to control which function/operator should be
>> pushed down is useful, but GUC seems too large as unit of switching
>> behavior.  "Routine Mapping", a mechanism which is defined in SQL/MED
>> standard, would be the answer for this issue.  It can be used to map a
>> local routine (a procedure or a function) to something on a foreign
>> server.  It is like user mapping, but it has mapping name.  Probably it
>> would have these attributes:
>>
>> pg_catalog.pg_routine_mapping
>>    rmname              name
>>    rmprocid            regproc
>>    rmserverid          oid
>>    rmfdwoptions        text[]
>>
>> If we have routine mapping, FDW authors can provide default mappings
>> within extension installation, and users can customize them.  Maybe FDWs
>> will want to push down only functions/operators which have routine
>> mapping entries, so providing common routine which returns mapping
>> information of given function/operator, say GetRoutineMapping(procid,
>> serverid), is useful.
>>
>> Unfortunately we don't have it at the moment, I'll fix pgsql_fdw so that
>> it pushes down only built-in operators, including scalar-array operators.
>
> One difficulty here is that even very simple operators don't
> necessarily mean the same thing on both sides.  In my last job we had
> a Microsoft SQL database where string equality was case insensitive,
> and a PostgreSQL database where it wasn't.

I think that this is not always safe even from PostgreSQL to PostgreSQL.
If two databases have different collation, "<" on strings will behave
differently.

Yours,
Laurenz Albe


Re: pgsql_fdw, FDW for PostgreSQL server

From
Shigeru Hanada
Date:
Sorry for delayed response.

2011/11/29 Albe Laurenz <laurenz.albe@wien.gv.at>:
> I think that this is not always safe even from PostgreSQL to PostgreSQL.
> If two databases have different collation, "<" on strings will behave
> differently.

Indeed.  I think that only the owner of the foreign table can keep
collations consistent between the foreign and local sides, just like the
data types of columns.  We need to support per-column collations on
foreign tables too, or else deny pushing down any condition which is
collation-sensitive...

Regards,
--
Shigeru Hanada


Re: pgsql_fdw, FDW for PostgreSQL server

From
Greg Smith
Date:
On 12/07/2011 02:34 AM, Shigeru Hanada wrote:
> I think that only the owner of foreign table can keep collation
> consistent between foreign and local, like data type of column.  We need to
> support per-column-collation on foreign tables too, or should deny pushing
> down condition which is collation-sensitive...
>    

I am not sure about what next step you were planning here.  Are you 
thinking to block this sort of push-down in an update to your feature 
patch, or does some more serious work on foreign table collation need to 
happen first instead?

It looks like there has been some good discussion of this feature here, 
but there is still some work needed before it will be ready to commit.  
Hanada-san, did you get the feedback you were looking for here yet, or 
are there things you still wanted to discuss?  It is not clear to me 
what happens next; I would appreciate your comment on that.

-- 
Greg Smith   2ndQuadrant US    greg@2ndQuadrant.com   Baltimore, MD
PostgreSQL Training, Services, and 24x7 Support  www.2ndQuadrant.us



Re: pgsql_fdw, FDW for PostgreSQL server

From
Robert Haas
Date:
On Wed, Dec 7, 2011 at 2:34 AM, Shigeru Hanada <shigeru.hanada@gmail.com> wrote:
> Sorry for delayed response.
>
> 2011/11/29 Albe Laurenz <laurenz.albe@wien.gv.at>:
>> I think that this is not always safe even from PostgreSQL to PostgreSQL.
>> If two databases have different collation, "<" on strings will behave
>> differently.
>
> Indeed.  I think that only the owner of foreign table can keep collation
> consistent between foreign and local, like data type of column.

+1.

> We need to
> support per-column-collation on foreign tables too, or should deny pushing
> down condition which is collation-sensitive...

It seems that we already do:

rhaas=# create foreign table ft1 (a text collate "de_DE") server s1;
CREATE FOREIGN TABLE

It does seem like this might not be enough information for the FDW to
make good decisions about pushdown.  Even supposing the server on the
other end is also PostgreSQL, the collation names might not match
(if, say, one is running Windows, and the other, Linux).  And even if
they do, there is no guarantee that two collations with the same name
have the same behavior on two different machines; they probably
should, but who knows?  And if we're using an FDW to talk to some
other database server, the problem is much worse; it's not clear that
we'll even begin to be able to guess whether the remote side has
compatible semantics.  I feel like we might need a system here that
allows for more explicit user control about what to push down vs. not,
rather than assuming we'll be able to figure it out behind the scenes.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


Re: pgsql_fdw, FDW for PostgreSQL server

From
Shigeru Hanada
Date:
(2011/12/12 22:59), Robert Haas wrote:
> It does seem like this might not be enough information for the FDW to
> make good decisions about pushdown.  Even supposing the server on the
> other hand is also PostgreSQL, the collation names might not match
> (if, say, one is running Windows, and the other, Linux).  And even if
> they do, there is no guarantee that two collations with the same name
> have the same behavior on two different machines; they probably
> should, but who knows?  And if we're using an FDW to talk to some
> other database server, the problem is much worse; it's not clear that
> we'll even begin to be able to guess whether the remote side has
> compatible semantics.  I feel like we might need a system here that
> allows for more explicit user control about what to push down vs. not,
> rather than assuming we'll be able to figure it out behind the scenes.

Agreed.  How about adding a per-column boolean FDW option, say
"pushdown", to pgsql_fdw?  Users can tell pgsql_fdw that a column can be
pushed down safely by setting this option to true.  IMO the default
should be false (no push-down).

In most cases, columns with numeric or time-related types are safe to
push down because they are free from collation issues, so users would
want to set the option to true.  OTOH, columns with string types need
some consideration; once users have ensured that such a column has
compatible semantics, they can set pushdown=true for efficiency.

If a condition contains any column with pushdown=false, that condition
should NOT be pushed down.

This idea is only for pgsql_fdw now, but it could be used by other FDWs
which support push-down, and it would also be useful for future ORDER BY
push-down support, which clearly involves the collation issue as well.
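Using per-column FDW options, the proposal might look like the following
sketch; the "pushdown" option name is from the proposal, not an
implemented pgsql_fdw option:

```sql
-- Hedged sketch of the proposed per-column "pushdown" flag.
CREATE FOREIGN TABLE ft1 (
    id   int  OPTIONS (pushdown 'true'),  -- collation-free type: safe
    name text                             -- default false: not pushed down
) SERVER s1;

-- Under the proposal, "WHERE id = 1" could be pushed to the remote
-- server, while "WHERE name = 'foo'" would be evaluated locally.
```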

Regards,
-- 
Shigeru Hanada


Re: pgsql_fdw, FDW for PostgreSQL server

From
Tom Lane
Date:
Shigeru Hanada <shigeru.hanada@gmail.com> writes:
> (2011/12/12 22:59), Robert Haas wrote:
>> ... I feel like we might need a system here that
>> allows for more explicit user control about what to push down vs. not,
>> rather than assuming we'll be able to figure it out behind the scenes.

> Agreed.  How about to add a per-column boolean FDW option, say
> "pushdown", to pgsql_fdw?  Users can tell pgsql_fdw that the column can
> be pushed down safely by setting this option to true.

[ itch... ] That doesn't seem like the right level of granularity.
ISTM the problem is with whether specific operators have the same
meaning at the far end as they do locally.  If you try to attach the
flag to columns, you have to promise that *every* operator on that
column means what it does locally, which is likely to not be the
case ever if you look hard enough.  Plus, having to set the flag on
each individual column of the same datatype seems pretty tedious.

I don't have a better idea to offer at the moment though.  Trying
to attach such a property to operators seems impossibly messy too.
If it weren't for the collations issue, I might think that labeling
datatypes as being compatible would be a workable approximation.
        regards, tom lane


Re: pgsql_fdw, FDW for PostgreSQL server

From
"Albe Laurenz"
Date:
Tom Lane wrote:
> Shigeru Hanada <shigeru.hanada@gmail.com> writes:
>> (2011/12/12 22:59), Robert Haas wrote:
>>> ... I feel like we might need a system here that
>>> allows for more explicit user control about what to push down vs.
not,
>>> rather than assuming we'll be able to figure it out behind the
scenes.

>> Agreed.  How about to add a per-column boolean FDW option, say
>> "pushdown", to pgsql_fdw?  Users can tell pgsql_fdw that the column
can
>> be pushed down safely by setting this option to true.

> [ itch... ] That doesn't seem like the right level of granularity.
> ISTM the problem is with whether specific operators have the same
> meaning at the far end as they do locally.  If you try to attach the
> flag to columns, you have to promise that *every* operator on that
> column means what it does locally, which is likely to not be the
> case ever if you look hard enough.  Plus, having to set the flag on
> each individual column of the same datatype seems pretty tedious.
>
> I don't have a better idea to offer at the moment though.  Trying
> to attach such a property to operators seems impossibly messy too.
> If it weren't for the collations issue, I might think that labeling
> datatypes as being compatible would be a workable approximation.

Maybe I'm missing something, but if pushdown worked as follows:

- Push down only system functions and operators on system types.
- Only push down what is guaranteed to work.

then the only things we would miss out on are encoding- or
collation-sensitive string operations.

Is that loss so big that it warrants a lot of effort?

Yours,
Laurenz Albe


Re: pgsql_fdw, FDW for PostgreSQL server

From
Heikki Linnakangas
Date:
On 13.12.2011 11:57, Albe Laurenz wrote:
> Tom Lane wrote:
>> Shigeru Hanada<shigeru.hanada@gmail.com>  writes:
>>> (2011/12/12 22:59), Robert Haas wrote:
>>>> ... I feel like we might need a system here that
>>>> allows for more explicit user control about what to push down vs.
> not,
>>>> rather than assuming we'll be able to figure it out behind the
> scenes.
>
>>> Agreed.  How about to add a per-column boolean FDW option, say
>>> "pushdown", to pgsql_fdw?  Users can tell pgsql_fdw that the column
> can
>>> be pushed down safely by setting this option to true.
>
>> [ itch... ] That doesn't seem like the right level of granularity.
>> ISTM the problem is with whether specific operators have the same
>> meaning at the far end as they do locally.  If you try to attach the
>> flag to columns, you have to promise that *every* operator on that
>> column means what it does locally, which is likely to not be the
>> case ever if you look hard enough.  Plus, having to set the flag on
>> each individual column of the same datatype seems pretty tedious.
>>
>> I don't have a better idea to offer at the moment though.  Trying
>> to attach such a property to operators seems impossibly messy too.
>> If it weren't for the collations issue, I might think that labeling
>> datatypes as being compatible would be a workable approximation.
>
> Maybe I'm missing something, but if pushdown worked as follows:
>
> - Push down only system functions and operators on system types.
> - Only push down what is guaranteed to work.
>
> then the only things we would miss out on are encoding- or
> collation-sensitive string operations.
>
> Is that loss so big that it warrants a lot of effort?

The SQL/MED spec handles this with the concept of "routine mappings".
There is syntax for defining which remote "routines", meaning functions,
correspond to local functions:

CREATE ROUTINE MAPPING <routine mapping name> FOR <specific routine 
designator>
SERVER <foreign server name> [ <generic options> ]

<generic options> is FDW-specific, I'd imagine the idea is to give the 
name of the corresponding function in the remote server. It doesn't say 
anything about collations, but you could have extra options to specify 
that a function can only be mapped under C collation, or whatever.
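An invocation of that syntax might look like the following; this is
hypothetical, since PostgreSQL does not implement CREATE ROUTINE MAPPING,
and the option names are invented for illustration:

```sql
-- Hypothetical use of the SQL/MED routine-mapping syntax quoted above;
-- neither the statement nor the options exist in PostgreSQL.
CREATE ROUTINE MAPPING upper_map FOR upper(text)
SERVER foreign_pg
OPTIONS (remote_name 'upper', collation 'C');
```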

It seems tedious to specify that per-server, though, so we'll probably 
still want to have some smarts in the pgsql_fdw to handle the built-in 
functions and types that we know to be safe.

I've been talking about functions here, not operators, on the assumption 
that we can look up the function underlying the operator and make the 
decisions based on that.

--
Heikki Linnakangas
EnterpriseDB   http://www.enterprisedb.com


Re: pgsql_fdw, FDW for PostgreSQL server

From
Shigeru Hanada
Date:
(2011/12/13 20:04), Heikki Linnakangas wrote:
> The SQL/MED spec handles this with the concept of "routine mappings". 
> There is syntax for defining which remote "routines", meaning functions, 
> correspond local functions:
> 
> CREATE ROUTINE MAPPING <routine mapping name> FOR <specific routine 
> designator>
> SERVER <foreign server name> [ <generic options> ]
> 
> <generic options> is FDW-specific, I'd imagine the idea is to give the 
> name of the corresponding function in the remote server. It doesn't say 
> anything about collations, but you could have extra options to specify 
> that a function can only be mapped under C collation, or whatever.

I considered ROUTINE MAPPING for other RDBMSs before, and thought that
recording the order of parameters in the generic options would be
necessary.  It would also be useful for pgsql_fdw to support pushing down
user-defined functions.  Maybe the built-in format() function suits this
purpose?

> It seems tedious to specify that per-server, though, so we'll probably 
> still want to have some smarts in the pgsql_fdw to handle the built-in 
> functions and types that we know to be safe.

One possible idea is to have default mappings with serverid = InvalidOid,
and override them with entries which have a valid server OID.  Such
default mappings could be loaded during CREATE EXTENSION.

> I've been talking about functions here, not operators, on the assumption 
> that we can look up the function underlying the operator and make the 
> decisions based on that.

It's an interesting viewpoint to treat operator notation as syntactic
sugar for function notation, e.g. "A = B" -> "int4eq(A, B)".  Routine
mappings seem to work for operators too.

Regards,
-- 
Shigeru Hanada


Re: pgsql_fdw, FDW for PostgreSQL server

From
Shigeru Hanada
Date:
(2011/12/13 18:57), Albe Laurenz wrote:
> Maybe I'm missing something, but if pushdown worked as follows:
> 
> - Push down only system functions and operators on system types.
> - Only push down what is guaranteed to work.

Oh, I hadn't considered restricting this to system data types.  Indeed,
user-defined types would not be safe to push down.

> then the only things we would miss out on are encoding- or
> collation-sensitive string operations.
> 
> Is that loss so big that it warrants a lot of effort?

It depends on the definition of "collation-sensitive".  If we define it
as "all operations which might handle any collation-sensitive element",
then no function or operator which takes any character data type (text,
varchar, bpchar, sql_identifier, etc.) can be pushed down.

Regards,
-- 
Shigeru Hanada


Re: pgsql_fdw, FDW for PostgreSQL server

From
Shigeru Hanada
Date:
(2011/12/13 14:46), Tom Lane wrote:
> Shigeru Hanada<shigeru.hanada@gmail.com>  writes:
>> Agreed.  How about to add a per-column boolean FDW option, say
>> "pushdown", to pgsql_fdw?  Users can tell pgsql_fdw that the column can
>> be pushed down safely by setting this option to true.
>
> [ itch... ] That doesn't seem like the right level of granularity.
> ISTM the problem is with whether specific operators have the same
> meaning at the far end as they do locally.  If you try to attach the
> flag to columns, you have to promise that *every* operator on that
> column means what it does locally, which is likely to not be the
> case ever if you look hard enough.  Plus, having to set the flag on
> each individual column of the same datatype seems pretty tedious.

Indeed, I too think that labeling each column is not the best way, but
at the time I thought it was a practical compromise.  IOW, I chose
per-column FDW options as a middle ground between never-push-down and
indiscriminate-push-down.

Anyway, ISTM that we should consider general mappings for functions,
operators and collations to support push-down properly, but that would
be hard to accomplish in this CF.

Here I'd like to propose three incremental patches:

1) fdw_helper_funcs_v3.patch:  This is not specific to pgsql_fdw, but
probably useful for every FDW which uses FDW options.  This patch
provides some functions which help with retrieving FDW options from the
catalogs.  It also enhances the documentation about existing FDW helper
functions.

2) pgsql_fdw_v5.patch:  This patch provides a simple pgsql_fdw
which does *NOT* support any push-down.  All data in the remote table is
retrieved for each foreign scan, and conditions are always evaluated on
the local side.  This is safe with respect to semantic differences
between local and remote, but very inefficient, especially for large
remote tables.

3) pgsql_fdw_pushdown_v1.patch:  This patch adds limited push-down
capability to the pgsql_fdw implemented by the previous patch.  The
criteria for pushing down are a little complex.  I modified pgsql_fdw to
*NOT* push down conditions which contain any of:

  a) expression whose result collation is valid
  b) expression whose input collation is valid
  c) expression whose result type is user-defined
  d) expression which uses user-defined function
  e) array expression whose elements have user-defined type
  f) expression which uses user-defined operator
  g) expression which uses mutable function

As a result, pgsql_fdw can push down only very limited conditions, such
as numeric comparisons, but it should still be useful.  I hope that
these restrictions are enough to avoid problems with semantic
differences between remote and local.
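Criteria a) through g) amount to a recursive walk that rejects the whole
expression if any node trips one of the checks.  Here is a minimal
sketch of that shape, using a hypothetical simplified node type instead
of PostgreSQL's real Node tree; the actual patch does this in
foreign_expr_walker(), and is_foreign_expr() is borrowed as a name only.

```c
#include <stdbool.h>
#include <stddef.h>
#include <assert.h>

/* Simplified expression node for illustration only.  Each flag stands
 * for one of the exclusion criteria a)-g) above. */
typedef struct ExprNode
{
    bool has_result_collation;   /* a) result collation is valid */
    bool has_input_collation;    /* b) input collation is valid */
    bool result_type_userdef;    /* c) result type is user-defined */
    bool uses_userdef_function;  /* d) uses user-defined function */
    bool array_elem_userdef;     /* e) array elements of user-defined type */
    bool uses_userdef_operator;  /* f) uses user-defined operator */
    bool uses_mutable_function;  /* g) uses mutable function */
    struct ExprNode *children[4];
    int  nchildren;
} ExprNode;

/* True only if the whole tree passes every restriction, i.e. the
 * expression is considered safe to send to the remote server. */
static bool
is_foreign_expr(const ExprNode *e)
{
    if (e == NULL)
        return true;
    if (e->has_result_collation || e->has_input_collation ||
        e->result_type_userdef || e->uses_userdef_function ||
        e->array_elem_userdef || e->uses_userdef_operator ||
        e->uses_mutable_function)
        return false;
    for (int i = 0; i < e->nchildren; i++)
        if (!is_foreign_expr(e->children[i]))
            return false;
    return true;
}
```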

To implement d), I added exprFunction to nodeFuncs.c, which returns the
OID of the function used in an expression node, but I'm not sure that
it should be there.  Should we have it inside pgsql_fdw?

I'd like to thank everyone who commented on this topic!

Regards,
--
Shigeru Hanada

Attachment

Re: pgsql_fdw, FDW for PostgreSQL server

From
Greg Smith
Date:
On 12/14/2011 09:02 AM, Shigeru Hanada wrote:
> Here I'd like to propose three incremental patches:
>
> 1) fdw_helper_funcs_v3.patch...:  This is not specific to pgsql_fdw, but
> probably useful for every FDWs which use FDW options...
> 2) pgsql_fdw_v5.patch:  This patch provides simple pgsql_fdw
> which does *NOT* support any push-down...
> 3) pgsql_fdw_pushdown_v1.patch:  This patch adds limited push-down
> capability to pgsql_fdw which is implemented by previous patch...
> ...
> To implement [expression which uses user-defined function], I added exprFunction to nodefuncs.c which returns Oid
> of function which is used in the expression node, but I'm not sure that
> it should be there.  Should we have it inside pgsql_fdw?

After failing to shed some light on this during my general update, I'll
try again here.  We now have 3 updated patches that refactor things
from how this was originally presented, with one open implementation
question.  There's also a spawned-off "Join push-down for foreign tables"
patch in another thread.

I don't think it's really clear to everyone what state this feature
proposal is in. We've gotten bits of review here from KaiGai and Heikki,
big picture comments from Robert and Tom. Given how these are
structured, is fdw_helper_funcs_v3.patch at the point where it should be
considered for committer review? Maybe pgsql_fdw_v5.patch too?

The others seem to be more in flux to me, due to all the recent pushdown
changes.

-- 
Greg Smith   2ndQuadrant US    greg@2ndQuadrant.com   Baltimore, MD
PostgreSQL Training, Services, and 24x7 Support  www.2ndQuadrant.us



Re: pgsql_fdw, FDW for PostgreSQL server

From
Kohei KaiGai
Date:
Hi Harada-san,

I checked the "fdw_helper_funcs_v3.patch", "pgsql_fdw_v5.patch" and
"pgsql_fdw_pushdown_v1.patch". My comments are below.

[BUG]
Even though pgsql_fdw tries to push down qualifiers that are executable
on the remote side in deparseSql(), it does not remove the pushed-down
qualifiers from baserel->baserestrictinfo, so these qualifiers are
eventually executed twice.

See the result of this EXPLAIN.

  postgres=# EXPLAIN SELECT * FROM ft1 WHERE a > 2 AND f_leak(b);
                       QUERY PLAN
  ------------------------------------------------------------------------------------------------------
   Foreign Scan on ft1  (cost=107.43..122.55 rows=410 width=36)
     Filter: (f_leak(b) AND (a > 2))
     Remote SQL: DECLARE pgsql_fdw_cursor_0 SCROLL CURSOR FOR SELECT
  a, b FROM public.t1 WHERE (a > 2)
  (3 rows)

My expectation is that (a > 2) is executed on the remote side and
f_leak(b) on the local side.  But the filter of the foreign scan on ft1
has both qualifiers.  A RestrictInfo has to be removed if it gets
pushed down.

[Design comment]
I'm not sure why store_result() uses MessageContext to save the
Tuplestorestate within PgsqlFdwExecutionState.
The source code comment says it is used to avoid memory leaks in error
cases.  I had a similar experience implementing my own FDW module, so I
understand that the per-scan context has already been cleared by the
time the resource-release callback runs; thus, handles to external
resources have to be saved in a separate memory context.
In my personal opinion, the right design is to declare a memory context
for pgsql_fdw itself, instead of abusing an existing memory context.
(An even better design is to define a sub-memory-context for each
foreign scan, then remove the sub-memory-context after releasing the
handles.)

[Design comment]
When should "BEGIN" be issued on the remote side?
connect_pg_server() is the only place that issues a "BEGIN" command on
the remote side, when the connection is opened.  However, the connection
is kept open unless an error is raised.  Thus, the remote side will
continue to work within a single transaction block, even if the local
side iterates through pairs of "BEGIN" and "COMMIT".
I'd like to suggest closing the transaction block at the end of the
scan, transaction or sub-transaction.

[Comment to Improve]
Also, which transaction isolation level should be specified in this case?
One idea is to choose the isolation level according to the current
isolation level on the local side.
(Of course, it is your choice if it is not supported right now.)

[Comment to improve]
It seems to me the design of exprFunction is not future-proof: if we add
a new node type that contains two or more function calls, it can still
return only one function OID.
I think the right design is to handle individual node types within the
large switch statement in foreign_expr_walker().
Of course, it is just my sense.

[Comment to improve]
The pgsql_fdw_handler() allocates an FdwRoutine using makeNode(), then
sets function pointers on each field.
Why don't you return a pointer to a statically declared FdwRoutine
variable initialized at compile time, like:
   static FdwRoutine pgsql_fdw_handler = {
       .type               = T_FdwRoutine,
       .PlanForeignScan    = pgsqlPlanForeignScan,
       .ExplainForeignScan = pgsqlExplainForeignScan,
       .BeginForeignScan   = pgsqlBeginForeignScan,
       .IterateForeignScan = pgsqlIterateForeignScan,
       .ReScanForeignScan  = pgsqlReScanForeignScan,
       .EndForeignScan     = pgsqlEndForeignScan,
   };

   Datum
   pgsql_fdw_handler(PG_FUNCTION_ARGS)
   {
       PG_RETURN_POINTER(&pgsql_fdw_handler);
   }

[Question to implementation]
pgsqlIterateForeignScan() applies a null check on festate->tuples
and a boolean check on festate->cursor_opened.
Is there a possible scenario where festate->tuples is not null but
festate->cursor_opened is false, or the opposite combination?
If the null check on festate->tuples is enough to detect the first call
of the iterate callback, I'd prefer not to have a redundant flag.

Thanks,

On December 14, 2011 at 15:02, Shigeru Hanada <shigeru.hanada@gmail.com> wrote:
> (2011/12/13 14:46), Tom Lane wrote:
>> Shigeru Hanada<shigeru.hanada@gmail.com>  writes:
>>> Agreed.  How about to add a per-column boolean FDW option, say
>>> "pushdown", to pgsql_fdw?  Users can tell pgsql_fdw that the column can
>>> be pushed down safely by setting this option to true.
>>
>> [ itch... ] That doesn't seem like the right level of granularity.
>> ISTM the problem is with whether specific operators have the same
>> meaning at the far end as they do locally.  If you try to attach the
>> flag to columns, you have to promise that *every* operator on that
>> column means what it does locally, which is likely to not be the
>> case ever if you look hard enough.  Plus, having to set the flag on
>> each individual column of the same datatype seems pretty tedious.
>
> Indeed, I too think that labeling on each columns is not the best way,
> but at that time I thought that it's a practical way, in a way.  IOW, I
> chose per-column FDW options as a compromise between never-push-down and
> indiscriminate-push-down.
>
> Anyway, ISTM that we should consider various mapping for
> functions, operators and collations to support push-down in general
> way, but it would be hard to accomplish in this CF.
>
> Here I'd like to propose three incremental patches:
>
> 1) fdw_helper_funcs_v3.patch:  This is not specific to pgsql_fdw, but
> probably useful for every FDWs which use FDW options.  This patch
> provides some functions which help retrieving FDW options from catalogs.
>  This patch also enhances document about existing FDW helper functions.
>
> 2) pgsql_fdw_v5.patch:  This patch provides simple pgsql_fdw
> which does *NOT* support any push-down.  All data in remote table are
> retrieved for each foreign scan, and conditions are always evaluated on
> local side.  This is safe about semantics difference between local and
> remote, but very inefficient especially for large remote tables.
>
> 3) pgsql_fdw_pushdown_v1.patch:  This patch adds limited push-down
> capability to pgsql_fdw which is implemented by previous patch.  The
> criteria for pushing down is little complex.  I modified pgsql_fdw to
> *NOT* push down conditions which contain any of:
>
>  a) expression whose result collation is valid
>  b) expression whose input collation is valid
>  c) expression whose result type is user-defined
>  d) expression which uses user-defined function
>  e) array expression whose elements has user-defined type
>  f) expression which uses user-defined operator
>  g) expression which uses mutable function
>
> As the result, pgsql_fdw can push down very limited conditions such as
> numeric comparisons, but it would be still useful.  I hope that these
> restriction are enough to avoid problems about semantics difference
> between remote and local.
>
> To implement d), I added exprFunction to nodefuncs.c which returns Oid
> of function which is used in the expression node, but I'm not sure that
> it should be there.  Should we have it inside pgsql_fdw?
>
> I'd like to thank everyone who commented on this topic!
>
> Regards,
> --
> Shigeru Hanada



-- 
KaiGai Kohei <kaigai@kaigai.gr.jp>


Re: pgsql_fdw, FDW for PostgreSQL server

From
Robert Haas
Date:
2012/1/29 Kohei KaiGai <kaigai@kaigai.gr.jp>:
>     Remote SQL: DECLARE pgsql_fdw_cursor_0 SCROLL CURSOR FOR SELECT
> a, b FROM public.t1 WHERE (a > 2)
>  (3 rows)

Shouldn't we be using protocol-level cursors rather than SQL-level cursors?

> [Design comment]
> When "BEGIN" should be issued on the remote-side?
> The connect_pg_server() is an only chance to issue "BEGIN" command
> at the remote-side on connection being opened. However, the connection
> shall be kept unless an error is not raised. Thus, the remote-side will
> continue to work within a single transaction block, even if local-side iterates
> a pair of "BEGIN" and "COMMIT".
> I'd like to suggest to close the transaction block at the timing of either
> end of the scan, transaction or sub-transaction.

I suspect this is ultimately going to need to be configurable.  Some
people might want to close the transaction on the remote side ASAP,
while other people might want to hold it open until commit.  For a
first version I think it's most likely best to do whatever seems
simplest to code, planning to add more options later.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


Re: pgsql_fdw, FDW for PostgreSQL server

From
Shigeru Hanada
Date:
(2012/01/30 4:39), Kohei KaiGai wrote:
> I checked the "fdw_helper_funcs_v3.patch", "pgsql_fdw_v5.patch" and
> "pgsql_fdw_pushdown_v1.patch". My comments are below.

Thanks for the review!

> [BUG]
> Even though pgsql_fdw tries to push-down qualifiers being executable
> on the remove side at the deparseSql(), it does not remove qualifiers
> being pushed down from the baserel->baserestrictinfo, thus, these
> qualifiers are eventually executed twice.
>
> See the result of this EXPLAIN.
>    postgres=# EXPLAIN SELECT * FROM ft1 WHERE a > 2 AND f_leak(b);
>                                                  QUERY PLAN
>    ------------------------------------------------------------------------------------------------------
>     Foreign Scan on ft1  (cost=107.43..122.55 rows=410 width=36)
>       Filter: (f_leak(b) AND (a > 2))
>       Remote SQL: DECLARE pgsql_fdw_cursor_0 SCROLL CURSOR FOR SELECT
> a, b FROM public.t1 WHERE (a > 2)
>    (3 rows)
>
> My expectation is (a > 2) being executed on the remote-side and f_leak(b)
> being executed on the local-side. But, filter of foreign-scan on ft1 has both
> of qualifiers. It has to be removed, if a RestrictInfo get pushed-down.

It's intentional that pgsql_fdw keeps pushed-down qualifiers in
baserestrictinfo; I saw some undesirable behavior when I implemented
that optimization, something related to plan reuse, but the details are
not clear to me now.  I'll try to recall what I saw...

BTW, I think evaluating pushed-down qualifiers again on the local side
is safe and has no semantic problem, though we pay a little overhead for
it.  Do you have concerns about performance?

> [Design comment]
> I'm not sure the reason why store_result() uses MessageContext to save
> the Tuplestorestate within PgsqlFdwExecutionState.
> The source code comment says it is used to avoid memory leaks in error
> cases. I also have a similar experience on implementation of my fdw module,
> so, I could understand per-scan context is already cleared at the timing of
> resource-release-callback, thus, handlers to external resource have to be
> saved on separated memory context.
> In my personal opinion, the right design is to declare a memory context for
> pgsql_fdw itself, instead of the abuse of existing memory context.
> (More wise design is to define sub-memory-context for each foreign-scan,
> then, remove the sub-memory-context after release handlers.)

I simply chose a built-in context which has a long enough lifespan, but
now I think that using MessageContext directly is not the recommended
way.  As you say, creating a new context as a child of MessageContext
for each scan in BeginForeignScan (or the first IterateForeignScan)
would be better.  Please see the attached patch.

One other option is getting rid of the tuplestore by holding result rows
as a PGresult, and tracking it for error cases that might happen.
A ResourceOwner callback could be used to release the PGresult on error,
similar to the PGconn.

> [Design comment]
> When "BEGIN" should be issued on the remote-side?
> The connect_pg_server() is an only chance to issue "BEGIN" command
> at the remote-side on connection being opened. However, the connection
> shall be kept unless an error is not raised. Thus, the remote-side will
> continue to work within a single transaction block, even if local-side iterates
> a pair of "BEGIN" and "COMMIT".
> I'd like to suggest to close the transaction block at the timing of either
> end of the scan, transaction or sub-transaction.

Indeed, remote transactions should be terminated at some point.
Terminating at the end of a scan seems troublesome because a connection
might be shared by multiple scans in a query.  I'd prefer aborting the
remote transaction at the end of the local query.  Please see
abort_remote_tx in the attached patch.
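The intended lifecycle — lazily open a remote transaction shared by all
scans in the query, and abort it once at the end of the local query —
can be modeled as below.  The function names follow the patch's
start_remote_tx/abort_remote_tx, but the bodies here merely record the
commands that would be sent; this is a sketch of the rule, not the
patch's code.

```c
#include <stdbool.h>
#include <string.h>
#include <assert.h>

/* Toy model of one cached connection to a foreign server. */
typedef struct PgsqlFdwConn
{
    bool tx_open;
    char log[256];      /* commands "sent" to the remote, for illustration */
} PgsqlFdwConn;

static void
send_remote(PgsqlFdwConn *c, const char *sql)
{
    strcat(c->log, sql);
    strcat(c->log, ";");
}

/* Called by every scan that uses the connection; only the first one in
 * a local query actually opens the remote transaction. */
static void
start_remote_tx(PgsqlFdwConn *c)
{
    if (!c->tx_open)
    {
        send_remote(c, "BEGIN");
        c->tx_open = true;
    }
}

/* Called once, at the end of the local query. */
static void
abort_remote_tx(PgsqlFdwConn *c)
{
    if (c->tx_open)
    {
        send_remote(c, "ABORT");
        c->tx_open = false;
    }
}
```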

> [Comment to Improve]
> Also, which transaction isolation level should be specified in this case?
> An idea is its isolation level is specified according to the current isolation
> level on the local-side.
> (Of course, it is your choice if it is not supported right now.)

Choosing the same level as the local side seems better.  Please see the
start_remote_tx function in the attached patch.
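A sketch of matching the remote isolation level to the local one.  The
enum is a stand-in for PostgreSQL's local isolation-level setting, and
the exact command text the patch sends is an assumption here, not taken
from the patch.

```c
#include <string.h>
#include <assert.h>

/* Stand-in for the local transaction isolation level. */
typedef enum
{
    ISO_READ_COMMITTED,
    ISO_REPEATABLE_READ,
    ISO_SERIALIZABLE
} IsoLevel;

/* Choose the command used to open the remote transaction so that the
 * remote isolation level mirrors the local one. */
static const char *
remote_begin_command(IsoLevel local_level)
{
    switch (local_level)
    {
        case ISO_SERIALIZABLE:
            return "START TRANSACTION ISOLATION LEVEL SERIALIZABLE";
        case ISO_REPEATABLE_READ:
            return "START TRANSACTION ISOLATION LEVEL REPEATABLE READ";
        default:
            return "START TRANSACTION ISOLATION LEVEL READ COMMITTED";
    }
}
```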

> [Comment to improve]
> It seems to me the design of exprFunction is not future-proof, if we add
> a new node type that contains two or more function calls, because it can
> return an oid of functions.
> I think, the right design is to handle individual node-types within the
> large switch statement at foreign_expr_walker().
> Of course, it is just my sense.

You mean that exprFunction should be able to handle multiple OIDs for
one node, maybe returning a List of OIDs or something?  IMO that's
overkill at this time.
Though I'm not sure it's a fair comparison, exprInputCollation likewise
seems not to assume that multiple input collations might be stored in
one node.

> [Comment to improve]
> The pgsql_fdw_handler() allocates FdwRoutine using makeNode(),
> then it set function-pointers on each fields.
> Why don't you return a pointer to statically declared FdwRoutine
> variable being initialized at compile time, like:
>
>    static FdwRoutine pgsql_fdw_handler = {
>        .type               = T_FdwRoutine,
>        .PlanForeignScan    = pgsqlPlanForeignScan,
>        .ExplainForeignScan = pgsqlExplainForeignScan,
>        .BeginForeignScan   = pgsqlBeginForeignScan,
>        .IterateForeignScan = pgsqlIterateForeignScan,
>        .ReScanForeignScan  = pgsqlReScanForeignScan,
>        .EndForeignScan     = pgsqlEndForeignScan,
>    };
>
>    Datum
>    pgsql_fdw_handler(PG_FUNCTION_ARGS)
>    {
>          PG_RETURN_POINTER(&pgsql_fdw_handler);
>    }

Fixed to use a static variable, but without designated initializers,
because they are a C99 feature.  Is there any written policy about using
C99 features in PG development?  I've read the Developer's FAQ (wiki)
and the code formatting section of the documentation, but I couldn't
find any mention.

> [Question to implementation]
> At pgsqlIterateForeignScan(), it applies null-check on festate->tuples
> and bool-checks on festete->cursor_opened.
> Do we have a possible scenario that festate->tuples is not null, but
> festate->cursor_opened, or an opposite combination?
> If null-check on festate->tuples is enough to detect the first call of
> the iterate callback, it is not my preference to have redundant flag.

[ checking... ] No such scenario.  It's an unwanted remnant of obsolete
support for a simple SELECT mode; pgsql_fdw once had two fetching modes:
simple SELECT mode for small results and CURSOR mode for huge results.
I've removed cursor_opened from the PgsqlFdwExecutionState structure.

[Extra change]
In addition to your comments, I found that some regression tests fail
because the current planner produces different plan trees from the
expected results, and fixed those tests.

Regards,
--
Shigeru Hanada

Attachment

Re: pgsql_fdw, FDW for PostgreSQL server

From
Shigeru Hanada
Date:
(2012/02/01 3:56), Robert Haas wrote:
> 2012/1/29 Kohei KaiGai<kaigai@kaigai.gr.jp>:
>>      Remote SQL: DECLARE pgsql_fdw_cursor_0 SCROLL CURSOR FOR SELECT
>> a, b FROM public.t1 WHERE (a > 2)
>>   (3 rows)
> 
> Shouldn't we be using protocol-level cursors rather than SQL-level cursors?

Yes, we should, if we had a protocol-level cursor :)
I checked the libpq interface but I couldn't find any function for
protocol-level cursors.

>> [Design comment]
>> When "BEGIN" should be issued on the remote-side?
>> The connect_pg_server() is an only chance to issue "BEGIN" command
>> at the remote-side on connection being opened. However, the connection
>> shall be kept unless an error is not raised. Thus, the remote-side will
>> continue to work within a single transaction block, even if local-side iterates
>> a pair of "BEGIN" and "COMMIT".
>> I'd like to suggest to close the transaction block at the timing of either
>> end of the scan, transaction or sub-transaction.
> 
> I suspect this is ultimately going to need to be configurable.  Some
> people might want to close the transaction on the remote side ASAP,
> while other people might want to hold it open until commit.  For a
> first version I think it's most likely best to do whatever seems
> simplest to code, planning to add more options later.

I fixed pgsql_fdw to abort the remote transaction at the end of each
local query.  I chose this timing because a local query might include
multiple scans on the same foreign server.  I think this is the "ASAP"
timing in your comment.

It would be useful to make the lifetime of the remote transaction the
same as the local one's.  I'll try RegisterXactCallback for this
purpose, though we'd need to preload the FDW module to catch a BEGIN
that precedes any query using foreign tables.
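The RegisterXactCallback idea can be illustrated with a toy registration
mechanism.  The real backend API is RegisterXactCallback() in xact.c
with richer XactEvent values; everything below is a simplified
stand-in to show the flow, and mirroring a local COMMIT on the remote
side would additionally raise atomicity questions that a real
implementation must address.

```c
#include <string.h>
#include <stddef.h>
#include <assert.h>

/* Simplified stand-ins for the backend's XactEvent/XactCallback types. */
typedef enum { XACT_EVENT_COMMIT, XACT_EVENT_ABORT } XactEvent;
typedef void (*XactCallback) (XactEvent event, void *arg);

static XactCallback registered_cb;
static void *registered_arg;

/* The FDW registers its callback once at load time. */
static void
register_xact_callback(XactCallback cb, void *arg)
{
    registered_cb = cb;
    registered_arg = arg;
}

/* Stands in for core invoking callbacks at local transaction end. */
static void
local_xact_end(XactEvent ev)
{
    if (registered_cb)
        registered_cb(ev, registered_arg);
}

/* The FDW's callback: mirror the local outcome on the remote side.
 * Here it only records the command that would be sent. */
static void
fdw_xact_cb(XactEvent ev, void *arg)
{
    const char **sent = (const char **) arg;

    *sent = (ev == XACT_EVENT_COMMIT) ? "COMMIT" : "ABORT";
}
```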

-- 
Shigeru Hanada


Re: pgsql_fdw, FDW for PostgreSQL server

From
Marko Kreen
Date:
On Tue, Jan 31, 2012 at 8:56 PM, Robert Haas <robertmhaas@gmail.com> wrote:
> 2012/1/29 Kohei KaiGai <kaigai@kaigai.gr.jp>:
>>     Remote SQL: DECLARE pgsql_fdw_cursor_0 SCROLL CURSOR FOR SELECT
>> a, b FROM public.t1 WHERE (a > 2)
>>  (3 rows)
>
> Shouldn't we be using protocol-level cursors rather than SQL-level cursors?

I think you want this instead:
 https://commitfest.postgresql.org/action/patch_view?id=769

--
marko


Re: pgsql_fdw, FDW for PostgreSQL server

From
Kohei KaiGai
Date:
On February 1, 2012 at 12:15, Shigeru Hanada <shigeru.hanada@gmail.com> wrote:
> (2012/01/30 4:39), Kohei KaiGai wrote:
>> I checked the "fdw_helper_funcs_v3.patch", "pgsql_fdw_v5.patch" and
>> "pgsql_fdw_pushdown_v1.patch". My comments are below.
>
> Thanks for the review!
>
>> [BUG]
>> Even though pgsql_fdw tries to push-down qualifiers being executable
>> on the remove side at the deparseSql(), it does not remove qualifiers
>> being pushed down from the baserel->baserestrictinfo, thus, these
>> qualifiers are eventually executed twice.
>>
>> See the result of this EXPLAIN.
>>    postgres=# EXPLAIN SELECT * FROM ft1 WHERE a > 2 AND f_leak(b);
>>                                                  QUERY PLAN
>>    ------------------------------------------------------------------------------------------------------
>>     Foreign Scan on ft1  (cost=107.43..122.55 rows=410 width=36)
>>       Filter: (f_leak(b) AND (a > 2))
>>       Remote SQL: DECLARE pgsql_fdw_cursor_0 SCROLL CURSOR FOR SELECT
>> a, b FROM public.t1 WHERE (a > 2)
>>    (3 rows)
>>
>> My expectation is (a > 2) being executed on the remote-side and f_leak(b)
>> being executed on the local-side. But, filter of foreign-scan on ft1 has both
>> of qualifiers. It has to be removed, if a RestrictInfo get pushed-down.
>
> It's intentional that pgsql_fdw keeps pushed-down qualifier in
> baserestrictinfo, because I saw some undesirable behavior when I
> implemented so that with such optimization when plan is reused, but it's
> not clear to me now.  I'll try to recall what I saw...
>
> BTW, I think evaluating pushed-down qualifiers again on local side is
> safe and has no semantic problem, though we must pay little for such
> overhead.  Do you have concern about performance?
>
Yes. In my opinion, one significant benefit of pgsql_fdw is executing
qualifiers on the distributed nodes; that enables utilizing multiple
CPU resources efficiently.
Duplicate checks are indeed a reliable way to keep invisible tuples
filtered out, but they degrade one of the competitive characteristics of
pgsql_fdw.

https://github.com/kaigai/pg_strom/blob/master/plan.c#L693
In my module, qualifiers executable on the device side are detached
from baserel->baserestrictinfo, and the remaining qualifiers are chained
to the list.
is_device_executable_qual() is equivalent to is_foreign_expr() in
the pgsql_fdw module.

Of course, it is your decision, and I might miss something.

BTW, what was the undesirable behavior in your previous implementation?

>> [Design comment]
>> I'm not sure the reason why store_result() uses MessageContext to save
>> the Tuplestorestate within PgsqlFdwExecutionState.
>> The source code comment says it is used to avoid memory leaks in error
>> cases. I also have a similar experience on implementation of my fdw module,
>> so, I could understand per-scan context is already cleared at the timing of
>> resource-release-callback, thus, handlers to external resource have to be
>> saved on separated memory context.
>> In my personal opinion, the right design is to declare a memory context for
>> pgsql_fdw itself, instead of the abuse of existing memory context.
>> (More wise design is to define sub-memory-context for each foreign-scan,
>> then, remove the sub-memory-context after release handlers.)
>
> I simply chose built-in context which has enough lifespan, but now I
> think that using MessageContext directly is not recommended way.  As you
> say, creating new context as child of MessageContext for each scan in
> BeginForeignScan (or first IterateForeignScan) would be better.  Please
> see attached patch.
>
> One other option is getting rid of tuplestore by holding result rows as
> PGresult, and track it for error cases which might happen.
> ResourceOwner callback can be used to release PGresult on error, similar
> to PGconn.
>
If we could keep the result set in the per-query memory context (thus,
no need for explicit release on error), that would be a more ideal
design.  Is it possible to implement that based on the libpq APIs?

Please note that the per-query memory context has already been released
by the time the ResourceOwner callback is launched, so this is not
workable if libpq requires releasing some resource.

>> [Design comment]
>> When "BEGIN" should be issued on the remote-side?
>> The connect_pg_server() is an only chance to issue "BEGIN" command
>> at the remote-side on connection being opened. However, the connection
>> shall be kept unless an error is not raised. Thus, the remote-side will
>> continue to work within a single transaction block, even if local-side iterates
>> a pair of "BEGIN" and "COMMIT".
>> I'd like to suggest to close the transaction block at the timing of either
>> end of the scan, transaction or sub-transaction.
>
> Indeed, remote transactions should be terminated at some timing.
> Terminating at the end of a scan seems troublesome because a connection
> might be shared by multiple scans in a query.  I'd prefer aborting
> remote transaction at the end of local query.  Please see
> abort_remote_tx in attached patch.
>
It seems to me that abort_remote_tx in ReleaseConnection() is reasonable.
However, isn't an ABORT also needed in GetConnection() the first time?

>> [Comment to Improve]
>> Also, which transaction isolation level should be specified in this case?
>> An idea is its isolation level is specified according to the current isolation
>> level on the local-side.
>> (Of course, it is your choice if it is not supported right now.)
>
> Choosing same as local seems better.  Please see start_remote_tx
> function in attached patch.
>
It seems to me reasonable.

>> [Comment to improve]
>> It seems to me the design of exprFunction is not future-proof, if we add
>> a new node type that contains two or more function calls, because it can
>> return an oid of functions.
>> I think, the right design is to handle individual node-types within the
>> large switch statement at foreign_expr_walker().
>> Of course, it is just my sense.
>
> You mean that exprFunction should have capability to handle multiple
> Oids for one node, maybe return List<oid> or something?  IMO it's
> overkill at this time.
> Though I'm not sure that it's reasonable, but exprInputCollation too
> seems to not assume that multiple input collation might be stored in one
> node.
>
I'm still skeptical of the current logic for determining whether a
qualifier can be pushed down or not.

For example, for an OpExpr it checks both the OID of the operator and
the OID of the function invoked by the operator.
However, the purpose of this check is to ensure the same result when
the qualifier is executed, so it restricts push-down to qualifiers
consisting of built-in database objects.
So, isn't it enough to check whether the OID of the OpExpr is built-in
or not?
(If the OID of an operator is built-in, its function is also built-in,
right?)
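The simpler check suggested here — treat an object as push-down safe
when its OID falls in the built-in range — is essentially a one-liner.
In PostgreSQL, OIDs below FirstNormalObjectId (16384, defined in
access/transam.h) are reserved for built-in objects; the constant is
duplicated below only so the sketch is self-contained.

```c
#include <stdbool.h>
#include <assert.h>

typedef unsigned int Oid;

/* Duplicated from access/transam.h for this self-contained sketch:
 * OIDs at or above this value are assigned to user-defined objects. */
#define FirstNormalObjectId 16384

/* True if the OID belongs to a built-in (initdb-time) object. */
static bool
is_builtin(Oid oid)
{
    return oid < FirstNormalObjectId;
}
```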

>> [Comment to improve]
>> The pgsql_fdw_handler() allocates FdwRoutine using makeNode(),
>> then it set function-pointers on each fields.
>> Why don't you return a pointer to statically declared FdwRoutine
>> variable being initialized at compile time, like:
>>
>>    static FdwRoutine pgsql_fdw_handler = {
>>        .type               = T_FdwRoutine,
>>        .PlanForeignScan    = pgsqlPlanForeignScan,
>>        .ExplainForeignScan = pgsqlExplainForeignScan,
>>        .BeginForeignScan   = pgsqlBeginForeignScan,
>>        .IterateForeignScan = pgsqlIterateForeignScan,
>>        .ReScanForeignScan  = pgsqlReScanForeignScan,
>>        .EndForeignScan     = pgsqlEndForeignScan,
>>    };
>>
>>    Datum
>>    pgsql_fdw_handler(PG_FUNCTION_ARGS)
>>    {
>>          PG_RETURN_POINTER(&pgsql_fdw_handler);
>>    }
>
> Fixed to static variable without designated initializers, because it's
> one of C99 feature.  Is there any written policy about using C99
> features in PG development?  I've read Developer's FAQ (wiki) and code
> formatting section of document, but I couldn't find any mention.
>
OK. It was my wrong suggestion.


>> [Question to implementation]
>> At pgsqlIterateForeignScan(), it applies null-check on festate->tuples
>> and bool-checks on festete->cursor_opened.
>> Do we have a possible scenario that festate->tuples is not null, but
>> festate->cursor_opened, or an opposite combination?
>> If null-check on festate->tuples is enough to detect the first call of
>> the iterate callback, it is not my preference to have redundant flag.
>
> [ checking... ] No such scenario.  It's undesired remain of obsolete
> support of simple SELECT mode; pgsql_fdw once had two fetching mode;
> simple SELECT mode for small result and CURSOR mode for huge result.
> I've removed cursor_opened from PgsqlFdwExecutionState structure.
>
OK, I understood the background of this design.

> [Extra change]
> In addition to your comments, I found that some regression tests fail
> because the current planner produces a different plan tree from the expected
> results, and fixed those tests.
>

Thanks,
-- 
KaiGai Kohei <kaigai@kaigai.gr.jp>


Re: pgsql_fdw, FDW for PostgreSQL server

From
Shigeru Hanada
Date:
2012/2/2 Marko Kreen <markokr@gmail.com>:
> I think you want this instead:
>
>   https://commitfest.postgresql.org/action/patch_view?id=769

Somehow I've missed this cool feature.  Thanks for the suggestion!

-- 
Shigeru Hanada



Re: pgsql_fdw, FDW for PostgreSQL server

From
Shigeru Hanada
Date:
Thanks for the comments!

(2012/02/06 5:08), Kohei KaiGai wrote:
> Yes. In my opinion, one significant benefit of pgsql_fdw is to execute
> qualifiers on the distributed nodes; that enables utilizing multiple
> CPU resources efficiently.
> Duplicate checks are a reliable way to keep invisible tuples filtered
> out, indeed.  But they would degrade one of the competitive
> characteristics of pgsql_fdw.
> 
> https://github.com/kaigai/pg_strom/blob/master/plan.c#L693
> In my module, qualifiers that are executable on the device side are detached
> from baserel->baserestrictinfo, and the remaining qualifiers are chained
> back to the list.
> The is_device_executable_qual() is equivalent to is_foreign_expr() in
> the pgsql_fdw module.

Agreed, I too think that pushed-down qualifiers should not be evaluated
on the local side again, from the viewpoint of performance.

> Of course, it is your decision, and I might miss something.
> 
> BTW, what is the undesirable behavior on your previous implementation?

In early development, maybe while testing PREPARE/EXECUTE or DECLARE, I
saw that iterated execution of a foreign scan produced wrong results which
included rows that did NOT match the pushed-down qualifiers.  And, at
last, I could recall what happened at that time.  It was just a trivial
bug I made.  Apparently I had removed the pushed-down qualifiers in the
Path generation phase, so the generated plan node lost the qualifiers permanently.

In short, I'll remove remote qualifiers from baserestrictinfo, like
pg_strom does.

>>> [Design comment]
>>> I'm not sure the reason why store_result() uses MessageContext to save
>>> the Tuplestorestate within PgsqlFdwExecutionState.
>>> The source code comment says it is used to avoid memory leaks in error
>>> cases. I also have a similar experience on implementation of my fdw module,
>>> so, I could understand per-scan context is already cleared at the timing of
>>> resource-release-callback, thus, handlers to external resource have to be
>>> saved on separated memory context.
>>> In my personal opinion, the right design is to declare a memory context for
>>> pgsql_fdw itself, instead of the abuse of existing memory context.
>>> (More wise design is to define sub-memory-context for each foreign-scan,
>>> then, remove the sub-memory-context after release handlers.)
>>
>> I simply chose a built-in context which has a sufficient lifespan, but now I
>> think that using MessageContext directly is not a recommended way.  As you
>> say, creating a new context as a child of MessageContext for each scan in
>> BeginForeignScan (or the first IterateForeignScan) would be better.  Please
>> see the attached patch.
>>
>> One other option is getting rid of the tuplestore by holding result rows as
>> a PGresult, and tracking it for error cases which might happen.
>> A ResourceOwner callback can be used to release the PGresult on error,
>> similar to PGconn.
>>
> If we could have the set of results in a per-query memory context (thus,
> no need for explicit release on error), it would be a more ideal design.
> Is it possible to implement this based on the libpq APIs?

Currently no, so I used a tuplestore even though it requires copying
results.  However, Kyotaro Horiguchi's patch might make it possible.
I'm reading his patch to determine whether it suits pgsql_fdw.

http://archives.postgresql.org/message-id/20120202143057.GA12434@gmail.com

> Please note that the per-query memory context is already released when the
> ResourceOwner callback is launched, so it is unavailable to implement
> if libpq requires releasing some resource.

I see.  We need to use a context which has a longer lifetime if we want to
track malloc'ed PGresults.  I already use CacheMemoryContext for connection
pooling, so linking PGresults to their source connection would be a solution.

>>> [Design comment]
>>> When should "BEGIN" be issued on the remote side?
>>> connect_pg_server() is the only chance to issue a "BEGIN" command
>>> on the remote side, when the connection is opened.  However, the connection
>>> is kept unless an error is raised.  Thus, the remote side will
>>> continue to work within a single transaction block, even if the local side
>>> iterates pairs of "BEGIN" and "COMMIT".
>>> I'd like to suggest closing the transaction block at the end of either
>>> the scan, the transaction, or the sub-transaction.
>>
>> Indeed, remote transactions should be terminated at some point.
>> Terminating at the end of a scan seems troublesome because a connection
>> might be shared by multiple scans in a query.  I'd prefer aborting the
>> remote transaction at the end of the local query.  Please see
>> abort_remote_tx in the attached patch.
>>
> It seems to me abort_remote_tx in ReleaseConnection() is reasonable.
> However, isn't an ABORT also needed in GetConnection() the first time?

Hm, forcing the overhead of aborting a transaction onto all local queries is
unreasonable.  A redundant BEGIN doesn't cause an error but just generates a
WARNING, so I'll remove the abort_remote_tx preceding begin_remote_tx.

>>> [Comment to improve]
>>> It seems to me the design of exprFunction is not future-proof: if we add
>>> a new node type that contains two or more function calls, it can still
>>> return only one function oid.
>>> I think the right design is to handle individual node types within the
>>> large switch statement in foreign_expr_walker().
>>> Of course, it is just my sense.
>>
>> You mean that exprFunction should have the capability to handle multiple
>> Oids for one node, maybe returning List<oid>  or something?  IMO it's
>> overkill at this time.
>> Though I'm not sure that it's reasonable, exprInputCollation too
>> seems not to assume that multiple input collations might be stored in one
>> node.
>>
> I'm still skeptical about the current logic to determine whether a qualifier
> can be pushed down or not.
> 
> For example, it checks the oid of the operator and the oid of the function
> invoked by this operator in an OpExpr.
> However, the purpose of this check is to ensure the same result on
> execution of the qualifier, so it restricts pushed-down qualifiers to
> built-in database objects.
> So, isn't it enough to check whether the oid of the OpExpr is built-in or not?
> (If oid of operator is built-in, its function is also built-in. Right?)

Basically yes.  If a built-in operator has different semantics on the remote
side (probably because the pg_operator catalog has been changed manually by a
superuser), it can be said that nothing is reliable in the context of
pgsql_fdw.  I added the check of the operator's function oid just in case,
and now it seems to me a paranoid concern...

I'll post revised patches soon.

Regards,
-- 
Shigeru Hanada


Re: pgsql_fdw, FDW for PostgreSQL server

From
Shigeru Hanada
Date:
(2012/02/06 17:37), Shigeru Hanada wrote:
> I'll post revised patches soon.

Attached revised patches.  Changes from last version are below.

[fdw_helper_v3.patch]
none

[pgsql_fdw_v7.patch]
* Don't abort remote transaction before starting remote transaction.
* Add _PG_init/_PG_fini functions.
* Fix some typos in comments.
* Revise transaction management section in document.
* Update EXPLAIN sample in document.

[pgsql_fdw_]
* Avoid redundant evaluation of pushed-down quals.
* Fix some typos in comments.
* Update EXPLAIN sample in document.

Regards,
--
Shigeru Hanada

Attachment

Re: pgsql_fdw, FDW for PostgreSQL server

From
Shigeru Hanada
Date:
(2012/02/02 18:24), Marko Kreen wrote:
> I think you want this instead:
> 
>    https://commitfest.postgresql.org/action/patch_view?id=769

With a modified version of pgsql_fdw which uses the row processor to retrieve
result tuples, I found a significant performance gain on a simple read-only
pgbench run, though the scale factor was very small (-s 3).  The executed
command was "pgbench -S -n -c 5 -T 30".

Average tps (excluding connections establishing) over 3 measurements:

pgsql_fdw with SQL-cursor    :  622
pgsql_fdw with Row Processor : 1219 - 2.0x faster than SQL-cursor
w/o pgsql_fdw(direct access) : 7719 - 6.3x faster than Row Processor

Row processor was almost 2x faster than SQL-cursor!  I'm looking forward
to this feature.

In addition to the performance gain, memory usage was of course kept at a
very low level. :)

Regards,
-- 
Shigeru Hanada


Re: pgsql_fdw, FDW for PostgreSQL server

From
Shigeru Hanada
Date:
(2012/02/08 20:51), Shigeru Hanada wrote:
> Attached revised patches.  Changes from last version are below.
<snip>

I've found and fixed a bug which generated a wrong remote query when any
column of a foreign table had been dropped.  A regression test for
this case is also added.

I attached only pgsql_fdw_v8.patch, because pgsql_fdw_pushdown_v3.patch
from the last post can still be applied onto the v8 patch.

Regards,
--
Shigeru Hanada

Attachment

Re: pgsql_fdw, FDW for PostgreSQL server

From
Etsuro Fujita
Date:
(2012/02/10 20:39), Shigeru Hanada wrote:
> (2012/02/08 20:51), Shigeru Hanada wrote:
>> Attached revised patches.  Changes from last version are below.
> <snip>
>
> I've found and fixed a bug which generates wrong remote query when any
> column of a foreign table has been dropped.  Also regression test for
> this case is added.
>
> I attached only pgsql_fdw_v8.patch, because pgsql_fdw_pushdown_v3.patch
> in last post still can be applied onto v8 patch.
>
> Regards,

The patches have been applied, but role-related regression tests failed
in my environment.  I fixed it in a fashion similar to
/src/test/regress/sql/foreign_data.sql.  Please find attached an updated
patch for the regression tests.

BTW, What do you think about this?

http://archives.postgresql.org/pgsql-hackers/2012-01/msg00229.php

Best regards,
Etsuro Fujita

Attachment

Re: pgsql_fdw, FDW for PostgreSQL server

From
Shigeru Hanada
Date:
(2012/02/13 20:50), Etsuro Fujita wrote:
> The patches have been applied, but role-related regression tests failed
> in my environment.  I fixed it in a similar fashion of
> /src/test/regress/sql/foreign_data.sql.  Please find attached a updated
> patch for the regression tests.

Good catch, thanks.  I'll revise the pgsql_fdw tests a little more.

> BTW, What do you think about this?
> 
> http://archives.postgresql.org/pgsql-hackers/2012-01/msg00229.php

I'm sorry that I've left the thread unfinished...  I've given up on
proposing join push-down of foreign tables for 9.2, because it will take a
while to achieve general semantics mapping for join push-down and WHERE
clause push-down.  For 9.2, I'm proposing pgsql_fdw, which has WHERE
clause push-down for built-in elements which are free from collation.
I'd like to go back to that item after 9.2 development enters beta or
RC, hopefully :)

-- 
Shigeru Hanada


Re: pgsql_fdw, FDW for PostgreSQL server

From
Etsuro Fujita
Date:
(2012/02/14 15:15), Shigeru Hanada wrote:
> (2012/02/13 20:50), Etsuro Fujita wrote:
>> The patches have been applied, but role-related regression tests failed
>> in my environment.  I fixed it in a similar fashion of
>> /src/test/regress/sql/foreign_data.sql.  Please find attached a updated
>> patch for the regression tests.
> 
> Good catch, thanks.  I'll revise the pgsql_fdw tests a little more.
> 
>> BTW, What do you think about this?
>>
>> http://archives.postgresql.org/pgsql-hackers/2012-01/msg00229.php
> 
> I'm sorry that I've left the thread unfinished...  I've given up on
> proposing join push-down of foreign tables for 9.2, because it will take a
> while to achieve general semantics mapping for join push-down and WHERE
> clause push-down.  For 9.2, I'm proposing pgsql_fdw, which has WHERE
> clause push-down for built-in elements which are free from collation.
> I'd like to go back to that item after 9.2 development enters beta or
> RC, hopefully :)

OK.  But my question was about the PlanForeignScan API.  As discussed in
that thread, it would have to change the PlanForeignScan API to let the
FDW generate multiple paths and dump them all to add_path instead of
returning an FdwPlan struct.  With this change, I think it would also
have to add a new FDW API that is called from create_foreignscan_plan()
and lets the FDW generate the foreignscan plan for the base relation scanned
by the best path chosen by the postgres optimizer.  What do you
think about it?

Best regards,
Etsuro Fujita



Re: pgsql_fdw, FDW for PostgreSQL server

From
Shigeru Hanada
Date:
(2012/02/14 15:15), Shigeru Hanada wrote:
> Good catch, thanks.  I'll revise the pgsql_fdw tests a little more.

Here are the updated patches.  In addition to Fujita-san's comment, I
moved DROP OPERATOR statements to clean up section of test script.

Regards,
--
Shigeru Hanada

Attachment

Re: pgsql_fdw, FDW for PostgreSQL server

From
Shigeru Hanada
Date:
(2012/02/14 17:40), Etsuro Fujita wrote:
> OK.  But my question was about the PlanForeignScan API.

Sorry for misunderstanding. :(

>                                                          As discussed at
> that thread, it would have to change the PlanForeignScan API to let the
> FDW generate multiple paths and dump them all to add_path instead of
> returning a FdwPlan struct.  With this change, I think it would also
> have to add a new FDW API that is called from create_foreignscan_plan()
> and lets the FDW generate foreignscan plan for the base relation scanned
> by the best path chosen by the postgres optimizer.  What do you
> think about it?

Though I have only random thoughts about this issue at the moment...

There could be multiple valuable Paths for a scan of a foreign table by an
FDW, but changing PlanForeignScan to return a list of FdwPlans in 9.2 seems
too hasty.  It would need more consideration of a general interface for
possible results such as:

* Full output (no WHERE push-down) is expensive on both remote execution
and transfer.
* Filtered output (WHERE push-down) has a cheap total cost when only a few
rows come through the filter.
* Ordered output (ORDER BY push-down) is expensive on the remote side, but
has a chance to omit an upper Sort node.
* Aggregated output (GROUP BY push-down) is expensive on the remote side,
but has a chance to omit an upper Agg node, and reduces data transfer.
* Limited output (LIMIT/OFFSET push-down) can reduce data transfer, and
has a chance to omit an upper Limit node.

Currently FDWs can consider only the first two, AFAIK.  If an FDW generates
multiple FdwPlans (Full and Filtered) and sets different start-up costs
and total costs for them (maybe the former has a higher start-up cost and a
lower total cost than the latter), the planner would choose the better one
for the whole plan.

In addition to changing FdwRoutine, it seems worth changing FdwPlan too
so that FDWs can return more information to planner, such as pathkeys
and rows, for each possible path.

In short, I have some ideas to enhance foreign table scans, but IMO they
are half-baked and we don't have enough time to achieve them for 9.2.

Regards,
-- 
Shigeru Hanada


Re: pgsql_fdw, FDW for PostgreSQL server

From
Etsuro Fujita
Date:
(2012/02/14 19:42), Shigeru Hanada wrote:
> (2012/02/14 17:40), Etsuro Fujita wrote:
>>                                                           As discussed at
>> that thread, it would have to change the PlanForeignScan API to let the
>> FDW generate multiple paths and dump them all to add_path instead of
>> returning a FdwPlan struct.  With this change, I think it would also
>> have to add a new FDW API that is called from create_foreignscan_plan()
>> and lets the FDW generate foreignscan plan for the base relation scanned
>> by the best path chosen by the postgres optimizer.  What do you
>> think about it?

> In short, I have some ideas to enhance foreign table scans, but IMO they
> are half-baked and we don't have enough time to achieve them for 9.2.

OK.  Thank you for your answer.

Best regards,
Etsuro Fujita


Re: pgsql_fdw, FDW for PostgreSQL server

From
Tom Lane
Date:
Shigeru Hanada <shigeru.hanada@gmail.com> writes:
> (2012/02/14 17:40), Etsuro Fujita wrote:
>> As discussed at
>> that thread, it would have to change the PlanForeignScan API to let the
>> FDW generate multiple paths and dump them all to add_path instead of
>> returning a FdwPlan struct.

> Multiple valuable Paths for a scan of a foreign table by FDW, but
> changing PlanForeignScan to return list of FdwPlan in 9.2 seems too
> hasty.

I would really like to see that happen in 9.2, because the longer we let
that mistake live, the harder it will be to change.  More and more FDWs
are getting written.  I don't think it's that hard to do: we just have
to agree that PlanForeignScan should return void and call add_path for
itself, possibly more than once.  If we do that, I'm inclined to think
we could get rid of the separate Node type FdwPlan, and just incorporate
"List *fdw_private" into ForeignPath and ForeignScan.

This does mean that FDWs will be a bit more tightly coupled to the
planner, because they'll have to change whenever we add new fields to
struct Path; but that is not really something that happens often.
        regards, tom lane


Re: pgsql_fdw, FDW for PostgreSQL server

From
Etsuro Fujita
Date:
(2012/02/14 23:50), Tom Lane wrote:
> Shigeru Hanada<shigeru.hanada@gmail.com>  writes:
>> (2012/02/14 17:40), Etsuro Fujita wrote:
>>> As discussed at
>>> that thread, it would have to change the PlanForeignScan API to let the
>>> FDW generate multiple paths and dump them all to add_path instead of
>>> returning a FdwPlan struct.
>
>> Multiple valuable Paths for a scan of a foreign table by FDW, but
>> changing PlanForeignScan to return list of FdwPlan in 9.2 seems too
>> hasty.
>
> I would really like to see that happen in 9.2, because the longer we let
> that mistake live, the harder it will be to change.  More and more FDWs
> are getting written.  I don't think it's that hard to do: we just have
> to agree that PlanForeignScan should return void and call add_path for
> itself, possibly more than once.

Agreed.  I fixed the PlanForeignScan API.  Please find attached a patch.

> If we do that, I'm inclined to think
> we could get rid of the separate Node type FdwPlan, and just incorporate
> "List *fdw_private" into ForeignPath and ForeignScan.

+1.  While the patch retains the struct FdwPlan, I would like to get rid
of it in the next version of the patch.

Best regards,
Etsuro Fujita

Attachment

Re: pgsql_fdw, FDW for PostgreSQL server

From
Kohei KaiGai
Date:
Harada-san,

I checked the v9 patch; however, it still has some uncertain implementation details.

[memory context of tuple store]
It calls tuplestore_begin_heap() under the memory context of
festate->scan_cxt in pgsqlBeginForeignScan.
On the other hand, tuplestore_gettupleslot() is called under the
memory context of festate->tuples.
I could not find a callback function being invoked on errors,
so I suspect the memory objects acquired within tuplestore_begin_heap()
may be leaked, even though it was my suggestion to create a sub-context
under the existing one.

In my opinion, it is a good choice to use es_query_cxt of the supplied EState.
What prevents applying this per-query memory context?

You mention that PGresult is malloc()'ed.  However, it seems to me
fetch_result() and store_result() copy the contents of the malloc()'ed
area into the palloc()'ed area, and the PGresult is released on an error
using a PG_TRY() ... PG_CATCH() block.

[Minor comments]
Please set NULL to the "sql" variable in begin_remote_tx().
The compiler raises a warning due to references to an uninitialized variable,
even though that code path never runs.

It potentially causes a problem in the case when fetch_result() raises an
error because of an unexpected status (!= PGRES_TUPLES_OK).
One code path is not protected with PG_TRY(), and the other code path
will call PQclear on an already released PGresult.

Although it is just a preference of mine, is the exprFunction necessary?
It seems to me the point of the push-down check is whether the supplied
node is a built-in object or not.  So, a sufficient check is is_builtin() on
FuncExpr->funcid, OpExpr->opno, ScalarArrayOpExpr->opno and so on.
It does not depend on whether the functions implementing these nodes
are built-in or not.

Thanks,

2012年2月14日9:09 Shigeru Hanada <shigeru.hanada@gmail.com>:
> (2012/02/14 15:15), Shigeru Hanada wrote:
>> Good catch, thanks.  I'll revise pgsql_fdw tests little more.
>
> Here are the updated patches.  In addition to Fujita-san's comment, I
> moved DROP OPERATOR statements to clean up section of test script.
>
> Regards,
> --
> Shigeru Hanada



-- 
KaiGai Kohei <kaigai@kaigai.gr.jp>


Re: pgsql_fdw, FDW for PostgreSQL server

From
Shigeru Hanada
Date:
Kaigai-san,

Thanks for the review.  Attached patches are revised version, though
only fdw_helper_v5.patch is unchanged.

(2012/02/16 0:09), Kohei KaiGai wrote:
> [memory context of tuple store]
> It calls tuplestore_begin_heap() under the memory context of
> festate->scan_cxt at pgsqlBeginForeignScan.

Yes, it's because the tuplestore uses the context which was current when
tuplestore_begin_heap was called.  I want to use a per-scan context for
the tuplestore, to keep its content tuples alive through the scan.

> On the other hand, tuplestore_gettupleslot() is called under the
> memory context of festate->tuples.

Yes, result tuples to be returned to the executor should be allocated in
the per-scan context and live until the next IterateForeignScan (or
EndForeignScan), because such a tuple will be released via ExecClearTuple
in the next IterateForeignScan call.  If we don't switch to the per-scan
context, the result tuple is allocated in the per-tuple context and causes a
double-free and server crash.

> I could not find a callback functions being invoked on errors,
> so I doubt the memory objects acquired within tuplestore_begin_heap()
> shall be leaked, even though it is my suggestion to create a sub-context
> under the existing one.

How did you confirm that no callback function is invoked on errors?  I
think that memory objects acquired within tuplestore_begin_heap (I guess
you mean excluding stored tuples, right?) are released during cleanup of
the aborted transaction.  I tested that by adding elog(ERROR) to the tail of
store_result() for an intentional error, and executed a large query 100 times
in a session.  I saw the VIRT value (via the top command) come down to a
constant level after every query.

> In my opinion, it is a good choice to use es_query_cxt of the supplied EState.
> What does prevent to apply this per-query memory context?

Ah, I've confused the context management of pgsql_fdw...  I fixed pgsql_fdw
to create a per-scan context as a child of es_query_cxt in
BeginForeignScan, and use it for the tuplestore of the scan.  So, the
tuplestore and its contents are released correctly at EndForeignScan, or at
cleanup of the aborted transaction in the error case.

> You mention about PGresult being malloc()'ed. However, it seems to me
> fetch_result() and store_result() once copy the contents on malloc()'ed
> area to the palloc()'ed area, and PQresult is released on an error using
> PG_TRY() ... PG_CATCH() block.

While thinking about this comment, I found a double-free bug of PGresult
in execute_query, thanks :)

But, sorry, I'm not sure what the concern you raise here is.  The reason
for copying tuples from the malloc'ed area to a palloc'ed area is to release
the PGresult before returning from the IterateForeignScan call.  The reason
for using a PG_TRY block is to ensure that the PGresult is released before
jumping back upstream in the error case.

> [Minor comments]
> Please set NULL to "sql" variable at begin_remote_tx().
> Compiler raises a warning due to references to an uninitialized variable,
> even though the code path never run.

Fixed.  BTW, just out of curiosity, which compiler do you use?  My
compiler, gcc (GCC) 4.6.0 20110603 (Red Hat 4.6.0-10) on Fedora 15,
doesn't warn about it.

> It potentially causes a problem in case when fetch_result() raises an
> error because of unexpected status (!= PGRES_TUPLES_OK).
> One code path is not protected with PG_TRY(), and other code path
> will call PQclear towards already released PQresult.

Fixed.

> Although it is just a preference of mine, is the exprFunction necessary?
> It seems to me, the point of push-down check is whether the supplied
> node is built-in object, or not. So, an sufficient check is is_builtin() onto
> FuncExpr->funcid, OpExpr->opno, ScalarArrayOpExpr->opno and so on.
> It does not depend on whether the function implementing these nodes
> are built-in or not.

Got rid of exprFunction and fixed foreign_expr_walker to check the function
oid in each case label.

Regards,
--
Shigeru Hanada

Attachment

Re: pgsql_fdw, FDW for PostgreSQL server

From
"Albe Laurenz"
Date:
Shigeru Hanada wrote:
> Thanks for the review.  Attached patches are revised version, though
> only fdw_helper_v5.patch is unchanged.

Two questions:
- Is it on purpose that you can specify all SSL client options except "sslcompression"?
- Since a rescan is done by rewinding the cursor, is it necessary to have any
remote isolation level other than READ COMMITTED?  There is only one query
issued per transaction.

Yours,
Laurenz Albe



Re: pgsql_fdw, FDW for PostgreSQL server

From
Kohei KaiGai
Date:
I found a strange behavior with v10.  Can you reproduce it?

In the case where "ftbl" is declared as follows:

postgres=# select * FROM ftbl;
 a |  b
---+-----
 1 | aaa
 2 | bbb
 3 | ccc
 4 | ddd
 5 | eee
(5 rows)
 

I tried to raise an error on the remote side:

postgres=# select * FROM ftbl WHERE 100 / (a - 3) > 0;
The connection to the server was lost. Attempting reset: Failed.
The connection to the server was lost. Attempting reset: Failed.
!> \q
 

Its call-trace was:

(gdb) bt
#0  0x00000031030810a4 in free () from /lib64/libc.so.6
#1  0x00007f2caa620bd9 in PQclear (res=0x2102500) at fe-exec.c:679
#2  0x00007f2caa83c4db in execute_query (node=0x20f20a0) at pgsql_fdw.c:722
#3  0x00007f2caa83c64a in pgsqlIterateForeignScan (node=0x20f20a0) at pgsql_fdw.c:402
#4  0x00000000005c120f in ForeignNext (node=0x20f20a0) at nodeForeignscan.c:50
#5  0x00000000005a9b37 in ExecScanFetch (recheckMtd=0x5c11c0 <ForeignRecheck>, accessMtd=0x5c11d0 <ForeignNext>, node=0x20f20a0) at execScan.c:82
#6  ExecScan (node=0x20f20a0, accessMtd=0x5c11d0 <ForeignNext>, recheckMtd=0x5c11c0 <ForeignRecheck>) at execScan.c:132
#7  0x00000000005a2128 in ExecProcNode (node=0x20f20a0) at execProcnode.c:441
#8  0x000000000059edc2 in ExecutePlan (dest=0x210f280, direction=<optimized out>, numberTuples=0, sendTuples=1 '\001', operation=CMD_SELECT, planstate=0x20f20a0, estate=0x20f1f88) at execMain.c:1449
 

This is the PG_CATCH block in execute_query().  fetch_result() raises
an error, which should then be caught to release the PGresult.
Although "res" should be NULL at this point, PQclear was called with
a non-zero value according to the call trace.

More strangely, I tried to inject elog(INFO, ...) to show the value of "res"
at this point.  Then it became impossible to reproduce when I tried to
show the pointer of "res" with elog(INFO, "&res = %p", &res);

Why does "res" have a non-zero value, even though it was cleared prior
to fetch_result() and an error was raised within this function?

Thanks,

2012年2月16日13:41 Shigeru Hanada <shigeru.hanada@gmail.com>:
> Kaigai-san,
>
> Thanks for the review.  Attached patches are revised version, though
> only fdw_helper_v5.patch is unchanged.
>
> (2012/02/16 0:09), Kohei KaiGai wrote:
>> [memory context of tuple store]
>> It calls tuplestore_begin_heap() under the memory context of
>> festate->scan_cxt at pgsqlBeginForeignScan.
>
> Yes, it's because tuplestore uses a context which was current when
> tuplestore_begin_heap was called.  I want to use per-scan context for
> tuplestore, to keep its content tuples alive through the scan.
>
>> On the other hand, tuplestore_gettupleslot() is called under the
>> memory context of festate->tuples.
>
> Yes, result tuples to be returned to executor should be allocated in
> per-scan context and live until next IterateForeignScan (or
> EndForeignScan),  because such tuple will be released via ExecClearTuple
> in next IterateForeignScan call.  If we don't switch context to per-scan
> context, result tuple is allocated in per-tuple context and cause
> double-free and server crash.
>
>> I could not find a callback functions being invoked on errors,
>> so I doubt the memory objects acquired within tuplestore_begin_heap()
>> shall be leaked, even though it is my suggestion to create a sub-context
>> under the existing one.
>
> How did you confirm that no callback function is invoked on errors?  I
> think that memory objects acquired within tuplestore_begin_heap (I guess
> you mean excluding stored tuples, right?) are released during cleanup of
> aborted transaction.  I tested that by adding elog(ERROR) to the tail of
> store_result() for intentional error, and execute large query 100 times
> in a session.  I saw VIRT value (via top command) comes down to constant
> level after every query.
>
>> In my opinion, it is a good choice to use es_query_cxt of the supplied EState.
>> What does prevent to apply this per-query memory context?
>
> Ah, I've confused context management of pgsql_fdw...  I fixed pgsql_fdw
> to create per-scan context as a child of es_query_cxt in
> BeginForeignScan, and use it for tuplestore of the scan.  So, tuplestore
> and its contents are released correctly at EndForeignScan, or cleanup of
> aborted transaction in error case.
>
>> You mention about PGresult being malloc()'ed. However, it seems to me
>> fetch_result() and store_result() once copy the contents on malloc()'ed
>> area to the palloc()'ed area, and PQresult is released on an error using
>> PG_TRY() ... PG_CATCH() block.
>
> During thinking about this comment, I found double-free bug of PGresult
> in execute_query, thanks :)
>
> But, sorry, I'm not sure what the concern you show here is.  The reason
> why copying  tuples from malloc'ed area to palloc'ed area is to release
> PGresult before returning from the IterateForeingScan call.  The reason
> why using PG_TRY block is to sure that PGresult is released before jump
> back to upstream in error case.
>
>> [Minor comments]
>> Please set NULL to "sql" variable at begin_remote_tx().
>> Compiler raises a warning due to references to an uninitialized variable,
>> even though the code path never run.
>
> Fixed.  BTW, just out of curiosity, which compiler do you use?  My
> compiler ,gcc (GCC) 4.6.0 20110603 (Red Hat 4.6.0-10) on Fedora 15,
> doesn't warn it.
>
>> It potentially causes a problem in case when fetch_result() raises an
>> error because of unexpected status (!= PGRES_TUPLES_OK).
>> One code path is not protected with PG_TRY(), and other code path
>> will call PQclear towards already released PQresult.
>
> Fixed.
>
>> Although it is just a preference of mine, is the exprFunction necessary?
>> It seems to me, the point of push-down check is whether the supplied
>> node is built-in object, or not. So, an sufficient check is is_builtin() onto
>> FuncExpr->funcid, OpExpr->opno, ScalarArrayOpExpr->opno and so on.
>> It does not depend on whether the function implementing these nodes
>> are built-in or not.
>
> Got rid of exprFunction and fixed foreign_expr_walker to check function
> oid in each case label.
>
> Regards,
> --
> Shigeru Hanada



-- 
KaiGai Kohei <kaigai@kaigai.gr.jp>


Re: pgsql_fdw, FDW for PostgreSQL server

From
Kohei KaiGai
Date:
2012年2月16日13:41 Shigeru Hanada <shigeru.hanada@gmail.com>:
> Kaigai-san,
>
> Thanks for the review.  Attached patches are revised version, though
> only fdw_helper_v5.patch is unchanged.
>
> (2012/02/16 0:09), Kohei KaiGai wrote:
>> [memory context of tuple store]
>> It calls tuplestore_begin_heap() under the memory context of
>> festate->scan_cxt at pgsqlBeginForeignScan.
>
> Yes, it's because the tuplestore uses the context which was current when
> tuplestore_begin_heap was called.  I want to use a per-scan context for
> the tuplestore, to keep its content tuples alive through the scan.
>
>> On the other hand, tuplestore_gettupleslot() is called under the
>> memory context of festate->tuples.
>
> Yes, result tuples to be returned to the executor should be allocated in the
> per-scan context and live until the next IterateForeignScan (or
> EndForeignScan), because such tuples will be released via ExecClearTuple
> in the next IterateForeignScan call.  If we don't switch to the per-scan
> context, the result tuple is allocated in the per-tuple context and causes
> a double free and server crash.
>
>> I could not find any callback function being invoked on errors,
>> so I suspect the memory objects acquired within tuplestore_begin_heap()
>> shall be leaked, even though it was my suggestion to create a sub-context
>> under the existing one.
>
> How did you confirm that no callback function is invoked on errors?  I
> think that memory objects acquired within tuplestore_begin_heap (I guess
> you mean excluding stored tuples, right?) are released during cleanup of
> the aborted transaction.  I tested that by adding elog(ERROR) to the tail
> of store_result() to force an error, and executing a large query 100 times
> in a session.  I saw the VIRT value (via the top command) come down to a
> constant level after every query.
>
Oops, I overlooked the point where MessageContext and its children get
reset.  However, as its name suggests, I don't believe that was the right
usage of that memory context.
As the latest version does, es_query_cxt is the right way to acquire
memory objects with per-query duration.

>> In my opinion, it is a good choice to use es_query_cxt of the supplied EState.
>> What does prevent to apply this per-query memory context?
>
> Ah, I had confused the context management of pgsql_fdw...  I fixed pgsql_fdw
> to create a per-scan context as a child of es_query_cxt in
> BeginForeignScan, and to use it for the tuplestore of the scan.  So the
> tuplestore and its contents are released correctly at EndForeignScan, or at
> cleanup of the aborted transaction in the error case.
>
I believe this is the right direction.

>> You mention PGresult being malloc()'ed.  However, it seems to me
>> fetch_result() and store_result() copy the contents of the malloc()'ed
>> area to the palloc()'ed area, and the PGresult is released on an error
>> using a PG_TRY() ... PG_CATCH() block.
>
> While thinking about this comment, I found a double-free bug of PGresult
> in execute_query, thanks :)
>
Unfortunately, I found strange behavior around this code.
I suspect an interaction between longjmp and compiler optimization,
but it is not certain right now.

I'd like to push this patch to committer review after this problem is closed.

Right now, I have no further comments on this patch.

> But, sorry, I'm not sure what the concern you show here is.  The reason
> why copying tuples from the malloc'ed area to the palloc'ed area is to release
> the PGresult before returning from the IterateForeignScan call.  The reason
> why using a PG_TRY block is to ensure that the PGresult is released before
> jumping back upstream in the error case.
>
>> [Minor comments]
>> Please set NULL to the "sql" variable at begin_remote_tx().
>> The compiler raises a warning due to references to an uninitialized variable,
>> even though the code path never runs.
>
> Fixed.  BTW, just out of curiosity, which compiler do you use?  My
> compiler, gcc (GCC) 4.6.0 20110603 (Red Hat 4.6.0-10) on Fedora 15,
> doesn't warn about it.
>
I use Fedora 16, and GCC 4.6.2.

[kaigai@iwashi pgsql_fdw]$ gcc --version
gcc (GCC) 4.6.2 20111027 (Red Hat 4.6.2-1)

It is not a matter of compiler version, but a common convention in
PostgreSQL code.  You will likely find source code comments
like "/* keep compiler quiet */".

Thanks,
-- 
KaiGai Kohei <kaigai@kaigai.gr.jp>


Re: pgsql_fdw, FDW for PostgreSQL server

From
Shigeru Hanada
Date:
(2012/02/17 0:15), Albe Laurenz wrote:
> Shigeru Hanada wrote:
>> Thanks for the review.  Attached patches are revised version, though
>> only fdw_helper_v5.patch is unchanged.
> 
> Two questions:
> - Is it on purpose that you can specify all SSL client options
>    except "sslcompression"?

No, just an oversight.  Good catch.

> - Since a rescan is done by rewinding the cursor, is it necessary
>    to have any other remote isolation level than READ COMMITTED?
>    There is only one query issued per transaction.

If multiple foreign tables on a foreign server are used in a local query,
multiple queries are executed in one remote transaction.  So IMO isolation
levels are useful even if each remote query is executed only once.

Regards,
-- 
Shigeru Hanada


Re: pgsql_fdw, FDW for PostgreSQL server

From
Shigeru Hanada
Date:
(2012/02/17 2:02), Kohei KaiGai wrote:
> I found strange behavior with v10.  Can you reproduce it?
<snip>
> I tried to raise an error on remote side.
>
>    postgres=# select * FROM ftbl WHERE 100 / (a - 3) > 0;
>    The connection to the server was lost. Attempting reset: Failed.
>    The connection to the server was lost. Attempting reset: Failed.
>    !>  \q

I could reproduce the error by omitting CFLAGS=-O0 from the configure
options.  I usually use that in my coding environment so that gdb debugging
works correctly, so I hadn't noticed this issue.  I should test in an
optimized environment too...

Expected result in that case is:

postgres=# select * from pgbench_accounts where 100 / (aid - 3) > 0;
ERROR:  could not fetch rows from foreign server
DETAIL:  ERROR:  division by zero

HINT:  FETCH 10000 FROM pgsql_fdw_cursor_0
postgres=#

> This is the PG_CATCH block at execute_query(). fetch_result() raises
> an error, which shall be caught to release the PGresult.
> Although "res" should be NULL at this point, PQclear was called with
> a non-zero value according to the call trace.
>
> More strangely, I tried to inject elog(INFO, ...) to show the value of "res"
> at this point. Then it became impossible to reproduce when I tried to
> show the pointer of "res" with elog(INFO, "&res = %p", &res);
>
> Why does "res" have a non-zero value, even though it was cleared prior
> to fetch_result() and an error was raised within this function?

I've found that the problem is uninitialized PGresult variables.  An
uninitialized PGresult pointer is used in some places, so its value is
garbage in the PG_CATCH block when the assignment has been interrupted by
longjmp.

The recommended style would probably be like this:

<pseudo_code>
    PGresult *res = NULL;    /* must be NULL in PG_CATCH */

    PG_TRY();
    {
        res = func_might_throw_exception();
        if (PQresultStatus(res) != PGRES_xxx_OK)
        {
            /* error handling, pass message to caller */
            ereport(ERROR, ...);
        }

        /* success case, use result of query and release it */
        ...
        PQclear(res);
    }
    PG_CATCH();
    {
        PQclear(res);
        PG_RE_THROW();
        /* caller should catch this exception. */
    }
</pseudo_code>

I misunderstood that the PGresult pointer always has a valid value after that
line, because I had written the assignment of the PGresult pointer before the
PG_TRY block.  Fixes for this issue are:

(1) Initialize the PGresult pointer with NULL if it is used in PG_CATCH.
(2) Move the PGresult assignment into the PG_TRY block so that we can get a
compiler warning about an uninitialized variable, just in case.
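The clobbering and its fix can be reproduced without libpq at all, since PG_TRY/PG_CATCH are built on setjmp/longjmp.  The following is a minimal, self-contained sketch; failing_fetch and demo are hypothetical stand-ins for fetch_result and execute_query, with free() standing in for PQclear (both are no-ops on NULL):

```c
#include <setjmp.h>
#include <stdlib.h>

static jmp_buf env;

/* stand-in for fetch_result(): "throws" (longjmps) before it can
 * ever return a result, like ereport(ERROR) inside PG_TRY */
static char *
failing_fetch(void)
{
    longjmp(env, 1);
    return NULL;                /* never reached */
}

static int
demo(void)
{
    /* initialized to NULL before setjmp, and volatile, so its value
     * is well defined when we land in the "catch" branch */
    char * volatile res = NULL;

    if (setjmp(env) == 0)       /* like PG_TRY */
    {
        res = failing_fetch();  /* jumps away before the assignment */
        free(res);
        return 0;
    }
    else                        /* like PG_CATCH */
    {
        free(res);              /* free(NULL) is a no-op, like PQclear(NULL) */
        return res == NULL;     /* 1: cleanup was safe */
    }
}
```

Had res been left uninitialized, the catch branch would pass a garbage pointer to free(), which matches the crash seen only with optimization enabled.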

Please find attached a patch including fixes for this issue.

Regards,
--
Shigeru Hanada

Attachment

Re: pgsql_fdw, FDW for PostgreSQL server

From
"Albe Laurenz"
Date:
Shigeru Hanada wrote:
>> - Since a rescan is done by rewinding the cursor, is it necessary
>>    to have any other remote isolation level than READ COMMITTED?
>>    There is only one query issued per transaction.
>
> If multiple foreign tables on a foreign server are used in a local query,
> multiple queries are executed in one remote transaction.  So IMO isolation
> levels are useful even if each remote query is executed only once.

Oh, I see. You are right.

Yours,
Laurenz Albe


Re: pgsql_fdw, FDW for PostgreSQL server

From
Kohei KaiGai
Date:
2012年2月17日6:08 Shigeru Hanada <shigeru.hanada@gmail.com>:
> (2012/02/17 2:02), Kohei KaiGai wrote:
>> I found strange behavior with v10.  Can you reproduce it?
> <snip>
>> I tried to raise an error on remote side.
>>
>>    postgres=# select * FROM ftbl WHERE 100 / (a - 3) > 0;
>>    The connection to the server was lost. Attempting reset: Failed.
>>    The connection to the server was lost. Attempting reset: Failed.
>>    !>  \q
>
> I could reproduce the error by omitting CFLAGS=-O0 from the configure
> options.  I usually use that in my coding environment so that gdb debugging
> works correctly, so I hadn't noticed this issue.  I should test in an
> optimized environment too...
>
> Expected result in that case is:
>
> postgres=# select * from pgbench_accounts where 100 / (aid - 3) > 0;
> ERROR:  could not fetch rows from foreign server
> DETAIL:  ERROR:  division by zero
>
> HINT:  FETCH 10000 FROM pgsql_fdw_cursor_0
> postgres=#
>
>> This is the PG_CATCH block at execute_query(). fetch_result() raises
>> an error, which shall be caught to release the PGresult.
>> Although "res" should be NULL at this point, PQclear was called with
>> a non-zero value according to the call trace.
>>
>> More strangely, I tried to inject elog(INFO, ...) to show the value of "res"
>> at this point. Then it became impossible to reproduce when I tried to
>> show the pointer of "res" with elog(INFO, "&res = %p", &res);
>>
>> Why does "res" have a non-zero value, even though it was cleared prior
>> to fetch_result() and an error was raised within this function?
>
> I've found that the problem is uninitialized PGresult variables.  An
> uninitialized PGresult pointer is used in some places, so its value is
> garbage in the PG_CATCH block when the assignment has been interrupted by
> longjmp.
>
> The recommended style would probably be like this:
>
> <pseudo_code>
>    PGresult *res = NULL;    /* must be NULL in PG_CATCH */
>
>    PG_TRY();
>    {
>        res = func_might_throw_exception();
>        if (PQresultStatus(res) != PGRES_xxx_OK)
>        {
>            /* error handling, pass message to caller */
>            ereport(ERROR, ...);
>        }
>
>        /* success case, use result of query and release it */
>        ...
>        PQclear(res);
>    }
>    PG_CATCH();
>    {
>        PQclear(res);
>        PG_RE_THROW();
>        /* caller should catch this exception. */
>    }
> </pseudo_code>
>
> I misunderstood that the PGresult pointer always has a valid value after that
> line, because I had written the assignment of the PGresult pointer before the
> PG_TRY block.  Fixes for this issue are:
>
> (1) Initialize the PGresult pointer with NULL if it is used in PG_CATCH.
> (2) Move the PGresult assignment into the PG_TRY block so that we can get a
> compiler warning about an uninitialized variable, just in case.
>
> Please find attached a patch including fixes for this issue.
>
I marked this patch as "Ready for Committer", since I have nothing more to
comment on.

I'd like a committer to help review this patch and get it merged.

Thanks,
-- 
KaiGai Kohei <kaigai@kaigai.gr.jp>


Re: pgsql_fdw, FDW for PostgreSQL server

From
"Albe Laurenz"
Date:
I wrote:
> Shigeru Hanada wrote:
>>> - Since a rescan is done by rewinding the cursor, is it necessary
>>>    to have any other remote isolation level than READ COMMITTED?
>>>    There is only one query issued per transaction.
>>
>> If multiple foreign tables on a foreign server are used in a local query,
>> multiple queries are executed in one remote transaction.  So IMO isolation
>> levels are useful even if each remote query is executed only once.
>
> Oh, I see. You are right.

I thought some more about this and changed my mind.

If your query involves foreign scans on two foreign tables on the same
foreign server, these should always see the same snapshot, because
that's how it works with two scans in one query on local tables.

So I think it should be REPEATABLE READ in all cases - SERIALIZABLE
is not necessary as long as all you do is read.

Yours,
Laurenz Albe


Re: pgsql_fdw, FDW for PostgreSQL server

From
"Kevin Grittner"
Date:
"Albe Laurenz" <laurenz.albe@wien.gv.at> wrote:
> If your query involves foreign scans on two foreign tables on the
> same foreign server, these should always see the same snapshot,
> because that's how it works with two scans in one query on local
> tables.

That makes sense.

> So I think it should be REPEATABLE READ in all cases -
> SERIALIZABLE is not necessary as long as all you do is read.

That depends on whether you only want to see states of the database
which are consistent with later states of the database and any
invariants enforced by triggers or other software.  See this example
of how a read-only transaction can see a bogus state at REPEATABLE
READ or less strict transaction isolation:

http://wiki.postgresql.org/wiki/SSI#Read_Only_Transactions

Perhaps if the transaction using the pgsql_fdw is running at the
SERIALIZABLE transaction isolation level, it should run the queries
at that level, otherwise at REPEATABLE READ.
-Kevin


Re: pgsql_fdw, FDW for PostgreSQL server

From
"Albe Laurenz"
Date:
Kevin Grittner wrote:
> > If your query involves foreign scans on two foreign tables on the
> > same foreign server, these should always see the same snapshot,
> > because that's how it works with two scans in one query on local
> > tables.
>
> That makes sense.

> > So I think it should be REPEATABLE READ in all cases -
> > SERIALIZABLE is not necessary as long as all you do is read.
>
> That depends on whether you only want to see states of the database
> which are consistent with later states of the database and any
> invariants enforced by triggers or other software.  See this example
> of how a read-only transaction can see a bogus state at REPEATABLE
> READ or less strict transaction isolation:
>
> http://wiki.postgresql.org/wiki/SSI#Read_Only_Transactions
>
> Perhaps if the transaction using the pgsql_fdw is running at the
> SERIALIZABLE transaction isolation level, it should run the queries
> at that level, otherwise at REPEATABLE READ.

I read the example carefully, and it seems to me that it is necessary
for the read-only transaction (T3) to be SERIALIZABLE so that
T1 is aborted and the state that T3 saw remains valid.

If I understand right, I agree with your correction.

Yours,
Laurenz Albe


Re: pgsql_fdw, FDW for PostgreSQL server

From
"Kevin Grittner"
Date:
"Albe Laurenz" <laurenz.albe@wien.gv.at> wrote:
> I read the example carefully, and it seems to me that it is
> necessary for the read-only transaction (T3) to be SERIALIZABLE so
> that T1 is aborted and the state that T3 saw remains valid.

Correct.

> If I understand right, I agree with your correction.

:-)
-Kevin


Re: pgsql_fdw, FDW for PostgreSQL server

From
Shigeru Hanada
Date:
2012/02/21 0:58 "Kevin Grittner" <Kevin.Grittner@wicourts.gov>:
> "Albe Laurenz" <laurenz.albe@wien.gv.at> wrote:
>
>> I read the example carefully, and it seems to me that it is
>> necessary for the read-only transaction (T3) to be SERIALIZABLE so
>> that T1 is aborted and the state that T3 saw remains valid.
>
> Correct.

Hm, agreed that isolation levels < REPEATABLE READ are not sufficient
for pgsql_fdw's usage.  I'll examine the example and fix pgsql_fdw.

Thanks.

Re: pgsql_fdw, FDW for PostgreSQL server

From
Etsuro Fujita
Date:
(2012/02/15 20:50), Etsuro Fujita wrote:
> (2012/02/14 23:50), Tom Lane wrote:
>>> (2012/02/14 17:40), Etsuro Fujita wrote:
>>>> As discussed at
>>>> that thread, it would have to change the PlanForeignScan API to let the
>>>> FDW generate multiple paths and dump them all to add_path instead of
>>>> returning a FdwPlan struct.

>> I would really like to see that happen in 9.2, because the longer we let
>> that mistake live, the harder it will be to change.  More and more FDWs
>> are getting written.  I don't think it's that hard to do: we just have
>> to agree that PlanForeignScan should return void and call add_path for
>> itself, possibly more than once.
>
> Agreed.  I fixed the PlanForeignScan API.  Please find attached a patch.
>
>> If we do that, I'm inclined to think
>> we could get rid of the separate Node type FdwPlan, and just incorporate
>> "List *fdw_private" into ForeignPath and ForeignScan.
>
> +1.  While the patch retains the FdwPlan struct, I would like to get rid
> of it in the next version of the patch.

Please find attached an updated version of the patch.

Best regards,
Etsuro Fujita

Attachment

Re: pgsql_fdw, FDW for PostgreSQL server

From
Shigeru Hanada
Date:
(2012/02/21 8:07), Shigeru Hanada wrote:
> Hm, agreed that isolation levels < REPEATABLE READ are not sufficient for
> pgsql_fdw's usage.  I'll examine the example and fix pgsql_fdw.

The attached patch uses a "safe" isolation level for remote transactions.
After this change, pgsql_fdw uses the levels below:

      local       |     remote
------------------+-----------------
 SERIALIZABLE     | SERIALIZABLE
 REPEATABLE READ  | REPEATABLE READ
 READ COMMITTED   | REPEATABLE READ
 READ UNCOMMITTED | REPEATABLE READ
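The mapping in the table is simple enough to sketch as a pure function.  This is a toy sketch, not the patch's code; IsoLevel and remote_isolation are made-up names, and the real patch presumably builds the remote START TRANSACTION command inside its transaction-begin routine:

```c
#include <string.h>     /* for callers comparing the returned level */

/* hypothetical local isolation levels, mirroring the table above */
typedef enum {
    ISO_READ_UNCOMMITTED,
    ISO_READ_COMMITTED,
    ISO_REPEATABLE_READ,
    ISO_SERIALIZABLE
} IsoLevel;

/* SERIALIZABLE stays SERIALIZABLE; every weaker level is promoted to
 * REPEATABLE READ so that all remote scans issued by one local query
 * run on a single remote snapshot */
static const char *
remote_isolation(IsoLevel local_level)
{
    return (local_level == ISO_SERIALIZABLE)
        ? "SERIALIZABLE"
        : "REPEATABLE READ";
}
```

The asymmetry is deliberate: promoting weak local levels buys snapshot consistency on the remote side, while SERIALIZABLE is passed through so the remote server participates in SSI when the local transaction needs it.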

Please review document of pgsql_fdw too.

In addition, I've removed the #ifdefs from options.c, so that we can specify
libpq options for SSL and Kerberos even if those features were not
configured in the environment where pgsql_fdw was built.  This change
also avoids regression test failures in environments where neither
--with-openssl nor --with-krb5 was specified for the configure script.

Regards,
--
Shigeru Hanada

Attachment

Re: pgsql_fdw, FDW for PostgreSQL server

From
Peter Eisentraut
Date:
Could we name this "postgresql_fdw" instead?  We already have several
${productname}_fdw out there, and I don't want to get in the business of
having to guess variant spellings.




Re: pgsql_fdw, FDW for PostgreSQL server

From
Shigeru Hanada
Date:
(2012/02/25 7:31), Peter Eisentraut wrote:
> Could we name this "postgresql_fdw" instead?  We already have several
> ${productname}_fdw out there, and I don't want to get in the business of
> having to guess variant spellings.

I worry about a name conflict with the existing postgresql_fdw_validator,
which is implemented in the backend binary and used by contrib/dblink.  I
thought that we should use another name for the PostgreSQL FDW unless we can
change the specification of dblink connection strings.

-- 
Shigeru Hanada


Re: pgsql_fdw, FDW for PostgreSQL server

From
Robert Haas
Date:
On Fri, Feb 24, 2012 at 5:31 PM, Peter Eisentraut <peter_e@gmx.net> wrote:
> Could we name this "postgresql_fdw" instead?  We already have several
> ${productname}_fdw out there, and I don't want to get in the business of
> having to guess variant spellings.

If you don't like variant spellings, having anything to do with
PostgreSQL, aka Postgres, and usually discussed on the pgsql-* mailing
lists, is probably a bad idea.

Go Postgre!

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


Re: pgsql_fdw, FDW for PostgreSQL server

From
Tom Lane
Date:
Robert Haas <robertmhaas@gmail.com> writes:
> On Fri, Feb 24, 2012 at 5:31 PM, Peter Eisentraut <peter_e@gmx.net> wrote:
>> Could we name this "postgresql_fdw" instead?  We already have several
>> ${productname}_fdw out there, and I don't want to get in the business of
>> having to guess variant spellings.

> If you don't like variant spellings, having anything to do with
> PostgreSQL, aka Postgres, and usually discussed on the pgsql-* mailing
> lists, is probably a bad idea.

[ snicker ]  But still, Peter has a point: pgsql is not a name for the
product, it's at best an abbreviation.  We aren't calling the other
thing orcl_fdw or ora_fdw.

I think either postgres_fdw or postgresql_fdw would be fine.
        regards, tom lane


Re: pgsql_fdw, FDW for PostgreSQL server

From
Robert Haas
Date:
On Tue, Feb 28, 2012 at 11:02 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Robert Haas <robertmhaas@gmail.com> writes:
>> On Fri, Feb 24, 2012 at 5:31 PM, Peter Eisentraut <peter_e@gmx.net> wrote:
>>> Could we name this "postgresql_fdw" instead?  We already have several
>>> ${productname}_fdw out there, and I don't want to get in the business of
>>> having to guess variant spellings.
>
>> If you don't like variant spellings, having anything to do with
>> PostgreSQL, aka Postgres, and usually discussed on the pgsql-* mailing
>> lists, is probably a bad idea.
>
> [ snicker ]  But still, Peter has a point: pgsql is not a name for the
> product, it's at best an abbreviation.  We aren't calling the other
> thing orcl_fdw or ora_fdw.
>
> I think either postgres_fdw or postgresql_fdw would be fine.

I liked the shorter name, myself, but I'm not going to make a big deal about it.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


Re: pgsql_fdw, FDW for PostgreSQL server

From
Peter Eisentraut
Date:
On tis, 2012-02-28 at 11:20 -0500, Robert Haas wrote:
> > [ snicker ]  But still, Peter has a point: pgsql is not a name for
> the
> > product, it's at best an abbreviation.  We aren't calling the other
> > thing orcl_fdw or ora_fdw.
> >
> > I think either postgres_fdw or postgresql_fdw would be fine.
> 
> I liked the shorter name, myself, but I'm not going to make a big deal
> about it.

Let's at least be clear about the reasons here.  The fact that
postgresql_fdw_validator exists means (a) there is a possible naming
conflict that has not been discussed yet, and/or (b) the name is already
settled and we need to think of a way to make postgresql_fdw_validator
work with the new actual FDW.




Re: pgsql_fdw, FDW for PostgreSQL server

From
"David E. Wheeler"
Date:
On Feb 28, 2012, at 8:20 AM, Robert Haas wrote:

> I liked the shorter name, myself, but I'm not going to make a big deal about it.

pg_ is used quite a bit.  What about pg_fdw?

David

Re: pgsql_fdw, FDW for PostgreSQL server

From
Shigeru Hanada
Date:
(2012/02/29 4:07), Peter Eisentraut wrote:
> Let's at least be clear about the reasons here.  The fact that
> postgresql_fdw_validator exists means (a) there is a possible naming
> conflict that has not been discussed yet, and/or (b) the name is already
> settled and we need to think of a way to make postgresql_fdw_validator
> work with the new actual FDW.

We can avoid the name conflict by using postgres_fdw or pgsql_fdw, but that
doesn't solve the fundamental issue.  ISTM that maintaining two similar
validators is wasteful and confusing, and the FDW for PostgreSQL should be
just one, at least in the context of the core distribution.

The current pgsql_fdw_validator accepts every FDW option which is accepted
by postgresql_fdw_validator, and additionally accepts FDW-specific
options such as fetch_count.  So, if dblink can ignore unknown FDW
options, pgsql_fdw_validator can be used to create foreign servers for
dblink connections.

How about removing postgresql_fdw_validator from the backend binary, and
changing dblink to use contrib/postgresql_fdw's validator?  It breaks
some backward compatibility and requires contrib/postgresql_fdw to be
installed before using contrib/dblink with foreign servers, but ISTM
that it wouldn't be so serious.

Of course dblink is still usable by itself if the user specifies
connection information with a "key = value" string rather than a server name.

One concern is how to avoid a duplicated list of valid libpq options.
Adding a new libpq function like the one below, which returns 1 when the
given name is a valid libpq option, would help:

   int PQisValidOption(const char *keyword);
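For illustration, the proposed check could look like the sketch below.  The option list here is a hypothetical subset, hardcoded to keep the example self-contained; a real implementation inside libpq would consult libpq's own option table (the one behind PQconndefaults()) rather than duplicate it:

```c
#include <string.h>

/* hypothetical subset of libpq connection option keywords */
static const char *libpq_options[] = {
    "host", "port", "dbname", "user", "password",
    "connect_timeout", "sslmode", "sslcompression", NULL
};

/* returns 1 when the given keyword is a valid libpq option, else 0 */
int
PQisValidOption(const char *keyword)
{
    for (int i = 0; libpq_options[i] != NULL; i++)
        if (strcmp(libpq_options[i], keyword) == 0)
            return 1;
    return 0;
}
```

With such a function, both dblink's validator and pgsql_fdw's validator could delegate the libpq-option check to libpq itself, and only validate their own extra options (such as fetch_count) locally.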

-- 
Shigeru Hanada


Re: pgsql_fdw, FDW for PostgreSQL server

From
Pavel Stehule
Date:
2012/2/29 Shigeru Hanada <shigeru.hanada@gmail.com>:
> (2012/02/29 4:07), Peter Eisentraut wrote:
>> Let's at least be clear about the reasons here.  The fact that
>> postgresql_fdw_validator exists means (a) there is a possible naming
>> conflict that has not been discussed yet, and/or (b) the name is already
>> settled and we need to think of a way to make postgresql_fdw_validator
>> work with the new actual FDW.
>
> We can avoid conflict of name by using postgres_fdw or pgsql_fdw, but it
> doesn't solve fundamental issue.  ISTM that maintaining two similar
> validators is wasteful and confusing, and FDW for PostgreSQL should be
> just one, at least in the context of core distribution.
>
> Current pgsql_fdw_validator accepts every FDW options which is accepted
> by postgresql_fdw_validator, and additionally accepts FDW specific
> options such as fetch_count.  So, if dblink can ignore unknown FDW
> options, pgsql_fdw_validator can be used to create foreign servers for
> dblink connection.
>
> How about removing postgresql_fdw_validator from backend binary, and
> changing dblink to use contrib/postgresql_fdw's validator?  It breaks
> some backward compatibility and requires contrib/postgresql_fdw to be
> installed before using contrib/dblink with foreign servers, but ISTM
> that it doesn't become so serious.
>

+1

pavel stehule

> Of course dblink is still available by itself if user specifies
> connection information with "key = value" string, not with server name.
>
> One concern is how to avoid duplicated list of valid libpq options.
> Adding new libpq function, like below, which returns 1 when given name
> is a valid libpq option would help.
>
>    int PQisValidOption(const char *keyword);
>
> --
> Shigeru Hanada
>
> --
> Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
> To make changes to your subscription:
> http://www.postgresql.org/mailpref/pgsql-hackers


Re: pgsql_fdw, FDW for PostgreSQL server

From
Tom Lane
Date:
Shigeru Hanada <shigeru.hanada@gmail.com> writes:
> How about removing postgresql_fdw_validator from backend binary, and
> changing dblink to use contrib/postgresql_fdw's validator?  It breaks
> some backward compatibility and requires contrib/postgresql_fdw to be
> installed before using contrib/dblink with foreign servers, but ISTM
> that it doesn't become so serious.

I don't think that creating such a dependency is acceptable.
Even if we didn't mind the dependency, you said yourself that
contrib/postgresql_fdw's validator will accept stuff that's not
appropriate for dblink.

If we don't think postgresql_fdw_validator belongs in core after all,
we should just move it to dblink.
        regards, tom lane


Re: pgsql_fdw, FDW for PostgreSQL server

From
Shigeru Hanada
Date:
(2012/03/01 0:33), Tom Lane wrote:
> I don't think that creating such a dependency is acceptable.
> Even if we didn't mind the dependency, you said yourself that
> contrib/postgresql_fdw's validator will accept stuff that's not
> appropriate for dblink.

Agreed.  I think that these two contrib modules (and all FDW modules)
should each have an individual validator to avoid undesirable
dependencies and naming conflicts, and such validator functions should be
inside each module, not in core.

How about moving postgresql_fdw_validator into dblink, renaming it to
dblink_fdw_validator?  The attached patch achieves these changes.  I've left
"postgresql_fdw_validator" in the foreign_data regression test section, so
that the foreign_data section can still check whether FDW DDLs invoke the
validator function.  I used the name "postgresql_fdw_validator" for the test
validator to keep the change as small as possible.

This change requires dblink to have a new function, so its version should
be bumped to 1.1.

These changes have no direct relation to the PostgreSQL FDW, so this patch
can be applied by itself.  Once this patch has been applied, I'll rename
pgsql_fdw to postgresql_fdw, with the product name fully spelled out.

--
Shigeru Hanada

Attachment

Re: pgsql_fdw, FDW for PostgreSQL server

From
Peter Eisentraut
Date:
On tor, 2012-03-01 at 20:56 +0900, Shigeru Hanada wrote:
> How about moving postgresql_fdw_validator into dblink,

That's probably a good move.  If this were C++, we might try to subclass
this whole thing a bit, to avoid code duplication, but I don't see an
easy way to do that here.

>  with renaming to dblink_fdw_validator? 

Well, it's not the validator of the dblink_fdw, so maybe something like
basic_postgresql_fdw_validator.



Re: pgsql_fdw, FDW for PostgreSQL server

From
Shigeru Hanada
Date:
2012/3/2 Peter Eisentraut <peter_e@gmx.net>:
>>  with renaming to dblink_fdw_validator?
>
> Well, it's not the validator of the dblink_fdw, so maybe something like
> basic_postgresql_fdw_validator.

-1, for the same reason: it's not the validator of a basic_postgresql_fdw.

Using "fdw" in the name of a validator which doesn't have an actual FDW
might confuse users.  Perhaps dblink_validator or libpq_option_validator
would be better?

Another possible idea is creating a dblink_fdw which uses the
validator during "CREATE EXTENSION dblink", for users who store
connection information in FDW objects.

-- 
Shigeru Hanada



Re: pgsql_fdw, FDW for PostgreSQL server

From
Shigeru Hanada
Date:
(2012/02/21 20:25), Etsuro Fujita wrote:
> Please find attached an updated version of the patch.

This v2 patch applies cleanly on HEAD.  Compilation completed with
only one expected warning in scan.c, and all regression tests for both
core and contrib modules passed.

This patch allows FDWs to return multiple ForeignPath nodes per
PlanForeignScan call.  It also gets rid of FdwPlan, the FDW-private
information container, by replacing it with a simple List.

I've reviewed the patch closely, and have some comments about its design.

Basically a create_foo_path function is responsible for creating a node
object of a particular Path-derived type, but this patch changes
create_foreignscan_path to just call PlanForeignScan and return void.
This change seems to break the module design.  IMO create_foreignscan_path
should return just one ForeignPath node per call, so calling add_path
multiple times should be done somewhere else.  I think
set_foreign_pathlist suits this purpose, because set_foo_pathlist functions
are responsible for building the possible paths for a RangeTblEntry, as the
comment of set_foreign_pathlist says.

/*
 * set_foreign_pathlist
 *      Build one or more access paths for a foreign table RTE
 */

In this design, FDW authors can implement PlanForeignScan by repeating the
steps below for each possible scan path for a foreign table:

  (1) create a template ForeignPath node with create_foreignscan_path
  (2) customize the path as FDW wants, e.g. push down WHERE clause
  (3) store FDW-private info
  (4) estimate costs of the path
  (5) call add_path to add the path to RelOptInfo
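As a toy model of that control flow (everything here is a simplified, made-up stand-in for the planner structures, not PostgreSQL code), PlanForeignScan builds and costs each candidate and hands each one to add_path:

```c
/* simplified stand-ins for the planner structures */
typedef struct {
    double      total_cost;
    const char *fdw_private;    /* e.g. remote SQL with WHERE pushed down */
} ForeignPath;

typedef struct {
    ForeignPath cheapest;
    int         npaths;
} RelOptInfo;

/* collect candidates; the real add_path also discards dominated paths */
static void
add_path(RelOptInfo *rel, ForeignPath path)
{
    if (rel->npaths == 0 || path.total_cost < rel->cheapest.total_cost)
        rel->cheapest = path;
    rel->npaths++;
}

/* steps (1)-(5) applied twice: create, customize, store private
 * info, cost, then add each candidate path */
static void
toy_PlanForeignScan(RelOptInfo *rel)
{
    add_path(rel, (ForeignPath){ 100.0, "SELECT * FROM t" });
    add_path(rel, (ForeignPath){  40.0, "SELECT * FROM t WHERE a > 0" });
}
```

The point of the revised API is exactly this shape: the FDW calls add_path itself, possibly several times, instead of returning a single FdwPlan.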

The current design doesn't allow FDWs to provide multiple paths which differ
from each other in local filtering, because all paths share a
RelOptInfo and the baserestrictinfo in it.  I think this restriction
wouldn't be a serious problem.

Please find attached a patch implementing the design above.

--
Shigeru Hanada

Attachment

Re: pgsql_fdw, FDW for PostgreSQL server

From
Etsuro Fujita
Date:
(2012/03/05 18:21), Shigeru Hanada wrote:
> (2012/02/21 20:25), Etsuro Fujita wrote:
>> Please find attached an updated version of the patch.
> 
> This v2 patch can be applied on HEAD cleanly.  Compile completed with
> only one expected warning of scan.c, and all regression tests for both
> core and contrib modules passed.
> 
> This patch allows FDWs to return multiple ForeignPath nodes per
> PlanForeignScan call.  It also gets rid of FdwPlan, the FDW-private
> information container, by replacing it with a simple List.
> 
> I've reviewed the patch closely, and have some comments about its design.

Thank you for your review.

> Basically a create_foo_path function is responsible for creating a node
> object of a particular Path-derived type, but this patch changes
> create_foreignscan_path to just call PlanForeignScan and return void.
> This change seems to break the module design.

create_index_path builds multiple index paths for a plain relation.  How
about renaming the function to create_foreign_paths?

Best regards,
Etsuro Fujita


Re: pgsql_fdw, FDW for PostgreSQL server

From
Etsuro Fujita
Date:
(2012/03/05 21:00), Etsuro Fujita wrote:
> (2012/03/05 18:21), Shigeru Hanada wrote:
>> (2012/02/21 20:25), Etsuro Fujita wrote:
>>> Please find attached an updated version of the patch.
>>
>> This v2 patch applies cleanly on HEAD.  Compilation completed with
>> only the one expected warning from scan.c, and all regression tests
>> for both core and contrib modules passed.
>>
>> This patch allows FDWs to return multiple ForeignPath nodes per
>> PlanForeignScan call.  It also gets rid of FdwPlan, the FDW-private
>> information container, replacing it with a simple List.
>>
>> I've reviewed the patch closely, and have some comments about its design.
> 
> Thank you for your review.
> 
>> Basically a create_foo_path is responsible for creating a node object
>> with a particular Path-derived type, but this patch changes
>> create_foreignscan_path to just call PlanForeignScan and return void.
>> This change seems to break the module design.
> 
> create_index_path builds multiple index paths for a plain relation.  How
> about renaming the function to create_foreign_paths?

I meant "create_foreignscan_paths".  I'm sorry about that.

Best regards,
Etsuro Fujita


Re: pgsql_fdw, FDW for PostgreSQL server

From
Shigeru Hanada
Date:
(2012/03/05 21:05), Etsuro Fujita wrote:
> (2012/03/05 21:00), Etsuro Fujita wrote:
>> create_index_path builds multiple index paths for a plain relation.  How
>> about renaming the function to create_foreign_paths?
>
> I meant "create_foreignscan_paths".  I'm sorry about that.

Perhaps you are confusing create_index_path with create_index_paths.
The former creates an IndexPath node (so it's similar to
create_foreignscan_path), and the latter builds multiple index paths
for a plain relation.

So, just renaming create_foreignscan_path to the plural form seems to
miss the point.

-- 
Shigeru Hanada


Re: pgsql_fdw, FDW for PostgreSQL server

From
Tom Lane
Date:
Shigeru Hanada <shigeru.hanada@gmail.com> writes:
> So, just renaming create_foreignscan_path to plural form seems missing
> the point.

I agree that that wouldn't be an improvement.  What bothers me about the
patch's version of this function is that it just creates a content-free
Path node and leaves it to the caller to fill in everything.  That
doesn't accomplish much, and it leaves the caller very exposed to errors
of omission.  It's also unlike the other create_xxx_path functions,
which generally hand back a completed Path ready to pass to add_path.

I'm inclined to think that if we provide this function in core at all,
it should take a parameter list long enough to let it fill in the Path
completely.  That would imply that any future changes in Path structs
would result in a change in the parameter list, which would break
callers --- but it would break them in an obvious way that the C
compiler would complain about.  If we leave it as-is, those same callers
would be broken silently, because they'd just be failing to fill in
the new Path fields.
        regards, tom lane


Re: pgsql_fdw, FDW for PostgreSQL server

From
Tom Lane
Date:
I wrote:
> I'm inclined to think that if we provide this function in core at all,
> it should take a parameter list long enough to let it fill in the Path
> completely.  That would imply that any future changes in Path structs
> would result in a change in the parameter list, which would break
> callers --- but it would break them in an obvious way that the C
> compiler would complain about.  If we leave it as-is, those same callers
> would be broken silently, because they'd just be failing to fill in
> the new Path fields.

I've committed the PlanForeignScan API change, with that change and
some other minor editorialization.  The pgsql_fdw patch now needs an
update, so I set it back to Waiting On Author state.
        regards, tom lane


Re: pgsql_fdw, FDW for PostgreSQL server

From
"Albe Laurenz"
Date:
Shigeru Hanada wrote:
[pgsql_fdw_v12.patch]

I know this is not the latest version, but I played around with it and
tickled a bug.
It seems to have a problem with rolled back subtransactions.

test=> \d+ remote
                     Foreign table "laurenz.remote"
 Column |  Type   | Modifiers | FDW Options | Storage  | Description
--------+---------+-----------+-------------+----------+-------------
 id     | integer | not null  |             | plain    |
 val    | text    | not null  |             | extended |
Server: loopback
FDW Options: (nspname 'laurenz', relname 'local')
Has OIDs: no

test=> BEGIN;
test=> DECLARE x CURSOR FOR SELECT * FROM remote;
DEBUG:  Remote SQL: SELECT id, val FROM laurenz.local
DEBUG:  relid=16423 fetch_count=10000
DEBUG:  starting remote transaction with "START TRANSACTION ISOLATION LEVEL REPEATABLE READ"
test=> FETCH x;
 id | val
----+-----
  1 | one
(1 row)
test=> SAVEPOINT z;
test=> ERROR OUT;
ERROR:  syntax error at or near "ERROR"
LINE 1: ERROR OUT;
test=> ROLLBACK TO SAVEPOINT z;
test=> FETCH x;
 id | val
----+-----
  2 | two
(1 row)
test=> COMMIT;
ERROR:  could not close cursor
DETAIL:  no connection to the server
HINT:  CLOSE pgsql_fdw_cursor_0

The error message reported is not consistent; on one attempt the
backend crashed.

Yours,
Laurenz Albe


Re: pgsql_fdw, FDW for PostgreSQL server

From
Shigeru Hanada
Date:
(2012/03/06 6:19), Tom Lane wrote:
> I've committed the PlanForeignScan API change, with that change and
> some other minor editorialization.  The pgsql_fdw patch now needs an
> update, so I set it back to Waiting On Author state.

Thanks.

I've revised pgsql_fdw to catch up to this change, but I'll post those
patches after fixing the bug reported by Albe Laurenz.

BTW, what I did for this change is also needed for other existing FDWs
to make them work on 9.2.  So, I'd like to add instructions on updating
FDWs for 9.2 to the SQL/MED wiki page, which is probably where most FDW
authors look.

Regards,
-- 
Shigeru Hanada



Re: pgsql_fdw, FDW for PostgreSQL server

From
Shigeru Hanada
Date:
(2012/03/06 19:09), Albe Laurenz wrote:
> I know this is not the latest version, but I played around with it and
> tickled a bug.
> It seems to have a problem with rolled back subtransactions.

Thanks for the report!

The problem was in cleanup_connection, which is called at the end of a
transaction.  The connection should be closed only when a top-level
transaction is aborting, but the isTopLevel flag was not checked.  I
fixed the bug and added regression tests for such cases.

The attached patches also contain the changes needed to catch up with
the redesign of PlanForeignScan.

Regards,
--
Shigeru Hanada

Attachment

Re: pgsql_fdw, FDW for PostgreSQL server

From
"Albe Laurenz"
Date:
Shigeru Hanada wrote:
>                The connection should be closed only when a top-level
> transaction is aborting, but the isTopLevel flag was not
> checked.  I fixed the bug and added regression tests for such cases.

I wondered about that - is it really necessary to close the remote
connection? Wouldn't a ROLLBACK on the remote connection be good enough?

Yours,
Laurenz Albe


Re: pgsql_fdw, FDW for PostgreSQL server

From
Tom Lane
Date:
Peter Eisentraut <peter_e@gmx.net> writes:
> On tor, 2012-03-01 at 20:56 +0900, Shigeru Hanada wrote:
>> How about moving postgresql_fdw_validator into dblink,

> That's probably a good move.  If this were C++, we might try to subclass
> this whole thing a bit, to avoid code duplication, but I don't see an
> easy way to do that here.

>> with renaming to dblink_fdw_validator? 

> Well, it's not the validator of the dblink_fdw, so maybe something like
> basic_postgresql_fdw_validator.

I don't understand this objection.  If we move it into dblink, then it
*is* dblink's validator, and nobody else's.

A bigger issue with postgresql_fdw_validator is that it supposes that
the core backend is authoritative as to what options libpq supports,
which is bad design on its face.  It would be much more sensible for
dblink to be asking libpq what options libpq supports, say via
PQconndefaults().

We might find that we have to leave postgresql_fdw_validator as-is
for backwards compatibility reasons (in particular, being able to load
existing FDW definitions) but I think we should migrate away from using
it.
        regards, tom lane


Re: pgsql_fdw, FDW for PostgreSQL server

From
Tom Lane
Date:
Shigeru Hanada <shigeru.hanada@gmail.com> writes:
> The attached patches also contain the changes needed to catch up with
> the redesign of PlanForeignScan.

I started to look at this patch, which soon led me back to the
prerequisite patch fdw_helper_v5.patch (that is the last version you
posted of that one, right?).  I can see the value of
GetForeignColumnOptions, but ISTM that GetFdwOptionValue is poorly
designed and will accomplish little except to encourage inefficient
searching.  The original version was even worse, but even in the current
version there is no way to avoid a useless scan of table-level options
when you are looking for a column-level option.  Also, it's inefficient
when looking for several option values, as in this extract from
pgsql_fdw_v13.patch,

+     nspname = GetFdwOptionValue(InvalidOid, InvalidOid, relid,
+                                 InvalidAttrNumber, "nspname");
+     if (nspname == NULL)
+         nspname = get_namespace_name(get_rel_namespace(relid));
+     q_nspname = quote_identifier(nspname);
+ 
+     relname = GetFdwOptionValue(InvalidOid, InvalidOid, relid,
+                                 InvalidAttrNumber, "relname");
+     if (relname == NULL)
+         relname = get_rel_name(relid);
+     q_relname = quote_identifier(relname);

where we are going to uselessly run GetForeignTable twice.

If we had a lot of options that could usefully be specified at multiple
levels of the foreign-objects hierarchy, it might be appropriate to have
a search function defined like this; but the existing samples of FDWs
don't seem to support the idea that that's going to be common.  It looks
to me like the vast majority of options make sense at exactly one level.

So I'm thinking we should forget GetFdwOptionValue and just expect the
callers to search the option lists for the appropriate object(s).  It
might be worth providing get_options_value() as an exported function,
though surely there's not that much to it.

Another issue is that get_options_value ignores defGetString(), which
is what it really ought to be using, instead of assuming strVal() is
appropriate.  ISTM that it would also encourage people to take shortcuts
where they should be using functions like defGetBoolean() etc.  Not
quite sure what we should do about that; maybe we need to provide
several variants of the function that are appropriate for different
option datatypes.
        regards, tom lane


Re: pgsql_fdw, FDW for PostgreSQL server

From
Shigeru Hanada
Date:
(2012/03/07 9:01), Tom Lane wrote:
> I started to look at this patch, which soon led me back to the
> prerequisite patch fdw_helper_v5.patch (that is the last version you
> posted of that one, right?).

Thanks for the review.  Yes, v5 is the last version of fdw_helper patch.

>                               I can see the value of
> GetForeignColumnOptions, but ISTM that GetFdwOptionValue is poorly
> designed and will accomplish little except to encourage inefficient
> searching.  The original version was even worse, but even in the current
> version there is no way to avoid a useless scan of table-level options
> when you are looking for a column-level option.  Also, it's inefficient
> when looking for several option values, as in this extract from
> pgsql_fdw_v13.patch,
>
> +     nspname = GetFdwOptionValue(InvalidOid, InvalidOid, relid,
> +                                 InvalidAttrNumber, "nspname");
> +     if (nspname == NULL)
> +         nspname = get_namespace_name(get_rel_namespace(relid));
> +     q_nspname = quote_identifier(nspname);
> +
> +     relname = GetFdwOptionValue(InvalidOid, InvalidOid, relid,
> +                                 InvalidAttrNumber, "relname");
> +     if (relname == NULL)
> +         relname = get_rel_name(relid);
> +     q_relname = quote_identifier(relname);
>
> where we are going to uselessly run GetForeignTable twice.

In addition, a request for the fetch_count option value via
GetFdwOptionValue() uselessly runs GetUserMapping() and
GetForeignDataWrapper() too, which is an obvious waste of cycles and
memory.

> If we had a lot of options that could usefully be specified at multiple
> levels of the foreign-objects hierarchy, it might be appropriate to have
> a search function defined like this; but the existing samples of FDWs
> don't seem to support the idea that that's going to be common.  It looks
> to me like the vast majority of options make sense at exactly one level.

Yes, I added GetFdwOptionValue() to provide an easy way to obtain the
value of a particular FDW option that might be stored at multiple
levels of the foreign-objects hierarchy, without looping per object.  I
used it in pgsql_fdw to get the value of the fetch_count option, which
can be stored on the server and/or the foreign table, but that seems to
be the only use case now.

> So I'm thinking we should forget GetFdwOptionValue and just expect the
> callers to search the option lists for the appropriate object(s).  It
> might be worth providing get_options_value() as an exported function,
> though surely there's not that much to it.

Agreed.  The attached fdw_helper patch doesn't contain
GetFdwOptionValue() any more, and the pgsql_fdw patch accesses only the
necessary catalogs.

> Another issue is that get_options_value ignores defGetString(), which
> is what it really ought to be using, instead of assuming strVal() is
> appropriate.  ISTM that it would also encourage people to take shortcuts
> where they should be using functions like defGetBoolean() etc.  Not
> quite sure what we should do about that; maybe we need to provide
> several variants of the function that are appropriate for different
> option datatypes.

The strVal() calls in pgsql_fdw have been replaced with defGetString().
That seems sufficient to me.

Regards,
--
Shigeru Hanada

Attachment

Re: pgsql_fdw, FDW for PostgreSQL server

From
Peter Eisentraut
Date:
On tis, 2012-03-06 at 13:39 -0500, Tom Lane wrote:
> A bigger issue with postgresql_fdw_validator is that it supposes that
> the core backend is authoritative as to what options libpq supports,
> which is bad design on its face.  It would be much more sensible for
> dblink to be asking libpq what options libpq supports, say via
> PQconndefaults().

The validator for the proposed FDW suffers from the same problem.



Re: pgsql_fdw, FDW for PostgreSQL server

From
Tom Lane
Date:
Shigeru Hanada <shigeru.hanada@gmail.com> writes:
> Agreed.  The attached fdw_helper patch doesn't contain
> GetFdwOptionValue() any more, and the pgsql_fdw patch accesses only the
> necessary catalogs.

I've committed the fdw_helper part of this, with some very minor
improvements.
        regards, tom lane


Re: pgsql_fdw, FDW for PostgreSQL server

From
Shigeru Hanada
Date:
(2012/03/07 21:47), Shigeru Hanada wrote:
> Agreed.  The attached fdw_helper patch doesn't contain
> GetFdwOptionValue() any more, and the pgsql_fdw patch accesses only the
> necessary catalogs.

Oops, I missed some bugs.  The attached patch fixes them.

1) The foreign table's fetch_count option is always ignored.
2) If the given connection is not managed by the connection cache,
ReleaseConnection() crashes the backend process with a NULL dereference.

Regards,
--
Shigeru Hanada

Attachment

Re: pgsql_fdw, FDW for PostgreSQL server

From
Tom Lane
Date:
Shigeru Hanada <shigeru.hanada@gmail.com> writes:
> [ pgsql_fdw_v15.patch ]

I've been looking at this patch a little bit over the past day or so.
I'm pretty unhappy with deparse.c --- it seems like a real kluge,
inefficient and full of corner-case bugs.  After some thought I believe
that you're ultimately going to have to abandon depending on ruleutils.c
for reverse-listing services, and it would be best to bite that bullet
now and rewrite this code from scratch.  ruleutils.c is serving two
masters already (rule-dumping and EXPLAIN) and it's not going to be
practical to tweak its behavior further for this usage; yet there are
all sorts of clear problems that you are going to run into, boiling down
to the fact that names on the remote end aren't necessarily the same as
names on the local end.  For instance, ruleutils.c is not going to be
helpful at schema-qualifying function names in a way that's correct for
the foreign server environment.  Another issue is that as soon as you
try to push down join clauses for parameterized paths, you are going to
want Vars of other relations to be printed as parameters ($n, most
likely) and ruleutils is not going to know to do that.  Seeing that
semantic constraints will greatly limit the set of node types that can
ever be pushed down anyway, I think it's likely to be easiest to just
write your own node-printing code and not even try to use ruleutils.c.

There are a couple of other points that make me think we need to revisit
the PlanForeignScan API definition some more, too.  First, deparse.c is
far from cheap.  While this doesn't matter greatly as long as there's
only one possible path for a foreign table, as soon as you want to
create more than one it's going to be annoying to do all that work N
times and then throw away N-1 of the results.  I also choked on the fact
that the pushdown patch thinks it can modify baserel->baserestrictinfo.
That might accidentally fail to malfunction right now, but it's never
going to scale to multiple paths with potentially different sets of
remotely-applied constraints.  So I'm thinking we really need to let
FDWs in on the Path versus Plan distinction --- that is, a Path just
needs to be a cheap summary of a way to do things, and then at
createplan.c time you convert the selected Path into a full-fledged
Plan.  Most of the work done in deparse.c could be postponed to
createplan time and done only once, even with multiple paths.
The baserestrictinfo hack would be unnecessary too if the FDW had more
direct control over generation of the ForeignScan plan node.

Another thing I'm thinking we should let FDWs in on is the distinction
between rowcount estimation and path generation.  When we did the first
API design last year it was okay to expect a single call to do both,
but as of a couple months ago allpaths.c does those steps in two
separate passes over the baserels, and it'd be handy if FDWs would
cooperate.

So we need to break down what PlanForeignScan currently does into three
separate steps.  The first idea that comes to mind is to call them
GetForeignRelSize, GetForeignPaths, GetForeignPlan; but maybe somebody
has a better idea for names?
        regards, tom lane


Re: pgsql_fdw, FDW for PostgreSQL server

From
Tom Lane
Date:
I wrote:
> There are a couple of other points that make me think we need to revisit
> the PlanForeignScan API definition some more, too.  ...
> So we need to break down what PlanForeignScan currently does into three
> separate steps.  The first idea that comes to mind is to call them
> GetForeignRelSize, GetForeignPaths, GetForeignPlan; but maybe somebody
> has a better idea for names?

Attached is a draft patch for that.  While I was working on this
I realized that we were very far short of allowing FDWs to set up
expressions of their choice for execution; there was nothing for that in
setrefs.c, nor some other places that need to post-process expressions.
I had originally supposed that fdw_private could just contain some
expression trees, but that wasn't going to work without post-processing.
So this patch attempts to cover that too, by breaking what had been
fdw_private into a "private" part and an "fdw_exprs" list that will be
subject to expression post-processing.  (The alternative to this would
be to do post-processing on all of fdw_private, but that would
considerably restrict what can be in fdw_private, so it seemed better
to decree two separate fields.)

Working on this also helped me identify some other things that had been
subliminally bothering me about pgsql_fdw's qual pushdown code.  That
patch is set up with the idea of pushing entire quals (boolean
RestrictInfo expressions) across to the remote side, but I think that
is probably the wrong granularity, or at least not the only mechanism
we should have.  IMO it is more important to provide a structure similar
to index quals; that is, what you want to identify is RestrictInfo
expressions of the form
    remote_variable operator local_expression
where the operator has to be one that the remote can execute with the
same semantics as we think it has, but the only real restriction on the
local_expression is that it be stable, because we'll execute it locally
and send only its result value across to the remote.  (The SQL sent to
the remote looks like "remote_variable operator $1", or some such.)
Thus, to take an example that's said to be unsafe in the existing code
comments, there's no problem at all with
    remote_timestamp_col = now()
as long as we execute now() locally.

There might be some value in pushing entire quals across too, for
clauses like "remote_variable_1 = remote_variable_2", but I believe
that these are not nearly as important as "variable = constant" and
"variable = join_variable" cases.  Consider that when dealing with a
local table, only the latter two cases can be accelerated by indexes.

            regards, tom lane

diff --git a/contrib/file_fdw/file_fdw.c b/contrib/file_fdw/file_fdw.c
index 29f203c6f10eeeed194670aa37956a6190e8e7a1..e8907709bd90a6342384dfb6f10b00e55018d65d 100644
*** a/contrib/file_fdw/file_fdw.c
--- b/contrib/file_fdw/file_fdw.c
***************
*** 26,31 ****
--- 26,33 ----
  #include "nodes/makefuncs.h"
  #include "optimizer/cost.h"
  #include "optimizer/pathnode.h"
+ #include "optimizer/planmain.h"
+ #include "optimizer/restrictinfo.h"
  #include "utils/rel.h"

  PG_MODULE_MAGIC;
*************** struct FileFdwOption
*** 48,54 ****
   * Note: If you are adding new option for user mapping, you need to modify
   * fileGetOptions(), which currently doesn't bother to look at user mappings.
   */
! static struct FileFdwOption valid_options[] = {
      /* File options */
      {"filename", ForeignTableRelationId},

--- 50,56 ----
   * Note: If you are adding new option for user mapping, you need to modify
   * fileGetOptions(), which currently doesn't bother to look at user mappings.
   */
! static const struct FileFdwOption valid_options[] = {
      /* File options */
      {"filename", ForeignTableRelationId},

*************** static struct FileFdwOption valid_option
*** 72,77 ****
--- 74,90 ----
  };

  /*
+  * FDW-specific information for RelOptInfo.fdw_private.
+  */
+ typedef struct FileFdwPlanState
+ {
+     char       *filename;        /* file to read */
+     List       *options;        /* merged COPY options, excluding filename */
+     BlockNumber pages;            /* estimate of file's physical size */
+     double        ntuples;        /* estimate of number of rows in file */
+ } FileFdwPlanState;
+
+ /*
   * FDW-specific information for ForeignScanState.fdw_state.
   */
  typedef struct FileFdwExecutionState
*************** PG_FUNCTION_INFO_V1(file_fdw_validator);
*** 93,101 ****
  /*
   * FDW callback routines
   */
! static void filePlanForeignScan(Oid foreigntableid,
!                     PlannerInfo *root,
!                     RelOptInfo *baserel);
  static void fileExplainForeignScan(ForeignScanState *node, ExplainState *es);
  static void fileBeginForeignScan(ForeignScanState *node, int eflags);
  static TupleTableSlot *fileIterateForeignScan(ForeignScanState *node);
--- 106,123 ----
  /*
   * FDW callback routines
   */
! static void fileGetForeignRelSize(PlannerInfo *root,
!                                   RelOptInfo *baserel,
!                                   Oid foreigntableid);
! static void fileGetForeignPaths(PlannerInfo *root,
!                                 RelOptInfo *baserel,
!                                 Oid foreigntableid);
! static ForeignScan *fileGetForeignPlan(PlannerInfo *root,
!                                        RelOptInfo *baserel,
!                                        Oid foreigntableid,
!                                        ForeignPath *best_path,
!                                        List *tlist,
!                                        List *scan_clauses);
  static void fileExplainForeignScan(ForeignScanState *node, ExplainState *es);
  static void fileBeginForeignScan(ForeignScanState *node, int eflags);
  static TupleTableSlot *fileIterateForeignScan(ForeignScanState *node);
*************** static bool is_valid_option(const char *
*** 109,116 ****
  static void fileGetOptions(Oid foreigntableid,
                 char **filename, List **other_options);
  static List *get_file_fdw_attribute_options(Oid relid);
  static void estimate_costs(PlannerInfo *root, RelOptInfo *baserel,
!                const char *filename,
                 Cost *startup_cost, Cost *total_cost);


--- 131,140 ----
  static void fileGetOptions(Oid foreigntableid,
                 char **filename, List **other_options);
  static List *get_file_fdw_attribute_options(Oid relid);
+ static void estimate_size(PlannerInfo *root, RelOptInfo *baserel,
+               FileFdwPlanState *fdw_private);
  static void estimate_costs(PlannerInfo *root, RelOptInfo *baserel,
!                FileFdwPlanState *fdw_private,
                 Cost *startup_cost, Cost *total_cost);


*************** file_fdw_handler(PG_FUNCTION_ARGS)
*** 123,129 ****
  {
      FdwRoutine *fdwroutine = makeNode(FdwRoutine);

!     fdwroutine->PlanForeignScan = filePlanForeignScan;
      fdwroutine->ExplainForeignScan = fileExplainForeignScan;
      fdwroutine->BeginForeignScan = fileBeginForeignScan;
      fdwroutine->IterateForeignScan = fileIterateForeignScan;
--- 147,155 ----
  {
      FdwRoutine *fdwroutine = makeNode(FdwRoutine);

!     fdwroutine->GetForeignRelSize = fileGetForeignRelSize;
!     fdwroutine->GetForeignPaths = fileGetForeignPaths;
!     fdwroutine->GetForeignPlan = fileGetForeignPlan;
      fdwroutine->ExplainForeignScan = fileExplainForeignScan;
      fdwroutine->BeginForeignScan = fileBeginForeignScan;
      fdwroutine->IterateForeignScan = fileIterateForeignScan;
*************** file_fdw_validator(PG_FUNCTION_ARGS)
*** 177,183 ****

          if (!is_valid_option(def->defname, catalog))
          {
!             struct FileFdwOption *opt;
              StringInfoData buf;

              /*
--- 203,209 ----

          if (!is_valid_option(def->defname, catalog))
          {
!             const struct FileFdwOption *opt;
              StringInfoData buf;

              /*
*************** file_fdw_validator(PG_FUNCTION_ARGS)
*** 249,255 ****
  static bool
  is_valid_option(const char *option, Oid context)
  {
!     struct FileFdwOption *opt;

      for (opt = valid_options; opt->optname; opt++)
      {
--- 275,281 ----
  static bool
  is_valid_option(const char *option, Oid context)
  {
!     const struct FileFdwOption *opt;

      for (opt = valid_options; opt->optname; opt++)
      {
*************** get_file_fdw_attribute_options(Oid relid
*** 381,387 ****
  }

  /*
!  * filePlanForeignScan
   *        Create possible access paths for a scan on the foreign table
   *
   *        Currently we don't support any push-down feature, so there is only one
--- 407,437 ----
  }

  /*
!  * fileGetForeignRelSize
!  *        Obtain relation size estimates for a foreign table
!  */
! static void
! fileGetForeignRelSize(PlannerInfo *root,
!                       RelOptInfo *baserel,
!                       Oid foreigntableid)
! {
!     FileFdwPlanState *fdw_private;
!
!     /*
!      * Fetch options.  We only need filename at this point, but we might
!      * as well get everything and not need to re-fetch it later in planning.
!      */
!     fdw_private = (FileFdwPlanState *) palloc(sizeof(FileFdwPlanState));
!     fileGetOptions(foreigntableid,
!                    &fdw_private->filename, &fdw_private->options);
!     baserel->fdw_private = (void *) fdw_private;
!
!     /* Estimate relation size */
!     estimate_size(root, baserel, fdw_private);
! }
!
! /*
!  * fileGetForeignPaths
   *        Create possible access paths for a scan on the foreign table
   *
   *        Currently we don't support any push-down feature, so there is only one
*************** get_file_fdw_attribute_options(Oid relid
*** 389,408 ****
   *        the data file.
   */
  static void
! filePlanForeignScan(Oid foreigntableid,
!                     PlannerInfo *root,
!                     RelOptInfo *baserel)
  {
!     char       *filename;
!     List       *options;
      Cost        startup_cost;
      Cost        total_cost;

!     /* Fetch options --- we only need filename at this point */
!     fileGetOptions(foreigntableid, &filename, &options);
!
!     /* Estimate costs and update baserel->rows */
!     estimate_costs(root, baserel, filename,
                     &startup_cost, &total_cost);

      /* Create a ForeignPath node and add it as only possible path */
--- 439,454 ----
   *        the data file.
   */
  static void
! fileGetForeignPaths(PlannerInfo *root,
!                     RelOptInfo *baserel,
!                     Oid foreigntableid)
  {
!     FileFdwPlanState *fdw_private = (FileFdwPlanState *) baserel->fdw_private;
      Cost        startup_cost;
      Cost        total_cost;

!     /* Estimate costs */
!     estimate_costs(root, baserel, fdw_private,
                     &startup_cost, &total_cost);

      /* Create a ForeignPath node and add it as only possible path */
*************** filePlanForeignScan(Oid foreigntableid,
*** 423,428 ****
--- 469,505 ----
  }

  /*
+  * fileGetForeignPlan
+  *        Create a ForeignScan plan node for scanning the foreign table
+  */
+ static ForeignScan *
+ fileGetForeignPlan(PlannerInfo *root,
+                    RelOptInfo *baserel,
+                    Oid foreigntableid,
+                    ForeignPath *best_path,
+                    List *tlist,
+                    List *scan_clauses)
+ {
+     Index        scan_relid = baserel->relid;
+
+     /*
+      * We have no native ability to evaluate restriction clauses, so we just
+      * put all the scan_clauses into the plan node's qual list for the
+      * executor to check.  So all we have to do here is strip RestrictInfo
+      * nodes from the clauses and ignore pseudoconstants (which will be
+      * handled elsewhere).
+      */
+     scan_clauses = extract_actual_clauses(scan_clauses, false);
+
+     /* Create the ForeignScan node */
+     return make_foreignscan(tlist,
+                             scan_clauses,
+                             scan_relid,
+                             NIL, /* no expressions to evaluate */
+                             NIL); /* no private state either */
+ }
+
+ /*
   * fileExplainForeignScan
   *        Produce extra output for EXPLAIN
   */
*************** fileReScanForeignScan(ForeignScanState *
*** 568,605 ****
  }

  /*
!  * Estimate costs of scanning a foreign table.
   *
!  * In addition to setting *startup_cost and *total_cost, this should
!  * update baserel->rows.
   */
  static void
! estimate_costs(PlannerInfo *root, RelOptInfo *baserel,
!                const char *filename,
!                Cost *startup_cost, Cost *total_cost)
  {
      struct stat stat_buf;
      BlockNumber pages;
      int            tuple_width;
      double        ntuples;
      double        nrows;
-     Cost        run_cost = 0;
-     Cost        cpu_per_tuple;

      /*
       * Get size of the file.  It might not be there at plan time, though, in
       * which case we have to use a default estimate.
       */
!     if (stat(filename, &stat_buf) < 0)
          stat_buf.st_size = 10 * BLCKSZ;

      /*
!      * Convert size to pages for use in I/O cost estimate below.
       */
      pages = (stat_buf.st_size + (BLCKSZ - 1)) / BLCKSZ;
      if (pages < 1)
          pages = 1;

      /*
       * Estimate the number of tuples in the file.  We back into this estimate
       * using the planner's idea of the relation width; which is bogus if not
--- 645,682 ----
  }

  /*
!  * Estimate size of a foreign table.
   *
!  * The main result is returned in baserel->rows.  We also set
!  * fdw_private->pages and fdw_private->ntuples for later use in the cost
!  * calculation.
   */
  static void
! estimate_size(PlannerInfo *root, RelOptInfo *baserel,
!               FileFdwPlanState *fdw_private)
  {
      struct stat stat_buf;
      BlockNumber pages;
      int            tuple_width;
      double        ntuples;
      double        nrows;

      /*
       * Get size of the file.  It might not be there at plan time, though, in
       * which case we have to use a default estimate.
       */
!     if (stat(fdw_private->filename, &stat_buf) < 0)
          stat_buf.st_size = 10 * BLCKSZ;

      /*
!      * Convert size to pages for use in I/O cost estimate later.
       */
      pages = (stat_buf.st_size + (BLCKSZ - 1)) / BLCKSZ;
      if (pages < 1)
          pages = 1;

+     fdw_private->pages = pages;
+
      /*
       * Estimate the number of tuples in the file.  We back into this estimate
       * using the planner's idea of the relation width; which is bogus if not
*************** estimate_costs(PlannerInfo *root, RelOpt
*** 611,616 ****
--- 688,695 ----

      ntuples = clamp_row_est((double) stat_buf.st_size / (double) tuple_width);

+     fdw_private->ntuples = ntuples;
+
      /*
       * Now estimate the number of rows returned by the scan after applying the
       * baserestrictinfo quals.    This is pretty bogus too, since the planner
*************** estimate_costs(PlannerInfo *root, RelOpt
*** 627,638 ****

      /* Save the output-rows estimate for the planner */
      baserel->rows = nrows;

      /*
!      * Now estimate costs.    We estimate costs almost the same way as
!      * cost_seqscan(), thus assuming that I/O costs are equivalent to a
!      * regular table file of the same size.  However, we take per-tuple CPU
!      * costs as 10x of a seqscan, to account for the cost of parsing records.
       */
      run_cost += seq_page_cost * pages;

--- 706,733 ----

      /* Save the output-rows estimate for the planner */
      baserel->rows = nrows;
+ }
+
+ /*
+  * Estimate costs of scanning a foreign table.
+  *
+  * Results are returned in *startup_cost and *total_cost.
+  */
+ static void
+ estimate_costs(PlannerInfo *root, RelOptInfo *baserel,
+                FileFdwPlanState *fdw_private,
+                Cost *startup_cost, Cost *total_cost)
+ {
+     BlockNumber pages = fdw_private->pages;
+     double        ntuples = fdw_private->ntuples;
+     Cost        run_cost = 0;
+     Cost        cpu_per_tuple;

      /*
!      * We estimate costs almost the same way as cost_seqscan(), thus assuming
!      * that I/O costs are equivalent to a regular table file of the same size.
!      * However, we take per-tuple CPU costs as 10x of a seqscan, to account
!      * for the cost of parsing records.
       */
      run_cost += seq_page_cost * pages;

diff --git a/doc/src/sgml/fdwhandler.sgml b/doc/src/sgml/fdwhandler.sgml
index dbfcbbc2b36dd49b0f0a8ffc9893da0b2a25a891..330953ee785d861a48c26166446eeec220e7a032 100644
*** a/doc/src/sgml/fdwhandler.sgml
--- b/doc/src/sgml/fdwhandler.sgml
***************
*** 89,140 ****
      <para>
  <programlisting>
  void
! PlanForeignScan (Oid foreigntableid,
!                  PlannerInfo *root,
!                  RelOptInfo *baserel);
  </programlisting>

!      Create possible access paths for a scan on a foreign table. This is
!      called when a query is planned.
       <literal>foreigntableid</> is the <structname>pg_class</> OID of the
!      foreign table.  <literal>root</> is the planner's global information
!      about the query, and <literal>baserel</> is the planner's information
!      about this table.
!     </para>
!
!     <para>
!      The function must generate at least one access path (ForeignPath node)
!      for a scan on the foreign table and must call <function>add_path</> to
!      add the path to <literal>baserel->pathlist</>.  It's recommended to
!      use <function>create_foreignscan_path</> to build the ForeignPath node.
!      The function may generate multiple access paths, e.g., a path which has
!      valid <literal>pathkeys</> to represent a pre-sorted result.  Each access
!      path must contain cost estimates, and can contain any FDW-private
!      information that is needed to execute the foreign scan at a later time.
!      (Note that the private information must be represented in a form that
!      <function>copyObject</> knows how to copy.)
      </para>

      <para>
       The information in <literal>root</> and <literal>baserel</> can be used
       to reduce the amount of information that has to be fetched from the
!      foreign table (and therefore reduce the cost estimate).
       <literal>baserel->baserestrictinfo</> is particularly interesting, as
!      it contains restriction quals (<literal>WHERE</> clauses) that can be
       used to filter the rows to be fetched.  (The FDW is not required to
!      enforce these quals, as the finished plan will recheck them anyway.)
       <literal>baserel->reltargetlist</> can be used to determine which
       columns need to be fetched.
      </para>

      <para>
!      In addition to returning cost estimates, the function should update
!      <literal>baserel->rows</> to be the expected number of rows returned
!      by the scan, after accounting for the filtering done by the restriction
!      quals.  The initial value of <literal>baserel->rows</> is just a
!      constant default estimate, which should be replaced if at all possible.
!      The function may also choose to update <literal>baserel->width</> if
!      it can compute a better estimate of the average result row width.
      </para>

      <para>
--- 89,227 ----
      <para>
  <programlisting>
  void
! GetForeignRelSize (PlannerInfo *root,
!                    RelOptInfo *baserel,
!                    Oid foreigntableid);
  </programlisting>

!      Obtain relation size estimates for a foreign table.  This is called
!      at the beginning of planning for a query involving a foreign table.
!      <literal>root</> is the planner's global information about the query;
!      <literal>baserel</> is the planner's information about this table; and
       <literal>foreigntableid</> is the <structname>pg_class</> OID of the
!      foreign table.  (<literal>foreigntableid</> could be obtained from the
!      planner data structures, but it's passed explicitly to save effort.)
      </para>

      <para>
       The information in <literal>root</> and <literal>baserel</> can be used
       to reduce the amount of information that has to be fetched from the
!      foreign table (and therefore reduce the cost).
       <literal>baserel->baserestrictinfo</> is particularly interesting, as
!      it contains restriction quals (<literal>WHERE</> clauses) that should be
       used to filter the rows to be fetched.  (The FDW is not required to
!      enforce these quals, as the executor can check them instead.)
       <literal>baserel->reltargetlist</> can be used to determine which
       columns need to be fetched.
      </para>

      <para>
!      This function should update <literal>baserel->rows</> to be the
!      expected number of rows returned by the table scan, after accounting for
!      the filtering done by the restriction quals.  The initial value of
!      <literal>baserel->rows</> is just a constant default estimate, which
!      should be replaced if at all possible.  The function may also choose to
!      update <literal>baserel->width</> if it can compute a better estimate
!      of the average result row width.
!     </para>
!
!     <para>
!      <literal>baserel->fdw_private</> is a <type>void</> pointer that is
!      available for use by FDW planning functions.  It can be used to pass
!      information forward from <function>GetForeignRelSize</> to
!      <function>GetForeignPaths</> and/or <function>GetForeignPaths</> to
!      <function>GetForeignPlan</>, thereby avoiding recalculation.
!     </para>
!
!     <para>
! <programlisting>
! void
! GetForeignPaths (PlannerInfo *root,
!                  RelOptInfo *baserel,
!                  Oid foreigntableid);
! </programlisting>
!
!      Create possible access paths for a scan on a foreign table.
!      This is called during query planning.
!      The parameters are the same as for <function>GetForeignRelSize</>,
!      which has already been called.
!     </para>
!
!     <para>
!      This function must generate at least one access path
!      (<structname>ForeignPath</> node) for a scan on the foreign table and
!      must call <function>add_path</> to add each such path to
!      <literal>baserel->pathlist</>.  It's recommended to use
!      <function>create_foreignscan_path</> to build the ForeignPath nodes.  The
!      function may generate multiple access paths, e.g., a path which has valid
!      <literal>pathkeys</> to represent a pre-sorted result.  Each access path
!      must contain cost estimates, and can contain any FDW-private information
!      that is needed to identify the specific scan method intended.
!     </para>
!
!     <para>
!      This private information is stored in the <structfield>fdw_private</>
!      field of <structname>ForeignPath</> nodes.  <structfield>fdw_private</>
!      is declared as a <type>List</>, but could actually contain anything since
!      the core planner does not touch it.  However, best practice is to use a
!      representation that's dumpable by <function>nodeToString</>, for use with
!      debugging support available in the backend.
!     </para>
!
!     <para>
! <programlisting>
! ForeignScan *
! GetForeignPlan (PlannerInfo *root,
!                 RelOptInfo *baserel,
!                 Oid foreigntableid,
!                 ForeignPath *best_path,
!                 List *tlist,
!                 List *scan_clauses);
! </programlisting>
!
!      Create a <structname>ForeignScan</> plan node from the selected foreign
!      access path.  This is called at the end of query planning.
!      The parameters are as for <function>GetForeignRelSize</>, plus
!      the selected <structname>ForeignPath</> (previously produced by
!      <function>GetForeignPaths</>), the target list to be emitted by the
!      plan node, and the restriction clauses to be enforced by the plan node.
!     </para>
!
!     <para>
!      This function must create and return a <structname>ForeignScan</> plan
!      node; it's recommended to use <function>make_foreignscan</> to build the
!      <structname>ForeignScan</> node.  Generally the targetlist can be copied
!      into the plan as-is.  The passed scan_clauses list contains the same
!      clauses as <literal>baserel->baserestrictinfo</>, but may be
!      re-ordered for better execution efficiency.  In simple cases the FDW can
!      just strip RestrictInfo nodes from the scan_clauses list (using
!      <function>extract_actual_clauses</>) and put all the clauses into the
!      plan node's qual list, which means that all the clauses will be checked
!      by the executor at runtime.  More complex FDWs may be able to check some
!      of the clauses internally, in which case those clauses can be removed
!      from the list given to the executor.
!     </para>
!
!     <para>
!      In addition, the FDW can generate <structfield>fdw_exprs</> and
!      <structfield>fdw_private</> lists to be placed in the plan node, where
!      they will be available at execution time.  Both of these lists must be
!      represented in a form that <function>copyObject</> knows how to copy.
!      The <structfield>fdw_private</> list has no other restrictions and is
!      not interpreted by the core backend in any way.  The
!      <structfield>fdw_exprs</> list, if not NIL, is expected to contain
!      expression trees that are intended to be executed at runtime.  These
!      trees will undergo post-processing by the planner to make them fully
!      executable.  As an example, the FDW might identify some restriction
!      clauses of the form <replaceable>foreign_variable</> <literal>=</>
!      <replaceable>sub_expression</>, which it determines can be executed on
!      the remote server given the locally-evaluated value of the
!      <replaceable>sub_expression</>.  It would remove such a clause from
!      scan_clauses, but add the <replaceable>sub_expression</> to
!      <structfield>fdw_exprs</> to ensure that it gets massaged into
!      executable form.  It would probably also put some control information
!      into <structfield>fdw_private</> to tell it what to do with the
!      expression value at runtime.
      </para>

      <para>
*************** BeginForeignScan (ForeignScanState *node
*** 170,176 ****
       the table to scan is accessible through the
       <structname>ForeignScanState</> node (in particular, from the underlying
       <structname>ForeignScan</> plan node, which contains any FDW-private
!      information provided by <function>PlanForeignScan</>).
      </para>

      <para>
--- 257,263 ----
       the table to scan is accessible through the
       <structname>ForeignScanState</> node (in particular, from the underlying
       <structname>ForeignScan</> plan node, which contains any FDW-private
!      information provided by <function>GetForeignPlan</>).
      </para>

      <para>
diff --git a/src/backend/nodes/copyfuncs.c b/src/backend/nodes/copyfuncs.c
index 868fb7130a8b28cf3e074d7d3903e58366c0c914..5cde22543f5b7d4d607224acfa22a604a419ed63 100644
*** a/src/backend/nodes/copyfuncs.c
--- b/src/backend/nodes/copyfuncs.c
*************** _copyForeignScan(const ForeignScan *from
*** 591,598 ****
      /*
       * copy remainder of node
       */
!     COPY_SCALAR_FIELD(fsSystemCol);
      COPY_NODE_FIELD(fdw_private);

      return newnode;
  }
--- 591,599 ----
      /*
       * copy remainder of node
       */
!     COPY_NODE_FIELD(fdw_exprs);
      COPY_NODE_FIELD(fdw_private);
+     COPY_SCALAR_FIELD(fsSystemCol);

      return newnode;
  }
diff --git a/src/backend/nodes/outfuncs.c b/src/backend/nodes/outfuncs.c
index 9daeb3e7b43e911aeab25b7521d41191928499bd..51181a9a7438e8609ab922340d1c2d20ba73726d 100644
*** a/src/backend/nodes/outfuncs.c
--- b/src/backend/nodes/outfuncs.c
*************** _outForeignScan(StringInfo str, const Fo
*** 559,566 ****

      _outScanInfo(str, (const Scan *) node);

!     WRITE_BOOL_FIELD(fsSystemCol);
      WRITE_NODE_FIELD(fdw_private);
  }

  static void
--- 559,567 ----

      _outScanInfo(str, (const Scan *) node);

!     WRITE_NODE_FIELD(fdw_exprs);
      WRITE_NODE_FIELD(fdw_private);
+     WRITE_BOOL_FIELD(fsSystemCol);
  }

  static void
*************** _outRelOptInfo(StringInfo str, const Rel
*** 1741,1746 ****
--- 1742,1748 ----
      WRITE_FLOAT_FIELD(allvisfrac, "%.6f");
      WRITE_NODE_FIELD(subplan);
      WRITE_NODE_FIELD(subroot);
+     /* we don't try to print fdwroutine or fdw_private */
      WRITE_NODE_FIELD(baserestrictinfo);
      WRITE_NODE_FIELD(joininfo);
      WRITE_BOOL_FIELD(has_eclass_joins);
diff --git a/src/backend/optimizer/path/allpaths.c b/src/backend/optimizer/path/allpaths.c
index 6e81ce0fc26496f73e89d5048b0fd8a19da33b74..03c604a03d6f37d9d6d975fd0809429c56d61b38 100644
*** a/src/backend/optimizer/path/allpaths.c
--- b/src/backend/optimizer/path/allpaths.c
*************** set_foreign_size(PlannerInfo *root, RelO
*** 396,401 ****
--- 396,407 ----
  {
      /* Mark rel with estimated output rows, width, etc */
      set_foreign_size_estimates(root, rel);
+
+     /* Get FDW routine pointers for the rel */
+     rel->fdwroutine = GetFdwRoutineByRelId(rte->relid);
+
+     /* Let FDW adjust the size estimates, if it can */
+     rel->fdwroutine->GetForeignRelSize(root, rel, rte->relid);
  }

  /*
*************** set_foreign_size(PlannerInfo *root, RelO
*** 405,415 ****
  static void
  set_foreign_pathlist(PlannerInfo *root, RelOptInfo *rel, RangeTblEntry *rte)
  {
!     FdwRoutine *fdwroutine;
!
!     /* Call the FDW's PlanForeignScan function to generate path(s) */
!     fdwroutine = GetFdwRoutineByRelId(rte->relid);
!     fdwroutine->PlanForeignScan(rte->relid, root, rel);

      /* Select cheapest path */
      set_cheapest(rel);
--- 411,418 ----
  static void
  set_foreign_pathlist(PlannerInfo *root, RelOptInfo *rel, RangeTblEntry *rte)
  {
!     /* Call the FDW's GetForeignPaths function to generate path(s) */
!     rel->fdwroutine->GetForeignPaths(root, rel, rte->relid);

      /* Select cheapest path */
      set_cheapest(rel);
diff --git a/src/backend/optimizer/path/costsize.c b/src/backend/optimizer/path/costsize.c
index 885d8558c319fd283df351c2c8e062a449b72d3c..24c853d47ef1aabb95156f90be4a8f3ea3d4995e 100644
*** a/src/backend/optimizer/path/costsize.c
--- b/src/backend/optimizer/path/costsize.c
*************** set_cte_size_estimates(PlannerInfo *root
*** 3745,3751 ****
   * using what will be purely datatype-driven estimates from the targetlist.
   * There is no way to do anything sane with the rows value, so we just put
   * a default estimate and hope that the wrapper can improve on it.    The
!  * wrapper's PlanForeignScan function will be called momentarily.
   *
   * The rel's targetlist and restrictinfo list must have been constructed
   * already.
--- 3745,3751 ----
   * using what will be purely datatype-driven estimates from the targetlist.
   * There is no way to do anything sane with the rows value, so we just put
   * a default estimate and hope that the wrapper can improve on it.    The
!  * wrapper's GetForeignRelSize function will be called momentarily.
   *
   * The rel's targetlist and restrictinfo list must have been constructed
   * already.
diff --git a/src/backend/optimizer/plan/createplan.c b/src/backend/optimizer/plan/createplan.c
index b1df56cafd25abfda40657555a1c832aa6db797a..94140d304f754236955452e26b80de8276ca729b 100644
*** a/src/backend/optimizer/plan/createplan.c
--- b/src/backend/optimizer/plan/createplan.c
***************
*** 20,25 ****
--- 20,26 ----
  #include <math.h>

  #include "access/skey.h"
+ #include "foreign/fdwapi.h"
  #include "miscadmin.h"
  #include "nodes/makefuncs.h"
  #include "nodes/nodeFuncs.h"
*************** static CteScan *make_ctescan(List *qptli
*** 119,126 ****
               Index scanrelid, int ctePlanId, int cteParam);
  static WorkTableScan *make_worktablescan(List *qptlist, List *qpqual,
                     Index scanrelid, int wtParam);
- static ForeignScan *make_foreignscan(List *qptlist, List *qpqual,
-                  Index scanrelid, bool fsSystemCol, List *fdw_private);
  static BitmapAnd *make_bitmap_and(List *bitmapplans);
  static BitmapOr *make_bitmap_or(List *bitmapplans);
  static NestLoop *make_nestloop(List *tlist,
--- 120,125 ----
*************** create_foreignscan_plan(PlannerInfo *roo
*** 1816,1822 ****
      RelOptInfo *rel = best_path->path.parent;
      Index        scan_relid = rel->relid;
      RangeTblEntry *rte;
-     bool        fsSystemCol;
      int            i;

      /* it should be a base rel... */
--- 1815,1820 ----
*************** create_foreignscan_plan(PlannerInfo *roo
*** 1825,1855 ****
      rte = planner_rt_fetch(scan_relid, root);
      Assert(rte->rtekind == RTE_RELATION);

!     /* Sort clauses into best execution order */
      scan_clauses = order_qual_clauses(root, scan_clauses);

!     /* Reduce RestrictInfo list to bare expressions; ignore pseudoconstants */
!     scan_clauses = extract_actual_clauses(scan_clauses, false);

!     /* Detect whether any system columns are requested from rel */
!     fsSystemCol = false;
      for (i = rel->min_attr; i < 0; i++)
      {
          if (!bms_is_empty(rel->attr_needed[i - rel->min_attr]))
          {
!             fsSystemCol = true;
              break;
          }
      }

-     scan_plan = make_foreignscan(tlist,
-                                  scan_clauses,
-                                  scan_relid,
-                                  fsSystemCol,
-                                  best_path->fdw_private);
-
-     copy_path_costsize(&scan_plan->scan.plan, &best_path->path);
-
      return scan_plan;
  }

--- 1823,1878 ----
      rte = planner_rt_fetch(scan_relid, root);
      Assert(rte->rtekind == RTE_RELATION);

!     /*
!      * Sort clauses into best execution order.  We do this first since the
!      * FDW might have more info than we do and wish to adjust the ordering.
!      */
      scan_clauses = order_qual_clauses(root, scan_clauses);

!     /*
!      * Let the FDW perform its processing on the restriction clauses and
!      * generate the plan node.  Note that the FDW might remove restriction
!      * clauses that it intends to execute remotely, or even add more (if it
!      * has selected some join clauses for remote use but also wants them
!      * rechecked locally).
!      */
!     scan_plan = rel->fdwroutine->GetForeignPlan(root, rel, rte->relid,
!                                                 best_path,
!                                                 tlist, scan_clauses);

!     /* Copy cost data from Path to Plan; no need to make FDW do this */
!     copy_path_costsize(&scan_plan->scan.plan, &best_path->path);
!
!     /*
!      * Replace any outer-relation variables with nestloop params in the qual
!      * and fdw_exprs expressions.  We do this last so that the FDW doesn't
!      * have to be involved.  (Note that parts of fdw_exprs could have come
!      * from join clauses, so doing this beforehand on the scan_clauses
!      * wouldn't work.)
!      */
!     if (best_path->path.required_outer)
!     {
!         scan_plan->scan.plan.qual = (List *)
!             replace_nestloop_params(root, (Node *) scan_plan->scan.plan.qual);
!         scan_plan->fdw_exprs = (List *)
!             replace_nestloop_params(root, (Node *) scan_plan->fdw_exprs);
!     }
!
!     /*
!      * Detect whether any system columns are requested from rel.  This is a
!      * bit of a kluge and might go away someday, so we intentionally leave it
!      * out of the API presented to FDWs.
!      */
!     scan_plan->fsSystemCol = false;
      for (i = rel->min_attr; i < 0; i++)
      {
          if (!bms_is_empty(rel->attr_needed[i - rel->min_attr]))
          {
!             scan_plan->fsSystemCol = true;
              break;
          }
      }

      return scan_plan;
  }

*************** make_worktablescan(List *qptlist,
*** 3183,3206 ****
      return node;
  }

! static ForeignScan *
  make_foreignscan(List *qptlist,
                   List *qpqual,
                   Index scanrelid,
!                  bool fsSystemCol,
                   List *fdw_private)
  {
      ForeignScan *node = makeNode(ForeignScan);
      Plan       *plan = &node->scan.plan;

!     /* cost should be inserted by caller */
      plan->targetlist = qptlist;
      plan->qual = qpqual;
      plan->lefttree = NULL;
      plan->righttree = NULL;
      node->scan.scanrelid = scanrelid;
!     node->fsSystemCol = fsSystemCol;
      node->fdw_private = fdw_private;

      return node;
  }
--- 3206,3231 ----
      return node;
  }

! ForeignScan *
  make_foreignscan(List *qptlist,
                   List *qpqual,
                   Index scanrelid,
!                  List *fdw_exprs,
                   List *fdw_private)
  {
      ForeignScan *node = makeNode(ForeignScan);
      Plan       *plan = &node->scan.plan;

!     /* cost will be filled in by create_foreignscan_plan */
      plan->targetlist = qptlist;
      plan->qual = qpqual;
      plan->lefttree = NULL;
      plan->righttree = NULL;
      node->scan.scanrelid = scanrelid;
!     node->fdw_exprs = fdw_exprs;
      node->fdw_private = fdw_private;
+     /* fsSystemCol will be filled in by create_foreignscan_plan */
+     node->fsSystemCol = false;

      return node;
  }
diff --git a/src/backend/optimizer/plan/setrefs.c b/src/backend/optimizer/plan/setrefs.c
index e1b48fb4f53061f5b06653c67275dc0b323bffd0..69396694aaa9df0edbc361ede98b3f77db6724dc 100644
*** a/src/backend/optimizer/plan/setrefs.c
--- b/src/backend/optimizer/plan/setrefs.c
*************** set_plan_refs(PlannerInfo *root, Plan *p
*** 428,433 ****
--- 428,435 ----
                      fix_scan_list(root, splan->scan.plan.targetlist, rtoffset);
                  splan->scan.plan.qual =
                      fix_scan_list(root, splan->scan.plan.qual, rtoffset);
+                 splan->fdw_exprs =
+                     fix_scan_list(root, splan->fdw_exprs, rtoffset);
              }
              break;

diff --git a/src/backend/optimizer/plan/subselect.c b/src/backend/optimizer/plan/subselect.c
index 40a420a3546f12de37459881ed587d89ae6954c3..b64db1e1c0659ea9b6af25327f689b083de845e6 100644
*** a/src/backend/optimizer/plan/subselect.c
--- b/src/backend/optimizer/plan/subselect.c
*************** finalize_plan(PlannerInfo *root, Plan *p
*** 2137,2142 ****
--- 2137,2144 ----
              break;

          case T_ForeignScan:
+             finalize_primnode((Node *) ((ForeignScan *) plan)->fdw_exprs,
+                               &context);
              context.paramids = bms_add_members(context.paramids, scan_params);
              break;

diff --git a/src/backend/optimizer/util/pathnode.c b/src/backend/optimizer/util/pathnode.c
index 6d1545476df7b054d48cae8a85669935d406c879..a2fc75a659e50ca7d5726ed9d024622af9e7f191 100644
*** a/src/backend/optimizer/util/pathnode.c
--- b/src/backend/optimizer/util/pathnode.c
*************** create_worktablescan_path(PlannerInfo *r
*** 1767,1773 ****
   *      returning the pathnode.
   *
   * This function is never called from core Postgres; rather, it's expected
!  * to be called by the PlanForeignScan function of a foreign data wrapper.
   * We make the FDW supply all fields of the path, since we do not have any
   * way to calculate them in core.
   */
--- 1767,1773 ----
   *      returning the pathnode.
   *
   * This function is never called from core Postgres; rather, it's expected
!  * to be called by the GetForeignPaths function of a foreign data wrapper.
   * We make the FDW supply all fields of the path, since we do not have any
   * way to calculate them in core.
   */
diff --git a/src/backend/optimizer/util/relnode.c b/src/backend/optimizer/util/relnode.c
index 0cdf638c1ddb1c614eb8ef3cd2ddea4571248ff6..cee092a8810102fdf48023e180739cd8dbc3d81c 100644
*** a/src/backend/optimizer/util/relnode.c
--- b/src/backend/optimizer/util/relnode.c
*************** build_simple_rel(PlannerInfo *root, int
*** 113,118 ****
--- 113,120 ----
      rel->allvisfrac = 0;
      rel->subplan = NULL;
      rel->subroot = NULL;
+     rel->fdwroutine = NULL;
+     rel->fdw_private = NULL;
      rel->baserestrictinfo = NIL;
      rel->baserestrictcost.startup = 0;
      rel->baserestrictcost.per_tuple = 0;
*************** build_join_rel(PlannerInfo *root,
*** 366,371 ****
--- 368,375 ----
      joinrel->allvisfrac = 0;
      joinrel->subplan = NULL;
      joinrel->subroot = NULL;
+     joinrel->fdwroutine = NULL;
+     joinrel->fdw_private = NULL;
      joinrel->baserestrictinfo = NIL;
      joinrel->baserestrictcost.startup = 0;
      joinrel->baserestrictcost.per_tuple = 0;
diff --git a/src/include/foreign/fdwapi.h b/src/include/foreign/fdwapi.h
index 9e135c62069fdc200e4f3cf58fa9725847d279fb..854f17755c4543ec85a66c6a6a3376fbcffd1cda 100644
*** a/src/include/foreign/fdwapi.h
--- b/src/include/foreign/fdwapi.h
*************** struct ExplainState;
*** 23,31 ****
   * Callback function signatures --- see fdwhandler.sgml for more info.
   */

! typedef void (*PlanForeignScan_function) (Oid foreigntableid,
!                                           PlannerInfo *root,
!                                           RelOptInfo *baserel);

  typedef void (*ExplainForeignScan_function) (ForeignScanState *node,
                                                      struct ExplainState *es);
--- 23,42 ----
   * Callback function signatures --- see fdwhandler.sgml for more info.
   */

! typedef void (*GetForeignRelSize_function) (PlannerInfo *root,
!                                             RelOptInfo *baserel,
!                                             Oid foreigntableid);
!
! typedef void (*GetForeignPaths_function) (PlannerInfo *root,
!                                           RelOptInfo *baserel,
!                                           Oid foreigntableid);
!
! typedef ForeignScan *(*GetForeignPlan_function) (PlannerInfo *root,
!                                                  RelOptInfo *baserel,
!                                                  Oid foreigntableid,
!                                                  ForeignPath *best_path,
!                                                  List *tlist,
!                                                  List *scan_clauses);

  typedef void (*ExplainForeignScan_function) (ForeignScanState *node,
                                                      struct ExplainState *es);
*************** typedef struct FdwRoutine
*** 53,59 ****
  {
      NodeTag        type;

!     PlanForeignScan_function PlanForeignScan;
      ExplainForeignScan_function ExplainForeignScan;
      BeginForeignScan_function BeginForeignScan;
      IterateForeignScan_function IterateForeignScan;
--- 64,72 ----
  {
      NodeTag        type;

!     GetForeignRelSize_function GetForeignRelSize;
!     GetForeignPaths_function GetForeignPaths;
!     GetForeignPlan_function GetForeignPlan;
      ExplainForeignScan_function ExplainForeignScan;
      BeginForeignScan_function BeginForeignScan;
      IterateForeignScan_function IterateForeignScan;
diff --git a/src/include/nodes/plannodes.h b/src/include/nodes/plannodes.h
index 3962792d3d89a6d5077d1d682e2c3d112a6c568f..e6bb3239f4214c26aa1d70d4ca4ac50f63148ad5 100644
*** a/src/include/nodes/plannodes.h
--- b/src/include/nodes/plannodes.h
*************** typedef struct WorkTableScan
*** 462,474 ****

  /* ----------------
   *        ForeignScan node
   * ----------------
   */
  typedef struct ForeignScan
  {
      Scan        scan;
!     bool        fsSystemCol;    /* true if any "system column" is needed */
      List       *fdw_private;    /* private data for FDW */
  } ForeignScan;


--- 462,483 ----

  /* ----------------
   *        ForeignScan node
+  *
+  * fdw_exprs and fdw_private are both under the control of the foreign-data
+  * wrapper, but fdw_exprs is presumed to contain expression trees and will
+  * be post-processed accordingly by the planner; fdw_private won't be.
+  * Note that everything in both lists must be copiable by copyObject().
+  * One way to store an arbitrary blob of bytes is to represent it as a bytea
+  * Const.  Usually, though, you'll be better off choosing a representation
+  * that can be dumped usefully by nodeToString().
   * ----------------
   */
  typedef struct ForeignScan
  {
      Scan        scan;
!     List       *fdw_exprs;        /* expressions that FDW may evaluate */
      List       *fdw_private;    /* private data for FDW */
+     bool        fsSystemCol;    /* true if any "system column" is needed */
  } ForeignScan;


diff --git a/src/include/nodes/relation.h b/src/include/nodes/relation.h
index 2a686080059f3ffd26313798324323c6f2f6b56d..8616223f24a8cbe5424b89599b331538db05dd76 100644
*** a/src/include/nodes/relation.h
--- b/src/include/nodes/relation.h
*************** typedef struct PlannerInfo
*** 334,343 ****
   *        allvisfrac - fraction of disk pages that are marked all-visible
   *        subplan - plan for subquery (NULL if it's not a subquery)
   *        subroot - PlannerInfo for subquery (NULL if it's not a subquery)
   *
   *        Note: for a subquery, tuples, subplan, subroot are not set immediately
   *        upon creation of the RelOptInfo object; they are filled in when
!  *        set_base_rel_pathlist processes the object.
   *
   *        For otherrels that are appendrel members, these fields are filled
   *        in just as for a baserel.
--- 334,346 ----
   *        allvisfrac - fraction of disk pages that are marked all-visible
   *        subplan - plan for subquery (NULL if it's not a subquery)
   *        subroot - PlannerInfo for subquery (NULL if it's not a subquery)
+  *        fdwroutine - function hooks for FDW, if foreign table (else NULL)
+  *        fdw_private - private state for FDW, if foreign table (else NULL)
   *
   *        Note: for a subquery, tuples, subplan, subroot are not set immediately
   *        upon creation of the RelOptInfo object; they are filled in when
!  *        set_subquery_pathlist processes the object.  Likewise, fdwroutine
!  *        and fdw_private are filled during initial path creation.
   *
   *        For otherrels that are appendrel members, these fields are filled
   *        in just as for a baserel.
*************** typedef struct RelOptInfo
*** 414,421 ****
--- 417,428 ----
      BlockNumber pages;            /* size estimates derived from pg_class */
      double        tuples;
      double        allvisfrac;
+     /* use "struct Plan" to avoid including plannodes.h here */
      struct Plan *subplan;        /* if subquery */
      PlannerInfo *subroot;        /* if subquery */
+     /* use "struct FdwRoutine" to avoid including fdwapi.h here */
+     struct FdwRoutine *fdwroutine;    /* if foreign table */
+     void       *fdw_private;    /* if foreign table */

      /* used by various scans and joins: */
      List       *baserestrictinfo;        /* RestrictInfo structures (if base
*************** typedef struct TidPath
*** 793,806 ****
  } TidPath;

  /*
!  * ForeignPath represents a scan of a foreign table
   *
!  * fdw_private contains FDW private data about the scan, which will be copied
!  * to the final ForeignScan plan node so that it is available at execution
!  * time.  Note that everything in this list must be copiable by copyObject().
!  * One way to store an arbitrary blob of bytes is to represent it as a bytea
!  * Const.  Usually, though, you'll be better off choosing a representation
!  * that can be dumped usefully by nodeToString().
   */
  typedef struct ForeignPath
  {
--- 800,812 ----
  } TidPath;

  /*
!  * ForeignPath represents a potential scan of a foreign table
   *
!  * fdw_private stores FDW private data about the scan.  While fdw_private is
!  * not actually touched by the core code during normal operations, it's
!  * generally a good idea to use a representation that can be dumped by
!  * nodeToString(), so that you can examine the structure during debugging
!  * with tools like pprint().
   */
  typedef struct ForeignPath
  {
diff --git a/src/include/optimizer/planmain.h b/src/include/optimizer/planmain.h
index 8bd603124b3ac8ab87af160646301c7bbf8dbf16..47cc39cf1d9c3646fb10b76e6c4a97e166e65e08 100644
*** a/src/include/optimizer/planmain.h
--- b/src/include/optimizer/planmain.h
*************** extern Plan *optimize_minmax_aggregates(
*** 42,47 ****
--- 42,49 ----
  extern Plan *create_plan(PlannerInfo *root, Path *best_path);
  extern SubqueryScan *make_subqueryscan(List *qptlist, List *qpqual,
                    Index scanrelid, Plan *subplan);
+ extern ForeignScan *make_foreignscan(List *qptlist, List *qpqual,
+                  Index scanrelid, List *fdw_exprs, List *fdw_private);
  extern Append *make_append(List *appendplans, List *tlist);
  extern RecursiveUnion *make_recursive_union(List *tlist,
                       Plan *lefttree, Plan *righttree, int wtParam,

Re: pgsql_fdw, FDW for PostgreSQL server

From
Etsuro Fujita
Date:
(2012/03/09 14:00), Tom Lane wrote:
> I wrote:
>> There are a couple of other points that make me think we need to revisit
>> the PlanForeignScan API definition some more, too.  ...
>> So we need to break down what PlanForeignScan currently does into three
>> separate steps.  The first idea that comes to mind is to call them
>> GetForeignRelSize, GetForeignPaths, GetForeignPlan; but maybe somebody
>> has a better idea for names?
>
> Attached is a draft patch for that.

1. FileFdwPlanState.pages and FileFdwPlanState.ntuples seem redundant.
  Why not use RelOptInfo.pages and RelOptInfo.tuples?

2. IMHO RelOptInfo.fdw_private seems confusing.  How about renaming it
to e.g., RelOptInfo.fdw_state?

Attached is a patch for the draft patch.

Best regards,
Etsuro Fujita

Attachment

Re: pgsql_fdw, FDW for PostgreSQL server

From
Tom Lane
Date:
Etsuro Fujita <fujita.etsuro@lab.ntt.co.jp> writes:
> (2012/03/09 14:00), Tom Lane wrote:
>> Attached is a draft patch for that.

> 1. FileFdwPlanState.pages and FileFdwPlanState.ntuples seem redundant. 
>   Why not use RelOptInfo.pages and RelOptInfo.tuples?

I intentionally avoided setting RelOptInfo.pages because that would have
other effects on planning (cf total_table_pages or whatever it's
called).  It's possible that that would actually be desirable, depending
on whether you think the external file should be counted as part of the
query's disk-access footprint; but it would take some investigation to
conclude that, which I didn't feel like doing right now.  Likewise, I'm
not sure offhand what side effects might occur from using
RelOptInfo.tuples, and didn't want to change file_fdw's behavior without
more checking.

> 2. IMHO RelOptInfo.fdw_private seems confusing.  How about renaming it 
> to e.g., RelOptInfo.fdw_state?

Why is that better?  It seems just as open to confusion with another
field (ie, the execution-time fdw_state).  I thought for a little bit
about trying to give different names to all four of the fdw private
fields (RelOptInfo, Path, Plan, PlanState) but it's not obvious what
naming rule to use, and at least the last two of those can't be changed
without breaking existing FDW code.
        regards, tom lane


Re: pgsql_fdw, FDW for PostgreSQL server

From
Shigeru Hanada
Date:
I've not read the whole patch yet, but I have some basic questions.

1) IIUC, GetForeignRelSize should set baserel->rows to the number of
rows the ForeignScan node returns to the upper node, not the number of
rows the FDW returns to the core executor, right?

BTW, once Fujita-san's ANALYZE support patch is merged, we will be able
to get a row estimate easily by calling clauselist_selectivity with
baserel->tuples and baserestrictinfo.  Otherwise, pgsql_fdw would still
need to execute EXPLAIN on the remote side to get a meaningful row
estimate.

2) ISTM that pgsql_fdw needs to execute EXPLAIN on the remote side for
each possible remote query to get meaningful cost estimates, and that
requires pgsql_fdw to generate SQL statements in GetForeignPaths.
I worry that I've misunderstood the intention of your design, because
you've mentioned postponing SQL deparsing to createplan time.

I'll read the document and patch, and fix pgsql_fdw so that it can
work with the new API.  For now, I think that pgsqlPlanForeignScan
should be separated as below:

GetForeignRelSize
1) Retrieve catalog information via the GetForeignFoo functions.
2) Generate a simple remote query which has no WHERE clause.
3) Execute EXPLAIN of the simple query, and get rows and cost estimates.
4) Set baserel->rows.

All of the information above is stored in baserel->fdw_private for use
in the subsequent GetForeignPaths.

If ANALYZE of foreign tables is supported, we can postpone 2) and 3)
until GetForeignPaths.

GetForeignPaths
1) Repeat for each possible remote query:
1-1) Generate the remote query, such as with-WHERE and with-ORDER BY.
1-2) Execute EXPLAIN of the generated query, and get its cost estimate
(the rows estimate is ignored because it's useless in planning).
1-3) Call create_foreignscan_path and add_path for the query.

GetForeignPlan
1) Create fdw_exprs from baserestrictinfo, removing the clauses that
are pushed down by the selected path.

Regards,
-- 
Shigeru Hanada


Re: pgsql_fdw, FDW for PostgreSQL server

From
Shigeru Hanada
Date:
(2012/03/09 1:18), Tom Lane wrote:
> I've been looking at this patch a little bit over the past day or so.
> I'm pretty unhappy with deparse.c --- it seems like a real kluge,
> inefficient and full of corner-case bugs.  After some thought I believe
> that you're ultimately going to have to abandon depending on ruleutils.c
> for reverse-listing services, and it would be best to bite that bullet
> now and rewrite this code from scratch.

Thanks for the review.  I agree to write a custom deparser for
pgsql_fdw which handles the nodes that can be pushed down.  Every
SQL-based FDW which constructs an SQL statement for each local query
would need such a module inside.

BTW, pgsql_fdw pushes down to the remote side only built-in objects
which have no collation effect, because user-defined objects might have
different semantics on the remote end.  In the future, such a deparser
will need some mechanism to map a local object (or expression?) to a
remote one, like ROUTINE MAPPING, as discussed before.  But it seems OK
to assume that built-in objects have the same name and semantics on the
remote end.

> There are a couple of other points that make me think we need to revisit
> the PlanForeignScan API definition some more, too.  First, deparse.c is
> far from cheap.  While this doesn't matter greatly as long as there's
> only one possible path for a foreign table, as soon as you want to
> create more than one it's going to be annoying to do all that work N
> times and then throw away N-1 of the results.

Indeed deparse.c is not cheap, but I think that pgsql_fdw can avoid
redundant work by deparsing the parts of the SQL statement separately,
unless we need to consider join push-down.  Possible parts are the
SELECT, FROM, WHERE and ORDER BY clauses.  The simplest path uses
SELECT and FROM, and other paths can be built by copying the necessary
clauses into individual buffers.

Comments on the rest are in my other reply to your recent post.

Regards,
-- 
Shigeru Hanada


Re: pgsql_fdw, FDW for PostgreSQL server

From
Shigeru Hanada
Date:
(2012/03/06 23:47), Albe Laurenz wrote:
> Shigeru Hanada wrote:
>>                 Connection should be closed only when the trigger is a
>> top level transaction and it's aborting, but isTopLevel flag was not
>> checked.  I fixed the bug and added regression tests for such cases.
>
> I wondered about that - is it really necessary to close the remote
> connection? Wouldn't a ROLLBACK on the remote connection be good enough?

Rolling back the remote transaction seems sufficient when the error has
a local cause and the remote connection is still available.  However,
I'd rather always disconnect, to keep error handling simple and
centralized in cleanup_connection.

Regards,
-- 
Shigeru Hanada


Re: pgsql_fdw, FDW for PostgreSQL server

From
Tom Lane
Date:
Shigeru Hanada <shigeru.hanada@gmail.com> writes:
> I've not read whole of the patch yet, but I have basic questions.

> 1) IIUC, GetForeignRelSize should set baserel->rows to the number of
> rows the ForeignScan node returns to upper node, but not the number
> of rows FDW returns to core executor, right?

It should be the number of rows estimated to pass the baserestrictinfo
restriction clauses, so yeah, not the same as what the FDW would return,
except in cases where all the restriction clauses are handled internally
by the FDW.

> BTW, once Fujita-san's ANALYZE support patch is merged, we will be
> able to get rows estimation easily by calling clauselist_selectivity
> with baserel->tuples and baserestrictinfo.  Otherwise, pgsql_fdw
> would still need to execute EXPLAIN on remote side to get meaningful
> rows estimation.

Yeah, one of the issues for that patch is how we see it coexisting with
the option of doing a remote-side EXPLAIN.

> 2) ISTM that pgsql_fdw needs to execute EXPLAIN on remote side for each
> possible remote query to get meaningful costs estimation, and it
> requires pgsql_fdw to generate SQL statements in GetForeignPaths.
> I worry that I've misunderstood intention of your design because
> you've mentioned postponing SQL deparsing to createplan time.

If you want to get the cost estimates that way, then yes, you'd be
needing to do some SQL-statement-construction earlier than final plan
generation.  But it's not apparent to me that those statements would
necessarily be the same as, or even very similar to, what the final
queries would be.  For instance, you'd probably try to reduce parameters
to constants for estimation purposes.

> GetForeignPaths
> 1) Repeat for each possible remote query:
> 1-1) Generate remote query, such as with-WHERE and with-ORDER BY.
> 1-2) Execute EXPLAIN of generated query, and get costs estimation
> (rows estimation is ignored because it's useless in planning).

Why do you say that?
        regards, tom lane


Re: pgsql_fdw, FDW for PostgreSQL server

From
Tom Lane
Date:
Shigeru Hanada <shigeru.hanada@gmail.com> writes:
> Thanks for the review.  Agreed to write own deparser for pgsql_fdw
> which handles nodes which can be pushed down.  Every SQL-based FDW
> which constructs SQL statement for each local query would need such
> module inside.

Yeah.  That's kind of annoying, and the first thing you think of is that
we ought to find a way to share that code somehow.  But I think it's
folly to try to design a shared implementation until we have some
concrete implementations to compare.  An Oracle FDW, for instance, would
need to emit SQL code with many differences in detail from pgsql_fdw.
It's not clear to me whether a shared implementation is even practical,
but for sure I don't want to try to build it before we've done some
prototype single-purpose implementations.
        regards, tom lane


Re: pgsql_fdw, FDW for PostgreSQL server

From
Martijn van Oosterhout
Date:
On Sat, Mar 10, 2012 at 11:38:51AM -0500, Tom Lane wrote:
> Shigeru Hanada <shigeru.hanada@gmail.com> writes:
> > Thanks for the review.  Agreed to write own deparser for pgsql_fdw
> > which handles nodes which can be pushed down.  Every SQL-based FDW
> > which constructs SQL statement for each local query would need such
> > module inside.
>
> Yeah.  That's kind of annoying, and the first thing you think of is that
> we ought to find a way to share that code somehow.  But I think it's
> folly to try to design a shared implementation until we have some
> concrete implementations to compare.  An Oracle FDW, for instance, would
> need to emit SQL code with many differences in detail from pgsql_fdw.
> It's not clear to me whether a shared implementation is even practical,
> but for sure I don't want to try to build it before we've done some
> prototype single-purpose implementations.

FWIW, this sounds like the "compiler" mechanism in SQLalchemy for
turning SQL node trees into strings. The basic idea is you define
functions for converting nodes to strings. Stuff like "And" and "Or"
works for every database, but then "dialects" can override different
things.

So you have for Postgres: Node(foo) => $1, but to other databases
perhaps :field1. But most of the other code can be shared..

http://docs.sqlalchemy.org/en/latest/core/compiler.html

In my experience it works well for generating custom constructs. They
have compilers for 11 different database engines, so it seems flexible
enough.  Mind you, they also handle DDL mapping (where most of the
variation is) and datatype translations, which seems a lot further than
we need here.

Have a nice day,
--
Martijn van Oosterhout   <kleptog@svana.org>   http://svana.org/kleptog/
> He who writes carelessly confesses thereby at the very outset that he does
> not attach much importance to his own thoughts.  -- Arthur Schopenhauer

Re: pgsql_fdw, FDW for PostgreSQL server

From
Etsuro Fujita
Date:
(2012/03/09 23:48), Tom Lane wrote:
> Etsuro Fujita<fujita.etsuro@lab.ntt.co.jp>  writes:

Thank you for your answer.

>> 1. FileFdwPlanState.pages and FileFdwPlanState.ntuples seem redundant.
>>    Why not use RelOptInfo.pages and RelOptInfo.tuples?
> 
> I intentionally avoided setting RelOptInfo.pages because that would have
> other effects on planning (cf total_table_pages or whatever it's
> called).  It's possible that that would actually be desirable, depending
> on whether you think the external file should be counted as part of the
> query's disk-access footprint; but it would take some investigation to
> conclude that, which I didn't feel like doing right now.  Likewise, I'm
> not sure offhand what side effects might occur from using
> RelOptInfo.tuples, and didn't want to change file_fdw's behavior without
> more checking.

OK

>> 2. IMHO RelOptInfo.fdw_private seems confusing.  How about renaming it
>> to e.g., RelOptInfo.fdw_state?
> 
> Why is that better?  It seems just as open to confusion with another
> field (ie, the execution-time fdw_state).

I recognized that risk.  However, I feel that the naming of
RelOptInfo.fdw_state is not so bad, because it would be used only at
query planning time, not along with the execution-time fdw_private.
The naming of RelOptInfo.fdw_private seems just as open to confusion
to me, because it would have to be used along with Path.fdw_private or
Plan.fdw_private in FDW functions at planning time, while I guess that
the contents of RelOptInfo.fdw_private are relatively far from those
of the fdw_private of Path and Plan.

Best regards,
Etsuro Fujita


Corrected: Re: pgsql_fdw, FDW for PostgreSQL server

From
Etsuro Fujita
Date:
(2012/03/12 13:04), Etsuro Fujita wrote:
> (2012/03/09 23:48), Tom Lane wrote:
>> Etsuro Fujita<fujita.etsuro@lab.ntt.co.jp>   writes:

>>> 2. IMHO RelOptInfo.fdw_private seems confusing.  How about renaming it
>>> to e.g., RelOptInfo.fdw_state?
>>
>> Why is that better?  It seems just as open to confusion with another
>> field (ie, the execution-time fdw_state).
> 
> I thought the risk.  However, I feel that the naming of
> RelOptInfo.fdw_state is not so bad because it is used only at the query
> planning time, not used along with the execution-time fdw_private.

I wrote the execution-time fdw_private by mistake.  I meant the
execution-time fdw_state.  I'm sorry about that.

Best regards,
Etsuro Fujita


Re: pgsql_fdw, FDW for PostgreSQL server

From
"Albe Laurenz"
Date:
Tom Lane wrote:
> Shigeru Hanada <shigeru.hanada@gmail.com> writes:
>> Thanks for the review.  Agreed to write own deparser for pgsql_fdw
>> which handles nodes which can be pushed down.  Every SQL-based FDW
>> which constructs SQL statement for each local query would need such
>> module inside.
>
> Yeah.  That's kind of annoying, and the first thing you think of is that
> we ought to find a way to share that code somehow.  But I think it's
> folly to try to design a shared implementation until we have some
> concrete implementations to compare.  An Oracle FDW, for instance, would
> need to emit SQL code with many differences in detail from pgsql_fdw.
> It's not clear to me whether a shared implementation is even practical,
> but for sure I don't want to try to build it before we've done some
> prototype single-purpose implementations.

Having written something like that for Oracle, I tend to share that
opinion.  Anything general-purpose enough to cater for every whim and
oddity of the remote system would probably be so unwieldy that it
wouldn't be much easier to use it than to write the whole thing from
scratch.  To illustrate this, a few examples from the Oracle case:

- Empty strings have different semantics in Oracle (to wit, they mean
NULL), so you can push down all string constants except empty strings.
- Oracle can only represent intervals with just year and month, or with
just day and smaller fields, so you can either punt on intervals or
translate only the ones that fit the bill.
- You can push down "-" for date arithmetic, except when both operands
on the Oracle side are of type DATE, because that would result in a
NUMERIC value (number of days between).

Yours,
Laurenz Albe


Re: pgsql_fdw, FDW for PostgreSQL server

From
Shigeru HANADA
Date:
I fixed pgsql_fdw to use the new FDW API.  I also fixed it to use its
own deparser to generate remote queries, so these patches don't contain
changes to backend code any more.  Now planning of a foreign scan is
done in the steps below:

1) In GetForeignRelSize, pgsql_fdw generates a simple SELECT statement
which has no WHERE clause from the given information, such as
PlannerInfo and RelOptInfo.  This query string is used to execute a
remote EXPLAIN in order to estimate the number of rows the scan will
return.  The remote EXPLAIN doesn't return that number directly, so
pgsql_fdw calls the set_baserel_size_estimates function, which
calculates the number of rows from the selectivity of the quals and
reltuples.  It also estimates width, but currently no statistics are
available for foreign tables, so pgsql_fdw overrides it with the width
value provided by the remote EXPLAIN.  If we support ANALYZE for
foreign tables, we will be able to defer the EXPLAIN until
GetForeignPaths, because local statistics are enough to estimate the
number of rows returned by the scan.

2) In GetForeignPaths, pgsql_fdw first considers a very simple path
which doesn't push any expression down to the remote end.  It is like
a SeqScan on a regular table; the backend will filter the result with
all quals in baserestrictinfo.  In the push-down-enabled version,
pgsql_fdw also considers another path which pushes down as many
conditions as possible.  This would reduce the amount of data
transferred, and the cycles used to convert strings to tuples on the
local side.  pgsql_fdw emits another EXPLAIN for this path, though we
might be able to omit it.  The planner can choose either path on the
basis of their costs.  If pgsql_fdw could know that the remote table
is indexed, it would be able to consider a sorted path for each remote
index.

3) In GetForeignPlan, pgsql_fdw creates a single ForeignScan node from
the given best_path.  Currently pgsql_fdw uses an SQL-level cursor in
order to avoid out-of-memory failures on huge result sets, so we need
to construct several SQL statements for the plan.  The old
implementation created the SQL statements in the Path phase, but now
that's deferred until the Plan phase.  This change avoids possibly
unnecessary string operations.  I worry that I've misunderstood the
purpose of fdw_exprs...

The patches are still WIP, especially the WHERE push-down part, but
I'd like to post them so that I can get feedback on the design as soon
as possible.

(2012/03/11 1:34), Tom Lane wrote:
>> 1) IIUC, GetForeignRelSize should set baserel->rows to the number of
>> rows the ForeignScan node returns to upper node, but not the number
>> of rows FDW returns to core executor, right?
>
> It should be the number of rows estimated to pass the baserestrictinfo
> restriction clauses, so yeah, not the same as what the FDW would return,
> except in cases where all the restriction clauses are handled internally
> by the FDW.
>
>> BTW, once Fujita-san's ANALYZE support patch is merged, we will be
>> able to get rows estimation easily by calling clauselist_selectivity
>> with baserel->tuples and baserestrictinfo.  Otherwise, pgsql_fdw
>> would still need to execute EXPLAIN on remote side to get meaningful
>> rows estimation.
>
> Yeah, one of the issues for that patch is how we see it coexisting with
> the option of doing a remote-side EXPLAIN.

It seems not so easy to determine from local statistics whether a
remote EXPLAIN would be better.  An easy way is to have a per-relation
FDW option like "use_local_stats" or something for pgsql_fdw, but that
doesn't sound right, because other FDWs have the same problem...

>> 2) ISTM that pgsql_fdw needs to execute EXPLAIN on remote side for each
>> possible remote query to get meaningful costs estimation, and it
>> requires pgsql_fdw to generate SQL statements in GetForeignPaths.
>> I worry that I've misunderstood intention of your design because
>> you've mentioned postponing SQL deparsing to createplan time.
>
> If you want to get the cost estimates that way, then yes, you'd be
> needing to do some SQL-statement-construction earlier than final plan
> generation.  But it's not apparent to me that those statements would
> necessarily be the same as, or even very similar to, what the final
> queries would be.  For instance, you'd probably try to reduce parameters
> to constants for estimation purposes.

Hm, I thought that using the same queries as the final ones has no
overhead, as long as we don't need to deparse clauses redundantly.  In
the current implementation, the WHERE clause is deparsed only once per
scan, and the basis of the query (SELECT ... FROM ...) is also
deparsed only once.  Indeed the string copying is not avoidable, but I
feel that's not a big problem.  Thoughts?

Regards,
--
Shigeru Hanada

Attachment

Re: pgsql_fdw, FDW for PostgreSQL server

From
Shigeru HANADA
Date:
(2012/03/15 23:06), Shigeru HANADA wrote:
> Although the patches are still WIP, especially in WHERE push-down part,
> but I'd like to post them so that I can get feedback of the design as
> soon as possible.

I've implemented pgsql_fdw's own deparser and enhanced some features
since last post.  Please apply attached patches in the order below:

* pgsql_fdw_v17.patch
    - Adds pgsql_fdw as contrib module
* pgsql_fdw_pushdown_v10.patch
    - Adds WHERE push down capability to pgsql_fdw
* pgsql_fdw_analyze_v1.patch
    - Adds pgsql_fdw_analyze function for updating local stats

Changes from previous version
=============================

1) Don't use remote EXPLAIN for cost/rows estimation; the planner now
estimates result rows and costs on the basis of local statistics such
as pg_class and pg_statistic.  To update local statistics, I added a
pgsql_fdw_analyze() SQL function which updates the local statistics of
a foreign table by retrieving remote statistics, such as pg_class and
pg_statistic, via libpq.  This makes the planning of pgsql_fdw simple
and fast.  This function can easily be modified to handle an ANALYZE
command invoked on a foreign table (Fujita-san is proposing this as a
common feature in another thread).

2) Defer planning work as long as possible to clarify the role of each
function.  Currently GetRelSize just estimates result rows from local
statistics, and GetPaths adds only one path, which represents a
SeqScan on the remote side.  As a result of this change, the
PgsqlFdwPlanState struct is obsolete.

3) Implement pgsql_fdw's own deparser, which pushes down collation-free
and immutable expressions in the local WHERE clause.  This means that
most numeric conditions can be pushed down, but conditions using
character types cannot.

Most nodes are deparsed in a straightforward way, but OpExpr is not:
OpExpr is deparsed with the OPERATOR() notation to specify the
operator's schema explicitly.  This protects us from possible
search_path problems.

    [local query]
    WHERE -col = -1
    [remote query]
    WHERE ((OPERATOR(pg_catalog.-) col) OPERATOR(pg_catalog.=) (-1))

4) Pushed-down quals are not evaluated again on the local side.  When
creating the ForeignScan node for the chosen best path, the
pushed-down expressions are removed from the "qpqual" parameter of
make_foreignscan, so each qualifier is evaluated only once, either
locally or remotely.

5) EXPLAIN on pgsql_fdw foreign tables shows a simple SELECT statement.
The DECLARE statement, including the cursor name, is still shown in
VERBOSE mode.  (I feel that always showing DECLARE is a little noisy...)

Regards,
--
Shigeru HANADA

Attachment

Re: pgsql_fdw, FDW for PostgreSQL server

From
"Albe Laurenz"
Date:
Shigeru HANADA wrote:
> I've implemented pgsql_fdw's own deparser and enhanced some features
> since last post.  Please apply attached patches in the order below:

> Changes from previous version
> =============================
>
> 1) Don't use remote EXPLAIN for cost/rows estimation, so now planner
> estimates result rows and costs on the basis of local statistics such as
> pg_class and pg_statistic.  To update local statistics, I added
> pgsql_fdw_analyze() SQL function which updates local statistics of a
> foreign table by retrieving remote statistics, such as pg_class and
> pg_statistic, via libpq.  This would make the planning of pgsql_fdw
> simple and fast.  This function can be easily modified to handle ANALYZE
> command invoked for a foreign table (Fujita-san is proposing this as
> common feature in another thread).
>
> 2) Defer planning stuffs as long as possible to clarify the role of each
> function.  Currently GetRelSize just estimates result rows from local
> statistics, and GetPaths adds only one path which represents SeqScan on
> remote side.  As result of this change, PgsqlFdwPlanState struct is
> obsolete.

I see the advantage of being able to do all this locally, but
I think there are a lot of downsides too:
- You have an additional maintenance task if you want to keep
statistics for remote tables accurate.  I understand that this may get
better in a future release.
- You depend on the structure of pg_statistic, which means a potential
incompatibility between server versions.  You can add cases to
pgsql_fdw_analyze to cater for changes, but that is cumbersome and will
only help for later PostgreSQL versions connecting to earlier ones.
- Planning and execution will change (improve, of course) between
server versions.  The local planner may choose an inferior plan based
on a wrong assumption of how a certain query can be handled on the
remote.
- You have no statistics if the foreign table points to a view on the
remote system.

My gut feeling is that planning should be done by the server which
will execute the query.

> 3) Implement pgsql_fdw's own deparser which pushes down collation-free
> and immutable expressions in local WHERE clause.  This means that most
> of numeric conditions can be pushed down, but conditions using character
> types are not.

I understand that this is simple and practical, but it is a pity that
this excludes equality and inequality conditions on strings.
Correct me if I am wrong, but I thought that these work the same
regardless of the collation.

Yours,
Laurenz Albe


Re: pgsql_fdw, FDW for PostgreSQL server

From
Shigeru HANADA
Date:
Thanks for the comments.

(2012/03/27 18:36), Albe Laurenz wrote:
>> 2) Defer planning stuffs as long as possible to clarify the role of each
>> function.  Currently GetRelSize just estimates result rows from local
>> statistics, and GetPaths adds only one path which represents SeqScan on
>> remote side.  As result of this change, PgsqlFdwPlanState struct is
>> obsolete.
> 
> I see the advantage of being able to do all this locally, but
> I think there are a lot of downsides too:
> - You have an additional maintenance task if you want to keep
>    statistics for remote tables accurate.  I understand that this may
>    get better in a future release.
> - You depend on the structure of pg_statistic, which means a potential
>    incompatibility between server versions.  You can add cases to
>    pgsql_fdw_analyze to cater for changes, but that is cumbersome and will
>    only help for later PostgreSQL versions connecting to earlier ones.
> - Planning and execution will change (improve, of course) between server
>    versions.  The local planner may choose an inferior plan based on a
>    wrong assumption of how a certain query can be handled on the remote.
> - You have no statistics if the foreign table points to a view on the
>    remote system.

Especially given the 2nd and 4th points, generating pg_statistic
records without calling do_analyze_rel() seems impractical in a
multi-version environment.  As you pointed out, I had also missed
another semantics-difference problem here.  We would have to use
do_analyze_rel() with a custom sampling function which returns sample
rows from the remote data source, if we want to have statistics of
foreign data locally.  This method would be available for most FDWs,
but requires some changes in core.  [I'll comment on Fujita-san's
ANALYZE patch about this issue soon.]

> My gut feeling is that planning should be done by the server which
> will execute the query.

Agreed, if selectivity of both local filtering and remote filtering were
available, we can estimate result rows correctly and choose better plan.

How about getting the row count estimate by executing EXPLAIN for the
fully-fledged remote query (IOW, one containing the pushed-down WHERE
clause), and estimating the selectivity of the local filter on the
basis of statistics generated by the FDW via do_analyze_rel() and an
FDW-specific sampling function?  In this design we would get quite
accurate row estimates, because we can consider the filtering done on
each side separately, though it requires an expensive remote EXPLAIN
for each possible path.
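To make the idea concrete, here is a rough illustrative sketch (not
code from any of the patches): pull the planner's row estimate out of
the top line of a remote EXPLAIN, then scale it by the selectivity of
the locally evaluated quals.  The function names and input format are
assumptions for illustration only.

```python
import re

def remote_row_estimate(explain_top_line):
    """Extract the row estimate from the top line of a remote EXPLAIN,
    e.g. 'Seq Scan on t  (cost=0.00..35.50 rows=2550 width=4)'."""
    m = re.search(r"rows=(\d+)", explain_top_line)
    if m is None:
        raise ValueError("no row estimate found in EXPLAIN output")
    return int(m.group(1))

def scan_row_estimate(explain_top_line, local_selectivity):
    """Combine the remote estimate (WHERE clauses already pushed down)
    with the selectivity of the quals filtered on the local side."""
    return remote_row_estimate(explain_top_line) * local_selectivity
```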

>> 3) Implement pgsql_fdw's own deparser which pushes down collation-free
>> and immutable expressions in local WHERE clause.  This means that most
>> of numeric conditions can be pushed down, but conditions using
> character
>> types are not.
> 
> I understand that this is simple and practical, but it is a pity that
> this excludes equality and inequality conditions on strings.
> Correct me if I am wrong, but I thought that these work the same
> regardless of the collation.

You are right, built-in equality and inequality operators don't cause
collation problems.  Perhaps allowing them would cover significant
cases of string comparison, but I'm not sure how to determine whether
an operator is = or != in a generic way.  We might have to hold a list
of OIDs of collation-safe operators/functions until we support ROUTINE
MAPPING or something like it...  Anyway, I'll fix pgsql_fdw to allow =
and != for character types.

Regards,
-- 
Shigeru Hanada


Re: pgsql_fdw, FDW for PostgreSQL server

From
Thom Brown
Date:
2012/3/26 Shigeru HANADA <shigeru.hanada@gmail.com>:
> (2012/03/15 23:06), Shigeru HANADA wrote:
>> Although the patches are still WIP, especially in WHERE push-down part,
>> but I'd like to post them so that I can get feedback of the design as
>> soon as possible.
>
> I've implemented pgsql_fdw's own deparser and enhanced some features
> since last post.  Please apply attached patches in the order below:
>
> * pgsql_fdw_v17.patch
>    - Adds pgsql_fdw as contrib module
> * pgsql_fdw_pushdown_v10.patch
>    - Adds WHERE push down capability to pgsql_fdw
> * pgsql_fdw_analyze_v1.patch
>    - Adds pgsql_fdw_analyze function for updating local stats

Hmm... I've applied them using the latest Git master, and in the order
specified, but I can't build the contrib module.  What am I doing
wrong?

It starts off with things like:

../../src/Makefile.global:420: warning: overriding commands for target
`submake-libpq'
../../src/Makefile.global:420: warning: ignoring old commands for
target `submake-libpq'

and later:

pgsql_fdw.c: In function ‘pgsqlGetForeignPlan’:
pgsql_fdw.c:321:2: warning: implicit declaration of function
‘extractRemoteExprs’ [-Wimplicit-function-declaration]
pgsql_fdw.c:321:12: warning: assignment makes pointer from integer
without a cast [enabled by default]
pgsql_fdw.c:362:3: warning: implicit declaration of function
‘deparseWhereClause’ [-Wimplicit-function-declaration]
pgsql_fdw.c: At top level:
pgsql_fdw.c:972:1: error: redefinition of ‘Pg_magic_func’
pgsql_fdw.c:39:1: note: previous definition of ‘Pg_magic_func’ was here

--
Thom


Re: pgsql_fdw, FDW for PostgreSQL server

From
"Albe Laurenz"
Date:
Shigeru HANADA wrote:
>> My gut feeling is that planning should be done by the server which
>> will execute the query.
>
> Agreed, if selectivity of both local filtering and remote filtering
> were available, we can estimate result rows correctly and choose
> better plan.
>
> How about getting # of rows estimate by executing EXPLAIN for
> fully-fledged remote query (IOW, contains pushed-down WHERE clause),
> and estimate selectivity of local filter on the basis of the
> statistics which are generated by FDW via do_analyze_rel() and
> FDW-specific sampling function?  In this design, we would be able to
> use quite correct rows estimate because we can consider filtering
> stuffs done on each side separately, though it requires expensive
> remote EXPLAIN for each possible path.

That sounds nice.
How would that work with a query that has one condition that could be
pushed down and one that has to be filtered locally?
Would you use the (local) statistics for the full table or can you
somehow account for the fact that rows have already been filtered
out remotely, which might influence the distribution?

> You are right, built-in equality and inequality operators don't cause
> collation problem.  Perhaps allowing them would cover significant
> cases of string comparison, but I'm not sure how to determine whether
> an operator is = or != in generic way.  We might have to hold list of
> oid for collation-safe operator/functions until we support ROUTINE
> MAPPING or something like that...  Anyway, I'll fix pgsql_fdw to
> allow = and != for character types.

I believe that this covers a significant percentage of real-world cases.
I'd think that every built-in operator with name "=" or "<>" could
be pushed down.

Yours,
Laurenz Albe


Re: pgsql_fdw, FDW for PostgreSQL server

From
Shigeru HANADA
Date:
(2012/03/27 20:32), Thom Brown wrote:
> 2012/3/26 Shigeru HANADA<shigeru.hanada@gmail.com>:
>> * pgsql_fdw_v17.patch
>>     - Adds pgsql_fdw as contrib module
>> * pgsql_fdw_pushdown_v10.patch
>>     - Adds WHERE push down capability to pgsql_fdw
>> * pgsql_fdw_analyze_v1.patch
>>     - Adds pgsql_fdw_analyze function for updating local stats
> 
> Hmm... I've applied them using the latest Git master, and in the order
> specified, but I can't build the contrib module.  What am I doing
> wrong?

I'm sorry, but I couldn't reproduce the errors with this procedure.

$ git checkout master
$ git pull upstream master      # make master branch up-to-date
$ git clean -fd                 # remove files for other branches
$ make clean                    # just in case
$ patch -p1 < /path/to/pgsql_fdw_v17.patch
$ patch -p1 < /path/to/pgsql_fdw_pushdown_v10.patch
$ patch -p1 < /path/to/pgsql_fdw_analyze_v1.patch
$ make                          # make core first for libpq et al.
$ cd contrib/pgsql_fdw
$ make                          # pgsql_fdw

Please try "git clean" and "make clean", if you have not.
FWIW, I'm using GNU Make 3.82 and gcc 4.6.0 on Fedora 15.

Regards,
-- 
Shigeru HANADA


Re: pgsql_fdw, FDW for PostgreSQL server

From
Thom Brown
Date:
2012/3/28 Shigeru HANADA <shigeru.hanada@gmail.com>:
> (2012/03/27 20:32), Thom Brown wrote:
>> 2012/3/26 Shigeru HANADA<shigeru.hanada@gmail.com>:
>>> * pgsql_fdw_v17.patch
>>>     - Adds pgsql_fdw as contrib module
>>> * pgsql_fdw_pushdown_v10.patch
>>>     - Adds WHERE push down capability to pgsql_fdw
>>> * pgsql_fdw_analyze_v1.patch
>>>     - Adds pgsql_fdw_analyze function for updating local stats
>>
>> Hmm... I've applied them using the latest Git master, and in the order
>> specified, but I can't build the contrib module.  What am I doing
>> wrong?
>
> I'm sorry, but I couldn't reproduce the errors with this procedure.
>
> $ git checkout master
> $ git pull upstream master      # make master branch up-to-date
> $ git clean -fd                 # remove files for other branches
> $ make clean                    # just in case
> $ patch -p1 < /path/to/pgsql_fdw_v17.patch
> $ patch -p1 < /path/to/pgsql_fdw_pushdown_v10.patch
> $ patch -p1 < /path/to/pgsql_fdw_analyze_v1.patch
> $ make                          # make core first for libpq et al.
> $ cd contrib/pgsql_fdw
> $ make                          # pgsql_fdw
>
> Please try "git clean" and "make clean", if you have not.
> FWIW, I'm using GNU Make 3.82 and gcc 4.6.0 on Fedora 15.

I had done a make clean, git stash and git clean -f, but I didn't try
git clean -fd.  For some reason it's working now.

Thanks

--
Thom


Re: pgsql_fdw, FDW for PostgreSQL server

From
Thom Brown
Date:
On 28 March 2012 08:13, Thom Brown <thom@linux.com> wrote:
> 2012/3/28 Shigeru HANADA <shigeru.hanada@gmail.com>:
>> (2012/03/27 20:32), Thom Brown wrote:
>>> 2012/3/26 Shigeru HANADA<shigeru.hanada@gmail.com>:
>>>> * pgsql_fdw_v17.patch
>>>>     - Adds pgsql_fdw as contrib module
>>>> * pgsql_fdw_pushdown_v10.patch
>>>>     - Adds WHERE push down capability to pgsql_fdw
>>>> * pgsql_fdw_analyze_v1.patch
>>>>     - Adds pgsql_fdw_analyze function for updating local stats
>>>
>>> Hmm... I've applied them using the latest Git master, and in the order
>>> specified, but I can't build the contrib module.  What am I doing
>>> wrong?
>>
>> I'm sorry, but I couldn't reproduce the errors with this procedure.
>>
>> $ git checkout master
>> $ git pull upstream master      # make master branch up-to-date
>> $ git clean -fd                 # remove files for other branches
>> $ make clean                    # just in case
>> $ patch -p1 < /path/to/pgsql_fdw_v17.patch
>> $ patch -p1 < /path/to/pgsql_fdw_pushdown_v10.patch
>> $ patch -p1 < /path/to/pgsql_fdw_analyze_v1.patch
>> $ make                          # make core first for libpq et al.
>> $ cd contrib/pgsql_fdw
>> $ make                          # pgsql_fdw
>>
>> Please try "git clean" and "make clean", if you have not.
>> FWIW, I'm using GNU Make 3.82 and gcc 4.6.0 on Fedora 15.
>
> I had done a make clean, git stash and git clean -f, but I didn't try
> git clean -fd.  For some reason it's working now.

Hmm.. I'm getting some rather odd errors though:

thom@test=# select * from stuff limit 3 ;
DEBUG:  StartTransactionCommand
DEBUG:  StartTransaction
DEBUG:  name: unnamed; blockState:       DEFAULT; state: INPROGR,
xid/subid/cid: 0/1/0, nestlvl: 1, children:
LOG:  statement: select * from stuff limit 3 ;
DEBUG:  relid=16402 fetch_count=10000
DEBUG:  Remote SQL: SELECT id, stuff, age FROM public.stuff
DEBUG:  starting remote transaction with "START TRANSACTION ISOLATION
LEVEL REPEATABLE READ"
ERROR:  could not declare cursor
DETAIL:  ERROR:  relation "public.stuff" does not exist
LINE 1: ...or_6 SCROLL CURSOR FOR SELECT id, stuff, age FROM public.stu...
             ^ 

HINT:  DECLARE pgsql_fdw_cursor_6 SCROLL CURSOR FOR SELECT id, stuff,
age FROM public.stuff

The table in question indeed doesn't exist, but I'm confused as to why
the user is being exposed to such messages.

And more troublesome:

(local select on foreign server):

test=# select * from stuff limit 3 ;
 id |  thing   | age
----+----------+-----
  1 | STANDARD |  30
  2 | STANDARD |  29
  3 | STANDARD |  12
(3 rows)

(foreign select on foreign server):

thom@test=# select * from stuff limit 3 ;
 id |      stuff      | age
----+-----------------+-----
  1 | (1,STANDARD,30) |  30
  2 | (2,STANDARD,29) |  29
  3 | (3,STANDARD,12) |  12
(3 rows)


The row expansion seems to incorrectly rewrite the column without a
table prefix if both column and table name are identical.

--
Thom


Re: pgsql_fdw, FDW for PostgreSQL server

From
Thom Brown
Date:
On 28 March 2012 08:39, Thom Brown <thom@linux.com> wrote:
> On 28 March 2012 08:13, Thom Brown <thom@linux.com> wrote:
>> 2012/3/28 Shigeru HANADA <shigeru.hanada@gmail.com>:
>>> (2012/03/27 20:32), Thom Brown wrote:
>>>> 2012/3/26 Shigeru HANADA<shigeru.hanada@gmail.com>:
>>>>> * pgsql_fdw_v17.patch
>>>>>     - Adds pgsql_fdw as contrib module
>>>>> * pgsql_fdw_pushdown_v10.patch
>>>>>     - Adds WHERE push down capability to pgsql_fdw
>>>>> * pgsql_fdw_analyze_v1.patch
>>>>>     - Adds pgsql_fdw_analyze function for updating local stats
>>>>
>>>> Hmm... I've applied them using the latest Git master, and in the order
>>>> specified, but I can't build the contrib module.  What am I doing
>>>> wrong?
>>>
>>> I'm sorry, but I couldn't reproduce the errors with this procedure.
>>>
>>> $ git checkout master
>>> $ git pull upstream master      # make master branch up-to-date
>>> $ git clean -fd                 # remove files for other branches
>>> $ make clean                    # just in case
>>> $ patch -p1 < /path/to/pgsql_fdw_v17.patch
>>> $ patch -p1 < /path/to/pgsql_fdw_pushdown_v10.patch
>>> $ patch -p1 < /path/to/pgsql_fdw_analyze_v1.patch
>>> $ make                          # make core first for libpq et al.
>>> $ cd contrib/pgsql_fdw
>>> $ make                          # pgsql_fdw
>>>
>>> Please try "git clean" and "make clean", if you have not.
>>> FWIW, I'm using GNU Make 3.82 and gcc 4.6.0 on Fedora 15.
>>
>> I had done a make clean, git stash and git clean -f, but I didn't try
>> git clean -fd.  For some reason it's working now.
>
> Hmm.. I'm getting some rather odd errors though:
>
> thom@test=# select * from stuff limit 3 ;
> DEBUG:  StartTransactionCommand
> DEBUG:  StartTransaction
> DEBUG:  name: unnamed; blockState:       DEFAULT; state: INPROGR,
> xid/subid/cid: 0/1/0, nestlvl: 1, children:
> LOG:  statement: select * from stuff limit 3 ;
> DEBUG:  relid=16402 fetch_count=10000
> DEBUG:  Remote SQL: SELECT id, stuff, age FROM public.stuff
> DEBUG:  starting remote transaction with "START TRANSACTION ISOLATION
> LEVEL REPEATABLE READ"
> ERROR:  could not declare cursor
> DETAIL:  ERROR:  relation "public.stuff" does not exist
> LINE 1: ...or_6 SCROLL CURSOR FOR SELECT id, stuff, age FROM public.stu...
>                                                             ^
>
> HINT:  DECLARE pgsql_fdw_cursor_6 SCROLL CURSOR FOR SELECT id, stuff,
> age FROM public.stuff
>
> The table in question indeed doesn't exist, but I'm confused as to why
> the user is being exposed to such messages.
>
> And more troublesome:
>
> (local select on foreign server):
>
> test=# select * from stuff limit 3 ;
>  id |  thing   | age
> ----+----------+-----
>  1 | STANDARD |  30
>  2 | STANDARD |  29
>  3 | STANDARD |  12
> (3 rows)
>
> (foreign select on foreign server):
>
> thom@test=# select * from stuff limit 3 ;
>  id |      stuff      | age
> ----+-----------------+-----
>  1 | (1,STANDARD,30) |  30
>  2 | (2,STANDARD,29) |  29
>  3 | (3,STANDARD,12) |  12
> (3 rows)
>
>
> The row expansion seems to incorrectly rewrite the column without a
> table prefix if both column and table name are identical.

Actually, correction.  The foreign table definition names the column
the same as the table.  I accidentally omitted the 'thing' column and
instead substituted it with the table name in the definition.

Original table definition on foreign server:

create table stuff (id serial primary key, thing text, age int);

Foreign table definition:

create foreign table stuff (id int not null, stuff text, age int) server pgsql;

So it appears I'm allowed to use the table name as a column name in
this context.  So please disregard my complaint.

--
Thom


Re: pgsql_fdw, FDW for PostgreSQL server

From
Shigeru HANADA
Date:
(2012/03/28 16:18), Albe Laurenz wrote:
> I wrote:
>>> How about getting # of rows estimate by executing EXPLAIN for
>>> fully-fledged remote query (IOW, contains pushed-down WHERE clause),
>>> and estimate selectivity of local filter on the basis of the
>>> statistics which are generated by FDW via do_analyze_rel() and
>>> FDW-specific sampling function?  In this design, we would be able to
>>> use quite correct rows estimate because we can consider filtering
>>> stuffs done on each side separately, though it requires expensive
>>> remote EXPLAIN for each possible path.
>>
>> That sounds nice.
> 
> ... but it still suffers from the problems of local statistics
> for remote tables I pointed out.

I guess you mean the issues you raised in your earlier post, so I'd
like to comment on them.

> - You have an additional maintenance task if you want to keep
>   statistics for remote tables accurate.  I understand that this may
>   get better in a future release.

I'm not sure that's what you meant, but we need to execute remote
ANALYZE before calling pgsql_fdw_analyze() to keep local statistics
accurate.  IMO DBAs are responsible for executing remote ANALYZE at
the appropriate time, so pgsql_fdw_analyze() (or a handler function
for ANALYZE) should just collect statistics from the remote side.

> - You depend on the structure of pg_statistic, which means a potential
>   incompatibility between server versions.  You can add cases to
>   pgsql_fdw_analyze to cater for changes, but that is cumbersome and
> will
>   only help for later PostgreSQL versions connecting to earlier ones.

Indeed.  Like pg_dump, pgsql_fdw should be aware of different server
versions if we choose to copy statistics.  Differences in catalog
structure are easy to track and cope with, but if the meanings of
values or the way statistics are calculated change, pgsql_fdw would
need very complex code to convert values from a different version.
I don't know of such an example, but IMO we should assume that
statistics are valid only for the same version (at least the same
major version).  After all, I'd prefer collecting sample data in
pgsql_fdw and leaving statistics generation to the local backend.

> - Planning and execution will change (improve, of course) between server
>   versions.  The local planner may choose an inferior plan based on a
>   wrong assumption of how a certain query can be handled on the remote.

Hm, I don't worry about the details of remote planning so much,
because the remote server would do its best for a query given by
pgsql_fdw, and the local planner would do its best for the given
estimates (rows, width and costs).  One concern is that remote cost
factors might differ from local ones, so an FDW option representing a
cost conversion coefficient (1.0 meaning that remote cost has the same
weight as local cost) might be useful.
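The suggested coefficient is just a multiplier applied to remote cost
units before they are compared with local ones.  A minimal sketch,
assuming a hypothetical option name; this is not part of the patches:

```python
def convert_remote_cost(remote_cost, coefficient=1.0):
    """Scale a cost reported by the remote server into local cost units.
    A coefficient of 1.0 weighs remote cost the same as local cost;
    values above 1.0 make remote work look more expensive to the local
    planner."""
    return remote_cost * coefficient
```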

> - You have no statistics if the foreign table points to a view on the
>   remote system.

ISTM that this alone would be reason enough to give up copying remote
stats to the local side.  We don't provide SELECT push-down or GROUP BY
push-down at present, so users would want to create views which contain
function calls in their SELECT lists.

> I think that these shortcomings are not justified by the gain of
> one client-server round trip less during planning.  I'd prefer
> if pgsql_fdw were not dependent on remote statistics stored in the
> local database.

I too would prefer that pgsql_fdw not fully depend on statistics of
foreign data, but IMO having statistics of foreign data calculated in
the same way as for local data still seems useful for estimating local
filtering.  Even if we have no statistics of foreign data on the local
side, we would still be able to create plans on the basis of the
default selectivity for each expression, the same as for regular
tables.

Regards,
-- 
Shigeru HANADA


Re: pgsql_fdw, FDW for PostgreSQL server

From
Shigeru HANADA
Date:
(2012/03/28 21:07), Albe Laurenz wrote:
> I found another limitation of this approach:
> pgsql_fdw_analyze() has to run as a user who can update
> pg_statistic, and this user needs a user mapping to a remote
> user who can read pg_statistic.
> 
> This is not necessary for normal operation and needs
> to be configured specifically for getting remote statistics.
> This is cumbersome, and people might be unhappy to have to
> create user mappings for highly privileged remote users.

Agreed.  After all, supporting the ANALYZE command for foreign tables
seems to be the only way to obtain local statistics of foreign data
without granting excessive privileges.  ANALYZE is allowed only to the
owner of the table or a superuser, so assuming that the invoker has a
valid user mapping to a remote user who can read the corresponding
foreign data seems reasonable.

ANALYZE support for foreign tables is proposed by Fujita-san in current
CF, so I'd like to push it.

Regards,
-- 
Shigeru HANADA


Re: pgsql_fdw, FDW for PostgreSQL server

From
Shigeru HANADA
Date:
Attached are the latest versions of the pgsql_fdw patches.  Note that
pgsql_fdw_analyze.patch is only for testing the effect of local
statistics.  Please apply the patches in the order below:

(1) pgsql_fdw_v18.patch
(2) pgsql_fdw_pushdown_v11.patch
(3) pgsql_fdw_analyze.patch (if you want to try local stats)

(2012/03/27 20:49), Albe Laurenz wrote:
>> How about getting # of rows estimate by executing EXPLAIN for
>> fully-fledged remote query (IOW, contains pushed-down WHERE clause),
> and
>> estimate selectivity of local filter on the basis of the statistics
>> which are generated by FDW via do_analyze_rel() and FDW-specific
>> sampling function?  In this design, we would be able to use quite
>> correct rows estimate because we can consider filtering stuffs done on
>> each side separately, though it requires expensive remote EXPLAIN for
>> each possible path.
>
> That sounds nice.
> How would that work with a query that has one condition that could be
> pushed down and one that has to be filtered locally?
>
> Would you use the (local) statistics for the full table or can you
> somehow account for the fact that rows have already been filtered
> out remotely, which might influence the distribution?

First of all, pgsql_fdw sorts conditions into three groups:

  (A) can be pushed down, and contains no Param node
  (B) can be pushed down, and contains a Param node
  (C) cannot be pushed down

Then pgsql_fdw generates a SELECT statement which contains only (A),
and executes EXPLAIN for that query to get a rough estimate.  Next,
pgsql_fdw estimates the selectivity of (B) and (C) by calling
clauselist_selectivity().  Finally, pgsql_fdw multiplies the rough
estimate by the selectivities of (B) and (C).  Thus we get an estimate
of the number of rows returned by the scan.

Aside from that, before executing the actual query, pgsql_fdw appends
(B) to the remote query string so as to use external parameters.
Conditions in (C) are evaluated on the local side anyway.
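The grouping and the final multiplication described above can be
sketched as follows.  This is an illustration only, not code from the
patch; the condition representation (dicts with `pushable` and
`has_param` flags) is a stand-in for the real expression nodes.

```python
def classify(conditions):
    """Sort conditions into the three groups: (A) pushable without a
    Param, (B) pushable with a Param, (C) not pushable at all."""
    group_a = [c for c in conditions if c["pushable"] and not c["has_param"]]
    group_b = [c for c in conditions if c["pushable"] and c["has_param"]]
    group_c = [c for c in conditions if not c["pushable"]]
    return group_a, group_b, group_c

def estimate_rows(remote_rows_for_a, sel_b, sel_c):
    """remote_rows_for_a: the remote EXPLAIN estimate for the query
    containing only group (A); sel_b/sel_c: local selectivities for
    groups (B) and (C), as clauselist_selectivity() would return."""
    return remote_rows_for_a * sel_b * sel_c
```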

>> You are right, built-in equality and inequality operators don't cause
>> collation problem.  Perhaps allowing them would cover significant
> cases
>> of string comparison, but I'm not sure how to determine whether an
>> operator is = or != in generic way.  We might have to hold list of oid
>> for collation-safe operator/functions until we support ROUTINE MAPPING
>> or something like that...  Anyway, I'll fix pgsql_fdw to allow = and
> !=
>> for character types.
>
> I believe that this covers a significant percentage of real-world cases.
> I'd think that every built-in operator with name "=" or "<>" could
> be pushed down.

I've fixed pgsql_fdw to push down "=" and "<>" even if they take a
collatable type as operands.  Now the collation check is done only for
the input types of OpExpr and ScalarArrayOpExpr.  In addition to those
two, the operators below seem safe to push down.

  concatenate : ||
  LIKE        : ~~, ~~*, !~~, !~~*
  POSIX regex : ~, ~*, !~, !~*
  Text Search : @@
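A hypothetical shortlist check (not the patch's implementation) for
deciding whether an operator is safe to push down even when its
operands are collatable, based on the operator families listed above:

```python
# Operators discussed in the thread as safe for collatable operands.
COLLATION_SAFE_OPERATORS = {
    "=", "<>",                   # built-in (in)equality
    "||",                        # concatenation
    "~~", "~~*", "!~~", "!~~*",  # LIKE family
    "~", "~*", "!~", "!~*",      # POSIX regex family
    "@@",                        # text search match
}

def safe_to_push_down(opname, operands_collatable):
    """Collation-free expressions are always push-down candidates;
    collatable ones only if the operator is on the shortlist."""
    return (not operands_collatable) or opname in COLLATION_SAFE_OPERATORS
```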

Regards,
--
Shigeru HANADA

Attachment

Re: pgsql_fdw, FDW for PostgreSQL server

From
"Albe Laurenz"
Date:
I wrote:
>> How about getting # of rows estimate by executing EXPLAIN for
>> fully-fledged remote query (IOW, contains pushed-down WHERE clause),
>> and estimate selectivity of local filter on the basis of the
>> statistics which are generated by FDW via do_analyze_rel() and
>> FDW-specific sampling function?  In this design, we would be able to
>> use quite correct rows estimate because we can consider filtering
>> stuffs done on each side separately, though it requires expensive
>> remote EXPLAIN for each possible path.
>
> That sounds nice.

... but it still suffers from the problems of local statistics
for remote tables I pointed out.

I think that these shortcomings are not justified by the gain of
one client-server round trip less during planning.  I'd prefer
if pgsql_fdw were not dependent on remote statistics stored in the
local database.

Yours,
Laurenz Albe



Re: pgsql_fdw, FDW for PostgreSQL server

From
"Albe Laurenz"
Date:
I wrote:
>> Changes from previous version
>> =============================
>>
>> 1) Don't use remote EXPLAIN for cost/rows estimation, so now planner
>> estimates result rows and costs on the basis of local statistics such
>> as pg_class and pg_statistic.  To update local statistics, I added
>> pgsql_fdw_analyze() SQL function which updates local statistics of a
>> foreign table by retrieving remote statistics, such as pg_class and
>> pg_statistic, via libpq.  This would make the planning of pgsql_fdw
>> simple and fast.  This function can be easily modified to handle
>> ANALYZE command invoked for a foreign table (Fujita-san is proposing
>> this as common feature in another thread).

> I see the advantage of being able to do all this locally, but
> I think there are a lot of downsides too:

I found another limitation of this approach:
pgsql_fdw_analyze() has to run as a user who can update
pg_statistic, and this user needs a user mapping to a remote
user who can read pg_statistic.

This is not necessary for normal operation and needs
to be configured specifically for getting remote statistics.
This is cumbersome, and people might be unhappy to have to
create user mappings for highly privileged remote users.

Yours,
Laurenz Albe


Re: pgsql_fdw, FDW for PostgreSQL server

From
"Albe Laurenz"
Date:
Shigeru HANADA wrote:
> Attached are latest version of pgsql_fdw patches.  Note that
> pgsql_fdw_analyze.patch is only for test the effect of local
> statistics.  Please apply patches in the order below:
>
> (1) pgsql_fdw_v18.patch
> (2) pgsql_fdw_pushdown_v11.patch
> (3) pgsql_fdw_analyze.patch (if you want to try local stats)

Since Kohei KaiGai hasn't posted a review, I'll have a go.

The patch applies and compiles fine without warnings and passes
regression tests.

I found bugs in the analyze functions:

In pgsql_fdw_analyze:
nspname and relname are not initialized to NULL.
This causes failure if the corresponding option is not set
on the foreign table.

In store_remote_stats:
atttypmod is initialized to 0 and never changed.
This causes the following error for columns of type "interval":
ERROR:  unrecognized interval typmod: 0


During a foreign scan, type input functions are used to convert
the text representation of values.  If a foreign table is misconfigured,
you can get error messages from these functions, like:

ERROR:  invalid input syntax for type double precision: "etwas"
or
ERROR:  value too long for type character varying(3)

It might be nice for finding problems if the message were
something like:

ERROR:  cannot convert data in foreign scan of "tablename", column "col"
in row 42
DETAIL:  ERROR:  value too long for type character varying(3)


As stated previously, I don't think that using local stats on
foreign tables is a win.  The other patches work fine for me, and
I'd be happy if that could go into 9.2.


Once the two bugs above are fixed, should I mark it "ready for
committer"?

Yours,
Laurenz Albe




Re: pgsql_fdw, FDW for PostgreSQL server

From
Shigeru HANADA
Date:
(2012/04/03 22:31), Albe Laurenz wrote:
> Shigeru HANADA wrote:
>> Attached are latest version of pgsql_fdw patches.  Note that
>> pgsql_fdw_analyze.patch is only for test the effect of local
> statistics.
>>   Please apply patches in the order below:
>>
>> (1) pgsql_fdw_v18.patch
>> (2) pgsql_fdw_pushdown_v11.patch
>> (3) pgsql_fdw_analyze.patch (if you want to try local stats)
>
> Since Kohei KaiGai doesn't post a review, I'll have a go.
>
> The patch applies and compiles fine without warnings and passes
> regression tests.

Thanks for the review.

> I found bugs in the analyze functions:
>
> In pgsql_fdw_analyze:
> nspname and relname are not initialized to NULL.
> This causes failure if the corresponding option is not set
> on the foreign table.
>
> In store_remote_stats:
> atttypmod is initialized to 0 and never changed.
> This causes the following error for columns of type "interval":
> ERROR:  unrecognized interval typmod: 0

Oops, sorry for the silly bugs.  However, we won't need to care about
them anyway, because copying remote stats to the local side seems not
to be a practical way to obtain local statistics.  As you mentioned in
another sub-thread, copying remote stats has some design-level
problems, such as privileges and version differences.

    (2012/03/28 21:07), Albe Laurenz wrote:
    > I found another limitation of this approach:
    > pgsql_fdw_analyze() has to run as a user who can update
    > pg_statistic, and this user needs a user mapping to a remote
    > user who can read pg_statistic.
    >
    > This is not necessary for normal operation and needs
    > to be configured specifically for getting remote statistics.
    > This is cumbersome, and people might be unhappy to have to
    > create user mappings for highly privileged remote users.

So, if we want to have statistics of remote data on the local side, it
seems better to get actual data from the remote side and calculate
statistics locally with the same (or similar) routines as for local
tables.  I'd like to handle this issue separately, because it belongs
to the ANALYZE support patch proposed by Fujita-san.

> During a foreign scan, type input functions are used to convert
> the text representation of values.  If a foreign table is misconfigured,
> you can get error messages from these functions, like:
>
> ERROR:  invalid input syntax for type double precision: "etwas"
> or
> ERROR:  value too long for type character varying(3)
>
> It might me nice for finding problems if the message were
> something like:
>
> ERROR:  cannot convert data in foreign scan of "tablename", column "col"
> in row 42
> DETAIL:  ERROR:  value too long for type character varying(3)

Agreed.  How about showing context information with errcontext() in
addition to the main error message?  Of course, identifiers are quoted
if necessary.  This way doesn't need an additional PG_TRY block, so
the overhead would be relatively cheap.

postgres=# SELECT * FROM ft1 WHERE c1 = 1;  -- ERROR
ERROR:  invalid input syntax for integer: "1970-01-02 17:00:00+09"
CONTEXT:  column c4 of foreign table ft1

Showing the index of the row seems overkill, because the most common
cause of this kind of error is wrong configuration, as you say, and
users would be able to address the issue without knowing which record
caused the error.

> As stated previously, I don't think that using local stats on
> foreign tables is a win.  The other patches work fine for me, and
> I'd be happy if that could go into 9.2.

I have the opposite opinion on this issue, because we need to do some
of the filtering on the local side.  We can leave cost/rows estimation
to the remote side for WHERE expressions which are pushed down, but we
need the selectivity of the extra filtering done on the local side.
For that purpose, having local stats of foreign data seems reasonable
and useful.

Of course, it has the downside that we need to execute an explicit
ANALYZE for foreign tables, which would cause a full sequential scan
of the remote tables, in addition to the ANALYZE for remote tables
done on the remote side as usual maintenance work.

> Once the two bugs above are fixed, should I mark it "ready for
> committer"?

Attached patch contains changes below:

pgsql_fdw_v19.patch
  - show context of data conversion error
  - move codes for fetch_count FDW option to option.c
    (refactoring)
pgsql_fdw_pushdown_v12.patch
  - make deparseExpr function static (refactoring)

I also attached pgsql_fdw_analyze only for testing the effect of local
statistics.  It contains both the backend's ANALYZE command support
and pgsql_fdw's ANALYZE support.

Regards,
--
Shigeru HANADA

Attachment

Re: pgsql_fdw, FDW for PostgreSQL server

From
"Albe Laurenz"
Date:
Shigeru HANADA wrote:
>> During a foreign scan, type input functions are used to convert
>> the text representation of values.  If a foreign table is
>> misconfigured, you can get error messages from these functions,
>> like:
>>
>> ERROR:  invalid input syntax for type double precision: "etwas"
>> or
>> ERROR:  value too long for type character varying(3)
>>
>> It might me nice for finding problems if the message were
>> something like:
>>
>> ERROR:  cannot convert data in foreign scan of "tablename", column
>> "col" in row 42
>> DETAIL:  ERROR:  value too long for type character varying(3)
>
> Agreed.  How about showing context information with errcontext() in
> addition to main error message?  Of course, identifiers are quoted if
> necessary.  This way doesn't need additional PG_TRY block, so overhead
> would be relatively cheap.
>
> postgres=# SELECT * FROM ft1 WHERE c1 = 1;  -- ERROR
> ERROR:  invalid input syntax for integer: "1970-01-02 17:00:00+09"
> CONTEXT:  column c4 of foreign table ft1
>
> Showing index of the row seems overkill, because most cause of this
> kind of error is wrong configuration, as you say, and users would be
> able to address the issue without knowing which record caused the
> error.

Agreed.  I think that is a better approach than what I suggested.
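
The errcontext() approach relies on PostgreSQL's chain of error-context
callbacks: the scan pushes a callback, and any error raised while it is
active gets an extra CONTEXT line.  A rough, self-contained sketch of
that pattern follows; the type and function names are simplified
stand-ins for illustration, not the real elog.h declarations.

```c
#include <stdio.h>
#include <string.h>

/* Simplified stand-in for PostgreSQL's error-context-callback chain.
 * Callbacks are pushed onto a stack; error reporting walks the stack
 * and lets each callback append a CONTEXT line. */
typedef struct ErrorContextCallback
{
    struct ErrorContextCallback *previous;
    void (*callback)(void *arg, char *buf, size_t buflen);
    void *arg;
} ErrorContextCallback;

static ErrorContextCallback *error_context_stack = NULL;

/* Walk the chain, letting each callback append its context line. */
void build_error_context(char *buf, size_t buflen)
{
    ErrorContextCallback *ecb;

    buf[0] = '\0';
    for (ecb = error_context_stack; ecb != NULL; ecb = ecb->previous)
        ecb->callback(ecb->arg, buf, buflen);
}

/* The kind of line pgsql_fdw's conversion-error callback might add;
 * the table name is hard-wired here purely for illustration. */
void conversion_error_callback(void *arg, char *buf, size_t buflen)
{
    const char *colname = (const char *) arg;
    char line[128];

    snprintf(line, sizeof(line),
             "column %s of foreign table ft1\n", colname);
    strncat(buf, line, buflen - strlen(buf) - 1);
}
```

With such a callback pushed around the type-input calls, an input
function's error would carry the "column c4 of foreign table ft1"
context shown above, without any extra PG_TRY block.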

>> As stated previously, I don't think that using local stats on
>> foreign tables is a win.  The other patches work fine for me, and
>> I'd be happy if that could go into 9.2.
>
> I have the opposite opinion on this issue because we need to do some
> of the filtering on the local side.  We can leave cost/rows estimation
> to the remote side for WHERE expressions which are pushed down, but we
> need the selectivity of extra filtering done on the local side.  For
> such purpose, having local stats of foreign data seems reasonable and
> useful.
>
> Of course, it has the downside that we need to execute an explicit
> ANALYZE for foreign tables, which would cause a full sequential scan
> on remote tables, in addition to the ANALYZE for remote tables done on
> the remote side as usual maintenance work.

This approach is much better and does not suffer from the
limitations the original analyze patch had.

I think that the price of a remote table scan is something
we should be willing to pay for good local statistics.
And there is always the option not to analyze the foreign
table if you are not willing to pay that price.

Maybe the FDW API could be extended so that foreign data wrappers
can provide a random sample to avoid a full table scan.
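
Even if the remote rows must still be streamed, the wrapper can at
least bound local memory by keeping a fixed-size random sample in one
pass; classic reservoir sampling does exactly that (PostgreSQL's own
acquire_sample_rows uses Vitter's algorithm for a similar purpose).  A
self-contained sketch, with rows reduced to plain ints for
illustration:

```c
#include <stdlib.h>

/* Keep a uniform random sample of at most targrows rows from a
 * stream of nrows rows, in one pass and O(targrows) memory.  Rows
 * are plain ints here purely for illustration. */
void reservoir_sample(const int *rows, int nrows,
                      int *sample, int targrows, unsigned int seed)
{
    int i;

    srand(seed);
    for (i = 0; i < nrows; i++)
    {
        if (i < targrows)
            sample[i] = rows[i];            /* fill the reservoir */
        else
        {
            /* keep row i with probability targrows / (i + 1) */
            int j = rand() % (i + 1);

            if (j < targrows)
                sample[j] = rows[i];
        }
    }
}
```

This avoids materializing the whole table locally, though the remote
side still reads every row; skipping that cost would need remote-side
support, as suggested above.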

> Attached patch contains changes below:
>
> pgsql_fdw_v19.patch
>   - show context of data conversion error
>   - move codes for fetch_count FDW option to option.c
>     (refactoring)
> pgsql_fdw_pushdown_v12.patch
>   - make deparseExpr function static (refactoring)
>
> I also attached pgsql_fdw_analyze for only testing the effect of local
> statistics.  It contains both backend's ANALYZE command support and
> pgsql_fdw's ANALYZE support.

I think the idea is promising.

I'll mark the patch as "ready for committer".

Yours,
Laurenz Albe


Re: pgsql_fdw, FDW for PostgreSQL server

From
Etsuro Fujita
Date:
At 22:11 12/03/28 +0900, Shigeru HANADA wrote:
>ANALYZE support for foreign tables is proposed by Fujita-san in current
>CF, so I'd like to push it.

I updated the patch to the latest HEAD.  Please find attached a patch.

Best regards,
Etsuro Fujita

Attachment

Re: pgsql_fdw, FDW for PostgreSQL server

From
"Etsuro Fujita"
Date:
Sorry, I sent this email without noticing Hanada-san's earlier email.  So,
please look at Hanada-san's post.

Best regards,
Etsuro Fujita

-----Original Message-----
From: pgsql-hackers-owner@postgresql.org
[mailto:pgsql-hackers-owner@postgresql.org] On Behalf Of Etsuro Fujita
Sent: Thursday, April 05, 2012 9:36 PM
To: Shigeru HANADA; Albe Laurenz
Cc: Tom Lane; Kevin Grittner; Robert Haas; PostgreSQL-development; Kohei
KaiGai; Martijn van Oosterhout; Hitoshi Harada
Subject: Re: [HACKERS] pgsql_fdw, FDW for PostgreSQL server

At 22:11 12/03/28 +0900, Shigeru HANADA wrote:
>ANALYZE support for foreign tables is proposed by Fujita-san in current 
>CF, so I'd like to push it.

I updated the patch to the latest HEAD.  Please find attached a patch.

Best regards,
Etsuro Fujita



Re: pgsql_fdw, FDW for PostgreSQL server

From
Tom Lane
Date:
"Albe Laurenz" <laurenz.albe@wien.gv.at> writes:
> I think that the price of a remote table scan is something
> we should be willing to pay for good local statistics.
> And there is always the option not to analyze the foreign
> table if you are not willing to pay that price.

> Maybe the FDW API could be extended so that foreign data wrappers
> can provide a random sample to avoid a full table scan.

The one thing that seems pretty clear from this discussion is that one
size doesn't fit all.  I think we really ought to provide a hook so that
the FDW can determine whether ANALYZE applies to its foreign tables at
all, and how to obtain the sample rows if it does.

Since we've already whacked the FDW API around in incompatible ways for
9.2, now is probably a good time to add that.  I'm inclined to say this
should happen whether or not we accept any of the currently proposed
patches for 9.2, because if the hook is there it will provide a way for
people to experiment with foreign-table ANALYZE operations outside of
core.
        regards, tom lane


Re: pgsql_fdw, FDW for PostgreSQL server

From
Shigeru HANADA
Date:
(2012/04/06 1:29), Tom Lane wrote:
> "Albe Laurenz"<laurenz.albe@wien.gv.at>  writes:
>> Maybe the FDW API could be extended so that foreign data wrappers
>> can provide a random sample to avoid a full table scan.
> 
> The one thing that seems pretty clear from this discussion is that one
> size doesn't fit all.  I think we really ought to provide a hook so that
> the FDW can determine whether ANALYZE applies to its foreign tables at
> all, and how to obtain the sample rows if it does.
>
> Since we've already whacked the FDW API around in incompatible ways for
> 9.2, now is probably a good time to add that.  I'm inclined to say this
> should happen whether or not we accept any of the currently proposed
> patches for 9.2, because if the hook is there it will provide a way for
> people to experiment with foreign-table ANALYZE operations outside of
> core.

To support foreign-table ANALYZE by adding a new hook, we would need a
mechanism (or at least documented guidelines) to manage the chain of
hook functions, because such a hook might be used by multiple FDWs (or
other plug-ins) at the same time.  A wrongly-written plug-in can easily
break the hook chain.  We might need to provide a register/unregister
API for this hook point, like RegisterResourceReleaseCallback, and call
each registered function until one of them processes the request.  Is
there any other hook point which has a similar issue?

Another concern is the place where we hook the process of ANALYZE.
IOW, how much of ANALYZE should be overridable?  Replacing analyze_rel
or do_analyze_rel wholly requires plug-ins to copy most of the code
from the original function in order to implement a hook function.  From
the perspective of an FDW author, I think that the row sampling
(acquire_sample_rows) function seems a handy place to hook.

Regards,
-- 
Shigeru HANADA


Re: pgsql_fdw, FDW for PostgreSQL server

From
Shigeru HANADA
Date:
(2012/04/04 15:43), Shigeru HANADA wrote:
> Attached patch contains changes below:
>
> pgsql_fdw_v19.patch
>    - show context of data conversion error
>    - move codes for fetch_count FDW option to option.c
>      (refactoring)
> pgsql_fdw_pushdown_v12.patch
>    - make deparseExpr function static (refactoring)
>
> I also attached pgsql_fdw_analyze for only testing the effect of local
> statistics.  It contains both backend's ANALYZE command support and
> pgsql_fdw's ANALYZE support.

Attached patch improves pgsql_fdw so that it uses the new libpq row
processor API and gets rid of the overhead of the SQL-level cursor.
This change would speed up remote queries without using extra memory.

This patch can be applied after pgsql_fdw_pushdown_v12.patch.

Regards,
--
Shigeru HANADA

Attachment

Re: pgsql_fdw, FDW for PostgreSQL server

From
Kyotaro HORIGUCHI
Date:
Excuse me for cutting in,

2012/4/6 Shigeru HANADA <shigeru.hanada@gmail.com>:
> To support foreign-table ANALYZE by adding a new hook, we would need a
> mechanism (or at least documented guidelines) to manage the chain of
> hook functions, because such a hook might be used by multiple FDWs (or
> other plug-ins) at the same time.  A wrongly-written plug-in can easily
> break the hook chain.  We might need to provide a register/unregister
> API for this hook point, like RegisterResourceReleaseCallback, and call
> each registered function until one of them processes the request.  Is
> there any other hook point which has a similar issue?

+1

The plain hook mechanism in PostgreSQL is, I think, meant to hang a
bunch of faceless callbacks that are registered, unregistered and
called all together.  It does not fit managing individual callbacks
which may be registered or unregistered in arbitrary order and are
better called separately.

Even if we provided RegisterResourceReleaseCallback-like machinery, it
would be far more complicated than an additional field in FdwRoutine
plus some analyze_rel() modifications on the core side, and having
each plugin confirm whether a call is really meant for it would be
tedious work on the plugin side.

Of course, I don't expect that many fdw-analyze callbacks will be
registered, but even two would be enough to cause the problem.

The current modifications in analyze_rel() do not look definitive, but
they are not so bad, and seem more stable than a simple hook point
that would be abandoned before long.

> Another concern is the place where we hook the process of ANALYZE.
> IOW, how much of ANALYZE should be overridable?  Replacing analyze_rel
> or do_analyze_rel wholly requires plug-ins to copy most of the code
> from the original function in order to implement a hook function.  From
> the perspective of an FDW author, I think that the row sampling
> (acquire_sample_rows) function seems a handy place to hook.

regards,

--
Kyotaro Horiguchi
NTT Open Source Software Center


Re: pgsql_fdw, FDW for PostgreSQL server

From
Tom Lane
Date:
Shigeru HANADA <shigeru.hanada@gmail.com> writes:
> (2012/04/06 1:29), Tom Lane wrote:
>> The one thing that seems pretty clear from this discussion is that one
>> size doesn't fit all.  I think we really ought to provide a hook so that
>> the FDW can determine whether ANALYZE applies to its foreign tables at
>> all, and how to obtain the sample rows if it does.

> To support foreign-table ANALYZE by adding a new hook, we would need a
> mechanism (or at least documented guidelines) to manage the chain of
> hook functions, because such a hook might be used by multiple FDWs (or
> other plug-ins) at the same time.  A wrongly-written plug-in can easily
> break the hook chain.

Sorry, I used the word "hook" loosely, not with the intention of meaning
a global function pointer.  We should of course implement this with
another per-FDW method pointer, as in the postgresql-analyze patch
series.

> Another concern is the place where we hook the process of ANALYZE.
> IOW, how much of ANALYZE should be overridable?

Not much, IMO.  The FDW should be able to decide whether or not to
analyze a particular table, and it should be in charge of implementing
its own version of acquire_sample_rows, but no more than that.  In
particular I do not like the specific way it's done in the v7 patch
(I've not looked at v8 yet) because the interposed logic has a
hard-wired assumption that foreign tables do not have inheritance
children.  I think that assumption has a life expectancy measured in
months at most, and I don't want to have to try to fix every FDW when
it changes.  But I think we can easily revise the hook details to fix
that, and I'm hoping to get that done today.
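
That shape of API can be sketched as a per-FDW method pointer, with
NULL meaning "no ANALYZE support".  The types below are simplified
stand-ins for illustration, not the real fdwapi.h declarations:

```c
#include <stddef.h>
#include <stdbool.h>

/* Simplified stand-in for FdwRoutine: the wrapper may supply an
 * optional sampling method, or leave it NULL to opt out. */
typedef struct FdwRoutine
{
    /* ... existing planner/executor callbacks ... */
    int (*AnalyzeForeignTable)(void *relation, void *rows, int targrows);
} FdwRoutine;

/* Core-side ANALYZE consults the pointer instead of a global hook,
 * so multiple FDWs coexist without any hook chain to break. */
bool analyze_foreign_table(FdwRoutine *routine, void *relation,
                           void *rows, int targrows, int *numrows)
{
    if (routine->AnalyzeForeignTable == NULL)
        return false;           /* FDW opted out; skip this table */
    *numrows = routine->AnalyzeForeignTable(relation, rows, targrows);
    return true;
}

/* Trivial stand-in sampler used only for illustration. */
int dummy_sampler(void *relation, void *rows, int targrows)
{
    (void) relation;
    (void) rows;
    return targrows < 100 ? targrows : 100;
}
```

With this shape, a wrongly-written plug-in can only affect its own
tables, addressing the hook-chain concern raised earlier in the thread.
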
        regards, tom lane


Re: pgsql_fdw, FDW for PostgreSQL server

From
Shigeru Hanada
Date:
On Fri, Apr 6, 2012 at 11:20 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>> Another concern is the place where we hook the process of ANALYZE.
>> IOW, how much of ANALYZE should be overridable?
>
> Not much, IMO.  The FDW should be able to decide whether or not to
> analyze a particular table, and it should be in charge of implementing
> its own version of acquire_sample_rows, but no more than that.

ISTM that we have rough consensus about what FDW should do for an
ANALYZE request.  The FDW should choose either of:
  a) get sample rows and return them to the backend
  b) tell the backend that the FDW has nothing to do for the request

> In
> particular I do not like the specific way it's done in the v7 patch
> (I've not looked at v8 yet) because the interposed logic has a
> hard-wired assumption that foreign tables do not have inheritance
> children.  I think that assumption has a life expectancy measured in
> months at most, and I don't want to have to try to fix every FDW when
> it changes.  But I think we can easily revise the hook details to fix
> that, and I'm hoping to get that done today.

I'll try implementing the design you suggested.

Regards,
--
Shigeru Hanada


Re: pgsql_fdw, FDW for PostgreSQL server

From
Tom Lane
Date:
Shigeru Hanada <shigeru.hanada@gmail.com> writes:
> On Fri, Apr 6, 2012 at 11:20 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>> In
>> particular I do not like the specific way it's done in the v7 patch
>> (I've not looked at v8 yet) because the interposed logic has a
>> hard-wired assumption that foreign tables do not have inheritance
>> children.  I think that assumption has a life expectancy measured in
>> months at most, and I don't want to have to try to fix every FDW when
>> it changes.  But I think we can easily revise the hook details to fix
>> that, and I'm hoping to get that done today.

> I'll try implementing the design you suggested.

I've already got it fixed up ...
        regards, tom lane


Re: pgsql_fdw, FDW for PostgreSQL server

From
Shigeru HANADA
Date:
(2012/04/07 1:38), Tom Lane wrote:
> Shigeru Hanada<shigeru.hanada@gmail.com>  writes:
>> On Fri, Apr 6, 2012 at 11:20 PM, Tom Lane<tgl@sss.pgh.pa.us>  wrote:
>>> In
>>> particular I do not like the specific way it's done in the v7 patch
>>> (I've not looked at v8 yet) because the interposed logic has a
>>> hard-wired assumption that foreign tables do not have inheritance
>>> children.  I think that assumption has a life expectancy measured in
>>> months at most, and I don't want to have to try to fix every FDW when
>>> it changes.  But I think we can easily revise the hook details to fix
>>> that, and I'm hoping to get that done today.
>
>> I'll try implementing the design you suggested.
>
> I've already got it fixed up ...

I've updated pgsql_fdw so that it can collect statistics from foreign
data with the new FDW API.

Other changes from the latest version are:
- Now pgsql_fdw uses the libpq row processor API to get rid of the
SQL-level cursor.
- Introduced a new struct ErrorPos to integrate error reporting in
foreign scan and analyze.
- Some refactoring, such as removing unnecessary parameters.
- Fixed typos.

Regards,
--
Shigeru HANADA

Attachment

Re: pgsql_fdw, FDW for PostgreSQL server

From
Thom Brown
Date:
2012/4/7 Shigeru HANADA <shigeru.hanada@gmail.com>:
> I've updated pgsql_fdw so that it can collect statistics from foreign
> data with new FDW API.

I notice that if you restart the remote server, the connection is
broken, but the client doesn't notice this until it goes to fire off
another command.  Should there be an option to automatically
re-establish the connection upon noticing the connection has dropped,
and issue a NOTICE that it had done so?

Also I'm not particularly keen on the message provided to the user in
this event:

ERROR:  could not execute EXPLAIN for cost estimation
DETAIL:  FATAL:  terminating connection due to administrator command
FATAL:  terminating connection due to administrator command

There's no explanation what the "administrator" command was, and I
suspect this is really just a "I don't know what's happened here"
condition.  I don't think we should reach that point.

-- 
Thom


Re: pgsql_fdw, FDW for PostgreSQL server

From
Shigeru HANADA
Date:
(2012/04/08 5:19), Thom Brown wrote:
> 2012/4/7 Shigeru HANADA<shigeru.hanada@gmail.com>:
>> I've updated pgsql_fdw so that it can collect statistics from foreign
>> data with new FDW API.
> 
> I notice that if you restart the remote server, the connection is
> broken, but the client doesn't notice this until it goes to fire off
> another command.  Should there be an option to automatically
> re-establish the connection upon noticing the connection has dropped,
> and issue a NOTICE that it had done so?

Hm, I'd prefer reporting the connection failure and aborting the local
transaction, because reconnecting to the server would break consistency
between the results that come from multiple foreign tables.  Server
shutdown (or other trouble, e.g. network failure) might happen at
various points in the sequence of remote queries (or sampling in
ANALYZE).  For example, when we execute a local query which contains
two foreign tables, foo and bar, the sequence of libpq activity would
be like this.
   1) connect to the server at the beginning of the local query
   2) execute EXPLAIN for foreign table foo
   3) execute EXPLAIN for foreign table bar
   4) execute actual query for foreign table foo
   5) execute actual query for foreign table bar
   6) disconnect from the server at the end of the local query

If the connection has broken between 4) and 5), and immediate reconnect
succeeded, retrieved results for foo and bar might be inconsistent from
the viewpoint of transaction isolation.

In the current implementation, the next local query which contains a
foreign table of the failed server tries to reconnect to the server.

> Also I'm not particularly keen on the message provided to the user in
> this event:
> 
> ERROR:  could not execute EXPLAIN for cost estimation
> DETAIL:  FATAL:  terminating connection due to administrator command
> FATAL:  terminating connection due to administrator command
> 
> There's no explanation what the "administrator" command was, and I
> suspect this is really just a "I don't know what's happened here"
> condition.  I don't think we should reach that point.

That FATAL message is returned by the remote backend's
ProcessInterrupts() during some administrator commands, such as
immediate shutdown or pg_terminate_backend().  If the remote backend
died from a fast shutdown or SIGKILL, no error message is available
(see the sample below).

postgres=# select * From pgsql_branches ;
ERROR:  could not execute EXPLAIN for cost estimation
DETAIL:
HINT:  SELECT bid, bbalance, filler FROM public.pgbench_branches

I agree that the message is confusing.  How about showing message like
"pgsql_fdw connection failure on <servername>" or something with remote
error message for such cases?  It can be achieved by adding extra check
for connection status right after PQexec()/PQexecParams().  Although
some word polishing would be required :)

postgres=# select * from pgsql_branches ;
ERROR:  pgsql_fdw connection failure on subaru_pgbench
DETAIL:  FATAL:  terminating connection due to administrator command
FATAL:  terminating connection due to administrator command

This should make it clear to users that the trouble is on the remote side.
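
The proposed check could look roughly like this.  PGconn, PQstatus()
and PQerrorMessage() are replaced by minimal stand-ins so the sketch is
self-contained; the real code would call libpq directly:

```c
#include <stdio.h>
#include <string.h>
#include <stdbool.h>

/* Minimal stand-ins for libpq's ConnStatusType and PGconn. */
typedef enum { CONNECTION_OK, CONNECTION_BAD } ConnStatusType;

typedef struct
{
    ConnStatusType status;      /* what PQstatus() would return */
    const char *errmsg;         /* what PQerrorMessage() would return */
} FakeConn;

/* After PQexec()/PQexecParams(), also check the connection status;
 * if it has gone bad, build an error naming the foreign server, with
 * the libpq message as DETAIL.  Returns true if an error is raised. */
bool check_conn_after_exec(const FakeConn *conn, const char *servername,
                           char *buf, size_t buflen)
{
    if (conn->status != CONNECTION_BAD)
        return false;
    snprintf(buf, buflen,
             "ERROR:  pgsql_fdw connection failure on %s\nDETAIL:  %s",
             servername, conn->errmsg);
    return true;
}
```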

Regards,
-- 
Shigeru HANADA


Re: pgsql_fdw, FDW for PostgreSQL server

From
Thom Brown
Date:
2012/4/9 Shigeru HANADA <shigeru.hanada@gmail.com>:
>    1) connect to the server at the beginning of the local query
>    2) execute EXPLAIN for foreign table foo
>    3) execute EXPLAIN for foreign table bar
>    4) execute actual query for foreign table foo
>    5) execute actual query for foreign table bar
>    6) disconnect from the server at the end of the local query
>
> If the connection has broken between 4) and 5), and immediate reconnect
> succeeded, retrieved results for foo and bar might be inconsistent from
> the viewpoint of transaction isolation.
>
> In current implementation, next local query which contains foreign table
> of failed foreign table tries to reconnect to the server.

How would this apply to the scenario where you haven't even begun a
transaction yet?  There's no risk of inconsistency if the connection
is lost before the first command can execute, so why fail in such a
case?  Isn't there a line in the sand we can draw where we say that if
we have passed it, we just die, otherwise we try to reconnect as
there's no risk of undesirable results?

>> Also I'm not particularly keen on the message provided to the user in
>> this event:
>>
>> ERROR:  could not execute EXPLAIN for cost estimation
>> DETAIL:  FATAL:  terminating connection due to administrator command
>> FATAL:  terminating connection due to administrator command
>>
>> There's no explanation what the "administrator" command was, and I
>> suspect this is really just a "I don't know what's happened here"
>> condition.  I don't think we should reach that point.
>
> That FATAL message is returned by remote backend's ProcessInterrupts()
> during some administrator commands, such as immediate shutdown or
> pg_terminate_backend().  If remote backend died of fast shutdown or
> SIGKILL, no error message is available (see the sample below).
>
> postgres=# select * From pgsql_branches ;
> ERROR:  could not execute EXPLAIN for cost estimation
> DETAIL:
> HINT:  SELECT bid, bbalance, filler FROM public.pgbench_branches
>
> I agree that the message is confusing.  How about showing message like
> "pgsql_fdw connection failure on <servername>" or something with remote
> error message for such cases?  It can be achieved by adding extra check
> for connection status right after PQexec()/PQexecParams().  Although
> some word polishing would be required :)
>
> postgres=# select * from pgsql_branches ;
> ERROR:  pgsql_fdw connection failure on subaru_pgbench
> DETAIL:  FATAL:  terminating connection due to administrator command
> FATAL:  terminating connection due to administrator command

Yes, that would be an improvement.

--
Thom


Re: pgsql_fdw, FDW for PostgreSQL server

From
Shigeru HANADA
Date:
Hi all,

(2012/03/07 3:39), Tom Lane wrote:
> A bigger issue with postgresql_fdw_validator is that it supposes that
> the core backend is authoritative as to what options libpq supports,
> which is bad design on its face.  It would be much more sensible for
> dblink to be asking libpq what options libpq supports, say via
> PQconndefaults().
> 
> We might find that we have to leave postgresql_fdw_validator as-is
> for backwards compatibility reasons (in particular, being able to load
> existing FDW definitions) but I think we should migrate away from using
> it.

In the discussion about pgsql_fdw, which was proposed for 9.2, some
issues with postgresql_fdw_validator were pointed out.

* The name "postgresql_fdw_validator" conflicts with the name of the
FDW for PostgreSQL which follows the naming habit of other FDWs.
* dblink depends on postgresql_fdw_validator.
* postgresql_fdw_validator assumes that libpq supports some particular
options.

An idea to resolve these is to add dblink's own validator which doesn't
assume much about libpq, and to deprecate postgresql_fdw_validator.

* Add dblink_fdw_validator to contrib/dblink, which is similar to
postgresql_fdw_validator but it assumes less about libpq.
* Add dblink_fdw as the default FDW of dblink, which uses
dblink_fdw_validator, and recommend using it.  This would prevent users
from using postgresql_fdw_validator with dblink.
* Mention that postgresql_fdw_validator might become obsolete in a
future release in the documentation of CREATE FOREIGN DATA WRAPPER.

To make the behavior of dblink_fdw_validator similar to that of the
current postgresql_fdw_validator, we need to assume that libpq supports
the "user" option, allow it and secret options only in USER MAPPING
options, and allow the others only in SERVER options (rejecting all
debug options).  IMO this is not an unreasonable assumption.
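
The runtime-validation direction Tom suggested earlier (asking libpq
what it supports via PQconndefaults() instead of hard-coding the list)
can be sketched like this; the string array here is a stand-in for
libpq's PQconninfoOption array:

```c
#include <string.h>
#include <stdbool.h>

/* Accept an FDW option only if it appears in the option list that
 * libpq itself reported at runtime (a plain string array here,
 * standing in for the PQconninfoOption list from PQconndefaults()). */
bool option_is_valid(const char *name,
                     const char *const *libpq_options, int noptions)
{
    int i;

    for (i = 0; i < noptions; i++)
        if (strcmp(name, libpq_options[i]) == 0)
            return true;
    return false;
}
```

The real validator would additionally classify each accepted option as
a SERVER or USER MAPPING option, as described above.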

Is this proposal reasonable?  Any comments and questions are welcome.

Regards,
-- 
Shigeru HANADA