Thread: pgsql_fdw in contrib
I'd like to propose pgsql_fdw, FDW for PostgreSQL, as a contrib module in core, again. This patch is basically a rebased version of the patches proposed in the 9.2 development cycle, and contains some additional changes such as concern about volatile variables for PG_CATCH blocks. In addition, I combined the old three patches (pgsql_fdw core functionality, push-down support, and analyze support) into one patch for ease of review. (I think these features are necessary for pgsql_fdw use.)

After the development for the 9.2 cycle, pgsql_fdw can gather statistics from the remote side by providing a sampling function to the analyze module in the backend core, use them to estimate selectivity of WHERE clauses which will be pushed down, and retrieve query results from the remote side with a custom row processor which prevents memory exhaustion from huge result sets.

It would be possible to add some more features, such as ORDER BY push-down with index information support, without changing existing APIs, but adding a relatively simple pgsql_fdw first and enhancing it later seems better. In addition, once pgsql_fdw has been merged, it would help implementing proofs-of-concept of SQL/MED-related features.

Regards, -- Shigeru HANADA
Attachment
On Thu, Jun 14, 2012 at 7:29 AM, Shigeru HANADA <shigeru.hanada@gmail.com> wrote: > I'd like to propose pgsql_fdw, FDW for PostgreSQL, as a contrib module > in core, again. This patch is basically rebased version of the patches > proposed in 9.2 development cycle, and contains some additional changes > such as concern about volatile variables for PG_CATCH blocks. In > addition, I combined old three patches (pgsql_fdw core functionality, > push-down support, and analyze support) into one patch for ease of > review. (I think these features are necessary for pgsql_fdw use.) > > After the development for 9.2 cycle, pgsql_fdw can gather statistics > from remote side by providing sampling function to analyze module in > backend core, use them to estimate selectivity of WHERE clauses which > will be pushed-down, and retrieve query results from remote side with > custom row processor which prevents memory exhaustion from huge result set. > > It would be possible to add some more features, such as ORDER BY > push-down with index information support, without changing existing > APIs, but at first add relatively simple pgsql_fdw and enhance it seems > better. In addition, once pgsql_fdw has been merged, it would help > implementing proof-of-concept of SQL/MED-related features. I can't help but wonder (having been down the contrib/core/extension road myself) if it isn't better to improve the facilities to register and search for qualified extensions (like Perl CPAN) so that people looking for code to improve their backends can find it. That way, you're free to release/do xyz/abandon your project as you see fit without having to go through -hackers. This should also remove a lot of the stigma with not being in core since if stock postgres installations can access the necessary modules via CREATE EXTENSION, I think it will make it easier for projects like this to get used with the additional benefit of decentralizing project management. merlin
On Mon, Jun 18, 2012 at 12:24 PM, Merlin Moncure <mmoncure@gmail.com> wrote: > I can't help but wonder (having been down the contrib/core/extension > road myself) if it isn't better to improve the facilities to register > and search for qualified extensions (like Perl CPAN) so that people > looking for code to improve their backends can find it. That way, > you're free to release/do xyz/abandon your project as you see fit > without having to go through -hackers. This should also remove a lot > of the stigma with not being in core since if stock postgres > installations can access the necessary modules via CREATE EXTENSION, I > think it will make it easier for projects like this to get used with > the additional benefit of decentralizing project management. Well, if you're the type that likes to install everything via RPMs (and I am) then you wouldn't want this behavior, especially not automagically. It seems to open up a host of security risks, too: I believe Tom has previously stated that Red Hat (and other distro) packaging guidelines forbid packages from installing software in places where they can then turn around and run it. I suppose CPAN must have some sort of exception to this policy, though, so maybe it's not ironclad. Still, it seems like a bit of a distraction in this case: I think we want to have at least one FDW in core that actually talks to some other database server, rather than just to a file, and it seems like pgsql_fdw is the obvious choice by a mile. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
2012/6/18 Merlin Moncure <mmoncure@gmail.com>: > I can't help but wonder (having been down the contrib/core/extension > road myself) if it isn't better to improve the facilities to register > and search for qualified extensions (like Perl CPAN) so that people > looking for code to improve their backends can find it. That way, > you're free to release/do xyz/abandon your project as you see fit > without having to go through -hackers. This should also remove a lot > of the stigma with not being in core since if stock postgres > installations can access the necessary modules via CREATE EXTENSION, I > think it will make it easier for projects like this to get used with > the additional benefit of decentralizing project management. What about PGXN? BTW, I'm with Robert about "we want to have at least one FDW in core that actually talks to some other database server, rather than just to a file, and it seems like pgsql_fdw is the obvious choice". We have dblink as contrib, why not pgsql_fdw too? Other FDWs could be available at PGXN, pgFoundry, Github, etc. []s -- Dickson S. Guedes mail/xmpp: guedes@guedesoft.net - skype: guediz http://guedesoft.net - http://www.postgresql.org.br
On Mon, Jun 18, 2012 at 11:01 PM, Robert Haas <robertmhaas@gmail.com> wrote: > On Mon, Jun 18, 2012 at 12:24 PM, Merlin Moncure <mmoncure@gmail.com> wrote: >> I can't help but wonder (having been down the contrib/core/extension >> road myself) if it isn't better to improve the facilities to register >> and search for qualified extensions (like Perl CPAN) so that people >> looking for code to improve their backends can find it. That way, >> you're free to release/do xyz/abandon your project as you see fit >> without having to go through -hackers. This should also remove a lot >> of the stigma with not being in core since if stock postgres >> installations can access the necessary modules via CREATE EXTENSION, I >> think it will make it easier for projects like this to get used with >> the additional benefit of decentralizing project management. > > Well, if you're the type that likes to install everything via RPMs > (and I am) then you wouldn't want this behavior, especially not > automagically. It seems to open up a host of security risks, too: I > believe Tom has previously stated that Red Hat (and other distro) > packaging guidelines forbid packages from installing software in > places where they can then turn around and run it. I suppose CPAN > must have some sort of exception to this policy, though, so maybe it's > not ironclad. Huh? CPAN is just one example -- PyPM for python, npm for node etc etc. There is some distinction to that rule that is being missed so that it doesn't apply to cases like this -- probably that it is transacted by the user and/or requires su. I think your points are supporting mine: the vast majority of postgres administrators deal only with o/s packaged rpms and therefore with the status quo are unable to utilize any extension that is not packaged with contrib. 
This means that there are two possible outcomes if you want to write an extension: 1) get accepted into core 2) never get used Given that running the gauntlet to #1 is not a particularly attractive option (or even possible) in many cases for various reasons it tends to discourage module development by 3rd parties. There are several very high quality extensions that could get a lot more exposure...for example pl/sh. But anyways, if you're happy with pgsql_fdw (aside: i looked at it, and it's great!) in core, then by all means... merlin
2012/6/19 Merlin Moncure <mmoncure@gmail.com>: > On Mon, Jun 18, 2012 at 11:01 PM, Robert Haas <robertmhaas@gmail.com> wrote: >> On Mon, Jun 18, 2012 at 12:24 PM, Merlin Moncure <mmoncure@gmail.com> wrote: >>> I can't help but wonder (having been down the contrib/core/extension >>> road myself) if it isn't better to improve the facilities to register >>> and search for qualified extensions (like Perl CPAN) so that people >>> looking for code to improve their backends can find it. That way, >>> you're free to release/do xyz/abandon your project as you see fit >>> without having to go through -hackers. This should also remove a lot >>> of the stigma with not being in core since if stock postgres >>> installations can access the necessary modules via CREATE EXTENSION, I >>> think it will make it easier for projects like this to get used with >>> the additional benefit of decentralizing project management. >> >> Well, if you're the type that likes to install everything via RPMs >> (and I am) then you wouldn't want this behavior, especially not >> automagically. It seems to open up a host of security risks, too: I >> believe Tom has previously stated that Red Hat (and other distro) >> packaging guidelines forbid packages from installing software in >> places where they can then turn around and run it. I suppose CPAN >> must have some sort of exception to this policy, though, so maybe it's >> not ironclad. > > Huh? CPAN is just one example -- PyPM for python, npm for node etc > etc. There is some distinction to that rule that is being missed so > that it doesn't apply to cases like this -- probably that it is > transacted by the user and/or requires su. > > I think your points are supporting mine: the vast majority of postgres > administrators deal only with o/s packaged rpms and therefore with the > status quo are unable to utilize any extension that is not packaged > with contrib. 
This means that there are two possible outcomes if you > want to write an extension: > > 1) get accepted into core > 2) never get used > > Given that running the gauntlet to #1 is not a particularly attractive > option (or even possible) in many cases for various reasons it tends > to discourage module development by 3rd parties. There are several > very high quality extensions that could get a lot more exposure...for > example pl/sh. > > But anyways, if you're happy with pgsql_fdw (aside: i looked at it, > and it's great!) in core, then by all means... > Let me push the pgsql_fdw in core from a different perspective. Right now, FDW is a feature that will take many enhancements in the near future like join-pushdown, writable APIs and so on. If we would not have a FDW extension in core that communicates with an actual RDBMS, instead of flat files, it makes our development efforts more complex. Please assume a scenario where people always try to submit two patches; one for the core, and the other for a particular out-of-tree extension. I believe it is a reasonable choice to have a FDW for PostgreSQL in core, rather than MySQL or Oracle. We can use this extension to show how a new feature works, whenever we try to extend FDW APIs. BTW, Hanada-san- It seems to me the expected result of regression test was not included in this patch. Please submit it again- Thanks, -- KaiGai Kohei <kaigai@kaigai.gr.jp>
On Tue, Jun 19, 2012 at 10:15 AM, Kohei KaiGai <kaigai@kaigai.gr.jp> wrote: > Let me push the pgsql_fdw in core from different perspective. > > Right now, FDW is a feature that will take many enhancement in > the near future like join-pushdown, writable APIs and so on. > If we would not have a FDW extension in core that communicate > with actual RDBMS, instead of flat files, it makes our development > efforts more complex. Please assume a scenario people always > try to submit two patches; one for the core, and the other for > a particular out-of-tree extension. yeah. plus, it's nice to have a very high quality fdw implementation in core for others to crib from. I have no objection -- I just took a convenient opportunity to hijack the thread for some hand-wavy ideas about an extension repository. :-). merlin
Harada-san, I checked your patch, and had an impression that includes many improvements from the previous revision that I looked at the last commit fest. However, I noticed several points to be revised, or investigated.

* It seems to me expected results of the regression test is not attached, even though test cases were included. Please add it.

* cleanup_connection() does not close the connection in case when this callback was invoked due to an error under sub-transaction. It probably makes problematic behavior. See the following steps to reproduce the matter:

postgres=# BEGIN;
BEGIN
postgres=# SELECT * FROM ft3;
 a |  b
---+-----
 1 | aaa
 2 | bbb
 3 | ccc
 4 | ddd
 5 | eee
 6 | fff
(6 rows)

postgres=# SAVEPOINT sv_01;
SAVEPOINT
postgres=# SELECT * FROM ft3 WHERE 1 / (a - 4) > 0; -- should be error on remote transaction
ERROR: could not execute remote query
DETAIL: ERROR: division by zero
HINT: SELECT a, b FROM public.t1 WHERE (((1 OPERATOR(pg_catalog./) (a OPERATOR(pg_catalog.-) 4)) OPERATOR(pg_catalog.>) 0))
postgres=# ROLLBACK TO SAVEPOINT sv_01;
ROLLBACK
postgres=# SELECT * FROM ft3;
ERROR: could not execute EXPLAIN for cost estimation
DETAIL: ERROR: current transaction is aborted, commands ignored until end of transaction block
HINT: SELECT a, b FROM public.t1

Once we got an error under the remote transaction, it eventually raises an error on local side, then cleanup_connection() should be invoked. But it ignores the error due to sub-transaction, thus, the remote transaction being already aborted is kept. I'd like to suggest two remedies:
1. connections are closed, even if errors on sub-transaction.
2. close the connection if PQexecParams() returns an error, on execute_query() prior to raise a local error.

* Regarding to deparseSimpleSql(), it pulls attributes being referenced from baserestrictinfo and reltargetlist using pull_var_clause(). Is it unavailable to use local_conds instead of baserestrictinfo? 
We can optimize references to the variable being consumed at the remote side only. All we need to transfer is variables referenced in targetlist and local-clauses. In addition, is pull_var_clause() reasonable to list up all the attribute referenced at the both expression tree? It seems to be pull_varattnos() is more useful API in this situation.

* Regarding to deparseRelation(), it scans simple_rte_array to fetch RangeTblEntry with relation-id of the target foreign table. It does not match correct entry if same foreign table is appeared in this list twice or more, like a case of self-join. I'd like to recommend to use simple_rte_array[baserel->relid] instead. In addition, it checks whether relkind is RELKIND_FOREIGN_TABLE, or not. It seems to me this check should be Assert(), if routines of pgsql_fdw is called towards regular relations.

* Regarding to deparseVar(), is it unnecessary to check relkind of the relation being referenced by Var node, isn't it?

* Regarding to deparseBoolExpr(), compiler raised a warning because op can be used without initialized.

* Even though it is harmless, sortConditions() is a misleading function name. How about categolizeConditions() or screeningConditions()?

Thanks for your great jobs. 2012/6/14 Shigeru HANADA <shigeru.hanada@gmail.com>: > I'd like to propose pgsql_fdw, FDW for PostgreSQL, as a contrib module > in core, again. This patch is basically rebased version of the patches > proposed in 9.2 development cycle, and contains some additional changes > such as concern about volatile variables for PG_CATCH blocks. In > addition, I combined old three patches (pgsql_fdw core functionality, > push-down support, and analyze support) into one patch for ease of > review. (I think these features are necessary for pgsql_fdw use.) 
> > After the development for 9.2 cycle, pgsql_fdw can gather statistics > from remote side by providing sampling function to analyze module in > backend core, use them to estimate selectivity of WHERE clauses which > will be pushed-down, and retrieve query results from remote side with > custom row processor which prevents memory exhaustion from huge result set. > > It would be possible to add some more features, such as ORDER BY > push-down with index information support, without changing existing > APIs, but at first add relatively simple pgsql_fdw and enhance it seems > better. In addition, once pgsql_fdw has been merged, it would help > implementing proof-of-concept of SQL/MED-related features. > > Regards, > -- > Shigeru HANADA > > > -- > Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) > To make changes to your subscription: > http://www.postgresql.org/mailpref/pgsql-hackers > -- KaiGai Kohei <kaigai@kaigai.gr.jp>
Hanada-san, Regarding to the issue around sub-transaction abort, an ideal solution might be to execute a SAVEPOINT command on the remote side synchronously. It would allow rolling back the active transaction to the savepoint on the remote server when the local one gets rolled back. However, I'm not inclined to merge anything ideal within a single patch, and I'd like to recommend keeping your idea simple at first, then revising the functionality over the upcoming three commit fests. How about your opinion? The author is you. 2012/6/26 Kohei KaiGai <kaigai@kaigai.gr.jp>: > Harada-san, > > I checked your patch, and had an impression that includes many > improvements from the previous revision that I looked at the last > commit fest. > > However, I noticed several points to be revised, or investigated. > > * It seems to me expected results of the regression test is not > attached, even though test cases were included. Please add it. > > * cleanup_connection() does not close the connection in case > when this callback was invoked due to an error under sub- > transaction. It probably makes problematic behavior. 
> > See the following steps to reproduce the matter: > > postgres=# BEGIN; > BEGIN > postgres=# SELECT * FROM ft3; > a | b > ---+----- > 1 | aaa > 2 | bbb > 3 | ccc > 4 | ddd > 5 | eee > 6 | fff > (6 rows) > > postgres=# SAVEPOINT sv_01; > SAVEPOINT > > postgres=# SELECT * FROM ft3 WHERE 1 / (a - 4) > 0; -- should be > error on remote transaction > ERROR: could not execute remote query > DETAIL: ERROR: division by zero > > HINT: SELECT a, b FROM public.t1 WHERE (((1 OPERATOR(pg_catalog./) (a > OPERATOR(pg_catalog.-) 4)) OPERATOR(pg_catalog.>) 0)) > > postgres=# ROLLBACK TO SAVEPOINT sv_01; > ROLLBACK > > postgres=# SELECT * FROM ft3; > ERROR: could not execute EXPLAIN for cost estimation > DETAIL: ERROR: current transaction is aborted, commands ignored > until end of transaction block > > HINT: SELECT a, b FROM public.t1 > > Once we got an error under the remote transaction, it eventually raises > an error on local side, then cleanup_connection() should be invoked. > But it ignores the error due to sub-transaction, thus, the remote transaction > being already aborted is kept. > I'd like to suggest two remedy. > 1. connections are closed, even if errors on sub-transaction. > 2. close the connection if PQexecParams() returns an error, > on execute_query() prior to raise a local error. > > * Regarding to deparseSimpleSql(), it pulls attributes being referenced > from baserestrictinfo and reltargetlist using pull_var_clause(). > Is it unavailable to use local_conds instead of baserestrictinfo? > We can optimize references to the variable being consumed at the > remote side only. All we need to transfer is variables referenced > in targetlist and local-clauses. > In addition, is pull_var_clause() reasonable to list up all the attribute > referenced at the both expression tree? It seems to be pull_varattnos() > is more useful API in this situation. 
> > * Regarding to deparseRelation(), it scans simple_rte_array to fetch > RangeTblEntry with relation-id of the target foreign table. > It does not match correct entry if same foreign table is appeared in > this list twice or more, like a case of self-join. I'd like to recommend > to use simple_rte_array[baserel->relid] instead. > In addition, it checks whether relkind is RELKIND_FOREIGN_TABLE, > or not. It seems to me this check should be Assert(), if routines of > pgsql_fdw is called towards regular relations. > > * Regarding to deparseVar(), is it unnecessary to check relkind of > the relation being referenced by Var node, isn't it? > > * Regarding to deparseBoolExpr(), compiler raised a warning > because op can be used without initialized. > > * Even though it is harmless, sortConditions() is a misleading function > name. How about categolizeConditions() or screeningConditions()? > > Thanks for your great jobs. > > 2012/6/14 Shigeru HANADA <shigeru.hanada@gmail.com>: >> I'd like to propose pgsql_fdw, FDW for PostgreSQL, as a contrib module >> in core, again. This patch is basically rebased version of the patches >> proposed in 9.2 development cycle, and contains some additional changes >> such as concern about volatile variables for PG_CATCH blocks. In >> addition, I combined old three patches (pgsql_fdw core functionality, >> push-down support, and analyze support) into one patch for ease of >> review. (I think these features are necessary for pgsql_fdw use.) >> >> After the development for 9.2 cycle, pgsql_fdw can gather statistics >> from remote side by providing sampling function to analyze module in >> backend core, use them to estimate selectivity of WHERE clauses which >> will be pushed-down, and retrieve query results from remote side with >> custom row processor which prevents memory exhaustion from huge result set. 
>> >> It would be possible to add some more features, such as ORDER BY >> push-down with index information support, without changing existing >> APIs, but at first add relatively simple pgsql_fdw and enhance it seems >> better. In addition, once pgsql_fdw has been merged, it would help >> implementing proof-of-concept of SQL/MED-related features. >> >> Regards, >> -- >> Shigeru HANADA >> >> >> -- >> Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) >> To make changes to your subscription: >> http://www.postgresql.org/mailpref/pgsql-hackers >> > > > > -- > KaiGai Kohei <kaigai@kaigai.gr.jp> -- KaiGai Kohei <kaigai@kaigai.gr.jp>
Hi Kaigai-san, Sorry for delayed reply. On Tue, Jun 26, 2012 at 10:50 PM, Kohei KaiGai <kaigai@kaigai.gr.jp> wrote: > * It seems to me expected results of the regression test is not > attached, even though test cases were included. Please add it. AFAICS the patch I posted contains both test script and expected result. If you have created new git branch and applied patch, you might have forgotten to git-add expected/pgsql_fdw.out. Would you check that? > * cleanup_connection() does not close the connection in case > when this callback was invoked due to an error under sub- > transaction. It probably makes problematic behavior. > > See the following steps to reproduce the matter: Oops, good catch. I could reproduce this problem. > Once we got an error under the remote transaction, it eventually raises > an error on local side, then cleanup_connection() should be invoked. > But it ignores the error due to sub-transaction, thus, the remote transaction > being already aborted is kept. > I'd like to suggest two remedy. > 1. connections are closed, even if errors on sub-transaction. > 2. close the connection if PQexecParams() returns an error, > on execute_query() prior to raise a local error. Background: the reason why pgsql_fdw doesn't disconnect at errors in sub-transaction is to allow FETCHing from cursor which uses foreign table and is opened before the local error. How about to close connection when released connection is in failed transaction? It can be determined with PQtransactionStatus() in ReleaseConnection(). > * Regarding to deparseSimpleSql(), it pulls attributes being referenced > from baserestrictinfo and reltargetlist using pull_var_clause(). > Is it unavailable to use local_conds instead of baserestrictinfo? > We can optimize references to the variable being consumed at the > remote side only. All we need to transfer is variables referenced > in targetlist and local-clauses. It sounds possible. 
> In addition, is pull_var_clause() reasonable to list up all the attribute > referenced at the both expression tree? It seems to be pull_varattnos() > is more useful API in this situation. Oh, I didn't know that. It seems more efficient, I'll check it out. > * Regarding to deparseRelation(), it scans simple_rte_array to fetch > RangeTblEntry with relation-id of the target foreign table. > It does not match correct entry if same foreign table is appeared in > this list twice or more, like a case of self-join. I'd like to recommend > to use simple_rte_array[baserel->relid] instead. > In addition, it checks whether relkind is RELKIND_FOREIGN_TABLE, > or not. It seems to me this check should be Assert(), if routines of > pgsql_fdw is called towards regular relations. > > * Regarding to deparseVar(), is it unnecessary to check relkind of > the relation being referenced by Var node, isn't it? > > * Regarding to deparseBoolExpr(), compiler raised a warning > because op can be used without initialized. > > * Even though it is harmless, sortConditions() is a misleading function > name. How about categolizeConditions() or screeningConditions()? I'll reply these comments in another post. Regards, -- Shigeru Hanada
On Tue, Jun 26, 2012 at 10:50 PM, Kohei KaiGai <kaigai@kaigai.gr.jp> wrote: > * Regarding to deparseSimpleSql(), it pulls attributes being referenced > from baserestrictinfo and reltargetlist using pull_var_clause(). > Is it unavailable to use local_conds instead of baserestrictinfo? > We can optimize references to the variable being consumed at the > remote side only. All we need to transfer is variables referenced > in targetlist and local-clauses. Indeed, fixed. > In addition, is pull_var_clause() reasonable to list up all the attribute > referenced at the both expression tree? It seems to be pull_varattnos() > is more useful API in this situation. Only for searching, yes. However, sooner or later we need Var objects to deparse remote SELECT clause, so pull_var_clause seems better here. ISTM one of advantage to use bitmap is matching with another bitmap, like index only scan code checks whether all used attributes is contained by indexed attributes. > * Regarding to deparseRelation(), it scans simple_rte_array to fetch > RangeTblEntry with relation-id of the target foreign table. > It does not match correct entry if same foreign table is appeared in > this list twice or more, like a case of self-join. I'd like to recommend > to use simple_rte_array[baserel->relid] instead. Reasonable idea. After some thoughts, I think that deparseRelation should receive RangeTblEntry instead of relation oid (and then PlannerInfo is not necessary). > In addition, it checks whether relkind is RELKIND_FOREIGN_TABLE, > or not. It seems to me this check should be Assert(), if routines of > pgsql_fdw is called towards regular relations. > > * Regarding to deparseVar(), is it unnecessary to check relkind of > the relation being referenced by Var node, isn't it? IIRC, this is remain of past trial for general deparser... Removed unnecessary relkind checks. Fixed. > * Regarding to deparseBoolExpr(), compiler raised a warning > because op can be used without initialized. Fixed. 
> * Even though it is harmless, sortConditions() is a misleading function > name. How about categolizeConditions() or screeningConditions()? How about classifyConditions? # I hope native English speaker's help for wording issue... :-) Regards, -- Shigeru Hanada
2012/7/5 Shigeru HANADA <shigeru.hanada@gmail.com>: >> In addition, is pull_var_clause() reasonable to list up all the attribute >> referenced at the both expression tree? It seems to be pull_varattnos() >> is more useful API in this situation. > > Only for searching, yes. However, sooner or later we need Var objects > to deparse remote SELECT clause, so pull_var_clause seems better here. > ISTM one of advantage to use bitmap is matching with another bitmap, > like index only scan code checks whether all used attributes is > contained by indexed attributes. > It seems to me a reasonable decision. >> * Regarding to deparseRelation(), it scans simple_rte_array to fetch >> RangeTblEntry with relation-id of the target foreign table. >> It does not match correct entry if same foreign table is appeared in >> this list twice or more, like a case of self-join. I'd like to recommend >> to use simple_rte_array[baserel->relid] instead. > > Reasonable idea. After some thoughts, I think that deparseRelation > should receive RangeTblEntry instead of relation oid (and then > PlannerInfo is not necessary). > I also like this design. >> * Even though it is harmless, sortConditions() is a misleading function >> name. How about categolizeConditions() or screeningConditions()? > > How about classifyConditions? > # I hope native English speaker's help for wording issue... :-) > Even though I'm not a native English speaker, it seems to me "classify" is less confusing term than "sort" in this context. If it would be a matter, you can get pointed out on committer's review. So, please go ahead. Thanks, -- KaiGai Kohei <kaigai@kaigai.gr.jp>
On tor, 2012-06-14 at 21:29 +0900, Shigeru HANADA wrote: > I'd like to propose pgsql_fdw, FDW for PostgreSQL, as a contrib module > in core, again. Do you have any new proposals regarding naming, and how to deal with postgresql_fdw_validator, and dblink?
2012/7/11 Peter Eisentraut <peter_e@gmx.net>: > On tor, 2012-06-14 at 21:29 +0900, Shigeru HANADA wrote: >> I'd like to propose pgsql_fdw, FDW for PostgreSQL, as a contrib module >> in core, again. > > Do you have any new proposals regarding naming, and how to deal with > postgresql_fdw_validator, and dblink? > This patch uses pgsql_fdw_validator for its own validator naming. Which point was the main issue in the last commit-fest? -- KaiGai Kohei <kaigai@kaigai.gr.jp>
On tor, 2012-07-12 at 06:25 +0200, Kohei KaiGai wrote: > 2012/7/11 Peter Eisentraut <peter_e@gmx.net>: > > On tor, 2012-06-14 at 21:29 +0900, Shigeru HANADA wrote: > >> I'd like to propose pgsql_fdw, FDW for PostgreSQL, as a contrib module > >> in core, again. > > > > Do you have any new proposals regarding naming, and how to deal with > > postgresql_fdw_validator, and dblink? > > > This patch uses pgsql_fdw_validator for its own validator naming. > > Which point was the main issue in the last commit-fest? That this module should be called postgresql_fdw.
> 2012/6/26 Kohei KaiGai <kaigai@kaigai.gr.jp>: > > Harada-san, > > > > I checked your patch, and had an impression that includes many > > improvements from the previous revision that I looked at the last > > commit fest. > > > > However, I noticed several points to be revised, or investigated. > > > > * It seems to me expected results of the regression test is not > > attached, even though test cases were included. Please add it. KaiGai-san, Did you find the file? That is in the contrib/pgsql_fdw/expected directory, named pgsql_fdw.out. If necessary, I will send the file to you. BTW, I found some bugs on the expected results of the file. Attached is a patch fixing the bugs. Thanks, Best regards, Etsuro Fujita
(2012/07/12 6:04), Peter Eisentraut wrote: > On tor, 2012-06-14 at 21:29 +0900, Shigeru HANADA wrote: >> I'd like to propose pgsql_fdw, FDW for PostgreSQL, as a contrib module >> in core, again. > > Do you have any new proposals regarding naming, and how to deal with > postgresql_fdw_validator, and dblink? Yes, I've proposed to rename existing postgresql_fdw_validator to dblink_fdw_validator and move it into contrib/dblink so that pgsql_fdw can use the name "postgresql_fdw" and "postgresql_fdw_validator". Though this post has had no response... http://archives.postgresql.org/message-id/4F854F43.4030200@gmail.com Regards, -- Shigeru HANADA
2012/7/12 Shigeru HANADA <shigeru.hanada@gmail.com>:
> (2012/07/12 6:04), Peter Eisentraut wrote:
>> On tor, 2012-06-14 at 21:29 +0900, Shigeru HANADA wrote:
>>> I'd like to propose pgsql_fdw, FDW for PostgreSQL, as a contrib module
>>> in core, again.
>>
>> Do you have any new proposals regarding naming, and how to deal with
>> postgresql_fdw_validator, and dblink?
>
> Yes, I've proposed to rename existing postgresql_fdw_validator to
> dblink_fdw_validator and move it into contrib/dblink so that pgsql_fdw
> can use the name "postgresql_fdw" and "postgresql_fdw_validator".
> Though this post has had no response...
>
It seems to me what postgresql_fdw_validator() is doing looks like
a function to be named as "libpq_fdw_validator()".

How about your opinion? It will help this namespace conflicts.

Thanks,
--
KaiGai Kohei <kaigai@kaigai.gr.jp>
(2012/07/12 20:48), Kohei KaiGai wrote:
> It seems to me what postgresql_fdw_validator() is doing looks like
> a function to be named as "libpq_fdw_validator()".
>
> How about your opinion? It will help this namespace conflicts.

I'd prefer dblink_fdw_validator.

The name "libpq_fdw_validator" impresses me that a concrete FDW named
"libpq_fdw" is somewhere and it retrieves external data *from* libpq.
Indeed postgresql_fdw_validator allows only some of libpq options at the
moment, but we won't be able to rename it for backward compatibility
even if it wants to have non-libpq options in the future.

IMO basically each FDW validator should be owned by a particular FDW,
because in most cases validator should know FDW's internal deeply. In
addition, it would want to have new options for new features.

Besides naming, as mentioned upthread, removing hard-coded libpq options
list from dblink and leaving it to libpq client library would make
dblink more robust about libpq option changes in future.

Regards,
--
Shigeru HANADA
2012/7/13 Shigeru HANADA <shigeru.hanada@gmail.com>:
> (2012/07/12 20:48), Kohei KaiGai wrote:
>> It seems to me what postgresql_fdw_validator() is doing looks like
>> a function to be named as "libpq_fdw_validator()".
>>
>> How about your opinion? It will help this namespace conflicts.
>
> I'd prefer dblink_fdw_validator.
>
> The name "libpq_fdw_validator" impresses me that a concrete FDW named
> "libpq_fdw" is somewhere and it retrieves external data *from* libpq.
> Indeed postgresql_fdw_validator allows only some of libpq options at the
> moment, but we won't be able to rename it for backward compatibility
> even if it wants to have non-libpq options in the future.
>
> IMO basically each FDW validator should be owned by a particular FDW,
> because in most cases validator should know FDW's internal deeply. In
> addition, it would want to have new options for new features.
>
> Besides naming, as mentioned upthread, removing hard-coded libpq options
> list from dblink and leaving it to libpq client library would make
> dblink more robust about libpq option changes in future.
>
OK, that seems fair enough to me. Does anyone have a different opinion?

Thanks,
--
KaiGai Kohei <kaigai@kaigai.gr.jp>
Hi Hanada-san,

> It would be possible to add some more features, such as ORDER BY
> push-down with index information support, without changing existing
> APIs, but at first add relatively simple pgsql_fdw and enhance it seems
> better. In addition, once pgsql_fdw has been merged, it would help
> implementing proof-of-concept of SQL/MED-related features.

I agree on this point.  However, I think it is preferable that pgsql_fdw
should support, from the start, the push-down of PARAM_EXEC params, and
thus the parameter-change-driven remote-rescanning functionality for
that.  I think such functionality is necessary for pgsql_fdw to
efficiently process SubLinks on remote tables, and to realize
parameterized scans in the future, which I think will be proof-of-concept
code to demonstrate to developers how to enhance FDWs, as discussed in
the earlier thread on the IMAP FDW...

Thanks,

Best regards,
Etsuro Fujita
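For illustration, the kind of SubLink discussed above could look like the
query below; the table names are hypothetical. The outer column reference
inside the subquery becomes a PARAM_EXEC param in the subplan, which is
why parameter-change-driven remote rescanning matters:

```sql
-- Hypothetical schema: "orders" is a local table, "customers_remote"
-- is a foreign table served by pgsql_fdw.
SELECT o.*
FROM orders o
WHERE EXISTS (
    SELECT 1
    FROM customers_remote c
    WHERE c.id = o.customer_id      -- outer reference: a PARAM_EXEC
      AND c.status = 'active'       -- param from the subplan's view
);
```

With PARAM_EXEC push-down, each rescan of the inner foreign scan could
send a parameterized query such as
`SELECT 1 FROM customers WHERE id = $1 AND status = 'active'` to the
remote server, instead of fetching the whole remote table per outer row.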
On tor, 2012-07-12 at 19:44 +0900, Shigeru HANADA wrote:
> Yes, I've proposed to rename existing postgresql_fdw_validator to
> dblink_fdw_validator and move it into contrib/dblink so that pgsql_fdw
> can use the name "postgresql_fdw" and "postgresql_fdw_validator".

I was somehow under the impression that plproxy also used this, but I
see now that it doesn't.  So I agree with renaming the existing
postgresql_fdw_validator and moving it into the dblink module.
2012/7/12 Etsuro Fujita <fujita.etsuro@lab.ntt.co.jp>:
>> 2012/6/26 Kohei KaiGai <kaigai@kaigai.gr.jp>:
>> > Harada-san,
>> >
>> > I checked your patch, and had an impression that includes many
>> > improvements from the previous revision that I looked at the last
>> > commit fest.
>> >
>> > However, I noticed several points to be revised, or investigated.
>> >
>> > * It seems to me expected results of the regression test is not
>> >   attached, even though test cases were included. Please add it.
>
> KaiGai-san, Did you find the file? That is in the contrib/pgsql_fdw/expected
> directory, named pgsql_fdw.out. If necessary, I will send the file to you.
> BTW, I found some bugs on the expected results of the file. Attached is a
> patch fixing the bugs.
>
Fujita-san,

Yes, I overlooked the expected regression test result. Probably, I forgot
to add it to my git repository, as Hanada-san pointed out.

BTW, your patch does not make sense in my environment that is just
after initdb without any parameter customizing. Could you give us
the step to reproduce the Nested-Loop plan from Hash-Join?

Thanks,
--
KaiGai Kohei <kaigai@kaigai.gr.jp>
Hanada-san,

What about the status of your patch?

Even though the 1st commit-fest is getting closed soon,
I'd like to pay efforts for reviewing to pull up the status of
pgsql_fdw into "ready for committer" by beginning of the
upcoming commit-fest.

Thanks,

2012/7/13 Shigeru HANADA <shigeru.hanada@gmail.com>:
> (2012/07/12 20:48), Kohei KaiGai wrote:
>> It seems to me what postgresql_fdw_validator() is doing looks like
>> a function to be named as "libpq_fdw_validator()".
>>
>> How about your opinion? It will help this namespace conflicts.
>
> I'd prefer dblink_fdw_validator.
>
> The name "libpq_fdw_validator" impresses me that a concrete FDW named
> "libpq_fdw" is somewhere and it retrieves external data *from* libpq.
> Indeed postgresql_fdw_validator allows only some of libpq options at the
> moment, but we won't be able to rename it for backward compatibility
> even if it wants to have non-libpq options in the future.
>
> IMO basically each FDW validator should be owned by a particular FDW,
> because in most cases validator should know FDW's internal deeply. In
> addition, it would want to have new options for new features.
>
> Besides naming, as mentioned upthread, removing hard-coded libpq options
> list from dblink and leaving it to libpq client library would make
> dblink more robust about libpq option changes in future.
>
> Regards,
> --
> Shigeru HANADA
>
--
KaiGai Kohei <kaigai@kaigai.gr.jp>
On Thu, Jul 19, 2012 at 6:06 PM, Kohei KaiGai <kaigai@kaigai.gr.jp> wrote:
> Hanada-san,
>
> What about the status of your patch?

Sorry for my absence.  I have been struggling with the connection
recovery issue, but I haven't fixed it yet.  So I'd like to withdraw this
patch from this CF and mark it as "returned with feedback", because it is
far from committable at the moment.  Of course, fixing the naming issue
is also on my TODO list.

> Even though the 1st commit-fest is getting closed soon,
> I'd like to pay efforts for reviewing to pull up the status of
> pgsql_fdw into "ready for committer" by beginning of the
> upcoming commit-fest.

Thanks again, I'll try harder this summer. ;-)

Regards,
--
Shigeru Hanada
Hi KaiGai-san,

Sorry about the delay in answering.  I have been swamped with other
things lately.

> BTW, your patch does not make sense in my environment that is just
> after initdb without any parameter customizing. Could you give us
> the step to reproduce the Nested-Loop plan from Hash-Join?

I examined both the Nested-Loop and Hash-Join plans by using the
enable_xxxxx options, and found that their total costs are almost equal.
The result might depend on the execution environment (mine is Linux
2.6.18 on x86_64).  I think it is preferable to use the enable_xxxxx
options for this regression test, as src/test/regress/sql/join.sql does.

Thanks,

Best regards,
Etsuro Fujita
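The join.sql technique referred to above pins the plan choice by
disabling the competing join strategies around the test query, so the
expected output no longer depends on near-equal cost estimates. A minimal
sketch, with hypothetical table names:

```sql
-- Force a nested-loop plan so the regression test's EXPLAIN output
-- is stable across environments with slightly different cost estimates.
SET enable_hashjoin TO off;
SET enable_mergejoin TO off;

EXPLAIN (COSTS off)
SELECT * FROM local_tbl l JOIN remote_tbl r ON l.id = r.id;

RESET enable_hashjoin;
RESET enable_mergejoin;
```

Using `COSTS off` additionally keeps environment-dependent cost numbers
out of the expected output file.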