Thread: pgsql_fdw in contrib
I'd like to propose pgsql_fdw, FDW for PostgreSQL, as a contrib module in core, again. This patch is basically a rebased version of the patches proposed in the 9.2 development cycle, and contains some additional changes such as concern about volatile variables for PG_CATCH blocks. In addition, I combined the old three patches (pgsql_fdw core functionality, push-down support, and analyze support) into one patch for ease of review. (I think these features are necessary for pgsql_fdw use.)

After the development for the 9.2 cycle, pgsql_fdw can gather statistics from the remote side by providing a sampling function to the analyze module in the backend core, use them to estimate selectivity of WHERE clauses which will be pushed down, and retrieve query results from the remote side with a custom row processor which prevents memory exhaustion from huge result sets.

It would be possible to add some more features, such as ORDER BY push-down with index information support, without changing existing APIs, but adding a relatively simple pgsql_fdw first and enhancing it later seems better. In addition, once pgsql_fdw has been merged, it would help implementing proofs-of-concept of SQL/MED-related features.

Regards, -- Shigeru HANADA
Attachment
On Thu, Jun 14, 2012 at 7:29 AM, Shigeru HANADA <shigeru.hanada@gmail.com> wrote: > I'd like to propose pgsql_fdw, FDW for PostgreSQL, as a contrib module > in core, again. This patch is basically rebased version of the patches > proposed in 9.2 development cycle, and contains some additional changes > such as concern about volatile variables for PG_CATCH blocks. In > addition, I combined old three patches (pgsql_fdw core functionality, > push-down support, and analyze support) into one patch for ease of > review. (I think these features are necessary for pgsql_fdw use.) > > After the development for 9.2 cycle, pgsql_fdw can gather statistics > from remote side by providing sampling function to analyze module in > backend core, use them to estimate selectivity of WHERE clauses which > will be pushed-down, and retrieve query results from remote side with > custom row processor which prevents memory exhaustion from huge result set. > > It would be possible to add some more features, such as ORDER BY > push-down with index information support, without changing existing > APIs, but at first add relatively simple pgsql_fdw and enhance it seems > better. In addition, once pgsql_fdw has been merged, it would help > implementing proof-of-concept of SQL/MED-related features. I can't help but wonder (having been down the contrib/core/extension road myself) if it isn't better to improve the facilities to register and search for qualified extensions (like Perl CPAN) so that people looking for code to improve their backends can find it. That way, you're free to release/do xyz/abandon your project as you see fit without having to go through -hackers. This should also remove a lot of the stigma with not being in core since if stock postgres installations can access the necessary modules via CREATE EXTENSION, I think it will make it easier for projects like this to get used with the additional benefit of decentralizing project management. merlin
On Mon, Jun 18, 2012 at 12:24 PM, Merlin Moncure <mmoncure@gmail.com> wrote: > I can't help but wonder (having been down the contrib/core/extension > road myself) if it isn't better to improve the facilities to register > and search for qualified extensions (like Perl CPAN) so that people > looking for code to improve their backends can find it. That way, > you're free to release/do xyz/abandon your project as you see fit > without having to go through -hackers. This should also remove a lot > of the stigma with not being in core since if stock postgres > installations can access the necessary modules via CREATE EXTENSION, I > think it will make it easier for projects like this to get used with > the additional benefit of decentralizing project management. Well, if you're the type that likes to install everything via RPMs (and I am) then you wouldn't want this behavior, especially not automagically. It seems to open up a host of security risks, too: I believe Tom has previously stated that Red Hat (and other distro) packaging guidelines forbid packages from installing software in places where they can then turn around and run it. I suppose CPAN must have some sort of exception to this policy, though, so maybe it's not ironclad. Still, it seems like a bit of a distraction in this case: I think we want to have at least one FDW in core that actually talks to some other database server, rather than just to a file, and it seems like pgsql_fdw is the obvious choice by a mile. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
2012/6/18 Merlin Moncure <mmoncure@gmail.com>: > I can't help but wonder (having been down the contrib/core/extension > road myself) if it isn't better to improve the facilities to register > and search for qualified extensions (like Perl CPAN) so that people > looking for code to improve their backends can find it. That way, > you're free to release/do xyz/abandon your project as you see fit > without having to go through -hackers. This should also remove a lot > of the stigma with not being in core since if stock postgres > installations can access the necessary modules via CREATE EXTENSION, I > think it will make it easier for projects like this to get used with > the additional benefit of decentralizing project management. What about PGXN? BTW, I'm with Robert about "we want to have at least one FDW in core that actually talks to some other database server, rather than just to a file, and it seems like pgsql_fdw is the obvious choice". We have dblink as contrib, why not pgsql_fdw too? Other FDWs could be available at PGXN, pgFoundry, Github, etc. []s -- Dickson S. Guedes mail/xmpp: guedes@guedesoft.net - skype: guediz http://guedesoft.net - http://www.postgresql.org.br
On Mon, Jun 18, 2012 at 11:01 PM, Robert Haas <robertmhaas@gmail.com> wrote: > On Mon, Jun 18, 2012 at 12:24 PM, Merlin Moncure <mmoncure@gmail.com> wrote: >> I can't help but wonder (having been down the contrib/core/extension >> road myself) if it isn't better to improve the facilities to register >> and search for qualified extensions (like Perl CPAN) so that people >> looking for code to improve their backends can find it. That way, >> you're free to release/do xyz/abandon your project as you see fit >> without having to go through -hackers. This should also remove a lot >> of the stigma with not being in core since if stock postgres >> installations can access the necessary modules via CREATE EXTENSION, I >> think it will make it easier for projects like this to get used with >> the additional benefit of decentralizing project management. > > Well, if you're the type that likes to install everything via RPMs > (and I am) then you wouldn't want this behavior, especially not > automagically. It seems to open up a host of security risks, too: I > believe Tom has previously stated that Red Hat (and other distro) > packaging guidelines forbid packages from installing software in > places where they can then turn around and run it. I suppose CPAN > must have some sort of exception to this policy, though, so maybe it's > not ironclad. Huh? CPAN is just one example -- PyPM for python, npm for node etc etc. There is some distinction to that rule that is being missed so that it doesn't apply to cases like this -- probably that it is transacted by the user and/or requires su. I think your points are supporting mine: the vast majority of postgres administrators deal only with o/s packaged rpms and therefore with the status quo are unable to utilize any extension that is not packaged with contrib. 
This means that there are two possible outcomes if you want to write an extension: 1) get accepted into core 2) never get used Given that running the gauntlet to #1 is not a particularly attractive option (or even possible) in many cases for various reasons it tends to discourage module development by 3rd parties. There are several very high quality extensions that could get a lot more exposure...for example pl/sh. But anyways, if you're happy with pgsql_fdw (aside: i looked at it, and it's great!) in core, then by all means... merlin
2012/6/19 Merlin Moncure <mmoncure@gmail.com>: > On Mon, Jun 18, 2012 at 11:01 PM, Robert Haas <robertmhaas@gmail.com> wrote: >> On Mon, Jun 18, 2012 at 12:24 PM, Merlin Moncure <mmoncure@gmail.com> wrote: >>> I can't help but wonder (having been down the contrib/core/extension >>> road myself) if it isn't better to improve the facilities to register >>> and search for qualified extensions (like Perl CPAN) so that people >>> looking for code to improve their backends can find it. That way, >>> you're free to release/do xyz/abandon your project as you see fit >>> without having to go through -hackers. This should also remove a lot >>> of the stigma with not being in core since if stock postgres >>> installations can access the necessary modules via CREATE EXTENSION, I >>> think it will make it easier for projects like this to get used with >>> the additional benefit of decentralizing project management. >> >> Well, if you're the type that likes to install everything via RPMs >> (and I am) then you wouldn't want this behavior, especially not >> automagically. It seems to open up a host of security risks, too: I >> believe Tom has previously stated that Red Hat (and other distro) >> packaging guidelines forbid packages from installing software in >> places where they can then turn around and run it. I suppose CPAN >> must have some sort of exception to this policy, though, so maybe it's >> not ironclad. > > Huh? CPAN is just one example -- PyPM for python, npm for node etc > etc. There is some distinction to that rule that is being missed so > that it doesn't apply to cases like this -- probably that it is > transacted by the user and/or requires su. > > I think your points are supporting mine: the vast majority of postgres > administrators deal only with o/s packaged rpms and therefore with the > status quo are unable to utilize any extension that is not packaged > with contrib. 
This means that there are two possible outcomes if you > want to write an extension: > > 1) get accepted into core > 2) never get used > > Given that running the gauntlet to #1 is not a particularly attractive > option (or even possible) in many cases for various reasons it tends > to discourage module development by 3rd parties. There are several > very high quality extensions that could get a lot more exposure...for > example pl/sh. > > But anyways, if you're happy with pgsql_fdw (aside: i looked at it, > and it's great!) in core, then by all means... > Let me push the pgsql_fdw in core from a different perspective. Right now, FDW is a feature that will take many enhancements in the near future like join-pushdown, writable APIs and so on. If we would not have a FDW extension in core that communicates with an actual RDBMS, instead of flat files, it makes our development efforts more complex. Please assume a scenario where people always try to submit two patches; one for the core, and the other for a particular out-of-tree extension. I believe it is a reasonable choice to have a FDW for PostgreSQL in core, rather than MySQL or Oracle. We can use this extension to show how a new feature works, whenever we try to extend FDW APIs. BTW, Hanada-san- It seems to me the expected result of regression test was not included in this patch. Please submit it again- Thanks, -- KaiGai Kohei <kaigai@kaigai.gr.jp>
On Tue, Jun 19, 2012 at 10:15 AM, Kohei KaiGai <kaigai@kaigai.gr.jp> wrote: > Let me push the pgsql_fdw in core from different perspective. > > Right now, FDW is a feature that will take many enhancement in > the near future like join-pushdown, writable APIs and so on. > If we would not have a FDW extension in core that communicate > with actual RDBMS, instead of flat files, it makes our development > efforts more complex. Please assume a scenario people always > try to submit two patches; one for the core, and the other for > a particular out-of-tree extension. yeah. plus, it's nice to have a very high quality fdw implementation in core for others to crib from. I have no objection -- I just took a convenient opportunity to hijack the thread for some hand-wavy ideas about an extension repository. :-). merlin
Harada-san, I checked your patch, and had an impression that includes many improvements from the previous revision that I looked at the last commit fest. However, I noticed several points to be revised, or investigated.

* It seems to me expected results of the regression test is not attached, even though test cases were included. Please add it.

* cleanup_connection() does not close the connection in case when this callback was invoked due to an error under sub-transaction. It probably makes problematic behavior. See the following steps to reproduce the matter:

postgres=# BEGIN;
BEGIN
postgres=# SELECT * FROM ft3;
 a |  b
---+-----
 1 | aaa
 2 | bbb
 3 | ccc
 4 | ddd
 5 | eee
 6 | fff
(6 rows)

postgres=# SAVEPOINT sv_01;
SAVEPOINT
postgres=# SELECT * FROM ft3 WHERE 1 / (a - 4) > 0; -- should be error on remote transaction
ERROR: could not execute remote query
DETAIL: ERROR: division by zero
HINT: SELECT a, b FROM public.t1 WHERE (((1 OPERATOR(pg_catalog./) (a OPERATOR(pg_catalog.-) 4)) OPERATOR(pg_catalog.>) 0))
postgres=# ROLLBACK TO SAVEPOINT sv_01;
ROLLBACK
postgres=# SELECT * FROM ft3;
ERROR: could not execute EXPLAIN for cost estimation
DETAIL: ERROR: current transaction is aborted, commands ignored until end of transaction block
HINT: SELECT a, b FROM public.t1

Once we got an error under the remote transaction, it eventually raises an error on local side, then cleanup_connection() should be invoked. But it ignores the error due to sub-transaction, thus, the remote transaction being already aborted is kept. I'd like to suggest two remedies:
1. connections are closed, even if errors on sub-transaction.
2. close the connection if PQexecParams() returns an error, on execute_query() prior to raise a local error.

* Regarding to deparseSimpleSql(), it pulls attributes being referenced from baserestrictinfo and reltargetlist using pull_var_clause(). Is it unavailable to use local_conds instead of baserestrictinfo? 
We can optimize references to the variable being consumed at the remote side only. All we need to transfer is variables referenced in targetlist and local-clauses. In addition, is pull_var_clause() reasonable to list up all the attribute referenced at the both expression tree? It seems to be pull_varattnos() is more useful API in this situation.

* Regarding to deparseRelation(), it scans simple_rte_array to fetch RangeTblEntry with relation-id of the target foreign table. It does not match correct entry if same foreign table is appeared in this list twice or more, like a case of self-join. I'd like to recommend to use simple_rte_array[baserel->relid] instead. In addition, it checks whether relkind is RELKIND_FOREIGN_TABLE, or not. It seems to me this check should be Assert(), if routines of pgsql_fdw is called towards regular relations.

* Regarding to deparseVar(), is it unnecessary to check relkind of the relation being referenced by Var node, isn't it?

* Regarding to deparseBoolExpr(), compiler raised a warning because op can be used without initialized.

* Even though it is harmless, sortConditions() is a misleading function name. How about categolizeConditions() or screeningConditions()?

Thanks for your great jobs. 2012/6/14 Shigeru HANADA <shigeru.hanada@gmail.com>: > I'd like to propose pgsql_fdw, FDW for PostgreSQL, as a contrib module > in core, again. This patch is basically rebased version of the patches > proposed in 9.2 development cycle, and contains some additional changes > such as concern about volatile variables for PG_CATCH blocks. In > addition, I combined old three patches (pgsql_fdw core functionality, > push-down support, and analyze support) into one patch for ease of > review. (I think these features are necessary for pgsql_fdw use.) 
> > After the development for 9.2 cycle, pgsql_fdw can gather statistics > from remote side by providing sampling function to analyze module in > backend core, use them to estimate selectivity of WHERE clauses which > will be pushed-down, and retrieve query results from remote side with > custom row processor which prevents memory exhaustion from huge result set. > > It would be possible to add some more features, such as ORDER BY > push-down with index information support, without changing existing > APIs, but at first add relatively simple pgsql_fdw and enhance it seems > better. In addition, once pgsql_fdw has been merged, it would help > implementing proof-of-concept of SQL/MED-related features. > > Regards, > -- > Shigeru HANADA > > > -- > Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) > To make changes to your subscription: > http://www.postgresql.org/mailpref/pgsql-hackers > -- KaiGai Kohei <kaigai@kaigai.gr.jp>
Hanada-san, Regarding to the issue around sub-transaction abort, an ideal solution might be to execute a SAVEPOINT command on the remote side synchronously. It would allow rolling back the active transaction to the savepoint on the remote server when the local one gets rolled back. However, I'm not inclined to merge anything ideal within a single patch, and I'd like to recommend keeping your idea simple at first, then revising the functionality over the upcoming three commit fests. How about your opinion? The author is you. 2012/6/26 Kohei KaiGai <kaigai@kaigai.gr.jp>: > Harada-san, > > I checked your patch, and had an impression that includes many > improvements from the previous revision that I looked at the last > commit fest. > > However, I noticed several points to be revised, or investigated. > > * It seems to me expected results of the regression test is not > attached, even though test cases were included. Please add it. > > * cleanup_connection() does not close the connection in case > when this callback was invoked due to an error under sub- > transaction. It probably makes problematic behavior. 
> > See the following steps to reproduce the matter: > > postgres=# BEGIN; > BEGIN > postgres=# SELECT * FROM ft3; > a | b > ---+----- > 1 | aaa > 2 | bbb > 3 | ccc > 4 | ddd > 5 | eee > 6 | fff > (6 rows) > > postgres=# SAVEPOINT sv_01; > SAVEPOINT > > postgres=# SELECT * FROM ft3 WHERE 1 / (a - 4) > 0; -- should be > error on remote transaction > ERROR: could not execute remote query > DETAIL: ERROR: division by zero > > HINT: SELECT a, b FROM public.t1 WHERE (((1 OPERATOR(pg_catalog./) (a > OPERATOR(pg_catalog.-) 4)) OPERATOR(pg_catalog.>) 0)) > > postgres=# ROLLBACK TO SAVEPOINT sv_01; > ROLLBACK > > postgres=# SELECT * FROM ft3; > ERROR: could not execute EXPLAIN for cost estimation > DETAIL: ERROR: current transaction is aborted, commands ignored > until end of transaction block > > HINT: SELECT a, b FROM public.t1 > > Once we got an error under the remote transaction, it eventually raises > an error on local side, then cleanup_connection() should be invoked. > But it ignores the error due to sub-transaction, thus, the remote transaction > being already aborted is kept. > I'd like to suggest two remedy. > 1. connections are closed, even if errors on sub-transaction. > 2. close the connection if PQexecParams() returns an error, > on execute_query() prior to raise a local error. > > * Regarding to deparseSimpleSql(), it pulls attributes being referenced > from baserestrictinfo and reltargetlist using pull_var_clause(). > Is it unavailable to use local_conds instead of baserestrictinfo? > We can optimize references to the variable being consumed at the > remote side only. All we need to transfer is variables referenced > in targetlist and local-clauses. > In addition, is pull_var_clause() reasonable to list up all the attribute > referenced at the both expression tree? It seems to be pull_varattnos() > is more useful API in this situation. 
> > * Regarding to deparseRelation(), it scans simple_rte_array to fetch > RangeTblEntry with relation-id of the target foreign table. > It does not match correct entry if same foreign table is appeared in > this list twice or more, like a case of self-join. I'd like to recommend > to use simple_rte_array[baserel->relid] instead. > In addition, it checks whether relkind is RELKIND_FOREIGN_TABLE, > or not. It seems to me this check should be Assert(), if routines of > pgsql_fdw is called towards regular relations. > > * Regarding to deparseVar(), is it unnecessary to check relkind of > the relation being referenced by Var node, isn't it? > > * Regarding to deparseBoolExpr(), compiler raised a warning > because op can be used without initialized. > > * Even though it is harmless, sortConditions() is a misleading function > name. How about categolizeConditions() or screeningConditions()? > > Thanks for your great jobs. > > 2012/6/14 Shigeru HANADA <shigeru.hanada@gmail.com>: >> I'd like to propose pgsql_fdw, FDW for PostgreSQL, as a contrib module >> in core, again. This patch is basically rebased version of the patches >> proposed in 9.2 development cycle, and contains some additional changes >> such as concern about volatile variables for PG_CATCH blocks. In >> addition, I combined old three patches (pgsql_fdw core functionality, >> push-down support, and analyze support) into one patch for ease of >> review. (I think these features are necessary for pgsql_fdw use.) >> >> After the development for 9.2 cycle, pgsql_fdw can gather statistics >> from remote side by providing sampling function to analyze module in >> backend core, use them to estimate selectivity of WHERE clauses which >> will be pushed-down, and retrieve query results from remote side with >> custom row processor which prevents memory exhaustion from huge result set. 
>> >> It would be possible to add some more features, such as ORDER BY >> push-down with index information support, without changing existing >> APIs, but at first add relatively simple pgsql_fdw and enhance it seems >> better. In addition, once pgsql_fdw has been merged, it would help >> implementing proof-of-concept of SQL/MED-related features. >> >> Regards, >> -- >> Shigeru HANADA >> >> >> -- >> Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) >> To make changes to your subscription: >> http://www.postgresql.org/mailpref/pgsql-hackers >> > > > > -- > KaiGai Kohei <kaigai@kaigai.gr.jp> -- KaiGai Kohei <kaigai@kaigai.gr.jp>
Hi Kaigai-san, Sorry for delayed reply. On Tue, Jun 26, 2012 at 10:50 PM, Kohei KaiGai <kaigai@kaigai.gr.jp> wrote: > * It seems to me expected results of the regression test is not > attached, even though test cases were included. Please add it. AFAICS the patch I posted contains both test script and expected result. If you have created new git branch and applied patch, you might have forgotten to git-add expected/pgsql_fdw.out. Would you check that? > * cleanup_connection() does not close the connection in case > when this callback was invoked due to an error under sub- > transaction. It probably makes problematic behavior. > > See the following steps to reproduce the matter: Oops, good catch. I could reproduce this problem. > Once we got an error under the remote transaction, it eventually raises > an error on local side, then cleanup_connection() should be invoked. > But it ignores the error due to sub-transaction, thus, the remote transaction > being already aborted is kept. > I'd like to suggest two remedy. > 1. connections are closed, even if errors on sub-transaction. > 2. close the connection if PQexecParams() returns an error, > on execute_query() prior to raise a local error. Background: the reason why pgsql_fdw doesn't disconnect at errors in sub-transaction is to allow FETCHing from cursor which uses foreign table and is opened before the local error. How about to close connection when released connection is in failed transaction? It can be determined with PQtransactionStatus() in ReleaseConnection(). > * Regarding to deparseSimpleSql(), it pulls attributes being referenced > from baserestrictinfo and reltargetlist using pull_var_clause(). > Is it unavailable to use local_conds instead of baserestrictinfo? > We can optimize references to the variable being consumed at the > remote side only. All we need to transfer is variables referenced > in targetlist and local-clauses. It sounds possible. 
> In addition, is pull_var_clause() reasonable to list up all the attribute > referenced at the both expression tree? It seems to be pull_varattnos() > is more useful API in this situation. Oh, I didn't know that. It seems more efficient, I'll check it out. > * Regarding to deparseRelation(), it scans simple_rte_array to fetch > RangeTblEntry with relation-id of the target foreign table. > It does not match correct entry if same foreign table is appeared in > this list twice or more, like a case of self-join. I'd like to recommend > to use simple_rte_array[baserel->relid] instead. > In addition, it checks whether relkind is RELKIND_FOREIGN_TABLE, > or not. It seems to me this check should be Assert(), if routines of > pgsql_fdw is called towards regular relations. > > * Regarding to deparseVar(), is it unnecessary to check relkind of > the relation being referenced by Var node, isn't it? > > * Regarding to deparseBoolExpr(), compiler raised a warning > because op can be used without initialized. > > * Even though it is harmless, sortConditions() is a misleading function > name. How about categolizeConditions() or screeningConditions()? I'll reply these comments in another post. Regards, -- Shigeru Hanada
On Tue, Jun 26, 2012 at 10:50 PM, Kohei KaiGai <kaigai@kaigai.gr.jp> wrote: > * Regarding to deparseSimpleSql(), it pulls attributes being referenced > from baserestrictinfo and reltargetlist using pull_var_clause(). > Is it unavailable to use local_conds instead of baserestrictinfo? > We can optimize references to the variable being consumed at the > remote side only. All we need to transfer is variables referenced > in targetlist and local-clauses. Indeed, fixed. > In addition, is pull_var_clause() reasonable to list up all the attribute > referenced at the both expression tree? It seems to be pull_varattnos() > is more useful API in this situation. Only for searching, yes. However, sooner or later we need Var objects to deparse remote SELECT clause, so pull_var_clause seems better here. ISTM one of advantage to use bitmap is matching with another bitmap, like index only scan code checks whether all used attributes is contained by indexed attributes. > * Regarding to deparseRelation(), it scans simple_rte_array to fetch > RangeTblEntry with relation-id of the target foreign table. > It does not match correct entry if same foreign table is appeared in > this list twice or more, like a case of self-join. I'd like to recommend > to use simple_rte_array[baserel->relid] instead. Reasonable idea. After some thoughts, I think that deparseRelation should receive RangeTblEntry instead of relation oid (and then PlannerInfo is not necessary). > In addition, it checks whether relkind is RELKIND_FOREIGN_TABLE, > or not. It seems to me this check should be Assert(), if routines of > pgsql_fdw is called towards regular relations. > > * Regarding to deparseVar(), is it unnecessary to check relkind of > the relation being referenced by Var node, isn't it? IIRC, this is remain of past trial for general deparser... Removed unnecessary relkind checks. Fixed. > * Regarding to deparseBoolExpr(), compiler raised a warning > because op can be used without initialized. Fixed. 
> * Even though it is harmless, sortConditions() is a misleading function > name. How about categolizeConditions() or screeningConditions()? How about classifyConditions? # I hope native English speaker's help for wording issue... :-) Regards, -- Shigeru Hanada
2012/7/5 Shigeru HANADA <shigeru.hanada@gmail.com>: >> In addition, is pull_var_clause() reasonable to list up all the attribute >> referenced at the both expression tree? It seems to be pull_varattnos() >> is more useful API in this situation. > > Only for searching, yes. However, sooner or later we need Var objects > to deparse remote SELECT clause, so pull_var_clause seems better here. > ISTM one of advantage to use bitmap is matching with another bitmap, > like index only scan code checks whether all used attributes is > contained by indexed attributes. > It seems to me a reasonable decision. >> * Regarding to deparseRelation(), it scans simple_rte_array to fetch >> RangeTblEntry with relation-id of the target foreign table. >> It does not match correct entry if same foreign table is appeared in >> this list twice or more, like a case of self-join. I'd like to recommend >> to use simple_rte_array[baserel->relid] instead. > > Reasonable idea. After some thoughts, I think that deparseRelation > should receive RangeTblEntry instead of relation oid (and then > PlannerInfo is not necessary). > I also like this design. >> * Even though it is harmless, sortConditions() is a misleading function >> name. How about categolizeConditions() or screeningConditions()? > > How about classifyConditions? > # I hope native English speaker's help for wording issue... :-) > Even though I'm not a native English speaker, it seems to me "classify" is less confusing term than "sort" in this context. If it would be a matter, you can get pointed out on committer's review. So, please go ahead. Thanks, -- KaiGai Kohei <kaigai@kaigai.gr.jp>
On tor, 2012-06-14 at 21:29 +0900, Shigeru HANADA wrote: > I'd like to propose pgsql_fdw, FDW for PostgreSQL, as a contrib module > in core, again. Do you have any new proposals regarding naming, and how to deal with postgresql_fdw_validator, and dblink?
2012/7/11 Peter Eisentraut <peter_e@gmx.net>: > On tor, 2012-06-14 at 21:29 +0900, Shigeru HANADA wrote: >> I'd like to propose pgsql_fdw, FDW for PostgreSQL, as a contrib module >> in core, again. > > Do you have any new proposals regarding naming, and how to deal with > postgresql_fdw_validator, and dblink? > This patch uses pgsql_fdw_validator for its own validator naming. Which point was the main issue in the last commit-fest? -- KaiGai Kohei <kaigai@kaigai.gr.jp>
On tor, 2012-07-12 at 06:25 +0200, Kohei KaiGai wrote: > 2012/7/11 Peter Eisentraut <peter_e@gmx.net>: > > On tor, 2012-06-14 at 21:29 +0900, Shigeru HANADA wrote: > >> I'd like to propose pgsql_fdw, FDW for PostgreSQL, as a contrib module > >> in core, again. > > > > Do you have any new proposals regarding naming, and how to deal with > > postgresql_fdw_validator, and dblink? > > > This patch uses pgsql_fdw_validator for its own validator naming. > > Which point was the main issue in the last commit-fest? That this module should be called postgresql_fdw.
> 2012/6/26 Kohei KaiGai <kaigai@kaigai.gr.jp>: > > Harada-san, > > > > I checked your patch, and had an impression that includes many > > improvements from the previous revision that I looked at the last > > commit fest. > > > > However, I noticed several points to be revised, or investigated. > > > > * It seems to me expected results of the regression test is not > > attached, even though test cases were included. Please add it. KaiGai-san, Did you find the file? That is in the contrib/pgsql_fdw/expected directory, named pgsql_fdw.out. If necessary, I will send the file to you. BTW, I found some bugs on the expected results of the file. Attached is a patch fixing the bugs. Thanks, Best regards, Etsuro Fujita
(2012/07/12 6:04), Peter Eisentraut wrote: > On tor, 2012-06-14 at 21:29 +0900, Shigeru HANADA wrote: >> I'd like to propose pgsql_fdw, FDW for PostgreSQL, as a contrib module >> in core, again. > > Do you have any new proposals regarding naming, and how to deal with > postgresql_fdw_validator, and dblink? Yes, I've proposed to rename existing postgresql_fdw_validator to dblink_fdw_validator and move it into contrib/dblink so that pgsql_fdw can use the name "postgresql_fdw" and "postgresql_fdw_validator". Though this post has had no response... http://archives.postgresql.org/message-id/4F854F43.4030200@gmail.com Regards, -- Shigeru HANADA
2012/7/12 Shigeru HANADA <shigeru.hanada@gmail.com>:
> (2012/07/12 6:04), Peter Eisentraut wrote:
>> On tor, 2012-06-14 at 21:29 +0900, Shigeru HANADA wrote:
>>> I'd like to propose pgsql_fdw, FDW for PostgreSQL, as a contrib module
>>> in core, again.
>>
>> Do you have any new proposals regarding naming, and how to deal with
>> postgresql_fdw_validator, and dblink?
>
> Yes, I've proposed to rename existing postgresql_fdw_validator to
> dblink_fdw_validator and move it into contrib/dblink so that pgsql_fdw
> can use the name "postgresql_fdw" and "postgresql_fdw_validator".
> Though this post has had no response...
>
It seems to me what postgresql_fdw_validator() is doing looks like
a function to be named as "libpq_fdw_validator()".

How about your opinion? It will help this namespace conflicts.

Thanks,
--
KaiGai Kohei <kaigai@kaigai.gr.jp>
(2012/07/12 20:48), Kohei KaiGai wrote:
> It seems to me what postgresql_fdw_validator() is doing looks like
> a function to be named as "libpq_fdw_validator()".
>
> How about your opinion? It will help this namespace conflicts.

I'd prefer dblink_fdw_validator.

The name "libpq_fdw_validator" impresses me that a concrete FDW named
"libpq_fdw" is somewhere and it retrieves external data *from* libpq.
Indeed postgresql_fdw_validator allows only some of libpq options at the
moment, but we won't be able to rename it for backward compatibility
even if it wants to have non-libpq options in the future.

IMO basically each FDW validator should be owned by a particular FDW,
because in most cases validator should know FDW's internal deeply. In
addition, it would want to have new options for new features.

Besides naming, as mentioned upthread, removing hard-coded libpq options
list from dblink and leaving it to libpq client library would make
dblink more robust about libpq option changes in future.

Regards,
--
Shigeru HANADA
2012/7/13 Shigeru HANADA <shigeru.hanada@gmail.com>:
> (2012/07/12 20:48), Kohei KaiGai wrote:
>> It seems to me what postgresql_fdw_validator() is doing looks like
>> a function to be named as "libpq_fdw_validator()".
>>
>> How about your opinion? It will help this namespace conflicts.
>
> I'd prefer dblink_fdw_validator.
>
> The name "libpq_fdw_validator" impresses me that a concrete FDW named
> "libpq_fdw" is somewhere and it retrieves external data *from* libpq.
> Indeed postgresql_fdw_validator allows only some of libpq options at the
> moment, but we won't be able to rename it for backward compatibility
> even if it wants to have non-libpq options in the future.
>
> IMO basically each FDW validator should be owned by a particular FDW,
> because in most cases validator should know FDW's internal deeply. In
> addition, it would want to have new options for new features.
>
> Besides naming, as mentioned upthread, removing hard-coded libpq options
> list from dblink and leaving it to libpq client library would make
> dblink more robust about libpq option changes in future.
>
OK, that seems fair enough to me. Does anyone have a different opinion?

Thanks,
--
KaiGai Kohei <kaigai@kaigai.gr.jp>
Hi Hanada-san,

> It would be possible to add some more features, such as ORDER BY
> push-down with index information support, without changing existing
> APIs, but at first add relatively simple pgsql_fdw and enhance it seems
> better. In addition, once pgsql_fdw has been merged, it would help
> implementing proof-of-concept of SQL/MED-related features.

I agree on this point.  However, I think it is preferable that pgsql_fdw
should support, from the start, the push-down of PARAM_EXEC params, and
thus the parameter-change-driven remote-rescanning functionality for
that.  I think such functionality is necessary for pgsql_fdw to
efficiently process SubLinks on remote tables, and to realize
parameterized scans in the future, which I think will be proof-of-concept
code to demonstrate to developers how to enhance FDWs, as discussed in
the earlier thread on the IMAP FDW...

Thanks,

Best regards,
Etsuro Fujita
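For illustration, the kind of SubLink discussed above could look like the
query below; the table names are hypothetical. The outer column reference
inside the subquery becomes a PARAM_EXEC param in the subplan, which is
why parameter-change-driven remote rescanning matters:

```sql
-- Hypothetical schema: "orders" is a local table, "customers_remote"
-- is a foreign table served by pgsql_fdw.
SELECT o.*
FROM orders o
WHERE EXISTS (
    SELECT 1
    FROM customers_remote c
    WHERE c.id = o.customer_id      -- outer reference: a PARAM_EXEC
      AND c.status = 'active'       -- param from the subplan's view
);
```

With PARAM_EXEC push-down, each rescan of the inner foreign scan could
send a parameterized query such as
`SELECT 1 FROM customers WHERE id = $1 AND status = 'active'` to the
remote server, instead of fetching the whole remote table per outer row.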
On tor, 2012-07-12 at 19:44 +0900, Shigeru HANADA wrote:
> Yes, I've proposed to rename existing postgresql_fdw_validator to
> dblink_fdw_validator and move it into contrib/dblink so that pgsql_fdw
> can use the name "postgresql_fdw" and "postgresql_fdw_validator".

I was somehow under the impression that plproxy also used this, but I
see now that it doesn't.  So I agree with renaming the existing
postgresql_fdw_validator and moving it into the dblink module.
2012/7/12 Etsuro Fujita <fujita.etsuro@lab.ntt.co.jp>:
>> 2012/6/26 Kohei KaiGai <kaigai@kaigai.gr.jp>:
>> > Harada-san,
>> >
>> > I checked your patch, and had an impression that includes many
>> > improvements from the previous revision that I looked at the last
>> > commit fest.
>> >
>> > However, I noticed several points to be revised, or investigated.
>> >
>> > * It seems to me expected results of the regression test is not
>> >   attached, even though test cases were included. Please add it.
>
> KaiGai-san, Did you find the file? That is in the contrib/pgsql_fdw/expected
> directory, named pgsql_fdw.out. If necessary, I will send the file to you.
> BTW, I found some bugs on the expected results of the file. Attached is a
> patch fixing the bugs.
>
Fujita-san,

Yes, I overlooked the expected regression test result. Probably, I forgot
to add it to my git repository, as Hanada-san pointed out.

BTW, your patch does not make sense in my environment that is just
after initdb without any parameter customizing. Could you give us
the step to reproduce the Nested-Loop plan from Hash-Join?

Thanks,
--
KaiGai Kohei <kaigai@kaigai.gr.jp>
Hanada-san,

What about the status of your patch?

Even though the 1st commit-fest is getting closed soon,
I'd like to pay efforts for reviewing to pull up the status of
pgsql_fdw into "ready for committer" by beginning of the
upcoming commit-fest.

Thanks,

2012/7/13 Shigeru HANADA <shigeru.hanada@gmail.com>:
> (2012/07/12 20:48), Kohei KaiGai wrote:
>> It seems to me what postgresql_fdw_validator() is doing looks like
>> a function to be named as "libpq_fdw_validator()".
>>
>> How about your opinion? It will help this namespace conflicts.
>
> I'd prefer dblink_fdw_validator.
>
> The name "libpq_fdw_validator" impresses me that a concrete FDW named
> "libpq_fdw" is somewhere and it retrieves external data *from* libpq.
> Indeed postgresql_fdw_validator allows only some of libpq options at the
> moment, but we won't be able to rename it for backward compatibility
> even if it wants to have non-libpq options in the future.
>
> IMO basically each FDW validator should be owned by a particular FDW,
> because in most cases validator should know FDW's internal deeply. In
> addition, it would want to have new options for new features.
>
> Besides naming, as mentioned upthread, removing hard-coded libpq options
> list from dblink and leaving it to libpq client library would make
> dblink more robust about libpq option changes in future.
>
> Regards,
> --
> Shigeru HANADA
>
--
KaiGai Kohei <kaigai@kaigai.gr.jp>
On Thu, Jul 19, 2012 at 6:06 PM, Kohei KaiGai <kaigai@kaigai.gr.jp> wrote:
> Hanada-san,
>
> What about the status of your patch?

Sorry for my absence.  I have been struggling with the connection
recovery issue, but I haven't fixed it yet.  So I'd like to withdraw this
patch from this CF and mark it as "returned with feedback", because it is
far from committable at the moment.  Of course, fixing the naming issue
is also on my TODO list.

> Even though the 1st commit-fest is getting closed soon,
> I'd like to pay efforts for reviewing to pull up the status of
> pgsql_fdw into "ready for committer" by beginning of the
> upcoming commit-fest.

Thanks again, I'll try harder this summer. ;-)

Regards,
--
Shigeru Hanada
Hi KaiGai-san,

Sorry about the delay in answering.  I have been swamped with other
things lately.

> BTW, your patch does not make sense in my environment that is just
> after initdb without any parameter customizing. Could you give us
> the step to reproduce the Nested-Loop plan from Hash-Join?

I examined both the Nested-Loop and Hash-Join plans by using the
enable_xxxxx options, and found that their total costs are almost equal.
The result might depend on the execution environment (mine is Linux
2.6.18 on x86_64).  I think it is preferable to use the enable_xxxxx
options for this regression test, as src/test/regress/sql/join.sql does.

Thanks,

Best regards,
Etsuro Fujita
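The join.sql technique referred to above pins the plan choice by
disabling the competing join strategies around the test query, so the
expected output no longer depends on near-equal cost estimates. A minimal
sketch, with hypothetical table names:

```sql
-- Force a nested-loop plan so the regression test's EXPLAIN output
-- is stable across environments with slightly different cost estimates.
SET enable_hashjoin TO off;
SET enable_mergejoin TO off;

EXPLAIN (COSTS off)
SELECT * FROM local_tbl l JOIN remote_tbl r ON l.id = r.id;

RESET enable_hashjoin;
RESET enable_mergejoin;
```

Using `COSTS off` additionally keeps environment-dependent cost numbers
out of the expected output file.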