Thread: Foreign join pushdown vs EvalPlanQual
Hi,

While reviewing the foreign join pushdown core patch, I noticed that the patch doesn't perform an EvalPlanQual recheck properly. An example that crashes the server is shown below (it uses the postgres_fdw patch [1]). I think the reason is that the ForeignScan node performing the foreign join remotely has scanrelid = 0, while ExecScanFetch assumes that its scan node has scanrelid > 0.

I think this is a bug. I've not figured out how to fix this yet, but I suspect we would also need another plan that evaluates the join locally for the test tuples for EvalPlanQual. Maybe I'm missing something, though.

Create an environment:

postgres=# create table tab (a int, b int);
CREATE TABLE
postgres=# create foreign table foo (a int) server myserver options (table_name 'foo');
CREATE FOREIGN TABLE
postgres=# create foreign table bar (a int) server myserver options (table_name 'bar');
CREATE FOREIGN TABLE
postgres=# insert into tab values (1, 1);
INSERT 0 1
postgres=# insert into foo values (1);
INSERT 0 1
postgres=# insert into bar values (1);
INSERT 0 1
postgres=# analyze tab;
ANALYZE
postgres=# analyze foo;
ANALYZE
postgres=# analyze bar;
ANALYZE

Run the example:

[Terminal 1]
postgres=# begin;
BEGIN
postgres=# update tab set b = b + 1 where a = 1;
UPDATE 1

[Terminal 2]
postgres=# explain verbose select tab.* from tab, foo, bar where tab.a = foo.a and foo.a = bar.a for update;
                                QUERY PLAN
--------------------------------------------------------------------------
 LockRows  (cost=100.00..101.18 rows=4 width=70)
   Output: tab.a, tab.b, tab.ctid, foo.*, bar.*
   ->  Nested Loop  (cost=100.00..101.14 rows=4 width=70)
         Output: tab.a, tab.b, tab.ctid, foo.*, bar.*
         Join Filter: (foo.a = tab.a)
         ->  Seq Scan on public.tab  (cost=0.00..1.01 rows=1 width=14)
               Output: tab.a, tab.b, tab.ctid
         ->  Foreign Scan  (cost=100.00..100.08 rows=4 width=64)
               Output: foo.*, foo.a, bar.*, bar.a
               Relations: (public.foo) INNER JOIN (public.bar)
               Remote SQL: SELECT l.a1, l.a2, r.a1, r.a2 FROM (SELECT ROW(l.a9), l.a9 FROM (SELECT a a9 FROM public.foo FOR UPDATE) l) l (a1, a2) INNER JOIN (SELECT ROW(r.a9), r.a9 FROM (SELECT a a9 FROM public.bar FOR UPDATE) r) r (a1, a2) ON ((l.a2 = r.a2))
(11 rows)

postgres=# select tab.* from tab, foo, bar where tab.a = foo.a and foo.a = bar.a for update;

[Terminal 1]
postgres=# commit;
COMMIT

[Terminal 2]
(After the commit in Terminal 1, Terminal 2 will show the following.)
server closed the connection unexpectedly
        This probably means the server terminated abnormally
        before or while processing the request.
The connection to the server was lost. Attempting reset: Failed.
!>

Best regards,
Etsuro Fujita

[1] http://www.postgresql.org/message-id/CAEZqfEe9KGy=1_waGh2rgZPg0o4pqgD+iauYaj8wTze+CYJUHg@mail.gmail.com
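The crash mechanism described above can be illustrated with a tiny toy model (this is not PostgreSQL source; the function and its layout are made up for illustration). During an EvalPlanQual recheck, the fetch path hands back the test tuple stored for the scan's single base relation, indexed by scanrelid - 1, so a pushed-down join with scanrelid == 0 has no valid slot to index:

```c
#include <assert.h>

/* Toy model of the EvalPlanQual branch in ExecScanFetch (illustrative only,
 * not PostgreSQL source).  During an EPQ recheck the executor looks up the
 * test tuple for the scan's base relation at index scanrelid - 1. */
static int
toy_epq_slot(int scanrelid)
{
    return scanrelid - 1;       /* valid only when scanrelid > 0 */
}

/* A ForeignScan on one base relation (scanrelid = 3) indexes slot 2;
 * a pushed-down join (scanrelid = 0) would index slot -1: out of bounds. */
static int
demo(void)
{
    int single_rel = toy_epq_slot(3);   /* fine: 2 */
    int pushed_join = toy_epq_slot(0);  /* broken: -1 */
    return (single_rel == 2) && (pushed_join < 0);
}
```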
Does it make sense to put the result tuple of the remote join on every estate->es_epqTupleSet[] slot represented by this ForeignScan if scanrelid == 0?

It would allow us to recheck the qualifier for each LockRows that intends to lock a base foreign table underlying the remote join. ForeignScan->fdw_relids tells us which rtindexes are represented by this ForeignScan, so the infrastructure side may be able to handle it.

Thanks,

2015-06-24 11:40 GMT+09:00 Etsuro Fujita <fujita.etsuro@lab.ntt.co.jp>:
> Hi,
>
> While reviewing the foreign join pushdown core patch, I noticed that the
> patch doesn't perform an EvalPlanQual recheck properly.  An example
> that crashes the server is shown below (it uses the postgres_fdw
> patch [1]).  I think the reason is that the ForeignScan node
> performing the foreign join remotely has scanrelid = 0, while
> ExecScanFetch assumes that its scan node has scanrelid > 0.
>
> I think this is a bug.  I've not figured out how to fix this yet, but I
> suspect we would also need another plan that evaluates the join locally
> for the test tuples for EvalPlanQual.  Maybe I'm missing something, though.
>
> [...]
>
> Best regards,
> Etsuro Fujita
>
> [1] http://www.postgresql.org/message-id/CAEZqfEe9KGy=1_waGh2rgZPg0o4pqgD+iauYaj8wTze+CYJUHg@mail.gmail.com
>
> --
> Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
> To make changes to your subscription:
> http://www.postgresql.org/mailpref/pgsql-hackers

--
KaiGai Kohei <kaigai@kaigai.gr.jp>
Fujita-san,

> Does it make sense to put the result tuple of the remote join on every
> estate->es_epqTupleSet[] slot represented by this ForeignScan if
> scanrelid == 0?

Sorry, I misunderstood the behavior of es_epqTupleSet[].

I'd like to suggest a solution that re-constructs the remote tuple according to the fdw_scan_tlist in ExecScanFetch if the given scanrelid == 0. It enables us to run the local qualifier associated with the ForeignScan node, and it will also work for the case when the tuple in es_epqTupleSet[] was a local heap tuple.

For details: es_epqTuple[] is set by EvalPlanQualSetTuple(). It puts a tuple that exactly reflects a particular base relation (one that has a positive rtindex). Even if it is a foreign table, ExecLockRows() puts a tuple dynamically constructed via a whole-row reference at EvalPlanQualFetchRowMarks(). So, regardless of whether it is a copy or a reference to the heap, we can expect es_epqTuple[] to keep a tuple for each of the base relations.

On the other hand, a ForeignScan that replaced a local join with a remote join has a valid fdw_scan_tlist list. It contains the expression nodes to construct the individual attributes of the pseudo scan target list. So, all we need to do is:
(1) if scanrelid == 0 in ExecScanFetch(),
(2) the node should be a ForeignScan or CustomScan with a *_scan_tlist,
(3) then we reconstruct a tuple of the pseudo scan based on the *_scan_tlist, instead of a simple reference to es_epqTupleSet[],
(4) and evaluate the local qualifiers of the node.

What are your thoughts?

BTW, if you try a newer version of the postgres_fdw foreign join patch, please provide it so I can reproduce the problem.

Also, as an aside, postgres_fdw does not implement RefetchForeignRow() at this moment. Does that make a problem?
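Steps (1)-(4) above can be sketched as a toy model (illustrative only, not executor code; rows are modeled as int arrays and all names are made up): rebuild the pseudo-scan row from the per-base-relation EPQ tuples according to a target list that records which source relation feeds each output column, then run the local qualifier on the result.

```c
#include <assert.h>

/* Toy sketch of reconstructing the pseudo-scan tuple from per-relation EPQ
 * tuples via a scan target list, then evaluating a local qualifier. */
typedef struct
{
    int     src_rel;            /* which base relation feeds this column */
} ToyTlistEntry;

static int
toy_reconstruct_and_qual(const int epq_rows[], const ToyTlistEntry tlist[],
                         int ncols, int (*local_qual)(const int *))
{
    int     row[8];

    for (int i = 0; i < ncols; i++)
        row[i] = epq_rows[tlist[i].src_rel];    /* step (3): rebuild row */
    return local_qual(row);                     /* step (4): local quals */
}

/* Sample qualifier: the two joined columns must still match. */
static int
toy_qual(const int *row)
{
    return row[0] == row[1];
}

static int
demo(void)
{
    int             epq_rows[2] = {7, 7};       /* foo.a = 7, bar.a = 7 */
    ToyTlistEntry   tlist[2] = {{0}, {1}};

    return toy_reconstruct_and_qual(epq_rows, tlist, 2, toy_qual);
}
```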
Thanks,
--
NEC Business Creation Division / PG-Strom Project
KaiGai Kohei <kaigai@ak.jp.nec.com>

> -----Original Message-----
> From: pgsql-hackers-owner@postgresql.org
> [mailto:pgsql-hackers-owner@postgresql.org] On Behalf Of Kohei KaiGai
> Sent: Wednesday, June 24, 2015 10:02 PM
> To: Etsuro Fujita
> Cc: PostgreSQL-development
> Subject: Re: [HACKERS] Foreign join pushdown vs EvalPlanQual
>
> [...]
Hi KaiGai-san,

I'd like to work on this issue with you!

On 2015/06/25 10:48, Kouhei Kaigai wrote:
> I'd like to suggest a solution that re-constructs the remote tuple
> according to the fdw_scan_tlist in ExecScanFetch if the given
> scanrelid == 0.  It enables us to run the local qualifier associated
> with the ForeignScan node, and it will also work for the case when the
> tuple in es_epqTupleSet[] was a local heap tuple.

Maybe I'm missing something, but I don't think your proposal works properly, because we don't have any component ForeignScan state node or subsidiary join state node once we've replaced the entire join with the ForeignScan performing the join remotely, IIUC. So, my image was to have another subplan for EvalPlanQual as well as the ForeignScan, to do the entire join locally for the component test tuples if we are inside an EvalPlanQual recheck.

> BTW, if you try a newer version of the postgres_fdw foreign join patch,
> please provide it so I can reproduce the problem.

OK

> Also, as an aside, postgres_fdw does not implement RefetchForeignRow()
> at this moment.  Does that make a problem?

I don't think so, though I think it would be better to test that the foreign join pushdown API patch also allows late row locking using postgres_fdw.

Best regards,
Etsuro Fujita
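The alternative-subplan idea can be pictured as a simple dispatch rule (a toy sketch under my own naming, not the proposed patch): when a node represents a pushed-down join (scanrelid == 0) and we are inside an EvalPlanQual recheck, run a local join subplan over the test tuples instead of issuing the remote query.

```c
#include <assert.h>

/* Toy dispatch sketch (not PostgreSQL source): a join-pushdown scan node
 * falls back to a local alternative subplan during an EPQ recheck. */
typedef int (*ToyExecFn)(void);

static int
toy_exec_scan(int scanrelid, int in_epq_recheck,
              ToyExecFn remote_join, ToyExecFn local_alt_join)
{
    if (scanrelid == 0 && in_epq_recheck)
        return local_alt_join();    /* redo the join locally on test tuples */
    return remote_join();           /* normal path: pushed-down remote join */
}

/* Stand-in "plans" that just return distinguishable results. */
static int toy_remote_join(void)    { return 100; }
static int toy_local_alt_join(void) { return 200; }

static int
demo(void)
{
    int normal = toy_exec_scan(0, 0, toy_remote_join, toy_local_alt_join);
    int recheck = toy_exec_scan(0, 1, toy_remote_join, toy_local_alt_join);

    return normal == 100 && recheck == 200;
}
```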
Fujita-san,

> > BTW, if you try a newer version of the postgres_fdw foreign join patch,
> > please provide it so I can reproduce the problem.
>
> OK

Did you forget to attach the patch, or is v17 in use?

> Maybe I'm missing something, but I don't think your proposal works
> properly, because we don't have any component ForeignScan state node or
> subsidiary join state node once we've replaced the entire join with the
> ForeignScan performing the join remotely, IIUC.  So, my image was to
> have another subplan for EvalPlanQual as well as the ForeignScan, to do
> the entire join locally for the component test tuples if we are inside
> an EvalPlanQual recheck.

Hmm... Probably we have two standpoints from which to tackle the problem.

The first standpoint tries to handle the base foreign table as a prime relation for locking. Thus, we have to provide a way to fetch a remote tuple identified by the supplied ctid. The advantage of this approach is that the way tuples are fetched from a base relation is quite similar to the existing form; its disadvantage is the other side of the same coin, because the ForeignScan node with scanrelid == 0 (which represents the remote join query) may have local qualifiers which shall run on the tuple according to fdw_scan_tlist.

The other standpoint tries to handle a bunch of base foreign tables as a unit. That means, if any base foreign table is the target of locking, it prompts the FDW driver to fetch the latest "joined" tuple identified by "ctid", even if this join contains multiple base relations to be locked. The advantage of this approach is that we can use the qualifiers of the ForeignScan node with scanrelid == 0, with no need to pay attention to remote qualifiers and/or join conditions individually. Its disadvantage is that we may have to extend the EState structure to keep the "joined" tuples, in addition to es_epqTupleSet[].

I'm inclined to think the latter standpoint works well, because it does not need to reproduce an alternative execution path on the local side, even if a ForeignScan node represents a much more complicated remote query. If we would fetch tuples of individual base relations, we would need to reconstruct an identical join path to be executed on the remote side, wouldn't we?

IIUC, the purpose of EvalPlanQual() is to ensure the tuples to be locked are still visible, so it is not an essential condition to fetch the base tuples individually.

Just an aside, please tell me if someone knows: does the EvalPlanQual logic work correctly even if the tuple to be locked is located in the right tree of a HashJoin? In this case, it seems to me that ExecHashJoin does not rebuild the hash table even if ExecProcNode() is invoked with es_epqTupleSet[]; thus, the old tuple is still visible and checked, isn't it?

Thanks,
--
NEC Business Creation Division / PG-Strom Project
KaiGai Kohei <kaigai@ak.jp.nec.com>

> -----Original Message-----
> From: pgsql-hackers-owner@postgresql.org
> [mailto:pgsql-hackers-owner@postgresql.org] On Behalf Of Etsuro Fujita
> Sent: Thursday, June 25, 2015 3:12 PM
> To: Kaigai Kouhei(海外 浩平)
> Cc: PostgreSQL-development
> Subject: Re: [HACKERS] Foreign join pushdown vs EvalPlanQual
>
> [...]
Hi KaiGai-san,

On 2015/06/27 21:09, Kouhei Kaigai wrote:
>>> BTW, if you try a newer version of the postgres_fdw foreign join patch,
>>> please provide it so I can reproduce the problem.
>
>> OK
>
> Did you forget to attach the patch, or is v17 in use?

Sorry, I made a mistake. The problem was produced using v16 [1].

> Hmm... Probably we have two standpoints from which to tackle the problem.
>
> The first standpoint tries to handle the base foreign table as
> a prime relation for locking.  Thus, we have to provide a way to
> fetch a remote tuple identified by the supplied ctid.
> The advantage of this approach is that the way tuples are fetched
> from a base relation is quite similar to the existing form; its
> disadvantage is the other side of the same coin, because the
> ForeignScan node with scanrelid == 0 (which represents the remote
> join query) may have local qualifiers which shall run on the tuple
> according to fdw_scan_tlist.

IIUC, I think this approach would also need to evaluate join conditions and remote qualifiers, in addition to local qualifiers, locally for component tuples that were re-fetched from the remote side (and the remaining component tuples that were copied from whole-row vars, if any), in cases where the re-fetched tuples were updated versions of those tuples rather than the same versions previously obtained.

> The other standpoint tries to handle a bunch of base foreign
> tables as a unit.  That means, if any base foreign table is
> the target of locking, it prompts the FDW driver to fetch the latest
> "joined" tuple identified by "ctid", even if this join contains
> multiple base relations to be locked.
> The advantage of this approach is that we can use the qualifiers of
> the ForeignScan node with scanrelid == 0, with no need to pay
> attention to remote qualifiers and/or join conditions individually.
> Its disadvantage is that we may have to extend the EState structure
> to keep the "joined" tuples, in addition to es_epqTupleSet[].

That is an idea. However, ISTM there is another disadvantage: it is not efficient, because it would need to perform another remote join query having such additional conditions during an EvalPlanQual check, as you proposed.

> I'm inclined to think the latter standpoint works well, because
> it does not need to reproduce an alternative execution path on
> the local side, even if a ForeignScan node represents a much more
> complicated remote query.
> If we would fetch tuples of individual base relations, we would need
> to reconstruct an identical join path to be executed on the remote
> side, wouldn't we?

Yeah, that was my image for fixing this issue.

> IIUC, the purpose of EvalPlanQual() is to ensure the tuples to
> be locked are still visible, so it is not an essential condition
> to fetch the base tuples individually.

I think so too, but taking the similarity and/or efficiency of processing into consideration, I would vote for the idea of having an alternative execution path locally. That would also allow FDW authors to write the foreign join pushdown functionality in their FDWs with smaller effort.

Best regards,
Etsuro Fujita

[1] http://www.postgresql.org/message-id/CAEZqfEe9KGy=1_waGh2rgZPg0o4pqgD+iauYaj8wTze+CYJUHg@mail.gmail.com
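The point about re-fetched tuples being updated versions can be modeled in a few lines (toy values only, not FDW code): the remote join matched the *old* row versions, so if a re-fetch returns a newer version, the pushed-down join condition has to be re-evaluated locally, or the recheck would pass a row that no longer joins.

```c
#include <assert.h>

/* Toy model: the remote join matched foo.a = bar.a on the old versions.
 * If bar's row was updated (a: 1 -> 2) before the re-fetch, the join
 * condition must be re-checked locally against the new versions. */
static int
toy_join_cond(int foo_a, int bar_a)
{
    return foo_a == bar_a;
}

static int
demo(void)
{
    int joined_old  = toy_join_cond(1, 1);  /* what the remote join saw */
    int recheck_new = toy_join_cond(1, 2);  /* re-fetched version: no match */

    return joined_old == 1 && recheck_new == 0;
}
```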
Hi Fujita-san,

Sorry for my late reply.
> I think so too, but taking the similarity and/or efficiency of
> processing into consideration, I would vote for the idea of having an
> alternative execution path locally.  That would also allow FDW authors
> to write the foreign join pushdown functionality in their FDWs with
> smaller effort.

Even though I'd like to see a committer's opinion, I could not come up with an idea better than what you proposed: a foreign/custom scan has an alternative plan if scanrelid == 0.

Let me introduce a few cases we should pay attention to.

Foreign/CustomScan nodes may stack; that means a Foreign/CustomScan node may have a child node that includes another Foreign/CustomScan node with scanrelid == 0. (At this moment, a ForeignScan cannot have a child node; however, more aggressive push-down [1] will need the same feature, to fetch tuples from a local relation and construct a VALUES() clause.) In this case, the highest Foreign/CustomScan node (the one nearest to LockRows or ModifyTable) runs the alternative sub-plan that includes the scan/join plans dominated by fdw_relids or custom_relids.

For example:

LockRows
 -> HashJoin
   -> CustomScan (AliceJoin)
     -> SeqScan on t1
     -> CustomScan (CarolJoin)
       -> SeqScan on t2
       -> SeqScan on t3
   -> Hash
     -> CustomScan (BobJoin)
       -> SeqScan on t4
       -> ForeignScan (remote join involving ft5, ft6)

In this case, AliceJoin will have an alternative sub-plan to join t1, t2 and t3, which shall be used in EvalPlanQual(). Also, BobJoin will have an alternative sub-plan to join t4, ft5 and ft6. CarolJoin and the ForeignScan will also have alternative sub-plans; however, these are not used in this case. Probably, it works fine.

Is there a potential scenario where a foreign/custom join is located above a LockRows node? (Subquery expansion may give such a case?) Anyway, it doesn't make a problem, does it?

On the next step, how do we implement this design? I guess the planner needs to keep a path that contains neither a foreign join nor a custom join with scanrelid == 0. Probably, a "cheapest_builtin_path" field of RelOptInfo is needed that never contains these remote/custom join logics, as a seed of the alternative sub-plan. create_foreignscan_plan() or create_customscan_plan() will be able to construct these alternative plans, regardless of the extensions. So individual FDWs/CSPs won't need to care about this alternative sub-plan, will they?

After that, once ExecScanFetch() is called under EvalPlanQual(), these Foreign/CustomScan nodes with scanrelid == 0 run the alternative sub-plan to validate the latest tuple.

Hmm... It looks to me like a workable approach. Fujita-san, are you available to make a patch with this approach? If so, I'd like to volunteer to review it.

[1] http://www.postgresql.org/message-id/9A28C8860F777E439AA12E8AEA7694F8010F20AD@BPXM15GP.gisp.nec.co.jp

Thanks,
--
NEC Business Creation Division / PG-Strom Project
KaiGai Kohei <kaigai@ak.jp.nec.com>
Hi KaiGai-san,

On 2015/07/02 18:31, Kouhei Kaigai wrote:
> Even though I'd like to see a committer's opinion, I could not come up
> with an idea better than what you proposed: a foreign/custom scan has
> an alternative plan if scanrelid == 0.

I'd also like to hear the opinion!

> Let me introduce a few cases we should pay attention to.
>
> Foreign/CustomScan nodes may stack; that means a Foreign/CustomScan
> node may have a child node that includes another Foreign/CustomScan
> node with scanrelid == 0.
>
> For example:
>
> LockRows
>  -> HashJoin
>    -> CustomScan (AliceJoin)
>      -> SeqScan on t1
>      -> CustomScan (CarolJoin)
>        -> SeqScan on t2
>        -> SeqScan on t3
>    -> Hash
>      -> CustomScan (BobJoin)
>        -> SeqScan on t4
>        -> ForeignScan (remote join involving ft5, ft6)
>
> In this case, AliceJoin will have an alternative sub-plan to join t1,
> t2 and t3, which shall be used in EvalPlanQual().  Also, BobJoin will
> have an alternative sub-plan to join t4, ft5 and ft6.  CarolJoin and
> the ForeignScan will also have alternative sub-plans; however, these
> are not used in this case.
> Probably, it works fine.

Yeah, I think so too.

> Is there a potential scenario where a foreign/custom join is located
> above a LockRows node?  (Subquery expansion may give such a case?)
> Anyway, it doesn't make a problem, does it?

IIUC, I don't think we have such a case.

> On the next step, how do we implement this design?
> I guess the planner needs to keep a path that contains neither
> a foreign join nor a custom join with scanrelid == 0.
> Probably, a "cheapest_builtin_path" field of RelOptInfo is needed that
> never contains these remote/custom join logics, as a seed of the
> alternative sub-plan.

Yeah, I think so too, but I've not figured out how to implement this yet.

To be honest, ISTM that it's difficult to do that simply and efficiently with the foreign/custom join pushdown API that we have for 9.5. It's a little late, but what I started thinking is to redesign that API so that it is called at standard_join_search, as discussed in [2]: (1) place that API call *after* the set_cheapest call, and (2) place another set_cheapest call after that API call for each joinrel. By the first set_cheapest call, I think we could probably save an alternative path that we need in "cheapest_builtin_path". By the second set_cheapest call following that API call, we could consider foreign/custom join pushdown paths as well. What do you think about this idea?

> create_foreignscan_plan() or create_customscan_plan() will be
> able to construct these alternative plans, regardless of the
> extensions.  So individual FDWs/CSPs won't need to care about
> this alternative sub-plan, will they?
>
> After that, once ExecScanFetch() is called under EvalPlanQual(),
> these Foreign/CustomScan nodes with scanrelid == 0 run the alternative
> sub-plan to validate the latest tuple.
>
> Hmm... It looks to me like a workable approach.

Yeah, I think so too.

> Fujita-san, are you available to make a patch with this approach?
> If so, I'd like to volunteer to review it.

Yeah, I'm willing to make a patch if we obtain consensus! And I'd be happy if you helped me with the work!

Best regards,
Etsuro Fujita

[2] http://www.postgresql.org/message-id/5451.1426271510@sss.pgh.pa.us
> > Let me introduce a few cases we should pay attention to.
> >
> > Foreign/CustomScan nodes may stack; that means a Foreign/CustomScan node
> > may have a child node that includes another Foreign/CustomScan node with
> > scanrelid==0.
> > (At this moment, a ForeignScan cannot have a child node; however, more
> > aggressive push-down [1] will need the same feature to fetch tuples from
> > a local relation and construct a VALUES() clause.)
> > In this case, the highest Foreign/CustomScan node (that is, the one
> > nearest to LockRows or ModifyTable) runs the alternative sub-plan that
> > includes the scan/join plans dominated by fdw_relids or custom_relids.
> >
> > For example:
> >
> >   LockRows
> >    -> HashJoin
> >        -> CustomScan (AliceJoin)
> >            -> SeqScan on t1
> >            -> CustomScan (CarolJoin)
> >                -> SeqScan on t2
> >                -> SeqScan on t3
> >        -> Hash
> >            -> CustomScan (BobJoin)
> >                -> SeqScan on t4
> >                -> ForeignScan (remote join involving ft5, ft6)
> >
> > In this case, AliceJoin will have an alternative sub-plan to join t1, t2
> > and t3, which shall be used on EvalPlanQual(). Also, BobJoin will have
> > an alternative sub-plan to join t4, ft5 and ft6. CarolJoin and the
> > ForeignScan will also have alternative sub-plans; however, these are not
> > used in this case.
> > Probably, it works fine.
>
> Yeah, I think so too.

Sorry, I need to adjust my explanation above a bit: in this case,
AliceJoin will have an alternative sub-plan to join t1 and CarolJoin,
then CarolJoin will have an alternative sub-plan to join t2 and t3. Also,
BobJoin will have an alternative sub-plan to join t4 and the ForeignScan
with the remote join, and this ForeignScan node will have an alternative
sub-plan to join ft5 and ft6.

Why is this recursive design better? Because it makes the planner
enhancement much simpler than the overall approach. Please see my
explanation in the section below.

> > On the next step, how do we implement this design?
> > I guess that the planner needs to keep a path that contains neither
> > a foreign-join nor a custom-join with scanrelid==0.
> > Probably, a "cheapest_builtin_path" of RelOptInfo is needed that never
> > contains this remote/custom join logic, as a seed of the alternative
> > sub-plan.
>
> Yeah, I think so too, but I've not figured out how to implement this yet.
>
> To be honest, ISTM that it's difficult to do that simply and efficiently
> with the foreign/custom-join-pushdown API that we have for 9.5. It's a
> little late, but what I started thinking is to redesign that API so that
> it is called at standard_join_search, as discussed in [2]: (1) place
> that API call *after* the set_cheapest call, and (2) place another
> set_cheapest call after that API call for each joinrel. By the first
> set_cheapest call, I think we could probably save the alternative path
> that we need in "cheapest_builtin_path". By the second set_cheapest
> call following that API call, we could consider
> foreign/custom-join-pushdown paths as well. What do you think about
> this idea?
>
The disadvantage is larger than the advantage, sorry.
The reason why we put the foreign/custom-join hook in
add_paths_to_joinrel() is that the source relations (inner/outer) are
otherwise not obvious; thus, we cannot reproduce which relations are the
source of the join. So, I had to throw in the towel when I tried this
approach before.

My idea is that we save the cheapest_total_path of RelOptInfo onto the
new cheapest_builtin_path just before the GetForeignJoinPaths() hook.
Why? It must be a built-in join logic, never a foreign/custom-join,
because of the hook location; only built-in logic has been added at that
point. Even if either or both of the join sub-trees contain a
foreign/custom-join, those paths have their own alternative sub-plans at
their level, so there is no need to care about them at the current level.
(This is the reason why I adjusted my explanation above.)
Once this built-in path is kept and a foreign/custom-join gets chosen by
set_cheapest(), it is easy to attach this sub-plan to the ForeignScan or
CustomScan node. I don't find any significant downside in this approach.
How about your opinion?

Regarding the development timeline, I prefer to put in a workaround so
as not to trip the Assert() in ExecScanFetch(). We may add a warning in
the documentation not to replace a built-in join if either or both of
the sub-trees are the target of UPDATE/DELETE or FOR SHARE/UPDATE.

Thanks,
--
NEC Business Creation Division / PG-Strom Project
KaiGai Kohei <kaigai@ak.jp.nec.com>
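[Editorial note: to make the "snapshot the cheapest built-in path just
before the hook" idea above concrete, here is a minimal self-contained
sketch. It is NOT actual PostgreSQL code - Path, RelOptInfo and the
function names are simplified stand-ins invented for illustration.]

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, pared-down stand-ins for the planner structures. */
typedef struct Path
{
    double       total_cost;
    int          is_builtin;    /* 0 for a ForeignPath/CustomPath join */
    struct Path *next;
} Path;

typedef struct RelOptInfo
{
    Path *pathlist;
    Path *cheapest_total_path;
    Path *cheapest_builtin_path;    /* the proposed new field */
} RelOptInfo;

/* Pick the cheapest path in a list. */
static Path *
cheapest(Path *list)
{
    Path *best = NULL;

    for (Path *p = list; p != NULL; p = p->next)
        if (best == NULL || p->total_cost < best->total_cost)
            best = p;
    return best;
}

/*
 * Sketch of the proposed flow: just before the GetForeignJoinPaths()
 * hook runs, every path in the joinrel is built-in, so the cheapest
 * total path at that moment can be snapshotted as cheapest_builtin_path.
 */
void
snapshot_builtin_before_hook(RelOptInfo *joinrel)
{
    joinrel->cheapest_total_path = cheapest(joinrel->pathlist);
    joinrel->cheapest_builtin_path = joinrel->cheapest_total_path;
    /* ...GetForeignJoinPaths() may now add cheaper non-builtin paths... */
}

/*
 * Demonstration: returns 1 if the built-in snapshot survives a cheaper
 * foreign-join path being added after the hook point.
 */
int
demo(void)
{
    Path        p1 = {100.0, 1, NULL};
    Path        p2 = {80.0, 1, &p1};
    RelOptInfo  rel = {&p2, NULL, NULL};

    snapshot_builtin_before_hook(&rel);

    Path        fj = {50.0, 0, &p2};    /* cheaper foreign-join path */

    rel.pathlist = &fj;
    rel.cheapest_total_path = cheapest(rel.pathlist);

    return rel.cheapest_builtin_path == &p2 &&
           rel.cheapest_total_path->is_builtin == 0;
}
```

The point of the sketch is the ordering guarantee: because the snapshot
is taken before the hook, cheapest_builtin_path can later seed the local
EvalPlanQual sub-plan even when a remote join wins overall.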
On 2015/07/02 23:13, Kouhei Kaigai wrote:
>> To be honest, ISTM that it's difficult to do that simply and efficiently
>> with the foreign/custom-join-pushdown API that we have for 9.5. It's a
>> little late, but what I started thinking is to redesign that API so that
>> it is called at standard_join_search, as discussed in [2]: (1) place
>> that API call *after* the set_cheapest call, and (2) place another
>> set_cheapest call after that API call for each joinrel. By the first
>> set_cheapest call, I think we could probably save the alternative path
>> that we need in "cheapest_builtin_path". By the second set_cheapest
>> call following that API call, we could consider
>> foreign/custom-join-pushdown paths as well. What do you think about
>> this idea?

> The disadvantage is larger than the advantage, sorry.
> The reason why we put the foreign/custom-join hook in
> add_paths_to_joinrel() is that the source relations (inner/outer) are
> otherwise not obvious; thus, we cannot reproduce which relations are
> the source of the join.
> So, I had to throw in the towel when I tried this approach before.

Maybe I'm missing something, but my image of this approach is that if
the base relations for a given joinrel are all foreign tables and belong
to the same foreign server, then by calling that API there, we consider
the remote join over all the foreign tables, and that if not, we give up
on considering the remote join.

> My idea is that we save the cheapest_total_path of RelOptInfo onto the
> new cheapest_builtin_path just before the GetForeignJoinPaths() hook.
> Why? It must be a built-in join logic, never a foreign/custom-join,
> because of the hook location; only built-in logic shall be added here.
My concern about your idea is that, since (a) add_paths_to_joinrel is
called multiple times per joinrel and (b) repeated add_path calls
through GetForeignJoinPaths in add_paths_to_joinrel might remove old
paths that are built-in, it's possible to save a path that is not
built-in as the cheapest_total_path and thus to wrongly save that path
as the cheapest_builtin_path. There might be a good way to cope with
that, though.

> Regarding the development timeline, I prefer to put in a workaround so
> as not to trip the Assert() in ExecScanFetch().
> We may add a warning in the documentation not to replace a built-in
> join if either or both of the sub-trees are the target of UPDATE/DELETE
> or FOR SHARE/UPDATE.

I'm not sure that that is a good idea, but anyway, I think we need to
hurry in fixing this issue.

Best regards,
Etsuro Fujita
> On 2015/07/02 23:13, Kouhei Kaigai wrote:
> >> To be honest, ISTM that it's difficult to do that simply and efficiently
> >> with the foreign/custom-join-pushdown API that we have for 9.5. It's a
> >> little late, but what I started thinking is to redesign that API so that
> >> it is called at standard_join_search, as discussed in [2]: (1) place
> >> that API call *after* the set_cheapest call, and (2) place another
> >> set_cheapest call after that API call for each joinrel. By the first
> >> set_cheapest call, I think we could probably save the alternative path
> >> that we need in "cheapest_builtin_path". By the second set_cheapest
> >> call following that API call, we could consider
> >> foreign/custom-join-pushdown paths as well. What do you think about
> >> this idea?
> >
> > The disadvantage is larger than the advantage, sorry.
> > The reason why we put the foreign/custom-join hook in
> > add_paths_to_joinrel() is that the source relations (inner/outer) are
> > otherwise not obvious; thus, we cannot reproduce which relations are
> > the source of the join.
> > So, I had to throw in the towel when I tried this approach before.
>
> Maybe I'm missing something, but my image of this approach is that if
> the base relations for a given joinrel are all foreign tables and belong
> to the same foreign server, then by calling that API there, we consider
> the remote join over all the foreign tables, and that if not, we give up
> on considering the remote join.
>
Your understanding is correct, but it misses a point. Once the foreign
tables to be joined are informed as a bitmap (joinrel->relids), it is
not obvious to extensions which relations are joined with INNER JOIN and
which ones are joined with OUTER JOIN.
I tried to implement a common utility function during the 9.5 cycle;
however, it was doubtful whether we could make the logic reliable.
Also, I no longer want to stick to the assumption that the relations
involved in a remote join are all managed by the same foreign server.
The following two ideas introduce possible enhancements of the remote
join feature that involve local relations: a replicated table, or a
transformation to a VALUES() clause.

http://www.postgresql.org/message-id/CA+Tgmoai_VUF5h6qVLNLU+FKp0aeBCbnnMT3SCvL-HvOpBR=Xw@mail.gmail.com
http://www.postgresql.org/message-id/9A28C8860F777E439AA12E8AEA7694F8010F20AD@BPXM15GP.gisp.nec.co.jp

Once we have to pay attention to the case of mixed local/foreign
relations, we have to care about the paths of the underlying local
relations, or of foreign relations managed by another foreign server.
I think add_paths_to_joinrel() is the best location for foreign-join,
not only custom-join. Relocation to standard_join_search() has a larger
disadvantage than advantage.

> > My idea is that we save the cheapest_total_path of RelOptInfo onto the
> > new cheapest_builtin_path just before the GetForeignJoinPaths() hook.
> > Why? It must be a built-in join logic, never a foreign/custom-join,
> > because of the hook location; only built-in logic shall be added here.
>
> My concern about your idea is that, since (a) add_paths_to_joinrel is
> called multiple times per joinrel and (b) repeated add_path calls
> through GetForeignJoinPaths in add_paths_to_joinrel might remove old
> paths that are built-in, it's possible to save a path that is not
> built-in as the cheapest_total_path and thus to wrongly save that path
> as the cheapest_builtin_path. There might be a good way to cope with
> that, though.
>
For concern (a), the FDW driver can reference RelOptInfo->fdw_private,
which is initialized to NULL; the FDW driver then sets valid data there
once it has added something. IIRC, postgres_fdw also skips adding the
same path multiple times.
For concern (b), yep, we may enhance add_path() to retain the built-in
path, instead of doing it in add_paths_to_joinrel(). Let's adjust the
logic a bit. add_path() can know whether the given path is a usual one
or an exceptional one (a ForeignPath/CustomPath over no single base
relation).
If the path is exceptional, the cheapest_builtin_path shall be retained
unconditionally. Otherwise, the cheapest one replaces it as usual, so
the cheapest built-in path will survive. Is it still problematic?

> > Regarding the development timeline, I prefer to put in a workaround
> > so as not to trip the Assert() in ExecScanFetch().
> > We may add a warning in the documentation not to replace a built-in
> > join if either or both of the sub-trees are the target of
> > UPDATE/DELETE or FOR SHARE/UPDATE.
>
> I'm not sure that that is a good idea, but anyway, I think we need to
> hurry in fixing this issue.
>
My approach is not a fix, but an avoidance. :-)
It may be an idea to implement the above fixup even though it may be too
large/late to apply as a 9.5 feature, but then we can understand how many
changes are needed to fix this problem.

Thanks,
--
NEC Business Creation Division / PG-Strom Project
KaiGai Kohei <kaigai@ak.jp.nec.com>
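[Editorial note: the add_path() tweak described above - an "exceptional"
path may win overall but never displaces the built-in one - can be
sketched in a few lines. This is a hypothetical illustration with
invented, simplified types, not the real add_path().]

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, simplified path representation. */
typedef struct SimplePath
{
    double total_cost;
    int    exceptional;    /* ForeignPath/CustomPath joining multiple rels */
} SimplePath;

typedef struct SimpleRel
{
    SimplePath *cheapest_total;
    SimplePath *cheapest_builtin;
} SimpleRel;

/*
 * Sketch of the discussed add_path() adjustment: an exceptional path
 * (a pushed-down foreign/custom join) may become the cheapest total
 * path, but it never replaces cheapest_builtin, so a built-in path
 * always survives as the seed of the EvalPlanQual alternative plan.
 */
void
add_path_sketch(SimpleRel *rel, SimplePath *p)
{
    if (rel->cheapest_total == NULL ||
        p->total_cost < rel->cheapest_total->total_cost)
        rel->cheapest_total = p;

    if (!p->exceptional &&
        (rel->cheapest_builtin == NULL ||
         p->total_cost < rel->cheapest_builtin->total_cost))
        rel->cheapest_builtin = p;
}
```

Usage: adding a built-in hash join at cost 90 and then a pushed-down
foreign join at cost 40 leaves the foreign join as cheapest_total while
the hash join is still remembered as cheapest_builtin.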
On 2015/07/03 15:32, Kouhei Kaigai wrote:
>> On 2015/07/02 23:13, Kouhei Kaigai wrote:
>>>> To be honest, ISTM that it's difficult to do that simply and efficiently
>>>> with the foreign/custom-join-pushdown API that we have for 9.5. It's
>>>> a little late, but what I started thinking is to redesign that API so
>>>> that it is called at standard_join_search, as discussed in [2];
>>> The disadvantage is larger than the advantage, sorry.
>>> The reason why we put the foreign/custom-join hook in
>>> add_paths_to_joinrel() is that the source relations (inner/outer) are
>>> otherwise not obvious; thus, we cannot reproduce which relations are
>>> the source of the join.
>>> So, I had to throw in the towel when I tried this approach before.
>> Maybe I'm missing something, but my image of this approach is that if
>> the base relations for a given joinrel are all foreign tables and
>> belong to the same foreign server, then by calling that API there, we
>> consider the remote join over all the foreign tables, and that if not,
>> we give up on considering the remote join.
> Your understanding is correct, but it misses a point. Once the foreign
> tables to be joined are informed as a bitmap (joinrel->relids), it is
> not obvious to extensions which relations are joined with INNER JOIN
> and which ones are joined with OUTER JOIN.

Can't FDWs get the join information through the root, which I think we
would pass to the API as an argument?

> Also, I no longer want to stick to the assumption that the relations
> involved in a remote join are all managed by the same foreign server.
> The following two ideas introduce possible enhancements of the remote
> join feature that involve local relations: a replicated table, or a
> transformation to a VALUES() clause.
>
> http://www.postgresql.org/message-id/CA+Tgmoai_VUF5h6qVLNLU+FKp0aeBCbnnMT3SCvL-HvOpBR=Xw@mail.gmail.com
> http://www.postgresql.org/message-id/9A28C8860F777E439AA12E8AEA7694F8010F20AD@BPXM15GP.gisp.nec.co.jp

Interesting!
> I think add_paths_to_joinrel() is the best location for foreign-join,
> not only custom-join. Relocation to standard_join_search() has a larger
> disadvantage than advantage.

I agree with you that it's important to ensure the expandability, and my
question is: is it possible that the API call from standard_join_search
could also realize those ideas if FDWs can get the join information
through the root or something like that?

Best regards,
Etsuro Fujita
> > Also, I no longer want to stick to the assumption that the relations
> > involved in a remote join are all managed by the same foreign server.
> > The following two ideas introduce possible enhancements of the remote
> > join feature that involve local relations: a replicated table, or a
> > transformation to a VALUES() clause.
> >
> > http://www.postgresql.org/message-id/CA+Tgmoai_VUF5h6qVLNLU+FKp0aeBCbnnMT3SCvL-HvOpBR=Xw@mail.gmail.com
> > http://www.postgresql.org/message-id/9A28C8860F777E439AA12E8AEA7694F8010F20AD@BPXM15GP.gisp.nec.co.jp
>
> Interesting!
>
> > I think add_paths_to_joinrel() is the best location for foreign-join,
> > not only custom-join. Relocation to standard_join_search() has a
> > larger disadvantage than advantage.
>
> I agree with you that it's important to ensure the expandability, and
> my question is: is it possible that the API call from
> standard_join_search could also realize those ideas if FDWs can get the
> join information through the root or something like that?
>
I don't deny the possibility, even though I once gave up on implementing
the reproduction of join information - which relations are joined with
which others at each level - using PlannerInfo and RelOptInfo.
However, we need to pay attention to the advantages over the
alternatives. The hook in add_paths_to_joinrel() enables implementing
the identical stuff with less complicated logic to reproduce the
left/right relations from the RelOptInfo of the joinrel. (Note that
RelOptInfo->fdw_private enables avoiding path construction multiple
times.)
I'm uncertain why this API change is necessary to fix the problem around
EvalPlanQual.

Thanks,
--
NEC Business Creation Division / PG-Strom Project
KaiGai Kohei <kaigai@ak.jp.nec.com>
On 2015/07/06 9:42, Kouhei Kaigai wrote:
>>> Also, I no longer want to stick to the assumption that the relations
>>> involved in a remote join are all managed by the same foreign server.
>>> The following two ideas introduce possible enhancements of the remote
>>> join feature that involve local relations: a replicated table, or a
>>> transformation to a VALUES() clause.
>>> I think add_paths_to_joinrel() is the best location for foreign-join,
>>> not only custom-join. Relocation to standard_join_search() has a
>>> larger disadvantage than advantage.
>> I agree with you that it's important to ensure the expandability, and
>> my question is: is it possible that the API call from
>> standard_join_search could also realize those ideas if FDWs can get
>> the join information through the root or something like that?
> I don't deny the possibility, even though I once gave up on
> implementing the reproduction of join information - which relations are
> joined with which others at each level - using PlannerInfo and
> RelOptInfo.

OK

> However, we need to pay attention to the advantages over the
> alternatives. The hook in add_paths_to_joinrel() enables implementing
> the identical stuff with less complicated logic to reproduce the
> left/right relations from the RelOptInfo of the joinrel. (Note that
> RelOptInfo->fdw_private enables avoiding path construction multiple
> times.)
> I'm uncertain why this API change is necessary to fix the problem
> around EvalPlanQual.

Yeah, maybe we wouldn't need any API change. I think we would be able
to fix this by complicating add_path as you pointed out upthread. I'm
not sure that complicating it is a good idea, though. I think it might
be possible that the callback in standard_join_search would allow us to
fix this without complicating the core path-cost-comparison stuff such
as add_path. I noticed that what I proposed upthread doesn't work
properly, though.
Actually, I have another concern about the callback location that you
proposed: it might meaninglessly increase planning time in the
postgres_fdw case when using remote estimates, which the proposed
postgres_fdw patch doesn't currently support IIUC, but which I think it
should support. Let me explain. If you have A JOIN B JOIN C all on the
same foreign server, for example, we'll only have to perform a remote
EXPLAIN for A-B-C for the estimates (when adopting a strategy of pushing
down as large a join as possible into the remote server). However, ISTM
that the callback in add_paths_to_joinrel would perform remote EXPLAINs
not only for A-B-C but also for A-B, A-C and B-C, according to the
dynamic programming algorithm. (Duplicated remote EXPLAINs for A-B-C
can be eliminated using the way you proposed.) Thus the remote EXPLAINs
for A-B, A-C and B-C seem meaningless to me while incurring performance
degradation in query planning. Maybe I'm missing something, though.

Best regards,
Etsuro Fujita
On 2015/07/07 19:15, Etsuro Fujita wrote:
> On 2015/07/06 9:42, Kouhei Kaigai wrote:
>> However, we need to pay attention to the advantages over the
>> alternatives. The hook in add_paths_to_joinrel() enables implementing
>> the identical stuff with less complicated logic to reproduce the
>> left/right relations from the RelOptInfo of the joinrel. (Note that
>> RelOptInfo->fdw_private enables avoiding path construction multiple
>> times.)
>> I'm uncertain why this API change is necessary to fix the problem
>> around EvalPlanQual.
>
> Yeah, maybe we wouldn't need any API change. I think we would be able
> to fix this by complicating add_path as you pointed out upthread. I'm
> not sure that complicating it is a good idea, though. I think it might
> be possible that the callback in standard_join_search would allow us
> to fix this without complicating the core path-cost-comparison stuff
> such as add_path. I noticed that what I proposed upthread doesn't work
> properly, though.

To resolve this issue, I tried to make the core create an alternative
plan to be used in an EvalPlanQual recheck, instead of the foreign scan
that performs the foreign join remotely (ie, scanrelid = 0). But I have
changed that idea. Instead, I'd like to propose that it be the FDW's
responsibility to provide such a plan. Specifically, I'd propose that
(1) we add a new Path field, say subpath, to the ForeignPath data
structure, and that (2) when generating a ForeignPath node for a foreign
join, an FDW must provide the subpath Path node by itself. As before,
it'd be recommended to use

  ForeignPath *
  create_foreignscan_path(PlannerInfo *root, RelOptInfo *rel,
                          double rows, Cost startup_cost, Cost total_cost,
                          List *pathkeys,
                          Relids required_outer,
                          Path *subpath,
                          List *fdw_private);

where subpath is the subpath Path node that has the given pathkeys and
the required_outer rels. (subpath is NULL when scanning a base relation.)
Also, it'd be recommended that an FDW generate such ForeignPath nodes by
considering, for each of the paths in the rel's pathlist, whether to push
down that path (ie, whether to generate a ForeignPath node for a foreign
join that has the same pathkeys and parameterization as that path).
Then, when generating the ForeignPath node, that path could be used as
the subpath Path node. (I think the current postgres_fdw patch only
considers an unsorted, unparameterized path for performing a foreign
join remotely, but I think we should also consider presorted and/or
parameterized paths.) I think this idea would apply to the API location
that you proposed. However, ISTM that this idea would work better with
the API call from standard_join_search, because the rel's pathlist at
that point has more paths worthy of consideration, in view of not only
costs and sizes but pathkeys and parameterization. The subplan created
from the subpath Path node could be used in an EvalPlanQual recheck,
instead of the foreign scan that performs the foreign join remotely, as
discussed previously.

Comments welcome!

Best regards,
Etsuro Fujita
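[Editorial note: the proposed subpath field can be sketched as below.
This is a hypothetical, heavily simplified illustration - the real
ForeignPath and create_foreignscan_path carry many more fields than
shown here.]

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical, pared-down shapes of the structures discussed above. */
typedef struct Path
{
    double total_cost;
} Path;

typedef struct ForeignPath
{
    Path  path;
    Path *subpath;    /* proposed new field: local join path for EPQ */
} ForeignPath;

/*
 * Sketch of the extended constructor: the FDW hands over the local path
 * it decided to push down as subpath (NULL for a plain base-relation
 * scan); the core would later turn it into the sub-plan that runs
 * during an EvalPlanQual recheck.
 */
ForeignPath *
make_foreign_join_path(double total_cost, Path *subpath)
{
    ForeignPath *fp = malloc(sizeof(ForeignPath));

    fp->path.total_cost = total_cost;
    fp->subpath = subpath;
    return fp;
}

/* During EPQ, the executor falls back to the subpath's plan. */
Path *
plan_for_epq_recheck(ForeignPath *fp)
{
    return fp->subpath;
}
```

The design choice being illustrated: because the FDW built the remote
join by pushing down an already-formed local path, that same local path
is guaranteed to have matching pathkeys and parameterization, so it is a
safe drop-in for the recheck.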
On Fri, Jul 3, 2015 at 6:25 AM, Etsuro Fujita
<fujita.etsuro@lab.ntt.co.jp> wrote:
> Can't FDWs get the join information through the root, which I think we
> would pass to the API as the argument?

This is exactly what Tom suggested originally, and it has some appeal,
but neither KaiGai nor I could see how to make it work. Do you have an
idea? It's not too late to go back and change the API.

The problem that was bothering us (or at least what was bothering me) is
that the PlannerInfo provides only a list of SpecialJoinInfo structures,
which don't directly give you the original join order. In fact,
min_righthand and min_lefthand are intended to constrain the *possible*
join orders, and are deliberately designed *not* to specify a single
join order. If you're sending a query to a remote PostgreSQL node, you
don't want to know what all the possible join orders are; it's the
remote side's job to plan the query. You do, however, need an easy way
to identify one join order that you can use to construct a query. It
didn't seem easy to do that without duplicating make_join_rel(), which
seemed like a bad idea. But maybe there's a good way to do it.

Tom wasn't crazy about this hook, both because of the frequency of calls
and also because of the long argument list. I think those concerns are
legitimate; I just couldn't see how to make the other way work.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
> On Fri, Jul 3, 2015 at 6:25 AM, Etsuro Fujita
> <fujita.etsuro@lab.ntt.co.jp> wrote:
> > Can't FDWs get the join information through the root, which I think
> > we would pass to the API as the argument?
>
> This is exactly what Tom suggested originally, and it has some appeal,
> but neither KaiGai nor I could see how to make it work. Do you have an
> idea? It's not too late to go back and change the API.
>
> The problem that was bothering us (or at least what was bothering me)
> is that the PlannerInfo provides only a list of SpecialJoinInfo
> structures, which don't directly give you the original join order. In
> fact, min_righthand and min_lefthand are intended to constrain the
> *possible* join orders, and are deliberately designed *not* to specify
> a single join order. If you're sending a query to a remote PostgreSQL
> node, you don't want to know what all the possible join orders are;
> it's the remote side's job to plan the query. You do, however, need
> an easy way to identify one join order that you can use to construct a
> query. It didn't seem easy to do that without duplicating
> make_join_rel(), which seemed like a bad idea.
>
> But maybe there's a good way to do it. Tom wasn't crazy about this
> hook, both because of the frequency of calls and also because of the
> long argument list. I think those concerns are legitimate; I just
> couldn't see how to make the other way work.
>
I could have a discussion with Fujita-san about this topic.
He has an idea to tackle this matter that is a little bit tricky, but I
didn't have a clear reason to deny it.
At the line just above set_cheapest() in standard_join_search(), at
least one built-in join path has already been added to the RelOptInfo;
thus, the FDW driver can reference the cheapest path made by built-in
logic, and its lefttree and righttree that construct the joinrel.
The assumption is that the best paths chosen by the built-in logic
represent at least as reasonable a join order as the other potential
ones.
Thanks,
--
NEC Business Creation Division / PG-Strom Project
KaiGai Kohei <kaigai@ak.jp.nec.com>
Robert Haas <robertmhaas@gmail.com> writes:
> The problem that was bothering us (or at least what was bothering me)
> is that the PlannerInfo provides only a list of SpecialJoinInfo
> structures, which don't directly give you the original join order. In
> fact, min_righthand and min_lefthand are intended to constrain the
> *possible* join orders, and are deliberately designed *not* to specify
> a single join order. If you're sending a query to a remote PostgreSQL
> node, you don't want to know what all the possible join orders are;
> it's the remote side's job to plan the query. You do, however, need
> an easy way to identify one join order that you can use to construct a
> query. It didn't seem easy to do that without duplicating
> make_join_rel(), which seemed like a bad idea.

In principle it seems like you could traverse root->parse->jointree as a
guide to reconstructing the original syntactic structure; though I'm not
sure how hard it would be to ignore the parts of that tree that
correspond to relations you're not shipping.

> But maybe there's a good way to do it. Tom wasn't crazy about this
> hook both because of the frequency of calls and also because of the
> long argument list. I think those concerns are legitimate; I just
> couldn't see how to make the other way work.

In my vision you probably really only want one call per build_join_rel
event (that is, per construction of a new RelOptInfo), not per
make_join_rel event. It's possible that an FDW that wants to handle
joins but is not talking to a remote query planner would need to grovel
through all the join ordering possibilities individually, and then maybe
hooking at make_join_rel is sensible rather than having to reinvent that
logic. But I'd want to see a concrete use-case first, and I certainly
don't think that that's the main case to design the API around.

			regards, tom lane
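[Editorial note: the jointree-traversal idea above can be sketched with
a miniature stand-in for root->parse->jointree. The node shapes below
(JtNode, RTE_REF, JOIN_EXPR) are invented simplifications of the real
FromExpr/JoinExpr/RangeTblRef nodes, for illustration only.]

```c
#include <assert.h>
#include <stddef.h>

/*
 * A toy model of the parser's join tree: JOIN_EXPR nodes over RTE_REF
 * leaves. A left-to-right depth-first walk recovers one concrete join
 * order - the original syntactic one - which is exactly what an FDW
 * needs to spell out a remote query.
 */
typedef enum { RTE_REF, JOIN_EXPR } NodeTag;

typedef struct JtNode
{
    NodeTag tag;
    int     rtindex;             /* valid for RTE_REF leaves */
    struct JtNode *larg, *rarg;  /* valid for JOIN_EXPR nodes */
} JtNode;

/* Append base-relation indexes in syntactic order; returns new count. */
int
collect_join_order(const JtNode *n, int *out, int k)
{
    if (n == NULL)
        return k;
    if (n->tag == RTE_REF)
    {
        out[k++] = n->rtindex;
        return k;
    }
    k = collect_join_order(n->larg, out, k);
    return collect_join_order(n->rarg, out, k);
}
```

For (t1 JOIN t2) JOIN t3, the walk yields 1, 2, 3 - a single usable join
order without enumerating the alternatives the planner considered. The
hard part Tom mentions (skipping relations that are not being shipped)
would be an extra filter on the rtindex at each leaf.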
> I could have a discussion with Fujita-san about this topic.
>
Also, let me share the discussion towards an entire solution.

The primitive reason for this problem is that a Scan node with
scanrelid==0 represents a relation join that can involve multiple
relations; thus, the TupleDesc of its records will not fit any base
relation. However, ExecScanFetch() was not updated when scanrelid==0
got supported.

The FDW/CSP behind a Scan node with scanrelid==0 is responsible for
generating records according to the fdw_/custom_scan_tlist that reflects
the definition of the relation join, and only the FDW/CSP knows how to
combine these base relations.
In addition, host-side expressions (like Plan->qual) are initialized to
reference the records generated by the FDW/CSP, so the least invasive
approach is to allow the FDW/CSP to have its own recheck logic, I think.

Below is the structure of ExecScanFetch():

  ExecScanFetch(ScanState *node,
                ExecScanAccessMtd accessMtd,
                ExecScanRecheckMtd recheckMtd)
  {
      EState     *estate = node->ps.state;

      if (estate->es_epqTuple != NULL)
      {
          /*
           * We are inside an EvalPlanQual recheck.  Return the test tuple
           * if one is available, after rechecking any
           * access-method-specific conditions.
           */
          Index       scanrelid = ((Scan *) node->ps.plan)->scanrelid;

          Assert(scanrelid > 0);
          if (estate->es_epqTupleSet[scanrelid - 1])
          {
              TupleTableSlot *slot = node->ss_ScanTupleSlot;
              :
              return slot;
          }
      }
      return (*accessMtd) (node);
  }

When we are inside EPQ, it fetches a tuple from the es_epqTuple[] array
and checks its visibility (ForeignRecheck() always says "yep, it's
visible"), then ExecScan() applies its qualifiers via ExecQual().
So, as long as the FDW/CSP can return a record that satisfies the
TupleDesc of this relation, built from the tuples in the es_epqTuple[]
array, the rest of the code paths are common.

I have an idea to solve the problem. It adds a recheckMtd() call if
scanrelid==0 just before the assertion above, and adds an FDW callback
in ForeignRecheck().
The role of this new callback is to set up the supplied TupleTableSlot
and check its visibility, but it does not define how to do this; that is
left to the FDW driver, e.g. by invoking an alternative plan consisting
only of built-in logic.

Invoking an alternative plan is one of the most feasible ways to
implement the EPQ logic in an FDW, so I think FDWs also need a mechanism
that takes child path nodes, like custom_paths in the CustomPath node.
Once valid path nodes are linked to this list, createplan.c transforms
them into the relevant plan nodes; the FDW can then initialize and
invoke these plan nodes during execution, as in ForeignRecheck().

This design can solve another problem Fujita-san has also mentioned. If
a scan qualifier is pushed down into the remote query and its expression
node is saved in the private area of the ForeignScan, the callback in
ForeignRecheck() can evaluate the qualifier by itself. (Note that only
the FDW driver can know where and how the pushed-down expression node is
saved in the private area.)

In summary, the following three enhancements are a straightforward way
to fix the problem he reported:

1. Add a special path that calls recheckMtd in ExecScanFetch if
   scanrelid==0.
2. Add an FDW callback in ForeignRecheck() - to construct a record
   according to the fdw_scan_tlist definition and evaluate its
   visibility, or to evaluate a pushed-down qualifier in the case of a
   base relation.
3. Add a List *fdw_paths field to ForeignPath, like custom_paths of
   CustomPath, to construct the plan nodes for EPQ evaluation.

On the other hand, we also need to pay attention to the development
timeline. This is really a problem of 9.5; however, it looks to me like
the straightforward solution needs an enhancement of the FDW APIs.
I'd like to see people's comments.
--
NEC Business Creation Division / PG-Strom Project
KaiGai Kohei <kaigai@ak.jp.nec.com>

> -----Original Message-----
> From: pgsql-hackers-owner@postgresql.org
> [mailto:pgsql-hackers-owner@postgresql.org] On Behalf Of Kouhei Kaigai
> Sent: Saturday, August 01, 2015 10:35 PM
> To: Robert Haas; Etsuro Fujita
> Cc: PostgreSQL-development; 花田茂
> Subject: Re: [HACKERS] Foreign join pushdown vs EvalPlanQual
>
> > On Fri, Jul 3, 2015 at 6:25 AM, Etsuro Fujita
> > <fujita.etsuro@lab.ntt.co.jp> wrote:
> > > Can't FDWs get the join information through the root, which I think
> > > we would pass to the API as the argument?
> >
> > This is exactly what Tom suggested originally, and it has some appeal,
> > but neither KaiGai nor I could see how to make it work. Do you have
> > an idea? It's not too late to go back and change the API.
> >
> > The problem that was bothering us (or at least what was bothering me)
> > is that the PlannerInfo provides only a list of SpecialJoinInfo
> > structures, which don't directly give you the original join order. In
> > fact, min_righthand and min_lefthand are intended to constrain the
> > *possible* join orders, and are deliberately designed *not* to specify
> > a single join order. If you're sending a query to a remote PostgreSQL
> > node, you don't want to know what all the possible join orders are;
> > it's the remote side's job to plan the query. You do, however, need
> > an easy way to identify one join order that you can use to construct a
> > query. It didn't seem easy to do that without duplicating
> > make_join_rel(), which seemed like a bad idea.
> >
> > But maybe there's a good way to do it. Tom wasn't crazy about this
> > hook both because of the frequency of calls and also because of the
> > long argument list. I think those concerns are legitimate; I just
> > couldn't see how to make the other way work.
> >
> I could have a discussion with Fujita-san about this topic.
> He has an idea to tackle this matter that is a little bit tricky, but I didn't have a clear reason to deny it. > At the line just above set_cheapest() in standard_join_search(), > at least one join path by built-in logic has already been added to the RelOptInfo; > thus, the FDW driver can reference the cheapest path by built-in logic > and its lefttree and righttree that construct the joinrel. > The assumption is that the best paths by built-in logic give an at least > reasonable join order compared to other potential ones. > > Thanks, > -- > NEC Business Creation Division / PG-Strom Project > KaiGai Kohei <kaigai@ak.jp.nec.com> > > > -- > Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) > To make changes to your subscription: > http://www.postgresql.org/mailpref/pgsql-hackers
On Fri, Aug 7, 2015 at 3:37 AM, Kouhei Kaigai <kaigai@ak.jp.nec.com> wrote:
>> I could have a discussion with Fujita-san about this topic.
>>
> Also, let me share with the discussion towards entire solution.
>
> The primitive reason of this problem is, Scan node with scanrelid==0
> represents a relation join that can involve multiple relations, thus,
> its TupleDesc of the records will not fit base relations, however,
> ExecScanFetch() was not updated when scanrelid==0 gets supported.
>
> FDW/CSP on behalf of the Scan node with scanrelid==0 are responsible
> to generate records according to the fdw_/custom_scan_tlist that
> reflects the definition of relation join, and only FDW/CSP know how
> to combine these base relations.
> In addition, host-side expressions (like Plan->qual) are initialized
> to reference the records generated by FDW/CSP, so the least invasive
> approach is to allow FDW/CSP to have own logic to recheck, I think.
>
> Below is the structure of ExecScanFetch().
>
> ExecScanFetch(ScanState *node,
>               ExecScanAccessMtd accessMtd,
>               ExecScanRecheckMtd recheckMtd)
> {
>     EState     *estate = node->ps.state;
>
>     if (estate->es_epqTuple != NULL)
>     {
>         /*
>          * We are inside an EvalPlanQual recheck.  Return the test tuple if
>          * one is available, after rechecking any access-method-specific
>          * conditions.
>          */
>         Index       scanrelid = ((Scan *) node->ps.plan)->scanrelid;
>
>         Assert(scanrelid > 0);
>         if (estate->es_epqTupleSet[scanrelid - 1])
>         {
>             TupleTableSlot *slot = node->ss_ScanTupleSlot;
>             :
>             return slot;
>         }
>     }
>     return (*accessMtd) (node);
> }
>
> When we are inside of EPQ, it fetches a tuple in es_epqTuple[] array and
> checks its visibility (ForeignRecheck() always say 'yep, it is visible'),
> then ExecScan() applies its qualifiers by ExecQual().
> So, as long as FDW/CSP can return a record that satisfies the TupleDesc
> of this relation, made by the tuples in es_epqTuple[] array, rest of the
> code paths are common.
>
> I have an idea to solve the problem.
> It adds recheckMtd() call if scanrelid==0 just before the assertion above, > and add a callback of FDW on ForeignRecheck(). > The role of this new callback is to set up the supplied TupleTableSlot > and check its visibility, but does not define how to do this. > It is arbitrarily by FDW driver, like invocation of alternative plan > consists of only built-in logic. > > Invocation of alternative plan is one of the most feasible way to > implement EPQ logic on FDW, so I think FDW also needs a mechanism > that takes child path-nodes like custom_paths in CustomPath node. > Once a valid path node is linked to this list, createplan.c transform > them to relevant plan node, then FDW can initialize and invoke this > plan node during execution, like ForeignRecheck(). > > This design can solve another problem Fujita-san has also mentioned. > If scan qualifier is pushed-down to the remote query and its expression > node is saved in the private area of ForeignScan, the callback on > ForeignRecheck() can evaluate the qualifier by itself. (Note that only > FDW driver can know where and how expression node being pushed-down > is saved in the private area.) > > In the summary, the following three enhancements are a straightforward > way to fix up the problem he reported. > 1. Add a special path to call recheckMtd in ExecScanFetch if scanrelid==0 > 2. Add a callback of FDW in ForeignRecheck() - to construct a record > according to the fdw_scan_tlist definition and to evaluate its > visibility, or to evaluate qualifier pushed-down if base relation. > 3. Add List *fdw_paths in ForeignPath like custom_paths of CustomPaths, > to construct plan nodes for EPQ evaluation. > > On the other hands, we also need to pay attention the development > timeline. It is a really problem of v9.5, however, it looks to me > the straight forward solution needs enhancement of FDW APIs. > > I'd like to see people's comment. I'm not an expert in this area, but this plan does not seem unreasonable to me. 
-- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
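The proposed special path can be modeled as a small self-contained sketch. The stub types and toy return values below are assumptions for illustration, not the real executor structures: inside an EPQ recheck, a node with scanrelid == 0 is dispatched to the recheck method before the scanrelid > 0 assertion is reached.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Simplified stand-ins for ScanState/EState (assumptions, not real). */
typedef struct ScanStateModel
{
    int         scanrelid;      /* 0 means the node is a pushed-down join */
    bool        epq_active;     /* models estate->es_epqTuple != NULL */
    bool        epq_tuple_set;  /* models estate->es_epqTupleSet[scanrelid - 1] */
} ScanStateModel;

typedef const char *(*AccessMtd) (ScanStateModel *);
typedef const char *(*RecheckMtd) (ScanStateModel *);

/*
 * Models the proposed fix: inside an EPQ recheck, a scanrelid==0 node is
 * handed to recheckMtd (where the FDW/CSP rebuilds the joined row) before
 * the scanrelid > 0 assertion is reached.
 */
static const char *
exec_scan_fetch_model(ScanStateModel *node, AccessMtd accessMtd,
                      RecheckMtd recheckMtd)
{
    if (node->epq_active)
    {
        if (node->scanrelid == 0)
            return recheckMtd(node);    /* new special path for joins */

        assert(node->scanrelid > 0);
        if (node->epq_tuple_set)
            return "epq-test-tuple";    /* return the stored test tuple */
    }
    return accessMtd(node);             /* normal scan path */
}

static const char *access_mtd(ScanStateModel *n)  { (void) n; return "normal-scan"; }
static const char *recheck_mtd(ScanStateModel *n) { (void) n; return "join-recheck"; }
```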
On 2015/08/12 7:21, Robert Haas wrote:
> On Fri, Aug 7, 2015 at 3:37 AM, Kouhei Kaigai <kaigai@ak.jp.nec.com> wrote:
>> [...]
>> In the summary, the following three enhancements are a straightforward
>> way to fix up the problem he reported.
>> 1. Add a special path to call recheckMtd in ExecScanFetch if scanrelid==0
>> 2. Add a callback of FDW in ForeignRecheck() - to construct a record
>>    according to the fdw_scan_tlist definition and to evaluate its
>>    visibility, or to evaluate qualifier pushed-down if base relation.
>> 3. Add List *fdw_paths in ForeignPath like custom_paths of CustomPaths,
>>    to construct plan nodes for EPQ evaluation.
> I'm not an expert in this area, but this plan does not seem unreasonable to me.

IIRC the discussion with KaiGai-san, I think that would work, though I think it would be more suitable for CSPs. Correct me if I'm wrong, KaiGai-san. In either case, I'm not sure it is a good idea to transfer both kinds of processing to a single callback routine hooked in ForeignRecheck: (a) checking whether the test tuple for each component foreign table satisfies the remote qual condition, and (b) checking whether those tuples satisfy the remote join condition. I think that would be too complicated, probably making the callback routine bug-prone. So, I'd still propose that *the core* processes (a) and (b) *separately*.

* As for (a), the core checks the remote qual condition as in [1].

* As for (b), the core executes an alternative subplan locally if inside an EPQ recheck. The subplan is created as described in [2].

Attached is a WIP patch for that against 9.5 (fdw-eval-plan-qual-0.1.patch), which includes an updated version of the patch in [1]. I haven't done anything about custom joins yet. Also, I left the join pushdown API as-is, but I still think it would be better to hook that API in standard_join_search, so I plan to modify the patch that way in the next version.

For tests, I did a very basic update of the latest postgres_fdw patch in [3] and attach that (foreign_join_v16_efujita.patch). You can apply the patches in the following order:

fdw-eval-plan-qual-0.1.patch
usermapping_matching.patch (in [3])
add_GetUserMappingById.patch (in [3])
foreign_join_v16_efujita.patch

(Note that you cannot do tests of [1]. For that, apply fdw-eval-plan-qual-0.1.patch and the postgres_fdw patch in [1] in this order.)

Comments welcome!
Best regards, Etsuro Fujita [1] http://www.postgresql.org/message-id/55B204A0.1080507@lab.ntt.co.jp [2] http://www.postgresql.org/message-id/55B9F95F.5060506@lab.ntt.co.jp [3] http://www.postgresql.org/message-id/CAEZqfEe9KGy=1_waGh2rgZPg0o4pqgD+iauYaj8wTze+CYJUHg@mail.gmail.com
Attachment
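The division of labor Fujita-san proposes can be illustrated with a minimal sketch. The stand-ins below (toy test tuples and hard-coded quals mirroring the thread's "a = 1" example) are assumptions for illustration, not code from the actual patch: the core evaluates the pushed-down qual for each component table (a), and evaluates the join clause locally via an alternative subplan (b).

```c
#include <assert.h>
#include <stdbool.h>

/* Stand-in for an EPQ test tuple of one component foreign table. */
typedef struct TestTuple { int a; } TestTuple;

/* (a) The core re-evaluates the qual that had been pushed to the remote
 * side, here modeled as the remote qual "a = 1" from the thread's example. */
static bool
eval_pushed_down_qual(const TestTuple *tup)
{
    return tup->a == 1;
}

/* (b) The core joins the component test tuples locally through an
 * alternative subplan, here modeled as the join clause "foo.a = bar.a". */
static bool
run_local_alternative_join(const TestTuple *foo, const TestTuple *bar)
{
    return foo->a == bar->a;
}

/* Under the proposal, the core drives both checks; the FDW supplies
 * neither the qual evaluation nor the local join. */
static bool
epq_recheck_model(const TestTuple *foo, const TestTuple *bar)
{
    return eval_pushed_down_qual(foo)
        && eval_pushed_down_qual(bar)
        && run_local_alternative_join(foo, bar);
}
```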
> -----Original Message-----
> From: pgsql-hackers-owner@postgresql.org
> [mailto:pgsql-hackers-owner@postgresql.org] On Behalf Of Etsuro Fujita
> Sent: Wednesday, August 12, 2015 8:26 PM
> To: Robert Haas; Kaigai Kouhei(海外 浩平)
> Cc: PostgreSQL-development; 花田茂
> Subject: Re: [HACKERS] Foreign join pushdown vs EvalPlanQual
>
> [...]
>
> IIRC the discussion with KaiGai-san, I think that that would work. I
> think that that would be more suitable for CSPs, though. Correct me if
> I'm wrong, KaiGai-san. In either case, I'm not sure that the idea of
> transferring both processing to a single callback routine hooked in
> ForeignRecheck is a good idea: (a) check to see if the test tuple for
> each component foreign table satisfies the remote qual condition and (b)
> check to see if those tuples satisfy the remote join condition. I think
> that that would be too complicated, probably making the callback routine
> bug-prone. So, I'd still propose that *the core* processes (a) and (b)
> *separately*.
>
> * As for (a), the core checks the remote qual condition as in [1].
>
> * As for (b), the core executes an alternative subplan locally if inside
> an EPQ recheck. The subplan is created as described in [2].
>
I don't think it is "too" complicated, because (a) the visibility check of the base tuples (saved in es_epqTuple[]) shall be done in the underlying base foreign-scan nodes, executed as part of the alternative plan, and (b) evaluation of the remote qual is done with an ExecQual() call.

It seems to me your proposition tends to assume a particular design for FDW drivers; however, we already have various kinds of FDW drivers, not only wrappers of remote RDBMSs.
https://wiki.postgresql.org/wiki/Foreign_data_wrappers

Are [1] and [2] suitable for *all* of them, actually?

Let's assume an FDW module that implements its own columnar storage and has a special JOIN capability when both sides are its columnar storage. Does it need an alternative subplan for EPQ rechecks? Probably not, because it has its own capability to run the JOIN by itself. It is inconvenient for this FDW if the core automatically kicks the subplan in spite of the FDW's own functionality/capability.

If potential bugs are the concern, the common part can be cut out and provided as a utility function. The FDW can determine whether it shall be used, but it is never enforced.

Thanks,
--
NEC Business Creation Division / PG-Strom Project
KaiGai Kohei <kaigai@ak.jp.nec.com>
Fujita-san,

The attached patch enhances the FDW interface according to the direction below (but is not tested yet).

>> In the summary, the following three enhancements are a straightforward
>> way to fix up the problem he reported.
>> 1. Add a special path to call recheckMtd in ExecScanFetch if scanrelid==0
>> 2. Add a callback of FDW in ForeignRecheck() - to construct a record
>>    according to the fdw_scan_tlist definition and to evaluate its
>>    visibility, or to evaluate qualifier pushed-down if base relation.
>> 3. Add List *fdw_paths in ForeignPath like custom_paths of CustomPaths,
>>    to construct plan nodes for EPQ evaluation.

Likely, what you need to do is:
1. Save the alternative path in fdw_paths on foreign join push-down. GetForeignJoinPaths() may be called multiple times for a particular joinrel according to the combination of innerrel/outerrel. RelOptInfo->fdw_private allows you to avoid constructing the same remote join path multiple times. On the second or later invocation, it may be a good tactic to reference cheapest_startup_path and replace the saved one if the later invocation has a cheaper one, prior to exit.
2. Save the alternative Plan nodes in fdw_plans or lefttree/righttree, wherever you like, in GetForeignPlan().
3. Make BeginForeignScan() call ExecInitNode() on the plan node saved at (2), then save the PlanState in fdw_ps, lefttree/righttree, or some private area if it should not be displayed by EXPLAIN.
4. Implement the ForeignRecheck() routine. If scanrelid==0, it kicks the planstate node saved at (3) to generate a tuple slot. Then, call ExecQual() to check the qualifiers being pushed down.
5. Make EndForeignScan() call ExecEndNode() on the PlanState saved at (3).

I don't think the above steps are "too" complicated for people who can write FDW drivers. It is what a developer usually does.
Thanks,
--
NEC Business Creation Division / PG-Strom Project
KaiGai Kohei <kaigai@ak.jp.nec.com>

> -----Original Message-----
> From: Kaigai Kouhei(海外 浩平)
> Sent: Wednesday, August 12, 2015 11:17 PM
> To: 'Etsuro Fujita'; Robert Haas
> Cc: PostgreSQL-development; 花田茂
> Subject: RE: [HACKERS] Foreign join pushdown vs EvalPlanQual
>
> [...]
Attachment
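The five steps in the recipe above can be condensed into a toy lifecycle model. All types and names below are illustrative stand-ins, not the real Plan/PlanState nodes or FDW API; in a real driver, steps 3-5 would call ExecInitNode(), ExecQual(), and ExecEndNode() on actual PlanState trees.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Stand-ins for a planned / initialized alternative subplan (assumptions). */
typedef struct AltPlan      { int join_key; } AltPlan;
typedef struct AltPlanState { const AltPlan *plan; bool running; } AltPlanState;

typedef struct FdwScanStateModel
{
    AltPlanState *epq_ps;       /* step 3: subplan state kept in a private area */
} FdwScanStateModel;

/* Step 3: BeginForeignScan initializes the alternative subplan. */
static void
begin_foreign_scan_model(FdwScanStateModel *fss, AltPlanState *ps,
                         const AltPlan *plan)
{
    ps->plan = plan;
    ps->running = true;         /* models ExecInitNode() */
    fss->epq_ps = ps;
}

/* Step 4: ForeignRecheck kicks the subplan to rebuild the joined row, then
 * evaluates the pushed-down qual, here modeled as a simple key comparison. */
static bool
foreign_recheck_model(FdwScanStateModel *fss, int epq_tuple_key)
{
    assert(fss->epq_ps != NULL && fss->epq_ps->running);
    return fss->epq_ps->plan->join_key == epq_tuple_key; /* models ExecQual() */
}

/* Step 5: EndForeignScan shuts the subplan down. */
static void
end_foreign_scan_model(FdwScanStateModel *fss)
{
    fss->epq_ps->running = false;   /* models ExecEndNode() */
    fss->epq_ps = NULL;
}
```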
Fujita-san, How about your opinion towards the solution? CF:Sep will start next week, so I'd like to make a consensus of the direction, at least. Thanks, -- NEC Business Creation Division / PG-Strom Project KaiGai Kohei <kaigai@ak.jp.nec.com> > -----Original Message----- > From: pgsql-hackers-owner@postgresql.org > [mailto:pgsql-hackers-owner@postgresql.org] On Behalf Of Kouhei Kaigai > Sent: Thursday, August 13, 2015 10:13 AM > To: Etsuro Fujita; Robert Haas > Cc: PostgreSQL-development; 花田茂 > Subject: Re: [HACKERS] Foreign join pushdown vs EvalPlanQual > > Fujita-san, > > The attached patch enhanced the FDW interface according to the direction > below (but not tested yet). > > >> In the summary, the following three enhancements are a straightforward > >> way to fix up the problem he reported. > >> 1. Add a special path to call recheckMtd in ExecScanFetch if scanrelid==0 > >> 2. Add a callback of FDW in ForeignRecheck() - to construct a record > >> according to the fdw_scan_tlist definition and to evaluate its > >> visibility, or to evaluate qualifier pushed-down if base relation. > >> 3. Add List *fdw_paths in ForeignPath like custom_paths of CustomPaths, > >> to construct plan nodes for EPQ evaluation. > > Likely, what you need to do are... > 1. Save the alternative path on fdw_paths when foreign join push-down. > GetForeignJoinPaths() may be called multiple times towards a particular > joinrel according to the combination of innerrel/outerrel. > RelOptInfo->fdw_private allows to avoid construction of same remote > join path multiple times. On the second or later invocation, it may be > a good tactics to reference cheapest_startup_path and replace the saved > one if later invocation have cheaper one, prior to exit. > 2. Save the alternative Plan nodes on fdw_plans or lefttree/righttree > somewhere you like at the GetForeignPlan() > 3. 
Makes BeginForeignScan() to call ExecInitNode() towards the plan node > saved at (2), then save the PlanState on fdw_ps, lefttree/righttree, > or somewhere private area if not displayed on EXPLAIN. > 4. Implement ForeignRecheck() routine. If scanrelid==0, it kicks the > planstate node saved at (3) to generate tuple slot. Then, call the > ExecQual() to check qualifiers being pushed down. > 5. Makes EndForeignScab() to call ExecEndNode() towards the PlanState > saved at (3). > > I never think above steps are "too" complicated for people who can write > FDW drivers. It is what developer usually does. > > Thanks, > -- > NEC Business Creation Division / PG-Strom Project > KaiGai Kohei <kaigai@ak.jp.nec.com> > > > > -----Original Message----- > > From: Kaigai Kouhei(海外 浩平) > > Sent: Wednesday, August 12, 2015 11:17 PM > > To: 'Etsuro Fujita'; Robert Haas > > Cc: PostgreSQL-development; 花田茂 > > Subject: RE: [HACKERS] Foreign join pushdown vs EvalPlanQual > > > > > -----Original Message----- > > > From: pgsql-hackers-owner@postgresql.org > > > [mailto:pgsql-hackers-owner@postgresql.org] On Behalf Of Etsuro Fujita > > > Sent: Wednesday, August 12, 2015 8:26 PM > > > To: Robert Haas; Kaigai Kouhei(海外 浩平) > > > Cc: PostgreSQL-development; 花田茂 > > > Subject: Re: [HACKERS] Foreign join pushdown vs EvalPlanQual > > > > > > On 2015/08/12 7:21, Robert Haas wrote: > > > > On Fri, Aug 7, 2015 at 3:37 AM, Kouhei Kaigai <kaigai@ak.jp.nec.com> wrote: > > > >>> I could have a discussion with Fujita-san about this topic. > > > >>> > > > >> Also, let me share with the discussion towards entire solution. > > > >> > > > >> The primitive reason of this problem is, Scan node with scanrelid==0 > > > >> represents a relation join that can involve multiple relations, thus, > > > >> its TupleDesc of the records will not fit base relations, however, > > > >> ExecScanFetch() was not updated when scanrelid==0 gets supported. 
> > > >> > > > >> FDW/CSP on behalf of the Scan node with scanrelid==0 are responsible > > > >> to generate records according to the fdw_/custom_scan_tlist that > > > >> reflects the definition of relation join, and only FDW/CSP know how > > > >> to combine these base relations. > > > >> In addition, host-side expressions (like Plan->qual) are initialized > > > >> to reference the records generated by FDW/CSP, so the least invasive > > > >> approach is to allow FDW/CSP to have own logic to recheck, I think. > > > >> > > > >> Below is the structure of ExecScanFetch(). > > > >> > > > >> ExecScanFetch(ScanState *node, > > > >> ExecScanAccessMtd accessMtd, > > > >> ExecScanRecheckMtd recheckMtd) > > > >> { > > > >> EState *estate = node->ps.state; > > > >> > > > >> if (estate->es_epqTuple != NULL) > > > >> { > > > >> /* > > > >> * We are inside an EvalPlanQual recheck. Return the test tuple > > > if > > > >> * one is available, after rechecking any > access-method-specific > > > >> * conditions. > > > >> */ > > > >> Index scanrelid = ((Scan *) node->ps.plan)->scanrelid; > > > >> > > > >> Assert(scanrelid > 0); > > > >> if (estate->es_epqTupleSet[scanrelid - 1]) > > > >> { > > > >> TupleTableSlot *slot = node->ss_ScanTupleSlot; > > > >> : > > > >> return slot; > > > >> } > > > >> } > > > >> return (*accessMtd) (node); > > > >> } > > > >> > > > >> When we are inside of EPQ, it fetches a tuple in es_epqTuple[] array and > > > >> checks its visibility (ForeignRecheck() always say 'yep, it is visible'), > > > >> then ExecScan() applies its qualifiers by ExecQual(). > > > >> So, as long as FDW/CSP can return a record that satisfies the TupleDesc > > > >> of this relation, made by the tuples in es_epqTuple[] array, rest of the > > > >> code paths are common. > > > >> > > > >> I have an idea to solve the problem. > > > >> It adds recheckMtd() call if scanrelid==0 just before the assertion above, > > > >> and add a callback of FDW on ForeignRecheck(). 
> > > >> The role of this new callback is to set up the supplied TupleTableSlot > > > >> and check its visibility, but does not define how to do this. > > > >> It is arbitrarily by FDW driver, like invocation of alternative plan > > > >> consists of only built-in logic. > > > >> > > > >> Invocation of alternative plan is one of the most feasible way to > > > >> implement EPQ logic on FDW, so I think FDW also needs a mechanism > > > >> that takes child path-nodes like custom_paths in CustomPath node. > > > >> Once a valid path node is linked to this list, createplan.c transform > > > >> them to relevant plan node, then FDW can initialize and invoke this > > > >> plan node during execution, like ForeignRecheck(). > > > >> > > > >> This design can solve another problem Fujita-san has also mentioned. > > > >> If scan qualifier is pushed-down to the remote query and its expression > > > >> node is saved in the private area of ForeignScan, the callback on > > > >> ForeignRecheck() can evaluate the qualifier by itself. (Note that only > > > >> FDW driver can know where and how expression node being pushed-down > > > >> is saved in the private area.) > > > >> > > > >> In the summary, the following three enhancements are a straightforward > > > >> way to fix up the problem he reported. > > > >> 1. Add a special path to call recheckMtd in ExecScanFetch if scanrelid==0 > > > >> 2. Add a callback of FDW in ForeignRecheck() - to construct a record > > > >> according to the fdw_scan_tlist definition and to evaluate its > > > >> visibility, or to evaluate qualifier pushed-down if base relation. > > > >> 3. Add List *fdw_paths in ForeignPath like custom_paths of CustomPaths, > > > >> to construct plan nodes for EPQ evaluation. > > > > > > > I'm not an expert in this area, but this plan does not seem unreasonable > to > > > me. > > > > > > IIRC the discussion with KaiGai-san, I think that that would work. I > > > think that that would be more suitable for CSPs, though. 
> > > Correct me if I'm wrong, KaiGai-san. In either case, I'm not sure
> > > that the idea of combining both kinds of processing into a single
> > > callback routine hooked in ForeignRecheck is a good one: (a) checking
> > > whether the test tuple for each component foreign table satisfies the
> > > remote qual condition, and (b) checking whether those tuples satisfy
> > > the remote join condition. I think that would be too complicated,
> > > probably making the callback routine bug-prone. So, I'd still propose
> > > that *the core* processes (a) and (b) *separately*.
> > >
> > > * As for (a), the core checks the remote qual condition as in [1].
> > >
> > > * As for (b), the core executes an alternative subplan locally if
> > >   inside an EPQ recheck. The subplan is created as described in [2].
> >
> > I don't think it is "too" complicated, because (a) the visibility check
> > of the base tuples (saved in es_epqTuple[]) shall be done in the
> > underlying base foreign-scan nodes, executed as part of the alternative
> > plan, and (b) the evaluation of the remote qual is done with an
> > ExecQual() call.
> >
> > It seems to me your proposition tends to assume a particular design for
> > FDW drivers; however, we already have various kinds of FDW drivers, not
> > only wrappers of remote RDBMSs.
> > https://wiki.postgresql.org/wiki/Foreign_data_wrappers
> >
> > Are [1] and [2] actually suitable for *all* of them?
> >
> > Let's assume an FDW module that implements its own columnar storage and
> > has a special JOIN capability if both sides are its columnar storage.
> > Does it need an alternative sub-plan for EPQ rechecks? Probably not,
> > because it has its own capability to run the JOIN by itself.
> > It is inconvenient for this FDW if the core automatically kicks a
> > sub-plan in spite of its own functionality/capability.
> >
> > If potential bugs are a concern, the common part shall be cut out and
> > provided as a utility function. The FDW can decide whether to use it,
> > but it should never be enforced.
> >
> > Thanks,
> > --
> > NEC Business Creation Division / PG-Strom Project
> > KaiGai Kohei <kaigai@ak.jp.nec.com>
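[Editor's note: KaiGai's proposed ExecScanFetch change can be modeled in miniature. This is a minimal, self-contained sketch; every type here is a hypothetical stand-in, since the real ScanState, EState, and slot machinery live in the backend. It shows only the dispatch idea: inside an EPQ recheck, scanrelid == 0 hands control to the recheck method instead of tripping the scanrelid > 0 assertion.]

```c
#include <stddef.h>
#include <stdbool.h>

/* Stand-ins for executor types; hypothetical, for illustration only. */
typedef int TupleTableSlot;

typedef struct ScanState
{
    int             scanrelid;   /* 0 means a pushed-down join */
    bool            inside_epq;  /* models estate->es_epqTuple != NULL */
    TupleTableSlot *epq_slot;    /* models es_epqTuple[scanrelid - 1] */
} ScanState;

typedef TupleTableSlot *(*ExecScanAccessMtd) (ScanState *node);
typedef TupleTableSlot *(*ExecScanRecheckMtd) (ScanState *node);

/*
 * Sketch of the proposed dispatch: inside an EPQ recheck, a scan with
 * scanrelid == 0 represents a joined relation, so control is handed to
 * the recheck method (implemented by the FDW/CSP) instead of asserting
 * scanrelid > 0 as the current code does.
 */
static TupleTableSlot *
exec_scan_fetch_sketch(ScanState *node,
                       ExecScanAccessMtd accessMtd,
                       ExecScanRecheckMtd recheckMtd)
{
    if (node->inside_epq)
    {
        if (node->scanrelid == 0)
            return recheckMtd(node);  /* FDW/CSP-specific EPQ recheck */
        if (node->epq_slot != NULL)
            return node->epq_slot;    /* test tuple for a base relation */
    }
    return accessMtd(node);           /* normal scan path */
}

/* Demo methods so the dispatch can be exercised. */
static TupleTableSlot access_tuple = 1;
static TupleTableSlot recheck_tuple = 2;

static TupleTableSlot *demo_access(ScanState *node)  { (void) node; return &access_tuple; }
static TupleTableSlot *demo_recheck(ScanState *node) { (void) node; return &recheck_tuple; }
```

A joined scan inside EPQ takes the recheck path; a base-relation scan outside EPQ takes the normal access path.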
Hi KaiGai-san,

On 2015/08/25 10:18, Kouhei Kaigai wrote:
> How about your opinion towards the solution?

>> Likely, what you need to do are...
>> 1. Save the alternative path in fdw_paths at foreign join push-down.
>>    GetForeignJoinPaths() may be called multiple times for a particular
>>    joinrel according to the combination of innerrel/outerrel.
>>    RelOptInfo->fdw_private allows you to avoid constructing the same
>>    remote join path multiple times. On the second or later invocation,
>>    it may be a good tactic to reference cheapest_startup_path and
>>    replace the saved one if a later invocation has a cheaper one,
>>    prior to exit.

I'm not sure that tactic is a good one. I think you probably assume that GetForeignJoinPaths executes set_cheapest each time it gets called, but ISTM that that would be expensive. (That is one of the reasons why I think it would be better to hook that routine in standard_join_search.)

>> 2. Save the alternative Plan nodes in fdw_plans or lefttree/righttree,
>>    or somewhere you like, at GetForeignPlan().
>> 3. Make BeginForeignScan() call ExecInitNode() on the plan node saved
>>    at (2), then save the PlanState in fdw_ps, lefttree/righttree, or
>>    some private area if it is not to be displayed in EXPLAIN.
>> 4. Implement the ForeignRecheck() routine. If scanrelid==0, it kicks
>>    the planstate node saved at (3) to generate a tuple slot, then
>>    calls ExecQual() to check the qualifiers being pushed down.
>> 5. Make EndForeignScan() call ExecEndNode() on the PlanState saved
>>    at (3).

>> I never think the above steps are "too" complicated for people who
>> can write FDW drivers. It is what developers usually do.

Sorry, my explanation was not accurate, but the design that you proposed looks complicated beyond necessity.
I think we should add an FDW API for doing something when FDWs have more knowledge about doing it than the core, but in your proposal, instead of the core, an FDW eventually has to do a lot of the core's work: ExecInitNode, ExecProcNode, ExecQual, ExecEndNode, and so on.

Thank you for the comments!

Best regards,
Etsuro Fujita
> On 2015/08/25 10:18, Kouhei Kaigai wrote:
> > How about your opinion towards the solution?
>
> >> Likely, what you need to do are...
> >> 1. Save the alternative path in fdw_paths at foreign join push-down.
> >>    GetForeignJoinPaths() may be called multiple times for a
> >>    particular joinrel according to the combination of
> >>    innerrel/outerrel. RelOptInfo->fdw_private allows you to avoid
> >>    constructing the same remote join path multiple times. On the
> >>    second or later invocation, it may be a good tactic to reference
> >>    cheapest_startup_path and replace the saved one if a later
> >>    invocation has a cheaper one, prior to exit.
>
> I'm not sure that tactic is a good one. I think you probably assume
> that GetForeignJoinPaths executes set_cheapest each time it gets
> called, but ISTM that that would be expensive. (That is one of the
> reasons why I think it would be better to hook that routine in
> standard_join_search.)
>
Here are two different problems. I'd like to identify whether each problem is "must be solved" or "nice to have". Obviously, the failure of the EPQ check is a problem that must be solved; the hook location, however, is nice to have.

In addition, you may misunderstand my proposition above. You can check RelOptInfo->fdw_private at the top of GetForeignJoinPaths; then, if it is the second or later invocation, you can check the cost of the alternative path kept in the previously constructed ForeignPath node. If cheapest_total_path at the moment of the GetForeignJoinPaths invocation is cheaper than the saved alternative path, you can adjust the node to replace the alternative path node.

> >> 2. Save the alternative Plan nodes in fdw_plans or
> >>    lefttree/righttree, or somewhere you like, at GetForeignPlan().
> >> 3. Make BeginForeignScan() call ExecInitNode() on the plan node
> >>    saved at (2), then save the PlanState in fdw_ps,
> >>    lefttree/righttree, or some private area if it is not to be
> >>    displayed in EXPLAIN.
> >> 4. Implement the ForeignRecheck() routine.
If scanrelid==0, it kicks
> >>    the planstate node saved at (3) to generate a tuple slot, then
> >>    calls ExecQual() to check the qualifiers being pushed down.
> >> 5. Make EndForeignScan() call ExecEndNode() on the PlanState saved
> >>    at (3).
>
> >> I never think the above steps are "too" complicated for people who
> >> can write FDW drivers. It is what developers usually do.
>
> Sorry, my explanation was not accurate, but the design that you
> proposed looks complicated beyond necessity. I think we should add an
> FDW API for doing something when FDWs have more knowledge about doing
> it than the core, but in your proposal, instead of the core, an FDW
> eventually has to do a lot of the core's work: ExecInitNode,
> ExecProcNode, ExecQual, ExecEndNode, and so on.
>
It is a trade-off between interface flexibility and the smallness of the FDW extension's code when it fits within the scope of core support. I stand on the viewpoint that gives the highest priority to flexibility, especially when unpredictable types of modules are expected. Your proposition is comfortable for an FDW on behalf of an RDBMS; however, nobody can promise it is also beneficial to an FDW on behalf of a columnar store, for example.

If you are concerned about the code size of FDWs on behalf of RDBMSs, we can add utility functions in foreign.c or somewhere. They would provide equivalent functionality, but the FDW could decide whether to use those routines.

Thanks,
--
NEC Business Creation Division / PG-Strom Project
KaiGai Kohei <kaigai@ak.jp.nec.com>
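[Editor's note: the five-step recipe above can be modeled with stubs. A minimal sketch, using hypothetical stand-in types (the real hooks take ForeignScanState and drive the executor's ExecInitNode/ExecProcNode/ExecEndNode): the FDW keeps an alternative sub-plan in its private state, initializes it at Begin, drives it during an EPQ recheck, and shuts it down at End.]

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins: a "sub-plan" that yields a fixed number of
 * tuples, and the FDW's private state holding it. */
typedef struct AltSubPlan
{
    bool initialized;    /* set by the ExecInitNode stand-in */
    int  tuples_left;    /* tuples the sub-plan can still return */
} AltSubPlan;

typedef struct FdwPrivateState
{
    AltSubPlan *alt_plan;  /* saved at plan time (steps 1-2) */
} FdwPrivateState;

/* Step 3: BeginForeignScan initializes the saved sub-plan. */
static void
begin_foreign_scan_sketch(FdwPrivateState *fdw, AltSubPlan *saved_plan)
{
    fdw->alt_plan = saved_plan;
    fdw->alt_plan->initialized = true;       /* models ExecInitNode() */
}

/* Step 4: ForeignRecheck drives the sub-plan for the test tuple and
 * reports whether a tuple satisfying the pushed-down quals came back. */
static bool
foreign_recheck_sketch(FdwPrivateState *fdw)
{
    AltSubPlan *plan = fdw->alt_plan;

    if (plan == NULL || !plan->initialized || plan->tuples_left == 0)
        return false;                        /* no tuple: recheck fails */
    plan->tuples_left--;                     /* models ExecProcNode() + ExecQual() */
    return true;
}

/* Step 5: EndForeignScan shuts the sub-plan down. */
static void
end_foreign_scan_sketch(FdwPrivateState *fdw)
{
    if (fdw->alt_plan != NULL)
        fdw->alt_plan->initialized = false;  /* models ExecEndNode() */
}
```

The lifecycle mirrors the recipe: Begin initializes, Recheck consumes the single EPQ test tuple, End tears down.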
On 2015/08/26 13:49, Kouhei Kaigai wrote:
>> On 2015/08/25 10:18, Kouhei Kaigai wrote:
>>>> Likely, what you need to do are...
>>>> 1. Save the alternative path in fdw_paths at foreign join push-down.
>>>>    GetForeignJoinPaths() may be called multiple times for a
>>>>    particular joinrel according to the combination of
>>>>    innerrel/outerrel. RelOptInfo->fdw_private allows you to avoid
>>>>    constructing the same remote join path multiple times. On the
>>>>    second or later invocation, it may be a good tactic to reference
>>>>    cheapest_startup_path and replace the saved one if a later
>>>>    invocation has a cheaper one, prior to exit.

>> I'm not sure that tactic is a good one. I think you probably assume
>> that GetForeignJoinPaths executes set_cheapest each time it gets
>> called, but ISTM that that would be expensive. (That is one of the
>> reasons why I think it would be better to hook that routine in
>> standard_join_search.)

> Here are two different problems. I'd like to identify whether each
> problem is "must be solved" or "nice to have". Obviously, the failure
> of the EPQ check is a problem that must be solved; the hook location,
> however, is nice to have.

OK, I'll focus on the "must be solved" problem, at least on this thread.

> In addition, you may misunderstand my proposition above. You can check
> RelOptInfo->fdw_private at the top of GetForeignJoinPaths; then, if it
> is the second or later invocation, you can check the cost of the
> alternative path kept in the previously constructed ForeignPath node.
> If cheapest_total_path at the moment of the GetForeignJoinPaths
> invocation is cheaper than the saved alternative path, you can adjust
> the node to replace the alternative path node.

To get the (probably unparameterized) cheapest_total_path, IIUC, we need to do set_cheapest during GetForeignJoinPaths in each subsequent invocation of that routine, don't we? And set_cheapest is expensive, isn't it?

>>>> 2.
Save the alternative Plan nodes in fdw_plans or lefttree/righttree,
>>>>    or somewhere you like, at GetForeignPlan().
>>>> 3. Make BeginForeignScan() call ExecInitNode() on the plan node
>>>>    saved at (2), then save the PlanState in fdw_ps,
>>>>    lefttree/righttree, or some private area if it is not to be
>>>>    displayed in EXPLAIN.
>>>> 4. Implement the ForeignRecheck() routine. If scanrelid==0, it
>>>>    kicks the planstate node saved at (3) to generate a tuple slot,
>>>>    then calls ExecQual() to check the qualifiers being pushed down.
>>>> 5. Make EndForeignScan() call ExecEndNode() on the PlanState saved
>>>>    at (3).

>> but the design that you proposed looks complicated beyond necessity.
>> I think we should add an FDW API for doing something when FDWs have
>> more knowledge about doing it than the core, but in your proposal,
>> instead of the core, an FDW eventually has to do a lot of the core's
>> work: ExecInitNode, ExecProcNode, ExecQual, ExecEndNode, and so on.

> It is a trade-off between interface flexibility and the smallness of
> the FDW extension's code when it fits within the scope of core
> support. I stand on the viewpoint that gives the highest priority to
> flexibility, especially when unpredictable types of modules are
> expected. Your proposition is comfortable for an FDW on behalf of an
> RDBMS; however, nobody can promise it is also beneficial to an FDW on
> behalf of a columnar store, for example.

Maybe I'm missing something, but why do we need such flexibility for columnar stores?

> If you are concerned about the code size of FDWs on behalf of RDBMSs,
> we can add utility functions in foreign.c or somewhere. They would
> provide equivalent functionality, but the FDW could decide whether to
> use those routines.

That might be an idea, but I'd like to hear the opinions of others.

Best regards,
Etsuro Fujita
> > In addition, you may misunderstand my proposition above. You can
> > check RelOptInfo->fdw_private at the top of GetForeignJoinPaths;
> > then, if it is the second or later invocation, you can check the
> > cost of the alternative path kept in the previously constructed
> > ForeignPath node. If cheapest_total_path at the moment of the
> > GetForeignJoinPaths invocation is cheaper than the saved alternative
> > path, you can adjust the node to replace the alternative path node.
>
> To get the (probably unparameterized) cheapest_total_path, IIUC, we
> need to do set_cheapest during GetForeignJoinPaths in each subsequent
> invocation of that routine, don't we? And set_cheapest is expensive,
> isn't it?
>
add_path() usually drops paths that are obviously worse than others, so a walk over join->pathlist should have a reasonable length. And even if it had hundreds of items on the pathlist, you CAN implement the EPQ fallback using alternative built-in logic.

> >>>> 2. Save the alternative Plan nodes in fdw_plans or
> >>>>    lefttree/righttree, or somewhere you like, at GetForeignPlan().
> >>>> 3. Make BeginForeignScan() call ExecInitNode() on the plan node
> >>>>    saved at (2), then save the PlanState in fdw_ps,
> >>>>    lefttree/righttree, or some private area if it is not to be
> >>>>    displayed in EXPLAIN.
> >>>> 4. Implement the ForeignRecheck() routine. If scanrelid==0, it
> >>>>    kicks the planstate node saved at (3) to generate a tuple
> >>>>    slot, then calls ExecQual() to check the qualifiers being
> >>>>    pushed down.
> >>>> 5. Make EndForeignScan() call ExecEndNode() on the PlanState
> >>>>    saved at (3).
>
> >> but the design that you proposed looks complicated beyond
> >> necessity. I think we should add an FDW API for doing something
> >> when FDWs have more knowledge about doing it than the core, but in
> >> your proposal, instead of the core, an FDW eventually has to do a
> >> lot of the core's work: ExecInitNode, ExecProcNode, ExecQual,
> >> ExecEndNode, and so on.
>
> > It is a trade-off between interface flexibility and the smallness of
> > the FDW extension's code when it fits within the scope of core
> > support. I stand on the viewpoint that gives the highest priority to
> > flexibility, especially when unpredictable types of modules are
> > expected. Your proposition is comfortable for an FDW on behalf of an
> > RDBMS; however, nobody can promise it is also beneficial to an FDW
> > on behalf of a columnar store, for example.
>
> Maybe I'm missing something, but why do we need such flexibility for
> columnar stores?
>
We have various kinds of FDW drivers, and some of their use cases could not be predicted beforehand. Our community knows of 86 FDW drivers in total; only 15 of them are for RDBMSs, while 71 are for other data sources.
https://wiki.postgresql.org/wiki/Foreign_data_wrappers

Even if we force on them a new interface specification comfortable for RDBMSs, we cannot guarantee it is also comfortable for other types of FDW drivers. If module-X wants to implement the EPQ fallback routine by itself, without an alternative plan, an overly rich interface design prevents module-X from doing what it really wants to do.

> > If you are concerned about the code size of FDWs on behalf of
> > RDBMSs, we can add utility functions in foreign.c or somewhere. They
> > would provide equivalent functionality, but the FDW could decide
> > whether to use those routines.
>
> That might be an idea, but I'd like to hear the opinions of others.
>
--
NEC Business Creation Division / PG-Strom Project
KaiGai Kohei <kaigai@ak.jp.nec.com>
On 2015/08/26 16:07, Kouhei Kaigai wrote:
I wrote:
>> Maybe I'm missing something, but why do we need such flexibility for
>> columnar stores?

> Even if we force on them a new interface specification comfortable for
> RDBMSs, we cannot guarantee it is also comfortable for other types of
> FDW drivers.

Specifically, what points about the patch are specific to an RDBMS?

> If module-X wants to implement the EPQ fallback routine by itself,
> without an alternative plan, an overly rich interface design prevents
> module-X from doing what it really wants to do.

Sorry, I fail to see the need or advantage for module-X to do so in practice, because I think EPQ testing only executes a subplan for a *single* set of component test tuples. Maybe I'm missing something, though.

Best regards,
Etsuro Fujita
> On 2015/08/26 16:07, Kouhei Kaigai wrote:
> I wrote:
> >> Maybe I'm missing something, but why do we need such flexibility
> >> for columnar stores?
>
> > Even if we force on them a new interface specification comfortable
> > for RDBMSs, we cannot guarantee it is also comfortable for other
> > types of FDW drivers.
>
> Specifically, what points about the patch are specific to an RDBMS?
>
*** 88,93 ****
ForeignRecheck(ForeignScanState *node, TupleTableSlot *slot)
--- 99,122 ----

TupleTableSlot *
ExecForeignScan(ForeignScanState *node)
{
+   EState     *estate = node->ss.ps.state;
+
+   if (estate->es_epqTuple != NULL)
+   {
+       /*
+        * We are inside an EvalPlanQual recheck.  If foreign join, get
+        * next tuple from subplan.
+        */
+       Index       scanrelid = ((Scan *) node->ss.ps.plan)->scanrelid;
+
+       if (scanrelid == 0)
+       {
+           PlanState  *outerPlan = outerPlanState(node);
+
+           return ExecProcNode(outerPlan);
+       }
+   }
+
    return ExecScan((ScanState *) node,
                    (ExecScanAccessMtd) ForeignNext,
                    (ExecScanRecheckMtd) ForeignRecheck);

It might not be specific to an RDBMS; however, we cannot guarantee that all FDWs are comfortable running the alternative plan node on EPQ recheck. This design does not allow FDW drivers to implement their own EPQ recheck, possibly more efficient than the built-in logic. I never deny running the alternative plan to implement the EPQ recheck according to the decision of the FDW driver; however, it is an unacceptable pain to force every FDW driver to use the alternative plan as the solution for the EPQ check.

> > If module-X wants to implement the EPQ fallback routine by itself,
> > without an alternative plan, an overly rich interface design
> > prevents module-X from doing what it really wants to do.
>
> Sorry, I fail to see the need or advantage for module-X to do so in
> practice, because I think EPQ testing only executes a subplan for a
> *single* set of component test tuples. Maybe I'm missing something,
> though.
>
You may think execution of an alternative plan is the best way to do EPQ rechecks; however, other folks may think their own implementation is best. I never argue about which approach is better. What I point out is the freedom/flexibility of implementation choice.

Thanks,
--
NEC Business Creation Division / PG-Strom Project
KaiGai Kohei <kaigai@ak.jp.nec.com>
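[Editor's note: for contrast, the core-driven approach in the patch quoted above can also be modeled in miniature. A minimal sketch with hypothetical stand-in types: when inside an EPQ recheck and scanrelid == 0, ExecForeignScan short-circuits to the locally attached outer subplan instead of fetching from the remote side.]

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins for the executor types. */
typedef int TupleTableSlot;

typedef struct PlanState
{
    TupleTableSlot *(*exec_proc) (struct PlanState *ps);  /* ExecProcNode stand-in */
    TupleTableSlot  result;
} PlanState;

typedef struct ForeignScanState
{
    int        scanrelid;    /* 0 for a pushed-down join */
    bool       inside_epq;   /* models estate->es_epqTuple != NULL */
    PlanState *outer_plan;   /* models outerPlanState(node): the local
                              * alternative subplan for the join */
} ForeignScanState;

/* Stand-in for the normal ForeignNext/ExecScan remote path. */
static TupleTableSlot remote_tuple = 1;
static TupleTableSlot *
foreign_next_sketch(ForeignScanState *node)
{
    (void) node;
    return &remote_tuple;
}

/*
 * Sketch of the quoted patch: inside an EPQ recheck, a foreign join
 * (scanrelid == 0) gets its next tuple from the local subplan rather
 * than from the remote side.
 */
static TupleTableSlot *
exec_foreign_scan_sketch(ForeignScanState *node)
{
    if (node->inside_epq && node->scanrelid == 0)
        return node->outer_plan->exec_proc(node->outer_plan);
    return foreign_next_sketch(node);   /* normal remote scan path */
}

static TupleTableSlot *
local_subplan_exec(PlanState *ps)
{
    return &ps->result;   /* re-joins the EPQ test tuples locally */
}
```

Inside EPQ, the joined scan yields the local subplan's tuple; outside EPQ, it takes the remote path.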
On 2015/08/26 17:05, Kouhei Kaigai wrote:
>> On 2015/08/26 16:07, Kouhei Kaigai wrote:
>>> Even if we force on them a new interface specification comfortable
>>> for RDBMSs, we cannot guarantee it is also comfortable for other
>>> types of FDW drivers.

>> Specifically, what points about the patch are specific to an RDBMS?

> TupleTableSlot *
> ExecForeignScan(ForeignScanState *node)
> {
> +   EState     *estate = node->ss.ps.state;
> +
> +   if (estate->es_epqTuple != NULL)
> +   {
> +       /*
> +        * We are inside an EvalPlanQual recheck.  If foreign join,
> +        * get next tuple from subplan.
> +        */
> +       Index       scanrelid = ((Scan *) node->ss.ps.plan)->scanrelid;
> +
> +       if (scanrelid == 0)
> +       {
> +           PlanState  *outerPlan = outerPlanState(node);
> +
> +           return ExecProcNode(outerPlan);
> +       }
> +   }

> It might not be specific to an RDBMS; however, we cannot guarantee
> that all FDWs are comfortable running the alternative plan node on
> EPQ recheck. This design does not allow FDW drivers to implement
> their own EPQ recheck, possibly more efficient than the built-in
> logic.

As I said below, EPQ testing only executes a subplan for a *single* set of component test tuples, so I think the performance gain from an FDW's own EPQ testing implementation would probably be negligible in practice. No?

>>> If module-X wants to implement the EPQ fallback routine by itself,
>>> without an alternative plan, an overly rich interface design
>>> prevents module-X from doing what it really wants to do.
>>
>> Sorry, I fail to see the need or advantage for module-X to do so in
>> practice, because I think EPQ testing only executes a subplan for a
>> *single* set of component test tuples. Maybe I'm missing something,
>> though.
>>
> You may think execution of an alternative plan is the best way to do
> EPQ rechecks; however, other folks may think their own implementation
> is best. I never argue about which approach is better.
> What I point out is the freedom/flexibility of implementation choice.
No, I just want to know the need or advantage for that specifically. Best regards, Etsuro Fujita
> -----Original Message-----
> From: Etsuro Fujita [mailto:fujita.etsuro@lab.ntt.co.jp]
> Sent: Wednesday, August 26, 2015 5:38 PM
> To: Kaigai Kouhei(海外 浩平); Robert Haas
> Cc: PostgreSQL-development; 花田茂
> Subject: Re: [HACKERS] Foreign join pushdown vs EvalPlanQual
>
> On 2015/08/26 17:05, Kouhei Kaigai wrote:
> >> On 2015/08/26 16:07, Kouhei Kaigai wrote:
> >>> Even if we force on them a new interface specification comfortable
> >>> for RDBMSs, we cannot guarantee it is also comfortable for other
> >>> types of FDW drivers.
>
> >> Specifically, what points about the patch are specific to an RDBMS?
>
> > TupleTableSlot *
> > ExecForeignScan(ForeignScanState *node)
> > {
> > +   EState     *estate = node->ss.ps.state;
> > +
> > +   if (estate->es_epqTuple != NULL)
> > +   {
> > +       /*
> > +        * We are inside an EvalPlanQual recheck.  If foreign join,
> > +        * get next tuple from subplan.
> > +        */
> > +       Index       scanrelid = ((Scan *) node->ss.ps.plan)->scanrelid;
> > +
> > +       if (scanrelid == 0)
> > +       {
> > +           PlanState  *outerPlan = outerPlanState(node);
> > +
> > +           return ExecProcNode(outerPlan);
> > +       }
> > +   }
>
> > It might not be specific to an RDBMS; however, we cannot guarantee
> > that all FDWs are comfortable running the alternative plan node on
> > EPQ recheck. This design does not allow FDW drivers to implement
> > their own EPQ recheck, possibly more efficient than the built-in
> > logic.
>
> As I said below, EPQ testing only executes a subplan for a *single*
> set of component test tuples, so I think the performance gain from an
> FDW's own EPQ testing implementation would probably be negligible in
> practice. No?
>
> >>> If module-X wants to implement the EPQ fallback routine by itself,
> >>> without an alternative plan, an overly rich interface design
> >>> prevents module-X from doing what it really wants to do.
> >>
> >> Sorry, I fail to see the need or advantage for module-X to do so in
> >> practice, because I think EPQ testing only executes a subplan for a
> >> *single* set of component test tuples.
Maybe I'm missing something, though.
> >>
> > You may think execution of an alternative plan is the best way to do
> > EPQ rechecks; however, other folks may think their own
> > implementation is best. I never argue about which approach is
> > better. What I point out is the freedom/flexibility of
> > implementation choice.
>
> No, I just want to know the need or advantage for that specifically.
>
I'm not interested in the advantages/disadvantages of an individual FDW driver's implementation. That is a matter for FDW drivers, not a matter for core PostgreSQL. The one significant point I have repeatedly emphasized is that it is the developer's choice; thus it is important to provide options for developers. If they want, FDW developers can follow the alternative-plan-execution approach for EPQ rechecks. I never deny your idea, but it should be one of the options we can take.

Thanks,
--
NEC Business Creation Division / PG-Strom Project
KaiGai Kohei <kaigai@ak.jp.nec.com>
On 2015/08/26 18:01, Kouhei Kaigai wrote:
>>> You may think execution of an alternative plan is the best way to do
>>> EPQ rechecks; however, other folks may think their own
>>> implementation is best. I never argue about which approach is
>>> better. What I point out is the freedom/flexibility of
>>> implementation choice.

Maybe my explanation was not accurate, but I just want to know the use cases, to understand the need to provide the flexibility.

> The only and significant point I repeatedly emphasized is that it is
> the developer's choice; thus it is important to provide options for
> developers. If they want, FDW developers can follow the
> alternative-plan-execution approach for EPQ rechecks. I never deny
> your idea, but it should be one of the options we can take.

I don't object to your idea either, but I have a concern about it: it looks like the more flexibility we provide, the more FDWs implementing their own EPQ rechecks would be subject to internal changes in the core.

Best regards,
Etsuro Fujita
> On 2015/08/26 18:01, Kouhei Kaigai wrote:
> >>> You may think execution of an alternative plan is the best way to
> >>> do EPQ rechecks; however, other folks may think their own
> >>> implementation is best. I never argue about which approach is
> >>> better. What I point out is the freedom/flexibility of
> >>> implementation choice.
>
> Maybe my explanation was not accurate, but I just want to know the
> use cases, to understand the need to provide the flexibility.
>
Let's assume the following situation: someone wants to implement an FPGA acceleration feature on top of FDW. (You may know the earliest PG-Strom was built on the FDW interface.) It enables running SQL join workloads on an FPGA device, but it has equivalent fallback routines to be executed if the FPGA returns an error. In the EPQ check case, it is quite natural that he wants to re-use this fallback routine to validate the EPQ tuple. An alternative plan may consume additional (at least non-zero) memory and other system resources.

As I have said repeatedly, it is a software design decision by the author of the extension. Even if it consumes 100 times more memory and is 1000 times slower, it is his decision and responsibility. Why does he have to be forced to use particular logic against his intention?

> > The only and significant point I repeatedly emphasized is that it is
> > the developer's choice; thus it is important to provide options for
> > developers. If they want, FDW developers can follow the
> > alternative-plan-execution approach for EPQ rechecks. I never deny
> > your idea, but it should be one of the options we can take.
>
> I don't object to your idea either, but I have a concern about it: it
> looks like the more flexibility we provide, the more FDWs
> implementing their own EPQ rechecks would be subject to internal
> changes in the core.
>
We never guarantee interface compatibility across major versions. All we can say is "best effort". So, it is always the role of the extension owner, as long as he continues to maintain his module.
Thanks, -- NEC Business Creation Division / PG-Strom Project KaiGai Kohei <kaigai@ak.jp.nec.com>
On 2015/08/27 11:08, Kouhei Kaigai wrote:
>> On 2015/08/26 18:01, Kouhei Kaigai wrote:
>>>>> You may think execution of an alternative plan is the best way to
>>>>> do EPQ rechecks; however, other folks may think their own
>>>>> implementation is best. I never argue about which approach is
>>>>> better. What I point out is the freedom/flexibility of
>>>>> implementation choice.

>> Maybe my explanation was not accurate, but I just want to know the
>> use cases, to understand the need to provide the flexibility.

> Let's assume the following situation: someone wants to implement an
> FPGA acceleration feature on top of FDW. It enables running SQL join
> workloads on an FPGA device, but it has equivalent fallback routines
> to be executed if the FPGA returns an error. In the EPQ check case, it
> is quite natural that he wants to re-use this fallback routine to
> validate the EPQ tuple. An alternative plan may consume additional
> (at least non-zero) memory and other system resources.

Thanks for the answer, but I'm still not convinced. I think the EPQ testing shown in that use case would probably not be efficient compared to the core's.

> As I have said repeatedly, it is a software design decision by the
> author of the extension. Even if it consumes 100 times more memory and
> is 1000 times slower, it is his decision and responsibility.
> Why does he have to be forced to use particular logic against his
> intention?

I don't understand what you proposed, but ISTM that your proposal is more like a feature than a bugfix. For what you proposed, I think we should also improve the existing EPQ mechanism, including the corresponding FDW routines. One possible improvement is the behavior of late row locking. Currently, we do that by 1) re-fetching each component tuple from the foreign table after locking it by RefetchForeignRow and then 2) if necessary, doing an EPQ recheck, i.e., re-running the query locally for such component tuples by the core.
So, if we could re-run the join part of the query remotely, without transferring such component tuples from the foreign tables, we would be able to not only avoid useless data transfer but also improve concurrency when the join fails.

So, how about addressing this issue in two steps: first, work on the bugfix patch in [1], and then work on what you proposed? The latter would need more discussion/work, so I think it would be better to take that up in 9.6. If it's OK, I'll update the patch in [1] and add it to the upcoming CF.

>> I don't object to your idea either, but I have a concern about it: it
>> looks like the more flexibility we provide, the more FDWs
>> implementing their own EPQ rechecks would be subject to internal
>> changes in the core.

> We never guarantee interface compatibility across major versions. All
> we can say is "best effort". So, it is always the role of the
> extension owner, as long as he continues to maintain his module.

I think we cannot 100% guarantee the compatibility. That is why I think we should avoid an FDW improvement that would be subject to internal changes in the core, unless there is a good reason or use case for that.

Best regards,
Etsuro Fujita

[1] http://www.postgresql.org/message-id/55CB2D45.7040100@lab.ntt.co.jp
> > As I have said repeatedly, it is a software design decision by the
> > author of the extension. Even if it consumes 100 times more memory
> > and is 1000 times slower, it is his decision and responsibility.
> > Why does he have to be forced to use particular logic against his
> > intention?
>
> I don't understand what you proposed,
>
What I'm talking about is the philosophy of software/interface design. I understand EPQ recheck by an alternative plan is "one" reasonable way; however, people often have different ideas that may be better than yours depending on the context/environment/prerequisites/etc. It is always unpredictable; only God can know what the best solution is.

In other words, I am not talking about the taste of a restaurant; the problem is the lack of variation on the menu. You may not want to, but we have the freedom to eat a terrible-tasting meal.

> but ISTM that your proposal is more like a feature than a bugfix.
>
Yes, the problem we are facing is the lack of a feature. It might be my oversight when I designed the join pushdown infrastructure. Sorry. So, it is quite natural to add the missing piece to fix the bug.

> For what you proposed, I think we should also improve the existing EPQ
> mechanism, including the corresponding FDW routines. One possible
> improvement is the behavior of late row locking. Currently, we do that
> by 1) re-fetching each component tuple from the foreign table after
> locking it by RefetchForeignRow and then 2) if necessary, doing an EPQ
> recheck, i.e., re-running the query locally for such component tuples
> by the core. So, if we could re-run the join part of the query
> remotely, without transferring such component tuples from the foreign
> tables, we would be able to not only avoid useless data transfer but
> also improve concurrency when the join fails.
>
> So, how about addressing this issue in two steps: first, work on the
> bugfix patch in [1], and then work on what you proposed?
The latter
> would need more discussion/work, so I think it would be better to take
> that up in 9.6. If it's OK, I'll update the patch in [1] and add it to
> the upcoming CF.
>
It seems to me too invasive for a bugfix, and it assumes a particular solution. Please do the rechecking part in the extension, not in the core.

> >> I don't object to your idea either, but I have a concern about it:
> >> it looks like the more flexibility we provide, the more FDWs
> >> implementing their own EPQ rechecks would be subject to internal
> >> changes in the core.
>
> > We never guarantee interface compatibility across major versions.
> > All we can say is "best effort". So, it is always the role of the
> > extension owner, as long as he continues to maintain his module.
>
> I think we cannot 100% guarantee the compatibility. That is why I
> think we should avoid an FDW improvement that would be subject to
> internal changes in the core, unless there is a good reason or use
> case for that.
>
That concern does not matter much as long as we don't provide a stable and well-specified interface, because developers will have to validate and adjust their extensions for new major versions anyway.

Thanks,
--
NEC Business Creation Division / PG-Strom Project
KaiGai Kohei <kaigai@ak.jp.nec.com>
On 2015/08/27 16:52, Kouhei Kaigai wrote:

I wrote:
>> I don't understand what you proposed,

> What I'm talking about is the philosophy of software/interface design.
> I understand that EPQ recheck by an alternative plan is "one" reasonable
> way; however, people often have different ideas, which may be better
> than yours depending on the context/environment/prerequisites/etc.
> It is always unpredictable; only God can know what the best solution is.
>
> In other words, I'm not talking about the taste of a restaurant; the
> problem is the lack of variation on the menu. You may not want it, but
> we should have the freedom to eat a terrible-tasting meal.

>> but ISTM that your proposal is more like a feature, rather than a bugfix.

> Yes, the problem we are facing is the lack of a feature. It might be an
> oversight of mine when I designed the join pushdown infrastructure.
> Sorry. So, it is quite natural to add the missing piece to fix the bug.

>> For what you proposed, I think we should also improve the existing EPQ
>> mechanism including the corresponding FDW routines. One possible
>> improvement is the behavior of late row locking. Currently, we do that
>> by 1) re-fetching each component tuple from the foreign table after
>> locking it by RefetchForeignRow and then 2) if necessary, doing an EPQ
>> recheck, ie, re-running the query locally for such component tuples by
>> the core. So, if we could re-run the join part of the query remotely
>> without transferring such component tuples from the foreign tables, we
>> would be able to not only avoid useless data transfer but improve
>> concurrency when the join fails.
>>
>> So, how about addressing this issue in two steps; first, work on the
>> bugfix patch in [1], and then, work on what you proposed. The latter
>> would need more discussion/work, so I think it would be better to take
>> that in 9.6. If it's OK, I'll update the patch in [1] and add it to the
>> upcoming CF.

> It seems to me too invasive for a bugfix, and it assumes a particular
> solution. Please do the rechecking part in the extension, not in the
> core.

I think we would probably need others' opinions about this issue.

Best regards,
Etsuro Fujita
On 2015/08/27 17:30, Etsuro Fujita wrote:
> I think we would probably need others' opinions about this issue.

Attached is an updated version of the patch [1]. I'd be happy if it helps
people discuss this issue.

Changes:
* Rebased to HEAD.
* Added some more docs and comments.
* Fixed a bug in handling the tlist of a ForeignScan node when the node
  is the top node.
* Fixed a bug in doing ExecAssignScanTypeFromOuterPlan at the top of a
  ForeignScan node.

Best regards,
Etsuro Fujita

[1] http://www.postgresql.org/message-id/55CB2D45.7040100@lab.ntt.co.jp
Hooking at standard_join_search (Was: Re: Foreign join pushdown vs EvalPlanQual)
From: Etsuro Fujita
On 2015/08/01 23:25, Tom Lane wrote:
> Robert Haas <robertmhaas@gmail.com> writes:
>> The problem that was bothering us (or at least what was bothering me)
>> is that the PlannerInfo provides only a list of SpecialJoinInfo
>> structures, which don't directly give you the original join order. In
>> fact, min_righthand and min_lefthand are intended to constrain the
>> *possible* join orders, and are deliberately designed *not* to specify
>> a single join order. If you're sending a query to a remote PostgreSQL
>> node, you don't want to know what all the possible join orders are;
>> it's the remote side's job to plan the query. You do, however, need
>> an easy way to identify one join order that you can use to construct a
>> query. It didn't seem easy to do that without duplicating
>> make_join_rel(), which seemed like a bad idea.

> In principle it seems like you could traverse root->parse->jointree
> as a guide to reconstructing the original syntactic structure; though
> I'm not sure how hard it would be to ignore the parts of that tree
> that correspond to relations you're not shipping.

I'll investigate this.

>> But maybe there's a good way to do it. Tom wasn't crazy about this
>> hook both because of the frequency of calls and also because of the
>> long argument list. I think those concerns are legitimate; I just
>> couldn't see how to make the other way work.

> In my vision you probably really only want one call per build_join_rel
> event (that is, per construction of a new RelOptInfo), not per
> make_join_rel event.
>
> It's possible that an FDW that wants to handle joins but is not talking
> to a remote query planner would need to grovel through all the join
> ordering possibilities individually, and then maybe hooking at
> make_join_rel is sensible rather than having to reinvent that logic.
> But I'd want to see a concrete use-case first, and I certainly don't
> think that that's the main case to design the API around.
I'd vote for hooking at standard_join_search. Here is a use-case:

* When the callback routine is hooked at that function (right after
  allpaths.c:1817), an FDW would collect lists of all the available
  local-join-path orderings and parameterizations by looking at each path
  in rel->pathlist (if the join rel only contains foreign tables that all
  belong to the same foreign server).
* Then the FDW would use these as a heuristic to indicate which sort
  orderings and parameterizations we should build foreign-join paths for.
  (These would also be used as alternative paths for EvalPlanQual
  handling, as discussed upthread.)

It seems reasonable to me to consider pushed-down versions of these paths
as first candidates, but the foreign-join paths to build are not limited
to such ones. The FDW is allowed to consider any foreign-join paths as
long as their alternative paths are provided.

IMO one thing to consider for the postgres_fdw case would be the
use_remote_estimate option. When the option is true, I think we should
perform remote EXPLAINs for pushed-down-join queries to obtain cost
estimates. But it would take too much time to do that for each of the
possible join rels. So, I think it would be better to put off the
callback routine's work as long as possible. I think that could probably
be done by looking at rel->joininfo, root->join_info_list and/or
something like that. (When considering a join rel A JOIN B, both on the
same foreign server, for example, we can skip the routine's work if the
join rel proved to be joined with C on the same foreign server, by
looking at rel->joininfo, for example.) Maybe I'm missing something,
though.

Best regards,
Etsuro Fujita
Re: Hooking at standard_join_search (Was: Re: Foreign join pushdown vs EvalPlanQual)
From: Tom Lane
Etsuro Fujita <fujita.etsuro@lab.ntt.co.jp> writes:
> On 2015/08/01 23:25, Tom Lane wrote:
>> In my vision you probably really only want one call per build_join_rel
>> event (that is, per construction of a new RelOptInfo), not per
>> make_join_rel event.

> I'd vote for hooking at standard_join_search.

I think that method would require the FDW to duplicate a whole lot of
the join search mechanism, for not a whole lot of benefit. It's possible
that there'd be value in doing some initial reconnaissance once we've
examined all the baserels, so I'm not necessarily against providing a
hook there. But if you have in mind that typical FDWs would actually
create join paths at that point, consider that

1. The FDW would have to find all the combinations of its supplied
relations (unless you are only intending to generate one path for the
union of all such rels, which seems pretty narrow-minded from here).

2. The FDW would have to account for join_is_legal considerations.

3. The FDW would have to arrange for creation of joinrel RelOptInfo
structures. While that's possible, the available infrastructure for it
assumes that joinrels are built up from pairs of simpler joinrels, so
you couldn't go directly to the union of all the FDW's rels anyway.

So I still think that the most directly useful infrastructure here would
involve, when build_join_rel() first creates a given joinrel, noticing
whether both sides belong to the same foreign server and if so giving
the FDW a callback to consider producing pushed-down joins. That would
be extremely cheap to do, and it would not involve adding overhead for
an FDW to discover what the valid sets of joins are. In a large join
problem, that's *not* going to be a cheap thing to duplicate. If there
are multiple FDWs involved, the idea that each one of them would do its
own join search is particularly horrid.

One other problem with the proposal is that we might never call
standard_join_search at all: GEQO overrides it, and so can external
users of join_search_hook.

regards, tom lane
Re: Hooking at standard_join_search (Was: Re: Foreign join pushdown vs EvalPlanQual)
From: Robert Haas
On Wed, Sep 2, 2015 at 10:30 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> But if you have in mind that typical FDWs would actually create join
> paths at that point, consider that
>
> 1. The FDW would have to find all the combinations of its supplied
> relations (unless you are only intending to generate one path for the
> union of all such rels, which seems pretty narrow-minded from here).

Well, if the remote end is another database server, presumably we can
leave it to optimize the query, so why would we need more than one path?
I can see that we need more than one path because of sort-order
considerations, which would affect the query we ship to the remote side.
But I don't see the point of considering multiple join orders unless the
remote end is dumber than our optimizer, which might be true in some
cases, but not if the remote end is PostgreSQL.

> 2. The FDW would have to account for join_is_legal considerations.

I agree with this.

> 3. The FDW would have to arrange for creation of joinrel RelOptInfo
> structures. While that's possible, the available infrastructure for it
> assumes that joinrels are built up from pairs of simpler joinrels, so
> you couldn't go directly to the union of all the FDW's rels anyway.

And with this.

> So I still think that the most directly useful infrastructure here
> would involve, when build_join_rel() first creates a given joinrel,
> noticing whether both sides belong to the same foreign server and
> if so giving the FDW a callback to consider producing pushed-down
> joins. That would be extremely cheap to do and it would not involve
> adding overhead for an FDW to discover what the valid sets of joins
> are. In a large join problem, that's *not* going to be a cheap
> thing to duplicate. If there are multiple FDWs involved, the idea
> that each one of them would do its own join search is particularly
> horrid.

So, the problem is that I don't think this entirely skirts the
join_is_legal issues, which are a principal point of concern for me. Say
this is a joinrel between (A B) and (C D E). We need to generate an SQL
query for (A B C D E). We know that the outermost syntactic join can be
(A B) to (C D E). But how do we know which join orders are legal as
among (C D E)? Maybe there's a simple way to handle this that I'm not
seeing.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
Re: Hooking at standard_join_search (Was: Re: Foreign join pushdown vs EvalPlanQual)
From: Tom Lane
Robert Haas <robertmhaas@gmail.com> writes:
> On Wed, Sep 2, 2015 at 10:30 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>> But if you have in mind that typical FDWs would actually create join
>> paths at that point, consider that
>>
>> 1. The FDW would have to find all the combinations of its supplied
>> relations (unless you are only intending to generate one path for the
>> union of all such rels, which seems pretty narrow-minded from here).

> Well, if the remote end is another database server, presumably we can
> leave it to optimize the query, so why would we need more than one
> path?

If you have say 5 relations in the query, 3 of which are foreign, it
might make sense to join all 3 at the remote end, or maybe you should
only join 2 of them remotely because it's better to then join to one of
the local rels before joining the last remote rel. Even if you claim
that that would never make sense from a cost standpoint (a claim easily
seen to be silly), there might not be any legal way to join all 3
directly because of join order constraints.

The larger point is that we can't expect the remote server to be fully
responsible for optimizing, because it will know nothing of what's being
done on our end.

> I can see that we need more than one path because of sort-order
> considerations, which would affect the query we ship to the remote
> side. But I don't see the point of considering multiple join orders
> unless the remote end is dumber than our optimizer, which might be
> true in some cases, but not if the remote end is PostgreSQL.

(1) not all remote ends are Postgres, (2) the remote end doesn't have
any access to info about our end.

> So, the problem is that I don't think this entirely skirts the
> join_is_legal issues, which are a principal point of concern for me.
> Say this is a joinrel between (A B) and (C D E). We need to generate
> an SQL query for (A B C D E). We know that the outermost syntactic
> join can be (A B) to (C D E). But how do we know which join orders
> are legal as among (C D E)? Maybe there's a simple way to handle this
> that I'm not seeing.

Well, if the joins get built up in the way I think should happen, we'd
have already considered (C D E), and we could have recorded the legal
join orders within that at the time. (I imagine that we should allow
FDWs to store some data within RelOptInfo structs that represent foreign
joins belonging entirely to them, so that there'd be a handy place to
keep that data till later.) Or we could trawl through the paths
associated with the child joinrel, which will presumably include
instances of every reasonable sub-join combination. Or the FDW could
look at the SpecialJoinInfo data and determine things for itself (or
more likely, ask join_is_legal about that).

In the case of postgres_fdw, I think the actual requirement will be to
be able to reconstruct a SQL query that correctly expresses the join;
that is, we need to send over something like "from c left join d on
(...) full join e on (...)", not just "from c, d, e", or we'll get
totally bogus estimates as well as bogus execution results. Offhand I
think that the most likely way to build that text will be to examine the
query's jointree to see where c, d, e appear in it. But in any case,
that's a separate issue, and I fail to see how plopping the join search
problem into the FDW's lap would make it any easier.

regards, tom lane
Re: Hooking at standard_join_search (Was: Re: Foreign join pushdown vs EvalPlanQual)
From: Tom Lane
I wrote:
> ... I imagine that we should allow FDWs to store some data within
> RelOptInfo structs that represent foreign joins belonging entirely to
> them, so that there'd be a handy place to keep that data till later.

Actually, if we do that (ie, provide a "void *fdw_state" field in join
RelOptInfos), then the FDW could use the nullness or not-nullness of such
a field to realize whether or not it had already considered this join
relation. So I'm now thinking that the best API is to call the FDW at
the end of each make_join_rel call, whether it's the first one for the
joinrel or not. If the FDW wants a call for each legal pair of input
sub-relations, it's got one. If it only wants one call per joinrel, it
can just make sure to put something into fdw_state, and then on
subsequent calls for the same joinrel it can just exit immediately if
fdw_state is already non-null. So we have both use-cases covered.

Also, by doing this at the end, the FDW can look at the "regular" (local
join execution) paths that were already generated, should it wish to.

regards, tom lane
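Tom's once-per-joinrel convention can be sketched with a toy model. All names here (ToyJoinRel, consider_foreign_join) are hypothetical stand-ins, not PostgreSQL APIs: the callback runs at every make_join_rel-style event, but uses the nullness of an fdw_state-like field to do its expensive path-building only once per joinrel.

```c
#include <assert.h>
#include <stddef.h>

/* Toy stand-in for RelOptInfo; only the field relevant here. */
typedef struct ToyJoinRel
{
    void       *fdw_state;      /* NULL until the FDW has seen this joinrel */
} ToyJoinRel;

static int  expensive_calls = 0;    /* counts once-per-joinrel work */
static int  cheap_calls = 0;        /* counts per-pair invocations */

/*
 * Called at the end of every make_join_rel()-style event.  An FDW that
 * wants one call per joinrel stores a sentinel in fdw_state on the first
 * call and bails out on later calls for the same joinrel; an FDW that
 * wants every legal pair of inputs simply ignores fdw_state.
 */
static void
consider_foreign_join(ToyJoinRel *joinrel)
{
    static int  sentinel;       /* any non-NULL pointer will do */

    cheap_calls++;
    if (joinrel->fdw_state != NULL)
        return;                 /* already considered this joinrel */
    joinrel->fdw_state = &sentinel;
    expensive_calls++;          /* build pushed-down join paths here */
}
```

Invoking the callback three times for the same joinrel then performs the expensive path-building exactly once, which is the "both use-cases covered" property Tom describes.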
On Wed, Aug 26, 2015 at 4:05 AM, Kouhei Kaigai <kaigai@ak.jp.nec.com> wrote:
>> On 2015/08/26 16:07, Kouhei Kaigai wrote:
>> I wrote:
>>>> Maybe I'm missing something, but why do we need such a flexibility
>>>> for the columnar-stores?
>>
>>> Even if we enforce them a new interface specification comfortable to
>>> RDBMS, we cannot guarantee it is also comfortable to other types of
>>> FDW drivers.
>>
>> Specifically, what kind of points about the patch are specific to
>> RDBMS?
>
> *** 88,93 **** ForeignRecheck(ForeignScanState *node, TupleTableSlot *slot)
> --- 99,122 ----
>   TupleTableSlot *
>   ExecForeignScan(ForeignScanState *node)
>   {
> +     EState *estate = node->ss.ps.state;
> +
> +     if (estate->es_epqTuple != NULL)
> +     {
> +         /*
> +          * We are inside an EvalPlanQual recheck. If foreign join,
> +          * get next tuple from subplan.
> +          */
> +         Index scanrelid = ((Scan *) node->ss.ps.plan)->scanrelid;
> +
> +         if (scanrelid == 0)
> +         {
> +             PlanState *outerPlan = outerPlanState(node);
> +
> +             return ExecProcNode(outerPlan);
> +         }
> +     }
> +
>     return ExecScan((ScanState *) node,
>                     (ExecScanAccessMtd) ForeignNext,
>                     (ExecScanRecheckMtd) ForeignRecheck);
>
> It might not be specific to RDBMS; however, we cannot guarantee that
> all the FDWs are comfortable with running the alternative plan node on
> EPQ recheck. This design does not allow FDW drivers to implement their
> own EPQ recheck, possibly more efficient than the built-in logic.

I'm not convinced that this problem is more than hypothetical. EPQ
rechecks should be quite rare, so it shouldn't really matter if we jump
through a few extra hoops when they happen. And really, are those hoops
all that expensive? It's not as if ExecInitNode should be doing any sort
of expensive operation, or ExecEndScan either. And they will be able to
tell if they're being called for an EPQ recheck by fishing out the
estate, so if there's some processing that they want to short-circuit for
that case, they can. So I'm not seeing the problem. Do you have any
evidence that either the performance cost or the code complexity cost is
significant for PG-Strom or any other extension?

That having been said, I don't entirely like Fujita-san's patch either.
Much of the new code is called immediately adjacent to an FDW callback
which could pretty trivially do the same thing itself, if needed. And
much of it is contingent on whether estate->es_epqTuple != NULL and
scanrelid == 0, but perhaps it would be better to check whether the
subplan is actually present instead of checking whether we think it
should be present. Also, the naming is a bit weird: node->fs_subplan
gets shoved into outerPlanState(), which seems like a kludge.

I'm wondering if there's another approach. If I understand correctly,
there are two reasons why the current situation is untenable. The first
is that ForeignRecheck always returns true, but we could instead call an
FDW-supplied callback routine there. The callback could be optional, so
that we just return true if there is none, which is nice for
already-existing FDWs that then don't need to do anything. The second is
that ExecScanFetch needs scanrelid > 0 so that
estate->es_epqTupleSet[scanrelid - 1] isn't indexing off the beginning of
the array, and similarly estate->es_epqScanDone[scanrelid - 1] and
estate->es_epqTuple[scanrelid - 1]. But, waving my hands wildly, that
also seems like a solvable problem. I mean, we're joining a non-empty
set of relations, so the entries in the EPQ-related arrays for those RTIs
are not getting used for anything, so we can use any of them for the
joinrel. We need some way for this code to decide what RTI to use, but
that shouldn't be too hard to finagle.

Thoughts?

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
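The array-indexing hazard Robert describes can be seen in a stripped-down model. The names here are hypothetical (the real logic lives in ExecScanFetch in execScan.c): for an ordinary scan the EPQ bookkeeping indexes the per-range-table arrays with scanrelid - 1, so a pushed-down join with scanrelid == 0 would index off the front of the array and has to be routed elsewhere.

```c
#include <assert.h>
#include <stdbool.h>

#define NRELS 3

/* Toy stand-in for the EPQ state kept in EState. */
typedef struct ToyEState
{
    bool        epqScanDone[NRELS];     /* models es_epqScanDone[], indexed rti-1 */
} ToyEState;

/*
 * Models the ExecScanFetch bookkeeping: for scanrelid > 0 the arrays are
 * indexed with scanrelid - 1; scanrelid == 0 (a pushed-down join) would
 * produce index -1 -- the crash this thread is about -- so that case must
 * be diverted (e.g. to an alternative local join plan or an FDW callback)
 * before touching the arrays.
 */
static bool
epq_scan_done(ToyEState *estate, int scanrelid)
{
    if (scanrelid == 0)
    {
        /* Foreign/custom join: no single RTI to look up; divert. */
        return false;
    }
    return estate->epqScanDone[scanrelid - 1];
}
```

The guard is the essential point: without the scanrelid == 0 branch, the function would read before the start of the array, which is the undefined behavior behind the reported server crash.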
On 2015/09/03 9:41, Robert Haas wrote:
> That having been said, I don't entirely like Fujita-san's patch
> either. Much of the new code is called immediately adjacent to an FDW
> callback which could pretty trivially do the same thing itself, if
> needed.

Another idea about that code is to call that code in, eg, ExecProcNode,
instead of calling ExecForeignScan there. I think that might be much
cleaner and would resolve the naming problem below.

> And much of it is contingent on whether estate->es_epqTuple != NULL
> and scanrelid == 0, but perhaps it would be better to check whether
> the subplan is actually present instead of checking whether we think
> it should be present.

Agreed with this.

> Also, the naming is a bit weird: node->fs_subplan gets shoved into
> outerPlanState(), which seems like a kludge.

And with this. Proposals welcome.

> I'm wondering if there's another approach. If I understand correctly,
> there are two reasons why the current situation is untenable. The
> first is that ForeignRecheck always returns true, but we could instead
> call an FDW-supplied callback routine there. The callback could be
> optional, so that we just return true if there is none, which is nice
> for already-existing FDWs that then don't need to do anything.

My question about this is: is the callback really needed? If there are
any FDWs that want to do the work *in their own way*, instead of just
doing ExecProcNode for executing a local join execution plan in the case
of a foreign join (or just doing ExecQual for checking remote quals in
the case of a foreign table), I'd agree with introducing the callback,
but if not, I don't think that makes much sense.

Best regards,
Etsuro Fujita
On 2015/09/03 14:22, Etsuro Fujita wrote:
> On 2015/09/03 9:41, Robert Haas wrote:
>> That having been said, I don't entirely like Fujita-san's patch
>> either. Much of the new code is called immediately adjacent to an FDW
>> callback which could pretty trivially do the same thing itself, if
>> needed.

> Another idea about that code is to call that code in, eg,
> ExecProcNode, instead of calling ExecForeignScan there. I think that
> might be much cleaner and resolve the naming problem below.

I gave it another thought; the following changes to ExecInitNode would
make the patch much simpler, ie, we would no longer need to call the new
code in ExecInitForeignScan, ExecForeignScan, ExecEndForeignScan, and
ExecReScanForeignScan. I think that would resolve the naming problem
also.

*** a/src/backend/executor/execProcnode.c
--- b/src/backend/executor/execProcnode.c
***************
*** 247,254 **** ExecInitNode(Plan *node, EState *estate, int eflags)
              break;

          case T_ForeignScan:
!             result = (PlanState *) ExecInitForeignScan((ForeignScan *) node,
!                                                        estate, eflags);
              break;

          case T_CustomScan:
--- 247,269 ----
              break;

          case T_ForeignScan:
!             {
!                 Index       scanrelid = ((ForeignScan *) node)->scan.scanrelid;
!
!                 if (estate->es_epqTuple != NULL && scanrelid == 0)
!                 {
!                     /*
!                      * We are in a foreign join inside an EvalPlanQual
!                      * recheck.  Initialize the local join execution
!                      * plan instead.
!                      */
!                     Plan       *subplan = ((ForeignScan *) node)->fs_subplan;
!
!                     result = ExecInitNode(subplan, estate, eflags);
!                 }
!                 else
!                     result = (PlanState *) ExecInitForeignScan((ForeignScan *) node,
!                                                                estate, eflags);
!             }
              break;

Best regards,
Etsuro Fujita
> On Wed, Aug 26, 2015 at 4:05 AM, Kouhei Kaigai <kaigai@ak.jp.nec.com> wrote:
>>> On 2015/08/26 16:07, Kouhei Kaigai wrote:
>>> I wrote:
>>>>> Maybe I'm missing something, but why do we need such a flexibility
>>>>> for the columnar-stores?
>>>
>>>> Even if we enforce them a new interface specification comfortable to
>>>> RDBMS, we cannot guarantee it is also comfortable to other types of
>>>> FDW drivers.
>>>
>>> Specifically, what kind of points about the patch are specific to
>>> RDBMS?
>>
>> *** 88,93 **** ForeignRecheck(ForeignScanState *node, TupleTableSlot *slot)
>> --- 99,122 ----
>>   TupleTableSlot *
>>   ExecForeignScan(ForeignScanState *node)
>>   {
>> +     EState *estate = node->ss.ps.state;
>> +
>> +     if (estate->es_epqTuple != NULL)
>> +     {
>> +         /*
>> +          * We are inside an EvalPlanQual recheck. If foreign join,
>> +          * get next tuple from subplan.
>> +          */
>> +         Index scanrelid = ((Scan *) node->ss.ps.plan)->scanrelid;
>> +
>> +         if (scanrelid == 0)
>> +         {
>> +             PlanState *outerPlan = outerPlanState(node);
>> +
>> +             return ExecProcNode(outerPlan);
>> +         }
>> +     }
>> +
>>     return ExecScan((ScanState *) node,
>>                     (ExecScanAccessMtd) ForeignNext,
>>                     (ExecScanRecheckMtd) ForeignRecheck);
>>
>> It might not be specific to RDBMS; however, we cannot guarantee that
>> all the FDWs are comfortable with running the alternative plan node on
>> EPQ recheck. This design does not allow FDW drivers to implement their
>> own EPQ recheck, possibly more efficient than the built-in logic.
>
> I'm not convinced that this problem is more than hypothetical. EPQ
> rechecks should be quite rare, so it shouldn't really matter if we jump
> through a few extra hoops when they happen. And really, are those hoops
> all that expensive? It's not as if ExecInitNode should be doing any
> sort of expensive operation, or ExecEndScan either. And they will be
> able to tell if they're being called for an EPQ recheck by fishing out
> the estate, so if there's some processing that they want to
> short-circuit for that case, they can. So I'm not seeing the problem.
> Do you have any evidence that either the performance cost or the code
> complexity cost is significant for PG-Strom or any other extension?
>
Even though PG-Strom does not implement the EPQ recheck mechanism yet
(and is not implemented on top of FDW), I plan to re-use the CPU fallback
mechanism (*1) rather than having the alternative-plan approach. I also
don't care about the performance penalty; however, I don't want to have
an alternative plan because of the code complexity. I don't deny that
individual extensions may have an alternative path by their own decision,
but it should not be enforced.

(*1) A GPU often cannot execute an expression because of exceptional data
(like very long numerics or external toast, etc.), but the expression
still has to be executed. In this case, PG-Strom evaluates the expression
on the CPU side (of course, it is worse than the normal execution path,
but better than an error). This logic is almost the same as what we need
on EPQ recheck.

> I'm wondering if there's another approach. If I understand correctly,
> there are two reasons why the current situation is untenable. The
> first is that ForeignRecheck always returns true, but we could instead
> call an FDW-supplied callback routine there. The callback could be
> optional, so that we just return true if there is none, which is nice
> for already-existing FDWs that then don't need to do anything. The
> second is that ExecScanFetch needs scanrelid > 0 so that
> estate->es_epqTupleSet[scanrelid - 1] isn't indexing off the beginning
> of the array, and similarly estate->es_epqScanDone[scanrelid - 1] and
> estate->es_epqTuple[scanrelid - 1]. But, waving my hands wildly, that
> also seems like a solvable problem. I mean, we're joining a non-empty
> set of relations, so the entries in the EPQ-related arrays for those
> RTIs are not getting used for anything, so we can use any of them for
> the joinrel. We need some way for this code to decide what RTI to use,
> but that shouldn't be too hard to finagle.
>
ForeignScan->fs_relids and CustomScan->custom_relids know which RTIs
shall be involved in this joinrel. However, only the extension knows how
these relations (including the case of an N-way join) shall be joined.
FDW drivers may keep the joinrestrictinfo in their own comfortable way,
like a compiled GPU native binary, so I don't think the core side can do
something relevant reasonably.

Even though Fujita-san proposed a new special field in ForeignScan to
attach the expression node that was pushed down, it looks to me like that
makes the interface contract more complicated. Rather than various
special-purpose fields, it is more straightforward to call back the
extension when scanrelid == 0. We can provide the equivalent feature as
a utility function that has the capability Fujita-san wants.

Thanks,
--
NEC Business Creation Division / PG-Strom Project
KaiGai Kohei <kaigai@ak.jp.nec.com>
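KaiGai's proposal — an optional FDW recheck callback, with the core keeping today's always-true behavior when none is supplied — can be sketched as a toy model. All names here (ToyForeignScanState, toy_foreign_recheck, my_driver_recheck) are hypothetical, not the actual executor code:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy scan state; the function-pointer slot models a would-be FDW routine. */
typedef struct ToyForeignScanState ToyForeignScanState;
typedef bool (*RecheckForeignScan_fn) (ToyForeignScanState *node);

struct ToyForeignScanState
{
    RecheckForeignScan_fn recheck;  /* optional; NULL for existing FDWs */
    bool        remote_qual_passes; /* stands in for the driver's own test */
};

/*
 * Models ForeignRecheck under the proposal: if the FDW supplies a recheck
 * routine, call it; otherwise keep today's behavior and report that the
 * tuple still passes.  A core-provided utility (running the alternative
 * local subplan, as in Fujita-san's patch) could then be offered for
 * drivers that want the default logic rather than their own.
 */
static bool
toy_foreign_recheck(ToyForeignScanState *node)
{
    if (node->recheck != NULL)
        return node->recheck(node);
    return true;                /* no callback: legacy behavior */
}

/* An example driver callback that just consults its own flag. */
static bool
my_driver_recheck(ToyForeignScanState *node)
{
    return node->remote_qual_passes;
}
```

Existing FDWs that never set the callback keep exactly the current semantics, while an extension like PG-Strom could route the recheck into its own fallback logic.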
Re: Hooking at standard_join_search (Was: Re: Foreign join pushdown vs EvalPlanQual)
From: Robert Haas
On Wed, Sep 2, 2015 at 1:47 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote: > Robert Haas <robertmhaas@gmail.com> writes: >> On Wed, Sep 2, 2015 at 10:30 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote: >>> But if you have in mind that typical FDWs would actually create join paths >>> at that point, consider that >>> >>> 1. The FDW would have to find all the combinations of its supplied >>> relations (unless you are only intending to generate one path for the >>> union of all such rels, which seems pretty narrow-minded from here). > >> Well, if the remote end is another database server, presumably we can >> leave it to optimize the query, so why would we need more than one >> path? > > If you have say 5 relations in the query, 3 of which are foreign, it might > make sense to join all 3 at the remote end, or maybe you should only join > 2 of them remotely because it's better to then join to one of the local > rels before joining the last remote rel. True. But that's not the problem I'm concerned about. Suppose the query looks like this: SELECT * FROM ft1 LEFT JOIN ft2 ON ft1.x = ft2.x LEFT JOIN t1 ON ft2.y = t1.y LEFT JOIN ft3 ON ft1.z = ft3.z LEFT JOIN t2 ON ft1.w = t2.w; Now, no matter where we put the hooks, we'll consider foreign join paths for all of the various combinations of tables that we could push down. We'll decide between those various options based on cost, which is fine. But let's consider just one joinrel, the one that includes (ft1 ft2 ft3). Assuming that the remote tables have the same name as the local tables. The path that implements a pushed-down join of all three tables will send one of these two queries to the remote server: SELECT * FROM ft1 LEFT JOIN ft2 ON ft1.x = ft2.x LEFT JOIN ft3 ON ft1.z = ft3.z; SELECT * FROM ft1 LEFT JOIN ft3 ON ft1.z = ft3.z LEFT JOIN ft2 ON ft1.x = ft2.x ; We need to generate one of those two queries, and we need to figure out what the remote server thinks it will cost to execute. 
We presumably do not to cost both of them, because if it's legal to commute the joins, the remote server can and will do that itself. It would be stupid to cost both possible queries if the remote server is going to pick the same plan either way. However - and this is the key point - the one we choose to generate *must represent a legal join order*. If the ft1-ft2 join were a FULL JOIN instead of a LEFT JOIN, the second query wouldn't be a legal thing to send to the remote server. So, the problem I'm worried about is: given that we know we want to at least consider the path that pushes the whole join to the remote server, how do we construct an SQL query that embodies a legal join order of the relations being pushed down? > Even if you claim that that > would never make sense from a cost standpoint (a claim easily seen to be > silly), there might not be any legal way to join all 3 directly because of > join order constraints. > > The larger point is that we can't expect the remote server to be fully > responsible for optimizing, because it will know nothing of what's being > done on our end. No argument with any of that. >> So, the problem is that I don't think this entirely skirts the >> join_is_legal issues, which are a principal point of concern for me. >> Say this is a joinrel between (A B) and (C D E). We need to generate >> an SQL query for (A B C D E). We know that the outermost syntactic >> join can be (A B) to (C D E). But how do we know which join orders >> are legal as among (C D E)? Maybe there's a simple way to handle this >> that I'm not seeing. > > Well, if the joins get built up in the way I think should happen, we'd > have already considered (C D E), and we could have recorded the legal join > orders within that at the time. (I imagine that we should allow FDWs to > store some data within RelOptInfo structs that represent foreign joins > belonging entirely to them, so that there'd be a handy place to keep that > data till later.) 
Yes, that would help. Can fdw_private serve that purpose, or do we need something else? > Or we could trawl through the paths associated with the > child joinrel, which will presumably include instances of every reasonable > sub-join combination. Or the FDW could look at the SpecialJoinInfo data > and determine things for itself (or more likely, ask join_is_legal about > that). Yeah, this is the part I'm worried will be complex, which accounts for the current hook placement. I'm worried that trawling through that SpecialJoinInfo data will end up needing to duplicate much of make_join_rel and add_paths_to_joinrel. For example, consider: SELECT * FROM verysmall v JOIN (bigft1 FULL JOIN bigft2 ON bigft1.x = bigft2.x) ON v.q = bigft1.q AND v.r = bigft2.r; The best path for this plan is presumably something like this: Nested Loop -> Seq Scan on verysmall v -> Foreign Scan on bigft1 and bigft2 Remote SQL: SELECT * FROM bigft1 FULL JOIN bigft2 ON bigft1.x = bigft2.x AND bigft1.q = $1 AND bigft2.r = $2 Now, how is the FDW going to figure out that it needs to generate this parameterized path without duplicating this code from add_paths_to_joinrel? /* * Decide whether it's sensible to generate parameterized paths for this * joinrel, and if so, which relationssuch paths should require. There * is usually no need to create a parameterized result path unless there ... Maybe there's a very simple answer to this question and I'm just not seeing it, but I really don't see how that's going to work. > In the case of postgres_fdw, I think the actual requirement will be to be > able to reconstruct a SQL query that correctly expresses the join; that > is, we need to send over something like "from c left join d on (...) full > join e on (...)", not just "from c, d, e", or we'll get totally bogus > estimates as well as bogus execution results. Agreed. > Offhand I think that the > most likely way to build that text will be to examine the query's jointree > to see where c,d,e appear in it. 
> But in any case, that's a separate issue
> and I fail to see how plopping the join search problem into the FDW's lap
> would make it any easier.

Yeah, I am not advocating for putting the hook in standard_join_search. I'm explaining why I put it in add_paths_to_joinrel instead of, as I believe you were advocating, in make_join_rel prior to the big switch.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
Re: Hooking at standard_join_search (Was: Re: Foreign join pushdown vs EvalPlanQual)
From
Tom Lane
Date:
Robert Haas <robertmhaas@gmail.com> writes: > On Wed, Sep 2, 2015 at 1:47 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote: >> Offhand I think that the >> most likely way to build that text will be to examine the query's jointree >> to see where c,d,e appear in it. But in any case, that's a separate issue >> and I fail to see how plopping the join search problem into the FDW's lap >> would make it any easier. > Yeah, I am not advocating for putting the hook in > standard_join_search. I'm explaining why I put it in > add_paths_to_joinrel instead of, as I believe you were advocating, in > make_join_rel prior to the big switch. If you had a solution to the how-to-build-the-query-text problem, and it depended on that hook placement, then your argument might make some sense. As is, you've entirely failed to convince me that this placement is not wrong, wasteful, and likely to create unnecessary API breaks for FDWs. (Also, per my last message on the subject, *after* the switch is what I think makes sense.) regards, tom lane
On 2015/09/03 19:25, Etsuro Fujita wrote: > On 2015/09/03 14:22, Etsuro Fujita wrote: >> On 2015/09/03 9:41, Robert Haas wrote: >>> That having been said, I don't entirely like Fujita-san's patch >>> either. Much of the new code is called immediately adjacent to an FDW >>> callback which could pretty trivially do the same thing itself, if >>> needed. >> Another idea about that code is to call that code in eg, ExecProcNode, >> instead of calling ExecForeignScan there. I think that that might be >> much cleaner and resolve the naming problem below. > I gave it another thought; the following changes to ExecInitNode would > make the patch much simpler, ie, we would no longer need to call the new > code in ExecInitForeignScan, ExecForeignScan, ExecEndForeignScan, and > ExecReScanForeignScan. I think that would resolve the name problem also. I'm attaching an updated version of the patch. The patch is based on the SS_finalize_plan patch that has been recently committed. I'd be happy if this helps people discuss more about how to fix this issue. Best regards, Etsuro Fujita
On 2015/09/04 19:50, Etsuro Fujita wrote:
> I'm attaching an updated version of the patch. The patch is based on
> the SS_finalize_plan patch that has been recently committed. I'd be
> happy if this helps people discuss more about how to fix this issue.

In the updated version, I modified finalize_plan so that initPlans attached to a ForeignScan node doing a remote join are considered when computing the params for a local join plan for EvalPlanQual testing. But I noticed that that is unnecessary: no initPlans will be attached to such a ForeignScan node, because in the EvalPlanQual case the ForeignScan node cannot be the topmost plan node for its query level. So, I removed that code. Patch attached. (It no longer depends on the SS_finalize_plan patch.)

Best regards,
Etsuro Fujita
Hello, sorry in advance for possibly bringing up past discussions or pointless points.

> I'm attaching an updated version of the patch. The patch is based on
> the SS_finalize_plan patch that has been recently committed. I'd be
> happy if this helps people discuss more about how to fix this issue.

The two patches make a good contrast that clarifies the problem for me, maybe.

> > code in ExecInitForeignScan, ExecForeignScan, ExecEndForeignScan, and
> > ExecReScanForeignScan. I think that would resolve the name problem
> > also.

I found two points in this discussion.

1. Where (or when) to initialize a foreign/custom scan node for recheck. Given that a new list to hold substitute plans is added to the planner global (and PlannedStmt), EvalPlanQualStart() looks to be the best place to initialize them. Of course, that could not be the solution if the new member and related code are not acceptable or rather unreasonable. The possible timings left for that case would be ExecInitNode() (as v2.0) or the FDW routines (as v1.0).

2. How the core informs fdw/custom scan handlers whether it is during recheck. In the v1.0 patch, the nodeForeignscan.c routines detect the situation using es_epqTuple and Scan.scanrelid, which the core gives as is, and v2.0 alternatively has the core replace the scan node implicitly (and maybe irregularly) on initialization. The latter doesn't look tidy to me. I think refining v1.0 would be more desirable, and resolving the redundancy would be simply a matter of notation. If I understand it correctly, the Exec*ForeignScan() routines other than ExecInitForeignScan() can determine the behavior simply by looking at outerPlanState(scanstate) (if we continue to use the lefttree member for the purpose). Is that right? And does it eliminate the redundancy?

ExecEndForeignScan()
{
    if ((outerPlan = outerPlanState(node)) != NULL)
        ExecEndNode(outerPlan);
    ...
regards, At Fri, 04 Sep 2015 19:50:46 +0900, Etsuro Fujita <fujita.etsuro@lab.ntt.co.jp> wrote in <55E97786.30404@lab.ntt.co.jp> > On 2015/09/03 19:25, Etsuro Fujita wrote: > > On 2015/09/03 14:22, Etsuro Fujita wrote: > >> On 2015/09/03 9:41, Robert Haas wrote: > >>> That having been said, I don't entirely like Fujita-san's patch > >>> either. Much of the new code is called immediately adjacent to an FDW > >>> callback which could pretty trivially do the same thing itself, if > >>> needed. ... > > I gave it another thought; the following changes to ExecInitNode would > > make the patch much simpler, ie, we would no longer need to call the > > new > > code in ExecInitForeignScan, ExecForeignScan, ExecEndForeignScan, and > > ExecReScanForeignScan. I think that would resolve the name problem > > also. > > I'm attaching an updated version of the patch. The patch is based on > the SS_finalize_plan patch that has been recently committed. I'd be > happy if this helps people discuss more about how to fix this issue. -- Kyotaro Horiguchi NTT Open Source Software Center
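The detection scheme discussed in the message above can be sketched in miniature. Everything below is a simplified stand-in for the real executor structures (EState, PlanState, ForeignScanState), meant only to illustrate the v1.0-style checks, not PostgreSQL's actual definitions:

```c
#include <assert.h>
#include <stddef.h>

/* Toy stand-ins for the executor structures named in the discussion. */
typedef struct PlanState
{
    struct PlanState *lefttree;   /* outer subplan, if one was attached */
} PlanState;

typedef struct EState
{
    void *es_epqTuple;            /* non-NULL during an EvalPlanQual recheck */
} EState;

typedef struct ForeignScanState
{
    PlanState ps;                 /* ps.lefttree would hold the local join plan */
    unsigned  scanrelid;          /* 0 for a pushed-down join, > 0 for a base rel */
} ForeignScanState;

#define outerPlanState(node) (((PlanState *) (node))->lefttree)

/*
 * Only ExecInitForeignScan needs to consult es_epqTuple and scanrelid to
 * decide whether a local join execution plan must be initialized.
 */
int
use_local_join_plan(const ForeignScanState *node, const EState *estate)
{
    return estate->es_epqTuple != NULL && node->scanrelid == 0;
}

/*
 * The other Exec*ForeignScan routines can then determine their behavior by
 * simply looking at outerPlanState(scanstate), as the message suggests.
 */
int
has_local_join_plan(ForeignScanState *node)
{
    return outerPlanState(node) != NULL;
}
```

Under this model, the redundancy goes away because only the init routine looks at es_epqTuple; everyone else keys off whether lefttree was populated.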
Re: Hooking at standard_join_search (Was: Re: Foreign join pushdown vs EvalPlanQual)
From
Etsuro Fujita
Date:
On 2015/09/02 23:30, Tom Lane wrote:
> Etsuro Fujita <fujita.etsuro@lab.ntt.co.jp> writes:
>> On 2015/08/01 23:25, Tom Lane wrote:
>>> In my vision you probably really only want one call per build_join_rel
>>> event (that is, per construction of a new RelOptInfo), not per
>>> make_join_rel event.
>
>> I'd vote for hooking at standard_join_search.
>
> I think that method would require the FDW to duplicate a whole lot of the
> join search mechanism, for not a whole lot of benefit. It's possible that
> there'd be value in doing some initial reconnaissance once we've examined
> all the baserels, so I'm not necessarily against providing a hook there.
> But if you have in mind that typical FDWs would actually create join paths
> at that point, consider that
>
> 1. The FDW would have to find all the combinations of its supplied
> relations (unless you are only intending to generate one path for the
> union of all such rels, which seems pretty narrow-minded from here).
>
> 2. The FDW would have to account for join_is_legal considerations.
>
> 3. The FDW would have to arrange for creation of joinrel RelOptInfo
> structures. While that's possible, the available infrastructure for it
> assumes that joinrels are built up from pairs of simpler joinrels, so
> you couldn't go directly to the union of all the FDW's rels anyway.

Maybe my explanation was not correct, but the hook placement I have in mind is just before the set_cheapest call for each joinrel in standard_join_search, as you proposed in [1]. I think that if that joinrel contains only foreign tables that all belong to the same foreign server, then we give the FDW a chance to consider producing pushed-down joins for that joinrel, ie, remote joins for all the foreign tables contained in that joinrel. So, there is no need for #2 and #3.
Also I think that would allow us to consider producing pushed-down joins for all the legal combinations of foreign tables that belong to the same foreign server, according to the dynamic-programming method, in principle. I've not had a solution to the how-to-build-the-query-text problem yet, though. Best regards, Etsuro Fujita [1] http://www.postgresql.org/message-id/5451.1426271510@sss.pgh.pa.us
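The placement described above boils down to one test per joinrel. Here is a toy sketch of that test only; BaseRel and the serverid encoding are illustrative stand-ins, not the real planner structures:

```c
#include <assert.h>

/* Toy stand-in: a base relation tagged with the server it lives on. */
typedef struct BaseRel
{
    int serverid;                 /* 0 = local table, > 0 = foreign server id */
} BaseRel;

/*
 * Just before set_cheapest for a joinrel in standard_join_search, check
 * whether every base rel in the joinrel is a foreign table on one and the
 * same server.  Return that server's id (meaning: give its FDW a chance to
 * add pushed-down join paths), or 0 if the join cannot be pushed down.
 */
int
joinrel_pushdown_server(const BaseRel *rels, int nrels)
{
    int serverid = nrels > 0 ? rels[0].serverid : 0;

    for (int i = 0; i < nrels; i++)
        if (rels[i].serverid == 0 || rels[i].serverid != serverid)
            return 0;             /* a local rel, or a server mismatch */
    return serverid;              /* all foreign, same server */
}
```

Because this runs for every joinrel the dynamic-programming search builds, the FDW would get a chance at every legal same-server combination, which is the point made in the message above.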
Re: Hooking at standard_join_search (Was: Re: Foreign join pushdown vs EvalPlanQual)
From
Etsuro Fujita
Date:
On 2015/09/04 0:33, Robert Haas wrote:
> I'm worried that trawling through that
> SpecialJoinInfo data will end up needing to duplicate much of
> make_join_rel and add_paths_to_joinrel. For example, consider:
>
> SELECT * FROM verysmall v JOIN (bigft1 FULL JOIN bigft2 ON bigft1.x =
> bigft2.x) ON v.q = bigft1.q AND v.r = bigft2.r;
>
> The best path for this plan is presumably something like this:
>
> Nested Loop
>   -> Seq Scan on verysmall v
>   -> Foreign Scan on bigft1 and bigft2
>        Remote SQL: SELECT * FROM bigft1 FULL JOIN bigft2 ON bigft1.x =
> bigft2.x AND bigft1.q = $1 AND bigft2.r = $2
>
> Now, how is the FDW going to figure out that it needs to generate this
> parameterized path without duplicating this code from
> add_paths_to_joinrel?
>
> /*
>  * Decide whether it's sensible to generate parameterized paths for this
>  * joinrel, and if so, which relations such paths should require.  There
>  * is usually no need to create a parameterized result path unless there
> ...
>
> Maybe there's a very simple answer to this question and I'm just not
> seeing it, but I really don't see how that's going to work.

Why don't you look at the "regular" (local join execution) paths that were already generated? I think that if we called the FDW at a proper hook location, the FDW could probably find a regular path in rel->pathlist of the join rel (bigft1, bigft2) that possibly generates something like:

Nested Loop
  -> Seq Scan on verysmall v
  -> Nested Loop
       Join Filter: (bigft1.a = bigft2.a)
       -> Foreign Scan on bigft1
            Remote SQL: SELECT * FROM bigft1 WHERE bigft1.q = $1
       -> Foreign Scan on bigft2
            Remote SQL: SELECT * FROM bigft2 WHERE bigft2.r = $2

From the parameterization of the regular nestloop path for joining bigft1 and bigft2 locally, I think that the FDW could find that it's sensible to generate the foreign-join path for (bigft1, bigft2) with that parameterization.

Best regards,
Etsuro Fujita
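The idea in the message above — reuse the parameterizations of the already-generated local paths — can be modeled in a few lines. Path and Relids here are simplified stand-ins (Relids as a bitmask of base-rel numbers), not the real planner structures:

```c
#include <assert.h>
#include <stddef.h>

/* Toy stand-ins for the planner's Path list and relid sets. */
typedef unsigned Relids;          /* bitmask of base-rel numbers */

typedef struct Path
{
    Relids param_relids;          /* outer rels this path is parameterized by */
} Path;

/*
 * Search a joinrel's (toy) pathlist for a local path parameterized by
 * exactly the given outer rels.  Finding one is the FDW's cue that a
 * foreign-join path with the same parameterization is worth building.
 */
const Path *
find_param_path(const Path *paths, int npaths, Relids required_outer)
{
    for (int i = 0; i < npaths; i++)
        if (paths[i].param_relids == required_outer)
            return &paths[i];
    return NULL;
}
```

This also illustrates the weakness Robert points out next: the search only sees paths that survived add_path, so a parameterization can vanish from the list purely on cost grounds.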
Re: Hooking at standard_join_search (Was: Re: Foreign join pushdown vs EvalPlanQual)
From
Robert Haas
Date:
On Tue, Sep 8, 2015 at 5:35 AM, Etsuro Fujita <fujita.etsuro@lab.ntt.co.jp> wrote: > On 2015/09/04 0:33, Robert Haas wrote: >> I'm worried that trawling through that >> SpecialJoinInfo data will end up needing to duplicate much of >> make_join_rel and add_paths_to_joinrel. For example, consider: >> >> SELECT * FROM verysmall v JOIN (bigft1 FULL JOIN bigft2 ON bigft1.x = >> bigft2.x) ON v.q = bigft1.q AND v.r = bigft2.r; >> >> The best path for this plan is presumably something like this: >> >> Nested Loop >> -> Seq Scan on verysmall v >> -> Foreign Scan on bigft1 and bigft2 >> Remote SQL: SELECT * FROM bigft1 FULL JOIN bigft2 ON bigft1.x = >> bigft2.x AND bigft1.q = $1 AND bigft2.r = $2 >> >> Now, how is the FDW going to figure out that it needs to generate this >> parameterized path without duplicating this code from >> add_paths_to_joinrel? >> >> /* >> * Decide whether it's sensible to generate parameterized paths for >> this >> * joinrel, and if so, which relations such paths should require. >> There >> * is usually no need to create a parameterized result path unless >> there >> ... >> >> Maybe there's a very simple answer to this question and I'm just not >> seeing it, but I really don't see how that's going to work. > > > Why don't you look at the "regular" (local join execution) paths that were > already generated. 
I think that if we called the FDW at a proper hook > location, the FDW could probably find a regular path in rel->pathlist of the > join rel (bigft1, bigft2) that possibly generates something like: > > Nested Loop > -> Seq Scan on verysmall v > -> Nested Loop > Join Filter: (bigft1.a = bigft2.a) > -> Foreign Scan on bigft1 > Remote SQL: SELECT * FROM bigft1 WHERE bigft1.q = $1 > -> Foreign Scan on bigft2 > Remote SQL: SELECT * FROM bigft2 WHERE bigft2.r = $2 > > From the parameterization of the regular nestloop path for joining bigft1 > and bigft2 locally, I think that the FDW could find that it's sensible to > generate the foreign-join path for (bigft1, bigft2) with the > parameterization. But that path might have already been discarded on the basis of cost. I think Tom's idea is better: let the FDW consult some state cached for this purpose in the RelOptInfo. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Re: Hooking at standard_join_search (Was: Re: Foreign join pushdown vs EvalPlanQual)
From
Robert Haas
Date:
On Thu, Sep 3, 2015 at 11:51 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Robert Haas <robertmhaas@gmail.com> writes:
>> On Wed, Sep 2, 2015 at 1:47 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>>> Offhand I think that the
>>> most likely way to build that text will be to examine the query's jointree
>>> to see where c,d,e appear in it. But in any case, that's a separate issue
>>> and I fail to see how plopping the join search problem into the FDW's lap
>>> would make it any easier.
>
>> Yeah, I am not advocating for putting the hook in
>> standard_join_search. I'm explaining why I put it in
>> add_paths_to_joinrel instead of, as I believe you were advocating, in
>> make_join_rel prior to the big switch.
>
> If you had a solution to the how-to-build-the-query-text problem,
> and it depended on that hook placement, then your argument might
> make some sense. As is, you've entirely failed to convince me
> that this placement is not wrong, wasteful, and likely to create
> unnecessary API breaks for FDWs.
>
> (Also, per my last message on the subject, *after* the switch
> is what I think makes sense.)

After re-reading a few emails, I've realized that I've let myself get a bit confused here and have unwittingly switched sides in this argument. <puts brown paper bag over head> When we originally discussed this back in April, I was arguing for either make_join_rel() or add_paths_to_joinrel() and you were arguing for standard_join_search(). See here:

http://www.postgresql.org/message-id/CA+TgmobOADxTbsCt-j+dDVefWGK1WxY4p8AVDp1Pz48_TX4XTA@mail.gmail.com

I thought we were still having the same argument, but we're not. You're now arguing for make_one_rel(), which back then was perfectly acceptable to me, and now that I've gotten my thinking un-fuzzed, really still is, except for the question posed in the closing paragraph of that email, which is (mostly) whether clients like postgres_fdw are going to need extra_lateral_rels in order to do the right thing.
-- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Re: Hooking at standard_join_search (Was: Re: Foreign join pushdown vs EvalPlanQual)
From
Etsuro Fujita
Date:
On 2015/09/09 3:53, Robert Haas wrote: > On Tue, Sep 8, 2015 at 5:35 AM, Etsuro Fujita > <fujita.etsuro@lab.ntt.co.jp> wrote: >> On 2015/09/04 0:33, Robert Haas wrote: >>> I'm worried that trawling through that >>> SpecialJoinInfo data will end up needing to duplicate much of >>> make_join_rel and add_paths_to_joinrel. For example, consider: >>> >>> SELECT * FROM verysmall v JOIN (bigft1 FULL JOIN bigft2 ON bigft1.x = >>> bigft2.x) ON v.q = bigft1.q AND v.r = bigft2.r; >>> >>> The best path for this plan is presumably something like this: >>> >>> Nested Loop >>> -> Seq Scan on verysmall v >>> -> Foreign Scan on bigft1 and bigft2 >>> Remote SQL: SELECT * FROM bigft1 FULL JOIN bigft2 ON bigft1.x = >>> bigft2.x AND bigft1.q = $1 AND bigft2.r = $2 >>> >>> Now, how is the FDW going to figure out that it needs to generate this >>> parameterized path without duplicating this code from >>> add_paths_to_joinrel? >>> >>> /* >>> * Decide whether it's sensible to generate parameterized paths for >>> this >>> * joinrel, and if so, which relations such paths should require. >>> There >>> * is usually no need to create a parameterized result path unless >>> there >>> ... >>> >>> Maybe there's a very simple answer to this question and I'm just not >>> seeing it, but I really don't see how that's going to work. >> Why don't you look at the "regular" (local join execution) paths that were >> already generated. 
>> I think that if we called the FDW at a proper hook
>> location, the FDW could probably find a regular path in rel->pathlist of the
>> join rel (bigft1, bigft2) that possibly generates something like:
>>
>> Nested Loop
>>   -> Seq Scan on verysmall v
>>   -> Nested Loop
>>        Join Filter: (bigft1.a = bigft2.a)
>>        -> Foreign Scan on bigft1
>>             Remote SQL: SELECT * FROM bigft1 WHERE bigft1.q = $1
>>        -> Foreign Scan on bigft2
>>             Remote SQL: SELECT * FROM bigft2 WHERE bigft2.r = $2
>>
>> From the parameterization of the regular nestloop path for joining bigft1
>> and bigft2 locally, I think that the FDW could find that it's sensible to
>> generate the foreign-join path for (bigft1, bigft2) with the
>> parameterization.
>
> But that path might have already been discarded on the basis of cost.
> I think Tom's idea is better: let the FDW consult some state cached
> for this purpose in the RelOptInfo.

Do you have an idea of what information would be collected into that state, and how the FDW would derive from it the parameterizations to consider for producing pushed-down joins? What I'm concerned about is reducing the number of parameterizations to consider, so as to reduce the overhead of costing the corresponding queries. Maybe I'm missing something, though.

Best regards,
Etsuro Fujita
On Thu, Sep 3, 2015 at 6:25 AM, Etsuro Fujita <fujita.etsuro@lab.ntt.co.jp> wrote:
> I gave it another thought; the following changes to ExecInitNode would make
> the patch much simpler, ie, we would no longer need to call the new code in
> ExecInitForeignScan, ExecForeignScan, ExecEndForeignScan, and
> ExecReScanForeignScan. I think that would resolve the name problem also.
>
> *** a/src/backend/executor/execProcnode.c
> --- b/src/backend/executor/execProcnode.c
> ***************
> *** 247,254 **** ExecInitNode(Plan *node, EState *estate, int eflags)
>               break;
>
>           case T_ForeignScan:
> !             result = (PlanState *) ExecInitForeignScan((ForeignScan *) node,
> !                                                        estate, eflags);
>               break;
>
>           case T_CustomScan:
> --- 247,269 ----
>               break;
>
>           case T_ForeignScan:
> !             {
> !                 Index    scanrelid = ((ForeignScan *) node)->scan.scanrelid;
> !
> !                 if (estate->es_epqTuple != NULL && scanrelid == 0)
> !                 {
> !                     /*
> !                      * We are in foreign join inside an EvalPlanQual recheck.
> !                      * Initialize local join execution plan, instead.
> !                      */
> !                     Plan    *subplan = ((ForeignScan *) node)->fs_subplan;
> !
> !                     result = ExecInitNode(subplan, estate, eflags);
> !                 }
> !                 else
> !                     result = (PlanState *) ExecInitForeignScan((ForeignScan *) node,
> !                                                                estate, eflags);
> !             }
>               break;

I don't think that's a good idea. The Plan tree and the PlanState tree should be mirror images of each other; breaking that equivalence will cause confusion, at least.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
On Thu, Sep 3, 2015 at 1:22 AM, Etsuro Fujita <fujita.etsuro@lab.ntt.co.jp> wrote: >> I'm wondering if there's another approach. If I understand correctly, >> there are two reasons why the current situation is untenable. The >> first is that ForeignRecheck always returns true, but we could instead >> call an FDW-supplied callback routine there. The callback could be >> optional, so that we just return true if there is none, which is nice >> for already-existing FDWs that then don't need to do anything. > > My question about this is, is the callback really needed? If there are any > FDWs that want to do the work *in their own way*, instead of just doing > ExecProcNode for executing a local join execution plan in case of foreign > join (or just doing ExecQual for checking remote quals in case of foreign > table), I'd agree with introducing the callback, but if not, I don't think > that that makes much sense. It doesn't seem to me that it hurts much of anything to add the callback there, and it does provide some flexibility. Actually, I'm not really sure why we're thinking we need a subplan here at all, rather than just having a ForeignRecheck callback that can do whatever it needs to do with no particular help from the core infrastructure. I think you wrote some code to show how postgres_fdw would use the API you are proposing, but I can't find it. Can you point me in the right direction? -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Re: Hooking at standard_join_search (Was: Re: Foreign join pushdown vs EvalPlanQual)
From
Robert Haas
Date:
On Wed, Sep 9, 2015 at 2:30 AM, Etsuro Fujita <fujita.etsuro@lab.ntt.co.jp> wrote: >> But that path might have already been discarded on the basis of cost. >> I think Tom's idea is better: let the FDW consult some state cached >> for this purpose in the RelOptInfo. > > Do you have an idea of what information would be collected into the state > and how the FDW would derive parameterizations to consider producing > pushed-down joins with from that information? What I'm concerned about that > is to reduce the number of parameterizations to consider, to reduce overhead > in costing the corresponding queries. I'm missing something, though. I think the thing we'd want to store in the state would be enough information to reconstruct a valid join nest. For example, the reloptinfo for (A B) might note that A needs to be left-joined to B. When we go to construct paths for (A B C), and there is no SpecialJoinInfo that mentions C, we know that we can construct (A LJ B) IJ C rather than (A IJ B) IJ C. If any paths survived, we could find a way to pull that information out of the path, but pulling it out of the RelOptInfo should always work. I am not sure what to do about parameterizations. That's one of my remaining concerns about moving the hook. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
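The reconstruction Robert describes above — cache enough per-joinrel state to rebuild a valid join nest, defaulting to an inner join when no SpecialJoinInfo mentions the new rel — can be sketched as a toy text builder. This is only a model of the bookkeeping, not the real RelOptInfo machinery:

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/*
 * Toy: combine the cached join-nest texts of two sub-rels with the
 * jointype that a (present or absent) SpecialJoinInfo would dictate,
 * writing the result into dst and returning dst for chaining.
 */
const char *
join_nest(char *dst, size_t dstlen,
          const char *outer, const char *jointype, const char *inner)
{
    snprintf(dst, dstlen, "(%s %s %s)", outer, jointype, inner);
    return dst;
}
```

For Robert's example: the (A B) joinrel caches "(A LEFT JOIN B)" because a SpecialJoinInfo says A must be left-joined to B; when C is added and no SpecialJoinInfo mentions C, the cached text for (A B C) becomes "((A LEFT JOIN B) JOIN C)" — derived from the RelOptInfo state alone, whether or not any particular path survived.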
On 2015/09/11 6:24, Robert Haas wrote: > On Thu, Sep 3, 2015 at 1:22 AM, Etsuro Fujita > <fujita.etsuro@lab.ntt.co.jp> wrote: >>> I'm wondering if there's another approach. If I understand correctly, >>> there are two reasons why the current situation is untenable. The >>> first is that ForeignRecheck always returns true, but we could instead >>> call an FDW-supplied callback routine there. The callback could be >>> optional, so that we just return true if there is none, which is nice >>> for already-existing FDWs that then don't need to do anything. >> >> My question about this is, is the callback really needed? If there are any >> FDWs that want to do the work *in their own way*, instead of just doing >> ExecProcNode for executing a local join execution plan in case of foreign >> join (or just doing ExecQual for checking remote quals in case of foreign >> table), I'd agree with introducing the callback, but if not, I don't think >> that that makes much sense. > > It doesn't seem to me that it hurts much of anything to add the > callback there, and it does provide some flexibility. Actually, I'm > not really sure why we're thinking we need a subplan here at all, > rather than just having a ForeignRecheck callback that can do whatever > it needs to do with no particular help from the core infrastructure. > I think you wrote some code to show how postgres_fdw would use the API > you are proposing, but I can't find it. Can you point me in the right > direction? I've proposed the following API changes: * I modified create_foreignscan_path, which is called from postgresGetForeignJoinPaths/postgresGetForeignPaths, so that a path, subpath, is passed as the eighth argument of the function. subpath represents a local join execution path if scanrelid==0, but NULL if scanrelid>0. * I modified make_foreignscan, which is called from postgresGetForeignPlan, so that a list of quals, fdw_quals, is passed as the last argument of the function. 
fdw_quals represents remote quals if scanrelid>0, but NIL if scanrelid==0. You can find that code in the postgres_fdw patch (foreign_join_v16_efujita.patch) attached to [1]. Best regards, Etsuro Fujita [1] http://www.postgresql.org/message-id/55CB2D45.7040100@lab.ntt.co.jp
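The two invariants in the proposal above — a pushed-down join (scanrelid == 0) carries a local join execution path but no remote quals, while a base-rel scan (scanrelid > 0) carries remote quals but no subpath — can be stated as toy one-argument-each constructors. The real create_foreignscan_path/make_foreignscan take many more arguments, and every type and field name below is a simplified assumption based only on the description in this message:

```c
#include <assert.h>
#include <stddef.h>

/* Toy stand-ins; not PostgreSQL's real node types. */
typedef struct Path { int dummy; } Path;
typedef struct List { int length; } List;

typedef struct ForeignPath { Path *subpath; } ForeignPath;
typedef struct ForeignScan { unsigned scanrelid; List *fdw_quals; } ForeignScan;

/* Proposal: a local join execution path accompanies a pushed-down join. */
ForeignPath
create_foreignscan_path_toy(unsigned scanrelid, Path *subpath)
{
    assert((scanrelid == 0) == (subpath != NULL));
    return (ForeignPath){ subpath };
}

/* Proposal: remote quals accompany a base-rel scan only (NIL for a join). */
ForeignScan
make_foreignscan_toy(unsigned scanrelid, List *fdw_quals)
{
    assert(scanrelid > 0 || fdw_quals == NULL);
    return (ForeignScan){ scanrelid, fdw_quals };
}
```

KaiGai's objection below is precisely about the second invariant: if a ForeignRecheck callback does the EPQ work, the core need not treat the pushed-down quals as part of the interface contract at all.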
Hello,

At Thu, 10 Sep 2015 17:24:00 -0400, Robert Haas <robertmhaas@gmail.com> wrote in <CA+TgmobxksR2=3wEdY5cEgpd1hQ6Z0WoZEBBoxgs=XKZpbfUXA@mail.gmail.com>
> On Thu, Sep 3, 2015 at 1:22 AM, Etsuro Fujita
> <fujita.etsuro@lab.ntt.co.jp> wrote:
> >> I'm wondering if there's another approach. If I understand correctly,
> >> there are two reasons why the current situation is untenable. The
> >> first is that ForeignRecheck always returns true, but we could instead
> >> call an FDW-supplied callback routine there. The callback could be
> >> optional, so that we just return true if there is none, which is nice
> >> for already-existing FDWs that then don't need to do anything.
> >
> > My question about this is, is the callback really needed? If there are any
> > FDWs that want to do the work *in their own way*, instead of just doing
> > ExecProcNode for executing a local join execution plan in case of foreign
> > join (or just doing ExecQual for checking remote quals in case of foreign
> > table), I'd agree with introducing the callback, but if not, I don't think
> > that that makes much sense.
>
> It doesn't seem to me that it hurts much of anything to add the
> callback there, and it does provide some flexibility. Actually, I'm
> not really sure why we're thinking we need a subplan here at all,
> rather than just having a ForeignRecheck callback that can do whatever
> it needs to do with no particular help from the core infrastructure.
> I think you wrote some code to show how postgres_fdw would use the API
> you are proposing, but I can't find it. Can you point me in the right
> direction?

I've heard that the reason for the (fs_)subplan is that it should be initialized using create_plan_recurse, set_plan_refs and finalize_plan (or others), which are static functions in the planner, unavailable in fdw code. Is this pointless?

regards,

-- 
Kyotaro Horiguchi
NTT Open Source Software Center
Sorry, that's quite wrong.. Please let me fix it.

- Is this pointless?
+ Does it make sense?

=====

Hello,

At Thu, 10 Sep 2015 17:24:00 -0400, Robert Haas <robertmhaas@gmail.com> wrote in <CA+TgmobxksR2=3wEdY5cEgpd1hQ6Z0WoZEBBoxgs=XKZpbfUXA@mail.gmail.com>
> On Thu, Sep 3, 2015 at 1:22 AM, Etsuro Fujita
> <fujita.etsuro@lab.ntt.co.jp> wrote:
> >> I'm wondering if there's another approach. If I understand correctly,
> >> there are two reasons why the current situation is untenable. The
> >> first is that ForeignRecheck always returns true, but we could instead
> >> call an FDW-supplied callback routine there. The callback could be
> >> optional, so that we just return true if there is none, which is nice
> >> for already-existing FDWs that then don't need to do anything.
> >
> > My question about this is, is the callback really needed? If there are any
> > FDWs that want to do the work *in their own way*, instead of just doing
> > ExecProcNode for executing a local join execution plan in case of foreign
> > join (or just doing ExecQual for checking remote quals in case of foreign
> > table), I'd agree with introducing the callback, but if not, I don't think
> > that that makes much sense.
>
> It doesn't seem to me that it hurts much of anything to add the
> callback there, and it does provide some flexibility. Actually, I'm
> not really sure why we're thinking we need a subplan here at all,
> rather than just having a ForeignRecheck callback that can do whatever
> it needs to do with no particular help from the core infrastructure.
> I think you wrote some code to show how postgres_fdw would use the API
> you are proposing, but I can't find it. Can you point me in the right
> direction?

I've heard that the reason for the (fs_)subplan is that it should be initialized using create_plan_recurse, set_plan_refs and finalize_plan (or others), which are static functions in the planner, unavailable in fdw code. Does it make sense?
regards, -- Kyotaro Horiguchi NTT Open Source Software Center
Hello, > -----Original Message----- > From: Kyotaro HORIGUCHI [mailto:horiguchi.kyotaro@lab.ntt.co.jp] > Sent: Friday, September 11, 2015 2:05 PM > To: robertmhaas@gmail.com > Cc: fujita.etsuro@lab.ntt.co.jp; Kaigai Kouhei(海外 浩平); > pgsql-hackers@postgresql.org; shigeru.hanada@gmail.com > Subject: Re: [HACKERS] Foreign join pushdown vs EvalPlanQual > > Hello, > > At Thu, 10 Sep 2015 17:24:00 -0400, Robert Haas <robertmhaas@gmail.com> wrote > in <CA+TgmobxksR2=3wEdY5cEgpd1hQ6Z0WoZEBBoxgs=XKZpbfUXA@mail.gmail.com> > > On Thu, Sep 3, 2015 at 1:22 AM, Etsuro Fujita > > <fujita.etsuro@lab.ntt.co.jp> wrote: > > >> I'm wondering if there's another approach. If I understand correctly, > > >> there are two reasons why the current situation is untenable. The > > >> first is that ForeignRecheck always returns true, but we could instead > > >> call an FDW-supplied callback routine there. The callback could be > > >> optional, so that we just return true if there is none, which is nice > > >> for already-existing FDWs that then don't need to do anything. > > > > > > My question about this is, is the callback really needed? If there are any > > > FDWs that want to do the work *in their own way*, instead of just doing > > > ExecProcNode for executing a local join execution plan in case of foreign > > > join (or just doing ExecQual for checking remote quals in case of foreign > > > table), I'd agree with introducing the callback, but if not, I don't think > > > that that makes much sense. > > > > It doesn't seem to me that it hurts much of anything to add the > > callback there, and it does provide some flexibility. Actually, I'm > > not really sure why we're thinking we need a subplan here at all, > > rather than just having a ForeignRecheck callback that can do whatever > > it needs to do with no particular help from the core infrastructure. > > I think you wrote some code to show how postgres_fdw would use the API > > you are proposing, but I can't find it. 
> > Can you point me in the right direction?
>
> I've heard that the reason for the (fs_)subplan is that it should
> be initialized using create_plan_recurse, set_plan_refs and
> finalize_plan (or others), which are static functions in the
> planner, unavailable in fdw code.

There was such a discussion when the custom-scan/join interface got merged: I primarily designed the interface so that the extension would call create_plan_recurse(); however, we concluded that we would keep this function static and have the extension tell the core the path nodes to be initialized. That also reduced interface complexity, because we could omit callbacks that would otherwise have to be placed in setrefs.c and subselect.c.

Thanks,
-- 
NEC Business Creation Division / PG-Strom Project
KaiGai Kohei <kaigai@ak.jp.nec.com>
> -----Original Message----- > From: Etsuro Fujita [mailto:fujita.etsuro@lab.ntt.co.jp] > Sent: Friday, September 11, 2015 12:36 PM > To: Robert Haas > Cc: Kaigai Kouhei(海外 浩平); PostgreSQL-development; 花田茂 > Subject: Re: [HACKERS] Foreign join pushdown vs EvalPlanQual > > On 2015/09/11 6:24, Robert Haas wrote: > > On Thu, Sep 3, 2015 at 1:22 AM, Etsuro Fujita > > <fujita.etsuro@lab.ntt.co.jp> wrote: > >>> I'm wondering if there's another approach. If I understand correctly, > >>> there are two reasons why the current situation is untenable. The > >>> first is that ForeignRecheck always returns true, but we could instead > >>> call an FDW-supplied callback routine there. The callback could be > >>> optional, so that we just return true if there is none, which is nice > >>> for already-existing FDWs that then don't need to do anything. > >> > >> My question about this is, is the callback really needed? If there are any > >> FDWs that want to do the work *in their own way*, instead of just doing > >> ExecProcNode for executing a local join execution plan in case of foreign > >> join (or just doing ExecQual for checking remote quals in case of foreign > >> table), I'd agree with introducing the callback, but if not, I don't think > >> that that makes much sense. > > > > It doesn't seem to me that it hurts much of anything to add the > > callback there, and it does provide some flexibility. Actually, I'm > > not really sure why we're thinking we need a subplan here at all, > > rather than just having a ForeignRecheck callback that can do whatever > > it needs to do with no particular help from the core infrastructure. > > I think you wrote some code to show how postgres_fdw would use the API > > you are proposing, but I can't find it. Can you point me in the right > > direction? 
> > I've proposed the following API changes: > > * I modified create_foreignscan_path, which is called from > postgresGetForeignJoinPaths/postgresGetForeignPaths, so that a path, > subpath, is passed as the eighth argument of the function. subpath > represents a local join execution path if scanrelid==0, but NULL if > scanrelid>0. > I'd like to suggest having multiple path nodes, like custom-scan, because that infrastructure will also be helpful for implementing FDW drivers that have multiple sub-plans. One expected usage is here: http://www.postgresql.org/message-id/9A28C8860F777E439AA12E8AEA7694F8010F20AD@BPXM15GP.gisp.nec.co.jp > * I modified make_foreignscan, which is called from > postgresGetForeignPlan, so that a list of quals, fdw_quals, is passed as > the last argument of the function. fdw_quals represents remote quals if > scanrelid>0, but NIL if scanrelid==0. > If a callback on ForeignRecheck processes EPQ rechecks, core PostgreSQL doesn't need to know what expression was pushed down or how it is kept in the private field (fdw_exprs). Only the FDW driver knows which private field holds the expression node that was pushed down to the remote side. It should not be an interface contract. Thanks, -- NEC Business Creation Division / PG-Strom Project KaiGai Kohei <kaigai@ak.jp.nec.com>
On 2015/09/11 6:02, Robert Haas wrote: > On Thu, Sep 3, 2015 at 6:25 AM, Etsuro Fujita > <fujita.etsuro@lab.ntt.co.jp> wrote: >> I gave it another thought; the following changes to ExecInitNode would make >> the patch much simpler, ie, we would no longer need to call the new code in >> ExecInitForeignScan, ExecForeignScan, ExecEndForeignScan, and >> ExecReScanForeignScan. I think that would resolve the name problem also. >> >> *** a/src/backend/executor/execProcnode.c >> --- b/src/backend/executor/execProcnode.c >> *************** >> *** 247,254 **** ExecInitNode(Plan *node, EState *estate, int eflags) >> break; >> >> case T_ForeignScan: >> ! result = (PlanState *) ExecInitForeignScan((ForeignScan *) node, >> ! estate, eflags); >> break; >> >> case T_CustomScan: >> --- 247,269 ---- >> break; >> >> case T_ForeignScan: >> ! { >> ! Index scanrelid = ((ForeignScan *) >> node)->scan.scanrelid; >> ! >> ! if (estate->es_epqTuple != NULL && scanrelid == 0) >> ! { >> ! /* >> ! * We are in foreign join inside an EvalPlanQual >> recheck. >> ! * Initialize local join execution plan, instead. >> ! */ >> ! Plan *subplan = ((ForeignScan *) >> node)->fs_subplan; >> ! >> ! result = ExecInitNode(subplan, estate, eflags); >> ! } >> ! else >> ! result = (PlanState *) ExecInitForeignScan((ForeignScan >> *) node, >> ! estate, >> eflags); >> ! } >> break; > > I don't think that's a good idea. The Plan tree and the PlanState > tree should be mirror images of each other; breaking that equivalence > will cause confusion, at least. IIRC, Horiguchi-san also pointed that out. Honestly, I also think that that is weird, but IIUC, I think it can't hurt. What I was concerned about was EXPLAIN, but EXPLAIN doesn't handle an EvalPlanQual PlanState tree at least currently. Best regards, Etsuro Fujita
Re: Hooking at standard_join_search (Was: Re: Foreign join pushdown vs EvalPlanQual)
From: Etsuro Fujita
On 2015/09/11 6:30, Robert Haas wrote: > On Wed, Sep 9, 2015 at 2:30 AM, Etsuro Fujita > <fujita.etsuro@lab.ntt.co.jp> wrote: >>> But that path might have already been discarded on the basis of cost. >>> I think Tom's idea is better: let the FDW consult some state cached >>> for this purpose in the RelOptInfo. >> >> Do you have an idea of what information would be collected into the state >> and how the FDW would derive parameterizations to consider producing >> pushed-down joins with from that information? What I'm concerned about that >> is to reduce the number of parameterizations to consider, to reduce overhead >> in costing the corresponding queries. I'm missing something, though. > > I think the thing we'd want to store in the state would be enough > information to reconstruct a valid join nest. For example, the > reloptinfo for (A B) might note that A needs to be left-joined to B. > When we go to construct paths for (A B C), and there is no > SpecialJoinInfo that mentions C, we know that we can construct (A LJ > B) IJ C rather than (A IJ B) IJ C. If any paths survived, we could > find a way to pull that information out of the path, but pulling it > out of the RelOptInfo should always work. So, information to address the how-to-build-the-query-text problem would be stored in the state, in other words. Right? > I am not sure what to do about parameterizations. That's one of my > remaining concerns about moving the hook. I think we should also make it clear what to do about sort orderings. Best regards, Etsuro Fujita
On Thu, Sep 10, 2015 at 11:36 PM, Etsuro Fujita <fujita.etsuro@lab.ntt.co.jp> wrote: > I've proposed the following API changes: > > * I modified create_foreignscan_path, which is called from > postgresGetForeignJoinPaths/postgresGetForeignPaths, so that a path, > subpath, is passed as the eighth argument of the function. subpath > represents a local join execution path if scanrelid==0, but NULL if > scanrelid>0. OK, I see now. But I don't much like the way get_unsorted_unparameterized_path() looks. First, it's basically praying that MergePath, HashPath, and NestPath can be flat-copied without breaking anything. In general, we have copyfuncs.c support for nodes that we need to be able to copy, and we use copyObject() to do it. Even if what you've got here works today, it's not very future-proof. Second, what guarantee do we have that we'll find a path with no pathkeys and a NULL param_info? Why can't all of the paths for a join relation have pathkeys? Why can't they all be parameterized? I can't think of anything that would guarantee that. Third, even if such a guarantee existed, why is this the right behavior? Any join type will produce the same output; it's just a question of performance. And if you have only one tuple on each side, surely a nested loop would be fine. It seems to me that what you ought to be doing is using data hung off the fdw_private field of each RelOptInfo to cache a NestPath that can be used for EPQ rechecks at that level. When you go to consider pushing down another join, you can build up a new NestPath that's suitable for the new level. That seems much cleaner than groveling through the list of surviving paths and hoping you find the right kind of thing. And all that having been said, I still don't really understand why you are resisting the idea of providing a callback so that the FDW can execute arbitrary code in the recheck path. 
There doesn't seem to be any reason not to let the FDW take control of the rechecks if it wishes, and there's no real cost in complexity that I can see. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
On Fri, Sep 11, 2015 at 2:01 AM, Kouhei Kaigai <kaigai@ak.jp.nec.com> wrote: > If a callback on ForeignRecheck processes EPQ rechecks, core PostgreSQL > doesn't need to know what expression was pushed down or how it is kept in > the private field (fdw_exprs). Only the FDW driver knows which private field > holds the expression node that was pushed down to the remote side. It should > not be an interface contract. I agree. It seems needless to involve the core code here. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
On Fri, Sep 11, 2015 at 3:08 AM, Etsuro Fujita <fujita.etsuro@lab.ntt.co.jp> wrote: > IIRC, Horiguchi-san also pointed that out. Honestly, I also think that that > is weird, but IIUC, I think it can't hurt. What I was concerned about was > EXPLAIN, but EXPLAIN doesn't handle an EvalPlanQual PlanState tree at least > currently. This has come up a few times before and some people have argued for changing the coding rule. Nevertheless, for now, it is the rule. IMHO, it's a pretty good rule that makes things easier to understand and reason about. If there's an argument for changing it, it's performance, not developer convenience. Anyway, we should try to fix this problem without getting tangled in that argument. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Re: Hooking at standard_join_search (Was: Re: Foreign join pushdown vs EvalPlanQual)
From: Robert Haas
On Fri, Sep 11, 2015 at 3:12 AM, Etsuro Fujita <fujita.etsuro@lab.ntt.co.jp> wrote: > So, information to address the how-to-build-the-query-text > problem would be stored in the state, in other words. Right? Right. >> I am not sure what to do about parameterizations. That's one of my >> remaining concerns about moving the hook. > > I think we should also make it clear what to do about sort orderings. How does that come into it? -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
> -----Original Message----- > From: pgsql-hackers-owner@postgresql.org > [mailto:pgsql-hackers-owner@postgresql.org] On Behalf Of Robert Haas > Sent: Saturday, September 12, 2015 1:39 AM > To: Etsuro Fujita > Cc: Kaigai Kouhei(海外 浩平); PostgreSQL-development; 花田茂 > Subject: Re: [HACKERS] Foreign join pushdown vs EvalPlanQual > > On Thu, Sep 10, 2015 at 11:36 PM, Etsuro Fujita > <fujita.etsuro@lab.ntt.co.jp> wrote: > > I've proposed the following API changes: > > > > * I modified create_foreignscan_path, which is called from > > postgresGetForeignJoinPaths/postgresGetForeignPaths, so that a path, > > subpath, is passed as the eighth argument of the function. subpath > > represents a local join execution path if scanrelid==0, but NULL if > > scanrelid>0. > > OK, I see now. But I don't much like the way > get_unsorted_unparameterized_path() looks. > > First, it's basically praying that MergePath, NodePath, and NestPath > can be flat-copied without breaking anything. In general, we have > copyfuncs.c support for nodes that we need to be able to copy, and we > use copyObject() to do it. Even if what you've got here works today, > it's not very future-proof. > > Second, what guarantee do we have that we'll find a path with no > pathkeys and a NULL param_info? Why can't all of the paths for a join > relation have pathkeys? Why can't they all be parameterized? I can't > think of anything that would guarantee that. > > Third, even if such a guarantee existed, why is this the right > behavior? Any join type will produce the same output; it's just a > question of performance. And if you have only one tuple on each side, > surely a nested loop would be fine. > > It seems to me that what you ought to be doing is using data hung off > the fdw_private field of each RelOptInfo to cache a NestPath that can > be used for EPQ rechecks at that level. When you go to consider > pushing down another join, you can build up a new NestPath that's > suitable for the new level. 
> That seems much cleaner than groveling > through the list of surviving paths and hoping you find the right kind > of thing. > > And all that having been said, I still don't really understand why you > are resisting the idea of providing a callback so that the FDW can > execute arbitrary code in the recheck path. There doesn't seem to be > any reason not to let the FDW take control of the rechecks if it > wishes, and there's no real cost in complexity that I can see. > The discussion has been pending for two weeks, even though we put this problem on the open items list for v9.5; that means we recognize it as a problem to be fixed by the v9.5 release. The attached patch allows the FDW driver to handle EPQ rechecks in its own preferred way, whether that is an alternative local join or ExecQual on the expression being pushed down. Regarding the alternative join path selection, I initially thought it would be valuable to choose the best path from a performance standpoint; however, what we need to do here is a visibility check against the EPQ tuples already loaded into the EState, so an unparameterized NestLoop is sufficient to evaluate the qualifiers across relations. (What happens if HashJoin is chosen? That is probably problematic.) So, if your modified postgres_fdw keeps an alternative path, what we need to do is construct a dummy NestPath with no param_info, no pathkeys, and dummy costs, then give this path in fdw_paths of the ForeignPath. It will be transformed to plan nodes, and eventually to a plan-state node, by postgres_fdw itself. I don't see anything difficult here any more. Thanks, -- NEC Business Creation Division / PG-Strom Project KaiGai Kohei <kaigai@ak.jp.nec.com>
Attachment
On Mon, Sep 28, 2015 at 3:34 AM, Kouhei Kaigai <kaigai@ak.jp.nec.com> wrote: > The attached patch allows FDW driver to handle EPQ recheck by its own > preferable way, even if it is alternative local join or ExecQual to > the expression being pushed down. Thanks. I was all set to commit this, or at least part of it, when I noticed that we already have an FDW callback called RefetchForeignRow. We seem to be intending that this new callback should refetch the row from the foreign server and verify that any pushed-down quals apply to it. But why can't RefetchForeignRow do that? That seems to be what it's for. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
> -----Original Message----- > From: pgsql-hackers-owner@postgresql.org > [mailto:pgsql-hackers-owner@postgresql.org] On Behalf Of Robert Haas > Sent: Tuesday, September 29, 2015 5:46 AM > To: Kaigai Kouhei(海外 浩平) > Cc: Etsuro Fujita; PostgreSQL-development; 花田茂 > Subject: Re: [HACKERS] Foreign join pushdown vs EvalPlanQual > > On Mon, Sep 28, 2015 at 3:34 AM, Kouhei Kaigai <kaigai@ak.jp.nec.com> wrote: > > The attached patch allows FDW driver to handle EPQ recheck by its own > > preferable way, even if it is alternative local join or ExecQual to > > the expression being pushed down. > > Thanks. I was all set to commit this, or at least part of it, when I > noticed that we already have an FDW callback called RefetchForeignRow. > We seem to be intending that this new callback should refetch the row > from the foreign server and verify that any pushed-down quals apply to > it. But why can't RefetchForeignRow do that? That seems to be what > it's for. > There are at least two matters to solve before RefetchForeignRow can address the problem. 1. RefetchForeignRow() does not take a ForeignScanState argument, so it is not obvious how to cooperate with the private state in the ForeignScanState; that may include the expression pushed down, and so on. 2. A ForeignScan with scanrelid == 0 represents the result of joined relations. Even if the refetched tuple is visible at the base-relation level, it may not survive the join condition at the upper level. Once a relation join gets pushed down, only the FDW driver knows how the base relations are joined. So, the only reasonable way is to ask the FDW driver, at ExecScanFetch, to check the visibility of a particular tuple, or of another tuple made from it. Thanks, -- NEC Business Creation Division / PG-Strom Project KaiGai Kohei <kaigai@ak.jp.nec.com>
On 2015/09/29 9:13, Kouhei Kaigai wrote: >> -----Original Message----- >> From: pgsql-hackers-owner@postgresql.org >> [mailto:pgsql-hackers-owner@postgresql.org] On Behalf Of Robert Haas >> Sent: Tuesday, September 29, 2015 5:46 AM >> To: Kaigai Kouhei(海外 浩平) >> Cc: Etsuro Fujita; PostgreSQL-development; 花田茂 >> Subject: Re: [HACKERS] Foreign join pushdown vs EvalPlanQual >> >> On Mon, Sep 28, 2015 at 3:34 AM, Kouhei Kaigai <kaigai@ak.jp.nec.com> wrote: >>> The attached patch allows FDW driver to handle EPQ recheck by its own >>> preferable way, even if it is alternative local join or ExecQual to >>> the expression being pushed down. Thanks for the work, KaiGai-san! >> Thanks. I was all set to commit this, or at least part of it, when I >> noticed that we already have an FDW callback called RefetchForeignRow. >> We seem to be intending that this new callback should refetch the row >> from the foreign server and verify that any pushed-down quals apply to >> it. But why can't RefetchForeignRow do that? That seems to be what >> it's for. Thanks for the comments, Robert! I thought the same thing [1]. While I thought it was relatively easy to make changes to RefetchForeignRow that way for the foreign table case (scanrelid>0), I was not sure how hard it would be to do so for the foreign join case (scanrelid==0). So, I proposed leaving those changes for 9.6. I'll rethink this issue along the lines of that approach. Sorry for the lack of response; I was on vacation. Best regards, Etsuro Fujita [1] http://www.postgresql.org/message-id/55DEB5A9.8010604@lab.ntt.co.jp
> -----Original Message----- > From: pgsql-hackers-owner@postgresql.org > [mailto:pgsql-hackers-owner@postgresql.org] On Behalf Of Etsuro Fujita > Sent: Tuesday, September 29, 2015 12:15 PM > To: Kaigai Kouhei(海外 浩平); Robert Haas > Cc: PostgreSQL-development; 花田茂 > Subject: Re: [HACKERS] Foreign join pushdown vs EvalPlanQual > > On 2015/09/29 9:13, Kouhei Kaigai wrote: > >> -----Original Message----- > >> From: pgsql-hackers-owner@postgresql.org > >> [mailto:pgsql-hackers-owner@postgresql.org] On Behalf Of Robert Haas > >> Sent: Tuesday, September 29, 2015 5:46 AM > >> To: Kaigai Kouhei(海外 浩平) > >> Cc: Etsuro Fujita; PostgreSQL-development; 花田茂 > >> Subject: Re: [HACKERS] Foreign join pushdown vs EvalPlanQual > >> > >> On Mon, Sep 28, 2015 at 3:34 AM, Kouhei Kaigai <kaigai@ak.jp.nec.com> wrote: > >>> The attached patch allows FDW driver to handle EPQ recheck by its own > >>> preferable way, even if it is alternative local join or ExecQual to > >>> the expression being pushed down. > > Thanks for the work, KaiGai-san! > > >> Thanks. I was all set to commit this, or at least part of it, when I > >> noticed that we already have an FDW callback called RefetchForeignRow. > >> We seem to be intending that this new callback should refetch the row > >> from the foreign server and verify that any pushed-down quals apply to > >> it. But why can't RefetchForeignRow do that? That seems to be what > >> it's for. > > Thanks for the comments, Robert! > > I thought the same thing [1]. While I thought it was relatively easy to > make changes to RefetchForeignRow that way for the foreign table case > (scanrelid>0), I was not sure how hard it would be to do so for the > foreign join case (scanrelid==0). So, I proposed to leave that changes > for 9.6. I'll have a rethink on this issue along the lines of that > approach. > Even in the base-relation case, is it really easy to do? 
RefetchForeignRow() does not take a ForeignScanState as its argument, so it is not obvious how to access its private field, is it? ExecRowMark contains an "rti" field, so it might be feasible to find the target PlanState using the recently supported walker routine, although that is not simple enough. Unless we have a reference to the private field, it is not feasible to access the expression that was pushed down to the remote side, and therefore we cannot apply proper rechecks here. In addition, it is problematic when scanrelid==0, because we have no relevant ForeignScanState that represents the base relation, even though the ExecRowMark is associated with a particular base relation. In the scanrelid==0 case, the EPQ recheck routine also has to ensure the EPQ tuple satisfies the join condition in addition to the qualifier of the base relation. This information is also stored within the private data field, so it has to have a reference to the private data of the ForeignScanState of the remote join (scanrelid==0) that contains the target relation. Could you explain (1) how to access the private data field of the ForeignScanState from the RefetchForeignRow callback, and (2) why it is more reasonable to implement this there than as a callback in ForeignRecheck()? > Sorry for having had no response. I was on vacation. > Me too. :-) Thanks, -- NEC Business Creation Division / PG-Strom Project KaiGai Kohei <kaigai@ak.jp.nec.com>
On 2015/09/29 13:55, Kouhei Kaigai wrote: >> From: pgsql-hackers-owner@postgresql.org >> [mailto:pgsql-hackers-owner@postgresql.org] On Behalf Of Etsuro Fujita >> On 2015/09/29 9:13, Kouhei Kaigai wrote: >>>> From: pgsql-hackers-owner@postgresql.org >>>> [mailto:pgsql-hackers-owner@postgresql.org] On Behalf Of Robert Haas >>>> On Mon, Sep 28, 2015 at 3:34 AM, Kouhei Kaigai <kaigai@ak.jp.nec.com> wrote: >>>>> The attached patch allows FDW driver to handle EPQ recheck by its own >>>>> preferable way, even if it is alternative local join or ExecQual to >>>>> the expression being pushed down. >>>> Thanks. I was all set to commit this, or at least part of it, when I >>>> noticed that we already have an FDW callback called RefetchForeignRow. >>>> We seem to be intending that this new callback should refetch the row >>>> from the foreign server and verify that any pushed-down quals apply to >>>> it. But why can't RefetchForeignRow do that? That seems to be what >>>> it's for. >> I thought the same thing [1]. While I thought it was relatively easy to >> make changes to RefetchForeignRow that way for the foreign table case >> (scanrelid>0), I was not sure how hard it would be to do so for the >> foreign join case (scanrelid==0). So, I proposed to leave that changes >> for 9.6. I'll have a rethink on this issue along the lines of that >> approach. > Even if base relation case, is it really easy to do? > > RefetchForeignRow() does not take ForeignScanState as its argument, > so it is not obvious to access its private field, isn't it? > ExecRowMark contains "rti" field, so it might be feasible to find out > the target PlanState using walker routine recently supported, although > it is not a simple enough. > Unless we don't have reference to the private field, it is not feasible > to access expression that was pushed down to the remote-side, therefore, > it does not allow to apply proper rechecks here. 
> > In addition, it is problematic when scanrelid==0 because we have no > relevant ForeignScanState which represents the base relations, even > though ExecRowMark is associated with a particular base relation. > In case of scanrelid==0, EPQ recheck routine also have to ensure > the EPQ tuple is visible towards the join condition in addition to > the qualifier of base relation. These information is also stored within > private data field, so it has to have a reference to the private data > of ForeignScanState of the remote join (scanrelid==0) which contains > the target relation. > > Could you introduce us (1) how to access private data field of > ForeignScanState from the RefetchForeignRow callback? (2) why it > is reasonable to implement than the callback on ForeignRecheck(). For the foreign table case (scanrelid>0), I imagined an approach different from yours. In that case, I thought the issue could probably be addressed by just modifying the remote query performed in RefetchForeignRow, which would be of the form "SELECT ctid, * FROM remote table WHERE ctid = $1", so that the modified query would be of the form "SELECT ctid, * FROM remote table WHERE ctid = $1 AND *remote quals*". For the foreign join case (scanrelid==0), I think we would need changes not only to RefetchForeignRow but also to the existing EvalPlanQual machinery in the core. I don't have a clear picture yet, though. Best regards, Etsuro Fujita
> -----Original Message----- > From: pgsql-hackers-owner@postgresql.org > [mailto:pgsql-hackers-owner@postgresql.org] On Behalf Of Etsuro Fujita > Sent: Tuesday, September 29, 2015 4:36 PM > To: Kaigai Kouhei(海外 浩平); Robert Haas > Cc: PostgreSQL-development; 花田茂 > Subject: Re: [HACKERS] Foreign join pushdown vs EvalPlanQual > > On 2015/09/29 13:55, Kouhei Kaigai wrote: > >> From: pgsql-hackers-owner@postgresql.org > >> [mailto:pgsql-hackers-owner@postgresql.org] On Behalf Of Etsuro Fujita > >> On 2015/09/29 9:13, Kouhei Kaigai wrote: > > >>>> From: pgsql-hackers-owner@postgresql.org > >>>> [mailto:pgsql-hackers-owner@postgresql.org] On Behalf Of Robert Haas > >>>> On Mon, Sep 28, 2015 at 3:34 AM, Kouhei Kaigai <kaigai@ak.jp.nec.com> wrote: > > >>>>> The attached patch allows FDW driver to handle EPQ recheck by its own > >>>>> preferable way, even if it is alternative local join or ExecQual to > >>>>> the expression being pushed down. > > >>>> Thanks. I was all set to commit this, or at least part of it, when I > >>>> noticed that we already have an FDW callback called RefetchForeignRow. > >>>> We seem to be intending that this new callback should refetch the row > >>>> from the foreign server and verify that any pushed-down quals apply to > >>>> it. But why can't RefetchForeignRow do that? That seems to be what > >>>> it's for. > > >> I thought the same thing [1]. While I thought it was relatively easy to > >> make changes to RefetchForeignRow that way for the foreign table case > >> (scanrelid>0), I was not sure how hard it would be to do so for the > >> foreign join case (scanrelid==0). So, I proposed to leave that changes > >> for 9.6. I'll have a rethink on this issue along the lines of that > >> approach. > > > Even if base relation case, is it really easy to do? > > > > RefetchForeignRow() does not take ForeignScanState as its argument, > > so it is not obvious to access its private field, isn't it? 
> > ExecRowMark contains "rti" field, so it might be feasible to find out > > the target PlanState using walker routine recently supported, although > > it is not a simple enough. > > Unless we don't have reference to the private field, it is not feasible > > to access expression that was pushed down to the remote-side, therefore, > > it does not allow to apply proper rechecks here. > > > > In addition, it is problematic when scanrelid==0 because we have no > > relevant ForeignScanState which represents the base relations, even > > though ExecRowMark is associated with a particular base relation. > > In case of scanrelid==0, EPQ recheck routine also have to ensure > > the EPQ tuple is visible towards the join condition in addition to > > the qualifier of base relation. These information is also stored within > > private data field, so it has to have a reference to the private data > > of ForeignScanState of the remote join (scanrelid==0) which contains > > the target relation. > > > > Could you introduce us (1) how to access private data field of > > ForeignScanState from the RefetchForeignRow callback? (2) why it > > is reasonable to implement than the callback on ForeignRecheck(). > > For the foreign table case (scanrelid>0), I imagined an approach > different than yours. In that case, I thought the issue would be > probably addressed by just modifying the remote query performed in > RefetchForeignRow, which would be of the form "SELECT ctid, * FROM > remote table WHERE ctid = $1", so that the modified query would be of > the form "SELECT ctid, * FROM remote table WHERE ctid = $1 AND *remote > quals*". > My question is how to pull the expression for the remote query. It would be stored somewhere in a private field of the ForeignScanState; however, RefetchForeignRow does not have direct access to the relevant ForeignScanState node. That is what I asked in question (1). 
Also note that EvalPlanQualFetchRowMarks() will raise an error if the RefetchForeignRow callback returns a NULL tuple. Is that the right or expected behavior? It looks to me like this callback is designed to pull out a particular tuple identified by the remote row-id, regardless of the qualifier checks based on the latest values. > For the foreign join case (scanrelid==0), in my vision, I think we would > need some changes not only to RefetchForeignRow but to the existing > EvalPlanQual machinery in the core. I've not had a clear image yet, though. > If people agree that FDW remote join is an incomplete feature in v9.5, the attached fix-up is the minimum requirement from the custom-scan/join standpoint. Thanks, -- NEC Business Creation Division / PG-Strom Project KaiGai Kohei <kaigai@ak.jp.nec.com>
Attachment
On 2015/09/29 17:49, Kouhei Kaigai wrote: >> From: pgsql-hackers-owner@postgresql.org >> [mailto:pgsql-hackers-owner@postgresql.org] On Behalf Of Etsuro Fujita >>> RefetchForeignRow() does not take ForeignScanState as its argument, >>> so it is not obvious to access its private field, isn't it? >>> ExecRowMark contains "rti" field, so it might be feasible to find out >>> the target PlanState using walker routine recently supported, although >>> it is not a simple enough. >>> Unless we don't have reference to the private field, it is not feasible >>> to access expression that was pushed down to the remote-side, therefore, >>> it does not allow to apply proper rechecks here. >>> Could you introduce us (1) how to access private data field of >>> ForeignScanState from the RefetchForeignRow callback? >> For the foreign table case (scanrelid>0), I imagined an approach >> different than yours. In that case, I thought the issue would be >> probably addressed by just modifying the remote query performed in >> RefetchForeignRow, which would be of the form "SELECT ctid, * FROM >> remote table WHERE ctid = $1", so that the modified query would be of >> the form "SELECT ctid, * FROM remote table WHERE ctid = $1 AND *remote >> quals*". Sorry, I forgot to add "FOR UPDATE" to the before/after queries. > My question is how to pull expression of the remote query. > It shall be stored at somewhere private field of ForeignScanState, > however, RefetchForeignRow does not have direct access to the > relevant ForeignScanState node. > It is what I asked at the question (1). I imagined the following steps to get the remote query string: (1) create the remote query string and store it in fdw_private during postgresGetForeignPlan, (2) extract the string from fdw_private and store it in erm->ermExtra during postgresBeginForeignScan, and (3) extract the string from erm->ermExtra in postgresRefetchForeignRow. 
> Also note that EvalPlanQualFetchRowMarks() will raise an error > if RefetchForeignRow callback returned NULL tuple. > Is it right or expected behavior? IIUC, I think that that behavior is reasonable. > It looks to me this callback is designed to pull out a particular > tuple identified by the remote-row-id, regardless of the qualifier > checks based on the latest value. Because erm->markType==ROW_MARK_REFERENCE, I don't think that that behavior would cause any problem. Maybe I'm missing something, though. Best regards, Etsuro Fujita
> -----Original Message----- > From: pgsql-hackers-owner@postgresql.org > [mailto:pgsql-hackers-owner@postgresql.org] On Behalf Of Etsuro Fujita > Sent: Tuesday, September 29, 2015 8:00 PM > To: Kaigai Kouhei(海外 浩平); Robert Haas > Cc: PostgreSQL-development; 花田茂 > Subject: Re: [HACKERS] Foreign join pushdown vs EvalPlanQual > > On 2015/09/29 17:49, Kouhei Kaigai wrote: > >> From: pgsql-hackers-owner@postgresql.org > >> [mailto:pgsql-hackers-owner@postgresql.org] On Behalf Of Etsuro Fujita > > >>> RefetchForeignRow() does not take ForeignScanState as its argument, > >>> so it is not obvious to access its private field, isn't it? > >>> ExecRowMark contains "rti" field, so it might be feasible to find out > >>> the target PlanState using walker routine recently supported, although > >>> it is not a simple enough. > >>> Unless we don't have reference to the private field, it is not feasible > >>> to access expression that was pushed down to the remote-side, therefore, > >>> it does not allow to apply proper rechecks here. > > >>> Could you introduce us (1) how to access private data field of > >>> ForeignScanState from the RefetchForeignRow callback? > > >> For the foreign table case (scanrelid>0), I imagined an approach > >> different than yours. In that case, I thought the issue would be > >> probably addressed by just modifying the remote query performed in > >> RefetchForeignRow, which would be of the form "SELECT ctid, * FROM > >> remote table WHERE ctid = $1", so that the modified query would be of > >> the form "SELECT ctid, * FROM remote table WHERE ctid = $1 AND *remote > >> quals*". > > Sorry, I forgot to add "FOR UPDATE" to the before/after queries. > > > My question is how to pull expression of the remote query. > > It shall be stored at somewhere private field of ForeignScanState, > > however, RefetchForeignRow does not have direct access to the > > relevant ForeignScanState node. > > It is what I asked at the question (1). 
> > I imagined the following steps to get the remote query string: (1) > create the remote query string and store it in fdw_private during > postgresGetForeignPlan, (2) extract the string from fdw_private and > store it in erm->ermExtra during postgresBeginForeignScan, and (3) > extract the string from erm->ermExtra in postgresRefetchForeignRow. > > > Also note that EvalPlanQualFetchRowMarks() will raise an error > > if RefetchForeignRow callback returned NULL tuple. > > Is it right or expected behavior? > > IIUC, I think that that behavior is reasonable. > > > It looks to me this callback is designed to pull out a particular > > tuple identified by the remote-row-id, regardless of the qualifier > > checks based on the latest value. > > Because erm->markType==ROW_MARK_REFERENCE, I don't think that that > behavior would cause any problem. Maybe I'm missing something, though. > Really? ExecLockRows() calls EvalPlanQualFetchRowMarks() to fill up the EPQ tuple slots prior to EvalPlanQualNext(), because these tuples are referenced during EPQ rechecks. The purpose of EvalPlanQualNext() is to evaluate whether the current set of rows is visible under the qualifiers of the underlying scan/join. Then, if not visible, it *ignores* the current tuples, as follows.

    /*
     * Now fetch any non-locked source rows --- the EPQ logic knows how to
     * do that.
     */
    EvalPlanQualSetSlot(&node->lr_epqstate, slot);
    EvalPlanQualFetchRowMarks(&node->lr_epqstate);  <--- LOAD REMOTE ROWS

    /*
     * And finally we can re-evaluate the tuple.
     */
    slot = EvalPlanQualNext(&node->lr_epqstate);    <--- EVALUATE QUALIFIERS
    if (TupIsNull(slot))
    {
        /* Updated tuple fails qual, so ignore it and go on */
        goto lnext;  <-- IGNORE THE ROW, NOT RAISE AN ERROR
    }

What happens if RefetchForeignRow raises an error in the case where the latest row exists but violates the "remote quals"? This is a case that should be ignored, unlike the case where the remote row identified by the row-id didn't exist.
Thanks, -- NEC Business Creation Division / PG-Strom Project KaiGai Kohei <kaigai@ak.jp.nec.com>
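The three-step hand-off quoted above (build the query in fdw_private at plan time, copy it into erm->ermExtra at scan start, read it back in RefetchForeignRow) might be sketched as follows. This is only an illustration with invented stand-in structs (ForeignPlanStub, ExecRowMarkStub) and a made-up query string; the real postgres_fdw would deparse the remote quals into the WHERE clause and work with the actual planner/executor nodes.

```c
#include <assert.h>
#include <string.h>

/* Stand-ins for the real planner/executor structs (hypothetical). */
typedef struct ForeignPlanStub { char *fdw_private; } ForeignPlanStub;
typedef struct ExecRowMarkStub { void *ermExtra; } ExecRowMarkStub;

/* Step 1: at plan time, build the refetch query, with the remote quals
 * appended to the WHERE clause, and stash it in fdw_private. */
static ForeignPlanStub get_foreign_plan(void)
{
    ForeignPlanStub plan;
    plan.fdw_private = "SELECT ctid, * FROM remote_tab "
                       "WHERE ctid = $1 AND a > 0 FOR UPDATE";
    return plan;
}

/* Step 2: at scan start, copy the string from fdw_private into
 * erm->ermExtra so the rowmark carries it to the refetch callback. */
static void begin_foreign_scan(ForeignPlanStub *plan, ExecRowMarkStub *erm)
{
    erm->ermExtra = plan->fdw_private;
}

/* Step 3: the refetch callback pulls the query back out of ermExtra,
 * which it does have access to, unlike the ForeignScanState. */
static const char *refetch_query(ExecRowMarkStub *erm)
{
    return (const char *) erm->ermExtra;
}
```

The point of routing the string through ermExtra is exactly the access problem raised earlier in the thread: RefetchForeignRow receives the ExecRowMark but not the ForeignScanState.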
On Tue, Sep 29, 2015 at 4:49 AM, Kouhei Kaigai <kaigai@ak.jp.nec.com> wrote: > Also note that EvalPlanQualFetchRowMarks() will raise an error > if RefetchForeignRow callback returned NULL tuple. > Is it right or expected behavior? That's not how I read the code. If RefetchForeignRow returns NULL, we just ignore the row and continue on to the next one:

    if (copyTuple == NULL)
    {
        /* couldn't get the lock, so skip this row */
        goto lnext;
    }

And that seems exactly right: RefetchForeignRow needs to test that the tuple is still present on the remote side, and that any remote quals are matched. If either of those is false, it can return NULL. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
On Mon, Sep 28, 2015 at 11:15 PM, Etsuro Fujita <fujita.etsuro@lab.ntt.co.jp> wrote: > I thought the same thing [1]. While I thought it was relatively easy to > make changes to RefetchForeignRow that way for the foreign table case > (scanrelid>0), I was not sure how hard it would be to do so for the foreign > join case (scanrelid==0). So, I proposed to leave that changes for 9.6. > I'll have a rethink on this issue along the lines of that approach. Well, I spent some more time looking at this today, and testing it out using a fixed-up version of your foreign_join_v16 patch, and I decided that RefetchForeignRow is basically a red herring. That's only used for FDWs that do late row locking, but postgres_fdw (and probably many others) do early row locking, in which case RefetchForeignRow never gets called. Instead, the row is treated as a "non-locked source row" by ExecLockRows (even though it is in fact locked) and is re-fetched by EvalPlanQualFetchRowMarks. We should probably update the comment about non-locked source rows to mention the case of FDWs that do early row locking. Anyway, everything appears to work OK up to this point: we correctly retrieve the saved whole-rows from the foreign side and call EvalPlanQualSetTuple on each one, setting es_epqTuple[rti - 1] and es_epqTupleSet[rti - 1]. So far, so good. Now we call EvalPlanQualNext, and that's where we get into trouble. We've got the already-locked tuples from the foreign side and those tuples CANNOT have gone away or been modified because we have already locked them. So, all the foreign join needs to do is return the same tuple that it returned before: the EPQ recheck was triggered by some *other* table involved in the plan, not our table. A local table also involved in the query, or conceivably a foreign table that does late row locking, could have had something change under it after the row was fetched, but in postgres_fdw that can't happen because we locked the row up front. 
And thus, again, all we need to do is re-return the same tuple. But we don't have that. Instead, the ROW_MARK_COPY logic has caused us to preserve a copy of each *baserel* tuple. Now, this is as sad as can be. Early row locking has huge advantages for FDWs, both in terms of minimizing server round trips and also because the FDW doesn't really need to do anything about EPQ. Sure, it's inefficient to carry around whole-row references, but it makes life easy for the FDW author. So, if we wanted to fix this in a way that preserves the spirit of what's there now, it seems to me that we'd want the FDW to return something that's like a whole row reference, but represents the output of the foreign join rather than some underlying base table. And then get the EPQ machinery to have the evaluation of the ForeignScan for the join, when it happens in an EPQ context, to return that tuple. But I don't really have a good idea how to do that. More thought seems needed here... -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
On 2015/09/29 21:38, Kouhei Kaigai wrote: >>> Also note that EvalPlanQualFetchRowMarks() will raise an error >>> if RefetchForeignRow callback returned NULL tuple. >>> Is it right or expected behavior? >> IIUC, I think that that behavior is reasonable. >>> It looks to me this callback is designed to pull out a particular >>> tuple identified by the remote-row-id, regardless of the qualifier >>> checks based on the latest value. >> Because erm->markType==ROW_MARK_REFERENCE, I don't think that that >> behavior would cause any problem. Maybe I'm missing something, though. > Really? Yeah, I think RefetchForeignRow should work differently depending on the rowmark type. When erm->markType==ROW_MARK_REFERENCE, the callback should fetch a particular tuple identified by the rowid (ie, the same version previously obtained) successfully. So for that case, I don't think the remote quals need to be checked during RefetchForeignRow. > ExecLockRows() calls EvalPlanQualFetchRowMarks() to fill up EPQ tuple > slot prior to EvalPlanQualNext(), because these tuples are referenced > during EPQ rechecks. > The purpose of EvalPlanQualNext() is evaluate whether the current bunch > of rows are visible towards the qualifiers of underlying scan/join. > Then, if not visible, it *ignores* the current tuples, as follows. > > /* > * Now fetch any non-locked source rows --- the EPQ logic knows how to > * do that. > */ > EvalPlanQualSetSlot(&node->lr_epqstate, slot); > EvalPlanQualFetchRowMarks(&node->lr_epqstate); <--- LOAD REMOTE ROWS > > /* > * And finally we can re-evaluate the tuple. > */ > slot = EvalPlanQualNext(&node->lr_epqstate); <--- EVALUATE QUALIFIERS > if (TupIsNull(slot)) > { > /* Updated tuple fails qual, so ignore it and go on */ > goto lnext; <-- IGNORE THE ROW, NOT RAISE AN ERROR > } > > What happen if RefetchForeignRow raise an error in case when the latest > row exists but violated towards the "remote quals" ? 
> This is the case to be ignored, unlike the case when remote row identified > by row-id didn't exist. IIUC, I think that that depends on where RefetchForeignRow is called (ie, the rowmark type). When it is called from EvalPlanQualFetchRowMarks, the transaction should be aborted as I mentioned above, if it couldn't fetch the same version previously obtained. But when RefetchForeignRow is called from ExecLockRows, the tuple should be just ignored as the above code, if the latest version on the remote side didn't satisfy the remote quals. Best regards, Etsuro Fujita
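The distinction drawn above might be sketched like this. Every name here is an invented stand-in (RowMarkTypeStub, HeapTupleStub, and the single-column "remote qual" are for illustration only); the point is merely that the NULL-versus-error decision hinges on whether we are refetching a known version (ROW_MARK_REFERENCE) or locking and re-checking the latest version (late locking from ExecLockRows).

```c
#include <assert.h>
#include <stddef.h>

typedef enum { ROW_MARK_EXCLUSIVE, ROW_MARK_REFERENCE } RowMarkTypeStub;

typedef struct HeapTupleStub {
    int a;          /* the only column, for illustration */
    int exists;     /* does the row still exist remotely? */
} HeapTupleStub;

/* Stand-in for RefetchForeignRow: returns the tuple, NULL ("skip this
 * row"), or flags an error via *raise_error. */
static const HeapTupleStub *
refetch_foreign_row(RowMarkTypeStub markType, const HeapTupleStub *remote,
                    int *raise_error)
{
    *raise_error = 0;
    if (markType == ROW_MARK_REFERENCE)
    {
        /* Refetching the same version previously obtained: it must
         * still be there, so a missing row is an error, and remote
         * quals are not re-checked. */
        if (!remote->exists)
        {
            *raise_error = 1;
            return NULL;
        }
        return remote;
    }
    /* Late locking from ExecLockRows: lock the latest version and
     * re-check the remote qual (here, a > 0); failing it just means
     * "skip this row", i.e. return NULL without raising an error. */
    if (!remote->exists || remote->a <= 0)
        return NULL;
    return remote;
}
```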
On 2015/09/30 6:55, Robert Haas wrote: > On Mon, Sep 28, 2015 at 11:15 PM, Etsuro Fujita > <fujita.etsuro@lab.ntt.co.jp> wrote: >> I thought the same thing [1]. While I thought it was relatively easy to >> make changes to RefetchForeignRow that way for the foreign table case >> (scanrelid>0), I was not sure how hard it would be to do so for the foreign >> join case (scanrelid==0). So, I proposed to leave that changes for 9.6. >> I'll have a rethink on this issue along the lines of that approach. > > Well, I spent some more time looking at this today, and testing it out > using a fixed-up version of your foreign_join_v16 patch, and I decided > that RefetchForeignRow is basically a red herring. That's only used > for FDWs that do late row locking, but postgres_fdw (and probably many > others) do early row locking, in which case RefetchForeignRow never > gets called. Instead, the row is treated as a "non-locked source row" > by ExecLockRows (even though it is in fact locked) and is re-fetched > by EvalPlanQualFetchRowMarks. We should probably update the comment > about non-locked source rows to mention the case of FDWs that do early > row locking. > > Anyway, everything appears to work OK up to this point: we correctly > retrieve the saved whole-rows from the foreign side and call > EvalPlanQualSetTuple on each one, setting es_epqTuple[rti - 1] and > es_epqTupleSet[rti - 1]. So far, so good. Now we call > EvalPlanQualNext, and that's where we get into trouble. We've got the > already-locked tuples from the foreign side and those tuples CANNOT > have gone away or been modified because we have already locked them. > So, all the foreign join needs to do is return the same tuple that it > returned before: the EPQ recheck was triggered by some *other* table > involved in the plan, not our table. 
A local table also involved in > the query, or conceivably a foreign table that does late row locking, > could have had something change under it after the row was fetched, > but in postgres_fdw that can't happen because we locked the row up > front. And thus, again, all we need to do is re-return the same > tuple. But we don't have that. Instead, the ROW_MARK_COPY logic has > caused us to preserve a copy of each *baserel* tuple. > > Now, this is as sad as can be. Early row locking has huge advantages > for FDWs, both in terms of minimizing server round trips and also > because the FDW doesn't really need to do anything about EPQ. Sure, > it's inefficient to carry around whole-row references, but it makes > life easy for the FDW author. > > So, if we wanted to fix this in a way that preserves the spirit of > what's there now, it seems to me that we'd want the FDW to return > something that's like a whole row reference, but represents the output > of the foreign join rather than some underlying base table. And then > get the EPQ machinery to have the evaluation of the ForeignScan for > the join, when it happens in an EPQ context, to return that tuple. > But I don't really have a good idea how to do that. I like a general solution. Can't we extend that idea so that foreign tables involved in a foreign join are allowed to have different rowmark methods other than ROW_MARK_COPY, eg, ROW_MARK_EXCLUSIVE? Best regards, Etsuro Fujita
> -----Original Message----- > From: pgsql-hackers-owner@postgresql.org > [mailto:pgsql-hackers-owner@postgresql.org] On Behalf Of Robert Haas > Sent: Wednesday, September 30, 2015 6:55 AM > To: Etsuro Fujita > Cc: Kaigai Kouhei(海外 浩平); PostgreSQL-development; 花田茂 > Subject: Re: [HACKERS] Foreign join pushdown vs EvalPlanQual > > On Mon, Sep 28, 2015 at 11:15 PM, Etsuro Fujita > <fujita.etsuro@lab.ntt.co.jp> wrote: > > I thought the same thing [1]. While I thought it was relatively easy to > > make changes to RefetchForeignRow that way for the foreign table case > > (scanrelid>0), I was not sure how hard it would be to do so for the foreign > > join case (scanrelid==0). So, I proposed to leave that changes for 9.6. > > I'll have a rethink on this issue along the lines of that approach. > > Well, I spent some more time looking at this today, and testing it out > using a fixed-up version of your foreign_join_v16 patch, and I decided > that RefetchForeignRow is basically a red herring. That's only used > for FDWs that do late row locking, but postgres_fdw (and probably many > others) do early row locking, in which case RefetchForeignRow never > gets called. Instead, the row is treated as a "non-locked source row" > by ExecLockRows (even though it is in fact locked) and is re-fetched > by EvalPlanQualFetchRowMarks. We should probably update the comment > about non-locked source rows to mention the case of FDWs that do early > row locking. > Indeed, select_rowmark_type() says ROW_MARK_COPY if GetForeignRowMarkType callback is not defined. > Anyway, everything appears to work OK up to this point: we correctly > retrieve the saved whole-rows from the foreign side and call > EvalPlanQualSetTuple on each one, setting es_epqTuple[rti - 1] and > es_epqTupleSet[rti - 1]. So far, so good. Now we call > EvalPlanQualNext, and that's where we get into trouble. 
We've got the > already-locked tuples from the foreign side and those tuples CANNOT > have gone away or been modified because we have already locked them. > So, all the foreign join needs to do is return the same tuple that it > returned before: the EPQ recheck was triggered by some *other* table > involved in the plan, not our table. A local table also involved in > the query, or conceivably a foreign table that does late row locking, > could have had something change under it after the row was fetched, > but in postgres_fdw that can't happen because we locked the row up > front. And thus, again, all we need to do is re-return the same > tuple. But we don't have that. Instead, the ROW_MARK_COPY logic has > caused us to preserve a copy of each *baserel* tuple. > > Now, this is as sad as can be. Early row locking has huge advantages > for FDWs, both in terms of minimizing server round trips and also > because the FDW doesn't really need to do anything about EPQ. Sure, > it's inefficient to carry around whole-row references, but it makes > life easy for the FDW author. > I got the point. Would it be helpful to add a description of why ROW_MARK_COPY does not need rechecks on both the local and remote tuples? http://www.postgresql.org/docs/devel/static/fdw-row-locking.html > So, if we wanted to fix this in a way that preserves the spirit of > what's there now, it seems to me that we'd want the FDW to return > something that's like a whole row reference, but represents the output > of the foreign join rather than some underlying base table. And then > get the EPQ machinery to have the evaluation of the ForeignScan for > the join, when it happens in an EPQ context, to return that tuple. > But I don't really have a good idea how to do that. > > More thought seems needed here... > Alternative built-in join execution? Once it is executed under the EPQ context, the built-in join node fetches a tuple from each of the inner and outer sides.
Each is eventually fetched from the EPQ slot, and then the alternative join produces a result tuple. In the case where the FDW is not designed to handle the join by itself, it is a reasonable fallback, I think. I expect the FDW driver needs to handle EPQ recheck in the cases below: * ForeignScan on base relation and it uses late row locking. * ForeignScan on join relation, even if early locking. Thanks, -- NEC Business Creation Division / PG-Strom Project KaiGai Kohei <kaigai@ak.jp.nec.com>
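The alternative local-join fallback described above could be pictured as follows: during an EPQ recheck each side of the join yields exactly the one EPQ tuple for its base relation, and a local join qual decides whether the recheck passes. All names here are invented for illustration; the real mechanism would be an alternative join plan tree re-evaluated by EvalPlanQualNext.

```c
#include <assert.h>
#include <stddef.h>

typedef struct EpqTupleStub { int a; } EpqTupleStub;

/* Under EPQ, each underlying scan returns its single EPQ tuple, and the
 * alternative local join applies the join qual (foo.a = bar.a here).
 * Returning NULL means the recheck fails and the row is ignored. */
static const EpqTupleStub *
local_join_recheck(const EpqTupleStub *outer, const EpqTupleStub *inner,
                   EpqTupleStub *result)
{
    if (outer == NULL || inner == NULL)
        return NULL;            /* one side has no tuple: recheck fails */
    if (outer->a != inner->a)
        return NULL;            /* join qual fails: row is ignored */
    result->a = outer->a;       /* produce the joined result tuple */
    return result;
}
```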
Hello, I have caught up with this thread, maybe. > > So, if we wanted to fix this in a way that preserves the spirit of > > what's there now, it seems to me that we'd want the FDW to return > > something that's like a whole row reference, but represents the output > > of the foreign join rather than some underlying base table. And then > > get the EPQ machinery to have the evaluation of the ForeignScan for > > the join, when it happens in an EPQ context, to return that tuple. > > But I don't really have a good idea how to do that. > > > > More thought seems needed here... > > > Alternative built-in join execution? > Once it is executed under the EPQ context, built-in join node fetches > a tuple from both of inner and outer side for each. It is eventually > fetched from the EPQ slot, then the alternative join produce a result > tuple. It seems quite similar to what Fujita-san is trying now: somehow *replacing* the "foreign join" scan node with an alternative local join plan during EPQ. I think what Robert suggests is a "foreign join" scan that completely behaves as an ordinary scan node in the executor. The current framework of foreign join pushdown seems a bit tricky because it incompletely emulates a local join on foreign scans. The mixture seems to be the root cause of this problem.

1. Somehow run local joins on the current EPQ tuples given by "foreign join" scans.
   1.1 Somehow detect a running EPQ and switch the plan to run, in ExecScanFetch or somewhere else.
   1.2 Replace the "foreign join scan" node with the alternative local join node at ExecInit. (I don't like this.)
   1.3 An in-core alternative local join executor for join pushdown?
2. A "foreign join" scan plan node completely compliant with the current executor semantics of an ordinary scan node. In other words, the node has a corresponding RTE_RELATION RTE, is marked with ROW_MARK_COPY on locking, and returns a slot with a tlist that contains the join result columns and a whole-row var on them.
Then, EvalPlanQualFetchRowMarks gets the whole-row var and sets it into epqTuple for the corresponding *relid*. I prefer 2, but have no good idea how to do that now, either. > In case when FDW is not designed to handle join by itself, it is > a reasonable fallback I think. > > I expect FDW driver needs to handle EPQ recheck in the case below: > * ForeignScan on base relation and it uses late row locking. I think this is indisputable. > * ForeignScan on join relation, even if early locking. This could be unnecessary if the "foreign join" scan node can have its own rowmark of ROW_MARK_COPY. regards, At Thu, 1 Oct 2015 02:15:29 +0000, Kouhei Kaigai <kaigai@ak.jp.nec.com> wrote in <9A28C8860F777E439AA12E8AEA7694F80114D442@BPXM15GP.gisp.nec.co.jp> -- Kyotaro Horiguchi NTT Open Source Software Center
On 2015/10/01 11:15, Kouhei Kaigai wrote: >> From: pgsql-hackers-owner@postgresql.org >> [mailto:pgsql-hackers-owner@postgresql.org] On Behalf Of Robert Haas >> On Mon, Sep 28, 2015 at 11:15 PM, Etsuro Fujita >> <fujita.etsuro@lab.ntt.co.jp> wrote: >>> I thought the same thing [1]. While I thought it was relatively easy to >>> make changes to RefetchForeignRow that way for the foreign table case >>> (scanrelid>0), I was not sure how hard it would be to do so for the foreign >>> join case (scanrelid==0). So, I proposed to leave that changes for 9.6. >>> I'll have a rethink on this issue along the lines of that approach. >> So, if we wanted to fix this in a way that preserves the spirit of >> what's there now, it seems to me that we'd want the FDW to return >> something that's like a whole row reference, but represents the output >> of the foreign join rather than some underlying base table. And then >> get the EPQ machinery to have the evaluation of the ForeignScan for >> the join, when it happens in an EPQ context, to return that tuple. >> But I don't really have a good idea how to do that. > Alternative built-in join execution? > Once it is executed under the EPQ context, built-in join node fetches > a tuple from both of inner and outer side for each. It is eventually > fetched from the EPQ slot, then the alternative join produce a result > tuple. > In case when FDW is not designed to handle join by itself, it is > a reasonable fallback I think. > > I expect FDW driver needs to handle EPQ recheck in the case below: > * ForeignScan on base relation and it uses late row locking. > * ForeignScan on join relation, even if early locking. I also think the approach would be one choice. But one thing I'm concerned about is plan creation for that by the FDW author; that would make life hard for the FDW author. (That was proposed by me ...) 
So, I'd like to investigate another approach that preserves the applicability of late row locking to the join pushdown case as well as the spirit of what's there now. The basic idea is (1) add a new callback routine RefetchForeignJoinRow that refetches one foreign-join tuple from the foreign server, after locking remote tuples for the component foreign tables if required, and (2) call that routine in ExecScanFetch if the target scan is for a foreign join and the component foreign tables require late row locking, else just return the foreign-join tuple stored in the parent's state tree, which is the tuple mentioned by Robert, for preserving the spirit of what's there now. I think that ExecLockRows and EvalPlanQualFetchRowMarks should probably be modified so as to skip foreign tables involved in a foreign join. Best regards, Etsuro Fujita
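The dispatch described above might be sketched like this. All the types and the late_locking flag are invented stand-ins (the real ExecScanFetch works on ScanState, es_epqTuple, and the rowmark information); the sketch only shows the proposed branching: late locking goes back to the remote side via the hypothetical RefetchForeignJoinRow, while early locking simply re-returns the already-locked join tuple.

```c
#include <assert.h>

typedef struct JoinTupleStub { int valid; int a; } JoinTupleStub;

typedef struct ForeignScanStateStub {
    int scanrelid;                     /* 0 means a pushed-down foreign join */
    int late_locking;                  /* do component tables need late locks? */
    JoinTupleStub stored_join_tuple;   /* join tuple saved at scan time */
} ForeignScanStateStub;

/* Stand-in for the proposed RefetchForeignJoinRow callback: would lock
 * the component tables' remote tuples and refetch the joined row. */
static JoinTupleStub
refetch_foreign_join_row(ForeignScanStateStub *node)
{
    JoinTupleStub t = {1, node->stored_join_tuple.a};
    return t;
}

/* EPQ-time fetch for a foreign join: branch on whether the component
 * foreign tables require late row locking. */
static JoinTupleStub
epq_fetch_join_tuple(ForeignScanStateStub *node)
{
    if (node->scanrelid == 0 && node->late_locking)
        return refetch_foreign_join_row(node);
    /* early locking: the stored tuple cannot have changed, re-return it */
    return node->stored_join_tuple;
}
```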
On 2015/10/01 15:38, Kyotaro HORIGUCHI wrote: >> I expect FDW driver needs to handle EPQ recheck in the case below: >> * ForeignScan on base relation and it uses late row locking. > I think this is indisputable. I think so. But I think this case would probably be handled by the existing RefetchForeignRow routine as I said upthread. >> * ForeignScan on join relation, even if early locking. > This could be unnecessary if the "foreign join" scan node can > have its own rowmark of ROW_MARK_COPY. That's an idea, but I'd vote for preserving the applicability of late row locking to the foreign join case, allowing component foreign tables involved in a foreign join to have different rowmark methods other than ROW_MARK_COPY. Best regards, Etsuro Fujita
Hello, At Thu, 1 Oct 2015 17:50:25 +0900, Etsuro Fujita <fujita.etsuro@lab.ntt.co.jp> wrote in <560CF3D1.9060305@lab.ntt.co.jp> > On 2015/10/01 11:15, Kouhei Kaigai wrote: > >> From: pgsql-hackers-owner@postgresql.org > >> [mailto:pgsql-hackers-owner@postgresql.org] On Behalf Of Robert Haas > >> On Mon, Sep 28, 2015 at 11:15 PM, Etsuro Fujita > >> <fujita.etsuro@lab.ntt.co.jp> wrote: > >> So, if we wanted to fix this in a way that preserves the spirit of > >> what's there now, it seems to me that we'd want the FDW to return > >> something that's like a whole row reference, but represents the output > >> of the foreign join rather than some underlying base table. And then > >> get the EPQ machinery to have the evaluation of the ForeignScan for > >> the join, when it happens in an EPQ context, to return that tuple. > >> But I don't really have a good idea how to do that. > > > Alternative built-in join execution? > > Once it is executed under the EPQ context, built-in join node fetches > > a tuple from both of inner and outer side for each. It is eventually > > fetched from the EPQ slot, then the alternative join produce a result > > tuple. > > In case when FDW is not designed to handle join by itself, it is > > a reasonable fallback I think. > > > > I expect FDW driver needs to handle EPQ recheck in the case below: > > * ForeignScan on base relation and it uses late row locking. > > * ForeignScan on join relation, even if early locking. > > I also think the approach would be one choice. But one thing I'm > concerned about is plan creation for that by the FDW author; that > would make life hard for the FDW author. (That was proposed by me > ...) > > So, I'd like to investigate another approach that preserves the > applicability of late row locking to the join pushdown case as well as > the spirit of what's there now. 
The basic idea is (1) add a new > callback routine RefetchForeignJoinRow that refetches one foreign-join > tuple from the foreign server, after locking remote tuples for the > component foreign tables if required, That would be the case where at least one of the component relations of a foreign join uses a rowmark other than ROW_MARK_COPY, which is not possible so far in postgres_fdw. For the case where some of the component relations use rowmarks other than ROW_MARK_COPY, we should perhaps call RefetchForeignRow for such relations and construct the joined row involving the ROW_MARK_COPY relations. While we could devise some logic for that case, it is obvious that the case we should focus on now is a "foreign join" scan where all underlying foreign scans are ROW_MARK_COPY, I think. A "foreign join" scan with ROW_MARK_COPY looks promising (to me), and in the future it could coexist with the refetch mechanism you may have in mind, from this point of view... Maybe:p > and (2) call that routine in > ExecScanFetch if the target scan is for a foreign join and the > component foreign tables require to be locked lately, else just return > the foreign-join tuple stored in the parent's state tree, which is the > tuple mentioned by Robert, for preserving the spirit of what's there > now. I think that ExecLockRows and EvalPlanQualFetchRowMarks should > probably be modified so as to skip foreign tables involved in a > foreign join. regards, -- Kyotaro Horiguchi NTT Open Source Software Center
On 2015/10/01 19:02, Kyotaro HORIGUCHI wrote: > At Thu, 1 Oct 2015 17:50:25 +0900, Etsuro Fujita <fujita.etsuro@lab.ntt.co.jp> wrote in <560CF3D1.9060305@lab.ntt.co.jp> >>>> From: pgsql-hackers-owner@postgresql.org >>>> [mailto:pgsql-hackers-owner@postgresql.org] On Behalf Of Robert Haas >>>> So, if we wanted to fix this in a way that preserves the spirit of >>>> what's there now, it seems to me that we'd want the FDW to return >>>> something that's like a whole row reference, but represents the output >>>> of the foreign join rather than some underlying base table. And then >>>> get the EPQ machinery to have the evaluation of the ForeignScan for >>>> the join, when it happens in an EPQ context, to return that tuple. >>>> But I don't really have a good idea how to do that. >> So, I'd like to investigate another approach that preserves the >> applicability of late row locking to the join pushdown case as well as >> the spirit of what's there now. The basic idea is (1) add a new >> callback routine RefetchForeignJoinRow that refetches one foreign-join >> tuple from the foreign server, after locking remote tuples for the >> component foreign tables if required, > It would be the case that at least one of the component relations > of a foreign join is other than ROW_MARK_COPY, which is not > possible so far on postgres_fdw. Yes. To be exact, it's possible for the component relations to have rowmark methods other than ROW_MARK_COPY using GetForeignRowMarkType, in principle, but the server crashes ... > For the case that some of the > component relations are other than ROW_MARK_COPY, we might should > call RefetchForeignRow for such relations and construct joined > row involving ROW_MARK_COPY relations. You are saying that we should construct the joined row using an alternative local join execution plan? 
> Indeed we could consider some logic for the case, it is obvious > that the case now we should focus on is a "foreign join" scan > with all underlying foreign scans are ROW_MARK_COPY, I > think. "foreign join" scan with ROW_MARK_COPY looks to be > promising (for me) and in future it would be able to coexist with > refetch mechanism maybe in your mind from this point of > view... Maybe:p I agree that the approach "foreign-join scan with ROW_MARK_COPY" would be promising. Best regards, Etsuro Fujita
> -----Original Message----- > From: pgsql-hackers-owner@postgresql.org > [mailto:pgsql-hackers-owner@postgresql.org] On Behalf Of Etsuro Fujita > Sent: Thursday, October 01, 2015 5:50 PM > To: Kaigai Kouhei(海外 浩平); Robert Haas > Cc: PostgreSQL-development; 花田茂 > Subject: Re: [HACKERS] Foreign join pushdown vs EvalPlanQual > > On 2015/10/01 11:15, Kouhei Kaigai wrote: > >> From: pgsql-hackers-owner@postgresql.org > >> [mailto:pgsql-hackers-owner@postgresql.org] On Behalf Of Robert Haas > >> On Mon, Sep 28, 2015 at 11:15 PM, Etsuro Fujita > >> <fujita.etsuro@lab.ntt.co.jp> wrote: > >>> I thought the same thing [1]. While I thought it was relatively easy to > >>> make changes to RefetchForeignRow that way for the foreign table case > >>> (scanrelid>0), I was not sure how hard it would be to do so for the foreign > >>> join case (scanrelid==0). So, I proposed to leave that changes for 9.6. > >>> I'll have a rethink on this issue along the lines of that approach. > > >> So, if we wanted to fix this in a way that preserves the spirit of > >> what's there now, it seems to me that we'd want the FDW to return > >> something that's like a whole row reference, but represents the output > >> of the foreign join rather than some underlying base table. And then > >> get the EPQ machinery to have the evaluation of the ForeignScan for > >> the join, when it happens in an EPQ context, to return that tuple. > >> But I don't really have a good idea how to do that. > > > Alternative built-in join execution? > > Once it is executed under the EPQ context, built-in join node fetches > > a tuple from both of inner and outer side for each. It is eventually > > fetched from the EPQ slot, then the alternative join produce a result > > tuple. > > In case when FDW is not designed to handle join by itself, it is > > a reasonable fallback I think. > > > > I expect FDW driver needs to handle EPQ recheck in the case below: > > * ForeignScan on base relation and it uses late row locking. 
> > * ForeignScan on join relation, even if early locking.
>
> I also think the approach would be one choice. But one thing I'm
> concerned about is plan creation for that by the FDW author; that would
> make life hard for the FDW author. (That was proposed by me ...)
>
I don't follow the standpoint, but not valuable to repeat same discussion.

> So, I'd like to investigate another approach that preserves the
> applicability of late row locking to the join pushdown case as well as
> the spirit of what's there now. The basic idea is (1) add a new
> callback routine RefetchForeignJoinRow that refetches one foreign-join
> tuple from the foreign server, after locking remote tuples for the
> component foreign tables if required, and (2) call that routine in
> ExecScanFetch if the target scan is for a foreign join and the component
> foreign tables require to be locked lately, else just return the
> foreign-join tuple stored in the parent's state tree, which is the tuple
> mentioned by Robert, for preserving the spirit of what's there now. I
> think that ExecLockRows and EvalPlanQualFetchRowMarks should probably be
> modified so as to skip foreign tables involved in a foreign join.
>
As long as FDW author can choose their best way to produce a joined
tuple, it may be worth to investigate.

My comments are:
* ForeignRecheck is the best location to call RefetchForeignJoinRow
  when scanrelid==0, not ExecScanFetch. Why you try to add special
  case for FDW in the common routine.
* It is FDW's choice where the remote join tuple is kept, even though
  most of FDW will keep it on the private field of ForeignScanState.

Apart from FDW requirement, custom-scan/join needs recheckMtd is
called when scanrelid==0 to avoid assertion fail. I hope FDW has
symmetric structure, however, not a mandatory requirement for me.

Thanks,
--
NEC Business Creation Division / PG-Strom Project
KaiGai Kohei <kaigai@ak.jp.nec.com>
Hello, I had more consideration on this.

> As long as FDW author can choose their best way to produce a joined
> tuple, it may be worth to investigate.
>
> My comments are:
> * ForeignRecheck is the best location to call RefetchForeignJoinRow
>   when scanrelid==0, not ExecScanFetch. Why you try to add special
>   case for FDW in the common routine.
> * It is FDW's choice where the remote join tuple is kept, even though
>   most of FDW will keep it on the private field of ForeignScanState.

I think that scanrelid == 0 means that the node in focus is not a
scan node in the current executor semantics. EvalPlanQualFetchRowMarks
fetches the possibly modified row, then EvalPlanQualNext does the
recheck for the new row. Those are the roles of the respective
functions.

By this criterion, recheck routines are not the place for refetching;
EvalPlanQualFetchRowMarks is.

Again, the problem here is that the "foreign join" scan node is
actually a scan node but it doesn't provide all the materials which
the executor expects of a scan node. So the ways to fix this while
preserving the semantics would be the following two choices.

1. Make the "foreign join" scan node behave as a complete scan
   node. That is, EvalPlanQualFetchRowMarks can retrieve the
   modified row version anyhow according to the type of row mark.

2. Make the "foreign join" node actually a join node which has
   subnodes, so that the "foreign join" node can reconstruct the
   result row using the results of the subnodes on EPQ.
   (ExecForeignJoinNode would cease to call the subnodes if it is
   actually a scan node.)

"3". Any other means that breaks the current semantics of joins and
scans in the executor, as you recommend. Some more adjustment would
be needed to go this way.

I don't know how the current design of FDW has been built, especially
about the join pushdown feature, so I should be missing something,
but I think as the above for this issue.

> Apart from FDW requirement, custom-scan/join needs recheckMtd is
> called when scanrelid==0 to avoid assertion fail. I hope FDW has
> symmetric structure, however, not a mandatory requirement for me.

It wouldn't be needed if EvalPlanQualFetchRowMarks works as
expected. Is this wrong?

regards,

At Thu, 1 Oct 2015 13:17:34 +0000, Kouhei Kaigai <kaigai@ak.jp.nec.com> wrote in <9A28C8860F777E439AA12E8AEA7694F80114D7BB@BPXM15GP.gisp.nec.co.jp>
> > > In case when FDW is not designed to handle join by itself, it is
> > > a reasonable fallback I think.
> > >
> > > I expect FDW driver needs to handle EPQ recheck in the case below:
> > > * ForeignScan on base relation and it uses late row locking.
> > > * ForeignScan on join relation, even if early locking.
> >
> > I also think the approach would be one choice. But one thing I'm
> > concerned about is plan creation for that by the FDW author; that would
> > make life hard for the FDW author. (That was proposed by me ...)
> >
> I don't follow the standpoint, but not valuable to repeat same discussion.
>
> > So, I'd like to investigate another approach that preserves the
> > applicability of late row locking to the join pushdown case as well as
> > the spirit of what's there now. The basic idea is (1) add a new
> > callback routine RefetchForeignJoinRow that refetches one foreign-join
> > tuple from the foreign server, after locking remote tuples for the
> > component foreign tables if required, and (2) call that routine in
> > ExecScanFetch if the target scan is for a foreign join and the component
> > foreign tables require to be locked lately, else just return the
> > foreign-join tuple stored in the parent's state tree, which is the tuple
> > mentioned by Robert, for preserving the spirit of what's there now. I
> > think that ExecLockRows and EvalPlanQualFetchRowMarks should probably be
> > modified so as to skip foreign tables involved in a foreign join.
> >
> As long as FDW author can choose their best way to produce a joined
> tuple, it may be worth to investigate.
> > My comments are: > * ForeignRecheck is the best location to call RefetchForeignJoinRow > when scanrelid==0, not ExecScanFetch. Why you try to add special > case for FDW in the common routine. > * It is FDW's choice where the remote join tuple is kept, even though > most of FDW will keep it on the private field of ForeignScanState. > > Apart from FDW requirement, custom-scan/join needs recheckMtd is > called when scanrelid==0 to avoid assertion fail. I hope FDW has > symmetric structure, however, not a mandatory requirement for me. -- Kyotaro Horiguchi NTT Open Source Software Center
> -----Original Message----- > From: pgsql-hackers-owner@postgresql.org > [mailto:pgsql-hackers-owner@postgresql.org] On Behalf Of Kyotaro HORIGUCHI > Sent: Friday, October 02, 2015 9:50 AM > To: Kaigai Kouhei(海外 浩平) > Cc: fujita.etsuro@lab.ntt.co.jp; robertmhaas@gmail.com; > pgsql-hackers@postgresql.org; shigeru.hanada@gmail.com > Subject: Re: [HACKERS] Foreign join pushdown vs EvalPlanQual > > Hello, I had more condieration on this. > > > As long as FDW author can choose their best way to produce a joined > > tuple, it may be worth to investigate. > > > > My comments are: > > * ForeignRecheck is the best location to call RefetchForeignJoinRow > > when scanrelid==0, not ExecScanFetch. Why you try to add special > > case for FDW in the common routine. > > * It is FDW's choice where the remote join tuple is kept, even though > > most of FDW will keep it on the private field of ForeignScanState. > > I think that scanrelid == 0 means that the node in focus is not a > scan node in current executor > semantics. EvalPlanQualFetchRowMarks fetches the possiblly > modified row then EvalPlanQualNext does recheck for the new > row. It's the roles of each functions. > > In this criteria, recheck routines are not the place for > refetching. EvalPlanQualFetchRowMarks is that. > I never say FDW should refetch tuples on the recheck routine. All I suggest is, projection to generate a joined tuple and recheck according to the qualifier pushed down are role of FDW driver, because it knows the best strategy to do the job. > Again, the problem here is that "foreign join" scan node is > actually a scan node but it doesn't provide all materials which > executor expects for a scan node. So the way to fix this > preserving the semantics would be in two choices. > > 1. make "foreign join" scan node to behave as complete scan > node. That is, EvalPlanQualFetchRowMarks can retrieve the > modified row version anyhow according to the type of row mark. > > 2. 
make "foreign join" node that the node is actually a join node
> which has subnodes and the "foreign join" node can reconstruct
> the result row using the result of subnodes on EPQ.
> (ExecForeignJoinNode would cease to call subnodes if it is
> actually a scan node)
>
> "3". Any other means to break current semantics of joins and
> scans in executor, as you recommend. Some more adjustment
> would be needed to go on this way.
>
> I don't know how the current design of FDW has been built,
> especially about join pushdown feature so I should be missing
> something but I think as the above for this issue.
>
It looks to me that all of them make the problem more complicated.
I have never heard why the "foreign-join" scan node is difficult to
construct a joined tuple using the EPQ slots that are already loaded on.

Regardless of the early or late locking, EPQ slots of base relation
are already filled up, aren't they?

All the mission of the "foreign-join" scan node is to return a joined
tuple as if it was executed by local join logic.
A local join consumes two tuples then generates one tuple.
The "foreign-join" scan node can perform equivalently, even if it
is under EPQ recheck context.

So, the job of the FDW driver is...
Step-1) Fetch tuples from the EPQ slots of the base foreign relations
  to be joined. Please note that it is just a pointer reference.
Step-2) Try to join these two (or more) tuples according to the
  join condition (only FDW knows because it is kept in private).
Step-3) If the result is valid, the FDW driver makes a projection from
  these tuples, then returns it.

If you concern about re-invention of the code for each FDW, core
can provide a utility routine to cover 95% of FDW structure.

I want to keep EvalPlanQualFetchRowMarks per base relation basis.
It is a bad choice to consider join at this point.

> > Apart from FDW requirement, custom-scan/join needs recheckMtd is
> > called when scanrelid==0 to avoid assertion fail. I hope FDW has
> > symmetric structure, however, not a mandatory requirement for me.
>
> It wouldn't be needed if EvalPlanQualFetchRowMarks works as
> expected. Is this wrong?
>
Yes, it does not work.
The expected behavior of EvalPlanQualFetchRowMarks is to load the tuple
to be rechecked onto the EPQ slot, using heap_fetch or a copied image.
It is per base relation basis.

Who can provide a projection to generate joined tuple?
It is a job of individual plan-state-node to be walked on during
EvalPlanQualNext().

Thanks,
--
NEC Business Creation Division / PG-Strom Project
KaiGai Kohei <kaigai@ak.jp.nec.com>

> regards,
>
> At Thu, 1 Oct 2015 13:17:34 +0000, Kouhei Kaigai <kaigai@ak.jp.nec.com> wrote
> in <9A28C8860F777E439AA12E8AEA7694F80114D7BB@BPXM15GP.gisp.nec.co.jp>
> > > > In case when FDW is not designed to handle join by itself, it is
> > > > a reasonable fallback I think.
> > > >
> > > > I expect FDW driver needs to handle EPQ recheck in the case below:
> > > > * ForeignScan on base relation and it uses late row locking.
> > > > * ForeignScan on join relation, even if early locking.
> > >
> > > I also think the approach would be one choice. But one thing I'm
> > > concerned about is plan creation for that by the FDW author; that would
> > > make life hard for the FDW author. (That was proposed by me ...)
> > >
> > I don't follow the standpoint, but not valuable to repeat same discussion.
> >
> > > So, I'd like to investigate another approach that preserves the
> > > applicability of late row locking to the join pushdown case as well as
> > > the spirit of what's there now.
The basic idea is (1) add a new > > > callback routine RefetchForeignJoinRow that refetches one foreign-join > > > tuple from the foreign server, after locking remote tuples for the > > > component foreign tables if required, and (2) call that routine in > > > ExecScanFetch if the target scan is for a foreign join and the component > > > foreign tables require to be locked lately, else just return the > > > foreign-join tuple stored in the parent's state tree, which is the tuple > > > mentioned by Robert, for preserving the spirit of what's there now. I > > > think that ExecLockRows and EvalPlanQualFetchRowMarks should probably be > > > modified so as to skip foreign tables involved in a foreign join. > > > > > As long as FDW author can choose their best way to produce a joined > > tuple, it may be worth to investigate. > > > > My comments are: > > * ForeignRecheck is the best location to call RefetchForeignJoinRow > > when scanrelid==0, not ExecScanFetch. Why you try to add special > > case for FDW in the common routine. > > * It is FDW's choice where the remote join tuple is kept, even though > > most of FDW will keep it on the private field of ForeignScanState. > > > > Apart from FDW requirement, custom-scan/join needs recheckMtd is > > called when scanrelid==0 to avoid assertion fail. I hope FDW has > > symmetric structure, however, not a mandatory requirement for me. > > -- > Kyotaro Horiguchi > NTT Open Source Software Center > > > > -- > Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) > To make changes to your subscription: > http://www.postgresql.org/mailpref/pgsql-hackers
On 2015/10/02 9:50, Kyotaro HORIGUCHI wrote: >> As long as FDW author can choose their best way to produce a joined >> tuple, it may be worth to investigate. >> >> My comments are: >> * ForeignRecheck is the best location to call RefetchForeignJoinRow >> when scanrelid==0, not ExecScanFetch. Why you try to add special >> case for FDW in the common routine. In my understanding, the job that ExecScanRecheckMtd should do is to check if the test tuple *already stored* in the plan node's scan slot meets the access-method conditions, in general. So, ISTM that it'd be somewhat odd to replace RefetchForeignJoinRow within ForeignRecheck, to store the remote join tuple in the slot. Also, RefetchForeignRow is called from the common routines ExecLockRows/EvalPlanQualFetchRowMarks ... >> * It is FDW's choice where the remote join tuple is kept, even though >> most of FDW will keep it on the private field of ForeignScanState. I see. To make it possible that the FDW doesn't have to do anything for cases where the FDW doesn't do any late row locking, however, I think it'd be more promising to use the remote join tuple stored in the scan slot of the corresponding ForeignScanState node in the parent's planstate tree. I haven't had a good idea for that yet, though. > EvalPlanQualFetchRowMarks fetches the possiblly > modified row then EvalPlanQualNext does recheck for the new > row. Really? EvalPlanQualFetchRowMarks fetches the tuples for any non-locked relations, so I think that that function should fetch the same version previously obtained for each such relation successfully. Best regards, Etsuro Fujita
Hello, At Fri, 2 Oct 2015 03:10:01 +0000, Kouhei Kaigai <kaigai@ak.jp.nec.com> wrote in <9A28C8860F777E439AA12E8AEA7694F80114DAEC@BPXM15GP.gisp.nec.co.jp> > > > As long as FDW author can choose their best way to produce a joined > > > tuple, it may be worth to investigate. > > > > > > My comments are: > > > * ForeignRecheck is the best location to call RefetchForeignJoinRow > > > when scanrelid==0, not ExecScanFetch. Why you try to add special > > > case for FDW in the common routine. > > > * It is FDW's choice where the remote join tuple is kept, even though > > > most of FDW will keep it on the private field of ForeignScanState. > > > > I think that scanrelid == 0 means that the node in focus is not a > > scan node in current executor > > semantics. EvalPlanQualFetchRowMarks fetches the possiblly > > modified row then EvalPlanQualNext does recheck for the new > > row. It's the roles of each functions. > > > > In this criteria, recheck routines are not the place for > > refetching. EvalPlanQualFetchRowMarks is that. > > > I never say FDW should refetch tuples on the recheck routine. > All I suggest is, projection to generate a joined tuple and > recheck according to the qualifier pushed down are role of > FDW driver, because it knows the best strategy to do the job. I have no objection that rechecking is FDW's job. I think you are thinking that all ROW_MARK_COPY base rows are held in ss_ScanTupleSlot so simply calling recheckMtd on the slot gives enough data to the function. (EPQState would also be needed to retrieve, though..) Right? All the underlying foreign tables should be marked as ROW_MARK_COPY to call recheckMtd safely. And somehow it required to know what column stores what base tuple. > It looks to me all of them makes the problem complicated more. > I never heard why "foreign-join" scan node is difficult to construct > a joined tuple using the EPQ slots that are already loaded on. 
> > Regardless of the early or late locking, EPQ slots of base relation > are already filled up, aren't it? recheckMtd needs to take EState as a parameter? > All mission of the "foreign-join" scan node is return a joined > tuple as if it was executed by local join logic. > Local join consumes two tuples then generate one tuple. > The "foreign-join" scan node can perform equivalently, even if it > is under EPQ recheck context. > > So, job of FDW driver is... > Step-1) Fetch tuples from the EPQ slots of the base foreign relation > to be joined. Please note that it is just a pointer reference. > Step-2) Try to join these two (or more) tuples according to the > join condition (only FDW knows because it is kept in private) > Step-3) If result is valid, FDW driver makes a projection from these > tuples, then return it. > > If you concern about re-invention of the code for each FDW, core > can provide a utility routine to cover 95% of FDW structure. > > I want to keep EvalPlanQualFetchRowMarks per base relation basis. > It is a bad choice to consider join at this point. > > > Apart from FDW requirement, custom-scan/join needs recheckMtd is > > > called when scanrelid==0 to avoid assertion fail. I hope FDW has > > > symmetric structure, however, not a mandatory requirement for me. > > > > It wouldn't be needed if EvalPlanQualFetchRowMarks works as > > exepcted. Is this wrong? > > > Yes, it does not work. > Expected behavior EvalPlanQualFetchRowMarks is to load the tuple > to be rechecked onto EPQ slot, using heap_fetch or copied image. > It is per base relation basis. Hmm. What I said by "works as expected" is that the function stores the tuple for the "foreign join" scan node. If it doesn't, you're right. > Who can provide a projection to generate joined tuple? > It is a job of individual plan-state-node to be walked on during > EvalPlanQualNext(). 
EvalPlanQualNext simply rechecks the tuples stored in epqTuples,
which are designed to be provided by EvalPlanQualFetchRowMarks.

I think that that premise shouldn't be broken for convenience...

regards,
--
Kyotaro Horiguchi
NTT Open Source Software Center
Hello,

At Fri, 2 Oct 2015 12:51:42 +0900, Etsuro Fujita <fujita.etsuro@lab.ntt.co.jp> wrote in <560DFF4E.2000001@lab.ntt.co.jp>
> On 2015/10/02 9:50, Kyotaro HORIGUCHI wrote:

Most of the citations are of Kaigai-san's comments :)

> >> As long as FDW author can choose their best way to produce a joined
> >> tuple, it may be worth to investigate.
> >>
> >> My comments are:
> >> * ForeignRecheck is the best location to call RefetchForeignJoinRow
> >>   when scanrelid==0, not ExecScanFetch. Why you try to add special
> >>   case for FDW in the common routine.
>
> In my understanding, the job that ExecScanRecheckMtd should do is to
> check if the test tuple *already stored* in the plan node's scan slot
> meets the access-method conditions, in general. So, ISTM that it'd be
> somewhat odd to replace RefetchForeignJoinRow within ForeignRecheck,
> to store the remote join tuple in the slot. Also, RefetchForeignRow
> is called from the common routines
> ExecLockRows/EvalPlanQualFetchRowMarks ...

Agreed, except for the necessity of RefetchForeignJoinRow.

> >> * It is FDW's choice where the remote join tuple is kept, even though
> >> most of FDW will keep it on the private field of ForeignScanState.
>
> I see.
>
> To make it possible that the FDW doesn't have to do anything for cases
> where the FDW doesn't do any late row locking, however, I think it'd
> be more promising to use the remote join tuple stored in the scan slot
> of the corresponding ForeignScanState node in the parent's planstate
> tree. I haven't had a good idea for that yet, though.

One coarse idea is adding root->rowMarks for the "foreign join" paths
(then removing the rowMarks for the underlying scans later if the
foreign join wins). Somehow propagating it to epqstate->arowMarks,
EvalPlanQualFetchRowMarks will store the needed tuples into epqTuples.
This is what Kaigai-san criticized as 'makes things too complex'. :)
But I'm reluctant to break the assumption of ExecScanFetch.
> > EvalPlanQualFetchRowMarks fetches the possiblly > > modified row then EvalPlanQualNext does recheck for the new > > row. > > Really? EvalPlanQualFetchRowMarks fetches the tuples for any > non-locked relations, so I think that that function should fetch the > same version previously obtained for each such relation successfully. Sorry, please ignore "possibly modified". regards, -- Kyotaro Horiguchi NTT Open Source Software Center
> -----Original Message----- > From: Kyotaro HORIGUCHI [mailto:horiguchi.kyotaro@lab.ntt.co.jp] > Sent: Friday, October 02, 2015 1:28 PM > To: Kaigai Kouhei(海外 浩平) > Cc: fujita.etsuro@lab.ntt.co.jp; robertmhaas@gmail.com; > pgsql-hackers@postgresql.org; shigeru.hanada@gmail.com > Subject: Re: [HACKERS] Foreign join pushdown vs EvalPlanQual > > Hello, > > At Fri, 2 Oct 2015 03:10:01 +0000, Kouhei Kaigai <kaigai@ak.jp.nec.com> wrote > in <9A28C8860F777E439AA12E8AEA7694F80114DAEC@BPXM15GP.gisp.nec.co.jp> > > > > As long as FDW author can choose their best way to produce a joined > > > > tuple, it may be worth to investigate. > > > > > > > > My comments are: > > > > * ForeignRecheck is the best location to call RefetchForeignJoinRow > > > > when scanrelid==0, not ExecScanFetch. Why you try to add special > > > > case for FDW in the common routine. > > > > * It is FDW's choice where the remote join tuple is kept, even though > > > > most of FDW will keep it on the private field of ForeignScanState. > > > > > > I think that scanrelid == 0 means that the node in focus is not a > > > scan node in current executor > > > semantics. EvalPlanQualFetchRowMarks fetches the possiblly > > > modified row then EvalPlanQualNext does recheck for the new > > > row. It's the roles of each functions. > > > > > > In this criteria, recheck routines are not the place for > > > refetching. EvalPlanQualFetchRowMarks is that. > > > > > I never say FDW should refetch tuples on the recheck routine. > > All I suggest is, projection to generate a joined tuple and > > recheck according to the qualifier pushed down are role of > > FDW driver, because it knows the best strategy to do the job. > > I have no objection that rechecking is FDW's job. > > I think you are thinking that all ROW_MARK_COPY base rows are > held in ss_ScanTupleSlot so simply calling recheckMtd on the slot > gives enough data to the function. (EPQState would also be needed > to retrieve, though..) Right? > Not ss_ScanTupleSlot. 
It is initialized according to fdw_scan_tlist
in case of scanrelid==0, regardless of base foreign relation's
definition.

My expectation is, the FDW callback constructs tts_values/tts_isnull
of ss_ScanTupleSlot according to the preloaded tuples in EPQ slots
and underlying projection. Only FDW driver knows the best way to
construct this result tuple.

You can pull out EState reference from PlanState portion of the
ForeignScanState, so nothing needs to be changed.

> All the underlying foreign tables should be marked as
> ROW_MARK_COPY to call recheckMtd safely. And somehow it required
> to know what column stores what base tuple.
>
Even if ROW_MARK_REFERENCE is used for late locking, the tuple to be
rechecked is already loaded in estate->es_epqTuple[], isn't it?
The recheck routine does not need to care about the row-mark policy.

> > It looks to me all of them makes the problem complicated more.
> > I never heard why "foreign-join" scan node is difficult to construct
> > a joined tuple using the EPQ slots that are already loaded on.
> >
> > Regardless of the early or late locking, EPQ slots of base relation
> > are already filled up, aren't it?
>
> recheckMtd needs to take EState as a parameter?
>
No.

> > All mission of the "foreign-join" scan node is return a joined
> > tuple as if it was executed by local join logic.
> > Local join consumes two tuples then generate one tuple.
> > The "foreign-join" scan node can perform equivalently, even if it
> > is under EPQ recheck context.
> >
> > So, job of FDW driver is...
> > Step-1) Fetch tuples from the EPQ slots of the base foreign relation
> > to be joined. Please note that it is just a pointer reference.
> > Step-2) Try to join these two (or more) tuples according to the
> > join condition (only FDW knows because it is kept in private)
> > Step-3) If result is valid, FDW driver makes a projection from these
> > tuples, then return it.
> >
> > If you concern about re-invention of the code for each FDW, core
> > can provide a utility routine to cover 95% of FDW structure.
> >
> > I want to keep EvalPlanQualFetchRowMarks per base relation basis.
> > It is a bad choice to consider join at this point.
> >
> > > > Apart from FDW requirement, custom-scan/join needs recheckMtd is
> > > > called when scanrelid==0 to avoid assertion fail. I hope FDW has
> > > > symmetric structure, however, not a mandatory requirement for me.
> > >
> > > It wouldn't be needed if EvalPlanQualFetchRowMarks works as
> > > expected. Is this wrong?
> >
> > Yes, it does not work.
> > Expected behavior EvalPlanQualFetchRowMarks is to load the tuple
> > to be rechecked onto EPQ slot, using heap_fetch or copied image.
> > It is per base relation basis.
>
> Hmm. What I said by "works as expected" is that the function
> stores the tuple for the "foreign join" scan node. If it doesn't,
> you're right.
>
Which slot of the EPQ slot will save the joined tuple?
scanrelid is zero, and we have no identifier of the join planstate.

> > > Who can provide a projection to generate joined tuple?
> > > It is a job of individual plan-state-node to be walked on during
> > > EvalPlanQualNext().
> >
> > EvalPlanQualNext simply does recheck tuples stored in epqTuples,
> > which are designed to be provided by EvalPlanQualFetchRowMarks.
> >
> > I think that that premise shouldn't be broken for convenience...
>
Do I see something different or understand incorrectly?
EvalPlanQualNext() walks down the entire subtree of the Lock node.
(epqstate->planstate is the entire subplan of the Lock node.)

TupleTableSlot *
EvalPlanQualNext(EPQState *epqstate)
{
    MemoryContext oldcontext;
    TupleTableSlot *slot;

    oldcontext = MemoryContextSwitchTo(epqstate->estate->es_query_cxt);
    slot = ExecProcNode(epqstate->planstate);
    MemoryContextSwitchTo(oldcontext);

    return slot;
}

If and when relation joins are kept in the sub-plan, ExecProcNode()
processes the projection by join, doesn't it?
Why projection by join is not a part of EvalPlanQualNext()?
It is the core of its job. Unless projection by join, upper node cannot
recheck the tuple come from child nodes.

Thanks,
--
NEC Business Creation Division / PG-Strom Project
KaiGai Kohei <kaigai@ak.jp.nec.com>
Hello, thank you for the explanation. I understood the background.

On the current planner implementation, row marks are tightly bound to
the initial RTEs. This is quite natural for the purpose of row marks.

During join search, a joinrel should be compatible between local joins
and remote joins; of course, the target list should also be so. So it
is quite difficult to add a wholerow resjunk for joinrels before the
whole join tree is completed, even if we allow row marks that are not
bound to base RTEs.

The result of make_rel_from_joinlist contains only winner paths, so we
might be able to transform the target list for this joinrel so that it
has join wholerows (and doesn't have unnecessary RTE wholerows), but I
don't see any clean way to do that.

As a result, all that LockRows can collect for EPQ are tuples for base
relations. There is no room to pass a joined whole row so far.

At Fri, 2 Oct 2015 05:04:44 +0000, Kouhei Kaigai <kaigai@ak.jp.nec.com> wrote in <9A28C8860F777E439AA12E8AEA7694F80114DBFB@BPXM15GP.gisp.nec.co.jp>
> > > I never say FDW should refetch tuples on the recheck routine.
> > > All I suggest is, projection to generate a joined tuple and
> > > recheck according to the qualifier pushed down are role of
> > > FDW driver, because it knows the best strategy to do the job.
> >
> > I have no objection that rechecking is FDW's job.
> >
> > I think you are thinking that all ROW_MARK_COPY base rows are
> > held in ss_ScanTupleSlot so simply calling recheckMtd on the slot
> > gives enough data to the function. (EPQState would also be needed
> > to retrieve, though..) Right?
> >
> Not ss_ScanTupleSlot. It is initialized according to fdw_scan_tlist
> in case of scanrelid==0, regardless of base foreign relation's
> definition.

Sorry, EvalPlanQualFetchRowMarks retrieves wholerows from
epqstate->origslot.

> My expectation is, FDW callback construct tts_values/tts_isnull
> of ss_ScanTupleSlot according to the preloaded tuples in EPQ slots
> and underlying projection.
> Only FDW driver knows the best way to
> construct this result tuple.

Currently only the FDW itself knows precisely how the joined relations
are made.

> You can pull out EState reference from PlanState portion of the
> ForeignScanState, so nothing needs to be changed.

Exactly.

> > > > > Apart from FDW requirement, custom-scan/join needs recheckMtd is
> > > > > called when scanrelid==0 to avoid assertion fail. I hope FDW has
> > > > > symmetric structure, however, not a mandatory requirement for me.
...
> > Hmm. What I said by "works as expected" is that the function
> > stores the tuple for the "foreign join" scan node. If it doesn't,
> > you're right.
>
> Which slot of the EPQ slot will save the joined tuple?

Yes, that is the second significant problem. As described above,
furthermore, the way to inject a joined wholerow var into the target
list for the pushed-down join seems more difficult to find.

> scanrelid is zero, and we have no identifier of join planstate.
>
> > > Who can provide a projection to generate joined tuple?
> > > It is a job of individual plan-state-node to be walked on during
> > > EvalPlanQualNext().
> >
> > EvalPlanQualNext simply does recheck tuples stored in epqTuples,
> > which are designed to be provided by EvalPlanQualFetchRowMarks.
> >
> > I think that that premise shouldn't be broken for convenience...
>
> Do I see something different or understand incorrectly?
> EvalPlanQualNext() walks down entire subtree of the Lock node.
> (epqstate->planstate is entire subplan of Lock node.)
>
> TupleTableSlot *
> EvalPlanQualNext(EPQState *epqstate)
> {
>     MemoryContext oldcontext;
>     TupleTableSlot *slot;
>
>     oldcontext = MemoryContextSwitchTo(epqstate->estate->es_query_cxt);
>     slot = ExecProcNode(epqstate->planstate);
>     MemoryContextSwitchTo(oldcontext);
>
>     return slot;
> }
>
> If and when relations joins are kept in the sub-plan, ExecProcNode()
> processes the projection by join, doesn't it?
Yes, but that requires preparing an alternative plan to do such a projection.

> Why projection by join is not a part of EvalPlanQualNext()?
> It is the core of its job. Unless projection by join, upper node cannot
> recheck the tuple come from child nodes.

What I'm uneasy about is that the foreign join introduces a difference in behavior between ordinary fetching and EPQ fetching. Having a joined whole row would be quite natural, but it seems hard to get.

Another reason is that ExecScanFetch with scanrelid == 0 on EPQ is an FDW/CS-specific feature and looks like a kind of hack. (Even if it would be one of many.)

regards,

--
Kyotaro Horiguchi
NTT Open Source Software Center
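For reference, the EPQ flow being debated here, for an ordinary local join, can be modeled roughly like this. This is a minimal sketch with made-up tuples and helper names, not executor code: EvalPlanQualFetchRowMarks preloads one test tuple per base relation, and re-running the sub-plan joins (and projects) them.

```python
# Toy model of EvalPlanQual for a local two-way join.  The base-relation
# test tuples are preloaded (the real executor stores them in EPQ slots),
# and "re-running the sub-plan" means joining these single tuples again.

epq_tuples = {                      # one test tuple per base relation
    "t1": {"a": 1, "b": 10},
    "t2": {"a": 1, "c": 20},
}

def epq_next(quals):
    # Re-execute the "sub-plan": join the single test tuples and apply
    # the join quals; projection by join produces the result row.
    t1, t2 = epq_tuples["t1"], epq_tuples["t2"]
    if all(q(t1, t2) for q in quals):
        return {**t1, **t2}         # the projection by join
    return None                     # qual no longer holds -> row skipped

row = epq_next([lambda t1, t2: t1["a"] == t2["a"]])
assert row == {"a": 1, "b": 10, "c": 20}
```

The point of contention is who supplies this join-and-project step when the join has been pushed down and no local join sub-plan exists.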
On 2015/09/29 16:36, Etsuro Fujita wrote:
> For the foreign table case (scanrelid>0), I imagined an approach
> different than yours. In that case, I thought the issue would be
> probably addressed by just modifying the remote query performed in
> RefetchForeignRow, which would be of the form "SELECT ctid, * FROM
> remote table WHERE ctid = $1", so that the modified query would be of
> the form "SELECT ctid, * FROM remote table WHERE ctid = $1 AND *remote
> quals*".

Sorry, I was wrong. I noticed that the modified query (that will be sent to the remote server during postgresRefetchForeignRow) should be of the form "SELECT * FROM (SELECT ctid, * FROM remote table WHERE ctid = $1) ss WHERE *remote quals*". (I think the query of the form "SELECT ctid, * FROM remote table WHERE ctid = $1 AND *remote quals*" would be okay if using a TID scan on the remote side, though.)

Best regards,
Etsuro Fujita
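For readers comparing the two query shapes, here is a sketch of how they might be built. This is not postgres_fdw's actual deparser; the helper names and placeholder strings are illustrative only.

```python
# Illustrative sketch (not postgres_fdw's deparse code): the two query
# shapes for re-fetching a row by ctid during EvalPlanQual.

def refetch_query_naive(table, remote_quals):
    # Quals in the same WHERE clause as the ctid lookup.  Per the mail,
    # this is okay only if the remote side uses a TID scan, so that the
    # quals are evaluated against the one tuple identified by ctid.
    return (f"SELECT ctid, * FROM {table} "
            f"WHERE ctid = $1 AND {remote_quals}")

def refetch_query_subselect(table, remote_quals):
    # Fetch the tuple by ctid first, then apply the quals on top, so the
    # quals are evaluated against that tuple regardless of the remote
    # plan shape.
    return (f"SELECT * FROM (SELECT ctid, * FROM {table} "
            f"WHERE ctid = $1) ss WHERE {remote_quals}")

print(refetch_query_subselect("remote_table", "a > 0"))
```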
On Fri, Oct 2, 2015 at 4:26 AM, Kyotaro HORIGUCHI <horiguchi.kyotaro@lab.ntt.co.jp> wrote: > During join search, a joinrel should be comptible between local > joins and remote joins, of course target list also should be > so. So it is quite difficult to add wholerow resjunk for joinrels > before whole join tree is completed even if we allow row marks > that are not bound to base RTEs. Suppose ROW_MARK_COPY is in use, and suppose the query is: SELECT ft1.a, ft1.b, ft2.a, ft2.b FROM ft1, ft2 WHERE ft1.x = ft2.x; When the foreign join is executed, there's going to be a slot that needs to be populated with ft1.a, ft1.b, ft2.a, ft2.b, and a whole row reference. Now, let's suppose the slot descriptor has 5 columns: those 4, plus a whole-row reference for ROW_MARK_COPY. If we know what values we're going to store in columns 1..4, couldn't we just form them into a tuple to populate column 5? We don't actually need to be able to fetch such a tuple from the remote side because we can just construct it. I think. Is this a dumb idea, or could it work? -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
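Robert's suggestion, constructing the whole-row value locally from the already-fetched columns instead of fetching it from the remote side, can be sketched like this. The column layout and helper name are illustrative only.

```python
# Sketch: if the scan slot already holds ft1.a, ft1.b, ft2.a, ft2.b,
# the whole-row value (column 5) can be formed locally from those
# columns rather than fetched remotely.

def populate_slot(fetched_cols):
    # fetched_cols: values for (ft1.a, ft1.b, ft2.a, ft2.b), in order
    whole_row = tuple(fetched_cols)          # constructed, not fetched
    return list(fetched_cols) + [whole_row]  # columns 1..4 plus column 5

slot = populate_slot([1, 10, 1, 20])
assert slot[4] == (1, 10, 1, 20)
```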
> > > > Who can provide a projection to generate joined tuple?
> > > It is a job of individual plan-state-node to be walked on during
> > > EvalPlanQualNext().
> >
> > EvalPlanQualNext simply does recheck tuples stored in epqTuples,
> > which are designed to be provided by EvalPlanQualFetchRowMarks.
> >
> > I think that that premise shouldn't be broken for convenience...
> >
> > Do I see something different or understand incorrectly?
>
> > EvalPlanQualNext() walks down entire subtree of the Lock node.
> > (epqstate->planstate is entire subplan of Lock node.)
> >
> > TupleTableSlot *
> > EvalPlanQualNext(EPQState *epqstate)
> > {
> >     MemoryContext oldcontext;
> >     TupleTableSlot *slot;
> >
> >     oldcontext = MemoryContextSwitchTo(epqstate->estate->es_query_cxt);
> >     slot = ExecProcNode(epqstate->planstate);
> >     MemoryContextSwitchTo(oldcontext);
> >
> >     return slot;
> > }
> >
> > If and when relations joins are kept in the sub-plan, ExecProcNode()
> > processes the projection by join, doesn't it?
>
> Yes, but it is needed to prepare alternative plan to do such
> projection.

That doesn't matter. The custom-scan node is a good reference for having underlying plan nodes that can be kicked by an extension. If we adopt the same semantics, these alternative plans shall not be kicked unless the FDW driver wants. Also, I don't think it is difficult to construct an alternative join-path using an unparameterized nested-loop (note that all we need to do is evaluate at most one tuple for each base relation). If people felt it is really re-invention of the wheel, the core backend can provide a utility function to produce the alternative path. Probably,

Path *
foreign_join_alternative_local_join_path(PlannerInfo *root,
                                         RelOptInfo *joinrel)

can generate what we need.

> > Why projection by join is not a part of EvalPlanQualNext()?
> > It is the core of its job. Unless projection by join, upper node cannot
> > recheck the tuple come from child nodes.
> > What I'm uneasy on is the foreign join introduced the difference > in behavior between ordinary fetching and epq fetching. It is > quite natural having joined whole row but is seems hard to get. > hard to get, and easy to construct on the fly. > Another reason is that ExecScanFetch with scanrelid == 0 on EPQ > is FDW/CS specific feature and looks to be a kind of hack. (Even > if it would be one of many) > It means these are kind of exceptional ones, thus it is reasonable to avoid fundamental changes in RowLock mechanism, isn't it? Thanks, -- NEC Business Creation Division / PG-Strom Project KaiGai Kohei <kaigai@ak.jp.nec.com>
On 2015/10/07 6:19, Robert Haas wrote:
> On Fri, Oct 2, 2015 at 4:26 AM, Kyotaro HORIGUCHI
> <horiguchi.kyotaro@lab.ntt.co.jp> wrote:
>> During join search, a joinrel should be comptible between local
>> joins and remote joins, of course target list also should be
>> so. So it is quite difficult to add wholerow resjunk for joinrels
>> before whole join tree is completed even if we allow row marks
>> that are not bound to base RTEs.
>
> Suppose ROW_MARK_COPY is in use, and suppose the query is: SELECT
> ft1.a, ft1.b, ft2.a, ft2.b FROM ft1, ft2 WHERE ft1.x = ft2.x;
>
> When the foreign join is executed, there's going to be a slot that
> needs to be populated with ft1.a, ft1.b, ft2.a, ft2.b, and a whole row
> reference. Now, let's suppose the slot descriptor has 5 columns: those
> 4, plus a whole-row reference for ROW_MARK_COPY.

IIUC, I think that if ROW_MARK_COPY is in use, the descriptor would have 6 columns: those 4, plus a whole-row var for ft1 and another whole-row var for ft2. Maybe I'm missing something, though.

> 4, plus a whole-row reference for ROW_MARK_COPY. If we know what
> values we're going to store in columns 1..4, couldn't we just form
> them into a tuple to populate column 5? We don't actually need to be
> able to fetch such a tuple from the remote side because we can just
> construct it. I think.

I also was thinking whether we could replace one of the whole-row vars with a whole-row var that represents the scan slot of the ForeignScanState node.

Best regards,
Etsuro Fujita
Hello,

At Wed, 7 Oct 2015 12:30:27 +0900, Etsuro Fujita <fujita.etsuro@lab.ntt.co.jp> wrote in <561491D3.3070901@lab.ntt.co.jp>
> On 2015/10/07 6:19, Robert Haas wrote:
> > On Fri, Oct 2, 2015 at 4:26 AM, Kyotaro HORIGUCHI
> > <horiguchi.kyotaro@lab.ntt.co.jp> wrote:
> >> During join search, a joinrel should be comptible between local
> >> joins and remote joins, of course target list also should be
> >> so. So it is quite difficult to add wholerow resjunk for joinrels
> >> before whole join tree is completed even if we allow row marks
> >> that are not bound to base RTEs.
> >
> > Suppose ROW_MARK_COPY is in use, and suppose the query is: SELECT
> > ft1.a, ft1.b, ft2.a, ft2.b FROM ft1, ft2 WHERE ft1.x = ft2.x;
> >
> > When the foreign join is executed, there's going to be a slot that
> > needs to be populated with ft1.a, ft1.b, ft2.a, ft2.b, and a whole row
> > reference. Now, let's suppose the slot descriptor has 5 columns: those
> > 4, plus a whole-row reference for ROW_MARK_COPY.
>
> IIUC, I think that if ROW_MARK_COPY is in use, the descriptor would
> have 6 columns: those 4, plus a whole-row var for ft1 and another
> whole-row bar for ft2. Maybe I'm missing something, though.

You're right. The result tuple for Robert's example has 6 attributes in the order of [ft1.a, ft1.b, (ft1.a, ft1.b), ft2.a...]

But the point of the discussion lies elsewhere. The problem is when such joins are joined with another local table. In such a case, the whole-row reference for the intermediate foreign join would lose the targets in the top-level tuple.

> > 4, plus a whole-row reference for ROW_MARK_COPY. If we know what
> > values we're going to store in columns 1..4, couldn't we just form
> > them into a tuple to populate column 5? We don't actually need to be
> > able to fetch such a tuple from the remote side because we can just
> > construct it. I think.
>
> I also was thinking whether we could replace one of the whole-row vars
> with a whole-row var that represents the scan slot of the
> ForeignScanState node.

I suppose it requires an additional resjunk to be added on joinrel creation, which is what Kaigai-san said was overkill. But I'm interested in what it looks like.

cheers,

--
Kyotaro Horiguchi
NTT Open Source Software Center
On Tue, Oct 6, 2015 at 11:30 PM, Etsuro Fujita <fujita.etsuro@lab.ntt.co.jp> wrote: > IIUC, I think that if ROW_MARK_COPY is in use, the descriptor would have 6 > columns: those 4, plus a whole-row var for ft1 and another whole-row bar for > ft2. Maybe I'm missing something, though. Currently, yes, but I think we should change it so that... >> 4, plus a whole-row reference for ROW_MARK_COPY. If we know what >> values we're going to store in columns 1..4, couldn't we just form >> them into a tuple to populate column 5? We don't actually need to be >> able to fetch such a tuple from the remote side because we can just >> construct it. I think. > I also was thinking whether we could replace one of the whole-row vars with > a whole-row var that represents the scan slot of the ForeignScanState node. ...it works like this instead. KaiGai is suggesting that constructing a join plan to live under the foreign scan-qua-join isn't really that difficult, but that is like saying that it's OK to go out to a nice sushi restaurant without bringing any money because it won't be too hard to talk the manager into letting you leave for a quick trip to the ATM at the end of the meal. Maybe so, maybe not, but if you plan ahead and bring money then you don't have to worry about it. The only reason why we would need the join plan in the first place is because we had the foreign scan output whole-row vars for the baserels instead of its own slot. If we have it output a var for its own slot then it doesn't matter whether constructing the join plan is easy or hard, because we don't need it. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
On Wed, Oct 7, 2015 at 12:10 AM, Kyotaro HORIGUCHI <horiguchi.kyotaro@lab.ntt.co.jp> wrote: >> IIUC, I think that if ROW_MARK_COPY is in use, the descriptor would >> have 6 columns: those 4, plus a whole-row var for ft1 and another >> whole-row bar for ft2. Maybe I'm missing something, though. > > You're right. The result tuple for the Robert's example has 6 > attributes in the order of [ft1.a, ft1.b, (ft1.a, ft1.b), ft2.a...] > > But the point of the discussion is in another point. The problem > is when such joins are joined with another local table. For such > case, the whole-row reference for the intermediate foreign-join > would lose the targets in top-level tuple. Really? Would that mean that ROW_MARK_COPY is totally broken? I bet it's not. >> > 4, plus a whole-row reference for ROW_MARK_COPY. If we know what >> > values we're going to store in columns 1..4, couldn't we just form >> > them into a tuple to populate column 5? We don't actually need to be >> > able to fetch such a tuple from the remote side because we can just >> > construct it. I think. >> >> I also was thinking whether we could replace one of the whole-row vars >> with a whole-row var that represents the scan slot of the >> ForeignScanState node. > > I suppose it requires additional resjunk to be added on joinrel > creation, which is what Kaigai-san said as overkill. But I'm > interedted in what it looks. I think it rather requires *replacing* two resjunk columns by one new one. The whole-row references for the individual foreign tables are only there to support EvalPlanQual; if we instead have a column to populate the foreign scan's slot directly, then we can use that column for that purpose directly and there's no remaining use for the whole-row vars on the baserels. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
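The target-list change Robert proposes, replacing the two per-baserel whole-row resjunk columns with one column for the join's own scan slot, can be pictured as follows. The tlist is shown as a plain list of labels, purely for illustration.

```python
# Sketch: the two whole-row resjunk columns for the base foreign tables
# are replaced by a single whole-row column for the foreign join itself.

before = ["ft1.a", "ft1.b", "ft2.a", "ft2.b",
          "wholerow(ft1)", "wholerow(ft2)"]          # 6 columns

after = [c for c in before if not c.startswith("wholerow(")]
after.append("wholerow(ft1 JOIN ft2)")               # 5 columns

assert len(after) == 5
```

With this layout, the EPQ machinery can replay the join's own slot directly instead of reconstructing it from the base-relation whole rows.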
Hello,

At Wed, 7 Oct 2015 00:24:57 -0400, Robert Haas <robertmhaas@gmail.com> wrote in <CA+TgmoZRqXdtPh-RPbX-fRSdq+_c8U6dXcTovu+zgY0hrnR6TQ@mail.gmail.com>
> On Wed, Oct 7, 2015 at 12:10 AM, Kyotaro HORIGUCHI
> <horiguchi.kyotaro@lab.ntt.co.jp> wrote:
> >> IIUC, I think that if ROW_MARK_COPY is in use, the descriptor would
> >> have 6 columns: those 4, plus a whole-row var for ft1 and another
> >> whole-row bar for ft2. Maybe I'm missing something, though.
> >
> > You're right. The result tuple for the Robert's example has 6
> > attributes in the order of [ft1.a, ft1.b, (ft1.a, ft1.b), ft2.a...]
> >
> > But the point of the discussion is in another point. The problem
> > is when such joins are joined with another local table. For such
> > case, the whole-row reference for the intermediate foreign-join
> > would lose the targets in top-level tuple.
>
> Really? Would that mean that ROW_MARK_COPY is totally broken? I bet it's not.

The semantics of ROW_MARK_COPY is that the tuple should hold the whole-row *value* in a resjunk column. I must have misunderstood "whole row *reference*" by confusing planner and executor behaviors. I understood the new story as adding to a tuple a reference to itself. If that is wrong and the correct story is having an additional whole-row *value* in the whole joined tuple, including resjunks passed from the underlying tuples, it should work.

> >> > 4, plus a whole-row reference for ROW_MARK_COPY. If we know what
> >> > values we're going to store in columns 1..4, couldn't we just form
> >> > them into a tuple to populate column 5? We don't actually need to be
> >> > able to fetch such a tuple from the remote side because we can just
> >> > construct it. I think.
> >>
> >> I also was thinking whether we could replace one of the whole-row vars
> >> with a whole-row var that represents the scan slot of the
> >> ForeignScanState node.
> >
> > I suppose it requires additional resjunk to be added on joinrel
> > creation, which is what Kaigai-san said as overkill. But I'm
> > interedted in what it looks.
>
> I think it rather requires *replacing* two resjunk columns by one new
> one. The whole-row references for the individual foreign tables are
> only there to support EvalPlanQual; if we instead have a column to
> populate the foreign scan's slot directly, then we can use that column
> for that purpose directly and there's no remaining use for the
> whole-row vars on the baserels.

It is what I had in mind. Target lists for joinrels cannot be built in a straightforward way as it is.

regards,

--
Kyotaro Horiguchi
NTT Open Source Software Center
On 2015/10/07 15:06, Kyotaro HORIGUCHI wrote:
> At Wed, 7 Oct 2015 00:24:57 -0400, Robert Haas <robertmhaas@gmail.com> wrote
>> I think it rather requires *replacing* two resjunk columns by one new
>> one. The whole-row references for the individual foreign tables are
>> only there to support EvalPlanQual; if we instead have a column to
>> populate the foreign scan's slot directly, then we can use that column
>> for that purpose directly and there's no remaining use for the
>> whole-row vars on the baserels.
> It is what I had in mind.

OK, I'll investigate this further.

Best regards,
Etsuro Fujita
On 2015/10/07 15:39, Etsuro Fujita wrote:
> On 2015/10/07 15:06, Kyotaro HORIGUCHI wrote:
>> At Wed, 7 Oct 2015 00:24:57 -0400, Robert Haas <robertmhaas@gmail.com>
>> wrote
>>> I think it rather requires *replacing* two resjunk columns by one new
>>> one. The whole-row references for the individual foreign tables are
>>> only there to support EvalPlanQual; if we instead have a column to
>>> populate the foreign scan's slot directly, then we can use that column
>>> for that purpose directly and there's no remaining use for the
>>> whole-row vars on the baserels.
>> It is what I had in mind.
> OK I'll investigate this further.

I noticed that the approach using a column to populate the foreign scan's slot directly wouldn't work well in some cases. For example, consider:

SELECT * FROM verysmall v LEFT JOIN (bigft1 JOIN bigft2 ON bigft1.x = bigft2.x) ON v.q = bigft1.q AND v.r = bigft2.r FOR UPDATE OF v;

The best plan is presumably something like this as you said before:

LockRows
  -> Nested Loop
       -> Seq Scan on verysmall v
       -> Foreign Scan on bigft1 and bigft2
            Remote SQL: SELECT * FROM bigft1 JOIN bigft2 ON bigft1.x = bigft2.x
                        AND bigft1.q = $1 AND bigft2.r = $2

Consider the EvalPlanQual testing to see if the updated version of a tuple in v satisfies the query. If we use the column in the testing, we would get the wrong results in some cases.

Best regards,
Etsuro Fujita
On 2015/10/08 19:55, Etsuro Fujita wrote:
> I noticed that the approach using a column to populate the foreign
> scan's slot directly wouldn't work well in some cases. For example,
> consider:
>
> SELECT * FROM verysmall v LEFT JOIN (bigft1 JOIN bigft2 ON bigft1.x =
> bigft2.x) ON v.q = bigft1.q AND v.r = bigft2.r FOR UPDATE OF v;

Oops, I should have written "JOIN", not "LEFT JOIN".

> The best plan is presumably something like this as you said before:
>
> LockRows
>   -> Nested Loop
>        -> Seq Scan on verysmall v
>        -> Foreign Scan on bigft1 and bigft2
>             Remote SQL: SELECT * FROM bigft1 JOIN bigft2 ON bigft1.x = bigft2.x
>                         AND bigft1.q = $1 AND bigft2.r = $2
>
> Consider the EvalPlanQual testing to see if the updated version of a
> tuple in v satisfies the query. If we use the column in the testing, we
> would get the wrong results in some cases.

More precisely, we would get the wrong result when the value of v.q or v.r in the updated version has changed. I don't have a good idea for this, so would an approach using a local join execution plan be a good way to go?

Best regards,
Etsuro Fujita
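The failure mode described above can be sketched with a toy simulation. All table contents, predicates, and helper names below are made up for illustration: the foreign scan's result was computed under the *old* values of v.q and v.r (sent as $1 and $2), so replaying a stored column during EPQ ignores any concurrent change to them.

```python
# Toy simulation of the wrong-result scenario with pushed-down
# parameterized quals.

bigft1 = [{"x": 1, "q": 5}]
bigft2 = [{"x": 1, "r": 7}]

def remote_join(q, r):
    # What the remote server computes for the given parameter values.
    return [(t1, t2) for t1 in bigft1 for t2 in bigft2
            if t1["x"] == t2["x"] and t1["q"] == q and t2["r"] == r]

old_v = {"q": 5, "r": 7}
stored = remote_join(old_v["q"], old_v["r"])   # cached at scan time

new_v = {"q": 6, "r": 7}                       # concurrent UPDATE of v

# Replaying the stored column: the join "succeeds" -- wrong.
assert stored != []
# Re-evaluating the pushed-down quals against the updated v: it must fail.
assert remote_join(new_v["q"], new_v["r"]) == []
```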
Hi,

At Fri, 9 Oct 2015 12:00:30 +0900, Etsuro Fujita <fujita.etsuro@lab.ntt.co.jp> wrote in <56172DCE.7080604@lab.ntt.co.jp>
> On 2015/10/08 19:55, Etsuro Fujita wrote:
> > I noticed that the approach using a column to populate the foreign
> > scan's slot directly wouldn't work well in some cases. For example,
> > consider:
> >
> > SELECT * FROM verysmall v LEFT JOIN (bigft1 JOIN bigft2 ON bigft1.x =
> > bigft2.x) ON v.q = bigft1.q AND v.r = bigft2.r FOR UPDATE OF v;
>
> Oops, I should have written "JOIN", not "LEFT JOIN".
>
> > The best plan is presumably something like this as you said before:
> >
> > LockRows
> >   -> Nested Loop
> >        -> Seq Scan on verysmall v
> >        -> Foreign Scan on bigft1 and bigft2
> >             Remote SQL: SELECT * FROM bigft1 JOIN bigft2 ON bigft1.x =
> >                         bigft2.x AND bigft1.q = $1 AND bigft2.r = $2
> >
> > Consider the EvalPlanQual testing to see if the updated version of a
> > tuple in v satisfies the query. If we use the column in the testing,
> > we would get the wrong results in some cases.
>
> More precisely, we would get the wrong result when the value of v.q or
> v.r in the updated version has changed.

What do you think is the right behavior? Assume that it is simply a join between local tables:

SELECT * FROM t1 JOIN t2 on (t1.a = t2.a) FOR UPDATE;

IIUC, if t1.a gets updated and EPQ runs, the tuple for t1 is refetched using ctid and that for t2 is reused, so it would fail to be qualified and the joined tuple won't be returned. What happens on the foreign join example seems to be exactly the same thing.

> I don't have a good idea for this, so would an approach using an local
> join execution plan be the good way to go?

regards,

--
Kyotaro Horiguchi
NTT Open Source Software Center
On 2015/10/09 15:04, Kyotaro HORIGUCHI wrote:
> At Fri, 9 Oct 2015 12:00:30 +0900, Etsuro Fujita <fujita.etsuro@lab.ntt.co.jp> wrote in <56172DCE.7080604@lab.ntt.co.jp>
>> On 2015/10/08 19:55, Etsuro Fujita wrote:
>>> I noticed that the approach using a column to populate the foreign
>>> scan's slot directly wouldn't work well in some cases. For example,
>>> consider:
>>>
>>> SELECT * FROM verysmall v LEFT JOIN (bigft1 JOIN bigft2 ON bigft1.x =
>>> bigft2.x) ON v.q = bigft1.q AND v.r = bigft2.r FOR UPDATE OF v;
>> Oops, I should have written "JOIN", not "LEFT JOIN".
>>> The best plan is presumably something like this as you said before:
>>>
>>> LockRows
>>>   -> Nested Loop
>>>        -> Seq Scan on verysmall v
>>>        -> Foreign Scan on bigft1 and bigft2
>>>             Remote SQL: SELECT * FROM bigft1 JOIN bigft2 ON bigft1.x =
>>>                         bigft2.x AND bigft1.q = $1 AND bigft2.r = $2
>>>
>>> Consider the EvalPlanQual testing to see if the updated version of a
>>> tuple in v satisfies the query. If we use the column in the testing,
>>> we would get the wrong results in some cases.
>> More precisely, we would get the wrong result when the value of v.q or
>> v.r in the updated version has changed.
> What do you think the right behavior?

IIUC, I think that the foreign scan's slot should be set empty, that the join should fail, and that the updated version of the tuple in v should be ignored in that scenario, since for the updated version of the tuple in v, the tuples obtained from those two foreign tables wouldn't satisfy the remote query. But if we populate the foreign scan's slot from that column, then the join would succeed and the updated version of the tuple in v would be wrongly returned, I think.

Best regards,
Etsuro Fujita
Re: Hooking at standard_join_search (Was: Re: Foreign join pushdown vs EvalPlanQual)
From: Etsuro Fujita
On 2015/09/11 6:30, Robert Haas wrote: > On Wed, Sep 9, 2015 at 2:30 AM, Etsuro Fujita > <fujita.etsuro@lab.ntt.co.jp> wrote: >>> But that path might have already been discarded on the basis of cost. >>> I think Tom's idea is better: let the FDW consult some state cached >>> for this purpose in the RelOptInfo >> Do you have an idea of what information would be collected into the state >> and how the FDW would derive parameterizations to consider producing >> pushed-down joins with from that information? What I'm concerned about that >> is to reduce the number of parameterizations to consider, to reduce overhead >> in costing the corresponding queries. I'm missing something, though. > I think the thing we'd want to store in the state would be enough > information to reconstruct a valid join nest. For example, the > reloptinfo for (A B) might note that A needs to be left-joined to B. > When we go to construct paths for (A B C), and there is no > SpecialJoinInfo that mentions C, we know that we can construct (A LJ > B) IJ C rather than (A IJ B) IJ C. If any paths survived, we could > find a way to pull that information out of the path, but pulling it > out of the RelOptInfo should always work. > > I am not sure what to do about parameterizations. That's one of my > remaining concerns about moving the hook. Do you have any plan about the hook? Best regards, Etsuro Fujita
On 2015/09/12 1:38, Robert Haas wrote: > On Thu, Sep 10, 2015 at 11:36 PM, Etsuro Fujita > <fujita.etsuro@lab.ntt.co.jp> wrote: >> I've proposed the following API changes: >> >> * I modified create_foreignscan_path, which is called from >> postgresGetForeignJoinPaths/postgresGetForeignPaths, so that a path, >> subpath, is passed as the eighth argument of the function. subpath >> represents a local join execution path if scanrelid==0, but NULL if >> scanrelid>0. > OK, I see now. But I don't much like the way > get_unsorted_unparameterized_path() looks. > > First, it's basically praying that MergePath, NodePath, and NestPath > can be flat-copied without breaking anything. In general, we have > copyfuncs.c support for nodes that we need to be able to copy, and we > use copyObject() to do it. Even if what you've got here works today, > it's not very future-proof. Agreed. > Second, what guarantee do we have that we'll find a path with no > pathkeys and a NULL param_info? Why can't all of the paths for a join > relation have pathkeys? Why can't they all be parameterized? I can't > think of anything that would guarantee that. No. The reason why I've modified the patch that way is simply because the latest postgres_fdw patch doesn't support creating a remote query for a presorted or parameterized path for a remote join. > Third, even if such a guarantee existed, why is this the right > behavior? Any join type will produce the same output; it's just a > question of performance. And if you have only one tuple on each side, > surely a nested loop would be fine. Yeah, I think we would also need to consider the parameterization. > It seems to me that what you ought to be doing is using data hung off > the fdw_private field of each RelOptInfo to cache a NestPath that can > be used for EPQ rechecks at that level. When you go to consider > pushing down another join, you can build up a new NestPath that's > suitable for the new level. 
> That seems much cleaner
> than groveling through the list of surviving paths and hoping you find
> the right kind of thing.

Agreed. (From the start, I am not against an FDW author creating the local join execution path by itself. The reason why I've modified the patch so as to find a local join execution path from the path list is simply because that is simple. The main point I'd like to discuss about the patch is the changes to the core code.)

> And all that having been said, I still don't really understand why you
> are resisting the idea of providing a callback so that the FDW can
> execute arbitrary code in the recheck path. There doesn't seem to be
> any reason not to let the FDW take control of the rechecks if it
> wishes, and there's no real cost in complexity that I can see.

IMO, I thought there would be no small development burden on an FDW author, so I was rather against the idea of providing such a callback. I know we still haven't reached a consensus on whether we address this issue by using a local join execution path.

Best regards,
Etsuro Fujita
On Fri, Oct 9, 2015 at 3:35 PM, Etsuro Fujita <fujita.etsuro@lab.ntt.co.jp> wrote:

Hi,

Just to have hands on, I started looking into this issue and trying to grasp it, as this is totally new code for me. And later I want to review these code changes.

I have noticed that this thread started saying we are getting a crash with the given steps with foreign_join_v16.patch; am I correct?

Then there are various patches which try to fix this, fdw-eval-plan-qual-*.patch.

I have tried applying foreign_join_v16.patch, which was good. And tried reproducing the crash. But instead of a crash I am getting the following error:

ERROR:  could not serialize access due to concurrent update
CONTEXT:  Remote SQL command: SELECT a FROM public.foo FOR UPDATE
Remote SQL command: SELECT a FROM public.tab FOR UPDATE

Then I have applied fdw-eval-plan-qual-3.0.patch on top of it. It was not getting applied cleanly (maybe due to some other changes meanwhile). I fixed the conflicts and the warnings to make it compile.

When I run the same SQL sequence, I am getting a crash in terminal 2 at the EXPLAIN itself:

server closed the connection unexpectedly
        This probably means the server terminated abnormally
        before or while processing the request.
The connection to the server was lost. Attempting reset: Failed.
!>

Following SQL statements I am using:

create table tab (a int, b int);
create foreign table foo (a int) server myserver options(table_name 'tab');
create foreign table bar (a int) server myserver options(table_name 'tab');

insert into tab values (1, 1);
insert into foo values (1);
insert into bar values (1);

analyze tab;
analyze foo;
analyze bar;

Run the example:

[Terminal 1]
begin;
update tab set b = b + 1 where a = 1;

[Terminal 2]
explain verbose select tab.* from tab, foo, bar where tab.a = foo.a and foo.a = bar.a for update;

Am I missing something here? Do I need to apply any other patch from other mail-threads?

Do you have a sample test-case explaining the issue and fix?

With these simple questions, I might have taken the thread slightly off from the design considerations; please excuse me for that.

Thanks

--
Jeevan B Chalke
Principal Software Engineer, Product Development
EnterpriseDB Corporation
The Enterprise PostgreSQL Company
Re: Hooking at standard_join_search (Was: Re: Foreign join pushdown vs EvalPlanQual)
From: Robert Haas
On Fri, Oct 9, 2015 at 5:41 AM, Etsuro Fujita <fujita.etsuro@lab.ntt.co.jp> wrote: > Do you have any plan about the hook? No. I think if you and Tom think it should be moved, one of you should propose a patch. Ideally accompanied by a demo of how postgres_fdw would be expected to use the revised hook. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
On Thu, Oct 8, 2015 at 11:00 PM, Etsuro Fujita <fujita.etsuro@lab.ntt.co.jp> wrote:
>> The best plan is presumably something like this as you said before:
>>
>> LockRows
>>   -> Nested Loop
>>        -> Seq Scan on verysmall v
>>        -> Foreign Scan on bigft1 and bigft2
>>             Remote SQL: SELECT * FROM bigft1 JOIN bigft2 ON bigft1.x =
>>                         bigft2.x AND bigft1.q = $1 AND bigft2.r = $2
>>
>> Consider the EvalPlanQual testing to see if the updated version of a
>> tuple in v satisfies the query. If we use the column in the testing, we
>> would get the wrong results in some cases.
>
> More precisely, we would get the wrong result when the value of v.q or v.r
> in the updated version has changed.

Interesting test case. It's worth considering why this works if you were to replace the Foreign Scan with an Index Scan; suppose the query is SELECT * FROM verysmall v LEFT JOIN realbiglocaltable t ON v.x = t.x FOR UPDATE OF v, so that you get:

LockRows
  -> Nested Loop
       -> Seq Scan on verysmall v
       -> Index Scan on realbiglocaltable t
            Index Cond: v.x = t.x

In your example, the remote SQL pushes down certain quals to the remote server, and so if we just return the same tuple they might no longer be satisfied. In this example, the index qual is essentially a filter condition that has been "pushed down" into the index AM. The EvalPlanQual machinery prevents this from generating wrong answers by rechecking the index cond - see IndexRecheck. Even though it's normally the AM's job to enforce the index cond, and the executor does not need to recheck, in the EvalPlanQual case it does need to recheck.

I think the foreign data wrapper case should be handled the same way. Any condition that we initially pushed down to the foreign server needs to be locally rechecked if we're inside EPQ.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
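The recheck rule described here, re-evaluating locally any qual that was pushed down, the way IndexRecheck re-evaluates index conditions, can be sketched as follows. The qual representation and helper are illustrative, not executor code.

```python
# Sketch: during an EPQ recheck, every qual that was pushed to the
# remote server is re-evaluated locally against the EPQ test tuples.

def epq_recheck(pushed_down_quals, epq_tuples):
    # epq_tuples: relation name -> test tuple (the updated versions)
    return all(qual(epq_tuples) for qual in pushed_down_quals)

quals = [lambda t: t["v"]["q"] == t["bigft1"]["q"],
         lambda t: t["v"]["r"] == t["bigft2"]["r"]]

tuples = {"v": {"q": 6, "r": 7},              # updated row of v
          "bigft1": {"q": 5}, "bigft2": {"r": 7}}

assert epq_recheck(quals, tuples) is False    # join correctly rejected
```

With the unchanged version of v (q = 5, r = 7) the same recheck would pass, matching the ordinary local-join behavior under EPQ.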
Hi,

At Fri, 9 Oct 2015 18:18:52 +0530, Jeevan Chalke <jeevan.chalke@enterprisedb.com> wrote in <CAM2+6=XsXhMw_owFiKJP9syUx9eFc0x5U9jGOtO9v34G5epd8g@mail.gmail.com>
> I have noticed that, this thread started saying we are getting a crash
> with the given steps with foreign_join_v16.patch, I am correct?

You're correct. The immediate cause of the crash is an assertion failure: EvalPlanQualNext doesn't find a tuple to examine for a "foreign join" changed into a ForeignScan as the result of foreign join pushdown.

> Then there are various patches which trying to fix this,
> fdw-eval-plan-qual-*.patch
>
> I have tried applying foreign_join_v16.patch, which was good. And tried
> reproducing the crash. But instead of crash I am getting following error.
>
> ERROR: could not serialize access due to concurrent update
> CONTEXT: Remote SQL command: SELECT a FROM public.foo FOR UPDATE
> Remote SQL command: SELECT a FROM public.tab FOR UPDATE

It is because you took the wrong steps. The FDW runs its remote transaction at the isolation level REPEATABLE READ or above. You updated a value locally while the FDW was locking the same tuple in a REPEATABLE READ transaction. You should map the foreign tables to a different table from the locally-modified one.

- create table tab (a int, b int);
- create foreign table foo (a int) server myserver options(table_name 'tab');
- create foreign table bar (a int) server myserver options(table_name 'tab');
+ create table tab (a int, b int);
+ create table lfb (a int, b int);
+ create foreign table foo (a int) server myserver options(table_name 'lfb');
+ create foreign table bar (a int) server myserver options(table_name 'lfb');

And you'll get the following assertion failure.
| TRAP: FailedAssertion("!(scanrelid > 0)", File: "execScan.c", Line: 52)
| LOG: unexpected EOF on client connection with an open transaction
| LOG: server process (PID 16738) was terminated by signal 6: Aborted
| DETAIL: Failed process was running: explain (verbose, analyze) select t1.* from t1, ft2, ft2_2 where t1.a = ft2.a and ft2.a = ft2_2.a for update;
| LOG: terminating any other active server processes

> Then I have applied fdw-eval-plan-qual-3.0.patch on top of it. It was not
> getting applied cleanly (may be due to some other changes meanwhile).
> I fixed the conflicts and the warnings to make it compile.

The combination won't work because that patch requires postgres_fdw to pass an alternative path as a subpath to create_foreignscan_path. AFAICS no corresponding foreign-join patch has been posted in this thread. This thread continues to discuss the desirable join pushdown API for FDWs.

> When I run the same sql sequence, I am getting a crash in terminal 2 at EXPLAIN
> itself.
>
> server closed the connection unexpectedly
> This probably means the server terminated abnormally
> before or while processing the request.
> The connection to the server was lost. Attempting reset: Failed.
> !>
>
> Following sql statements I am using:
>
> create table tab (a int, b int);
> create foreign table foo (a int) server myserver options(table_name 'tab');
> create foreign table bar (a int) server myserver options(table_name 'tab');
>
> insert into tab values (1, 1);
> insert into foo values (1);
> insert into bar values (1);
>
> analyze tab;
> analyze foo;
> analyze bar;
>
>
> Run the example:
>
> [Terminal 1]
> begin;
> update tab set b = b + 1 where a = 1;
>
> [Terminal 2]
> explain verbose select tab.* from tab, foo, bar where tab.a =
> foo.a and foo.a = bar.a for update;
>
>
> Am I missing something here?
> Do I need to apply any other patch from other mail-threads?
>
> Do you have a sample test-case explaining the issue and fix?
> With these simple questions, I might have taken the thread slightly off
> from the design considerations; please excuse me for that.

regards,

-- 
Kyotaro Horiguchi
NTT Open Source Software Center
> -----Original Message-----
> From: pgsql-hackers-owner@postgresql.org
> [mailto:pgsql-hackers-owner@postgresql.org] On Behalf Of Etsuro Fujita
> Sent: Thursday, October 08, 2015 7:56 PM
> To: Kyotaro HORIGUCHI; robertmhaas@gmail.com
> Cc: Kaigai Kouhei(海外 浩平); pgsql-hackers@postgresql.org;
> shigeru.hanada@gmail.com
> Subject: Re: [HACKERS] Foreign join pushdown vs EvalPlanQual
>
> On 2015/10/07 15:39, Etsuro Fujita wrote:
> > On 2015/10/07 15:06, Kyotaro HORIGUCHI wrote:
> >> At Wed, 7 Oct 2015 00:24:57 -0400, Robert Haas <robertmhaas@gmail.com>
> >> wrote
> >>> I think it rather requires *replacing* two resjunk columns by one new
> >>> one.  The whole-row references for the individual foreign tables are
> >>> only there to support EvalPlanQual; if we instead have a column to
> >>> populate the foreign scan's slot directly, then we can use that column
> >>> for that purpose directly and there's no remaining use for the
> >>> whole-row vars on the baserels.
> >> It is what I had in mind.
> > OK, I'll investigate this further.
>
> I noticed that the approach using a column to populate the foreign
> scan's slot directly wouldn't work well in some cases.  For example,
> consider:
>
> SELECT * FROM verysmall v LEFT JOIN (bigft1 JOIN bigft2 ON bigft1.x =
> bigft2.x) ON v.q = bigft1.q AND v.r = bigft2.r FOR UPDATE OF v;
>
> The best plan is presumably something like this as you said before:
>
> LockRows
>   -> Nested Loop
>     -> Seq Scan on verysmall v
>     -> Foreign Scan on bigft1 and bigft2
>          Remote SQL: SELECT * FROM bigft1 JOIN bigft2 ON bigft1.x =
> bigft2.x AND bigft1.q = $1 AND bigft2.r = $2
>
> Consider the EvalPlanQual testing to see if the updated version of a
> tuple in v satisfies the query.  If we use the column in the testing, we
> would get the wrong results in some cases.

In this case, does ForeignScan have to be reset prior to ExecProcNode()? Once ExecReScanForeignScan() gets called by ExecNestLoop(), it marks the EPQ slot as invalid.
So, more or less, ForeignScan needs to kick the remote join again based on the new parameter coming from the latest verysmall tuple. Please correct me if I don't understand correctly.

In the case of an unparameterized ForeignScan, the cached join-tuple works well because it is independent from verysmall. Once again, if the FDW driver is responsible for constructing the join-tuple from the base relations' tuples cached in the EPQ slots, this case doesn't need to kick the remote query again, because all the materials to construct the join-tuple are already held locally. Right?

Thanks,
-- 
NEC Business Creation Division / PG-Strom Project
KaiGai Kohei <kaigai@ak.jp.nec.com>
Hello,

At Fri, 9 Oct 2015 18:40:32 +0900, Etsuro Fujita <fujita.etsuro@lab.ntt.co.jp> wrote in <56178B90.4030206@lab.ntt.co.jp>
> > What do you think the right behavior?

# 'is' was omitted..

> IIUC, I think that the foreign scan's slot should be set empty, that

Even for that case, EvalPlanQualFetchRowMarks retrieves the tuples of the remote tables out of the whole-row resjunk columns and sets them into es_epqTuple[] so that EvalPlanQualNext can use them. The behavior is no different from the 'FOR UPDATE;' (for all tables) cases.

I supposed that the whole-row value for the joined tuple would be treated in the same manner as the tuples of the base foreign relations. This is because preprocess_rowmarks makes rowmarks of the type LCS_NONE for the relations other than those designated by "OF colref" in "FOR UPDATE". Such a rowmark is then converted into ROW_MARK_COPY by select_rowmark_type, which causes the behavior above, as the default behavior for foreign tables.

> the join should fail, and that the updated version of the tuple in v
> should be ignored in that scenario since that for the updated version
> of the tuple in v, the tuples obtained from those two foreign tables
> wouldn't satisfy the remote query.

AFAICS, no updated version of the remote tables' tuples is obtained. Even if the behavior I described above is correct, the join would fail, too. But that is because v.r is no longer equal to bigft2.r in the whole-row-var tuples. This is seemingly the same behavior as that on local tables.

# LCS_NONE for local tables is converted into ROW_MARK_COPY if no
# securityQuals are attached.

> But if populating the foreign
> scan's slot from that column, then the join would success and the
> updated version of the tuple in v would be returned wrongly, I think.

I might be understanding some points wrongly..

regards,

-- 
Kyotaro Horiguchi
NTT Open Source Software Center
Hello,

At Wed, 14 Oct 2015 03:07:31 +0000, Kouhei Kaigai <kaigai@ak.jp.nec.com> wrote in <9A28C8860F777E439AA12E8AEA7694F801157077@BPXM15GP.gisp.nec.co.jp>
> > I noticed that the approach using a column to populate the foreign
> > scan's slot directly wouldn't work well in some cases.  For example,
> > consider:
> >
> > SELECT * FROM verysmall v LEFT JOIN (bigft1 JOIN bigft2 ON bigft1.x =
> > bigft2.x) ON v.q = bigft1.q AND v.r = bigft2.r FOR UPDATE OF v;
> >
> > The best plan is presumably something like this as you said before:
> >
> > LockRows
> >   -> Nested Loop
> >     -> Seq Scan on verysmall v
> >     -> Foreign Scan on bigft1 and bigft2
> >          Remote SQL: SELECT * FROM bigft1 JOIN bigft2 ON bigft1.x =
> > bigft2.x AND bigft1.q = $1 AND bigft2.r = $2
> >
> > Consider the EvalPlanQual testing to see if the updated version of a
> > tuple in v satisfies the query.  If we use the column in the testing, we
> > would get the wrong results in some cases.

I have a basic (or maybe silly) question. Is it true that the join inner (the ForeignScan in the example) is re-executed with the modified value of v.r? I observed for a join case among only local tables that previously fetched tuples for the inner are simply reused regardless of join types. Even when a refetch happens (I haven't confirmed, but it would occur in the case of no security quals), the tuple is pointed to by ctid, so the re-join between local and remote would fail. Is this wrong?

> In this case, does ForeignScan have to be reset prior to ExecProcNode()?
> Once ExecReScanForeignScan() gets called by ExecNestLoop(), it marks EPQ
> slot is invalid. So, more or less, ForeignScan needs to kick the remote
> join again based on the new parameter come from the latest verysmall tuple.
> Please correct me, if I don't understand correctly.

So, no rescan would happen for these cases, I think. ReScan seems to be kicked only for the new (next) outer tuple that causes a change of parameter, but not kicked for EPQ. I might be taking you wrongly..
> In case of unparametalized ForeignScan case, the cached join-tuple work
> well because it is independent from verysmall.
> Once again, if FDW driver is responsible to construct join-tuple from
> the base relation's tuple cached in EPQ slot, this case don't need to
> kick remote query again, because all the materials to construct join-
> tuple are already held locally. Right?

It is definitely right and should be doable. But I think the point we are arguing here is what the desirable behavior is.

regards,

-- 
Kyotaro Horiguchi
NTT Open Source Software Center
On 2015/10/10 10:17, Robert Haas wrote:
> On Thu, Oct 8, 2015 at 11:00 PM, Etsuro Fujita
> <fujita.etsuro@lab.ntt.co.jp> wrote:
>>> The best plan is presumably something like this as you said before:
>>>
>>> LockRows
>>>   -> Nested Loop
>>>     -> Seq Scan on verysmall v
>>>     -> Foreign Scan on bigft1 and bigft2
>>>          Remote SQL: SELECT * FROM bigft1 JOIN bigft2 ON bigft1.x =
>>> bigft2.x AND bigft1.q = $1 AND bigft2.r = $2
>>>
>>> Consider the EvalPlanQual testing to see if the updated version of a
>>> tuple in v satisfies the query.  If we use the column in the testing, we
>>> would get the wrong results in some cases.
>> More precisely, we would get the wrong result when the value of v.q or v.r
>> in the updated version has changed.
> Interesting test case.  It's worth considering why this works if you
> were to replace the Foreign Scan with an Index Scan; suppose the query
> is SELECT * FROM verysmall v LEFT JOIN realbiglocaltable t ON v.x =
> t.x FOR UPDATE OF v, so that you get:
>
> LockRows
>   -> Nested Loop
>     -> Seq Scan on verysmall v
>     -> Index Scan on realbiglocaltable t
>          Index Cond: v.x = t.x
>
> In your example, the remote SQL pushes down certain quals to the
> remote server, and so if we just return the same tuple they might no
> longer be satisfied.  In this example, the index qual is essentially a
> filter condition that has been "pushed down" into the index AM.  The
> EvalPlanQual machinery prevents this from generating wrong answers by
> rechecking the index cond - see IndexRecheck.  Even though it's
> normally the AM's job to enforce the index cond, and the executor does
> not need to recheck, in the EvalPlanQual case it does need to recheck.
>
> I think the foreign data wrapper case should be handled the same way.
> Any condition that we initially pushed down to the foreign server
> needs to be locally rechecked if we're inside EPQ.

Agreed.
As KaiGai-san also pointed out before, I think we should address this in each of the following cases:

1) remote qual (scanrelid > 0)
2) remote join (scanrelid == 0)

As for #1, I noticed that there is a bug in handling this kind of FDW query, which will be shown below. As you said, I think this should be addressed by rechecking the remote quals *locally*. (I had thought of another fix for this kind of bug before, though.) IIUC, I think this should be fixed separately from #2, as this is a bug not only in 9.5 but also in the back branches. Please find attached a patch.

Create an environment:

mydatabase=# create table t1 (a int primary key, b text);
mydatabase=# insert into t1 select a, 'notsolongtext' from generate_series(1, 1000000) a;

postgres=# create server myserver foreign data wrapper postgres_fdw options (dbname 'mydatabase');
postgres=# create user mapping for current_user server myserver;
postgres=# create foreign table ft1 (a int, b text) server myserver options (table_name 't1');
postgres=# alter foreign table ft1 options (add use_remote_estimate 'true');
postgres=# create table inttab (a int);
postgres=# insert into inttab select a from generate_series(1, 10) a;
postgres=# analyze ft1;
postgres=# analyze inttab;

Run concurrent transactions that produce an incorrect result:

[Terminal1]
postgres=# begin;
BEGIN
postgres=# update inttab set a = a + 1 where a = 1;
UPDATE 1

[Terminal2]
postgres=# explain verbose select * from inttab, ft1 where inttab.a = ft1.a limit 1 for update;
                                           QUERY PLAN
-------------------------------------------------------------------------------------------------
 Limit  (cost=100.43..198.99 rows=1 width=70)
   Output: inttab.a, ft1.a, ft1.b, inttab.ctid, ft1.*
   ->  LockRows  (cost=100.43..1086.00 rows=10 width=70)
         Output: inttab.a, ft1.a, ft1.b, inttab.ctid, ft1.*
         ->  Nested Loop  (cost=100.43..1085.90 rows=10 width=70)
               Output: inttab.a, ft1.a, ft1.b, inttab.ctid, ft1.*
               ->  Seq Scan on public.inttab  (cost=0.00..1.10 rows=10 width=10)
                     Output: inttab.a, inttab.ctid
               ->  Foreign Scan on public.ft1  (cost=100.43..108.47 rows=1 width=18)
                     Output: ft1.a, ft1.b, ft1.*
                     Remote SQL: SELECT a, b FROM public.t1 WHERE (($1::integer = a)) FOR UPDATE
(11 rows)

postgres=# select * from inttab, ft1 where inttab.a = ft1.a limit 1 for update;

[Terminal1]
postgres=# commit;
COMMIT

[Terminal2]
(After the commit in Terminal1, the following result will be shown in Terminal2. Note that the values of inttab.a and ft1.a don't satisfy the remote qual!)
 a | a |       b
---+---+---------------
 2 | 1 | notsolongtext
(1 row)

As for #2, I didn't come up with any solution for locally rechecking pushed-down join conditions against a joined tuple populated from the column that we discussed. Instead, I'd like to revise the local-join-execution-plan-based approach that we discussed before, addressing your comments such as [1]. Would that be the right way to go?

Best regards,
Etsuro Fujita

[1] http://www.postgresql.org/message-id/CA+TgmoaAzs0dR23R7PTBseQfwOtuVCPNBqDHxeBo9Gi+dMxj8w@mail.gmail.com
On 2015/10/14 12:07, Kouhei Kaigai wrote:
>> On 2015/10/07 15:39, Etsuro Fujita wrote:
>> I noticed that the approach using a column to populate the foreign
>> scan's slot directly wouldn't work well in some cases.  For example,
>> consider:
>>
>> SELECT * FROM verysmall v LEFT JOIN (bigft1 JOIN bigft2 ON bigft1.x =
>> bigft2.x) ON v.q = bigft1.q AND v.r = bigft2.r FOR UPDATE OF v;
>>
>> The best plan is presumably something like this as you said before:
>>
>> LockRows
>>   -> Nested Loop
>>     -> Seq Scan on verysmall v
>>     -> Foreign Scan on bigft1 and bigft2
>>          Remote SQL: SELECT * FROM bigft1 JOIN bigft2 ON bigft1.x =
>> bigft2.x AND bigft1.q = $1 AND bigft2.r = $2
>>
>> Consider the EvalPlanQual testing to see if the updated version of a
>> tuple in v satisfies the query.  If we use the column in the testing, we
>> would get the wrong results in some cases.
> In this case, does ForeignScan have to be reset prior to ExecProcNode()?
> Once ExecReScanForeignScan() gets called by ExecNestLoop(), it marks EPQ
> slot is invalid. So, more or less, ForeignScan needs to kick the remote
> join again based on the new parameter come from the latest verysmall tuple.
> Please correct me, if I don't understand correctly.
> In case of unparametalized ForeignScan case, the cached join-tuple work
> well because it is independent from verysmall.
>
> Once again, if FDW driver is responsible to construct join-tuple from
> the base relation's tuple cached in EPQ slot, this case don't need to
> kick remote query again, because all the materials to construct join-
> tuple are already held locally. Right?

Sorry, maybe I misunderstand your words, but what we are talking about here is an approach using a whole-row var that would populate a joined tuple that is returned by an FDW and stored in the scan slot of the corresponding ForeignScanState node in the parent state tree.

Best regards,
Etsuro Fujita
Ah.. I understood that what you mentioned is the lack of a local recheck of foreign tuples. Sorry for the noise.

At Wed, 14 Oct 2015 17:31:16 +0900, Etsuro Fujita <fujita.etsuro@lab.ntt.co.jp> wrote in <561E12D4.7040403@lab.ntt.co.jp>
On 2015/10/10 10:17, Robert Haas wrote:
> > On Thu, Oct 8, 2015 at 11:00 PM, Etsuro Fujita
> > <fujita.etsuro@lab.ntt.co.jp> wrote:
> >>> The best plan is presumably something like this as you said before:
> >>>
> >>> LockRows
> >>>   -> Nested Loop
> >>>     -> Seq Scan on verysmall v
> >>>     -> Foreign Scan on bigft1 and bigft2
> >>>          Remote SQL: SELECT * FROM bigft1 JOIN bigft2 ON bigft1.x =
> >>> bigft2.x AND bigft1.q = $1 AND bigft2.r = $2
> >>>
> >>> Consider the EvalPlanQual testing to see if the updated version of a
> >>> tuple in v satisfies the query.  If we use the column in the testing,
> >>> we
> >>> would get the wrong results in some cases.
> >> More precisely, we would get the wrong result when the value of v.q or
> >> v.r
> >> in the updated version has changed.
> > Interesting test case.  It's worth considering why this works if you
> > were to replace the Foreign Scan with an Index Scan; suppose the query
> > is SELECT * FROM verysmall v LEFT JOIN realbiglocaltable t ON v.x =
> > t.x FOR UPDATE OF v, so that you get:
> >
> > LockRows
> >   -> Nested Loop
> >     -> Seq Scan on verysmall v
> >     -> Index Scan on realbiglocaltable t
> >          Index Cond: v.x = t.x
> >
> > In your example, the remote SQL pushes down certain quals to the
> > remote server, and so if we just return the same tuple they might no
> > longer be satisfied.  In this example, the index qual is essentially a
> > filter condition that has been "pushed down" into the index AM.  The
> > EvalPlanQual machinery prevents this from generating wrong answers by
> > rechecking the index cond - see IndexRecheck.  Even though it's
> > normally the AM's job to enforce the index cond, and the executor does
> > not need to recheck, in the EvalPlanQual case it does need to recheck.
> > > > I think the foreign data wrapper case should be handled the same way. > > Any condition that we initially pushed down to the foreign server > > needs to be locally rechecked if we're inside EPQ. > > Agreed. > > As KaiGai-san also pointed out before, I think we should address this > in each of the following cases: > > 1) remote qual (scanrelid>0) > 2) remote join (scanrelid==0) > > As for #1, I noticed that there is a bug in handling the same kind of > FDW queries, which will be shown below. As you said, I think this > should be addressed by rechecking the remote quals *locally*. (I > thought another fix for this kind of bug before, though.) IIUC, I > think this should be fixed separately from #2, as this is a bug not > only in 9.5, but in back branches. Please find attached a patch. > > Create an environment: > > mydatabase=# create table t1 (a int primary key, b text); > mydatabase=# insert into t1 select a, 'notsolongtext' from > generate_series(1, 1000000) a; > > postgres=# create server myserver foreign data wrapper postgres_fdw > options (dbname 'mydatabase'); > postgres=# create user mapping for current_user server myserver; > postgres=# create foreign table ft1 (a int, b text) server myserver > options (table_name 't1'); > postgres=# alter foreign table ft1 options (add use_remote_estimate > 'true'); > postgres=# create table inttab (a int); > postgres=# insert into inttab select a from generate_series(1, 10) a; > postgres=# analyze ft1; > postgres=# analyze inttab; > > Run concurrent transactions that produce incorrect result: > > [Terminal1] > postgres=# begin; > BEGIN > postgres=# update inttab set a = a + 1 where a = 1; > UPDATE 1 > > [Terminal2] > postgres=# explain verbose select * from inttab, ft1 where inttab.a = > ft1.a limit 1 for update; > QUERY PLAN > ------------------------------------------------------------------------------------------------- > Limit (cost=100.43..198.99 rows=1 width=70) > Output: inttab.a, ft1.a, ft1.b, inttab.ctid, 
ft1.* > -> LockRows (cost=100.43..1086.00 rows=10 width=70) > Output: inttab.a, ft1.a, ft1.b, inttab.ctid, ft1.* > -> Nested Loop (cost=100.43..1085.90 rows=10 width=70) > Output: inttab.a, ft1.a, ft1.b, inttab.ctid, ft1.* > -> Seq Scan on public.inttab (cost=0.00..1.10 rows=10 > -> width=10) > Output: inttab.a, inttab.ctid > -> Foreign Scan on public.ft1 (cost=100.43..108.47 rows=1 > -> width=18) > Output: ft1.a, ft1.b, ft1.* > Remote SQL: SELECT a, b FROM public.t1 WHERE > (($1::integer = a)) FOR UPDATE > (11 rows) > > postgres=# select * from inttab, ft1 where inttab.a = ft1.a limit 1 > for update; > > [Terminal1] > postgres=# commit; > COMMIT > > [Terminal2] > (After the commit in Terminal1, the following result will be shown in > Terminal2. Note that the values of inttab.a and ft1.a wouldn't > satisfy the remote qual!) > a | a | b > ---+---+--------------- > 2 | 1 | notsolongtext > (1 row) > > As for #2, I didn't come up with any solution to locally rechecking > pushed-down join conditions against a joined tuple populated from a > column that we discussed. Instead, I'd like to revise a > local-join-execution-plan-based approach that we discussed before, by > addressing your comments such as [1]. Would it be the right way to > go? > > Best regards, > Etsuro Fujita > > [1] > http://www.postgresql.org/message-id/CA+TgmoaAzs0dR23R7PTBseQfwOtuVCPNBqDHxeBo9Gi+dMxj8w@mail.gmail.com -- Kyotaro Horiguchi NTT Open Source Software Center
On Wed, Oct 14, 2015 at 3:10 AM, Kyotaro HORIGUCHI <horiguchi.kyotaro@lab.ntt.co.jp> wrote: > AFAICS, no updated version for remote tables are obtained. You're right, but that's OK: the previously-obtained tuples fail to meet the current version of the quals, so there's no problem (that I can see). > Even though the behavior I described above is correct, the join > would fail, too. But it is because v.r is no longer equal to > bigft2.r in the whole-row-var tuples. This seems seemingly the > same behavior with that on local tables. Yeah. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
On Wed, Oct 14, 2015 at 4:31 AM, Etsuro Fujita <fujita.etsuro@lab.ntt.co.jp> wrote:
> Agreed.
>
> As KaiGai-san also pointed out before, I think we should address this in
> each of the following cases:
>
> 1) remote qual (scanrelid>0)
> 2) remote join (scanrelid==0)
>
> As for #1, I noticed that there is a bug in handling the same kind of FDW
> queries, which will be shown below.  As you said, I think this should be
> addressed by rechecking the remote quals *locally*.  (I thought another fix
> for this kind of bug before, though.)  IIUC, I think this should be fixed
> separately from #2, as this is a bug not only in 9.5, but in back branches.
> Please find attached a patch.

+1 for doing something like this.  However, I don't think we can commit this to released branches, despite the fact that it's a bug fix, because breaking third-party FDWs in a minor release seems unfriendly.  We might be able to slip it into 9.5, though, if we act quickly.

A few review comments:

- nodeForeignscan.c now needs to #include "utils/memutils.h"
- I think it'd be safer for finalize_plan() not to try to shortcut processing fdw_scan_quals.
- You forgot to update _readForeignScan.
- The documentation needs updating.
- I think we should use the name fdw_recheck_quals.

Here's an updated patch with those changes and some improvements to the comments.  Absent objections, I will commit it and back-patch to 9.5 only.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
> -----Original Message----- > From: Kyotaro HORIGUCHI [mailto:horiguchi.kyotaro@lab.ntt.co.jp] > Sent: Wednesday, October 14, 2015 4:40 PM > To: Kaigai Kouhei(海外 浩平) > Cc: fujita.etsuro@lab.ntt.co.jp; pgsql-hackers@postgresql.org; > shigeru.hanada@gmail.com; robertmhaas@gmail.com > Subject: Re: [HACKERS] Foreign join pushdown vs EvalPlanQual > > Hello, > > At Wed, 14 Oct 2015 03:07:31 +0000, Kouhei Kaigai <kaigai@ak.jp.nec.com> wrote > in <9A28C8860F777E439AA12E8AEA7694F801157077@BPXM15GP.gisp.nec.co.jp> > > > I noticed that the approach using a column to populate the foreign > > > scan's slot directly wouldn't work well in some cases. For example, > > > consider: > > > > > > SELECT * FROM verysmall v LEFT JOIN (bigft1 JOIN bigft2 ON bigft1.x = > > > bigft2.x) ON v.q = bigft1.q AND v.r = bigft2.r FOR UPDATE OF v; > > > > > > The best plan is presumably something like this as you said before: > > > > > > LockRows > > > -> Nested Loop > > > -> Seq Scan on verysmall v > > > -> Foreign Scan on bigft1 and bigft2 > > > Remote SQL: SELECT * FROM bigft1 JOIN bigft2 ON bigft1.x = > > > bigft2.x AND bigft1.q = $1 AND bigft2.r = $2 > > > > > > Consider the EvalPlanQual testing to see if the updated version of a > > > tuple in v satisfies the query. If we use the column in the testing, we > > > would get the wrong results in some cases. > > I have a basic (or maybe silly) qustion. Is it true that the > join-inner (the foreignscan in the example) is re-executed with > the modified value of v.r? I observed for a join case among only > local tables that previously fetched tuples for the inner are > simplly reused regardless of join types. Even when a refetch > happens (I haven't confirmed but it would occur in the case of no > security quals), the tuple is pointed by ctid so the re-join > between local and remote would fail. Is this wrong? > Let's dive into ExecNestLoop(). 
Once nl_NeedNewOuter is true, ExecProcNode(outerPlan) is called, then ExecReScan(innerPlan) is called with the new param-info delivered from the outer tuple. nl_NeedNewOuter is reset just after ExecProcNode(outerPlan), and it is set again once a new outer tuple is needed — when the inner scan reaches the end of the relation, or when a tuple is found on a semi-join. In case a semi-join returns a joined tuple and then the EPQ recheck is applied, it can call ExecProcNode(outerPlan) and reset the inner plan state.

That is what I can say from the existing code. I doubt whether this behavior is right for EPQ rechecks. The above scenario introduces the case where the outer relation (verysmall) is updated by the concurrent session, thus the param-info has to be updated. However, the implementation does not look to me like it pays attention here. If ExecNestLoop() is called under the EPQ recheck context, it needs to call ExecProcNode() for both the outer and inner plans to ensure the visibility of the joined tuple against the latest status. Of course, the underlying scan plans for the base relations never advance the scan pointer; they just return the tuples in the EPQ slots, and then I want ExecNestLoop() to evaluate whether these tuples satisfy the join clause.

> > In this case, does ForeignScan have to be reset prior to ExecProcNode()?
> > Once ExecReScanForeignScan() gets called by ExecNestLoop(), it marks EPQ
> > slot is invalid. So, more or less, ForeignScan needs to kick the remote
> > join again based on the new parameter come from the latest verysmall tuple.
> > Please correct me, if I don't understand correctly.
>
> So, no rescan would happen for the cases, I think. ReScan seems
> to be kicked only for the new(next) outer tuple that causes
> change of parameter, but not kicked for EPQ. I might take you
> wrongly..

> > In case of unparametalized ForeignScan case, the cached join-tuple work
> > well because it is independent from verysmall.
> > Once again, if FDW driver is responsible to construct join-tuple from
> > the base relation's tuple cached in EPQ slot, this case don't need to
> > kick remote query again, because all the materials to construct join-
> > tuple are already held locally. Right?
>
> It is definitely right and should be doable. But I think the
> point we are argueing here is what is the desirable behavior.

In the case of scanrelid==0, the expectation for ForeignScan/CustomScan is to behave as if a local join existed there. That requires the ForeignScan to generate a joined tuple as the result of the remote join, which may contain multiple junk TLEs to carry whole-row references of the base foreign tables. According to this criterion, the desirable behavior is clear, as below:

1. FDW/CSP picks up the base relations' tuples from the EPQ slots.
   They shall be set up by whole-row references in the case of early row-lock semantics, or by RefetchForeignRow in the case of late row-lock semantics.

2. Fill up ss_ScanTupleSlot according to the xxx_scan_tlist.
   We may be able to provide a common support function here, because this list keeps the relationship between a particular attribute of the joined tuple and its source column.

3. Apply the join clauses and base restrictions that were pushed down.
   setrefs.c initializes the expressions kept in fdw_exprs/custom_exprs to run on the ss_ScanTupleSlot, so it is easiest to check them here.

4. If the joined tuple is still visible after step 3, FDW/CSP returns the joined tuple. Otherwise, it returns an empty slot.

This behavior is entirely compatible with a local join located at the point of the ForeignScan/CustomScan with scanrelid==0.

Even if the remote join is parameterized by another relation, we can simply use the param-info delivered from the corresponding outer scan at step 3. EState should already have the updated parameters, so the FDW driver needs to care about nothing.

It is a much less invasive approach towards the existing EPQ recheck mechanism. I cannot understand why Fujita-san never "tries" this approach.
Thanks, -- NEC Business Creation Division / PG-Strom Project KaiGai Kohei <kaigai@ak.jp.nec.com>
Hello, I confirmed that an EPQ tuple of a parameterized foreign scan is correctly rejected by fdw_recheck_quals with the modified outer tuple. I have no objection to this, and have two humble comments.

In file_fdw.c, the comment for the last parameter just after the added line would better be aligned with the other comments.

In subselect.c, the added break is inside the added curly braces, but it would be better placed after the closing brace, like the other cases.

regards,

At Wed, 14 Oct 2015 15:21:41 -0400, Robert Haas <robertmhaas@gmail.com> wrote in <CA+TgmoZ8REePoFv7ZjjDH-T54sQw40fnP4Mkr8hw5eizbxA4BA@mail.gmail.com>
> On Wed, Oct 14, 2015 at 4:31 AM, Etsuro Fujita
> <fujita.etsuro@lab.ntt.co.jp> wrote:
> > 1) remote qual (scanrelid>0)
> > 2) remote join (scanrelid==0)
> >
> > As for #1, I noticed that there is a bug in handling the same kind of FDW
> > queries, which will be shown below.  As you said, I think this should be
> > addressed by rechecking the remote quals *locally*.  (I thought another fix
> > for this kind of bug before, though.)  IIUC, I think this should be fixed
> > separately from #2, as this is a bug not only in 9.5, but in back branches.
> > Please find attached a patch.
>
> +1 for doing something like this.  However, I don't think we can
> commit this to released branches, despite the fact that it's a bug
> fix, because breaking third-party FDWs in a minor release seems
> unfriendly.  We might be able to slip it into 9.5, though, if we act
> quickly.
>
> A few review comments:
>
> - nodeForeignscan.c now needs to #include "utils/memutils.h"
> - I think it'd be safer for finalize_plan() not to try to shortcut
> processing fdw_scan_quals.
> - You forgot to update _readForeignScan.
> - The documentation needs updating.
> - I think we should use the name fdw_recheck_quals.
>
> Here's an updated patch with those changes and some improvements to
> the comments.  Absent objections, I will commit it and back-patch to
> 9.5 only.
-- Kyotaro Horiguchi NTT Open Source Software Center
On 2015/10/15 11:36, Kouhei Kaigai wrote: >>> Once again, if FDW driver is responsible to construct join-tuple from >>> the base relation's tuple cached in EPQ slot, this case don't need to >>> kick remote query again, because all the materials to construct join- >>> tuple are already held locally. Right? I now understand clearly what you mean. Sorry for my misunderstanding. > In case of scanrelid==0, expectation to ForeignScan/CustomScan is to > behave as if local join exists here. It requires ForeignScan to generate > joined-tuple as a result of remote join, that may contains multiple junk > TLEs to carry whole-var references of base foreign tables. > According to the criteria, the desirable behavior is clear as below: > > 1. FDW/CSP picks up base relation's tuple from the EPQ slots. > It shall be setup by whole-row reference if earlier row-lock semantics, > or by RefetchForeignRow if later row-lock semantics. > > 2. Fill up ss_ScanTupleSlot according to the xxx_scan_tlist. > We may be able to provide a common support function here, because this > list keeps relation between a particular attribute of the joined-tuple > and its source column. > > 3. Apply join-clause and base-restrict that were pushed down. > setrefs.c initializes expressions kept in fdw_exprs/custom_exprs to run > on the ss_ScanTupleSlot. It is the easiest way to check here. > > 4. If joined-tuple is still visible after the step 3, FDW/CSP returns > joined-tuple. Elsewhere, returns an empty slot. > > It is entirely compatible behavior even if local join is located on > the point of ForeignScan/CustomScan with scanrelid==0. > > Even if remote join is parametalized by other relation, we can simply > use param-info delivered from the corresponding outer scan at the step-3. > EState should have the parameters already updated, FDW driver needs to > care about nothing. > > It is quite less invasive approach towards the existing EPQ recheck > mechanism. I see. 
That's an idea, but I guess that steps 2 and 3 would need to add a lot of code to the core. Why don't you use the local join execution plan that we discussed? I think that would make the series of processing much simpler. I'm now revising the patch that I created for that. If it's okay, I'd like to propose an updated version of the patch in a few days.

> I cannot understand why Fujita-san never "try" this approach.

Maybe my explanation was not correct, but I didn't say such a thing. What I rather objected to was adding a new FDW callback routine for rechecking pushed-down quals or pushed-down joins, which I think you insisted on.

Best regards,
Etsuro Fujita
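The four-step contract described above can be illustrated with a toy model in standalone C (purely an editorial sketch; every type and name below is invented for illustration and none of this is PostgreSQL code):

```c
#include <assert.h>
#include <stdbool.h>

/* Toy model of the 4-step recheck: all types and names are invented
 * for illustration; none of this is PostgreSQL code. */

typedef struct ToyTuple
{
    int  a;
    int  b;
    bool valid;                 /* false means "empty slot" */
} ToyTuple;

/* Step 1: the base relations' tuples have been picked up from the EPQ
 * slots (here they simply arrive as arguments).
 * Step 2: fill the scan tuple according to the scan tlist (here: take
 * a from foo and b from bar).
 * Step 3: apply the pushed-down join clause (foo.a = bar.a).
 * Step 4: return the joined tuple if it survived, else an empty slot. */
static ToyTuple
toy_foreign_recheck(ToyTuple foo, ToyTuple bar)
{
    ToyTuple result = {0, 0, false};

    if (foo.valid && bar.valid && foo.a == bar.a)
    {
        result.a = foo.a;
        result.b = bar.b;
        result.valid = true;
    }
    return result;
}
```

The real implementation would work on TupleTableSlots and run the expressions initialized by setrefs.c; the point is only that the pushed-down join clause is re-evaluated locally against the EPQ base tuples.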
> -----Original Message----- > From: Etsuro Fujita [mailto:fujita.etsuro@lab.ntt.co.jp] > Sent: Thursday, October 15, 2015 7:00 PM > To: Kaigai Kouhei(海外 浩平); Kyotaro HORIGUCHI > Cc: pgsql-hackers@postgresql.org; shigeru.hanada@gmail.com; > robertmhaas@gmail.com > Subject: Re: [HACKERS] Foreign join pushdown vs EvalPlanQual > > On 2015/10/15 11:36, Kouhei Kaigai wrote: > >>> Once again, if FDW driver is responsible to construct join-tuple from > >>> the base relation's tuple cached in EPQ slot, this case don't need to > >>> kick remote query again, because all the materials to construct join- > >>> tuple are already held locally. Right? > > I now understand clearly what you mean. Sorry for my misunderstanding. > > > In case of scanrelid==0, expectation to ForeignScan/CustomScan is to > > behave as if local join exists here. It requires ForeignScan to generate > > joined-tuple as a result of remote join, that may contains multiple junk > > TLEs to carry whole-var references of base foreign tables. > > According to the criteria, the desirable behavior is clear as below: > > > > 1. FDW/CSP picks up base relation's tuple from the EPQ slots. > > It shall be setup by whole-row reference if earlier row-lock semantics, > > or by RefetchForeignRow if later row-lock semantics. > > > > 2. Fill up ss_ScanTupleSlot according to the xxx_scan_tlist. > > We may be able to provide a common support function here, because this > > list keeps relation between a particular attribute of the joined-tuple > > and its source column. > > > > 3. Apply join-clause and base-restrict that were pushed down. > > setrefs.c initializes expressions kept in fdw_exprs/custom_exprs to run > > on the ss_ScanTupleSlot. It is the easiest way to check here. > > > > 4. If joined-tuple is still visible after the step 3, FDW/CSP returns > > joined-tuple. Elsewhere, returns an empty slot. 
> > > > It is entirely compatible behavior even if local join is located on
> > the point of ForeignScan/CustomScan with scanrelid==0.
> >
> > Even if remote join is parametalized by other relation, we can simply
> > use param-info delivered from the corresponding outer scan at the step-3.
> > EState should have the parameters already updated, FDW driver needs to
> > care about nothing.
> >
> > It is quite less invasive approach towards the existing EPQ recheck
> > mechanism.
>
> I see. That's an idea, but I guess that step 2 and 3 would need to add
> a lot of code to the core. Why don't you use a local join execution
> plan that we discussed? I think that that would make the series of
> processing much simpler. I'm now revising the patch that I created for
> that. If it's okay, I'd like to propose an updated version of the patch
> in a few days.
>
I have to explain why the above idea is simpler and suitable for the v9.5 timeline. As I've consistently proposed for these two months, steps 2 and 3 are assumed to be handled in the callback routine kicked from ForeignRecheck(). Even if the core backend eventually provides utility routines to support the above tasks, that is not a mandatory requirement from the beginning; at least not on the v9.5 timeline. As long as the callback is provided, an FDW driver "can" implement the above features by itself, in whatever way it prefers. Note that an alternative local join plan is one way to implement steps 2 and 3; however, I never force people to use a particular way. People can choose. Regarding the scale of the code in the core backend, it is pretty small, because all we need to add in v9.5 is just a callback. We can implement the remaining support routines on the v9.6 timeline, but not now.

> > I cannot understand why Fujita-san never "try" this approach.
>
> Maybe my explanation was not correct, but I didn't say such a thing.
> What I rather objected against was to add a new FDW callback routine for
> rechecking pushed-down quals or pushed-down joins, which I think you
> insisted on.
>
My proposition has been consistent. The interface contract (that is, the job of the callback implementation) in this sequence is the 4 steps I introduced above. We can use an alternative local join plan, our own implementation to fill up ss_ScanTupleSlot, or some common support routine provided by the core. Regardless of the implementation choice, the callback approach minimizes the impact on the existing EPQ recheck mechanism and on the release schedule of v9.5. Also, it can cover the case where scanrelid==0.

Thanks,
--
NEC Business Creation Division / PG-Strom Project
KaiGai Kohei <kaigai@ak.jp.nec.com>
On Thu, Oct 15, 2015 at 3:04 AM, Kyotaro HORIGUCHI <horiguchi.kyotaro@lab.ntt.co.jp> wrote: > I confirmed that an epqtuple of foreign parameterized scan is > correctly rejected by fdw_recheck_quals with modified outer > tuple. > > I have no objection to this and have two humble comments. > > In file_fdw.c, the comment for the last parameter just after the > added line seems to be better to be aligned with other comments. I've pgindented the file. Any other space we might choose would just be changed by the next pgindent run, so there's no point in trying to vary. > In subselect.c, the added break is in the added curly-braces but > it would be better to place it after the closing brace, like the > other cases. Changed that, and committed. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
On 2015/10/16 2:14, Robert Haas wrote: > On Thu, Oct 15, 2015 at 3:04 AM, Kyotaro HORIGUCHI > <horiguchi.kyotaro@lab.ntt.co.jp> wrote: >> I confirmed that an epqtuple of foreign parameterized scan is >> correctly rejected by fdw_recheck_quals with modified outer >> tuple. >> >> I have no objection to this and have two humble comments. >> >> In file_fdw.c, the comment for the last parameter just after the >> added line seems to be better to be aligned with other comments. > I've pgindented the file. Any other space we might choose would just > be changed by the next pgindent run, so there's no point in trying to > vary. >> In subselect.c, the added break is in the added curly-braces but >> it would be better to place it after the closing brace, like the >> other cases. > Changed that, and committed. Thanks, Robert and Horiguchi-san. Best regards, Etsuro Fujita
>> On 2015/10/15 11:36, Kouhei Kaigai wrote: >>> In case of scanrelid==0, expectation to ForeignScan/CustomScan is to >>> behave as if local join exists here. It requires ForeignScan to generate >>> joined-tuple as a result of remote join, that may contains multiple junk >>> TLEs to carry whole-var references of base foreign tables. >>> According to the criteria, the desirable behavior is clear as below: >>> >>> 1. FDW/CSP picks up base relation's tuple from the EPQ slots. >>> It shall be setup by whole-row reference if earlier row-lock semantics, >>> or by RefetchForeignRow if later row-lock semantics. >>> >>> 2. Fill up ss_ScanTupleSlot according to the xxx_scan_tlist. >>> We may be able to provide a common support function here, because this >>> list keeps relation between a particular attribute of the joined-tuple >>> and its source column. >>> >>> 3. Apply join-clause and base-restrict that were pushed down. >>> setrefs.c initializes expressions kept in fdw_exprs/custom_exprs to run >>> on the ss_ScanTupleSlot. It is the easiest way to check here. >>> >>> 4. If joined-tuple is still visible after the step 3, FDW/CSP returns >>> joined-tuple. Elsewhere, returns an empty slot. >>> >>> It is entirely compatible behavior even if local join is located on >>> the point of ForeignScan/CustomScan with scanrelid==0. >>> >>> Even if remote join is parametalized by other relation, we can simply >>> use param-info delivered from the corresponding outer scan at the step-3. >>> EState should have the parameters already updated, FDW driver needs to >>> care about nothing. >>> >>> It is quite less invasive approach towards the existing EPQ recheck >>> mechanism. I wrote: >> I see. That's an idea, but I guess that step 2 and 3 would need to add >> a lot of code to the core. Why don't you use a local join execution >> plan that we discussed? I think that that would make the series of >> processing much simpler. I'm now revising the patch that I created for >> that. 
If it's okay, I'd like to propose an updated version of the patch
>> in a few days.

On 2015/10/15 20:19, Kouhei Kaigai wrote:
> I have to introduce why above idea is simpler and suitable for v9.5
> timeline.
> As I've consistently proposed for this two months, the step-2 and 3
> are assumed to be handled in the callback routine to be kicked from
> ForeignRecheck().

Honestly, I still don't see much value in doing so. As Robert mentioned in [1], I think that if we're inside EPQ, pushed-down quals and/or pushed-down joins should be locally rechecked in the same way as other cases such as IndexRecheck. So I'll propose the updated version of the patch. Thanks for the explanation!

Best regards,
Etsuro Fujita

[1] http://www.postgresql.org/message-id/CA+Tgmoau7jVTLF0Oh9a_Mu9S=vrw7i6u_h7JSpzBXv0xtyo_Bg@mail.gmail.com
On 2015/10/14 17:31, Etsuro Fujita wrote:
> As KaiGai-san also pointed out before, I think we should address this in
> each of the following cases:
>
> 1) remote qual (scanrelid>0)
> 2) remote join (scanrelid==0)

As for #2, I updated the patch, which uses a local join execution plan for an EvalPlanQual recheck, according to the comment from Robert [1]. Attached is an updated version of the patch. This is a WIP patch, but it would be appreciated if I could get feedback early.

For tests, apply the patches:

foreign-recheck-for-foreign-join-1.patch
usermapping_matching.patch [2]
add_GetUserMappingById.patch [2]
foreign_join_v16_efujita.patch [3]

Since, as I said upthread, what I'd like to discuss is changes to the PG core, I didn't do anything about the postgres_fdw patches.

Best regards,
Etsuro Fujita

[1] http://www.postgresql.org/message-id/CA+TgmoaAzs0dR23R7PTBseQfwOtuVCPNBqDHxeBo9Gi+dMxj8w@mail.gmail.com
[2] http://www.postgresql.org/message-id/CAEZqfEe9KGy=1_waGh2rgZPg0o4pqgD+iauYaj8wTze+CYJUHg@mail.gmail.com
[3] http://www.postgresql.org/message-id/55CB2D45.7040100@lab.ntt.co.jp
Attachment
On Thu, Oct 15, 2015 at 10:44 PM, Robert Haas <robertmhaas@gmail.com> wrote:
On Thu, Oct 15, 2015 at 3:04 AM, Kyotaro HORIGUCHI
<horiguchi.kyotaro@lab.ntt.co.jp> wrote:
> I confirmed that an epqtuple of foreign parameterized scan is
> correctly rejected by fdw_recheck_quals with modified outer
> tuple.
>
> I have no objection to this and have two humble comments.
>
> In file_fdw.c, the comment for the last parameter just after the
> added line seems to be better to be aligned with other comments.
I've pgindented the file. Any other space we might choose would just
be changed by the next pgindent run, so there's no point in trying to
vary.
> In subselect.c, the added break is in the added curly-braces but
> it would be better to place it after the closing brace, like the
> other cases.
Changed that, and committed.
With the latest sources having this commit, when I follow the same steps, I get

ERROR: unrecognized node type: 525

It looks like we have missed handling T_RestrictInfo. I am getting this error from expression_tree_mutator().
--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
--
Jeevan B Chalke
Principal Software Engineer, Product Development
EnterpriseDB Corporation
The Enterprise PostgreSQL Company
> >> On 2015/10/15 11:36, Kouhei Kaigai wrote: > >>> In case of scanrelid==0, expectation to ForeignScan/CustomScan is to > >>> behave as if local join exists here. It requires ForeignScan to generate > >>> joined-tuple as a result of remote join, that may contains multiple junk > >>> TLEs to carry whole-var references of base foreign tables. > >>> According to the criteria, the desirable behavior is clear as below: > >>> > >>> 1. FDW/CSP picks up base relation's tuple from the EPQ slots. > >>> It shall be setup by whole-row reference if earlier row-lock semantics, > >>> or by RefetchForeignRow if later row-lock semantics. > >>> > >>> 2. Fill up ss_ScanTupleSlot according to the xxx_scan_tlist. > >>> We may be able to provide a common support function here, because this > >>> list keeps relation between a particular attribute of the joined-tuple > >>> and its source column. > >>> > >>> 3. Apply join-clause and base-restrict that were pushed down. > >>> setrefs.c initializes expressions kept in fdw_exprs/custom_exprs to run > >>> on the ss_ScanTupleSlot. It is the easiest way to check here. > >>> > >>> 4. If joined-tuple is still visible after the step 3, FDW/CSP returns > >>> joined-tuple. Elsewhere, returns an empty slot. > >>> > >>> It is entirely compatible behavior even if local join is located on > >>> the point of ForeignScan/CustomScan with scanrelid==0. > >>> > >>> Even if remote join is parametalized by other relation, we can simply > >>> use param-info delivered from the corresponding outer scan at the step-3. > >>> EState should have the parameters already updated, FDW driver needs to > >>> care about nothing. > >>> > >>> It is quite less invasive approach towards the existing EPQ recheck > >>> mechanism. > > I wrote: > >> I see. That's an idea, but I guess that step 2 and 3 would need to add > >> a lot of code to the core. Why don't you use a local join execution > >> plan that we discussed? 
>> I think that that would make the series of
> >> processing much simpler. I'm now revising the patch that I created for
> >> that. If it's okay, I'd like to propose an updated version of the patch
> >> in a few days.
>
> On 2015/10/15 20:19, Kouhei Kaigai wrote:
> > I have to introduce why above idea is simpler and suitable for v9.5
> > timeline.
> > As I've consistently proposed for this two months, the step-2 and 3
> > are assumed to be handled in the callback routine to be kicked from
> > ForeignRecheck().
>
> Honestly, I still don't think I would see the much value in doing so.
> As Robert mentioned in [1], I think that if we're inside EPQ,
> pushed-down quals and/or pushed-down joins should be locally rechecked
> in the same way as other cases such as IndexRecheck. So, I'll propose
> the updated version of the patch.
>
You have never answered my question for two months. I have never denied executing the pushed-down qualifiers locally. It is likely the best tactic in most cases. But why do you try to force a particular manner on all the people? There are various kinds of FDW drivers. How do you guarantee it is the best solution for all of them? It is basically impossible. (Please google "Probatio diabolica")

You try to add two special-purpose fields to ForeignScan: fdw_recheck_plan and fdw_recheck_quals. This requires FDW drivers to keep the pushed-down qualifiers in a particular data format, and also requires FDW drivers to process the EPQ recheck with an alternative local plan, even if some FDW drivers can do these jobs better with their own implementation. I've repeatedly pointed out this issue, but never got a reasonable answer from you.

Again, I also admit an alternative plan may be a reasonable tactic for most FDW drivers. However, only the FDW author can "decide" that it is the best tactic to handle the task for their module, not us. I don't think it is a good interface design to force people to follow a particular implementation manner. It should be at the discretion of the extension.
Thanks, -- NEC Business Creation Division / PG-Strom Project KaiGai Kohei <kaigai@ak.jp.nec.com>
On Fri, Oct 16, 2015 at 3:10 PM, Jeevan Chalke <jeevan.chalke@enterprisedb.com> wrote:
On Thu, Oct 15, 2015 at 10:44 PM, Robert Haas <robertmhaas@gmail.com> wrote:
On Thu, Oct 15, 2015 at 3:04 AM, Kyotaro HORIGUCHI
<horiguchi.kyotaro@lab.ntt.co.jp> wrote:
> I confirmed that an epqtuple of foreign parameterized scan is
> correctly rejected by fdw_recheck_quals with modified outer
> tuple.
>
> I have no objection to this and have two humble comments.
>
> In file_fdw.c, the comment for the last parameter just after the
> added line seems to be better to be aligned with other comments.
I've pgindented the file. Any other space we might choose would just
be changed by the next pgindent run, so there's no point in trying to
vary.
> In subselect.c, the added break is in the added curly-braces but
> it would be better to place it after the closing brace, like the
> other cases.
Changed that, and committed.

With the latest sources having this commit, when I follow same steps, I get

ERROR: unrecognized node type: 525

error. It looks like, we have missed to handle T_RestrictInfo. I am getting this error from expression_tree_mutator().
Ignore this.
It was caused by a compilation issue on my system.
It is working as expected in the latest sources.
Sorry for the noise and inconvenience caused.
--
Jeevan B Chalke
Principal Software Engineer, Product Development
EnterpriseDB Corporation
The Enterprise PostgreSQL Company
I briefly browsed the patch, apart from my preference towards the approach. It has at least two oversights.

*** 48,59 **** ExecScanFetch(ScanState *node,
+ /*
+  * Execute recheck plan and get the next tuple if foreign join.
+  */
+ if (scanrelid == 0)
+ {
+     (*recheckMtd) (node, slot);
+     return slot;
+ }

Ensure the slot is empty if recheckMtd returned false, as the base relation case does.

*** 347,352 **** ExecScanReScan(ScanState *node)
{
    Index scanrelid = ((Scan *) node->ps.plan)->scanrelid;
+   if (scanrelid == 0)
+       return; /* nothing to do */
+
    Assert(scanrelid > 0);
    estate->es_epqScanDone[scanrelid - 1] = false;

Why nothing to do? Base relations managed by the ForeignScan are tracked in the fs_relids bitmap. As you introduced a few days before, if the ForeignScan has a parameterized remote join, the EPQ slot contains invalid tuples based on the old outer tuple.

Thanks,
--
NEC Business Creation Division / PG-Strom Project
KaiGai Kohei <kaigai@ak.jp.nec.com>

> -----Original Message-----
> From: Etsuro Fujita [mailto:fujita.etsuro@lab.ntt.co.jp]
> Sent: Friday, October 16, 2015 6:01 PM
> To: Robert Haas
> Cc: Kyotaro HORIGUCHI; Kaigai Kouhei(海外 浩平); pgsql-hackers@postgresql.org;
> Shigeru Hanada
> Subject: Re: [HACKERS] Foreign join pushdown vs EvalPlanQual
>
> On 2015/10/14 17:31, Etsuro Fujita wrote:
> > As KaiGai-san also pointed out before, I think we should address this in
> > each of the following cases:
> >
> > 1) remote qual (scanrelid>0)
> > 2) remote join (scanrelid==0)
>
> As for #2, I updated the patch, which uses a local join execution plan
> for an EvalPlanQual rechech, according to the comment from Robert [1].
> Attached is an updated version of the patch. This is a WIP patch, but
> it would be appreciated if I could get feedback earlier.
>
> For tests, apply the patches:
>
> foreign-recheck-for-foreign-join-1.patch
> usermapping_matching.patch [2]
> add_GetUserMappingById.patch [2]
> foreign_join_v16_efujita.patch [3]
>
> Since that as I said upthread, what I'd like to discuss is changes to
> the PG core, I didn't do anything about the postgres_fdw patches.
>
> Best regards,
> Etsuro Fujita
>
> [1]
> http://www.postgresql.org/message-id/CA+TgmoaAzs0dR23R7PTBseQfwOtuVCPNBqDHxe
> Bo9Gi+dMxj8w@mail.gmail.com
> [2]
> http://www.postgresql.org/message-id/CAEZqfEe9KGy=1_waGh2rgZPg0o4pqgD+iauYaj
> 8wTze+CYJUHg@mail.gmail.com
> [3] http://www.postgresql.org/message-id/55CB2D45.7040100@lab.ntt.co.jp
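The first oversight noted above, that the scanrelid == 0 branch must clear the slot when recheckMtd fails, can be sketched as a toy model in standalone C (an editorial illustration with invented names, not the actual executor code):

```c
#include <assert.h>
#include <stdbool.h>

/* Toy model of the corrected scanrelid == 0 branch: if the recheck
 * method rejects the joined tuple, the slot must be cleared, mirroring
 * what the scanrelid > 0 path already does. */

typedef struct ToySlot
{
    int  value;
    bool empty;
} ToySlot;

typedef bool (*ToyRecheckMtd) (ToySlot *slot);

static ToySlot *
toy_scan_fetch_epq(ToySlot *slot, ToyRecheckMtd recheckMtd)
{
    if (!recheckMtd(slot))
        slot->empty = true;     /* reject: hand back an empty slot */
    return slot;
}

/* Stand-ins for a recheck method that accepts or rejects the tuple. */
static bool toy_accept(ToySlot *slot) { (void) slot; return true; }
static bool toy_reject(ToySlot *slot) { (void) slot; return false; }
```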
On Fri, Oct 16, 2015 at 5:00 AM, Etsuro Fujita <fujita.etsuro@lab.ntt.co.jp> wrote: > As for #2, I updated the patch, which uses a local join execution plan for > an EvalPlanQual rechech, according to the comment from Robert [1]. Attached > is an updated version of the patch. This is a WIP patch, but it would be > appreciated if I could get feedback earlier. I don't see how this can be right. You're basically just pretending EPQ doesn't exist in the remote join case, which isn't going to work at all. Those bits of code that look at es_epqTuple, es_epqTupleSet, and es_epqScanDone are not optional. You can't just skip over those as if they don't matter. Again, the root of the problem here is that the EPQ machinery provides 1 slot per RTI, and it uses scanrelid to figure out which EPQ slot is applicable for a given scan node. Here, we have scanrelid == 0, so it gets confused. But it's not the case that a pushed-down join has NO scanrelid. It actually has, in effect, *multiple* scanrelids. So we should pick any one of those, say the lowest-numbered one, and use that to decide which EPQ slot to use. The attached patch shows roughly what I have in mind, although this is just crap code to demonstrate the basic idea and doesn't pretend to adjust everything that needs fixing. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Attachment
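The proxy-scanrelid idea from the draft patch, picking the lowest-numbered base relation among those joined by the ForeignScan and borrowing its EPQ slot, can be sketched like this (a standalone toy that models fs_relids as a plain bitmask rather than PostgreSQL's Bitmapset; everything except the concept is invented here):

```c
#include <assert.h>

/* Toy sketch of the proxy-scanrelid idea: among the base relations
 * joined by the ForeignScan (fs_relids, modeled here as a plain
 * bitmask), pick the lowest-numbered RTI and use its EPQ slot. */

static int
toy_proxy_scanrelid(unsigned int relids_mask)
{
    int rti;

    for (rti = 1; rti < 32; rti++)
        if (relids_mask & (1u << rti))
            return rti;         /* the lowest member wins */
    return 0;                   /* no base relations: cannot happen */
}
```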
> On Fri, Oct 16, 2015 at 5:00 AM, Etsuro Fujita
> <fujita.etsuro@lab.ntt.co.jp> wrote:
> > As for #2, I updated the patch, which uses a local join execution plan for
> > an EvalPlanQual rechech, according to the comment from Robert [1]. Attached
> > is an updated version of the patch. This is a WIP patch, but it would be
> > appreciated if I could get feedback earlier.
>
> I don't see how this can be right. You're basically just pretending
> EPQ doesn't exist in the remote join case, which isn't going to work
> at all. Those bits of code that look at es_epqTuple, es_epqTupleSet,
> and es_epqScanDone are not optional. You can't just skip over those
> as if they don't matter.
>
I think it is the right approach to pretend EPQ doesn't exist if scanrelid==0, because what is replaced by these ForeignScan/CustomScan nodes is a local join node like NestLoop. Such a node doesn't have its own EPQ slot, but constructs the joined tuple based on the underlying scan tuples originally stored within the EPQ slots.

> Again, the root of the problem here is that the EPQ machinery provides
> 1 slot per RTI, and it uses scanrelid to figure out which EPQ slot is
> applicable for a given scan node. Here, we have scanrelid == 0, so it
> gets confused. But it's not the case that a pushed-down join has NO
> scanrelid. It actually has, in effect, *multiple* scanrelids. So we
> should pick any one of those, say the lowest-numbered one, and use
> that to decide which EPQ slot to use. The attached patch shows
> roughly what I have in mind, although this is just crap code to
> demonstrate the basic idea and doesn't pretend to adjust everything
> that needs fixing.
>
One tricky point of this idea is ExecStoreTuple() in ExecScanFetch(): the EPQ slot picked up by get_proxy_scanrelid() contains a tuple of a base relation, and we then try to put that tuple in the TupleTableSlot that was initialized to hold the joined tuple. Of course, recheckMtd is called soon after, so the callback will be able to handle the request correctly. However, it is a bit unnatural to store a tuple in an incompatible TupleTableSlot.

Thanks,
--
NEC Business Creation Division / PG-Strom Project
KaiGai Kohei <kaigai@ak.jp.nec.com>
Kouhei Kaigai <kaigai@ak.jp.nec.com> writes: >> On Fri, Oct 16, 2015 at 5:00 AM, Etsuro Fujita >> <fujita.etsuro@lab.ntt.co.jp> wrote: >> I don't see how this can be right. You're basically just pretending >> EPQ doesn't exist in the remote join case, which isn't going to work >> at all. Those bits of code that look at es_epqTuple, es_epqTupleSet, >> and es_epqScanDone are not optional. You can't just skip over those >> as if they don't matter. > I think, it is right approach to pretend EPQ doesn't exist if scanrelid==0, > because what replaced by these ForeignScan/CustomScan node are local join > node like NestLoop. That's just nonsense. The reason that nestloop doesn't need its own EPQ slot is that what it will be joining is tuples provided by scan nodes, and it was the scan nodes that took care of fetching correct, updated-if-need-be tuples for the EPQ check. You can't just discard that responsibility when you're implementing a join remotely ... at least not if you want to preserve semantics similar to what happens with local tables. Or maybe I misunderstood what you meant, but it's certainly not OK to say that EPQ is a no-op for a pushed-down join. Rather, it has to perform all the same checks that would have happened for any of its constitutent tables. regards, tom lane
> Kouhei Kaigai <kaigai@ak.jp.nec.com> writes: > >> On Fri, Oct 16, 2015 at 5:00 AM, Etsuro Fujita > >> <fujita.etsuro@lab.ntt.co.jp> wrote: > >> I don't see how this can be right. You're basically just pretending > >> EPQ doesn't exist in the remote join case, which isn't going to work > >> at all. Those bits of code that look at es_epqTuple, es_epqTupleSet, > >> and es_epqScanDone are not optional. You can't just skip over those > >> as if they don't matter. > > > I think, it is right approach to pretend EPQ doesn't exist if scanrelid==0, > > because what replaced by these ForeignScan/CustomScan node are local join > > node like NestLoop. > > That's just nonsense. The reason that nestloop doesn't need its own EPQ > slot is that what it will be joining is tuples provided by scan nodes, > and it was the scan nodes that took care of fetching correct, > updated-if-need-be tuples for the EPQ check. You can't just discard that > responsibility when you're implementing a join remotely ... at least not > if you want to preserve semantics similar to what happens with local > tables. > NestLoop itself does not need its own EPQ slot, no doubt. However, entire sub-tree of NestLoop takes at least two underlying EPQ slots of the base relations to be joined. My opinion is, simply, ForeignScan/CustomScan with scanrelid==0 takes over the responsibility of EPQ recheck of entire join sub-tree that is replaced by the ForeignScan/CustomScan node. If ForeignScan run a remote join on foreign tables: A and B, it shall apply both of scan-quals and join-clause towards the tuples kept in the EPQ slots, in some fashion depending on FDW implementation. Nobody concerned about what check shall be applied here. EPQ recheck shall be applied as if entire join sub-tree exists here. Major difference between I and Fujita-san is how to recheck it. 
I think the FDW knows the best way to do it (even if we can provide utility routines for the majority of cases); however, Fujita-san says a particular implementation is the best for all the people. I cannot agree with his opinion.

> Or maybe I misunderstood what you meant, but it's certainly not OK to say
> that EPQ is a no-op for a pushed-down join. Rather, it has to perform all
> the same checks that would have happened for any of its constitutent
> tables.
>
I've never said that EPQ should be a no-op for a pushed-down join. The responsibility of the entire join sub-tree is not discarded, and not changed, even if it is replaced by a remote join.

Thanks,
--
NEC Business Creation Division / PG-Strom Project
KaiGai Kohei <kaigai@ak.jp.nec.com>
On Fri, Oct 16, 2015 at 6:12 PM, Kouhei Kaigai <kaigai@ak.jp.nec.com> wrote: > I think, it is right approach to pretend EPQ doesn't exist if scanrelid==0, > because what replaced by these ForeignScan/CustomScan node are local join > node like NestLoop. They don't have its own EPQ slot, but constructs joined- > tuple based on the underlying scan-tuple originally stored within EPQ slots. I think you've got that backwards. The fact that they don't have their own EPQ slot is the problem we need to solve. When an EPQ recheck happens, we rescan every relation in the query. Each relation needs to return 0 or 1 tuples. If it returns a tuple, the tuple it returns must be either the same tuple it previously returned, or an updated version of that tuple. But "the tuple it previously returned" does not necessarily mean the tuple it returned most recently. It means the tuple that it returned which, when passed through the rest of the plan, contributed to generate the result tuple that is being rechecked. Now, if you don't have an EPQ slot, how are you going to do this? When the EPQ machinery engages, you need to somehow get the tuple you previously returned stored someplace. And the first time thereafter that you get called by ExecProcNode, you need to return that tuple, provided that it still passes the quals. The second time you get called, and any subsequent times, you need to return an empty slot. The EPQ slot is well-suited to this task. It's got a TupleTableSlot to store the tuple you need to return, and it's got a flag indicating whether you've already returned that tuple. So you're good. But with Etsuro Fujita's patch, and I think what you have proposed has been similar, how are you going to do it? The proposal is to call the recheck method and hope for the best, but what is the recheck method going to do? Where is it going to get the previously-returned tuple? How will it know if it has already returned it during the lifetime of this EPQ check? 
Offhand, it looks to me like, at least in some circumstances, you're probably going to return whatever tuple you returned most recently (which has a good chance of being the right one, but not necessarily) over and over again. That's not going to fly. The bottom line is that a foreign scan that is a pushed-down join is still a *scan*, and every already-existing scan type has an EPQ slot *for a reason*. They *need* it in order to deliver the correct behavior. And foreign scans and custom scans need it to, and for the same reason. >> Again, the root of the problem here is that the EPQ machinery provides >> 1 slot per RTI, and it uses scanrelid to figure out which EPQ slot is >> applicable for a given scan node. Here, we have scanrelid == 0, so it >> gets confused. But it's not the case that a pushed-down join has NO >> scanrelid. It actually has, in effect, *multiple* scanrelids. So we >> should pick any one of those, say the lowest-numbered one, and use >> that to decide which EPQ slot to use. The attached patch shows >> roughly what I have in mind, although this is just crap code to >> demonstrate the basic idea and doesn't pretend to adjust everything >> that needs fixing. >> > One tricky point of this idea is ExecStoreTuple() in ExecScanFetch(), > because the EPQ slot picked up by get_proxy_scanrelid() contains > a tuple of base relation then it tries to put this tuple on the > TupleTableSlot initialized to save the joined-tuple. > Of course, recheckMtd is called soon, so callback will be able to > handle the request correctly. However, it is a bit unnatural to store > a tuple on incompatible TupleTableSlot. I think that the TupleTableSlot is incompatible because the dummy patch I posted only addresses half of the problem. I didn't do anything about the code that stores stuff into an EPQ slot. If that were also fixed, then the tuple which the patch retrieves from the slot would not be incompatible. 
-- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
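The per-relation EPQ slot contract described above, return the stored tuple at most once and only if it still passes the quals, then an empty result thereafter, can be modeled in standalone C (toy types only; the real fields are es_epqTuple, es_epqTupleSet, and es_epqScanDone):

```c
#include <assert.h>
#include <stdbool.h>

/* Toy model of the one-shot EPQ slot contract during a recheck. */

typedef struct ToyEpqState
{
    int  tuple;                 /* previously returned tuple (toy: an int) */
    bool tupleSet;              /* has a tuple been stored? */
    bool scanDone;              /* already returned during this recheck? */
} ToyEpqState;

/* Returns true and sets *out on the first call (if the qual still
 * passes); returns false, i.e. an empty slot, on every later call. */
static bool
toy_epq_fetch(ToyEpqState *epq, bool (*qual) (int), int *out)
{
    if (epq->scanDone || !epq->tupleSet)
        return false;           /* empty slot from now on */
    epq->scanDone = true;
    if (!qual(epq->tuple))
        return false;           /* stored tuple no longer passes quals */
    *out = epq->tuple;
    return true;
}

/* Stand-in qual that always passes. */
static bool toy_qual_true(int t) { (void) t; return true; }
```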
On Fri, Oct 16, 2015 at 7:48 PM, Kouhei Kaigai <kaigai@ak.jp.nec.com> wrote: > My opinion is, simply, ForeignScan/CustomScan with scanrelid==0 takes > over the responsibility of EPQ recheck of entire join sub-tree that is > replaced by the ForeignScan/CustomScan node. > If ForeignScan run a remote join on foreign tables: A and B, it shall > apply both of scan-quals and join-clause towards the tuples kept in > the EPQ slots, in some fashion depending on FDW implementation. And my opinion, as I said before, is that's completely wrong. The ForeignScan which represents a pushed-down join is a *scan*. In general, scans have one EPQ slot, and that is the right number. This pushed-down join scan, though, is in a state of confusion. The code that populates the EPQ slots thinks it's got multiple slots, one per underlying relation. Meanwhile, the code that reads data back out of those slots thinks it doesn't have any slots at all. Both of those pieces of code are wrong. This foreign scan, like any other scan, should use ONE slot. Both you and Etsuro Fujita are proposing to fix this problem by somehow making it the FDW's problem to reconstruct the tuple previously produced by the join from whole-row images of the baserels. But that's not looking back far enough: why are we asking for whole-row images of the baserels when what we really want is a whole-row image of the output of the join? The output of the join is what we need to re-return. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Robert Haas <robertmhaas@gmail.com> writes: > Both you and Etsuro Fujita are proposing to fix this problem by > somehow making it the FDW's problem to reconstruct the tuple > previously produced by the join from whole-row images of the baserels. > But that's not looking back far enough: why are we asking for > whole-row images of the baserels when what we really want is a > whole-row image of the output of the join? The output of the join is > what we need to re-return. There are multiple components to the requirement though: 1. Recheck the rows that were in the baserels and possibly fetch updated versions of them. (Once we're in EPQ, we want the most recent row versions, not what the query snapshot can see.) 2. Apply relevant restriction clauses and see if the updated row versions still pass the clauses. 3. If so, form a join row and return that. Else return NULL. One way or another, the FDW has to do all of the above, or as much of it as it can't fob off on the remote server, once it's decided to bypass local implementation of the join. Just recomputing the original join row is *not* good enough. I think what Kaigai-san and Etsuro-san are after is trying to find a way to reuse some of the existing EPQ machinery to help with that. This may not be practical, or it may end up being messier than a standalone implementation; but it's not silly on its face to want to reuse some of that code. regards, tom lane
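The three components listed above can be condensed into a toy model. Nothing here is PostgreSQL code; Row, the join clause outer.a == inner.a, and epq_recheck_join() are all invented to illustrate the refetch / requalify / rejoin sequence.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical two-column row; valid == false models "no row survived". */
typedef struct { bool valid; int a; int b; } Row;

/*
 * Given the latest versions of the two base-rel rows (component 1 has
 * already refetched them), apply the clauses (component 2) and either
 * form a join row or return an invalid one (component 3).
 */
Row epq_recheck_join(Row outer, Row inner)
{
    Row result = { false, 0, 0 };
    if (!outer.valid || !inner.valid)   /* a refetch came up empty */
        return result;
    if (outer.a != inner.a)             /* join clause no longer holds */
        return result;
    result.valid = true;                /* form the join row */
    result.a = outer.a;
    result.b = inner.b;
    return result;
}
```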
On Fri, Oct 16, 2015 at 9:51 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote: > Robert Haas <robertmhaas@gmail.com> writes: >> Both you and Etsuro Fujita are proposing to fix this problem by >> somehow making it the FDW's problem to reconstruct the tuple >> previously produced by the join from whole-row images of the baserels. >> But that's not looking back far enough: why are we asking for >> whole-row images of the baserels when what we really want is a >> whole-row image of the output of the join? The output of the join is >> what we need to re-return. > > There are multiple components to the requirement though: > > 1. Recheck the rows that were in the baserels and possibly fetch updated > versions of them. (Once we're in EPQ, we want the most recent row > versions, not what the query snapshot can see.) Check. But postgres_fdw, and probably quite a few other FDWs, use early row locking. So ROW_MARK_COPY is in use and we need not worry about refetching rows. > 2. Apply relevant restriction clauses and see if the updated row versions > still pass the clauses. Check. > 3. If so, form a join row and return that. Else return NULL. Not check. Suppose we've got two foreign tables ft1 and ft2, using postgres_fdw. There is a local table t. The user does something like UPDATE t SET ... FROM ft1, ft2 WHERE t.a = ft1.a AND ft1.b = ft2.b AND .... The query planner generates something like:

Update
-> Join
   -> Scan on t
   -> Foreign Scan on <ft1, ft2>

If an EPQ recheck occurs, the only thing that matters is that the Foreign Scan return the right output row (or possibly no rows, if the row it would have formed no longer matches the quals). It doesn't matter how it does this. Let's say the columns actually needed by the query from the ft1-ft2 join are ft1.a, ft1.b, ft2.a, and ft2.b. Currently, the output of the foreign scan is something like: ft1.a, ft1.b, ft2.a, ft2.b, ft1.*, ft2.*. 
The EPQ recheck has access to ft1.* and ft2.*, but it's not straightforward for postgres_fdw to regenerate the join tuple from that. Maybe the pushed-down join was a left join, maybe it was a right join, maybe it was a full join. So some of the columns could have gone to NULL. To figure it out, you need to build a secondary plan tree that mimics the structure of the join you pushed down, which is kinda hairy. Think how much easier your life would be if you hadn't bothered fetching ft1.* and ft2.*, which aren't so handy in this case, and had instead made the output of the foreign scan ft1.a, ft1.b, ft2.a, ft2.b, ROW(ft1.a, ft1.b, ft2.a, ft2.b) -- and that the output of that ROW() operation was stored in an EPQ slot. Now, you don't need the secondary plan tree any more. You've got all the data you need right in your hand. The values inside the ROW() constructor were evaluated after accounting for the goes-to-NULL effects of any pushed-down joins. This example is of the early row locking case, but I think the story is about the same if the FDW wants to do late row locking instead. If there's an EPQ recheck, it could issue individual row re-fetches against every base table and then re-do all the joins that it pushed down locally. But it would be faster and cleaner, I think, to send one query to the remote side that re-fetches all the rows at once, and whose target list is exactly what we need, rather than whole row targetlists for each baserel that then have to be rejiggered on our side. > I think what Kaigai-san and Etsuro-san are after is trying to find a way > to reuse some of the existing EPQ machinery to help with that. This may > not be practical, or it may end up being messier than a standalone > implementation; but it's not silly on its face to want to reuse some of > that code. Yeah, I think we're all in agreement that reusing as much of the EPQ machinery as is sensible is something we should do. 
We are not in agreement on which parts of it need to be changed or extended. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
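The one-slot discipline argued for above boils down to a small contract: keep the previously-returned tuple plus a done flag, hand the tuple back once if it still passes the quals, and return empty afterwards. A minimal sketch with hypothetical names (EpqSlot loosely mirrors the es_epqTuple / es_epqScanDone pair, with a string standing in for a TupleTableSlot; this is not the actual executor code):

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical per-scan EPQ slot: the stored tuple plus a done flag. */
typedef struct {
    const char *tuple;  /* stand-in for the TupleTableSlot contents */
    bool        done;   /* have we already returned it this recheck? */
} EpqSlot;

/*
 * During an EPQ recheck a scan returns its stored tuple at most once
 * (and only if it still passes the quals); every later call returns
 * an empty result.
 */
const char *epq_slot_next(EpqSlot *slot, bool passes_quals)
{
    if (slot->done)
        return NULL;        /* second and subsequent calls: empty */
    slot->done = true;
    return passes_quals ? slot->tuple : NULL;
}
```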
> On Fri, Oct 16, 2015 at 6:12 PM, Kouhei Kaigai <kaigai@ak.jp.nec.com> wrote: > > I think, it is right approach to pretend EPQ doesn't exist if scanrelid==0, > > because what replaced by these ForeignScan/CustomScan node are local join > > node like NestLoop. They don't have its own EPQ slot, but constructs joined-tuple > > based on the underlying scan-tuple originally stored within EPQ slots. > > I think you've got that backwards. The fact that they don't have > their own EPQ slot is the problem we need to solve. When an EPQ > recheck happens, we rescan every relation in the query. Each relation > needs to return 0 or 1 tuples. If it returns a tuple, the tuple it > returns must be either the same tuple it previously returned, or an > updated version of that tuple. But "the tuple it previously returned" > does not necessarily mean the tuple it returned most recently. It > means the tuple that it returned which, when passed through the rest > of the plan, contributed to generate the result tuple that is being > rechecked. > Yes, it is the reason why ctid or a whole-row var (if early row locking) or some rowid (if late row locking) is required to fill up the EPQ slots of base relations. I understand the tuple returned most recently is not the answer here. (E.g., in the case when ForeignScan is located under MergeJoin.) > Now, if you don't have an EPQ slot, how are you going to do this? > When the EPQ machinery engages, you need to somehow get the tuple you > previously returned stored someplace. And the first time thereafter > that you get called by ExecProcNode, you need to return that tuple, > provided that it still passes the quals. The second time you get > called, and any subsequent times, you need to return an empty slot. > The EPQ slot is well-suited to this task. It's got a TupleTableSlot > to store the tuple you need to return, and it's got a flag indicating > whether you've already returned that tuple. So you're good. 
> > But with Etsuro Fujita's patch, and I think what you have proposed has > been similar, how are you going to do it? The proposal is to call the > recheck method and hope for the best, but what is the recheck method > going to do? Where is it going to get the previously-returned tuple? > How will it know if it has already returned it during the lifetime of > this EPQ check? Offhand, it looks to me like, at least in some > circumstances, you're probably going to return whatever tuple you > returned most recently (which has a good chance of being the right > one, but not necessarily) over and over again. That's not going to > fly. > I think the job of the recheck method to "hope for the best" is as below. 1. Fetch every EPQ slot of the base relations involved in this join. In case of ForeignScan, all the required tuples of base relations should be filled because they are preliminarily fetched by a whole-row var if early row-locking, or by RefetchForeignRow if late row-locking. In case of CustomScan, it can call ExecProcNode() to generate the first tuple even if it does not exist. Anyway, I assume all the component tuples of this join can be fetched using existing EPQ slots because they are owned by base relations. 2. The recheck callback fills up ss_ScanTupleSlot according to the fdw_scan_tlist or custom_scan_tlist. The callback knows the best way to reconstruct the joined tuple from the base relations' tuples fetched in step 1. For example, if the joined tuple consists of (t1.a, t1.b, t2.x, t3.s), the callback picks up 't1.a' and 't1.b' from the tuple fetched from the EPQ slot of t1, then puts these values onto the 1st and 2nd slots. Also, it picks up 't2.x' from the tuple fetched from the EPQ slot of t2, then puts this value onto the 3rd slot. Same as above for 't3'. At this point, ss_ScanTupleSlot gets filled up with the expected fields as if the join clauses were satisfied. 3. The recheck callback also checks qualifiers of base relations that are pushed down. 
Because expression nodes kept in fdw_exprs or custom_exprs are initialized to reference ss_ScanTupleSlot at setrefs.c, it is more reasonable to run ExecQual after step 2. If one of the qualifiers of a base relation is evaluated as false, the recheck callback returns an empty slot. 4. The recheck callback also checks join-clauses to join the underlying base relations. For the same reason as in step 3, it is more reasonable to execute ExecQual after step 2. If one of the join-clauses is evaluated as false, the recheck returns an empty slot. Otherwise, it returns ss_ScanTupleSlot, and then ExecScan will process any further jobs. Even though Fujita-san's patch implements steps 2 to 4 using an alternative local plan with no other option, it stands on a similar concept. - The EPQ slot contains the tuple of the base relation that contributed to the join. - The FDW/CSP knows best how to construct the joined tuple. - The joined tuple is constructed on the fly, not kept in a particular EPQ slot. > The bottom line is that a foreign scan that is a pushed-down join is > still a *scan*, and every already-existing scan type has an EPQ slot > *for a reason*. They *need* it in order to deliver the correct > behavior. And foreign scans and custom scans need it too, and for the > same reason. > Probably, this is the reason for the mismatch in the solutions. Even though ForeignScan/CustomScan is categorized as a scan node from the standpoint of the core backend, it is expected to take responsibility for a join in addition to a scan of a base relation. This multi-roleness gives ForeignScan/CustomScan the capability and responsibility to handle multiple EPQ slots for join rechecks. Please assume the reason why an existing scan node is associated with a particular EPQ slot is that it has only one role: to scan a particular base relation. But what is the natural manner if a scan node actually has multiple roles? 
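The steps sketched in this message can be modeled standalone. All names and clauses here are invented for illustration (T1/T2 play the base relations' EPQ tuples, s.a > 0 a pushed-down base qual, s.b == s.x a join clause); the sketch only mirrors the shape of the proposed recheck callback, not any actual FDW.

```c
#include <assert.h>
#include <stdbool.h>

typedef struct { int a; int b; } T1;  /* tuple from t1's EPQ slot */
typedef struct { int x; } T2;         /* tuple from t2's EPQ slot */

/* Stand-in for ss_ScanTupleSlot; valid == false models an empty slot. */
typedef struct { bool valid; int a; int b; int x; } ScanTuple;

ScanTuple recheck_reconstruct(T1 t1, T2 t2)
{
    /* step 2: fill the scan tuple from the base relations' EPQ tuples */
    ScanTuple s = { false, t1.a, t1.b, t2.x };
    if (!(s.a > 0))       /* step 3: pushed-down base qual (example) */
        return s;
    if (!(s.b == s.x))    /* step 4: join clause (example) */
        return s;
    s.valid = true;       /* the reconstructed joined tuple survives */
    return s;
}
```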
> On Fri, Oct 16, 2015 at 7:48 PM, Kouhei Kaigai <kaigai@ak.jp.nec.com> wrote: > > My opinion is, simply, ForeignScan/CustomScan with scanrelid==0 takes > > over the responsibility of EPQ recheck of entire join sub-tree that is > > replaced by the ForeignScan/CustomScan node. > > If ForeignScan run a remote join on foreign tables: A and B, it shall > > apply both of scan-quals and join-clause towards the tuples kept in > > the EPQ slots, in some fashion depending on FDW implementation. > > And my opinion, as I said before, is that's completely wrong. The > ForeignScan which represents a pushed-down join is a *scan*. In > general, scans have one EPQ slot, and that is the right number. This > pushed-down join scan, though, is in a state of confusion. The code > that populates the EPQ slots thinks it's got multiple slots, one per > underlying relation. Meanwhile, the code that reads data back out of > those slots thinks it doesn't have any slots at all. Both of those > pieces of code are wrong. This foreign scan, like any other scan, > should use ONE slot. > > Both you and Etsuro Fujita are proposing to fix this problem by > somehow making it the FDW's problem to reconstruct the tuple > previously produced by the join from whole-row images of the baserels. > But that's not looking back far enough: why are we asking for > whole-row images of the baserels when what we really want is a > whole-row image of the output of the join? The output of the join is > what we need to re-return. > Yes, the output of the join is exactly what we need to re-return. On the other hand, the joined tuple image depends on the latest image of the base relations' tuples that construct the joined tuples. Once a part of the base relations' tuples is re-fetched and updated, it affects the contents of the joined tuple and its visibility. It means, more or less, we need to have the capability to reconstruct the joined tuple from the base relations again, in addition to the rechecks. 
Therefore, I concluded that joined-tuple reconstruction by the FDW/CSP on the fly is a reasonably implementable and less invasive approach than the others. Thanks, -- NEC Business Creation Division / PG-Strom Project KaiGai Kohei <kaigai@ak.jp.nec.com>
On 2015/10/17 12:22, Robert Haas wrote: > On Fri, Oct 16, 2015 at 9:51 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote: >> Robert Haas <robertmhaas@gmail.com> writes: >>> Both you and Etsuro Fujita are proposing to fix this problem by >>> somehow making it the FDW's problem to reconstruct the tuple >>> previously produced by the join from whole-row images of the baserels. >>> But that's not looking back far enough: why are we asking for >>> whole-row images of the baserels when what we really want is a >>> whole-row image of the output of the join? The output of the join is >>> what we need to re-return. >> There are multiple components to the requirement though: >> 3. If so, form a join row and return that. Else return NULL. > Not check. > > Suppose we've got two foreign tables ft1 and ft2, using postgres_fdw. > There is a local table t. The user does something like UPDATE t SET > ... FROM ft1, ft2 WHERE t.a = ft1.a AND ft1.b = ft2.b AND .... The > query planner generates something like: > > Update > -> Join > -> Scan on t > -> Foreign Scan on <ft1, ft2> > > If an EPQ recheck occurs, the only thing that matters is that the > Foreign Scan return the right output row (or possibly no rows, if the > row it would have formed no longer matches the quals). It doesn't > matter how it does this. Let's say the columns actually needed by the > query from the ft1-ft2 join are ft1.a, ft1.b, ft2.a, and ft2.b. > Currently, the output of the foreign scan is something like: ft1.a, > ft1.b, ft2.a, ft2.b, ft1.*, ft2.*. The EPQ recheck has access to ft1.* > and ft2.*, but it's not straightforward for postgres_fdw to regenerate > the join tuple from that. Maybe the pushed-down join was a left join, > maybe it was a right join, maybe it was a full join. So some of the > columns could have gone to NULL. To figure it out, you need to build > a secondary plan tree that mimics the structure of the join you pushed > down, which is kinda hairy. 
As Tom mentioned, just recomputing the original join tuple is not good enough. We would need to rejoin the test tuples for the baserels even if ROW_MARK_COPY is in use. Consider:

A=# BEGIN;
A=# UPDATE t SET a = a + 1 WHERE b = 1;
B=# SELECT * from t, ft1, ft2
    WHERE t.a = ft1.a AND t.b = ft2.b AND ft1.c = ft2.c FOR UPDATE;
A=# COMMIT;

where the plan for the SELECT FOR UPDATE is

LockRows
-> Nested Loop
   -> Seq Scan on t
   -> Foreign Scan on <ft1, ft2>
        Remote SQL: SELECT * FROM ft1 JOIN ft2 WHERE ft1.c = ft2.c AND ft1.a = $1 AND ft2.b = $2

If an EPQ recheck is invoked by A's UPDATE, just recomputing the original join tuple from the whole-row image that you proposed would output an incorrect result in the EPQ recheck since the value a in the updated version of a to-be-joined tuple in t would no longer match the value ft1.a extracted from the whole-row image if A's UPDATE has committed successfully. So I think we would need to rejoin the tuples populated from the whole-row images for the baserels ft1 and ft2, by executing the secondary plan with the new parameter values for a and b. As for the secondary plan, I think we could create the corresponding local join execution path during GetForeignJoinPaths, (1) by looking at the pathlist of the joinrel RelOptInfo, which would have already contained some local join execution paths, as does the patch, or (2) by calling a helper function that creates a local join execution path from given outer/inner paths selected from the pathlists of the outerrel/innerrel RelOptInfos, as proposed by KaiGai-san before. ISTM that the latter would be better, so I plan to propose such a function as part of the postgres_fdw join pushdown patch for 9.6. > This example is of the early row locking case, but I think the story > is about the same if the FDW wants to do late row locking instead. 
> If there's an EPQ recheck, it could issue individual row re-fetches > against every base table and then re-do all the joins that it pushed > down locally. But it would be faster and cleaner, I think, to send > one query to the remote side that re-fetches all the rows at once, and > whose target list is exactly what we need, rather than whole row > targetlists for each baserel that then have to be rejiggered on our > side. I agree with you on that point. (In fact, I thought that too!) But considering that many FDWs including postgres_fdw use early row locking (ie, ROW_MARK_COPY) currently, I'd like to leave that for future work. >> I think what Kaigai-san and Etsuro-san are after is trying to find a way >> to reuse some of the existing EPQ machinery to help with that. This may >> not be practical, or it may end up being messier than a standalone >> implementation; but it's not silly on its face to want to reuse some of >> that code. > Yeah, I think we're all in agreement that reusing as much of the EPQ > machinery as is sensible is something we should do. We are not in > agreement on which parts of it need to be changed or extended. Agreed. Best regards, Etsuro Fujita
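The core of the rejoin argument above, that the recheck must join the test tuples against the latest values from t rather than replay the old join output, reduces to a few lines. Types and clauses are hypothetical, matching only the shape of the example query; this is not postgres_fdw code.

```c
#include <assert.h>
#include <stdbool.h>

typedef struct { int a; int c; } Ft1;  /* test tuple for ft1 */
typedef struct { int b; int c; } Ft2;  /* test tuple for ft2 */

/*
 * Re-evaluate t.a = ft1.a AND t.b = ft2.b AND ft1.c = ft2.c with the
 * *latest* values of t.a and t.b, as the EPQ recheck must.
 */
bool rejoin_matches(int t_a, int t_b, Ft1 ft1, Ft2 ft2)
{
    return t_a == ft1.a && t_b == ft2.b && ft1.c == ft2.c;
}
```

With the original t (a = 1) the join holds; after the concurrent UPDATE t SET a = a + 1 it must not, which is exactly the case that replaying the old join output would get wrong.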
> On Fri, Oct 16, 2015 at 9:51 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote: > > Robert Haas <robertmhaas@gmail.com> writes: > >> Both you and Etsuro Fujita are proposing to fix this problem by > >> somehow making it the FDW's problem to reconstruct the tuple > >> previously produced by the join from whole-row images of the baserels. > >> But that's not looking back far enough: why are we asking for > >> whole-row images of the baserels when what we really want is a > >> whole-row image of the output of the join? The output of the join is > >> what we need to re-return. > > > > There are multiple components to the requirement though: > > > > 1. Recheck the rows that were in the baserels and possibly fetch updated > > versions of them. (Once we're in EPQ, we want the most recent row > > versions, not what the query snapshot can see.) > > Check. But postgres_fdw, and probably quite a few other FDWs, use > early row locking. So ROW_MARK_COPY is in use and we need not worry > about refetching rows. > > > 2. Apply relevant restriction clauses and see if the updated row versions > > still pass the clauses. > > Check. > > > 3. If so, form a join row and return that. Else return NULL. > > Not check. > > Suppose we've got two foreign tables ft1 and ft2, using postgres_fdw. > There is a local table t. The user does something like UPDATE t SET > ... FROM ft1, ft2 WHERE t.a = ft1.a AND ft1.b = ft2.b AND .... The > query planner generates something like: > > Update > -> Join > -> Scan on t > -> Foreign Scan on <ft1, ft2> > > If an EPQ recheck occurs, the only thing that matters is that the > Foreign Scan return the right output row (or possibly no rows, if the > row it would have formed no longer matches the quals). It doesn't > matter how it does this. Let's say the columns actually needed by the > query from the ft1-ft2 join are ft1.a, ft1.b, ft2.a, and ft2.b. > Currently, the output of the foreign scan is something like: ft1.a, > ft1.b, ft2.a, ft2.b, ft1.*, ft2.*. 
> The EPQ recheck has access to ft1.* > and ft2.*, but it's not straightforward for postgres_fdw to regenerate > the join tuple from that. Maybe the pushed-down join was a left join, > maybe it was a right join, maybe it was a full join. So some of the > columns could have gone to NULL. To figure it out, you need to build > a secondary plan tree that mimics the structure of the join you pushed > down, which is kinda hairy. > In case of an outer join, do we need to care about the join-clause, unlike scan qualifiers? Rows filled up by NULLs appear when there is no matched tuple on the other side. It means any rows in the relation of the non-nullable side are visible regardless of the join-clause, even though they may or may not be matched with the latest rows refetched based on the latest values. Example)

remote table: ft1
 id | val
----+-------
  1 | 'aaa'
  2 | 'bbb'
  3 | 'ccc'

remote table: ft2
 id | val
----+-------
  2 | 'xxx'
  3 | 'yyy'
  4 | 'zzz'

If the remote join query is: SELECT *, ft1.*, ft2.* FROM ft1 LEFT JOIN ft2 ON ft1.id = ft2.id WHERE ft1.id < 3; its expected result is:

 ft1.id | ft1.val | ft2.id | ft2.val |  ft1.*  |  ft2.*  |
--------+---------+--------+---------+---------+---------+
      1 | 'aaa'   |   null | null    |(1,'aaa')|  null   |
      2 | 'bbb'   |      2 | 'xxx'   |(2,'bbb')|(2,'xxx')|

The non-NULLs side (ft1 in this case) is visible regardless of the join-clause, as long as tuples in ft1 satisfy the scan-qualifier (ft1.id < 3). The FDW/CSP knows the type of join it is responsible for, so it can skip evaluation of join-clauses and apply only scan-qualifiers on the base relation's tuples. > Think how much easier your life would be if you hadn't bothered > fetching ft1.* and ft2.*, which aren't so handy in this case, and had > instead made the output of the foreign scan ft1.a, ft1.b, ft2.a, > ft2.b, ROW(ft1.a, ft1.b, ft2.a, ft2.b) -- and that the output of that > ROW() operation was stored in an EPQ slot. Now, you don't need the > secondary plan tree any more. You've got all the data you need right > in your hand. 
> The values inside the ROW() constructor were evaluated > after accounting for the goes-to-NULL effects of any pushed-down > joins. > > This example is of the early row locking case, but I think the story > is about the same if the FDW wants to do late row locking instead. If > there's an EPQ recheck, it could issue individual row re-fetches > against every base table and then re-do all the joins that it pushed > down locally. But it would be faster and cleaner, I think, to send > one query to the remote side that re-fetches all the rows at once, and > whose target list is exactly what we need, rather than whole row > targetlists for each baserel that then have to be rejiggered on our > side. > Which approach is more reasonable? In case of early row locking, the FDW ensures all the rows involved in the join are protected from concurrent accesses. So, there is no need to worry about refetching from the remote side. On the other hand, in case of late row locking, we need to pay attention to whether a part of (or all) the base relations are updated by concurrent accesses. In this case, the joined tuple is no longer valid, so we may need to fetch the joined tuple from the remote side during rechecking. Probably, the relevant rowid (the ctid system column in postgres_fdw) enables identifying the tuples to be fetched from the remote side effectively, so it shall not be a heavy query; however, it needs to run a remote query once. If we reconstruct a joined tuple from the base relations kept in EPQ slots, it needs an additional reconstruction cost in the early row locking case (disadvantage); however, there is no need to run the remote join again in the late row locking situation because the base relations' tuples are already fetched by the infrastructure (advantage). The local reconstruction approach also has an advantage: it does not need to enhance the existing EPQ slot mechanism so much. All this approach needs is that the EPQ slot holds the tuple of the base relation. Please correct me if I misunderstand your proposition. 
> > I think what Kaigai-san and Etsuro-san are after is trying to find a way > > to reuse some of the existing EPQ machinery to help with that. This may > > not be practical, or it may end up being messier than a standalone > > implementation; but it's not silly on its face to want to reuse some of > > that code. > > Yeah, I think we're all in agreement that reusing as much of the EPQ > machinery as is sensible is something we should do. We are not in > agreement on which parts of it need to be changed or extended. > Yes. I'd also like to reuse existing EPQ infrastructure as long as we can. Thanks, -- NEC Business Creation Division / PG-Strom Project KaiGai Kohei <kaigai@ak.jp.nec.com>
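The outer-join point in this exchange, that the non-nullable side stays visible whenever its scan qualifier holds and a failed join clause only NULLs the other side, can be modeled in a few lines. The types are hypothetical; the qual ft1.id < 3 and clause ft1.id = ft2.id come from the example upthread.

```c
#include <assert.h>
#include <stdbool.h>

typedef struct { bool has_row; int id; } Side;
typedef struct { bool visible; int ft1_id; bool ft2_is_null; } JoinRow;

/*
 * Recheck one candidate row of "ft1 LEFT JOIN ft2 ON ft1.id = ft2.id
 * WHERE ft1.id < 3": visibility depends only on ft1's scan qual.
 */
JoinRow left_join_recheck(Side ft1, Side ft2)
{
    JoinRow r = { false, 0, true };
    if (!ft1.has_row || !(ft1.id < 3))   /* scan qual on the outer side */
        return r;
    r.visible = true;                    /* ft1 row survives regardless */
    r.ft1_id = ft1.id;
    if (ft2.has_row && ft2.id == ft1.id) /* join clause */
        r.ft2_is_null = false;           /* matched: ft2 columns filled */
    return r;
}
```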
On 2015/10/17 9:58, Robert Haas wrote: > But with Etsuro Fujita's patch, and I think what you have proposed has > been similar, how are you going to do it? The proposal is to call the > recheck method and hope for the best, but what is the recheck method > going to do? Where is it going to get the previously-returned tuple? As I explained in a previous email, just returning the previously-returned tuple is not good enough. > How will it know if it has already returned it during the lifetime of > this EPQ check? Offhand, it looks to me like, at least in some > circumstances, you're probably going to return whatever tuple you > returned most recently (which has a good chance of being the right > one, but not necessarily) over and over again. That's not going to > fly. No. Since the local join execution plan is created so that the scan slot for each foreign table involved in the pushed-down join looks at its EPQ slot, I think the plan can return at most one tuple. Best regards, Etsuro Fujita
On 2015/10/16 19:03, Kouhei Kaigai wrote: > *** 48,59 **** ExecScanFetch(ScanState *node, > + /* > + * Execute recheck plan and get the next tuple if foreign join. > + */ > + if (scanrelid == 0) > + { > + (*recheckMtd) (node, slot); > + return slot; > + } > > Ensure the slot is empty if recheckMtd returned false, as the base relation > case does. Fixed. > *** 347,352 **** ExecScanReScan(ScanState *node) > { > Index scanrelid = ((Scan *) node->ps.plan)->scanrelid; > > + if (scanrelid == 0) > + return; /* nothing to do */ > + > Assert(scanrelid > 0); > > estate->es_epqScanDone[scanrelid - 1] = false; > > Why nothing to do? > Base relations managed by ForeignScan are tracked in the fs_relids bitmap. I think the estate->es_epqScanDone flag should be initialized when we do ExecScanReScan for each of the component ForeignScanState nodes in the local join execution plan state tree. > As you introduced a few days before, if ForeignScan has a parameterized > remote join, EPQ slot contains invalid tuples based on old outer tuple. Maybe my explanation was not enough, but I haven't said such a thing. The problem in that case is that just returning the previously-returned foreign-join tuple would produce an incorrect result if an outer tuple to be joined has changed due to a concurrent transaction, as explained upthread. (I think that the EPQ slots would contain valid tuples.) Attached is an updated version of the patch. Other changes: * remove unnecessary memory-context handling for the foreign-join case in ForeignRecheck * revise code a bit and add a bit more comments Thanks for the comments! Best regards, Etsuro Fujita
I wrote: >> As Robert mentioned in [1], I think that if we're inside EPQ, >> pushed-down quals and/or pushed-down joins should be locally rechecked >> in the same way as other cases such as IndexRecheck. So, I'll propose >> the updated version of the patch. On 2015/10/16 18:48, Kouhei Kaigai wrote: > You have never answered my question for two months. > > I never deny to execute the pushed-down qualifiers locally. > It is likely the best tactics in most cases. > But, why you try to enforce all the people a particular manner? > > Here are various kind of FDW drivers. How do you guarantee it is > the best solution for all the people? It is basically impossible. > (Please google "Probatio diabolica") > > You try to add two special purpose fields in ForeignScan; > fdw_recheck_plan and fdw_recheck_quals. > It requires FDW drivers to have pushed-down qualifier in a particular > data format, and also requires FDW drivers to process EPQ recheck by > alternative local plan, even if a part of FDW drivers can process > these jobs by its own implementation better. > > I've repeatedly pointed out this issue, but never get reasonable > answer from you. > > Again, I also admit alternative plan may be reasonable tactics for > most of FDW drivers. However, only FDW author can "decide" it is > the best tactics to handle the task for their module, not us. > > I don't think it is a good interface design to enforce people to > follow a particular implementation manner. It should be discretion > of the extension. I think that if you think so, you should give at least one concrete example for that. Ideally accompanied by a demo of how that works well. Best regards, Etsuro Fujita
> -----Original Message----- > From: Etsuro Fujita [mailto:fujita.etsuro@lab.ntt.co.jp] > Sent: Monday, October 19, 2015 8:52 PM > To: Kaigai Kouhei(海外 浩平); Kyotaro HORIGUCHI > Cc: pgsql-hackers@postgresql.org; shigeru.hanada@gmail.com; > robertmhaas@gmail.com > Subject: Re: [HACKERS] Foreign join pushdown vs EvalPlanQual > > I wrote: > >> As Robert mentioned in [1], I think that if we're inside EPQ, > >> pushed-down quals and/or pushed-down joins should be locally rechecked > >> in the same way as other cases such as IndexRecheck. So, I'll propose > >> the updated version of the patch. > > On 2015/10/16 18:48, Kouhei Kaigai wrote: > > You have never answered my question for two months. > > > > I never deny to execute the pushed-down qualifiers locally. > > It is likely the best tactics in most cases. > > But, why you try to enforce all the people a particular manner? > > > > Here are various kind of FDW drivers. How do you guarantee it is > > the best solution for all the people? It is basically impossible. > > (Please google "Probatio diabolica") > > > > You try to add two special purpose fields in ForeignScan; > > fdw_recheck_plan and fdw_recheck_quals. > > It requires FDW drivers to have pushed-down qualifier in a particular > > data format, and also requires FDW drivers to process EPQ recheck by > > alternative local plan, even if a part of FDW drivers can process > > these jobs by its own implementation better. > > > > I've repeatedly pointed out this issue, but never get reasonable > > answer from you. > > > > Again, I also admit alternative plan may be reasonable tactics for > > most of FDW drivers. However, only FDW author can "decide" it is > > the best tactics to handle the task for their module, not us. > > > > I don't think it is a good interface design to enforce people to > > follow a particular implementation manner. It should be discretion > > of the extension. 
> > I think that if you think so, you should give at least one concrete > example for that. Ideally accompanied by a demo of how that works well. > I previously showed an example situation: http://www.postgresql.org/message-id/9A28C8860F777E439AA12E8AEA7694F801138B6F@BPXM15GP.gisp.nec.co.jp Then, your response was below: | Thanks for the answer, but I'm not still convinced. | I think the EPQ testing shown in that use-case would probably not | efficient, compared to the core's. What I'm repeatedly talking about is flexibility of the interface, not efficiency. If the core backend provides a good enough EPQ recheck routine, an extension can call it, but that is a decision by its author. Why do you want to prohibit an extension from choosing its own implementation? Also, I introduced the case of PG-Strom in the face-to-face meeting with you. PG-Strom has its own CPU-fallback routine to rescue GPU errors; thus, I prefer to reuse this routine for EPQ rechecks, rather than adding alternative local plan support here. Thanks, -- NEC Business Creation Division / PG-Strom Project KaiGai Kohei <kaigai@ak.jp.nec.com>
On Mon, Oct 19, 2015 at 3:45 AM, Etsuro Fujita <fujita.etsuro@lab.ntt.co.jp> wrote:
> As Tom mentioned, just recomputing the original join tuple is not good
> enough. We would need to rejoin the test tuples for the baserels even if
> ROW_MARK_COPY is in use. Consider:
>
> A=# BEGIN;
> A=# UPDATE t SET a = a + 1 WHERE b = 1;
> B=# SELECT * from t, ft1, ft2
>      WHERE t.a = ft1.a AND t.b = ft2.b AND ft1.c = ft2.c FOR UPDATE;
> A=# COMMIT;
>
> where the plan for the SELECT FOR UPDATE is
>
> LockRows
>   -> Nested Loop
>        -> Seq Scan on t
>        -> Foreign Scan on <ft1, ft2>
>             Remote SQL: SELECT * FROM ft1 JOIN ft2 WHERE ft1.c = ft2.c
>             AND ft1.a = $1 AND ft2.b = $2
>
> If an EPQ recheck is invoked by the A's UPDATE, just recomputing the
> original join tuple from the whole-row image that you proposed would
> output an incorrect result in the EPQ recheck, since the value a in the
> updated version of a to-be-joined tuple in t would no longer match the
> value ft1.a extracted from the whole-row image if the A's UPDATE has
> committed successfully. So I think we would need to rejoin the tuples
> populated from the whole-row images for the baserels ft1 and ft2, by
> executing the secondary plan with the new parameter values for a and b.

No. You just need to populate fdw_recheck_quals correctly, same as
for the scan case.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
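For what it's worth, Robert's point can be sketched as a toy model (plain Python, not PostgreSQL source; `epq_recheck`, the dicts, and the qual lambdas are all illustrative stand-ins for the executor's recheck, the stored whole-row images, and fdw_recheck_quals): the quals that were pushed to the remote side are simply re-evaluated locally against the reconstructed join tuple and the *new* version of the local row, so the stale match is caught.

```python
# Toy model of an EPQ recheck using fdw_recheck_quals (illustrative
# names only, not PostgreSQL code).  The joined tuple is rebuilt from
# the previously fetched whole-row images, then every pushed-down qual
# is re-evaluated against it and the new local test tuple.

def epq_recheck(join_image, local_tuple, fdw_recheck_quals):
    """Return the joined tuple if every pushed-down qual still holds."""
    for qual in fdw_recheck_quals:
        if not qual(join_image, local_tuple):
            return None          # recheck fails -> the row is skipped
    return join_image

# Whole-row images fetched before the concurrent UPDATE committed.
join_image = {"ft1.a": 1, "ft2.b": 1, "ft1.c": 7, "ft2.c": 7}

# Pushed-down quals: ft1.c = ft2.c AND ft1.a = t.a AND ft2.b = t.b
fdw_recheck_quals = [
    lambda j, t: j["ft1.c"] == j["ft2.c"],
    lambda j, t: j["ft1.a"] == t["a"],
    lambda j, t: j["ft2.b"] == t["b"],
]

# Before the concurrent UPDATE: the recheck succeeds.
assert epq_recheck(join_image, {"a": 1, "b": 1}, fdw_recheck_quals) is not None
# After "UPDATE t SET a = a + 1 WHERE b = 1": ft1.a no longer matches
# the updated t.a, so the recheck correctly rejects the join row.
assert epq_recheck(join_image, {"a": 2, "b": 1}, fdw_recheck_quals) is None
```

This is exactly the failure Fujita's example describes, handled without rejoining the base relations.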
On Mon, Oct 19, 2015 at 12:17 AM, Kouhei Kaigai <kaigai@ak.jp.nec.com> wrote:
> 1. Fetch every EPQ slot of the base relations involved in this join.
>    In case of ForeignScan, all the required tuples of the base relations
>    should be filled, because they are preliminarily fetched by a whole-row
>    var in case of early row-locking, or by RefetchForeignRow in case of
>    late row-locking.
>    In case of CustomScan, it can call ExecProcNode() to generate the
>    first tuple even if it does not exist.
>    Anyway, I assume all the component tuples of this join can be fetched
>    using the existing EPQ slots because they are owned by base relations.
>
> 2. The recheck callback fills up ss_ScanTupleSlot according to the
>    fdw_scan_tlist or custom_scan_tlist. The callback knows the best way
>    to reconstruct the joined tuple from the base relations' tuples
>    fetched in step 1.
>    For example, if the joined tuple consists of (t1.a, t1.b, t2.x, t3.s),
>    the callback picks up 't1.a' and 't1.b' from the tuple fetched from
>    the EPQ slot of t1, then puts these values in the 1st and 2nd slots.
>    Also, it picks up 't2.x' from the tuple fetched from the EPQ slot of
>    t2, then puts this value in the 3rd slot. Same as above for 't3'.
>    At this point, ss_ScanTupleSlot is filled up with the expected fields
>    as if the join clauses were satisfied.
>
> 3. The recheck callback also checks qualifiers of base relations that
>    are pushed down. Because expression nodes kept in fdw_exprs or
>    custom_exprs are initialized to reference ss_ScanTupleSlot at
>    setrefs.c, it is more reasonable to run ExecQual after step 2.
>    If one of the qualifiers of a base relation is evaluated as false,
>    the recheck callback returns an empty slot.
>
> 4. The recheck callback also checks the join clauses joining the
>    underlying base relations. For the same reason as in step 3, it is
>    more reasonable to execute ExecQual after step 2.
>    If one of the join clauses is evaluated as false, the recheck returns
>    an empty slot.
> Otherwise, it returns ss_ScanTupleSlot, then ExecScan will process
> any further jobs.

Hmm, I guess this would work. But it still feels unnatural to me. It feels like we haven't really pushed down the join. It's pushed down except when there's an EPQ check, and then it's not. So we need a whole alternate plan tree. With my proposal, we don't need that.

There is also some possible loss of efficiency with this approach. Suppose that we have two tables ft1 and ft2 which are being joined, and we push down the join. They are being joined on an integer column, and the join needs to select several other columns as well. However, ft1 and ft2 are very wide tables that also contain some text columns. The query is like this:

SELECT localtab.a, ft1.p, ft2.p FROM localtab LEFT JOIN (ft1 JOIN ft2
ON ft1.x = ft2.x AND ft1.huge ~ 'stuff' AND ft2.huge2 ~ 'nonsense') ON
localtab.q = ft1.q;

If we refetch each row individually, we will need a wholerow image of ft1 and ft2 that includes all columns, or at least ft1.huge and ft2.huge2. If we just fetch a wholerow image of the join output, we can exclude those. The only thing we need to recheck is that it's still the case that localtab.q = ft1.q (because the value of localtab.q might have changed).

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
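The four-step recheck KaiGai outlines above can be simulated in a few lines (a toy sketch in Python, not executor code; `recheck_callback`, `epq_slots`, and the qual lambdas are hypothetical stand-ins for the C callback, es_epqTuple, and ExecQual): fetch the base-relation EPQ tuples, reconstruct the scan tuple per fdw_scan_tlist, then apply the pushed-down base quals and join clauses.

```python
# Toy simulation of the proposed recheck callback (all names are
# illustrative, not the real PostgreSQL API).

def recheck_callback(epq_slots, scan_tlist, base_quals, join_clauses):
    # Steps 1+2: build ss_ScanTupleSlot from the base relations' EPQ
    # tuples, following the (relation, column) entries of fdw_scan_tlist.
    scan_tuple = {col: epq_slots[rel][col] for rel, col in scan_tlist}
    # Step 3: pushed-down single-relation quals.
    if not all(q(scan_tuple) for q in base_quals):
        return None                      # empty slot
    # Step 4: pushed-down join clauses.
    if not all(j(scan_tuple) for j in join_clauses):
        return None                      # empty slot
    return scan_tuple                    # ExecScan continues with this

epq_slots = {"t1": {"a": 1, "b": 2}, "t2": {"x": 1, "s": "k"}}
scan_tlist = [("t1", "a"), ("t1", "b"), ("t2", "x"), ("t2", "s")]

tup = recheck_callback(epq_slots, scan_tlist,
                       base_quals=[lambda s: s["b"] > 0],
                       join_clauses=[lambda s: s["a"] == s["x"]])
assert tup == {"a": 1, "b": 2, "x": 1, "s": "k"}
```

A failing join clause would make the callback return the empty slot instead, which is the case Robert's alternate-plan objection is about.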
> On Mon, Oct 19, 2015 at 12:17 AM, Kouhei Kaigai <kaigai@ak.jp.nec.com> wrote: > > 1. Fetch every EPQ slot of base relations involved in this join. > > In case of ForeignScan, all the required tuples of base relations > > should be filled because it is preliminary fetched by whole-row var > > if earlier row-locking, or by RefetchForeignRow if later row-locking. > > In case of CustomScan, it can call ExecProcNode() to generate the > > first tuple even if it does not exists. > > Anyway, I assume all the component tuples of this join can be fetched > > using existing EPQ slot because they are owned by base relations. > > > > 2. The recheck callback fills up ss_ScanTupleSlot according to the > > fdw_scan_tlist or custom_scan_tlist. The callback knows the best way > > to reconstruct the joined tuple from the base relations' tuple fetched > > on the step-1. > > For example, if joined tuple is consists of (t1.a, t1.b, t2.x, t3.s), > > the callback picks up 't1.a' and 't1.b' from the tuple fetched from > > the EPQ slot of t1, then put these values onto the 1st and 2nd slot. > > Also, it picks up 't2.x' from the tuple fetched from the EPQ slot of > > t2, then put this value onto the 3rd slot. Same as above for 't3'. > > At this point, ss_ScanTupleSlot gets filled up by the expected fields > > as if join clauses are satisfied. > > > > 3. The recheck callback also checks qualifiers of base relations that > > are pushed down. Because expression nodes kept in fds_exprs or > > custom_exprs are initialized to reference ss_ScanTupleSlot at setrefs.c, > > it is more reasonable to run ExecQual after the step-2. > > If one of the qualifiers of base relation was evaluated as false, > > the recheck callback returns an empty slot. > > > > 4. The recheck callback also checks join-clauses to join underlying > > base relations. Due to same reason at step-3, it is more reasonable > > to execute ExecQual after the step-2. 
> > If one of the join-clauses was evaluated as false, the recheck returns
> > an empty slot.
> > Otherwise, it returns ss_ScanTupleSlot, then ExecScan will process
> > any further jobs.
>
> Hmm, I guess this would work. But it still feels unnatural to me. It
> feels like we haven't really pushed down the join. It's pushed down
> except when there's an EPQ check, and then it's not. So we need a
> whole alternate plan tree. With my proposal, we don't need that.

Even if we fetch the whole rows of both sides, join pushdown still works well, because we can receive fewer rows than a local join on top of two foreign scans would produce. (If the planner works well, we can expect join paths that increase the number of rows to be dropped.)

One downside of my proposition is the growth in width of individual rows; it is a trade-off. The above approach requires no changes to the existing EPQ infrastructure, so its implementation design is clear. On the other hand, your approach will reduce traffic over the network; however, it is still unclear how we integrate scanrelid == 0 with the EPQ infrastructure.

On the other hand, in the case of a custom scan that has underlying local scan nodes, any kind of ROW_MARK_* other than ROW_MARK_COPY can happen. I think the width of the joined tuples is a relatively minor issue compared to the FDW cases. However, we cannot expect the fetched rows to be protected by an early row-locking mechanism, so re-fetching rows and reconstructing the joined tuple have relatively higher priority.

> There is also some possible loss of efficiency with this approach.
> Suppose that we have two tables ft1 and ft2 which are being joined,
> and we push down the join. They are being joined on an integer
> column, and the join needs to select several other columns as well.
> However, ft1 and ft2 are very wide tables that also contain some text
> columns.
> The query is like this:
>
> SELECT localtab.a, ft1.p, ft2.p FROM localtab LEFT JOIN (ft1 JOIN ft2
> ON ft1.x = ft2.x AND ft1.huge ~ 'stuff' AND ft2.huge2 ~ 'nonsense') ON
> localtab.q = ft1.q;
>
> If we refetch each row individually, we will need a wholerow image of
> ft1 and ft2 that includes all columns, or at least ft1.huge and
> ft2.huge2. If we just fetch a wholerow image of the join output, we
> can exclude those. The only thing we need to recheck is that it's
> still the case that localtab.q = ft1.q (because the value of
> localtab.q might have changed).

Isn't it possible to distinguish the whole-row references required by the locking mechanism from the ones required by the user? (Does resjunk=true give us a hint?) In the case where a whole-row reference is required internally by the system, it seems to me harmless to put dummy NULLs in the unreferenced columns. Is that a feasible idea?

Thanks,
--
NEC Business Creation Division / PG-Strom Project
KaiGai Kohei <kaigai@ak.jp.nec.com>
On 2015/10/20 5:34, Robert Haas wrote: > On Mon, Oct 19, 2015 at 3:45 AM, Etsuro Fujita > <fujita.etsuro@lab.ntt.co.jp> wrote: >> As Tom mentioned, just recomputing the original join tuple is not good >> enough. We would need to rejoin the test tuples for the baserels even if >> ROW_MARK_COPY is in use. Consider: >> >> A=# BEGIN; >> A=# UPDATE t SET a = a + 1 WHERE b = 1; >> B=# SELECT * from t, ft1, ft2 >> WHERE t.a = ft1.a AND t.b = ft2.b AND ft1.c = ft2.c FOR UPDATE; >> A=# COMMIT; >> >> where the plan for the SELECT FOR UPDATE is >> >> LockRows >> -> Nested Loop >> -> Seq Scan on t >> -> Foreign Scan on <ft1, ft2> >> Remote SQL: SELECT * FROM ft1 JOIN ft2 WHERE ft1.c = ft2.c AND ft1.a >> = $1 AND ft2.b = $2 >> >> If an EPQ recheck is invoked by the A's UPDATE, just recomputing the >> original join tuple from the whole-row image that you proposed would output >> an incorrect result in the EQP recheck since the value a in the updated >> version of a to-be-joined tuple in t would no longer match the value ft1.a >> extracted from the whole-row image if the A's UPDATE has committed >> successfully. So I think we would need to rejoin the tuples populated from >> the whole-row images for the baserels ft1 and ft2, by executing the >> secondary plan with the new parameter values for a and b. > No. You just need to populate fdw_recheck_quals correctly, same as > for the scan case. Yeah, I think we can probably do that for the case where a pushed-down join clause is an inner-join one, but I'm not sure that we can do that for the case where that clause is an outer-join one. Maybe I'm missing something, though. Best regards, Etsuro Fujita
> -----Original Message----- > From: Etsuro Fujita [mailto:fujita.etsuro@lab.ntt.co.jp] > Sent: Tuesday, October 20, 2015 1:11 PM > To: Robert Haas > Cc: Tom Lane; Kaigai Kouhei(海外 浩平); Kyotaro HORIGUCHI; > pgsql-hackers@postgresql.org; Shigeru Hanada > Subject: Re: [HACKERS] Foreign join pushdown vs EvalPlanQual > > On 2015/10/20 5:34, Robert Haas wrote: > > On Mon, Oct 19, 2015 at 3:45 AM, Etsuro Fujita > > <fujita.etsuro@lab.ntt.co.jp> wrote: > >> As Tom mentioned, just recomputing the original join tuple is not good > >> enough. We would need to rejoin the test tuples for the baserels even if > >> ROW_MARK_COPY is in use. Consider: > >> > >> A=# BEGIN; > >> A=# UPDATE t SET a = a + 1 WHERE b = 1; > >> B=# SELECT * from t, ft1, ft2 > >> WHERE t.a = ft1.a AND t.b = ft2.b AND ft1.c = ft2.c FOR UPDATE; > >> A=# COMMIT; > >> > >> where the plan for the SELECT FOR UPDATE is > >> > >> LockRows > >> -> Nested Loop > >> -> Seq Scan on t > >> -> Foreign Scan on <ft1, ft2> > >> Remote SQL: SELECT * FROM ft1 JOIN ft2 WHERE ft1.c = ft2.c AND ft1.a > >> = $1 AND ft2.b = $2 > >> > >> If an EPQ recheck is invoked by the A's UPDATE, just recomputing the > >> original join tuple from the whole-row image that you proposed would output > >> an incorrect result in the EQP recheck since the value a in the updated > >> version of a to-be-joined tuple in t would no longer match the value ft1.a > >> extracted from the whole-row image if the A's UPDATE has committed > >> successfully. So I think we would need to rejoin the tuples populated from > >> the whole-row images for the baserels ft1 and ft2, by executing the > >> secondary plan with the new parameter values for a and b. > > > No. You just need to populate fdw_recheck_quals correctly, same as > > for the scan case. > > Yeah, I think we can probably do that for the case where a pushed-down > join clause is an inner-join one, but I'm not sure that we can do that > for the case where that clause is an outer-join one. 
Maybe I'm missing > something, though. > Please check my message yesterday. The non-nullable side of outer-join is always visible regardless of the join-clause pushed down, as long as it satisfies the scan-quals pushed-down. Thanks, -- NEC Business Creation Division / PG-Strom Project KaiGai Kohei <kaigai@ak.jp.nec.com>
>>> On Mon, Oct 19, 2015 at 3:45 AM, Etsuro Fujita >>> <fujita.etsuro@lab.ntt.co.jp> wrote: >>>> As Tom mentioned, just recomputing the original join tuple is not good >>>> enough. We would need to rejoin the test tuples for the baserels even if >>>> ROW_MARK_COPY is in use. Consider: >>>> >>>> A=# BEGIN; >>>> A=# UPDATE t SET a = a + 1 WHERE b = 1; >>>> B=# SELECT * from t, ft1, ft2 >>>> WHERE t.a = ft1.a AND t.b = ft2.b AND ft1.c = ft2.c FOR UPDATE; >>>> A=# COMMIT; >>>> >>>> where the plan for the SELECT FOR UPDATE is >>>> >>>> LockRows >>>> -> Nested Loop >>>> -> Seq Scan on t >>>> -> Foreign Scan on <ft1, ft2> >>>> Remote SQL: SELECT * FROM ft1 JOIN ft2 WHERE ft1.c = ft2.c AND ft1.a >>>> = $1 AND ft2.b = $2 >>>> >>>> If an EPQ recheck is invoked by the A's UPDATE, just recomputing the >>>> original join tuple from the whole-row image that you proposed would output >>>> an incorrect result in the EQP recheck since the value a in the updated >>>> version of a to-be-joined tuple in t would no longer match the value ft1.a >>>> extracted from the whole-row image if the A's UPDATE has committed >>>> successfully. So I think we would need to rejoin the tuples populated from >>>> the whole-row images for the baserels ft1 and ft2, by executing the >>>> secondary plan with the new parameter values for a and b. Robert Haas wrote: >>> No. You just need to populate fdw_recheck_quals correctly, same as >>> for the scan case. I wrote: >> Yeah, I think we can probably do that for the case where a pushed-down >> join clause is an inner-join one, but I'm not sure that we can do that >> for the case where that clause is an outer-join one. Maybe I'm missing >> something, though. On 2015/10/20 15:42, Kouhei Kaigai wrote: > Please check my message yesterday. The non-nullable side of outer-join is > always visible regardless of the join-clause pushed down, as long as it > satisfies the scan-quals pushed-down. Sorry, my explanation was not correct. (Needed to take in caffeine.) 
What I'm concerned about is the following: SELECT * FROM localtab JOIN (ft1 LEFT JOIN ft2 ON ft1.x = ft2.x) ON localtab.id = ft1.id FOR UPDATE OF ft1 LockRows -> Nested Loop Join Filter: (localtab.id = ft1.id) -> Seq Scan on localtab -> Foreign Scan on <ft1, ft2> Remote SQL: SELECT * FROM ft1 LEFT JOIN ft2 WHERE ft1.x = ft2.x FOR UPDATE OF ft1 Assume that ft1 performs late row locking. If an EPQ recheck was invoked due to a concurrent transaction on the remote server that changed only the value x of the ft1 tuple previously retrieved, then we would have to generate a fake ft1/ft2-join tuple with nulls for ft2. (Assume that the ft2 tuple previously retrieved was not a null tuple.) However, I'm not sure how we can do that in ForeignRecheck; we can't know for example, which one is outer and which one is inner, without an alternative local join execution plan. Maybe I'm missing something, though. Best regards, Etsuro Fujita
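The null-extension Fujita describes amounts to re-evaluating the outer join locally for one pair of test tuples; whatever mechanism performs the recheck has to compute this. A toy sketch (plain Python with hypothetical names, not the FDW API):

```python
# Toy model of what an EPQ recheck of "ft1 LEFT JOIN ft2" must produce
# for a single pair of test tuples (illustrative code, not PostgreSQL).

def left_join_recheck(ft1_row, ft2_row, join_cond):
    """Re-evaluate the left join locally for one pair of test tuples."""
    if ft2_row is not None and join_cond(ft1_row, ft2_row):
        return {**ft1_row, **ft2_row}
    # The join clause no longer holds: a LEFT JOIN must still emit the
    # ft1 row, null-extended on the ft2 side.
    return {**ft1_row, "ft2.x": None, "ft2.y": None}

# The refetched ft1 tuple has a changed x; the previously retrieved ft2
# tuple no longer matches, so the recheck must null-extend ft2.
ft1_new = {"ft1.id": 1, "ft1.x": 99}
ft2_old = {"ft2.x": 1, "ft2.y": "v"}
tup = left_join_recheck(ft1_new, ft2_old,
                        lambda a, b: a["ft1.x"] == b["ft2.x"])
assert tup["ft2.x"] is None and tup["ft1.id"] == 1
```

The point of contention in the thread is where this knowledge of join shape lives: in an alternative local plan, in the FDW's own recheck callback, or (per Robert, below in the thread) in a refetch of the whole join output.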
On 2015/10/20 13:11, Etsuro Fujita wrote: > On 2015/10/20 5:34, Robert Haas wrote: >> On Mon, Oct 19, 2015 at 3:45 AM, Etsuro Fujita >> <fujita.etsuro@lab.ntt.co.jp> wrote: >>> As Tom mentioned, just recomputing the original join tuple is not good >>> enough. We would need to rejoin the test tuples for the baserels >>> even if >>> ROW_MARK_COPY is in use. Consider: >>> >>> A=# BEGIN; >>> A=# UPDATE t SET a = a + 1 WHERE b = 1; >>> B=# SELECT * from t, ft1, ft2 >>> WHERE t.a = ft1.a AND t.b = ft2.b AND ft1.c = ft2.c FOR UPDATE; >>> A=# COMMIT; >>> >>> where the plan for the SELECT FOR UPDATE is >>> >>> LockRows >>> -> Nested Loop >>> -> Seq Scan on t >>> -> Foreign Scan on <ft1, ft2> >>> Remote SQL: SELECT * FROM ft1 JOIN ft2 WHERE ft1.c = ft2.c >>> AND ft1.a >>> = $1 AND ft2.b = $2 >>> >>> If an EPQ recheck is invoked by the A's UPDATE, just recomputing the >>> original join tuple from the whole-row image that you proposed would >>> output >>> an incorrect result in the EQP recheck since the value a in the updated >>> version of a to-be-joined tuple in t would no longer match the value >>> ft1.a >>> extracted from the whole-row image if the A's UPDATE has committed >>> successfully. So I think we would need to rejoin the tuples >>> populated from >>> the whole-row images for the baserels ft1 and ft2, by executing the >>> secondary plan with the new parameter values for a and b. >> No. You just need to populate fdw_recheck_quals correctly, same as >> for the scan case. > Yeah, I think we can probably do that for the case where a pushed-down > join clause is an inner-join one, but I'm not sure that we can do that > for the case where that clause is an outer-join one. Maybe I'm missing > something, though. As I said yesterday, that opinion of me is completely wrong. Sorry for the incorrectness. Let me explain a little bit more. 
I still think that even if ROW_MARK_COPY is in use, we would need to locally rejoin the tuples populated from the whole-row images for the foreign tables involved in a remote join, using a secondary plan. Consider, e.g.,

SELECT localtab.*, ft2 FROM localtab, ft1, ft2
  WHERE ft1.x = ft2.x AND ft1.y = localtab.y FOR UPDATE

In this case, since the output of the foreign join would not include any ft1 columns, I don't think we could do the same thing as for the scan case, even if we populated fdw_recheck_quals correctly. And I think we would need to rejoin the tuples using a local join execution plan, which would have the parameterization for the to-be-pushed-down clause ft1.y = localtab.y. Maybe I'm still missing something, though.

Best regards,
Etsuro Fujita
> -----Original Message----- > From: Etsuro Fujita [mailto:fujita.etsuro@lab.ntt.co.jp] > Sent: Wednesday, October 21, 2015 12:31 PM > To: Robert Haas > Cc: Tom Lane; Kaigai Kouhei(海外 浩平); Kyotaro HORIGUCHI; > pgsql-hackers@postgresql.org; Shigeru Hanada > Subject: Re: [HACKERS] Foreign join pushdown vs EvalPlanQual > > On 2015/10/20 13:11, Etsuro Fujita wrote: > > On 2015/10/20 5:34, Robert Haas wrote: > >> On Mon, Oct 19, 2015 at 3:45 AM, Etsuro Fujita > >> <fujita.etsuro@lab.ntt.co.jp> wrote: > >>> As Tom mentioned, just recomputing the original join tuple is not good > >>> enough. We would need to rejoin the test tuples for the baserels > >>> even if > >>> ROW_MARK_COPY is in use. Consider: > >>> > >>> A=# BEGIN; > >>> A=# UPDATE t SET a = a + 1 WHERE b = 1; > >>> B=# SELECT * from t, ft1, ft2 > >>> WHERE t.a = ft1.a AND t.b = ft2.b AND ft1.c = ft2.c FOR UPDATE; > >>> A=# COMMIT; > >>> > >>> where the plan for the SELECT FOR UPDATE is > >>> > >>> LockRows > >>> -> Nested Loop > >>> -> Seq Scan on t > >>> -> Foreign Scan on <ft1, ft2> > >>> Remote SQL: SELECT * FROM ft1 JOIN ft2 WHERE ft1.c = ft2.c > >>> AND ft1.a > >>> = $1 AND ft2.b = $2 > >>> > >>> If an EPQ recheck is invoked by the A's UPDATE, just recomputing the > >>> original join tuple from the whole-row image that you proposed would > >>> output > >>> an incorrect result in the EQP recheck since the value a in the updated > >>> version of a to-be-joined tuple in t would no longer match the value > >>> ft1.a > >>> extracted from the whole-row image if the A's UPDATE has committed > >>> successfully. So I think we would need to rejoin the tuples > >>> populated from > >>> the whole-row images for the baserels ft1 and ft2, by executing the > >>> secondary plan with the new parameter values for a and b. > > >> No. You just need to populate fdw_recheck_quals correctly, same as > >> for the scan case. 
> > Yeah, I think we can probably do that for the case where a pushed-down
> > join clause is an inner-join one, but I'm not sure that we can do that
> > for the case where that clause is an outer-join one. Maybe I'm missing
> > something, though.
>
> As I said yesterday, that opinion of me is completely wrong. Sorry for
> the incorrectness. Let me explain a little bit more. I still think
> that even if ROW_MARK_COPY is in use, we would need to locally rejoin
> the tuples populated from the whole-row images for the foreign tables
> involved in a remote join, using a secondary plan. Consider eg,
>
> SELECT localtab.*, ft2 from localtab, ft1, ft2
>   WHERE ft1.x = ft2.x AND ft1.y = localtab.y FOR UPDATE
>
> In this case, since the output of the foreign join would not include any
> ft1 columns, I don't think we could do the same thing as for the scan
> case, even if populating fdw_recheck_quals correctly.

As an aside, could you explain the reason why you think so? It is a significant point in the discussion if we want to reach a consensus.

It looks to me as if the above example mixes up the target list of the user query and the target list of the remote query. If the EPQ mechanism requires a joined tuple on ft1 and ft2, the FDW driver can construct a remote query as follows:

SELECT ft2, ft1.y, ft1.x, ft2.x FROM ft1 JOIN ft2 ON ft1.x = ft2.x FOR UPDATE

Thus, fdw_scan_tlist has four target entries, but the latter two items are resjunk=true, because the ForeignScan node drops these columns by projection when it returns a tuple to the upper node.

On the other hand, the joined tuple we're talking about in this context is a tuple prior to projection, formed according to the fdw_scan_tlist. So it contains all the necessary information to run the scan/join qualifiers against the joined tuple. It is not affected by the target list of the user query.

Even though I think the approach with joined-tuple reconstruction is a reasonable solution here, it is not fair to present this as a disadvantage of Robert's suggestion.
> And I think we would need to rejoin the tuples, using a local join
> execution plan, which would have the parameterization for the
> to-be-pushed-down clause ft1.y = localtab.y. I'm still missing
> something, though.

Also, please don't mix up "what we do" and "how we do it".

It is a "what we do" matter to discuss which format of tuples shall be returned to the core backend from the extension, because it determines the role of the interface. If our consensus is to return a joined tuple, we need to design the interface according to that consensus.

On the other hand, it is a "how we do it" matter whether we should force every FDW/CSP extension to have an alternative plan, or not. Once we reach a consensus in the "what we do" discussion, there are various options to satisfy the resulting requirement; however, we cannot prioritize "how we do it" without settling "what we do".

Thanks,
--
NEC Business Creation Division / PG-Strom Project
KaiGai Kohei <kaigai@ak.jp.nec.com>
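KaiGai's distinction between the pre-projection scan tuple and the user-visible output can be modeled in a few lines (toy Python, with fdw_scan_tlist represented as hypothetical `(name, resjunk)` pairs; nothing here is real PostgreSQL code): the quals run against the full scan tuple, and projection then drops the resjunk entries.

```python
# Toy model: a scan tuple formed per fdw_scan_tlist carries extra
# (resjunk) columns needed only for qual evaluation; the node's
# projection drops them before the tuple reaches the upper node.

scan_tlist = [("ft2_wholerow", False), ("ft1.y", False),
              ("ft1.x", True), ("ft2.x", True)]        # (name, resjunk)
scan_tuple = {"ft2_wholerow": ("r",), "ft1.y": 5, "ft1.x": 3, "ft2.x": 3}

# Pushed-down quals are evaluated against the full scan tuple,
# resjunk columns included (here: the join clause ft1.x = ft2.x).
assert scan_tuple["ft1.x"] == scan_tuple["ft2.x"]

# Projection keeps only the non-resjunk entries for the upper node.
projected = {name: scan_tuple[name]
             for name, resjunk in scan_tlist if not resjunk}
assert projected == {"ft2_wholerow": ("r",), "ft1.y": 5}
```

So, as the message above argues, the user query's target list does not constrain what the recheck can see; only the post-projection output is narrowed.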
On 2015/10/21 13:34, Kouhei Kaigai wrote: >> On 2015/10/20 13:11, Etsuro Fujita wrote: >>> On 2015/10/20 5:34, Robert Haas wrote: >>>> No. You just need to populate fdw_recheck_quals correctly, same as >>>> for the scan case. >> As I said yesterday, that opinion of me is completely wrong. Sorry for >> the incorrectness. Let me explain a little bit more. I still think >> that even if ROW_MARK_COPY is in use, we would need to locally rejoin >> the tuples populated from the whole-row images for the foreign tables >> involved in a remote join, using a secondary plan. Consider eg, >> >> SELECT localtab.*, ft2 from localtab, ft1, ft2 >> WHERE ft1.x = ft2.x AND ft1.y = localtab.y FOR UPDATE >> >> In this case, since the output of the foreign join would not include any >> ft1 columns, I don't think we could do the same thing as for the scan >> case, even if populating fdw_recheck_quals correctly. > As an aside, could you introduce the reason why you think so? It is > significant point in discussion, if we want to reach the consensus. > On the other hands, the joined-tuple we're talking about in this context > is a tuple prior to projection; formed according to the fdw_scan_tlist. > So, it contains all the necessary information to run scan/join qualifiers > towards the joined-tuple. It is not affected by the target-list of user > query. After research into the planner, I noticed that I was still wrong; IIUC, the planner requires that the output of foreign join include the column ft1.y even for that case. (I don't understand the reason why the planner requires that.) So, as Robert mentioned, the clause ft1.y = localtab.y could be rechecked during an EPQ recheck, if populating fdw_recheck_quals correctly. Sorry again for the incorrectness. > Even though I think the approach with joined-tuple reconstruction is > reasonable solution here, it is not a fair reason to introduce disadvantage > of Robert's suggestion. Agreed. > Also, please don't mix up "what we do" and "how we do". 
> > It is "what we do" to discuss which format of tuples shall be returned > to the core backend from the extension, because it determines the role > of interface. If our consensus is to return a joined-tuple, we need to > design the interface according to the consensus. > > On the other hands, it is "how we do" discussion whether we should > enforce all the FDW/CSP extension to have alternative plan, or not. > Once we got a consensus in "what we do" discussion, there are variable > options to solve the requirement by the consensus, however, we cannot > prioritize "how we do" without "what we do". Agreed. Best regards, Etsuro Fujita
On 2015/10/20 9:36, Kouhei Kaigai wrote:
> Even if we fetch whole-row of both side, join pushdown is exactly working
> because we can receive less number of rows than local join + 2 of
> foreign-scan. (If planner works well, we can expect join-path that
> increases number of rows shall be dropped.)
>
> One downside of my proposition is growth of width for individual rows.
> It is a trade-off situation. The above approach takes no changes for
> existing EPQ infrastructure, thus, its implementation design is clear.
> On the other hands, your approach will reduce traffic over the network,
> however, it is still unclear how we integrate scanrelid==0 with EPQ
> infrastructure.

I agree with KaiGai-san that his proposition (or my proposition based on secondary plans) is still a performance improvement over the current implementation based on local joining plus early row locking, since that wouldn't have to transfer useless data that didn't satisfy the join conditions at all!

> On the other hands, in case of custom-scan that takes underlying local
> scan-nodes, thus, any kind of ROW_MARK_* except for ROW_MARK_COPY will
> happen. I think width of the joined tuples are relatively minor issue
> than FDW cases. However, we cannot expect the fetched rows are protected
> by early row-locking mechanism, so probability of re-fetching rows and
> reconstruction of joined-tuple has relatively higher priority.

I see.

>> There is also some possible loss of efficiency with this approach.
>> Suppose that we have two tables ft1 and ft2 which are being joined,
>> and we push down the join. They are being joined on an integer
>> column, and the join needs to select several other columns as well.
>> However, ft1 and ft2 are very wide tables that also contain some text
>> columns.
The query is like this: >> >> SELECT localtab.a, ft1.p, ft2.p FROM localtab LEFT JOIN (ft1 JOIN ft2 >> ON ft1.x = ft2.x AND ft1.huge ~ 'stuff' AND f2.huge2 ~ 'nonsense') ON >> localtab.q = ft1.q; >> >> If we refetch each row individually, we will need a wholerow image of >> ft1 and ft2 that includes all columns, or at least f1.huge and >> f2.huge2. If we just fetch a wholerow image of the join output, we >> can exclude those. The only thing we need to recheck is that it's >> still the case that localtab.q = ft1.q (because the value of >> localtab.q might have changed). As KaiGai-san mentioned above, what we need to discuss more about with Robert's proposition is how to integrate that into the existing EPQ machinery. For example, when, where, and how should we refetch the whole-row image of the join output in the case of late row locking? IMV I think that that would need to add a new FDW API different from RefetchForeignRow, say RefetchForeignJoinRow. IMO I think that another benefit from the proposition from KaiGai-san (or me) would be that that could provide the whole functionality for row locking in remote joins, without an additional development burden on an FDW author; the author only has to write GetForeignRowMarkType and RefetchForeignRow, which I think is relatively easy. I think that in the proposition, the use of rowmark types such as ROW_MARK_SHARE or ROW_MARK_EXCLUSIVE for foreign tables in remote joins would be quite inefficient, but I think that the use of ROW_MARK_REFERENCE instead of ROW_MARK_COPY would be an option for the workload where EPQ rechecks are rarely invoked, because we just need to transfer ctids, not whole-row images. Best regards, Etsuro Fujita
On Tue, Oct 20, 2015 at 12:39 PM, Etsuro Fujita <fujita.etsuro@lab.ntt.co.jp> wrote: > Sorry, my explanation was not correct. (Needed to take in caffeine.) What > I'm concerned about is the following: > > SELECT * FROM localtab JOIN (ft1 LEFT JOIN ft2 ON ft1.x = ft2.x) ON > localtab.id = ft1.id FOR UPDATE OF ft1 > > LockRows > -> Nested Loop > Join Filter: (localtab.id = ft1.id) > -> Seq Scan on localtab > -> Foreign Scan on <ft1, ft2> > Remote SQL: SELECT * FROM ft1 LEFT JOIN ft2 WHERE ft1.x = ft2.x > FOR UPDATE OF ft1 > > Assume that ft1 performs late row locking. If the SQL includes "FOR UPDATE of ft1", then it clearly performs early row locking. I assume you meant to omit that. > If an EPQ recheck was invoked > due to a concurrent transaction on the remote server that changed only the > value x of the ft1 tuple previously retrieved, then we would have to > generate a fake ft1/ft2-join tuple with nulls for ft2. (Assume that the ft2 > tuple previously retrieved was not a null tuple.) However, I'm not sure how > we can do that in ForeignRecheck; we can't know for example, which one is > outer and which one is inner, without an alternative local join execution > plan. Maybe I'm missing something, though. I would expect it to issue a new query like: SELECT * FROM ft1 LEFT JOIN ft2 WHERE ft1.x = ft2.x AND ft1.tid = $0 AND ft2.tid = $1. This should be significantly more efficient than fetching the base rows from each of two tables with two separate queries. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
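Robert's single-round-trip refetch can be sketched as query construction (toy Python; the function name, the `$n` parameter scheme, and the use of ctid are illustrative only, not postgres_fdw code):

```python
# Toy sketch of the one-query refetch suggested above for late row
# locking of a pushed-down join: rather than one query per base table,
# refetch the join output by the remembered tids in one round trip.

def build_refetch_sql(join_sql, tid_cols):
    """Append per-relation tid conditions to the pushed-down join query."""
    conds = " AND ".join(f"{col} = ${i + 1}"
                         for i, col in enumerate(tid_cols))
    return f"{join_sql} WHERE {conds}"

sql = build_refetch_sql("SELECT * FROM ft1 LEFT JOIN ft2 ON ft1.x = ft2.x",
                        ["ft1.ctid", "ft2.ctid"])
assert sql == ("SELECT * FROM ft1 LEFT JOIN ft2 ON ft1.x = ft2.x "
               "WHERE ft1.ctid = $1 AND ft2.ctid = $2")
```

Note this sketch sidesteps the null-extended case Fujita raised earlier (a previously null-extended ft2 side has no tid to bind), which is part of what the thread is still debating.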
> On Tue, Oct 20, 2015 at 12:39 PM, Etsuro Fujita > <fujita.etsuro@lab.ntt.co.jp> wrote: > > Sorry, my explanation was not correct. (Needed to take in caffeine.) What > > I'm concerned about is the following: > > > > SELECT * FROM localtab JOIN (ft1 LEFT JOIN ft2 ON ft1.x = ft2.x) ON > > localtab.id = ft1.id FOR UPDATE OF ft1 > > > > LockRows > > -> Nested Loop > > Join Filter: (localtab.id = ft1.id) > > -> Seq Scan on localtab > > -> Foreign Scan on <ft1, ft2> > > Remote SQL: SELECT * FROM ft1 LEFT JOIN ft2 WHERE ft1.x = ft2.x > > FOR UPDATE OF ft1 > > > > Assume that ft1 performs late row locking. > > If the SQL includes "FOR UPDATE of ft1", then it clearly performs > early row locking. I assume you meant to omit that. > > > If an EPQ recheck was invoked > > due to a concurrent transaction on the remote server that changed only the > > value x of the ft1 tuple previously retrieved, then we would have to > > generate a fake ft1/ft2-join tuple with nulls for ft2. (Assume that the ft2 > > tuple previously retrieved was not a null tuple.) However, I'm not sure how > > we can do that in ForeignRecheck; we can't know for example, which one is > > outer and which one is inner, without an alternative local join execution > > plan. Maybe I'm missing something, though. > > I would expect it to issue a new query like: SELECT * FROM ft1 LEFT > JOIN ft2 WHERE ft1.x = ft2.x AND ft1.tid = $0 AND ft2.tid = $1. > > This should be significantly more efficient than fetching the base > rows from each of two tables with two separate queries. > In this case, the EPQ slot to store the joined tuple is still a challenge to be solved. Is it possible to use one or any of EPQ slots that are setup for base relations but represented by ForeignScan/CustomScan? In case when ForeignScan run a remote join that involves three base foreign tables (relid=2, 3, 5 for example), for example, no other code touches this slot. 
So, it is safe even if we put a joined tuple on the EPQ slots of the underlying base relations. In this case, the EPQ slots are initialized as below:

es_epqTuple[0] ... EPQ tuple of base relation (relid=1)
es_epqTuple[1] ... EPQ of the joined tuple (for relid=2, 3, 5)
es_epqTuple[2] ... EPQ of the joined tuple (for relid=2, 3, 5), copy of above
es_epqTuple[3] ... EPQ tuple of base relation (relid=4)
es_epqTuple[4] ... EPQ of the joined tuple (for relid=2, 3, 5), copy of above
es_epqTuple[5] ... EPQ tuple of base relation (relid=6)

Also, the FDW/CSP shall be responsible for returning a joined tuple as the result for a whole-row reference of an underlying base relation. (One other challenge is how to handle the case when the user explicitly requires a whole-row reference... Hmm...) Then, if the FDW/CSP is designed to utilize the preliminary joined tuples rather than a local join, it can just return the tuple kept in one of the EPQ slots of the underlying base relations. If the FDW/CSP prefers a local join, it can behave as a local join does: check the join condition and construct a joined tuple, either by itself or by an alternative plan. Thanks, -- NEC Business Creation Division / PG-Strom Project KaiGai Kohei <kaigai@ak.jp.nec.com>
On Thu, Oct 29, 2015 at 6:05 AM, Kouhei Kaigai <kaigai@ak.jp.nec.com> wrote: > In this case, the EPQ slot to store the joined tuple is still > a challenge to be solved. > > Is it possible to use one or any of EPQ slots that are setup for > base relations but represented by ForeignScan/CustomScan? Yes, I proposed that exact thing upthread. > In case when ForeignScan run a remote join that involves three > base foreign tables (relid=2, 3, 5 for example), for example, > no other code touches this slot. So, it is safe even if we put > a joined tuple on EPQ slots of underlying base relations. > > In this case, EPQ slots are initialized as below: > > es_epqTuple[0] ... EPQ tuple of base relation (relid=1) > es_epqTuple[1] ... EPQ of the joined tuple (for relid=2, 3, 5) > es_epqTuple[2] ... EPQ of the joined tuple (for relid=2, 3, 5), copy of above > es_epqTuple[3] ... EPQ tuple of base relation (relid=4) > es_epqTuple[4] ... EPQ of the joined tuple (for relid=2, 3, 5), copy of above > es_epqTuple[5] ... EPQ tuple of base relation (relid=6) You don't really need to initialize them all. You can just initialize es_epqTuple[1] and leave 2 and 4 unused. > Then, if FDW/CSP is designed to utilize the preliminary joined > tuples rather than local join, it can just raise the tuple kept > in one of the EPQ slots for underlying base relations. > If FDW/CSP prefers local join, it can perform as like local join > doing; check join condition and construct a joined tuple by itself > or by alternative plan. Right. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
> On Thu, Oct 29, 2015 at 6:05 AM, Kouhei Kaigai <kaigai@ak.jp.nec.com> wrote: > > In this case, the EPQ slot to store the joined tuple is still > > a challenge to be solved. > > > > Is it possible to use one or any of EPQ slots that are setup for > > base relations but represented by ForeignScan/CustomScan? > > Yes, I proposed that exact thing upthread. > > > In case when ForeignScan run a remote join that involves three > > base foreign tables (relid=2, 3, 5 for example), for example, > > no other code touches this slot. So, it is safe even if we put > > a joined tuple on EPQ slots of underlying base relations. > > > > In this case, EPQ slots are initialized as below: > > > > es_epqTuple[0] ... EPQ tuple of base relation (relid=1) > > es_epqTuple[1] ... EPQ of the joined tuple (for relid=2, 3, 5) > > es_epqTuple[2] ... EPQ of the joined tuple (for relid=2, 3, 5), copy of above > > es_epqTuple[3] ... EPQ tuple of base relation (relid=4) > > es_epqTuple[4] ... EPQ of the joined tuple (for relid=2, 3, 5), copy of above > > es_epqTuple[5] ... EPQ tuple of base relation (relid=6) > > You don't really need to initialize them all. You can just initialize > es_epqTuple[1] and leave 2 and 4 unused. > > > Then, if FDW/CSP is designed to utilize the preliminary joined > > tuples rather than local join, it can just raise the tuple kept > > in one of the EPQ slots for underlying base relations. > > If FDW/CSP prefers local join, it can perform as like local join > > doing; check join condition and construct a joined tuple by itself > > or by alternative plan. > > Right. > A challenge is that junk whole-row references on behalf of ROW_MARK_COPY are injected by preprocess_targetlist(). That happens earlier than the main path consideration by query_planner(); thus, it is not predictable at this point how the remote query shall be executed. With ROW_MARK_COPY, the base tuple image is fetched using this junk attribute. So, here are two options if we allow putting a joined tuple on either of the es_epqTuple[] slots.
Option-1) We ignore the record type definition. The FDW returns a joined tuple for the whole-row reference of either of the base relations in this join. The junk attribute shall be filtered out eventually and only the FDW driver shall see it, so it is (probably) harmless. This option takes no big changes; however, we need a little bravery to adopt it.

Option-2) We allow the FDW/CSP to adjust the target-lists of the relevant nodes after those paths get chosen by the planner. This enables removing the whole-row references of the base relations and adding an alternative whole-row reference instead, if the FDW/CSP can support it. This feature is relevant not only to EPQ rechecks but also to target-list push-down to the remote side, because adjusting the target-list means we allow the FDW/CSP to determine which expressions shall be executed locally and which shall not. I think this option is more straightforward; however, it needs somewhat deeper consideration, because we have to design the best hook point and need to ensure how path-ification will perform.

Therefore, I think we need two steps towards the entire solution.

Step-1) The FDW/CSP will recheck base EPQ tuples and support local reconstruction on the fly. It does not need any special enhancement of the planner - so we can fix this up by the v9.5 release.

Step-2) The FDW/CSP will support adjustment of the target-list to add a whole-row reference of the joined tuple instead of those of the multiple base relations; then the FDW/CSP will be able to put a joined tuple on either EPQ slot if it wants - that takes a new feature enhancement, so v9.6 is a suitable timeline.

What is your opinion on this direction? I don't want to drop an extra optimization opportunity; however, we are now in November. I don't have enough bravery to add a non-obvious new feature here. Thanks, -- NEC Business Creation Division / PG-Strom Project KaiGai Kohei <kaigai@ak.jp.nec.com>
On 2015/10/28 6:04, Robert Haas wrote: > On Tue, Oct 20, 2015 at 12:39 PM, Etsuro Fujita > <fujita.etsuro@lab.ntt.co.jp> wrote: >> Sorry, my explanation was not correct. (Needed to take in caffeine.) What >> I'm concerned about is the following: >> >> SELECT * FROM localtab JOIN (ft1 LEFT JOIN ft2 ON ft1.x = ft2.x) ON >> localtab.id = ft1.id FOR UPDATE OF ft1 >> >> LockRows >> -> Nested Loop >> Join Filter: (localtab.id = ft1.id) >> -> Seq Scan on localtab >> -> Foreign Scan on <ft1, ft2> >> Remote SQL: SELECT * FROM ft1 LEFT JOIN ft2 WHERE ft1.x = ft2.x >> FOR UPDATE OF ft1 >> >> Assume that ft1 performs late row locking. > If the SQL includes "FOR UPDATE of ft1", then it clearly performs > early row locking. I assume you meant to omit that. Right. Sorry for my mistake. >> If an EPQ recheck was invoked >> due to a concurrent transaction on the remote server that changed only the >> value x of the ft1 tuple previously retrieved, then we would have to >> generate a fake ft1/ft2-join tuple with nulls for ft2. (Assume that the ft2 >> tuple previously retrieved was not a null tuple.) However, I'm not sure how >> we can do that in ForeignRecheck; we can't know for example, which one is >> outer and which one is inner, without an alternative local join execution >> plan. Maybe I'm missing something, though. > I would expect it to issue a new query like: SELECT * FROM ft1 LEFT > JOIN ft2 WHERE ft1.x = ft2.x AND ft1.tid = $0 AND ft2.tid = $1. We assume here that ft1 uses late row locking, so I thought the above SQL should include "FOR UPDATE of ft1". But I still don't think that that is right; the SQL with "FOR UPDATE of ft1" wouldn't generate the fake ft1/ft2-join tuple with nulls for ft2, as expected. The reason for that is that the updated version of the ft1 tuple wouldn't satisfy the ft1.tid = $0 condition in an EPQ recheck, because the ctid for the updated version of the ft1 tuple has changed. 
(IIUC, I think that if we use a TID scan for ft1, the SQL would generate the expected result, because I think that the TID condition would be ignored in the EPQ recheck, but I don't think it's guaranteed to use a TID scan for ft1.) Maybe I'm missing something, though. > This should be significantly more efficient than fetching the base > rows from each of two tables with two separate queries. Maybe we could fix the SQL, so I have to admit that; but I'm just wondering (1) what would happen in the case where ft1 uses late row locking and ft2 uses early row locking, and (2) whether that would still be more efficient than re-fetching only the base row from ft1. What I had in mind to improve the efficiency of the secondary-plan approach I proposed was that if we could parallelize the re-fetching of foreign rows in ExecLockRows and EvalPlanQualFetchRowMarks, we would be able to improve the efficiency not only for the case of performing a join of foreign tables remotely but also for the case of performing the join locally. Best regards, Etsuro Fujita
> On 2015/10/28 6:04, Robert Haas wrote: > > On Tue, Oct 20, 2015 at 12:39 PM, Etsuro Fujita > > <fujita.etsuro@lab.ntt.co.jp> wrote: > >> Sorry, my explanation was not correct. (Needed to take in caffeine.) What > >> I'm concerned about is the following: > >> > >> SELECT * FROM localtab JOIN (ft1 LEFT JOIN ft2 ON ft1.x = ft2.x) ON > >> localtab.id = ft1.id FOR UPDATE OF ft1 > >> > >> LockRows > >> -> Nested Loop > >> Join Filter: (localtab.id = ft1.id) > >> -> Seq Scan on localtab > >> -> Foreign Scan on <ft1, ft2> > >> Remote SQL: SELECT * FROM ft1 LEFT JOIN ft2 WHERE ft1.x = ft2.x > >> FOR UPDATE OF ft1 > >> > >> Assume that ft1 performs late row locking. > > > If the SQL includes "FOR UPDATE of ft1", then it clearly performs > > early row locking. I assume you meant to omit that. > > Right. Sorry for my mistake. > > >> If an EPQ recheck was invoked > >> due to a concurrent transaction on the remote server that changed only the > >> value x of the ft1 tuple previously retrieved, then we would have to > >> generate a fake ft1/ft2-join tuple with nulls for ft2. (Assume that the ft2 > >> tuple previously retrieved was not a null tuple.) However, I'm not sure how > >> we can do that in ForeignRecheck; we can't know for example, which one is > >> outer and which one is inner, without an alternative local join execution > >> plan. Maybe I'm missing something, though. > > > I would expect it to issue a new query like: SELECT * FROM ft1 LEFT > > JOIN ft2 WHERE ft1.x = ft2.x AND ft1.tid = $0 AND ft2.tid = $1. > > We assume here that ft1 uses late row locking, so I thought the above > SQL should include "FOR UPDATE of ft1". But I still don't think that > that is right; the SQL with "FOR UPDATE of ft1" wouldn't generate the > fake ft1/ft2-join tuple with nulls for ft2, as expected. 
The reason for > that is that the updated version of the ft1 tuple wouldn't satisfy the > ft1.tid = $0 condition in an EPQ recheck, because the ctid for the > updated version of the ft1 tuple has changed. (IIUC, I think that if we > use a TID scan for ft1, the SQL would generate the expected result, > because I think that the TID condition would be ignored in the EPQ > recheck, but I don't think it's guaranteed to use a TID scan for ft1.) > Maybe I'm missing something, though. > It looks to me that we should not use the ctid system column to identify a remote row when postgres_fdw tries to support late row locking. The documentation says: http://www.postgresql.org/docs/devel/static/fdw-callbacks.html#FDW-CALLBACKS-UPDATE UPDATE and DELETE operations are performed against rows previously fetched by the table-scanning functions. The FDW may need extra information, such as a row ID or the values of primary-key columns, to ensure that it can identify the exact row to update or delete. The "rowid" should not be changed once it is fetched from the remote side until it is actually updated, deleted or locked, for correct identification. If ctid is used for this purpose, it is safe only when the remote row is locked when it is fetched - and that is exactly early row locking behavior, isn't it? > > This should be significantly more efficient than fetching the base > > rows from each of two tables with two separate queries. > > Maybe we could fix the SQL, so I have to admit that; but I'm > just wondering (1) what would happen in the case where ft1 uses late row > locking and ft2 uses early row locking, and (2) whether that would still be more > efficient than re-fetching only the base row from ft1. > It should be the FDW driver's decision. It is hard to imagine a FDW driver that mixes early and late locking policies within the same remote join query. Do you really want to support such a mysterious implementation?
Or do you expect every FDW driver to be forced to return a joined tuple in the remote join case? That is different from my idea; it shall be an extra optimization option if the FDW can fetch a joined tuple at once, but not a requirement. So, if an FDW driver does not support this optimal behavior, the driver can fetch the two base tables and then run an alternative local join (or something else). > What I thought to improve the efficiency in the secondary-plan approach > that I proposed was that if we could parallelize re-fetching foreign > rows in ExecLockRows and EvalPlanQualFetchRowMarks, we would be able to > improve the efficiency not only for the case when performing a join of > foreign tables remotely but for the case when performing the join locally. > Parallelism is not a magic bullet... Thanks, -- NEC Business Creation Division / PG-Strom Project KaiGai Kohei <kaigai@ak.jp.nec.com>
On 2015/11/03 22:15, Kouhei Kaigai wrote: > A challenge is that junk wholerow references on behalf of ROW_MARK_COPY > are injected by preprocess_targetlist(). It is earlier than the main path > consideration by query_planner(), thus, it is not predictable how remote > query shall be executed at this point. > If ROW_MARK_COPY, base tuple image is fetched using this junk attribute. > So, here is two options if we allow to put joined tuple on either of > es_epqTuple[]. > > options-1) We ignore record type definition. FDW returns a joined tuple > towards the whole-row reference of either of the base relations in this > join. The junk attribute shall be filtered out eventually and only FDW > driver shall see, so it is harmless to do (probably). > This option takes no big changes, however, we need a little brave to adopt. > > options-2) We allow FDW/CSP to adjust target-list of the relevant nodes > after these paths get chosen by planner. It enables to remove whole-row > reference of base relations and add alternative whole-row reference instead > if FDW/CSP can support it. > This feature can be relevant to target-list push-down to the remote side, > not only EPQ rechecks, because adjustment of target-list means we allows > FDW/CSP to determine which expression shall be executed locally, or shall > not be. > I think, this option is more straightforward, however, needs a little bit > deeper consideration, because we have to design the best hook point and > need to ensure how path-ification will perform. > > Therefore, I think we need two steps towards the entire solution. > Step-1) FDW/CSP will recheck base EPQ tuples and support local > reconstruction on the fly. It does not need something special > enhancement on the planner - so we can fix up by v9.5 release. 
> Step-2) FDW/CSP will support adjustment of target-list to add whole-row > reference of joined tuple instead of multiple base relations, then FDW/CSP > will be able to put a joined tuple on either of EPQ slot if it wants - it > takes a new feature enhancement, so v9.6 is a suitable timeline. > > How about your opinion towards the direction? > I don't want to drop extra optimization opportunity, however, we are now in > November. I don't have enough brave to add none-obvious new feature here. I think we need to consider a general solution that can be applied not only to the case where the component tables in a foreign join all use ROW_MARK_COPY but to the case where those tables use different rowmark types such as ROW_MARK_COPY and ROW_MARK_EXCLUSIVE, as I pointed out upthread. Best regards, Etsuro Fujita
> -----Original Message----- > From: Etsuro Fujita [mailto:fujita.etsuro@lab.ntt.co.jp] > Sent: Wednesday, November 04, 2015 5:11 PM > To: Kaigai Kouhei(海外 浩平); Robert Haas > Cc: Tom Lane; Kyotaro HORIGUCHI; pgsql-hackers@postgresql.org; Shigeru Hanada > Subject: Re: [HACKERS] Foreign join pushdown vs EvalPlanQual > > On 2015/11/03 22:15, Kouhei Kaigai wrote: > > A challenge is that junk wholerow references on behalf of ROW_MARK_COPY > > are injected by preprocess_targetlist(). It is earlier than the main path > > consideration by query_planner(), thus, it is not predictable how remote > > query shall be executed at this point. > > If ROW_MARK_COPY, base tuple image is fetched using this junk attribute. > > So, here is two options if we allow to put joined tuple on either of > > es_epqTuple[]. > > > > options-1) We ignore record type definition. FDW returns a joined tuple > > towards the whole-row reference of either of the base relations in this > > join. The junk attribute shall be filtered out eventually and only FDW > > driver shall see, so it is harmless to do (probably). > > This option takes no big changes, however, we need a little brave to adopt. > > > > options-2) We allow FDW/CSP to adjust target-list of the relevant nodes > > after these paths get chosen by planner. It enables to remove whole-row > > reference of base relations and add alternative whole-row reference instead > > if FDW/CSP can support it. > > This feature can be relevant to target-list push-down to the remote side, > > not only EPQ rechecks, because adjustment of target-list means we allows > > FDW/CSP to determine which expression shall be executed locally, or shall > > not be. > > I think, this option is more straightforward, however, needs a little bit > > deeper consideration, because we have to design the best hook point and > > need to ensure how path-ification will perform. > > > > Therefore, I think we need two steps towards the entire solution. 
> > Step-1) FDW/CSP will recheck base EPQ tuples and support local > > reconstruction on the fly. It does not need something special > > enhancement on the planner - so we can fix up by v9.5 release. > > Step-2) FDW/CSP will support adjustment of target-list to add whole-row > > reference of joined tuple instead of multiple base relations, then FDW/CSP > > will be able to put a joined tuple on either of EPQ slot if it wants - it > > takes a new feature enhancement, so v9.6 is a suitable timeline. > > > > How about your opinion towards the direction? > > I don't want to drop extra optimization opportunity, however, we are now in > > November. I don't have enough bravery to add non-obvious new feature here. > > I think we need to consider a general solution that can be applied not > only to the case where the component tables in a foreign join all use > ROW_MARK_COPY but to the case where those tables use different rowmark > types such as ROW_MARK_COPY and ROW_MARK_EXCLUSIVE, as I pointed out > upthread. > In the mixture case, the FDW/CSP can choose local recheck & reconstruction based on the EPQ tuples of the base relations. Nobody forces the FDW/CSP to always return a joined tuple, even if its author doesn't want to support the feature. Why do you think it is not a generic solution? The FDW/CSP driver "can choose" the best solution according to its implementation and capability. Thanks, -- NEC Business Creation Division / PG-Strom Project KaiGai Kohei <kaigai@ak.jp.nec.com>
On 2015/11/04 17:10, Kouhei Kaigai wrote: >> On 2015/10/28 6:04, Robert Haas wrote: >>> On Tue, Oct 20, 2015 at 12:39 PM, Etsuro Fujita >>> <fujita.etsuro@lab.ntt.co.jp> wrote: >>>> Sorry, my explanation was not correct. (Needed to take in caffeine.) What >>>> I'm concerned about is the following: >>>> >>>> SELECT * FROM localtab JOIN (ft1 LEFT JOIN ft2 ON ft1.x = ft2.x) ON >>>> localtab.id = ft1.id FOR UPDATE OF ft1 >>>> >>>> LockRows >>>> -> Nested Loop >>>> Join Filter: (localtab.id = ft1.id) >>>> -> Seq Scan on localtab >>>> -> Foreign Scan on <ft1, ft2> >>>> Remote SQL: SELECT * FROM ft1 LEFT JOIN ft2 WHERE ft1.x = ft2.x >>>> FOR UPDATE OF ft1 >>>> >>>> Assume that ft1 performs late row locking. >>> If the SQL includes "FOR UPDATE of ft1", then it clearly performs >>> early row locking. I assume you meant to omit that. >>>> If an EPQ recheck was invoked >>>> due to a concurrent transaction on the remote server that changed only the >>>> value x of the ft1 tuple previously retrieved, then we would have to >>>> generate a fake ft1/ft2-join tuple with nulls for ft2. (Assume that the ft2 >>>> tuple previously retrieved was not a null tuple.) However, I'm not sure how >>>> we can do that in ForeignRecheck; we can't know for example, which one is >>>> outer and which one is inner, without an alternative local join execution >>>> plan. Maybe I'm missing something, though. >>> I would expect it to issue a new query like: SELECT * FROM ft1 LEFT >>> JOIN ft2 WHERE ft1.x = ft2.x AND ft1.tid = $0 AND ft2.tid = $1. >> We assume here that ft1 uses late row locking, so I thought the above >> SQL should include "FOR UPDATE of ft1". But I still don't think that >> that is right; the SQL with "FOR UPDATE of ft1" wouldn't generate the >> fake ft1/ft2-join tuple with nulls for ft2, as expected. 
The reason for >> that is that the updated version of the ft1 tuple wouldn't satisfy the >> ft1.tid = $0 condition in an EPQ recheck, because the ctid for the >> updated version of the ft1 tuple has changed. (IIUC, I think that if we >> use a TID scan for ft1, the SQL would generate the expected result, >> because I think that the TID condition would be ignored in the EPQ >> recheck, but I don't think it's guaranteed to use a TID scan for ft1.) >> Maybe I'm missing something, though. > It looks to me, we should not use ctid system column to identify remote > row when postgres_fdw tries to support late row locking. > > The documentation says: > http://www.postgresql.org/docs/devel/static/fdw-callbacks.html#FDW-CALLBACKS-UPDATE > > UPDATE and DELETE operations are performed against rows previously > fetched by the table-scanning functions. The FDW may need extra information, > such as a row ID or the values of primary-key columns, to ensure that it can > identify the exact row to update or delete > > The "rowid" should not be changed once it is fetched from the remote side > until it is actually updated, deleted or locked, for correct identification. > If ctid is used for this purpose, it is safe only when remote row is locked > when it is fetched - it is exactly early row locking behavior, isn't it? Yeah, we should use early row locking for a target foreign table in UPDATE/DELETE. In case of SELECT FOR UPDATE, I think we are allowed to use ctid to identify target rows for late row locking, but I think the above SQL should be changed to something like this: SELECT * FROM (SELECT * FROM ft1 WHERE ft1.tid = $0 FOR UPDATE) ss1 LEFT JOIN (SELECT * FROM ft2 WHERE ft2.tid = $1) ss2 ON ss1.x = ss2.x >>> This should be significantly more efficient than fetching the base >>> rows from each of two tables with two separate queries. 
>> Maybe we could fix the SQL, so I have to admit that; but I'm >> just wondering (1) what would happen in the case where ft1 uses late row >> locking and ft2 uses early row locking, and (2) whether that would still be more >> efficient than re-fetching only the base row from ft1. > It should be the FDW driver's decision. It is hard to imagine a > FDW driver that mixes early and late locking policies within the same remote join > query. Do you really want to support such a mysterious implementation? Yeah, the reason for that is that GetForeignRowMarkType allows it. > Or do you expect every FDW driver to be forced to return a joined tuple > in the remote join case? No. That wouldn't make sense if at least one component table involved in a foreign join uses a rowmark type other than ROW_MARK_COPY. > It is different from my idea; it shall be an extra > optimization option if FDW can fetch a joined tuple at once, but not always. > So, if FDW driver does not support this optimal behavior, your driver can > fetch two base tables then run local alternative join (or something else). OK, so if we all agree that the joined-tuple optimization is just an option for the case where all the component tables use ROW_MARK_COPY, I'd propose to leave that for 9.6. Best regards, Etsuro Fujita
On 2015/11/04 17:28, Kouhei Kaigai wrote: >> I think we need to consider a general solution that can be applied not >> only to the case where the component tables in a foreign join all use >> ROW_MARK_COPY but to the case where those tables use different rowmark >> types such as ROW_MARK_COPY and ROW_MARK_EXCLUSIVE, as I pointed out >> upthread. > In mixture case, FDW/CSP can choose local recheck & reconstruction based > on the EPQ tuples of base relation. Nobody enforce FDW/CSP to return > a joined tuple always even if author don't want to support the feature. > Why do you think it is not a generic solution? FDW/CSP driver "can choose" > the best solution according to its implementation and capability. It looked to me that you were discussing only the case where component foreign tables in a foreign join all use ROW_MARK_COPY, so I commented that. Sorry for my misunderstanding. Best regards, Etsuro Fujita
Hi, I've caught up again. > OK, so if we all agree that the joined-tuple optimization is just an > option for the case where all the component tables use ROW_MARK_COPY, > I'd propose to leave that for 9.6. I still think that ExecScan is called during an EPQ recheck without an EPQ tuple for the *scan*. A ForeignScan can be generated for a join over underlying foreign scans, and such an execution node returns what the core doesn't expect from any scan node. This is what I think is the root cause of this problem. So, as a third way, I propose to resurrect the abandoned ForeignJoinState, which seems suited to these unearthed requirements. The FDW returns a ForeignJoinPath, not a ForeignScanPath, which finally becomes a ForeignJoinState, which is handled as a join node with no doubt. What do you think about this? regards, -- Kyotaro Horiguchi NTT Open Source Software Center
> -----Original Message----- > From: Kyotaro HORIGUCHI [mailto:horiguchi.kyotaro@lab.ntt.co.jp] > Sent: Thursday, November 05, 2015 10:02 AM > To: fujita.etsuro@lab.ntt.co.jp > Cc: Kaigai Kouhei(海外 浩平); robertmhaas@gmail.com; tgl@sss.pgh.pa.us; > pgsql-hackers@postgresql.org; shigeru.hanada@gmail.com > Subject: Re: [HACKERS] Foreign join pushdown vs EvalPlanQual > > Hi, I've caught up again. > > > OK, so if we all agree that the joined-tuple optimization is just an > > option for the case where all the component tables use ROW_MARK_COPY, > > I'd propose to leave that for 9.6. > > I still think that ExecScan is called under EPQ recheck without > EPQ tuple for the *scan*. > > The ForeignScan can be generated for a join and underlying > foreign scans and such execution node returns what the core > doesn't expect for any scan node. This is what I think is the > root cause of this problem. > > So, as the third way, I propose to resurrect the abandoned > ForeignJoinState seems to be for the unearthed requirements. FDW > returns ForeignJoinPath, not ForeignScanPath then finally it > becomes ForeignJoinState, which is handled as a join node with > no doubt. > > What do you think about this? > Apart from the EPQ issues, it is fundamentally impossible to reflect the remote join tree on the local side, because the remote server runs the partial join in its own best or arbitrary way. If this ForeignJoinState just has a compatible join sub-tree, what is the difference from the alternative local join sub-plan? Even if we have another node, the roles of the FDW driver are unchanged. It eventually needs to do these:

1. Recheck the scan qualifiers of the base foreign tables
2. Recheck the join clauses of the remote joins
3. Reconstruct a joined tuple

Let me try to guess your intention... You say that a ForeignScan with scanrelid==0 is not actually a scan, so it is problematic that ExecForeignScan always calls ExecScan. Thus, an individual ForeignJoin shall be defined. Right?
In the case of scanrelid==0, it performs like a scan on a pseudo relation whose record type is defined by fdw_scan_tlist. The rows generated by this node are composed of rows from the underlying base relations. A significant point is that the FDW driver is responsible for generating the rows according to fdw_scan_tlist. Once the FDW driver generates rows, ExecScan() runs the remaining tasks - execution of host clauses (although it is hard to imagine a remote join including host clauses being cheaper than the alternatives) and projection. One thing I can agree on is that ForeignScan is forced to use ExecScan, so some FDW drivers may be concerned about this hard-wired logic. If we try to unbind ForeignScan from ExecScan, I would suggest revising ExecForeignScan to just invoke a callback; then the FDW driver can choose whether ExecScan is best or not. Thanks, -- NEC Business Creation Division / PG-Strom Project KaiGai Kohei <kaigai@ak.jp.nec.com>
On 2015/11/04 18:50, Etsuro Fujita wrote: > On 2015/11/04 17:10, Kouhei Kaigai wrote: >>> On 2015/10/28 6:04, Robert Haas wrote: >>>> On Tue, Oct 20, 2015 at 12:39 PM, Etsuro Fujita >>>> <fujita.etsuro@lab.ntt.co.jp> wrote: >>>>> Sorry, my explanation was not correct. (Needed to take in >>>>> caffeine.) What >>>>> I'm concerned about is the following: >>>>> >>>>> SELECT * FROM localtab JOIN (ft1 LEFT JOIN ft2 ON ft1.x = ft2.x) ON >>>>> localtab.id = ft1.id FOR UPDATE OF ft1 >>>>> If an EPQ recheck was invoked >>>>> due to a concurrent transaction on the remote server that changed >>>>> only the >>>>> value x of the ft1 tuple previously retrieved, then we would have to >>>>> generate a fake ft1/ft2-join tuple with nulls for ft2. (Assume that >>>>> the ft2 >>>>> tuple previously retrieved was not a null tuple.) However, I'm not >>>>> sure how >>>>> we can do that in ForeignRecheck; we can't know for example, which >>>>> one is >>>>> outer and which one is inner, without an alternative local join >>>>> execution >>>>> plan. Maybe I'm missing something, though. >>>> I would expect it to issue a new query like: SELECT * FROM ft1 LEFT >>>> JOIN ft2 WHERE ft1.x = ft2.x AND ft1.tid = $0 AND ft2.tid = $1. >>> We assume here that ft1 uses late row locking, so I thought the above >>> SQL should include "FOR UPDATE of ft1". But I still don't think that >>> that is right; the SQL with "FOR UPDATE of ft1" wouldn't generate the >>> fake ft1/ft2-join tuple with nulls for ft2, as expected. The reason for >>> that is that the updated version of the ft1 tuple wouldn't satisfy the >>> ft1.tid = $0 condition in an EPQ recheck, because the ctid for the >>> updated version of the ft1 tuple has changed. (IIUC, I think that if we >>> use a TID scan for ft1, the SQL would generate the expected result, >>> because I think that the TID condition would be ignored in the EPQ >>> recheck, but I don't think it's guaranteed to use a TID scan for ft1.) >>> Maybe I'm missing something, though. 
>> It looks to me that we should not use the ctid system column to identify a remote >> row when postgres_fdw tries to support late row locking. >> The "rowid" should not be changed once it is fetched from the remote side >> until it is actually updated, deleted or locked, for correct >> identification. >> If ctid is used for this purpose, it is safe only when the remote row is >> locked >> when it is fetched - it is exactly early row locking behavior, isn't it? > In case of SELECT FOR UPDATE, I think we are allowed to use ctid to > identify target rows for late row locking, but I think the above SQL > should be changed to something like this: > > SELECT * FROM (SELECT * FROM ft1 WHERE ft1.tid = $0 FOR UPDATE) ss1 LEFT > JOIN (SELECT * FROM ft2 WHERE ft2.tid = $1) ss2 ON ss1.x = ss2.x I noticed that the modified SQL was still wrong; ss1 would produce no tuple if using, e.g., a sequential scan for ss1, as discussed above. Sheesh, where is my brain? I still think we are allowed to do that, but what is the right SQL for that? In the current implementation of postgres_fdw, we need not take into consideration that what was fetched was an updated version of the tuple rather than the same version previously obtained, since postgres_fdw always uses at least REPEATABLE READ in the remote session. But otherwise it would be possible that what was fetched was an updated version of the tuple, having a different ctid value, which wouldn't satisfy a condition like "ft1.tid = $0" in ss1 any more. Best regards, Etsuro Fujita
Hello, At Thu, 5 Nov 2015 01:58:00 +0000, Kouhei Kaigai <kaigai@ak.jp.nec.com> wrote in <9A28C8860F777E439AA12E8AEA7694F80116284C@BPXM15GP.gisp.nec.co.jp> > > So, as the third way, I propose to resurrect the abandoned > > ForeignJoinState, which seems to fit the unearthed requirements. The FDW > > returns a ForeignJoinPath, not a ForeignScanPath, and it finally > > becomes a ForeignJoinState, which is handled as a join node with > > no doubt. > > > > What do you think about this? > > > Apart from EPQ issues, it is fundamentally impossible to reflect > the remote join tree on the local side, because the remote server runs > the partial join in its best or arbitrary way. > If this ForeignJoinState has just a compatible join sub-tree, what > is the difference from the alternative local join sub-plan? I think the ForeignJoinState doesn't have subnodes and might have no difference in its structure from ForeignScanState. Its significant difference from ForeignScanState would be that the core can properly handle the return from the node as a joined tuple in the ordinary way. The executor would no longer call ExecScan for joined tuples. > Even if we have another node, the roles of the FDW driver are unchanged. > It eventually needs to do these: > 1. Recheck scan-qualifier of base foreign table > 2. Recheck join-clause of remote joins > 3. Reconstruct a joined tuple Yes, the most significant point of this proposal is not on the FDW side but on the core side. > I try to estimate your intention... > You say that ForeignScan with scanrelid==0 is not actually a scan, > so it is problematic to always call ExecScan in ExecForeignScan. > Thus, an individual ForeignJoin shall be defined. > Right? Definitely. > In case of scanrelid==0, it performs like a scan on a pseudo relation > that has a record type defined by fdw_scan_tlist. The rows generated > with this node consist of rows in the underlying base relations. > A significant point is, the FDW driver is responsible for generating the > rows according to the fdw_scan_tlist. 
Once the FDW driver generates rows, > ExecScan() runs the remaining tasks - execution of host clauses (although > it is not easy to imagine a remote join with host clauses having cheaper > cost than others) and projection. Agreed. The role of the FDW won't be changed by introducing ForeignJoin. > One thing I can agree with is, ForeignScan is forced to use ExecScan, > thus some FDW drivers may be concerned about this hard-wired logic. > If we try to unbind ForeignScan from ExecScan, I'd like to > suggest revising ExecForeignScan to just invoke a callback; then > the FDW driver can choose whether ExecScan is best or not. Agreed. Calling ExecScan unconditionally from ForeignScan is the root(?) cause I mentioned. Since there'd be no difference in data structure between Foreign(Join&Scan), calling fdwroutine->ExecForeignScan() or something similar instead of ExecScan() from ExecForeignScan could be the alternative and most promising solution for all the problems in focus now. regards, -- Kyotaro Horiguchi NTT Open Source Software Center
Hello, The attached small patch is what I have in mind now. fdwroutine->ExecForeignScan may be unset if the FDW does nothing special. And all the FDW routine needs is the node. > Subject: [PATCH] Allow substituting the ExecScan body for ExecForeignScan > > A ForeignScan node may return joined tuples. Such a joined tuple cannot be > handled properly by ExecScan during an EPQ recheck. This patch allows > FDWs to give special treatment to such tuples. regards, > > One thing I can agree with is, ForeignScan is forced to use ExecScan, > > thus some FDW drivers may be concerned about this hard-wired logic. > > If we try to unbind ForeignScan from ExecScan, I'd like to > > suggest revising ExecForeignScan to just invoke a callback; then > > the FDW driver can choose whether ExecScan is best or not. > > Agreed. Calling ExecScan unconditionally from ForeignScan is the > root(?) cause I mentioned. Since there'd be no > difference in data structure between Foreign(Join&Scan), calling > fdwroutine->ExecForeignScan() or something similar instead of ExecScan() > from ExecForeignScan could be the alternative and most promising > solution for all the problems in focus now. -- Kyotaro Horiguchi NTT Open Source Software Center
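To make the shape of this proposal concrete, the dispatch can be modelled as a self-contained toy in C. All of the type and function names below are illustrative stand-ins invented for this sketch, not the real executor structures; the real patch would of course work on ForeignScanState and TupleTableSlot.

```c
#include <stdbool.h>
#include <stddef.h>

/* Illustrative stand-ins for the real executor types. */
typedef struct Slot { int value; bool joined; } Slot;

struct ForeignScanState;
typedef Slot *(*ExecForeignScan_fn)(struct ForeignScanState *node);

typedef struct ForeignScanState {
    ExecForeignScan_fn exec_callback;  /* NULL => use the default ExecScan path */
    Slot result;
} ForeignScanState;

/* The existing hard-wired behaviour: ExecScan-based, scan tuples only. */
static Slot *default_exec_scan(ForeignScanState *node)
{
    node->result.joined = false;
    return &node->result;
}

/* Proposed ExecForeignScan: let the FDW substitute the ExecScan body. */
static Slot *exec_foreign_scan(ForeignScanState *node)
{
    if (node->exec_callback != NULL)
        return node->exec_callback(node);  /* FDW handles joined tuples itself */
    return default_exec_scan(node);        /* behaviour unchanged otherwise */
}

/* An FDW whose ForeignScan returns joined tuples (scanrelid == 0). */
static Slot *join_aware_callback(ForeignScanState *node)
{
    node->result.joined = true;
    return &node->result;
}
```

The point of the sketch is only that FDWs which leave the callback unset see no behavioural change, while join-pushing FDWs can bypass the scan-oriented code path entirely.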
On Tue, Nov 3, 2015 at 8:15 AM, Kouhei Kaigai <kaigai@ak.jp.nec.com> wrote: > A challenge is that junk wholerow references on behalf of ROW_MARK_COPY > are injected by preprocess_targetlist(). It is earlier than the main path > consideration by query_planner(), thus, it is not predictable how remote > query shall be executed at this point. Oh, dear. That seems like a rather serious problem for my approach. > If ROW_MARK_COPY, base tuple image is fetched using this junk attribute. > So, here is two options if we allow to put joined tuple on either of > es_epqTuple[]. Neither of these sounds viable to me. I'm inclined to go back to something like what you proposed here: http://www.postgresql.org/message-id/9A28C8860F777E439AA12E8AEA7694F80114B89D@BPXM15GP.gisp.nec.co.jp -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
> -----Original Message----- > From: pgsql-hackers-owner@postgresql.org > [mailto:pgsql-hackers-owner@postgresql.org] On Behalf Of Robert Haas > Sent: Friday, November 06, 2015 9:40 PM > To: Kaigai Kouhei(海外 浩平) > Cc: Etsuro Fujita; Tom Lane; Kyotaro HORIGUCHI; pgsql-hackers@postgresql.org; > Shigeru Hanada > Subject: Re: [HACKERS] Foreign join pushdown vs EvalPlanQual > > On Tue, Nov 3, 2015 at 8:15 AM, Kouhei Kaigai <kaigai@ak.jp.nec.com> wrote: > > A challenge is that junk wholerow references on behalf of ROW_MARK_COPY > > are injected by preprocess_targetlist(). It is earlier than the main path > > consideration by query_planner(), thus, it is not predictable how remote > > query shall be executed at this point. > > Oh, dear. That seems like a rather serious problem for my approach. > > > If ROW_MARK_COPY, base tuple image is fetched using this junk attribute. > > So, here is two options if we allow to put joined tuple on either of > > es_epqTuple[]. > > Neither of these sounds viable to me. > > I'm inclined to go back to something like what you proposed here: > Good :-) > http://www.postgresql.org/message-id/9A28C8860F777E439AA12E8AEA7694F80114B89 > D@BPXM15GP.gisp.nec.co.jp > This patch needs to be rebased. One thing different from the latest version is that fdw_recheck_quals of ForeignScan was added. So, ... (1) The principle is that the FDW driver knows what qualifiers were pushed down and how they are kept in its private field. So, fdw_recheck_quals is redundant and should be reverted. (2) Even though the principle is as described in (1), the hard-wired logic in ForeignRecheck() and fdw_recheck_quals are a useful default for most FDW drivers. So, they shall be kept and be valid only if the RecheckForeignScan callback is not defined. Which is the better approach for the v3 patch? My preference is (1), because fdw_recheck_quals is a new feature; thus, FDW drivers have to be adjusted in v9.5 more or less, even if they already support qualifier push-down. 
In general, an interface becomes more graceful when it sticks to its principle. Thanks, -- NEC Business Creation Division / PG-Strom Project KaiGai Kohei <kaigai@ak.jp.nec.com>
On Fri, Nov 6, 2015 at 9:42 AM, Kouhei Kaigai <kaigai@ak.jp.nec.com> wrote: > This patch needs to be rebased. > One thing different from the latest version is fdw_recheck_quals of > ForeignScan was added. So, ... > > (1) Principle is that FDW driver knows what qualifiers were pushed down > and how does it kept in the private field. So, fdw_recheck_quals is > redundant and to be reverted. > > (2) Even though the principle is as described in (1), however, > wired logic in ForeignRecheck() and fdw_recheck_quals are useful > default for most of FDW drivers. So, it shall be kept and valid > only if RecheckForeignScan callback is not defined. > > Which is better approach for the v3 patch? > My preference is (1), because fdw_recheck_quals is a new feature, > thus, FDW driver has to be adjusted in v9.5 more or less, even if > it already supports qualifier push-down. > In general, interface becomes more graceful to stick its principle. fdw_recheck_quals seems likely to be very convenient for FDW authors, and I think ripping it out would be a terrible decision. I think ForeignRecheck should first call ExecQual to test fdw_recheck_quals. If it returns false, return false. If it returns true, then give the FDW callback a chance, if one is defined. If that returns false, return false. If we haven't yet returned false, return true. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
> On Fri, Nov 6, 2015 at 9:42 AM, Kouhei Kaigai <kaigai@ak.jp.nec.com> wrote: > > This patch needs to be rebased. > > One thing different from the latest version is fdw_recheck_quals of > > ForeignScan was added. So, ... > > > > (1) Principle is that FDW driver knows what qualifiers were pushed down > > and how does it kept in the private field. So, fdw_recheck_quals is > > redundant and to be reverted. > > > > (2) Even though the principle is as described in (1), however, > > wired logic in ForeignRecheck() and fdw_recheck_quals are useful > > default for most of FDW drivers. So, it shall be kept and valid > > only if RecheckForeignScan callback is not defined. > > > > Which is better approach for the v3 patch? > > My preference is (1), because fdw_recheck_quals is a new feature, > > thus, FDW driver has to be adjusted in v9.5 more or less, even if > > it already supports qualifier push-down. > > In general, interface becomes more graceful to stick its principle. > > fdw_recheck_quals seems likely to be very convenient for FDW authors, > and I think ripping it out would be a terrible decision. > OK, I will try to make fdw_recheck_quals and the RecheckForeignScan callback co-exist. > I think ForeignRecheck should first call ExecQual to test > fdw_recheck_quals. If it returns false, return false. If it returns > true, then give the FDW callback a chance, if one is defined. If that > returns false, return false. If we haven't yet returned false, > return true. > I think ExecQual on fdw_recheck_quals shall be called after the RecheckForeignScan callback, because econtext->ecxt_scantuple cannot be reconstructed unless the RecheckForeignScan callback is called when scanrelid==0. If RecheckForeignScan is called prior to ExecQual, the FDW driver can take either of two options according to its preference. (1) The RecheckForeignScan callback reconstructs a joined tuple based on the primitive EPQ slots, but rechecks nothing by itself. 
ForeignRecheck runs ExecQual on fdw_recheck_quals, which represents the qualifiers of the base relations and the join condition. (2) The RecheckForeignScan callback reconstructs a joined tuple based on the primitive EPQ slots, then rechecks the qualifiers of the base relations and the join condition by itself. It puts NIL on fdw_recheck_quals, so ExecQual in ForeignRecheck() always returns true. In either case, we cannot use ExecQual prior to reconstruction of a joined tuple, because only the FDW driver knows how to reconstruct it. So, it means a ForeignScan with scanrelid==0 always has to set NIL on fdw_recheck_quals if we put ExecQual prior to the callback. Thanks, -- NEC Business Creation Division / PG-Strom Project KaiGai Kohei <kaigai@ak.jp.nec.com>
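The callback-first ordering and the two options above can be sketched as a self-contained toy model in C. The types and names here are stand-ins invented for illustration (real PostgreSQL code operates on TupleTableSlot, ExprContext, and a qual list); the model only captures the control flow being argued about.

```c
#include <stdbool.h>
#include <stddef.h>

/* Toy stand-in for the slot; tracks whether the joined tuple was rebuilt
 * and whether the pushed-down qualifiers would pass on it. */
typedef struct Slot {
    bool reconstructed;
    bool quals_pass;
} Slot;

typedef bool (*RecheckForeignScan_fn)(Slot *slot);

/* Option (1): callback only reconstructs the joined tuple; qualifier
 * evaluation is left to ExecQual on fdw_recheck_quals. */
static bool recheck_reconstruct_only(Slot *slot)
{
    slot->reconstructed = true;
    return true;
}

/* Option (2): callback reconstructs AND rechecks by itself; the FDW puts
 * NIL on fdw_recheck_quals, so the later ExecQual trivially passes. */
static bool recheck_do_everything(Slot *slot)
{
    slot->reconstructed = true;
    return slot->quals_pass;
}

/* ForeignRecheck as proposed: callback first, then ExecQual on
 * fdw_recheck_quals (modelled here as a precomputed boolean). */
static bool foreign_recheck(RecheckForeignScan_fn cb, Slot *slot,
                            bool fdw_recheck_quals_pass)
{
    if (cb != NULL && !cb(slot))
        return false;
    return fdw_recheck_quals_pass;
}
```

Either way the joined tuple is reconstructed before any qualifier is evaluated, which is the reason the callback cannot come after ExecQual.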
> -----Original Message----- > From: Kaigai Kouhei(海外 浩平) > Sent: Sunday, November 08, 2015 12:38 AM > To: 'Robert Haas' > Cc: Etsuro Fujita; Tom Lane; Kyotaro HORIGUCHI; pgsql-hackers@postgresql.org; > Shigeru Hanada > Subject: Re: [HACKERS] Foreign join pushdown vs EvalPlanQual > > > On Fri, Nov 6, 2015 at 9:42 AM, Kouhei Kaigai <kaigai@ak.jp.nec.com> wrote: > > > This patch needs to be rebased. > > > One thing different from the latest version is fdw_recheck_quals of > > > ForeignScan was added. So, ... > > > > > > (1) Principle is that FDW driver knows what qualifiers were pushed down > > > and how does it kept in the private field. So, fdw_recheck_quals is > > > redundant and to be reverted. > > > > > > (2) Even though the principle is as described in (1), however, > > > wired logic in ForeignRecheck() and fdw_recheck_quals are useful > > > default for most of FDW drivers. So, it shall be kept and valid > > > only if RecheckForeignScan callback is not defined. > > > > > > Which is better approach for the v3 patch? > > > My preference is (1), because fdw_recheck_quals is a new feature, > > > thus, FDW driver has to be adjusted in v9.5 more or less, even if > > > it already supports qualifier push-down. > > > In general, interface becomes more graceful to stick its principle. > > > > fdw_recheck_quals seems likely to be very convenient for FDW authors, > > and I think ripping it out would be a terrible decision. > > > OK, I try to co-exist fdw_recheck_quals and RecheckForeignScan callback. > > > I think ForeignRecheck should first call ExecQual to test > > fdw_recheck_quals. If it returns false, return false. If it returns > > true, then give the FDW callback a chance, if one is defined. If that > > returns false, return false. If we haven't yet returned false, > > return true. 
> > > I think ExecQual on fdw_recheck_quals shall be called next to the > RecheckForeignScan callback, because econtext->ecxt_scantuple shall > not be reconstructed unless RecheckForeignScan callback is not called > if scanrelid==0. > > If RecheckForeignScan is called prior to ExecQual, FDW driver can > take either of two options according to its preference. > > (1) RecheckForeignScan callback reconstruct a joined tuple based on > the primitive EPQ slots, but nothing are rechecked by itself. > ForeignRecheck runs ExecQual on fdw_recheck_quals that represents > qualifiers of base relations and join condition. > > (2) RecheckForeignScan callback reconstruct a joined tuple based on > the primitive EPQ slots, then rechecks qualifiers of base relations > and join condition by itself. It put NIL on fdw_recheck_quals, so > ExecQual in ForeignRecheck() always true. > > In either case, we cannot use ExecQual prior to reconstruction of > a joined tuple because only FDW driver knows how to reconstruct it. > So, it means ForeignScan with scanrelid==0 always has to set NIL on > fdw_recheck_quals, if we would put ExecQual prior to the callback. > The attached patch is an adjusted version of the previous one. Even though it co-exists a new callback and fdw_recheck_quals, the callback is kicked first as follows. ----------------<cut here>---------------- @@ -85,6 +86,18 @@ ForeignRecheck(ForeignScanState *node, TupleTableSlot *slot) ResetExprContext(econtext); + /* + * FDW driver has to recheck visibility of EPQ tuple towards + * the scan qualifiers once it gets pushed down. + * In addition, if this node represents a join sub-tree, not + * a scan, FDW driver is also responsible to reconstruct + * a joined tuple according to the primitive EPQ tuples. 
+ */ + if (fdwroutine->RecheckForeignScan) + { + if (!fdwroutine->RecheckForeignScan(node, slot)) + return false; + } return ExecQual(node->fdw_recheck_quals, econtext, false); } ----------------<cut here>---------------- If the callback is invoked first, the FDW driver can reconstruct a joined tuple in its preferred way, then the remaining checks can be done by ExecQual and fdw_recheck_quals on the caller side. If the callback were located at the tail, the FDW driver would have no choice. Thanks, -- NEC Business Creation Division / PG-Strom Project KaiGai Kohei <kaigai@ak.jp.nec.com>
Attachment
On 2015/11/09 9:26, Kouhei Kaigai wrote: >>> I think ForeignRecheck should first call ExecQual to test >>> fdw_recheck_quals. If it returns false, return false. If it returns >>> true, then give the FDW callback a chance, if one is defined. If that >>> returns false, return false. If we haven't yet returned false, >>> return true. >> I think ExecQual on fdw_recheck_quals shall be called next to the >> RecheckForeignScan callback, because econtext->ecxt_scantuple shall >> not be reconstructed unless RecheckForeignScan callback is not called >> if scanrelid==0. I agree with KaiGai-san. I think we can define fdw_recheck_quals for the foreign-join case as quals not in scan.plan.qual, the same way as the simple foreign scan case. (In other words, the quals would be defined as "otherclauses", i.e., rinfo->is_pushed_down=true, that have been pushed down to the remote server.) For checking the fdw_recheck_quals, however, I think we should reconstruct the join tuple first, which I think is essential for cases where an outer join is performed remotely, to avoid changing the semantics. BTW, in my patch [1], a secondary plan will be created to evaluate such otherclauses after reconstructing the join tuple. > The attached patch is an adjusted version of the previous one. > Even though it co-exists a new callback and fdw_recheck_quals, > the callback is kicked first as follows. Thanks for the patch! > ----------------<cut here>---------------- > @@ -85,6 +86,18 @@ ForeignRecheck(ForeignScanState *node, TupleTableSlot *slot) > > ResetExprContext(econtext); > > + /* > + * FDW driver has to recheck visibility of EPQ tuple towards > + * the scan qualifiers once it gets pushed down. > + * In addition, if this node represents a join sub-tree, not > + * a scan, FDW driver is also responsible to reconstruct > + * a joined tuple according to the primitive EPQ tuples. 
> + */ > + if (fdwroutine->RecheckForeignScan) > + { > + if (!fdwroutine->RecheckForeignScan(node, slot)) > + return false; > + } > return ExecQual(node->fdw_recheck_quals, econtext, false); > } > ----------------<cut here>---------------- > > If callback is invoked first, FDW driver can reconstruct a joined tuple > with its comfortable way, then remaining checks can be done by ExecQual > and fds_recheck_quals on the caller side. > If callback would be located on the tail, FDW driver has no choice. To test this change, I think we should update the postgres_fdw patch so as to add the RecheckForeignScan. Having said that, as I said previously, I don't see much value in adding the callback routine, to be honest. I know KaiGai-san considers that that would be useful for custom joins, but I don't think that that would be useful even for foreign joins, because I think that in case of foreign joins, the practical implementation of that routine in FDWs would be to create a secondary plan and execute that plan by performing ExecProcNode, as my patch does [1]. Maybe I'm missing something, though. Best regards, Etsuro Fujita [1] http://www.postgresql.org/message-id/5624D583.10202@lab.ntt.co.jp
> > ----------------<cut here>---------------- > > @@ -85,6 +86,18 @@ ForeignRecheck(ForeignScanState *node, TupleTableSlot > *slot) > > > > ResetExprContext(econtext); > > > > + /* > > + * FDW driver has to recheck visibility of EPQ tuple towards > > + * the scan qualifiers once it gets pushed down. > > + * In addition, if this node represents a join sub-tree, not > > + * a scan, FDW driver is also responsible to reconstruct > > + * a joined tuple according to the primitive EPQ tuples. > > + */ > > + if (fdwroutine->RecheckForeignScan) > > + { > > + if (!fdwroutine->RecheckForeignScan(node, slot)) > > + return false; > > + } > > return ExecQual(node->fdw_recheck_quals, econtext, false); > > } > > ----------------<cut here>---------------- > > > > If callback is invoked first, FDW driver can reconstruct a joined tuple > > with its comfortable way, then remaining checks can be done by ExecQual > > and fds_recheck_quals on the caller side. > > If callback would be located on the tail, FDW driver has no choice. > > To test this change, I think we should update the postgres_fdw patch so > as to add the RecheckForeignScan. > > Having said that, as I said previously, I don't see much value in adding > the callback routine, to be honest. I know KaiGai-san considers that > that would be useful for custom joins, but I don't think that that would > be useful even for foreign joins, because I think that in case of > foreign joins, the practical implementation of that routine in FDWs > would be to create a secondary plan and execute that plan by performing > ExecProcNode, as my patch does [1]. Maybe I'm missing something, though. > I've never denied that alternative local sub-plan is one of the best approach for postgres_fdw, however, I've also never heard why you can say the best approach for postgres_fdw is definitely also best for others. 
If we justified a less flexible interface specification for the comfort of a particular extension, it should not be an extension, but a built-in feature. My standpoint has been consistent through the discussion; we can never predict which features shall be implemented on the FDW interface; therefore, we also cannot predict which implementation is best for EPQ rechecks. Only the FDW driver knows which is the "best" for it, not us. Thanks, -- NEC Business Creation Division / PG-Strom Project KaiGai Kohei <kaigai@ak.jp.nec.com>
On 2015/11/09 13:40, Kouhei Kaigai wrote: >> Having said that, as I said previously, I don't see much value in adding >> the callback routine, to be honest. I know KaiGai-san considers that >> that would be useful for custom joins, but I don't think that that would >> be useful even for foreign joins, because I think that in case of >> foreign joins, the practical implementation of that routine in FDWs >> would be to create a secondary plan and execute that plan by performing >> ExecProcNode, as my patch does [1]. Maybe I'm missing something, though. > I've never denied that alternative local sub-plan is one of the best > approach for postgres_fdw, however, I've also never heard why you can > say the best approach for postgres_fdw is definitely also best for > others. > If we would justify less flexible interface specification because of > comfort for a particular extension, it should not be an extension, > but a built-in feature. > My standpoint has been consistent through the discussion; we can never > predicate which feature shall be implemented on FDW interface, therefore, > we also cannot predicate which implementation is best for EPQ rechecks > also. Only FDW driver knows which is the "best" for them, not us. What the RecheckForeignScan routine does for the foreign-join case would be the following for tuples stored in estate->es_epqTuple[]: 1. Apply relevant restriction clauses, including fdw_recheck_quals, to the tuples for the baserels involved in a foreign-join, and see if the tuples still pass the clauses. 2. If so, form a join tuple, while applying relevant join clauses to the tuples, and set the join tuple in the given slot. Else set empty. I think these would be more efficiently processed internally in core than externally in FDWs. That's why I don't see much value in adding the routine. I have to admit that that means no flexibility, though. However, the routine as-is doesn't seem good enough, either. 
For example, since the routine is called after each of the tuples was re-fetched from the remote end or re-computed from the whole-row var and stored in the corresponding estate->es_epqTuple[], the routine wouldn't allow for what Robert proposed in [2]. To do such a thing, I think we would probably need to change the existing EPQ machinery more drastically and rethink the right place for calling the routine. Best regards, Etsuro Fujita [2] http://www.postgresql.org/message-id/CA+TgmoZdPU_fcSpOzXxpD1xvyq3cZCAwD7-x3aVWbKgSFoHvRA@mail.gmail.com
> On 2015/11/09 13:40, Kouhei Kaigai wrote: > >> Having said that, as I said previously, I don't see much value in adding > >> the callback routine, to be honest. I know KaiGai-san considers that > >> that would be useful for custom joins, but I don't think that that would > >> be useful even for foreign joins, because I think that in case of > >> foreign joins, the practical implementation of that routine in FDWs > >> would be to create a secondary plan and execute that plan by performing > >> ExecProcNode, as my patch does [1]. Maybe I'm missing something, though. > > > I've never denied that alternative local sub-plan is one of the best > > approach for postgres_fdw, however, I've also never heard why you can > > say the best approach for postgres_fdw is definitely also best for > > others. > > If we would justify less flexible interface specification because of > > comfort for a particular extension, it should not be an extension, > > but a built-in feature. > > My standpoint has been consistent through the discussion; we can never > > predicate which feature shall be implemented on FDW interface, therefore, > > we also cannot predicate which implementation is best for EPQ rechecks > > also. Only FDW driver knows which is the "best" for them, not us. > > What the RecheckForeignScan routine does for the foreign-join case would > be the following for tuples stored in estate->es_epqTuple[]: > > 1. Apply relevant restriction clauses, including fdw_recheck_quals, to > the tuples for the baserels involved in a foreign-join, and see if the > tuples still pass the clauses. > It depends on how the FDW driver holds restriction clauses, but you should not use fdw_recheck_quals to recheck individual base relations, because it is initialized to run on the joined tuple according to fdw_scan_tlist, so those restriction clauses have to be kept in another private field. > 2. 
If so, form a join tuple, while applying relevant join clauses to the > tuples, and set the join tuple in the given slot. Else set empty. > There is no need to form a joined tuple after the rechecks of the base relations' clauses. If the FDW supports only inner joins, it can reconstruct a joined tuple first, then run fdw_recheck_quals (by the caller), which contains both relations' clauses and the join clause. The FDW driver can choose its preferred way according to its implementation and capability. > I think these would be more efficiently processed internally in core > than externally in FDWs. That's why I don't see much value in adding > the routine. I have to admit that that means no flexibility, though. > Words like "efficient", "better", "reasonable", etc. are your opinions from your standpoint. The important thing is why you thought X is better and Y is worse. That is what I've wanted to see for three months, but never have. Discussion becomes unproductive without an understanding of the reason for a different conclusion. Please don't omit why you think it is "efficient" enough to justify enforcing a particular implementation manner on all FDW drivers as a part of the interface contract. > However, the routine as-is doesn't seem good enough, either. For > example, since the routine is called after each of the tuples was > re-fetched from the remote end or re-computed from the whole-row var and > stored in the corresponding estate->es_epqTuple[], the routine wouldn't > allow for what Robert proposed in [2]. To do such a thing, I think we > would probably need to change the existing EPQ machinery more > drastically and rethink the right place for calling the routine. 
> Please also see my message: http://www.postgresql.org/message-id/9A28C8860F777E439AA12E8AEA7694F8011617C6@BPXM15GP.gisp.nec.co.jp And here is why Robert thought this is a tough challenge: http://www.postgresql.org/message-id/CA+TgmoY5Lf+vYy1Bha=U7__S3qtMQP7d+gSSfd+LN4Xz6Fybkg@mail.gmail.com Thanks, -- NEC Business Creation Division / PG-Strom Project KaiGai Kohei <kaigai@ak.jp.nec.com>
On Sun, Nov 8, 2015 at 11:13 PM, Etsuro Fujita <fujita.etsuro@lab.ntt.co.jp> wrote: > To test this change, I think we should update the postgres_fdw patch so as > to add the RecheckForeignScan. > > Having said that, as I said previously, I don't see much value in adding the > callback routine, to be honest. I know KaiGai-san considers that that would > be useful for custom joins, but I don't think that that would be useful even > for foreign joins, because I think that in case of foreign joins, the > practical implementation of that routine in FDWs would be to create a > secondary plan and execute that plan by performing ExecProcNode, as my patch > does [1]. Maybe I'm missing something, though. I really don't see why you're fighting on this point. Making this a generic feature will require only a few extra lines of code for FDW authors. If this were going to cause some great inconvenience for FDW authors, then I'd agree it isn't worth it. But I see zero evidence that this is actually the case. From my point of view I'm now thinking this solution has two parts: (1) Let foreign scans have inner and outer subplans. For this purpose, we only need one, but it's no more work to enable both, so we may as well. If we had some reason, we could add a list of subplans of arbitrary length, but there doesn't seem to be an urgent need for that. (2) Add a recheck callback. If the foreign data wrapper wants to adopt the solution you're proposing, the recheck callback can call ExecProcNode(outerPlanState(node)). I don't think this should end up being more than a few lines of code, although of course we should verify that. So no problem: postgres_fdw and any other FDWs where the remote side is a database can easily delegate to a subplan, and anybody who wants to do something else still can. What is not to like about that? -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
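The delegation case in part (2) can be modelled as a self-contained toy in C. The types and names below are stand-ins invented for illustration; in real PostgreSQL code the callback would call ExecProcNode(outerPlanState(node)) on the alternative local join plan, but the control flow is the same few lines.

```c
#include <stdbool.h>
#include <stddef.h>

/* Toy stand-in for a planned local join over the EPQ tuples. */
typedef struct Plan {
    bool (*exec)(struct Plan *self, int *out);  /* stand-in for ExecProcNode */
    int joined_value;
} Plan;

/* Toy stand-in for ForeignScanState with an outer subplan attached. */
typedef struct {
    Plan *outer_subplan;  /* alternative local join plan, per part (1) */
    int result;           /* where the reconstructed joined tuple lands */
} FScanState;

/* Pretend to re-join the EPQ tuples locally and emit the joined row. */
static bool exec_local_join(Plan *self, int *out)
{
    *out = self->joined_value;
    return true;
}

/* A RecheckForeignScan implementation that just delegates to the subplan:
 * this is the "few lines of code" an FDW like postgres_fdw would need. */
static bool recheck_via_subplan(FScanState *node)
{
    if (node->outer_subplan == NULL)
        return false;  /* nothing to delegate to; treat as recheck failure */
    return node->outer_subplan->exec(node->outer_subplan, &node->result);
}
```

FDWs that want some other strategy simply install a different function in the callback slot; the core machinery does not care which choice was made.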
On 2015/11/12 2:53, Robert Haas wrote: > On Sun, Nov 8, 2015 at 11:13 PM, Etsuro Fujita > <fujita.etsuro@lab.ntt.co.jp> wrote: >> To test this change, I think we should update the postgres_fdw patch so as >> to add the RecheckForeignScan. >> >> Having said that, as I said previously, I don't see much value in adding the >> callback routine, to be honest. I know KaiGai-san considers that that would >> be useful for custom joins, but I don't think that that would be useful even >> for foreign joins, because I think that in case of foreign joins, the >> practical implementation of that routine in FDWs would be to create a >> secondary plan and execute that plan by performing ExecProcNode, as my patch >> does [1]. Maybe I'm missing something, though. > I really don't see why you're fighting on this point. Making this a > generic feature will require only a few extra lines of code for FDW > authors. If this were going to cause some great inconvenience for FDW > authors, then I'd agree it isn't worth it. But I see zero evidence > that this is actually the case. Really? I think there would be no small burden on an FDW author; when postgres_fdw delegates the subplan's work to the remote server, for example, it would need to create a remote join query by looking at tuples possibly fetched and stored in estate->es_epqTuple[], send the query, and receive the result during the callback routine. Furthermore, what I'm most concerned about is that that wouldn't be efficient. So, my question about that approach is whether FDWs would really do something like that during the callback routine, instead of performing a secondary join plan locally. As I said before, I know that KaiGai-san considers that that approach would be useful for custom joins. But I see zero evidence that there is a good use-case for an FDW. > From my point of view I'm now > thinking this solution has two parts: > > (1) Let foreign scans have inner and outer subplans. 
For this > purpose, we only need one, but it's no more work to enable both, so we > may as well. If we had some reason, we could add a list of subplans > of arbitrary length, but there doesn't seem to be an urgent need for > that. > > (2) Add a recheck callback. > > If the foreign data wrapper wants to adopt the solution you're > proposing, the recheck callback can call > ExecProcNode(outerPlanState(node)). I don't think this should end up > being more than a few lines of code, although of course we should > verify that. So no problem: postgres_fdw and any other FDWs where the > remote side is a database can easily delegate to a subplan, and > anybody who wants to do something else still can. > > What is not to like about that? >
> -----Original Message----- > From: Etsuro Fujita [mailto:fujita.etsuro@lab.ntt.co.jp] > Sent: Thursday, November 12, 2015 2:54 PM > To: Robert Haas > Cc: Kaigai Kouhei(海外 浩平); Tom Lane; Kyotaro HORIGUCHI; > pgsql-hackers@postgresql.org; Shigeru Hanada > Subject: Re: [HACKERS] Foreign join pushdown vs EvalPlanQual > > On 2015/11/12 2:53, Robert Haas wrote: > > On Sun, Nov 8, 2015 at 11:13 PM, Etsuro Fujita > > <fujita.etsuro@lab.ntt.co.jp> wrote: > >> To test this change, I think we should update the postgres_fdw patch so as > >> to add the RecheckForeignScan. > >> > >> Having said that, as I said previously, I don't see much value in adding the > >> callback routine, to be honest. I know KaiGai-san considers that that would > >> be useful for custom joins, but I don't think that that would be useful even > >> for foreign joins, because I think that in case of foreign joins, the > >> practical implementation of that routine in FDWs would be to create a > >> secondary plan and execute that plan by performing ExecProcNode, as my patch > >> does [1]. Maybe I'm missing something, though. > > > I really don't see why you're fighting on this point. Making this a > > generic feature will require only a few extra lines of code for FDW > > authors. If this were going to cause some great inconvenience for FDW > > authors, then I'd agree it isn't worth it. But I see zero evidence > > that this is actually the case. > > Really? I think there would be not a little burden on an FDW author; > when postgres_fdw delegates to the subplan to the remote server, for > example, it would need to create a remote join query by looking at > tuples possibly fetched and stored in estate->es_epqTuple[], send the > query and receive the result during the callback routine. > I cannot understand why it is the only solution. Our assumption is, FDW driver knows the best way to do. So, you can take the best way for your FDW driver - including what you want to implement in the built-in feature. 
> Furthermore, what I'm most concerned about is that wouldn't be
> efficient. So, my
>
You have to add a "because ..." sentence here, because Robert and I
think a little inefficiency is not a problem. If you try to persuade
other persons who have a different opinion, you need to explain WHY you
reached a different conclusion. (Of course, we might be overlooking
something.) Please don't start the sentence with "I think ...". We all
know your opinion, but what I've wanted to see is "the reason why my
approach is valuable is ...". I am not suggesting anything technically
difficult; it is a problem of communication.

> question about that approach is whether FDWs really do something like
> that during the callback routine, instead of performing a secondary
> join plan locally.
>
Nobody prohibits postgres_fdw from performing a secondary join here.
All you need to do is pick up a sub-plan tree from the FDW's private
field, then call ExecProcNode() inside the callback.

> As I said before, I know that KaiGai-san considers that
> that approach would be useful for custom joins. But I see zero
> evidence that there is a good use-case for an FDW.
>
> > From my point of view I'm now thinking this solution has two parts:
> >
> > (1) Let foreign scans have inner and outer subplans. For this
> > purpose, we only need one, but it's no more work to enable both, so
> > we may as well. If we had some reason, we could add a list of
> > subplans of arbitrary length, but there doesn't seem to be an urgent
> > need for that.
> >
> > (2) Add a recheck callback.
> >
> > If the foreign data wrapper wants to adopt the solution you're
> > proposing, the recheck callback can call
> > ExecProcNode(outerPlanState(node)). I don't think this should end up
> > being more than a few lines of code, although of course we should
> > verify that.
> > So no problem: postgres_fdw and any other FDWs where the remote side
> > is a database can easily delegate to a subplan, and anybody who
> > wants to do something else still can.
> >
> > What is not to like about that?

--
NEC Business Creation Division / PG-Strom Project
KaiGai Kohei <kaigai@ak.jp.nec.com>
Hello,

> > I really don't see why you're fighting on this point. Making this a
> > generic feature will require only a few extra lines of code for FDW
> > authors. If this were going to cause some great inconvenience for
> > FDW authors, then I'd agree it isn't worth it. But I see zero
> > evidence that this is actually the case.
>
> Really? I think there would be not a little burden on an FDW author;
> when postgres_fdw delegates the subplan to the remote server, for
> example, it would need to create a remote join query by looking at
> tuples possibly fetched and stored in estate->es_epqTuple[], send the
> query and receive the result during the callback routine.

Do you mind that an FDW cannot generate a plan that makes a tuple from
the epqTuples and then applies fdw_quals using predefined executor
nodes? The returned tuple itself can be stored in fdw_private, as I
think Kaigai-san said before. So it is enough if we can fabricate a
Result node whose outerPlan is the ForeignScan, which somehow returns
the tuple to examine. I may be missing something, though.

regards,

> Furthermore, what I'm most concerned about is that wouldn't be
> efficient. So, my question about that approach is whether FDWs really
> do something like that during the callback routine, instead of
> performing a secondary join plan locally. As I said before, I know
> that KaiGai-san considers that that approach would be useful for
> custom joins. But I see zero evidence that there is a good use-case
> for an FDW.
>
> > From my point of view I'm now thinking this solution has two parts:
> >
> > (1) Let foreign scans have inner and outer subplans. For this
> > purpose, we only need one, but it's no more work to enable both, so
> > we may as well. If we had some reason, we could add a list of
> > subplans of arbitrary length, but there doesn't seem to be an urgent
> > need for that.
> >
> > (2) Add a recheck callback.
> >
> > If the foreign data wrapper wants to adopt the solution you're
> > proposing, the recheck callback can call
> > ExecProcNode(outerPlanState(node)). I don't think this should end up
> > being more than a few lines of code, although of course we should
> > verify that. So no problem: postgres_fdw and any other FDWs where
> > the remote side is a database can easily delegate to a subplan, and
> > anybody who wants to do something else still can.
> >
> > What is not to like about that?

--
Kyotaro Horiguchi
NTT Open Source Software Center
Robert and Kaigai-san,

Sorry, I sent in an unfinished email.

On 2015/11/12 15:30, Kouhei Kaigai wrote:
>> On 2015/11/12 2:53, Robert Haas wrote:
>>> On Sun, Nov 8, 2015 at 11:13 PM, Etsuro Fujita
>>> <fujita.etsuro@lab.ntt.co.jp> wrote:
>>>> To test this change, I think we should update the postgres_fdw
>>>> patch so as to add the RecheckForeignScan.
>>>>
>>>> Having said that, as I said previously, I don't see much value in
>>>> adding the callback routine, to be honest. I know KaiGai-san
>>>> considers that that would be useful for custom joins, but I don't
>>>> think that that would be useful even for foreign joins, because I
>>>> think that in case of foreign joins, the practical implementation
>>>> of that routine in FDWs would be to create a secondary plan and
>>>> execute that plan by performing ExecProcNode, as my patch does
>>>> [1]. Maybe I'm missing something, though.
>>> I really don't see why you're fighting on this point. Making this a
>>> generic feature will require only a few extra lines of code for FDW
>>> authors. If this were going to cause some great inconvenience for
>>> FDW authors, then I'd agree it isn't worth it. But I see zero
>>> evidence that this is actually the case.
>> Really? I think there would be not a little burden on an FDW author;
>> when postgres_fdw delegates the subplan to the remote server, for
>> example, it would need to create a remote join query by looking at
>> tuples possibly fetched and stored in estate->es_epqTuple[], send
>> the query and receive the result during the callback routine.
> I cannot understand why it is the only solution.

I didn't say that.

>> Furthermore, what I'm most concerned about is that wouldn't be
>> efficient. So, my
> You have to add "because ..." sentence here because I and Robert
> think a little inefficiency is not a problem.

Sorry, my explanation was not enough.
The reason for that is that in the above postgres_fdw case, for
example, the overhead of sending the query to the remote end and
transferring the result to the local end would not be negligible.
Yeah, we might be able to apply special handling for improved
efficiency when using early row locking, but otherwise can we do the
same thing?

> Please don't start the sentence from "I think ...". We all knows
> your opinion, but what I've wanted to see is "the reason why my
> approach is valuable is ...".

I didn't say that my approach is *valuable* either. What I think is, I
see zero evidence that there is a good use-case for an FDW to do
something other than doing an ExecProcNode in the callback routine, as
I said below, so I don't see the need to add such a routine while that
would cause maybe not a large, but not a little burden for writing
such a routine on FDW authors.

> Nobody prohibits postgres_fdw performs a secondary join here.
> All you need to do is, picking up a sub-plan tree from FDW's private
> field then call ExecProcNode() inside the callback.

>> As I said before, I know that KaiGai-san considers that
>> that approach would be useful for custom joins. But I see zero
>> evidence that there is a good use-case for an FDW.

>>> From my point of view I'm now thinking this solution has two parts:
>>>
>>> (1) Let foreign scans have inner and outer subplans. For this
>>> purpose, we only need one, but it's no more work to enable both, so
>>> we may as well. If we had some reason, we could add a list of
>>> subplans of arbitrary length, but there doesn't seem to be an
>>> urgent need for that.

I did the same thing in an earlier version of the patch I posted.
Although I agreed with Robert's comment "The Plan tree and the
PlanState tree should be mirror images of each other; breaking that
equivalence will cause confusion, at least.", I think that that would
make the code much simpler, especially the code for setting chgParam
for the inner/outer subplans.
But one thing I'm concerned about is enabling both inner and outer
plans, because I think that would make the planner postprocessing
complicated, depending on what the foreign scans do with the
inner/outer subplans. Is it worth doing so? Maybe I'm missing
something, though.

>>> (2) Add a recheck callback.
>>>
>>> If the foreign data wrapper wants to adopt the solution you're
>>> proposing, the recheck callback can call
>>> ExecProcNode(outerPlanState(node)). I don't think this should end up
>>> being more than a few lines of code, although of course we should
>>> verify that.

Yeah, I think FDWs would probably need to create a subplan accordingly
at planning time, and then initialize/close the plan at execution
time. I think we could facilitate subplan creation by providing helper
functions for that, though.

Best regards,
Etsuro Fujita
Horiguchi-san,

On 2015/11/12 16:10, Kyotaro HORIGUCHI wrote:
>>> I really don't see why you're fighting on this point. Making this a
>>> generic feature will require only a few extra lines of code for FDW
>>> authors. If this were going to cause some great inconvenience for
>>> FDW authors, then I'd agree it isn't worth it. But I see zero
>>> evidence that this is actually the case.
>> Really? I think there would be not a little burden on an FDW author;
>> when postgres_fdw delegates the subplan to the remote server, for
>> example, it would need to create a remote join query by looking at
>> tuples possibly fetched and stored in estate->es_epqTuple[], send
>> the query and receive the result during the callback routine.
> Do you mind that an FDW cannot generate a plan that makes a tuple
> from the epqTuples and then applies fdw_quals using predefined
> executor nodes?

No. Please see my previous email. Sorry for my unfinished email.

Best regards,
Etsuro Fujita
> -----Original Message-----
> From: Etsuro Fujita [mailto:fujita.etsuro@lab.ntt.co.jp]
> Sent: Thursday, November 12, 2015 6:54 PM
> To: Kaigai Kouhei(海外 浩平); Robert Haas
> Cc: Tom Lane; Kyotaro HORIGUCHI; pgsql-hackers@postgresql.org;
> Shigeru Hanada
> Subject: Re: [HACKERS] Foreign join pushdown vs EvalPlanQual
>
> Robert and Kaigai-san,
>
> Sorry, I sent in an unfinished email.
>
> On 2015/11/12 15:30, Kouhei Kaigai wrote:
> >> On 2015/11/12 2:53, Robert Haas wrote:
> >>> On Sun, Nov 8, 2015 at 11:13 PM, Etsuro Fujita
> >>> <fujita.etsuro@lab.ntt.co.jp> wrote:
> >>>> To test this change, I think we should update the postgres_fdw
> >>>> patch so as to add the RecheckForeignScan.
> >>>>
> >>>> Having said that, as I said previously, I don't see much value in
> >>>> adding the callback routine, to be honest. I know KaiGai-san
> >>>> considers that that would be useful for custom joins, but I don't
> >>>> think that that would be useful even for foreign joins, because I
> >>>> think that in case of foreign joins, the practical implementation
> >>>> of that routine in FDWs would be to create a secondary plan and
> >>>> execute that plan by performing ExecProcNode, as my patch does
> >>>> [1]. Maybe I'm missing something, though.
>
> >>> I really don't see why you're fighting on this point. Making this
> >>> a generic feature will require only a few extra lines of code for
> >>> FDW authors. If this were going to cause some great inconvenience
> >>> for FDW authors, then I'd agree it isn't worth it. But I see zero
> >>> evidence that this is actually the case.
>
> >> Really? I think there would be not a little burden on an FDW
> >> author; when postgres_fdw delegates the subplan to the remote
> >> server, for example, it would need to create a remote join query by
> >> looking at tuples possibly fetched and stored in
> >> estate->es_epqTuple[], send the query and receive the result during
> >> the callback routine.
>
> > I cannot understand why it is the only solution.
> I didn't say that.
>
> >> Furthermore, what I'm most concerned about is that wouldn't be
> >> efficient. So, my
>
> > You have to add "because ..." sentence here because I and Robert
> > think a little inefficiency is not a problem.
>
> Sorry, my explanation was not enough. The reason for that is that in
> the above postgres_fdw case for example, the overhead in sending the
> query to the remote end and transferring the result to the local end
> would not be negligible. Yeah, we might be able to apply a special
> handling for the improved efficiency when using early row locking,
> but otherwise can we do the same thing?
>
It is a trade-off. Late locking semantics allows locking a relatively
smaller number of remote rows, but it takes extra latency. Also, it
became clear we have a challenge in pulling a joined tuple at once.

> > Please don't start the sentence from "I think ...". We all knows
> > your opinion, but what I've wanted to see is "the reason why my
> > approach is valuable is ...".
>
> I didn't say that my approach is *valuable* either. What I think is,
> I see zero evidence that there is a good use-case for an FDW to do
> something other than doing an ExecProcNode in the callback routine,
> as I said below, so I don't see the need to add such a routine while
> that would cause maybe not a large, but not a little burden for
> writing such a routine on FDW authors.
>
It is quite natural, because we cannot predict what kind of extension
is implemented on the FDW interface. You might know the initial
version of PG-Strom was implemented on FDW (about 4 years ago...). If
I had continued to stick with FDW, it would have become an FDW driver
with its own join engine.
From the standpoint of interface design, if we do not admit
flexibility of implementation unless the community sees a working
example, a reasonable tactic *for an extension author* is to follow
the interface restriction even if it is not the best approach from his
standpoint. It does not mean the approach of the majority is also best
for the minority. It just requires a compromise from the minority.

> > Nobody prohibits postgres_fdw performs a secondary join here.
> > All you need to do is, picking up a sub-plan tree from FDW's private
> > field then call ExecProcNode() inside the callback.
>
> >> As I said before, I know that KaiGai-san considers that
> >> that approach would be useful for custom joins. But I see zero
> >> evidence that there is a good use-case for an FDW.
>
> >>> From my point of view I'm now thinking this solution has two
> >>> parts:
> >>>
> >>> (1) Let foreign scans have inner and outer subplans. For this
> >>> purpose, we only need one, but it's no more work to enable both,
> >>> so we may as well. If we had some reason, we could add a list of
> >>> subplans of arbitrary length, but there doesn't seem to be an
> >>> urgent need for that.
>
> I did the same thing in an earlier version of the patch I posted.
> Although I agreed on Robert's comment "The Plan tree and the PlanState
> tree should be mirror images of each other; breaking that equivalence
> will cause confusion, at least.", I think that that would make code
> much simpler, especially the code for setting chgParam for inner/outer
> subplans. But one thing I'm concerned about is enable both inner and
> outer plans, because I think that that would make the planner
> postprocessing complicated, depending on what the foreign scans do by
> the inner/outer subplans. Is it worth doing so? Maybe I'm missing
> something, though.
>
If you want to persuade another person who has a different opinion,
you need to explain why it was complicated, how complicated it was,
and what solution you tried at that time.
"Complicated" is a subjective term. At least, we don't share your
experience, so it is hard to understand how complex it is. I guess it
is similar to what the built-in logic is usually doing, thus it should
not be a problem we cannot solve. A utility routine the FDW driver can
call will solve the issue (even if it is not supported in v9.5 yet).

> >>> (2) Add a recheck callback.
> >>>
> >>> If the foreign data wrapper wants to adopt the solution you're
> >>> proposing, the recheck callback can call
> >>> ExecProcNode(outerPlanState(node)). I don't think this should end
> >>> up being more than a few lines of code, although of course we
> >>> should verify that.
>
> Yeah, I think FDWs would probably need to create a subplan accordingly
> at planning time, and then initializing/closing the plan at execution
> time. I think we could facilitate subplan creation by providing
> helper functions for that, though.
>
I can agree that we ought to provide a utility routine to construct a
local alternative subplan; however, we are in the beta2 stage for
v9.5. So, I'd like to suggest only the callback in v9.5 (the FDW
driver can handle its subplan by itself, no need to patch the
backend), then design the utility routine for this.

Thanks,
--
NEC Business Creation Division / PG-Strom Project
KaiGai Kohei <kaigai@ak.jp.nec.com>
Hello,

I am also uncertain about what exactly the blocker is..

At Fri, 13 Nov 2015 02:31:53 +0000, Kouhei Kaigai <kaigai@ak.jp.nec.com> wrote in <9A28C8860F777E439AA12E8AEA7694F80116F7AF@BPXM15GP.gisp.nec.co.jp>
> > Sorry, my explanation was not enough. The reason for that is that
> > in the above postgres_fdw case for example, the overhead in sending
> > the query to the remote end and transferring the result to the
> > local end would not be negligible. Yeah, we might be able to apply
> > a special handling for the improved efficiency when using early row
> > locking, but otherwise can we do the same thing?
> >
> It is a trade-off. Late locking semantics allows locking a relatively
> smaller number of remote rows, but it takes extra latency.
> Also, it became clear we have a challenge in pulling a joined tuple
> at once.

Late row locking anyway needs to send a query to the remote side and
needs to generate the joined row on either side of the connection.
Early row locking on FDW doesn't need that, since the necessary tuples
are already in our hands. Is there any performance issue in this?
Unfortunately I've not comprehended what the problem is :(

Or, are you, Fujita-san, thinking about bulk late row locking or such?
If so, it is a matter for the future, as with update/insert pushdown,
I suppose.

> > I didn't say that my approach is *valuable* either. What I think
> > is, I see zero evidence that there is a good use-case for an FDW to
> > do something other than doing an ExecProcNode in the callback
> > routine, as I said below, so I don't see the need to add such a
> > routine while that would cause maybe not a large, but not a little
> > burden for writing such a routine on FDW authors.
> >
> It is quite natural, because we cannot predict what kind of extension
> is implemented on the FDW interface. You might know the initial
> version of PG-Strom was implemented on FDW (about 4 years ago...).
> If I had continued to stick with FDW, it would have become an FDW
> driver with its own join engine.
> (cstore_fdw may potentially support its own join logic on top of
> their columnar storage, for instance?)
>
> From the standpoint of interface design, if we do not admit
> flexibility of implementation unless the community sees a working
> example, a reasonable tactic *for an extension author* is to follow
> the interface restriction even if it is not the best approach from
> his standpoint.
> It does not mean the approach of the majority is also best for the
> minority. It just requires a compromise from the minority.

Or try to open the way to introduce the feature he/she wants. If a
workable postgres_fdw with join pushdown based on this API, to any
extent, were shown here, we could investigate the problem there. But
perhaps the deadline is just before us..

> > I did the same thing in an earlier version of the patch I posted.
> > Although I agreed on Robert's comment "The Plan tree and the
> > PlanState tree should be mirror images of each other; breaking that
> > equivalence will cause confusion, at least.", I think that that
> > would make code much simpler, especially the code for setting
> > chgParam for inner/outer subplans.

I see that Kaigai-san's patch doesn't put different nodes from paths
during plan creation; in other words, it doesn't break coherence
between paths and plans from the core's point of view. Fujita-san's
patch mentioned above altered a node in the core's sight. I understand
that this is the most significant difference between them..

> > But one thing I'm concerned about is enabling both inner and
> > outer plans, because I think that that would make the planner
> > postprocessing complicated, depending on what the foreign scans do
> > by the inner/outer subplans. Is it worth doing so? Maybe I'm
> > missing something, though.

Is this a discussion about late row locking? Join pushdown itself is a
kind of complicated process. And since it fools the planner in one
aspect, the additional feature would inevitably be complex to some
extent.
We could discuss that after some specific problem comes into our
sight.

> If you persuade other person who has different opinion, you need to
> explain why was it complicated, how much complicated and what was
> the solution you tried at that time.
> The "complicated" is a subjectively-based term. At least, we don't
> share your experience, so it is hard to understand the how complexity.

Me too. It surely might be complicated (though the extent is mainly in
an individual's mind..) but also I don't see how Fujita-san's patch
resolves that "problem".

> I guess it is similar to what built-in logic is usually doing, thus,
> it should not be a problem we cannot solve. A utility routine FDW
> driver can call will solve the issue (even if it is not supported
> on v9.5 yet).
>
> > >>> (2) Add a recheck callback.
> > >>>
> > >>> If the foreign data wrapper wants to adopt the solution you're
> > >>> proposing, the recheck callback can call
> > >>> ExecProcNode(outerPlanState(node)). I don't think this should
> > >>> end up being more than a few lines of code, although of course
> > >>> we should verify that.
> >
> > Yeah, I think FDWs would probably need to create a subplan
> > accordingly at planning time, and then initializing/closing the
> > plan at execution time. I think we could facilitate subplan
> > creation by providing helper functions for that, though.
> >
> I can agree with we ought to provide a utility routine to construct
> a local alternative subplan, however, we are in beta2 stage for v9.5.
> So, I'd like to suggest only callback on v9.5 (FDW driver can handle
> its subplan by itself, no need to patch the backend), then design the
> utility routine for this.

The support routine itself won't be a blocker, since it can be copied
into the FDW for the moment, and then we can propose to expose it if
it is found to be essential. It would be a problem if some essential
data were found to be out of reach, but I guess we already have all
the required data in hand.
regards,

--
Kyotaro Horiguchi
NTT Open Source Software Center
On 2015/11/13 11:31, Kouhei Kaigai wrote:
>>>> On 2015/11/12 2:53, Robert Haas wrote:
>>>>> From my point of view I'm now thinking this solution has two
>>>>> parts:
>>>>>
>>>>> (1) Let foreign scans have inner and outer subplans. For this
>>>>> purpose, we only need one, but it's no more work to enable both,
>>>>> so we may as well. If we had some reason, we could add a list of
>>>>> subplans of arbitrary length, but there doesn't seem to be an
>>>>> urgent need for that.

I wrote:
>> But one thing I'm concerned about is enable both inner and
>> outer plans, because I think that that would make the planner
>> postprocessing complicated, depending on what the foreign scans do
>> by the inner/outer subplans. Is it worth doing so? Maybe I'm
>> missing something, though.

> If you persuade other person who has different opinion, you need to
> explain why was it complicated, how much complicated and what was
> the solution you tried at that time.
> The "complicated" is a subjectively-based term. At least, we don't
> share your experience, so it is hard to understand the how complexity.

I don't mean to object to that idea. I'm unfamiliar with that idea, so
I just wanted to know the reason, or use cases.

Best regards,
Etsuro Fujita
On 2015/11/13 13:44, Kyotaro HORIGUCHI wrote:

I wrote:
>>> What I think is, I see zero evidence that there is a good use-case
>>> for an FDW to do something other than doing an ExecProcNode in the
>>> callback routine, as I said below, so I don't see the need to add
>>> such a routine while that would cause maybe not a large, but not a
>>> little burden for writing such a routine on FDW authors.

KaiGai-san wrote:
>> It is quite natural because we cannot predict what kind of extension
>> is implemented on the FDW interface. You might know the initial
>> version of PG-Strom was implemented on FDW (about 4 years ago...).
>> If I had continued to stick with FDW, it would have become an FDW
>> driver with its own join engine.
>> From the standpoint of interface design, if we do not admit
>> flexibility of implementation unless the community sees a working
>> example, a reasonable tactic *for an extension author* is to follow
>> the interface restriction even if it is not the best approach from
>> his standpoint.
>> It does not mean the approach of the majority is also best for the
>> minority. It just requires a compromise from the minority.

> Or try to open the way to introduce the feature he/she wants.

I think the biggest difference between KaiGai-san's patch and mine is
that KaiGai-san's patch introduces a callback routine to allow an FDW
author not only to execute a secondary plan, but to do something else
instead of executing the plan, if he/she wants to do so. His approach
would provide the flexibility, but IMHO major FDWs that would be
implementing join pushdown, such as postgres_fdw, wouldn't be
utilizing the flexibility; probably, they would just be executing the
secondary plan in the routine.
Furthermore, since executing the plan with his approach requires that
an FDW author add code not only for creating the plan but also for
initializing/executing/ending it in his/her FDW by itself, while in my
approach he/she only has to add code for the plan creation, his
approach would impose a greater development burden on such major FDWs'
authors than mine. I think the flexibility would be a good thing, but
I also think it's important not to burden FDW authors. Maybe I'm
missing something, though.

Best regards,
Etsuro Fujita
> On 2015/11/13 13:44, Kyotaro HORIGUCHI wrote:
>
> I wrote:
> >>> What I think is, I see zero evidence that there is a good
> >>> use-case for an FDW to do something other than doing an
> >>> ExecProcNode in the callback routine, as I said below, so I don't
> >>> see the need to add such a routine while that would cause maybe
> >>> not a large, but not a little burden for writing such a routine
> >>> on FDW authors.
>
> KaiGai-san wrote:
> >> It is quite natural because we cannot predict what kind of
> >> extension is implemented on the FDW interface. You might know the
> >> initial version of PG-Strom was implemented on FDW (about 4 years
> >> ago...). If I had continued to stick with FDW, it would have
> >> become an FDW driver with its own join engine.
> >> From the standpoint of interface design, if we do not admit
> >> flexibility of implementation unless the community sees a working
> >> example, a reasonable tactic *for an extension author* is to
> >> follow the interface restriction even if it is not the best
> >> approach from his standpoint.
> >> It does not mean the approach of the majority is also best for the
> >> minority. It just requires a compromise from the minority.
>
> > Or try to open the way to introduce the feature he/she wants.
>
> I think the biggest difference between KaiGai-san's patch and mine is
> that KaiGai-san's patch introduces a callback routine to allow an FDW
> author not only to execute a secondary plan but to do something else,
> instead of executing the plan, if he/she wants to do so. His approach
> would provide the flexibility, but IMHO I think major FDWs that would
> be implementing join pushdown, such as postgres_fdw, wouldn't be
> utilizing the flexibility; probably, they would be just executing the
> secondary plan in the routine.
>
Yes, my approach never denies that.
> Furthermore, since that for executing the plan, his approach would
> require that an FDW author has to add code not only for creating the
> plan but for initializing
>
Pick up a plan from fdw_plans, then call ExecInitNode().

> executing
>
Pick up a plan-state from fdw_ps, then call ExecProcNode().

> ending it to
>
Also, call ExecEndNode() on the plan-state.

> his/her FDW by itself while in my approach, he/she only has to add
> code for the plan creation, his approach would impose a more
> development burden on such major FDWs' authors than mine.
>
It looks to me like the additional development burden is three lines.
Both of our approaches commonly need to construct an alternative local
plan, likely an unparameterized nest-loop, in the planner phase; it
shall be supported by a utility function in the core. So, one more
additional line will eventually be needed.

> I think the flexibility would be a good thing, but I also think it's
> important not to burden FDW authors. Maybe I'm missing something,
> though.
>
The actual pain is that people cannot design/implement their module as
they want. I've repeatedly pointed out that an FDW driver can have its
own join implementation, and people potentially want to use their own
logic rather than a local plan. At least, if PG-Strom still ran on
FDW, I would *want* to reuse its CPU-fallback routine instead of the
alternative sub-plan. Could you explain why the above sequence (a few
additional lines) is an unacceptable burden and justifies eliminating
flexibility for minorities? If you can implement the "common part" for
the majority, we can implement the same stuff as utility functions
that can be called from the callbacks.

My questions are:
* How many lines do you expect for the additional burden?
* Why does it justify eliminating flexibility of the interface?
* Why can't we implement the common part as utility routines that can
  be called from the callback?
Please don't hesitate to point out flaws in my proposition if you have
noticed something significant that we have overlooked. However, at
this moment, it does not seem to me that your concern is something
significant.

Thanks,
--
NEC Business Creation Division / PG-Strom Project
KaiGai Kohei <kaigai@ak.jp.nec.com>
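The "three additional lines" sequence described in the message above (ExecInitNode at scan start, ExecProcNode in the recheck, ExecEndNode at scan end, with the subplan kept in the FDW's private field) can be mocked with a counting stub executor. All type and function names here are simplified stand-ins, not the real PostgreSQL API; real code would call the server's executor routines.

```c
#include <assert.h>

/* Stub executor that counts calls, standing in for PostgreSQL's
 * ExecInitNode / ExecProcNode / ExecEndNode on a stashed subplan. */
typedef struct SubPlanState
{
    int inited;
    int executed;
    int ended;
} SubPlanState;

static SubPlanState *
stub_exec_init_node(SubPlanState *sps)
{
    sps->inited++;
    return sps;
}

static int
stub_exec_proc_node(SubPlanState *sps)
{
    sps->executed++;
    return 1;               /* pretend a rejoined tuple came back */
}

static void
stub_exec_end_node(SubPlanState *sps)
{
    sps->ended++;
}

/* FDW-side private state holding the alternative local join subplan. */
typedef struct MyFdwState
{
    SubPlanState *subplan_state;
} MyFdwState;

/* Addition #1: initialize the stashed subplan in BeginForeignScan. */
static void
my_begin_foreign_scan(MyFdwState *fdw, SubPlanState *subplan)
{
    fdw->subplan_state = stub_exec_init_node(subplan);
}

/* Addition #2: drive the subplan from the recheck callback. */
static int
my_recheck_foreign_scan(MyFdwState *fdw)
{
    return stub_exec_proc_node(fdw->subplan_state);
}

/* Addition #3: shut the subplan down in EndForeignScan. */
static void
my_end_foreign_scan(MyFdwState *fdw)
{
    stub_exec_end_node(fdw->subplan_state);
}
```

The sketch only demonstrates where each of the three calls would land in the FDW callback lifecycle; error handling, parameter changes (chgParam), and rescans are omitted.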
On Thu, Nov 12, 2015 at 12:54 AM, Etsuro Fujita <fujita.etsuro@lab.ntt.co.jp> wrote:
> Really? I think there would be not a little burden on an FDW author;
> when postgres_fdw delegates the subplan to the remote server, for
> example, it would need to create a remote join query by looking at
> tuples possibly fetched and stored in estate->es_epqTuple[], send the
> query and receive the result during the callback routine.
> Furthermore, what I'm most concerned about is that wouldn't be
> efficient. So, my question about that approach is whether FDWs really
> do something like that during the callback routine, instead of
> performing a secondary join plan locally. As I said before, I know
> that KaiGai-san considers that that approach would be useful for
> custom joins. But I see zero evidence that there is a good use-case
> for an FDW.

It could do that. But it could also just invoke a subplan as you are
proposing. Or at least, I think we should set it up so that such a
thing is possible. In which case I don't see the problem.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
On Sun, Nov 8, 2015 at 7:26 PM, Kouhei Kaigai <kaigai@ak.jp.nec.com> wrote:
> The attached patch is an adjusted version of the previous one.
> Even though it co-exists a new callback and fdw_recheck_quals,
> the callback is kicked first as follows.

This seems excessive to me: why would we need an arbitrary-length list
of plans for an FDW? I think we should just allow an outer child and
an inner child, which is probably one more than we'll ever need in
practice.

This looks like an independent bug fix:

+       fscan->fdw_recheck_quals = (List *)
+               fix_upper_expr(root,
+                              (Node *) fscan->fdw_recheck_quals,
+                              itlist,
+                              INDEX_VAR,
+                              rtoffset);
        pfree(itlist);

If so, it should be committed separately and back-patched to 9.5.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
> On Sun, Nov 8, 2015 at 7:26 PM, Kouhei Kaigai <kaigai@ak.jp.nec.com> wrote: > > The attached patch is an adjusted version of the previous one. > > Even though it co-exists a new callback and fdw_recheck_quals, > > the callback is kicked first as follows. > > This seems excessive to me: why would we need an arbitrary-length list > of plans for an FDW? I think we should just allow an outer child and > an inner child, which is probably one more than we'll ever need in > practice. > It just intends to keep code symmetry with custom-scan case, so not a significant reason. And, I expected ForeignScan will also need multiple sub-plans soon to support more intelligent push-down like: http://www.postgresql.org/message-id/9A28C8860F777E439AA12E8AEA7694F8010F47DA@BPXM15GP.gisp.nec.co.jp It is a separate discussion, of course, so I don't have strong preference here. > This looks like an independent bug fix: > > + fscan->fdw_recheck_quals = (List *) > + fix_upper_expr(root, > + (Node *) > fscan->fdw_recheck_quals, > + itlist, > + INDEX_VAR, > + rtoffset); > pfree(itlist); > > If so, it should be committed separately and back-patched to 9.5. > OK, I'll split the patch into two. Thanks, -- NEC Business Creation Division / PG-Strom Project KaiGai Kohei <kaigai@ak.jp.nec.com>
> > On Sun, Nov 8, 2015 at 7:26 PM, Kouhei Kaigai <kaigai@ak.jp.nec.com> wrote: > > > The attached patch is an adjusted version of the previous one. > > > Even though it co-exists a new callback and fdw_recheck_quals, > > > the callback is kicked first as follows. > > > > This seems excessive to me: why would we need an arbitrary-length list > > of plans for an FDW? I think we should just allow an outer child and > > an inner child, which is probably one more than we'll ever need in > > practice. > > > It just intends to keep code symmetry with custom-scan case, so not > a significant reason. > And, I expected ForeignScan will also need multiple sub-plans soon > to support more intelligent push-down like: > http://www.postgresql.org/message-id/9A28C8860F777E439AA12E8AEA7694F8010F47D > A@BPXM15GP.gisp.nec.co.jp > > It is a separate discussion, of course, so I don't have strong preference > here. > > > This looks like an independent bug fix: > > > > + fscan->fdw_recheck_quals = (List *) > > + fix_upper_expr(root, > > + (Node *) > > fscan->fdw_recheck_quals, > > + itlist, > > + INDEX_VAR, > > + rtoffset); > > pfree(itlist); > > > > If so, it should be committed separately and back-patched to 9.5. > > > OK, I'll split the patch into two. > The attached patch is the portion cut from the previous EPQ recheck patch. Regarding the fdw_plans or fdw_plan, I'll follow your suggestion. Thanks, -- NEC Business Creation Division / PG-Strom Project KaiGai Kohei <kaigai@ak.jp.nec.com>
Attachment
On 2015/11/18 3:19, Robert Haas wrote: > On Thu, Nov 12, 2015 at 12:54 AM, Etsuro Fujita > <fujita.etsuro@lab.ntt.co.jp> wrote: >> Really? I think there would be not a little burden on an FDW author; when >> postgres_fdw delegates to the subplan to the remote server, for example, it >> would need to create a remote join query by looking at tuples possibly >> fetched and stored in estate->es_epqTuple[], send the query and receive the >> result during the callback routine. Furthermore, what I'm most concerned >> about is that wouldn't be efficient. So, my question about that approach is >> whether FDWs really do some thing like that during the callback routine, >> instead of performing a secondary join plan locally. As I said before, I >> know that KaiGai-san considers that that approach would be useful for custom >> joins. But I see zero evidence that there is a good use-case for an FDW. > It could do that. But it could also just invoke a subplan as you are > proposing. Or at least, I think we should set it up so that such a > thing is possible. In which case I don't see the problem. I suppose you (and KaiGai-san) are probably right, but I really fail to see it actually doing that. Best regards, Etsuro Fujita
On Tue, Nov 17, 2015 at 6:51 PM, Kouhei Kaigai <kaigai@ak.jp.nec.com> wrote: > It just intends to keep code symmetry with custom-scan case, so not > a significant reason. > And, I expected ForeignScan will also need multiple sub-plans soon > to support more intelligent push-down like: > http://www.postgresql.org/message-id/9A28C8860F777E439AA12E8AEA7694F8010F47DA@BPXM15GP.gisp.nec.co.jp I might be missing something, but why would that require multiple child plans? -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
> On Tue, Nov 17, 2015 at 6:51 PM, Kouhei Kaigai <kaigai@ak.jp.nec.com> wrote: > > It just intends to keep code symmetry with custom-scan case, so not > > a significant reason. > > And, I expected ForeignScan will also need multiple sub-plans soon > > to support more intelligent push-down like: > > > http://www.postgresql.org/message-id/9A28C8860F777E439AA12E8AEA7694F8010F47D > A@BPXM15GP.gisp.nec.co.jp > > I might be missing something, but why would that require multiple child plans? > Apart from EPQ rechecks, the above aggressive push-down idea allows sending the contents of multiple relations to the remote side. In this case, ForeignScan needs to have multiple sub-plans. For example, assume there are three relations: tbl_A and tbl_B are local and small, while tbl_F is remote and large. If both (tbl_A JOIN tbl_F) and (tbl_B JOIN tbl_F) produce a large number of rows and thus consume a considerable amount of network traffic, but (tbl_A JOIN tbl_B JOIN tbl_F) produces a small number of rows, the optimal strategy is to send the local contents to the remote side once, then run a remote query there to produce a relatively small number of rows. At the implementation level, ForeignScan shall represent (tbl_A JOIN tbl_B JOIN tbl_F) and return the joined tuples. Its remote query contains VALUES(...) clauses to pack the contents of tbl_A and tbl_B; thus, it needs to be capable of executing multiple underlying scan plans and fetching their tuples prior to remote query execution. So, ForeignScan may also need multiple sub-plans. Of course, this is an independent feature from the EPQ rechecks. It is not a problem even if we extend this field later. Thanks, -- NEC Business Creation Division / PG-Strom Project KaiGai Kohei <kaigai@ak.jp.nec.com>
On Tue, Nov 17, 2015 at 8:47 PM, Kouhei Kaigai <kaigai@ak.jp.nec.com> wrote: > The attached patch is the portion cut from the previous EPQ recheck > patch. Thanks, committed. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
On Tue, Nov 17, 2015 at 9:30 PM, Etsuro Fujita <fujita.etsuro@lab.ntt.co.jp> wrote: > I suppose you (and KaiGai-san) are probably right, but I really fail to see > it actually doing that. Noted, but let's do it that way and move on. It would be a shame if we didn't end up with a working FDW join pushdown system in 9.6 because of a disagreement on this point. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
On Tue, Nov 17, 2015 at 10:22 PM, Kouhei Kaigai <kaigai@ak.jp.nec.com> wrote: > Apart from EPQ rechecks, the above aggressive push-down idea allows to send > contents of multiple relations to the remote side. In this case, ForeignScan > needs to have multiple sub-plans. > > For example, please assume here is three relations; tbl_A and tbl_B are > local and small, tbl_F is remote and large. > In case when both of (tbl_A JOIN tbl_F) and (tbl_B JOIN tbl_F) produces > large number of rows thus consumes deserved amount of network traffic but > (tbl_A JOIN tbl_B JOIN tbl_F) produce small number of rows, the optimal > strategy is to send local contents to the remote side once then run > a remote query here to produce relatively smaller rows. > In the implementation level, ForeignScan shall represent (tbl_A JOIN tbl_B > JOIN tbl_F), then it returns a bunch of joined tuples. Its remote query > contains VALUES(...) clauses to pack contents of the tbl_A and tbl_B, thus, > it needs to be capable to execute underlying multiple scan plans and fetch > tuples prior to remote query execution. Hmm, maybe. I'm not entirely sure multiple subplans is the best way to implement that, but let's argue about that another day. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
On 2015/11/19 12:34, Robert Haas wrote: > On Tue, Nov 17, 2015 at 9:30 PM, Etsuro Fujita > <fujita.etsuro@lab.ntt.co.jp> wrote: >> I suppose you (and KaiGai-san) are probably right, but I really fail to see >> it actually doing that. > Noted, but let's do it that way and move on. It would be a shame if > we didn't end up with a working FDW join pushdown system in 9.6 > because of a disagreement on this point. Another idea would be to consider join pushdown as unsupported for now when select-for-update is involved in 9.5, as described in [1], and revisit this issue when adding join pushdown to postgres_fdw in 9.6. Best regards, Etsuro Fujita [1] https://wiki.postgresql.org/wiki/Open_Items
> On Tue, Nov 17, 2015 at 10:22 PM, Kouhei Kaigai <kaigai@ak.jp.nec.com> wrote: > > Apart from EPQ rechecks, the above aggressive push-down idea allows to send > > contents of multiple relations to the remote side. In this case, ForeignScan > > needs to have multiple sub-plans. > > > > For example, please assume here is three relations; tbl_A and tbl_B are > > local and small, tbl_F is remote and large. > > In case when both of (tbl_A JOIN tbl_F) and (tbl_B JOIN tbl_F) produces > > large number of rows thus consumes deserved amount of network traffic but > > (tbl_A JOIN tbl_B JOIN tbl_F) produce small number of rows, the optimal > > strategy is to send local contents to the remote side once then run > > a remote query here to produce relatively smaller rows. > > In the implementation level, ForeignScan shall represent (tbl_A JOIN tbl_B > > JOIN tbl_F), then it returns a bunch of joined tuples. Its remote query > > contains VALUES(...) clauses to pack contents of the tbl_A and tbl_B, thus, > > it needs to be capable to execute underlying multiple scan plans and fetch > > tuples prior to remote query execution. > > Hmm, maybe. I'm not entirely sure multiple subplans is the best way > to implement that, but let's argue about that another day. > So, are you suggesting to make a patch that allows ForeignScan to have multiple sub-plans right now? Or, one sub-plan? Thanks, -- NEC Business Creation Division / PG-Strom Project KaiGai Kohei <kaigai@ak.jp.nec.com>
On Thu, Nov 19, 2015 at 6:39 AM, Kouhei Kaigai <kaigai@ak.jp.nec.com> wrote: > So, are you suggesting to make a patch that allows ForeignScan to have > multiple sub-plans right now? Or, one sub-plan? Two: http://www.postgresql.org/message-id/CA+TgmoYZeje+ot1kX4wdoB7R7DPS0CWXAzfqZ-14yKfkgKREAQ@mail.gmail.com -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
On Wed, Nov 18, 2015 at 10:54 PM, Etsuro Fujita <fujita.etsuro@lab.ntt.co.jp> wrote: >> Noted, but let's do it that way and move on. It would be a shame if >> we didn't end up with a working FDW join pushdown system in 9.6 >> because of a disagreement on this point. > > Another idea would be to consider join pushdown as unsupported for now when > select-for-update is involved in 9.5, as described in [1], and revisit this > issue when adding join pushdown to postgres_fdw in 9.6. Well, I think it's probably too late to squeeze this into 9.5 at this point, but I'm eager to get it fixed for 9.6. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
On 2015/11/20 6:57, Robert Haas wrote: > On Wed, Nov 18, 2015 at 10:54 PM, Etsuro Fujita > <fujita.etsuro@lab.ntt.co.jp> wrote: >>> Noted, but let's do it that way and move on. It would be a shame if >>> we didn't end up with a working FDW join pushdown system in 9.6 >>> because of a disagreement on this point. >> Another idea would be to consider join pushdown as unsupported for now when >> select-for-update is involved in 9.5, as described in [1], and revisit this >> issue when adding join pushdown to postgres_fdw in 9.6. > Well, I think it's probably too late to squeeze this into 9.5 at this > point, but I'm eager to get it fixed for 9.6. OK, I'll update the postgres_fdw-join-pushdown patch so as to work with that callback routine, if needed. Best regards, Etsuro Fujita
> On Thu, Nov 19, 2015 at 6:39 AM, Kouhei Kaigai <kaigai@ak.jp.nec.com> wrote: > > So, are you suggesting to make a patch that allows ForeignScan to have > > multiple sub-plans right now? Or, one sub-plan? > > Two: > > http://www.postgresql.org/message-id/CA+TgmoYZeje+ot1kX4wdoB7R7DPS0CWXAzfqZ- > 14yKfkgKREAQ@mail.gmail.com > Hmm. Two is a bit mysterious to me, because two sub-plans (likely) mean this ForeignScan node checks join clauses and reconstructs a joined tuple by itself, but does not check the pushed-down scan clauses (that is the job of the inner/outer scan plans, isn't it?). In this case, how do we treat N-way remote join cases (N>2) if we assume such a capability in the FDW driver? One subplan means the FDW driver runs an entire join sub-tree with a local alternative sub-plan; that is my expectation for the majority case. However, I cannot explain well why two subplans, but not more. Thanks, -- NEC Business Creation Division / PG-Strom Project KaiGai Kohei <kaigai@ak.jp.nec.com>
On 2015/11/19 12:32, Robert Haas wrote: > On Tue, Nov 17, 2015 at 8:47 PM, Kouhei Kaigai <kaigai@ak.jp.nec.com> wrote: >> The attached patch is the portion cut from the previous EPQ recheck >> patch. > Thanks, committed. Thanks, Robert and KaiGai-san. Sorry, I'm a bit late to the party. Here are my questions: * This patch means we can define fdw_recheck_quals even for the case of foreign tables with non-NIL fdw_scan_tlist. However, we discussed in another thread [1] that such foreign tables might break EvalPlanQual tests. Where are we on that issue? * For the case of foreign joins, I think fdw_recheck_quals can be defined for example, the same way as for the case of foreign tables, ie, quals not in scan.plan.qual, or ones defined as "otherclauses" (rinfo->is_pushed_down=true) pushed down to the remote. But since it's required that the FDW has to add to the fdw_scan_tlist the set of columns needed to check quals in fdw_recheck_quals in preparation for EvalPlanQual tests, it's likely that fdw_scan_tlist will end up being long, leading to an increase in a total data transfer amount from the remote. So, that seems not practical to me. Maybe I'm missing something, but what use cases are you thinking? Best regards, Etsuro Fujita [1] http://www.postgresql.org/message-id/55AF3C08.1070409@lab.ntt.co.jp
> On 2015/11/19 12:32, Robert Haas wrote: > > On Tue, Nov 17, 2015 at 8:47 PM, Kouhei Kaigai <kaigai@ak.jp.nec.com> wrote: > >> The attached patch is the portion cut from the previous EPQ recheck > >> patch. > > > Thanks, committed. > > Thanks, Robert and KaiGai-san. > > Sorry, I'm a bit late to the party. Here are my questions: > > * This patch means we can define fdw_recheck_quals even for the case of > foreign tables with non-NIL fdw_scan_tlist. However, we discussed in > another thread [1] that such foreign tables might break EvalPlanQual > tests. Where are we on that issue? > In the case of late locking, RefetchForeignRow() will store a base tuple that has a layout compatible with the base relation, not with fdw_scan_tlist, because RefetchForeignRow() does not have information about the scan node. There are two solutions: 1) do not use fdw_scan_tlist for an FDW that uses the late locking mechanism; 2) have the recheck callback apply a projection to fit fdw_scan_tlist (which would not be difficult for the core to provide as a utility function). Even though we allow fdw_scan_tlist to be set up in simple scan cases, that does not mean it works in all cases. > * For the case of foreign joins, I think fdw_recheck_quals can be > defined for example, the same way as for the case of foreign tables, ie, > quals not in scan.plan.qual, or ones defined as "otherclauses" > (rinfo->is_pushed_down=true) pushed down to the remote. But since it's > required that the FDW has to add to the fdw_scan_tlist the set of > columns needed to check quals in fdw_recheck_quals in preparation for > EvalPlanQual tests, it's likely that fdw_scan_tlist will end up being > long, leading to an increase in a total data transfer amount from the > remote. So, that seems not practical to me. Maybe I'm missing > something, but what use cases are you thinking? > It is a trade-off. What solution do you think we can have? 
To avoid data transfer used only for the EPQ recheck, we could implement the FDW driver to issue the remote join again on EPQ recheck; however, that is not a wise design, is it? If we could have no extra data transfer and no remote join execution during the EPQ recheck, that would be perfect. However, we have to weigh both the advantages and the disadvantages when we choose an implementation. We usually choose the way that has more advantages than disadvantages, but that does not mean it has no disadvantages. Thanks, -- NEC Business Creation Division / PG-Strom Project KaiGai Kohei <kaigai@ak.jp.nec.com>
On Fri, Nov 20, 2015 at 12:11 AM, Kouhei Kaigai <kaigai@ak.jp.nec.com> wrote: >> On Thu, Nov 19, 2015 at 6:39 AM, Kouhei Kaigai <kaigai@ak.jp.nec.com> wrote: >> > So, are you suggesting to make a patch that allows ForeignScan to have >> > multiple sub-plans right now? Or, one sub-plan? >> >> Two: >> >> http://www.postgresql.org/message-id/CA+TgmoYZeje+ot1kX4wdoB7R7DPS0CWXAzfqZ- >> 14yKfkgKREAQ@mail.gmail.com >> > Hmm. Two is a bit mysterious for me because two sub-plans (likely) > means this ForeignScan node checks join clauses and reconstruct > a joined tuple by itself but does not check scan clauses pushed- > down (it is job of inner/outer scan plan, isn't it?). > In this case, how do we treat N-way remote join cases (N>2) if we > assume such a capability in FDW driver? > > One subplan means FDW driver run an entire join sub-tree with local > alternative sub-plan; that is my expectation for the majority case. > However, I cannot explain two subplans, but not multiple, well. What I'm imagining is that we'd add handling that allows the ForeignScan to have inner and outer children. If the FDW wants to delegate the EvalPlanQual handling to a local plan, it can use the outer child for that. Or the inner one, if it likes. The other one is available for some other purposes which we can't imagine yet. If this is too weird, we can only add handling for an outer subplan and forget about having an inner subplan for now. I just thought to make it symmetric, since outer and inner subplans are pretty deeply baked into the structure of the system. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
> On Fri, Nov 20, 2015 at 12:11 AM, Kouhei Kaigai <kaigai@ak.jp.nec.com> wrote: > >> On Thu, Nov 19, 2015 at 6:39 AM, Kouhei Kaigai <kaigai@ak.jp.nec.com> wrote: > >> > So, are you suggesting to make a patch that allows ForeignScan to have > >> > multiple sub-plans right now? Or, one sub-plan? > >> > >> Two: > >> > http://www.postgresql.org/message-id/CA+TgmoYZeje+ot1kX4wdoB7R7DPS0CWXAzfqZ- > >> 14yKfkgKREAQ@mail.gmail.com > >> > > Hmm. Two is a bit mysterious for me because two sub-plans (likely) > > means this ForeignScan node checks join clauses and reconstruct > > a joined tuple by itself but does not check scan clauses pushed- > > down (it is job of inner/outer scan plan, isn't it?). > > In this case, how do we treat N-way remote join cases (N>2) if we > > assume such a capability in FDW driver? > > > > One subplan means FDW driver run an entire join sub-tree with local > > alternative sub-plan; that is my expectation for the majority case. > > However, I cannot explain two subplans, but not multiple, well. > > What I'm imagining is that we'd add handling that allows the > ForeignScan to have inner and outer children. If the FDW wants to > delegate the EvalPlanQual handling to a local plan, it can use the > outer child for that. Or the inner one, if it likes. The other one > is available for some other purposes which we can't imagine yet. If > this is too weird, we can only add handling for an outer subplan and > forget about having an inner subplan for now. > I'd like to agree with the last sentence. Having one sub-plan is better (though second best from my standpoint) than having a fixed two subplans, because ... > I just thought to make > it symmetric, since outer and inner subplans are pretty deeply baked > into the structure of the system. > Yep, if we had a special ForeignJoinPath to handle a join of two foreign tables, it would be natural. However, our choice allows an N-way join at once if the sub-plan consists of three or more foreign tables. 
In this case, ForeignScan (scanrelid==0) can represent a sub-plan that is equivalent to a stack of joins; it looks as if the ForeignScan has inner, outer, and a variable number of "middle" input streams. If and when we assume the ForeignScan has its own join mechanism but processes scan qualifiers via local sub-plans, a fixed number of sub-plans is not sufficient. (Probably a minority case, though.) I'm inclined to put just one outer path at this moment, because the purpose of the FDW sub-plans is the EPQ recheck right now. We will be able to enhance the feature when we implement other things, such as more aggressive join push-down. Thanks, -- NEC Business Creation Division / PG-Strom Project KaiGai Kohei <kaigai@ak.jp.nec.com>
On 2015/11/24 2:41, Robert Haas wrote: > On Fri, Nov 20, 2015 at 12:11 AM, Kouhei Kaigai <kaigai@ak.jp.nec.com> wrote: >> One subplan means FDW driver run an entire join sub-tree with local >> alternative sub-plan; that is my expectation for the majority case. > What I'm imagining is that we'd add handling that allows the > ForeignScan to have inner and outer children. If the FDW wants to > delegate the EvalPlanQual handling to a local plan, it can use the > outer child for that. Or the inner one, if it likes. The other one > is available for some other purposes which we can't imagine yet. If > this is too weird, we can only add handling for an outer subplan and > forget about having an inner subplan for now. I just thought to make > it symmetric, since outer and inner subplans are pretty deeply baked > into the structure of the system. I'd vote for only allowing an outer subplan. Best regards, Etsuro Fujita
On 2015/11/20 22:45, Kouhei Kaigai wrote: I wrote: >> * This patch means we can define fdw_recheck_quals even for the case of >> foreign tables with non-NIL fdw_scan_tlist. However, we discussed in >> another thread [1] that such foreign tables might break EvalPlanQual >> tests. Where are we on that issue? > In case of later locking, RefetchForeignRow() will set a base tuple > that have compatible layout of the base relation, not fdw_scan_tlist, > because RefetchForeignRow() does not have information about scan node. IIUC, I think the base tuple would be stored into EPQ state not only in case of late row locking but in case of early row locking. >> * For the case of foreign joins, I think fdw_recheck_quals can be >> defined for example, the same way as for the case of foreign tables, ie, >> quals not in scan.plan.qual, or ones defined as "otherclauses" >> (rinfo->is_pushed_down=true) pushed down to the remote. But since it's >> required that the FDW has to add to the fdw_scan_tlist the set of >> columns needed to check quals in fdw_recheck_quals in preparation for >> EvalPlanQual tests, it's likely that fdw_scan_tlist will end up being >> long, leading to an increase in a total data transfer amount from the >> remote. So, that seems not practical to me. Maybe I'm missing >> something, but what use cases are you thinking? > It is trade-off. What solution do you think we can have? > To avoid data transfer used for EPQ recheck only, we can implement > FDW driver to issue remote join again on EPQ recheck, however, it > is not a wise design, isn't it? > > If we would be able to have no extra data transfer and no remote > join execution during EPQ recheck, it is a perfect. I was thinking that in an approach using a local join execution plan, I would just set fdw_recheck_quals set to NIL and evaluate the otherclauses as part of the local join execution plan, so that fdw_scan_tlist won't end up being longer, as in the patch [1]. 
(Note that in that patch, remote_exprs==NIL when calling make_foreignscan during postgresGetForeignPlan in case of foreign joins.) Best regards, Etsuro Fujita [1] http://www.postgresql.org/message-id/5624D583.10202@lab.ntt.co.jp
On 2015/11/09 9:26, Kouhei Kaigai wrote: > The attached patch is an adjusted version of the previous one. There seems to be no changes to make_foreignscan. Is that OK? Best regards, Etsuro Fujita
> -----Original Message----- > From: Etsuro Fujita [mailto:fujita.etsuro@lab.ntt.co.jp] > Sent: Tuesday, November 24, 2015 12:45 PM > To: Robert Haas; Kaigai Kouhei(海外 浩平) > Cc: Tom Lane; Kyotaro HORIGUCHI; pgsql-hackers@postgresql.org; Shigeru Hanada > Subject: Re: [HACKERS] Foreign join pushdown vs EvalPlanQual > > On 2015/11/24 2:41, Robert Haas wrote: > > On Fri, Nov 20, 2015 at 12:11 AM, Kouhei Kaigai <kaigai@ak.jp.nec.com> wrote: > >> One subplan means FDW driver run an entire join sub-tree with local > >> alternative sub-plan; that is my expectation for the majority case. > > > What I'm imagining is that we'd add handling that allows the > > ForeignScan to have inner and outer children. If the FDW wants to > > delegate the EvalPlanQual handling to a local plan, it can use the > > outer child for that. Or the inner one, if it likes. The other one > > is available for some other purposes which we can't imagine yet. If > > this is too weird, we can only add handling for an outer subplan and > > forget about having an inner subplan for now. I just thought to make > > it symmetric, since outer and inner subplans are pretty deeply baked > > into the structure of the system. > > I'd vote for only allowing an outer subplan. > The attached patch adds a Path *fdw_outerpath field to the ForeignPath node. The FDW driver can set one arbitrary path node here. After that, this path node shall be transformed into a plan node by createplan.c, then passed to the FDW driver via the GetForeignPlan callback. We expect the FDW driver to set this plan node on the lefttree (a.k.a. outerPlan). Plan->outerPlan is a common field, so the patch size is relatively small. The FDW driver can initialize this plan at BeginForeignScan, then execute this sub-plan tree on demand. The remaining portions are as in the previous version. ExecScanFetch is revised to always call recheckMtd when scanrelid==0, so the FDW driver can get control via the RecheckForeignScan callback. 
It allows the FDW driver to handle (1) the EPQ recheck on underlying scan nodes, (2) reconstruction of the joined tuple, and (3) the EPQ recheck on join clauses, in its preferred way - including execution of an alternative sub-plan. > There seems to be no changes to make_foreignscan. Is that OK? > The change is in create_foreignscan_path(), not in make_foreignscan(). This patch is not tested by actual FDW extensions, so it would be helpful to enhance postgres_fdw to run the alternative sub-plan on EPQ recheck. Thanks, -- NEC Business Creation Division / PG-Strom Project KaiGai Kohei <kaigai@ak.jp.nec.com>
Attachment
Hello, At Thu, 26 Nov 2015 05:04:32 +0000, Kouhei Kaigai <kaigai@ak.jp.nec.com> wrote in <9A28C8860F777E439AA12E8AEA7694F801176205@BPXM15GP.gisp.nec.co.jp> > > On 2015/11/24 2:41, Robert Haas wrote: > > > On Fri, Nov 20, 2015 at 12:11 AM, Kouhei Kaigai <kaigai@ak.jp.nec.com> wrote: > > >> One subplan means FDW driver run an entire join sub-tree with local > > >> alternative sub-plan; that is my expectation for the majority case. > > > > > What I'm imagining is that we'd add handling that allows the > > > ForeignScan to have inner and outer children. If the FDW wants to > > > delegate the EvalPlanQual handling to a local plan, it can use the > > > outer child for that. Or the inner one, if it likes. The other one > > > is available for some other purposes which we can't imagine yet. If > > > this is too weird, we can only add handling for an outer subplan and > > > forget about having an inner subplan for now. I just thought to make > > > it symmetric, since outer and inner subplans are pretty deeply baked > > > into the structure of the system. > > > > I'd vote for only allowing an outer subplan. > > > The attached patch adds: Path *fdw_outerpath field to ForeignPath node. > FDW driver can set arbitrary but one path-node here. It is named "outerpath/plan". Surely we used the term 'outer' in association with other nodes as a design decision, but is it valid to call it outer? In addition to that, there's no innerpath in this patch; we have just "path" instead. > After that, this path-node shall be transformed to plan-node by > createplan.c, then passed to FDW driver using GetForeignPlan callback. > We expect FDW driver set this plan-node on lefttree (a.k.a outerPlan). > The Plan->outerPlan is a common field, so patch size become relatively Plan->outerPlan => Plan->lefttree? > small. FDW driver can initialize this plan at BeginForeignScan, then > execute this sub-plan-tree on demand. > > Remaining portions are as previous version. 
> ExecScanFetch is revised > to call recheckMtd always when scanrelid==0, then FDW driver can get > control using RecheckForeignScan callback. Perhaps we need a comment about foreignscan as a fake join for the case with scanrelid == 0 in ExecScanReScan. > It allows FDW driver to handle (1) EPQ recheck on underlying scan nodes, > (2) reconstruction of joined tuple, and (3) EPQ recheck on join clauses, > by its preferable implementation - including execution of an alternative > sub-plan. In ForeignRecheck, ExecQual on fdw_recheck_quals is executed if RecheckForeignScan returns true, which I think is the signal that the returned tuple matches the recheck qual. Which do you think has the responsibility to check the reconstructed tuple when RecheckForeignScan is registered: RecheckForeignScan or ExecQual? The documentation says the following, so I think the former has it. # I don't understand what 'can or must' means, though... 'can and # must'? + Also, this callback can or must recheck scan qualifiers and join + conditions which are pushed down. Especially, it needs special > > There seems to be no changes to make_foreignscan. Is that OK? > > > create_foreignscan_path(), not only make_foreignscan(). > > This patch is not tested by actual FDW extensions, so it is helpful > to enhance postgres_fdw to run the alternative sub-plan on EPQ recheck. regards, -- Kyotaro Horiguchi NTT Open Source Software Center
On 2015/11/26 14:04, Kouhei Kaigai wrote: >> On 2015/11/24 2:41, Robert Haas wrote: >>> On Fri, Nov 20, 2015 at 12:11 AM, Kouhei Kaigai <kaigai@ak.jp.nec.com> wrote: >>>> One subplan means FDW driver run an entire join sub-tree with local >>>> alternative sub-plan; that is my expectation for the majority case. >>> What I'm imagining is that we'd add handling that allows the >>> ForeignScan to have inner and outer children. If the FDW wants to >>> delegate the EvalPlanQual handling to a local plan, it can use the >>> outer child for that. Or the inner one, if it likes. The other one >>> is available for some other purposes which we can't imagine yet. If >>> this is too weird, we can only add handling for an outer subplan and >>> forget about having an inner subplan for now. I just thought to make >>> it symmetric, since outer and inner subplans are pretty deeply baked >>> into the structure of the system. >> I'd vote for only allowing an outer subplan. > The attached patch adds: Path *fdw_outerpath field to ForeignPath node. > FDW driver can set arbitrary but one path-node here. > After that, this path-node shall be transformed to plan-node by > createplan.c, then passed to FDW driver using GetForeignPlan callback. I understand this, as I also did the same thing in my patches, but actually, that seems a bit complicated to me. Instead, could we keep the fdw_outerpath in the fdw_private of a ForeignPath node when creating the path node during GetForeignPaths, and then create an outerplan accordingly from the fdw_outerpath stored into the fdw_private during GetForeignPlan, by using create_plan_recurse there? I think that would make the core involvement much simpler. > We expect FDW driver set this plan-node on lefttree (a.k.a outerPlan). > The Plan->outerPlan is a common field, so patch size become relatively > small. FDW driver can initialize this plan at BeginForeignScan, then > execute this sub-plan-tree on demand. 
Another idea would be to add the core support for initializing/closing/rescanning the outerplan tree when the tree is given.

> Remaining portions are as previous version. ExecScanFetch is revised
> to call recheckMtd always when scanrelid==0, then FDW driver can get
> control using RecheckForeignScan callback.
> It allows FDW driver to handle (1) EPQ recheck on underlying scan nodes,
> (2) reconstruction of joined tuple, and (3) EPQ recheck on join clauses,
> by its preferable implementation - including execution of an alternative
> sub-plan.

@@ -85,6 +86,18 @@ ForeignRecheck(ForeignScanState *node, TupleTableSlot *slot)

     ResetExprContext(econtext);

+    /*
+     * FDW driver has to recheck visibility of EPQ tuple towards
+     * the scan qualifiers once it gets pushed down.
+     * In addition, if this node represents a join sub-tree, not
+     * a scan, FDW driver is also responsible to reconstruct
+     * a joined tuple according to the primitive EPQ tuples.
+     */
+    if (fdwroutine->RecheckForeignScan)
+    {
+        if (!fdwroutine->RecheckForeignScan(node, slot))
+            return false;
+    }

Maybe I'm missing something, but I think we should let FDW do the work if scanrelid==0, not just if fdwroutine->RecheckForeignScan is given. (And if scanrelid==0 and fdwroutine->RecheckForeignScan is not given, we should abort the transaction.)
Another thing I'm concerned about is

@@ -347,8 +355,26 @@ ExecScanReScan(ScanState *node)
     {
         Index scanrelid = ((Scan *) node->ps.plan)->scanrelid;

-        Assert(scanrelid > 0);
+        if (scanrelid > 0)
+            estate->es_epqScanDone[scanrelid - 1] = false;
+        else
+        {
+            Bitmapset  *relids;
+            int         rtindex = -1;
+
+            if (IsA(node->ps.plan, ForeignScan))
+                relids = ((ForeignScan *) node->ps.plan)->fs_relids;
+            else if (IsA(node->ps.plan, CustomScan))
+                relids = ((CustomScan *) node->ps.plan)->custom_relids;
+            else
+                elog(ERROR, "unexpected scan node: %d",
+                     (int) nodeTag(node->ps.plan));

-        estate->es_epqScanDone[scanrelid - 1] = false;
+            while ((rtindex = bms_next_member(relids, rtindex)) >= 0)
+            {
+                Assert(rtindex > 0);
+                estate->es_epqScanDone[rtindex - 1] = false;
+            }
+        }
     }

That seems the outerplan's business to me, so I think it'd be better to just return, right before the assertion, as I said before. Seen from another angle, ISTM that FDWs that don't use a local join execution plan wouldn't need to be aware of handling the es_epqScanDone flags. (Do you think that such FDWs should do something like what ExecScanFetch is doing about the flags, in their RecheckForeignScan callbacks? If so, I think we need docs for that.)

>> There seems to be no changes to make_foreignscan. Is that OK?
> create_foreignscan_path(), not only make_foreignscan().

OK

> This patch is not tested by actual FDW extensions, so it is helpful
> to enhance postgres_fdw to run the alternative sub-plan on EPQ recheck.

Will do.

Best regards,
Etsuro Fujita
> At Thu, 26 Nov 2015 05:04:32 +0000, Kouhei Kaigai <kaigai@ak.jp.nec.com> wrote
> in <9A28C8860F777E439AA12E8AEA7694F801176205@BPXM15GP.gisp.nec.co.jp>
> > > On 2015/11/24 2:41, Robert Haas wrote:
> > > > On Fri, Nov 20, 2015 at 12:11 AM, Kouhei Kaigai <kaigai@ak.jp.nec.com> wrote:
> > > >> One subplan means FDW driver run an entire join sub-tree with local
> > > >> alternative sub-plan; that is my expectation for the majority case.
> > > > What I'm imagining is that we'd add handling that allows the
> > > > ForeignScan to have inner and outer children. If the FDW wants to
> > > > delegate the EvalPlanQual handling to a local plan, it can use the
> > > > outer child for that. Or the inner one, if it likes. The other one
> > > > is available for some other purposes which we can't imagine yet. If
> > > > this is too weird, we can only add handling for an outer subplan and
> > > > forget about having an inner subplan for now. I just thought to make
> > > > it symmetric, since outer and inner subplans are pretty deeply baked
> > > > into the structure of the system.
> > > I'd vote for only allowing an outer subplan.
> > The attached patch adds: Path *fdw_outerpath field to ForeignPath node.
> > FDW driver can set arbitrary but one path-node here.
>
> It is named "outerpath/plan". Surely we used the term 'outer' in
> association with other nodes for design decisions, but is it valid
> to call it outer? In addition to that, there's no innerpath in this
> patch, which has "path" instead.

Just "path" is too simple, and does not inform people of the expected usage of the node. If we would assign another name, my preference is "fdw_subpath" or "fdw_altpath".

> > After that, this path-node shall be transformed to plan-node by
> > createplan.c, then passed to FDW driver using GetForeignPlan callback.
> > We expect FDW driver set this plan-node on lefttree (a.k.a outerPlan).
> > The Plan->outerPlan is a common field, so patch size become relatively

> Plan->outerPlan => Plan->lefttree?

Yes, s/outerPlan/lefttree/g

> > small. FDW driver can initialize this plan at BeginForeignScan, then
> > execute this sub-plan-tree on demand.
> >
> > Remaining portions are as previous version. ExecScanFetch is revised
> > to call recheckMtd always when scanrelid==0, then FDW driver can get
> > control using RecheckForeignScan callback.

> Perhaps we need a comment about foreignscan as a fake join for
> the case with scanrelid == 0 in ExecScanReScan.

Indeed.

> > It allows FDW driver to handle (1) EPQ recheck on underlying scan nodes,
> > (2) reconstruction of joined tuple, and (3) EPQ recheck on join clauses,
> > by its preferable implementation - including execution of an alternative
> > sub-plan.

> In ForeignRecheck, ExecQual on fdw_recheck_quals is executed if
> RecheckForeignScan returns true, which I think is the signal that
> the returned tuple matches the recheck qual. Which do you think
> has the responsibility to check the reconstructed tuple when
> RecheckForeignScan is registered: RecheckForeignScan or ExecQual?

Only RecheckForeignScan can reconstruct a joined tuple. On the other hand, both facilities can recheck scan-qualifiers and join-clauses. The FDW author can choose a design according to his preference. If fdw_recheck_quals==NIL, the FDW can apply all the rechecks within the RecheckForeignScan callback.

> The documentation says the following, so I think the former has it.
>
> # I don't understand what 'can or must' means, though... 'can and
> # must'?
>
> + Also, this callback can or must recheck scan qualifiers and join
> + conditions which are pushed down. Especially, it needs special

If fdw_recheck_quals is set up correctly and the join type is an inner join, the FDW driver does not need to recheck by itself. Otherwise, it has to recheck the joined tuple, not only reconstruct it. I will try to revise the SGML stuff.
> > > There seems to be no changes to make_foreignscan. Is that OK?
> > create_foreignscan_path(), not only make_foreignscan().
>
> > This patch is not tested by actual FDW extensions, so it is helpful
> > to enhance postgres_fdw to run the alternative sub-plan on EPQ recheck.

regards,
--
NEC Business Creation Division / PG-Strom Project
KaiGai Kohei <kaigai@ak.jp.nec.com>
> On 2015/11/26 14:04, Kouhei Kaigai wrote:
> >> On 2015/11/24 2:41, Robert Haas wrote:
> >>> On Fri, Nov 20, 2015 at 12:11 AM, Kouhei Kaigai <kaigai@ak.jp.nec.com> wrote:
> >>>> One subplan means FDW driver run an entire join sub-tree with local
> >>>> alternative sub-plan; that is my expectation for the majority case.
> >>> What I'm imagining is that we'd add handling that allows the
> >>> ForeignScan to have inner and outer children. If the FDW wants to
> >>> delegate the EvalPlanQual handling to a local plan, it can use the
> >>> outer child for that. Or the inner one, if it likes. The other one
> >>> is available for some other purposes which we can't imagine yet. If
> >>> this is too weird, we can only add handling for an outer subplan and
> >>> forget about having an inner subplan for now. I just thought to make
> >>> it symmetric, since outer and inner subplans are pretty deeply baked
> >>> into the structure of the system.
> >> I'd vote for only allowing an outer subplan.
> > The attached patch adds: Path *fdw_outerpath field to ForeignPath node.
> > FDW driver can set arbitrary but one path-node here.
> > After that, this path-node shall be transformed to plan-node by
> > createplan.c, then passed to FDW driver using GetForeignPlan callback.
>
> I understand this, as I also did the same thing in my patches, but
> actually, that seems a bit complicated to me. Instead, could we keep
> the fdw_outerpath in the fdw_private of a ForeignPath node when creating
> the path node during GetForeignPaths, and then create an outerplan
> accordingly from the fdw_outerpath stored in the fdw_private during
> GetForeignPlan, by using create_plan_recurse there? I think that that
> would make the core involvement much simpler.

How can an extension use create_plan_recurse? It is a static function.

> > We expect FDW driver set this plan-node on lefttree (a.k.a outerPlan).
> > The Plan->outerPlan is a common field, so patch size become relatively
> > small.
> > FDW driver can initialize this plan at BeginForeignScan, then
> > execute this sub-plan-tree on demand.
>
> Another idea would be to add the core support for
> initializing/closing/rescanning the outerplan tree when the tree is given.

No. Please don't repeat the same discussion again.

> > Remaining portions are as previous version. ExecScanFetch is revised
> > to call recheckMtd always when scanrelid==0, then FDW driver can get
> > control using RecheckForeignScan callback.
> > It allows FDW driver to handle (1) EPQ recheck on underlying scan nodes,
> > (2) reconstruction of joined tuple, and (3) EPQ recheck on join clauses,
> > by its preferable implementation - including execution of an alternative
> > sub-plan.
>
> @@ -85,6 +86,18 @@ ForeignRecheck(ForeignScanState *node, TupleTableSlot *slot)
>
>      ResetExprContext(econtext);
>
> +    /*
> +     * FDW driver has to recheck visibility of EPQ tuple towards
> +     * the scan qualifiers once it gets pushed down.
> +     * In addition, if this node represents a join sub-tree, not
> +     * a scan, FDW driver is also responsible to reconstruct
> +     * a joined tuple according to the primitive EPQ tuples.
> +     */
> +    if (fdwroutine->RecheckForeignScan)
> +    {
> +        if (!fdwroutine->RecheckForeignScan(node, slot))
> +            return false;
> +    }
>
> Maybe I'm missing something, but I think we should let FDW do the work
> if scanrelid==0, not just if fdwroutine->RecheckForeignScan is given.
> (And if scanrelid==0 and fdwroutine->RecheckForeignScan is not given, we
> should abort the transaction.)

It should be an Assert(). A node with scanrelid==0 never happens unless the FDW driver explicitly adds such a path.
> Another thing I'm concerned about is
>
> @@ -347,8 +355,26 @@ ExecScanReScan(ScanState *node)
>      {
>          Index scanrelid = ((Scan *) node->ps.plan)->scanrelid;
>
> -        Assert(scanrelid > 0);
> +        if (scanrelid > 0)
> +            estate->es_epqScanDone[scanrelid - 1] = false;
> +        else
> +        {
> +            Bitmapset  *relids;
> +            int         rtindex = -1;
> +
> +            if (IsA(node->ps.plan, ForeignScan))
> +                relids = ((ForeignScan *) node->ps.plan)->fs_relids;
> +            else if (IsA(node->ps.plan, CustomScan))
> +                relids = ((CustomScan *) node->ps.plan)->custom_relids;
> +            else
> +                elog(ERROR, "unexpected scan node: %d",
> +                     (int) nodeTag(node->ps.plan));
>
> -        estate->es_epqScanDone[scanrelid - 1] = false;
> +            while ((rtindex = bms_next_member(relids, rtindex)) >= 0)
> +            {
> +                Assert(rtindex > 0);
> +                estate->es_epqScanDone[rtindex - 1] = false;
> +            }
> +        }
>      }
>
> That seems the outerplan's business to me, so I think it'd be better to
> just return, right before the assertion, as I said before. Seen from
> another angle, ISTM that FDWs that don't use a local join execution plan
> wouldn't need to be aware of handling the es_epqScanDone flags. (Do you
> think that such FDWs should do something like what ExecScanFetch is doing
> about the flags, in their RecheckForeignScan callbacks? If so, I think we
> need docs for that.)

Execution of the alternative local subplan (outerplan) is discretionary. We have to pay attention to FDW drivers which handle the EPQ recheck by themselves. Even though you argue that the callback can violate the state of the es_epqScanDone flags, it is safe to follow the existing behavior.

> >> There seems to be no changes to make_foreignscan. Is that OK?
> > create_foreignscan_path(), not only make_foreignscan().
>
> OK
>
> > This patch is not tested by actual FDW extensions, so it is helpful
> > to enhance postgres_fdw to run the alternative sub-plan on EPQ recheck.
>
> Will do.

--
NEC Business Creation Division / PG-Strom Project
KaiGai Kohei <kaigai@ak.jp.nec.com>
On 2015/11/27 0:14, Kouhei Kaigai wrote: >> The documentation says as following so I think the former has. >> >> # I don't understhad what 'can or must' means, though... 'can and >> # must'? >> >> + Also, this callback can or must recheck scan qualifiers and join >> + conditions which are pushed down. Especially, it needs special > If fdw_recheck_quals is set up correctly and join type is inner join, > FDW driver does not recheck by itself. Elsewhere, it has to recheck > the joined tuple, not only reconstruction. Sorry, I don't understand this. In my understanding, fdw_recheck_quals can be defined for a foreign join, regardless of the join type, and when the fdw_recheck_quals are defined, the RecheckForeignScan callback routine doesn't need to evaluate the fdw_recheck_quals by itself. No? Best regards, Etsuro Fujita
> -----Original Message-----
> From: Etsuro Fujita [mailto:fujita.etsuro@lab.ntt.co.jp]
> Sent: Friday, November 27, 2015 2:40 PM
> To: Kaigai Kouhei(海外 浩平); Kyotaro HORIGUCHI
> Cc: robertmhaas@gmail.com; tgl@sss.pgh.pa.us; pgsql-hackers@postgresql.org; shigeru.hanada@gmail.com
> Subject: Re: [HACKERS] Foreign join pushdown vs EvalPlanQual
>
> On 2015/11/27 0:14, Kouhei Kaigai wrote:
>
> >> The documentation says as following so I think the former has.
> >>
> >> # I don't understand what 'can or must' means, though... 'can and
> >> # must'?
> >>
> >> + Also, this callback can or must recheck scan qualifiers and join
> >> + conditions which are pushed down. Especially, it needs special
>
> > If fdw_recheck_quals is set up correctly and join type is inner join,
> > FDW driver does not recheck by itself. Elsewhere, it has to recheck
> > the joined tuple, not only reconstruction.
>
> Sorry, I don't understand this. In my understanding, fdw_recheck_quals
> can be defined for a foreign join, regardless of the join type,

Yes, they "can be defined", but they will not be workable if either side of the joined tuple is NULL because of an outer join. SQL functions return NULL on NULL input without being evaluated, and ExecQual() treats this result as FALSE. However, a joined tuple that has NULL fields may still be a valid tuple. We don't need to care about unmatched tuples in the case of INNER JOIN.

> and when
> the fdw_recheck_quals are defined, the RecheckForeignScan callback
> routine doesn't need to evaluate the fdw_recheck_quals by itself. No?

Yes, it does not need to run fdw_recheck_quals by itself (if those quals can guarantee correct results for any corner cases). Of course, if the FDW driver keeps expressions for scan-qualifiers and join-clauses in another place (like fdw_exprs), it is the FDW driver's responsibility to execute them, regardless of fdw_recheck_quals.

Thanks,
--
NEC Business Creation Division / PG-Strom Project
KaiGai Kohei <kaigai@ak.jp.nec.com>
On 2015/11/27 0:14, Kouhei Kaigai wrote:
>> On 2015/11/26 14:04, Kouhei Kaigai wrote:
>>> The attached patch adds: Path *fdw_outerpath field to ForeignPath node.
>>> FDW driver can set arbitrary but one path-node here.
>>> After that, this path-node shall be transformed to plan-node by
>>> createplan.c, then passed to FDW driver using GetForeignPlan callback.

>> I understand this, as I also did the same thing in my patches, but
>> actually, that seems a bit complicated to me. Instead, could we keep
>> the fdw_outerpath in the fdw_private of a ForeignPath node when creating
>> the path node during GetForeignPaths, and then create an outerplan
>> accordingly from the fdw_outerpath stored in the fdw_private during
>> GetForeignPlan, by using create_plan_recurse there? I think that that
>> would make the core involvement much simpler.

> How to use create_plan_recurse by extension? It is a static function.

I was just thinking of a change to make that function extern.

>>> We expect FDW driver set this plan-node on lefttree (a.k.a outerPlan).
>>> The Plan->outerPlan is a common field, so patch size become relatively
>>> small. FDW driver can initialize this plan at BeginForeignScan, then
>>> execute this sub-plan-tree on demand.

>> Another idea would be to add the core support for
>> initializing/closing/rescanning the outerplan tree when the tree is given.

> No. Please don't repeat the same discussion again.

IIUC, I think your point is to allow FDWs to do something else, instead of performing a local join execution plan, during RecheckForeignScan. So, what's wrong with the core doing that support in that case?

>> @@ -85,6 +86,18 @@ ForeignRecheck(ForeignScanState *node, TupleTableSlot *slot)
>>
>>      ResetExprContext(econtext);
>>
>> +    /*
>> +     * FDW driver has to recheck visibility of EPQ tuple towards
>> +     * the scan qualifiers once it gets pushed down.
>> +     * In addition, if this node represents a join sub-tree, not
>> +     * a scan, FDW driver is also responsible to reconstruct
>> +     * a joined tuple according to the primitive EPQ tuples.
>> +     */
>> +    if (fdwroutine->RecheckForeignScan)
>> +    {
>> +        if (!fdwroutine->RecheckForeignScan(node, slot))
>> +            return false;
>> +    }
>>
>> Maybe I'm missing something, but I think we should let FDW do the work
>> if scanrelid==0, not just if fdwroutine->RecheckForeignScan is given.
>> (And if scanrelid==0 and fdwroutine->RecheckForeignScan is not given, we
>> should abort the transaction.)

> It should be an Assert(). A node with scanrelid==0 never happens
> unless the FDW driver explicitly adds such a path.

That's an idea. But the abort seems to me more consistent with other places (see e.g., RefetchForeignRow in EvalPlanQualFetchRowMarks).

>> Another thing I'm concerned about is
>>
>> @@ -347,8 +355,26 @@ ExecScanReScan(ScanState *node)
>>      {
>>          Index scanrelid = ((Scan *) node->ps.plan)->scanrelid;
>>
>> -        Assert(scanrelid > 0);
>> +        if (scanrelid > 0)
>> +            estate->es_epqScanDone[scanrelid - 1] = false;
>> +        else
>> +        {
>> +            Bitmapset  *relids;
>> +            int         rtindex = -1;
>> +
>> +            if (IsA(node->ps.plan, ForeignScan))
>> +                relids = ((ForeignScan *) node->ps.plan)->fs_relids;
>> +            else if (IsA(node->ps.plan, CustomScan))
>> +                relids = ((CustomScan *) node->ps.plan)->custom_relids;
>> +            else
>> +                elog(ERROR, "unexpected scan node: %d",
>> +                     (int) nodeTag(node->ps.plan));
>>
>> -        estate->es_epqScanDone[scanrelid - 1] = false;
>> +            while ((rtindex = bms_next_member(relids, rtindex)) >= 0)
>> +            {
>> +                Assert(rtindex > 0);
>> +                estate->es_epqScanDone[rtindex - 1] = false;
>> +            }
>> +        }
>>      }
>>
>> That seems the outerplan's business to me, so I think it'd be better to
>> just return, right before the assertion, as I said before. Seen from
>> another angle, ISTM that FDWs that don't use a local join execution plan
>> wouldn't need to be aware of handling the es_epqScanDone flags. (Do you
>> think that such FDWs should do something like what ExecScanFetch is doing
>> about the flags, in their RecheckForeignScan callbacks? If so, I think we
>> need docs for that.)

> Execution of the alternative local subplan (outerplan) is discretionary.
> We have to pay attention to FDW drivers which handle the EPQ recheck by
> themselves. Even though you argue that the callback can violate the state
> of the es_epqScanDone flags, it is safe to follow the existing behavior.

So, I think the documentation needs more work.

Yet another thing that I'm concerned about is

@@ -3747,7 +3754,8 @@ make_foreignscan(List *qptlist,
                  List *fdw_exprs,
                  List *fdw_private,
                  List *fdw_scan_tlist,
-                 List *fdw_recheck_quals)
+                 List *fdw_recheck_quals,
+                 Plan *fdw_outerplan)
 {
     ForeignScan *node = makeNode(ForeignScan);
     Plan       *plan = &node->scan.plan;

@@ -3755,7 +3763,7 @@ make_foreignscan(List *qptlist,
     /* cost will be filled in by create_foreignscan_plan */
     plan->targetlist = qptlist;
     plan->qual = qpqual;
-    plan->lefttree = NULL;
+    plan->lefttree = fdw_outerplan;
     plan->righttree = NULL;
     node->scan.scanrelid = scanrelid;

I think that would break the EXPLAIN output. One option to avoid that is to set the fdw_outerplan in ExecInitForeignScan as in my patch [1], or BeginForeignScan as you proposed. That breaks the equivalence that the Plan tree and the PlanState tree should be mirror images of each other, but I think that break would be harmless.

Best regards,
Etsuro Fujita

[1] http://www.postgresql.org/message-id/55DEF5F0.308@lab.ntt.co.jp
On Thu, Nov 26, 2015 at 7:59 AM, Etsuro Fujita <fujita.etsuro@lab.ntt.co.jp> wrote:
>> The attached patch adds: Path *fdw_outerpath field to ForeignPath node.
>> FDW driver can set arbitrary but one path-node here.
>> After that, this path-node shall be transformed to plan-node by
>> createplan.c, then passed to FDW driver using GetForeignPlan callback.
>
> I understand this, as I also did the same thing in my patches, but actually,
> that seems a bit complicated to me. Instead, could we keep the
> fdw_outerpath in the fdw_private of a ForeignPath node when creating the
> path node during GetForeignPaths, and then create an outerplan accordingly
> from the fdw_outerpath stored in the fdw_private during GetForeignPlan, by
> using create_plan_recurse there? I think that that would make the core
> involvement much simpler.

I can't see how it's going to get much simpler than this. The core code is well under a hundred lines, and it all looks pretty straightforward to me. All of our existing path and plan types keep lists of paths and plans separate from other kinds of data, and I don't think we're going to win any awards for deviating from that principle here.

> @@ -85,6 +86,18 @@ ForeignRecheck(ForeignScanState *node, TupleTableSlot *slot)
>
>      ResetExprContext(econtext);
>
> +    /*
> +     * FDW driver has to recheck visibility of EPQ tuple towards
> +     * the scan qualifiers once it gets pushed down.
> +     * In addition, if this node represents a join sub-tree, not
> +     * a scan, FDW driver is also responsible to reconstruct
> +     * a joined tuple according to the primitive EPQ tuples.
> +     */
> +    if (fdwroutine->RecheckForeignScan)
> +    {
> +        if (!fdwroutine->RecheckForeignScan(node, slot))
> +            return false;
> +    }
>
> Maybe I'm missing something, but I think we should let FDW do the work if
> scanrelid==0, not just if fdwroutine->RecheckForeignScan is given. (And if
> scanrelid==0 and fdwroutine->RecheckForeignScan is not given, we should
> abort the transaction.)
That would be unnecessarily restrictive. On the one hand, even if scanrelid != 0, the FDW can decide that it prefers to do the rechecks using RecheckForeignScan rather than fdw_recheck_quals. For most FDWs, I expect using fdw_recheck_quals to be more convenient, but there may be cases where somebody prefers to use RecheckForeignScan, and allowing that costs nothing. On the flip side, an FDW could choose to support join pushdown but not worry about EPQ rechecks: it can just refuse to push down joins when any rowmarks are present. Requiring the FDW author to supply a dummy RecheckForeignScan method in that case is pointless. So I think KaiGai's check is exactly right.

> Another thing I'm concerned about is
>
> @@ -347,8 +355,26 @@ ExecScanReScan(ScanState *node)
>      {
>          Index scanrelid = ((Scan *) node->ps.plan)->scanrelid;
>
> -        Assert(scanrelid > 0);
> +        if (scanrelid > 0)
> +            estate->es_epqScanDone[scanrelid - 1] = false;
> +        else
> +        {
> +            Bitmapset  *relids;
> +            int         rtindex = -1;
> +
> +            if (IsA(node->ps.plan, ForeignScan))
> +                relids = ((ForeignScan *) node->ps.plan)->fs_relids;
> +            else if (IsA(node->ps.plan, CustomScan))
> +                relids = ((CustomScan *) node->ps.plan)->custom_relids;
> +            else
> +                elog(ERROR, "unexpected scan node: %d",
> +                     (int) nodeTag(node->ps.plan));
>
> -        estate->es_epqScanDone[scanrelid - 1] = false;
> +            while ((rtindex = bms_next_member(relids, rtindex)) >= 0)
> +            {
> +                Assert(rtindex > 0);
> +                estate->es_epqScanDone[rtindex - 1] = false;
> +            }
> +        }
>      }
>
> That seems the outerplan's business to me, so I think it'd be better to just
> return, right before the assertion, as I said before. Seen from another
> angle, ISTM that FDWs that don't use a local join execution plan wouldn't
> need to be aware of handling the es_epqScanDone flags. (Do you think that
> such FDWs should do something like what ExecScanFetch is doing about the
> flags, in their RecheckForeignScan callbacks? If so, I think we need docs
> for that.)
I noticed this too when reviewing KaiGai's patch, but ultimately I think the way KaiGai has it is fine. It may not be useful in some cases, but AFAICS it should be harmless. >> This patch is not tested by actual FDW extensions, so it is helpful >> to enhance postgres_fdw to run the alternative sub-plan on EPQ recheck. > > Will do. That would be great. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
On Fri, Nov 27, 2015 at 1:33 AM, Etsuro Fujita <fujita.etsuro@lab.ntt.co.jp> wrote:
>     Plan       *plan = &node->scan.plan;
>
> @@ -3755,7 +3763,7 @@ make_foreignscan(List *qptlist,
>     /* cost will be filled in by create_foreignscan_plan */
>     plan->targetlist = qptlist;
>     plan->qual = qpqual;
> -   plan->lefttree = NULL;
> +   plan->lefttree = fdw_outerplan;
>     plan->righttree = NULL;
>     node->scan.scanrelid = scanrelid;
>
> I think that that would break the EXPLAIN output.

In what way? EXPLAIN recurses into the left and right trees of every plan node regardless of what type it is, so superficially I feel like this ought to just work. What problem do you foresee? I do think that ExecInitForeignScan ought to be changed to ExecInitNode on its outer plan if present rather than leaving that to the FDW's BeginForeignScan method.

> One option to avoid that
> is to set the fdw_outerplan in ExecInitForeignScan as in my patch [1], or
> BeginForeignScan as you proposed. That breaks the equivalence that the Plan
> tree and the PlanState tree should be mirror images of each other, but I
> think that that break would be harmless.

I'm not sure how many times I have to say this, but we are not doing that. I will not commit any patch that does that, and I will vigorously argue against anyone else committing such a patch either. That *would* break EXPLAIN, because EXPLAIN relies on being able to walk the PlanState tree and find all the Plan nodes from the corresponding PlanState nodes. Now you might think that it would be OK to omit a plan node that we decided we weren't ever going to execute, but today we don't do that, and I don't think we should. I think it could be very confusing if EXPLAIN and EXPLAIN ANALYZE showed different sets of plan nodes for the same query. Quite apart from EXPLAIN, there are numerous other places that assume that they can walk the PlanState tree and find all the Plan nodes. Breaking that assumption would be bad news.
-- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
On Fri, Nov 27, 2015 at 1:25 AM, Kouhei Kaigai <kaigai@ak.jp.nec.com> wrote: >> Sorry, I don't understand this. In my understanding, fdw_recheck_quals >> can be defined for a foreign join, regardless of the join type, >> > Yes, "can be defined", but will not be workable if either side of > joined tuple is NULL because of outer join. SQL functions returns > NULL prior to evaluation, then ExecQual() treats this result as FALSE. > However, a joined tuple that has NULL fields may be a valid tuple. > > We don't need to care about unmatched tuple if INNER JOIN. This is a really good point, and a very strong argument for the design KaiGai has chosen here. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Robert Haas <robertmhaas@gmail.com> writes: > On Fri, Nov 27, 2015 at 1:33 AM, Etsuro Fujita >> One option to avoid that >> is to set the fdw_outerplan in ExecInitForeignScan as in my patch [1], or >> BeginForeignScan as you proposed. That breaks the equivalence that the Plan >> tree and the PlanState tree should be mirror images of each other, but I >> think that that break would be harmless. > I'm not sure how many times I have to say this, but we are not doing > that. I will not commit any patch that does that, and I will > vigorously argue against anyone else committing such a patch either. And I'll back him up. That's a horrible idea. You're proposing to break a very fundamental structural property for the convenience of one little corner of the system. regards, tom lane
On Thu, Nov 26, 2015 at 12:04 AM, Kouhei Kaigai <kaigai@ak.jp.nec.com> wrote: > The attached patch adds: Path *fdw_outerpath field to ForeignPath node. > FDW driver can set arbitrary but one path-node here. > After that, this path-node shall be transformed to plan-node by > createplan.c, then passed to FDW driver using GetForeignPlan callback. > We expect FDW driver set this plan-node on lefttree (a.k.a outerPlan). > The Plan->outerPlan is a common field, so patch size become relatively > small. FDW driver can initialize this plan at BeginForeignScan, then > execute this sub-plan-tree on demand. > > Remaining portions are as previous version. ExecScanFetch is revised > to call recheckMtd always when scanrelid==0, then FDW driver can get > control using RecheckForeignScan callback. > It allows FDW driver to handle (1) EPQ recheck on underlying scan nodes, > (2) reconstruction of joined tuple, and (3) EPQ recheck on join clauses, > by its preferable implementation - including execution of an alternative > sub-plan. > >> There seems to be no changes to make_foreignscan. Is that OK? >> > create_foreignscan_path(), not only make_foreignscan(). > > This patch is not tested by actual FDW extensions, so it is helpful > to enhance postgres_fdw to run the alternative sub-plan on EPQ recheck. I have done some editing and some small revisions on this patch. Here's what I came up with. The revisions are mostly cosmetic, but I revised it a bit so that the signature of GetForeignPlan need not change. Also, I made nodeForeignScan.c do some of the outer plan handling automatically, and I fixed the compile breaks in contrib/file_fdw and contrib/postgres_fdw. Comments/review/testing are very welcome. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Hello, thank you for taking time for this.

At Tue, 1 Dec 2015 14:56:54 -0500, Robert Haas <robertmhaas@gmail.com> wrote in <CA+TgmoY+1Cq0bjXBP+coeKtkOMbpUMVQsfL2fJQY+ws7Nu=wgg@mail.gmail.com>
> On Thu, Nov 26, 2015 at 12:04 AM, Kouhei Kaigai <kaigai@ak.jp.nec.com> wrote:
> > This patch is not tested by actual FDW extensions, so it is helpful
> > to enhance postgres_fdw to run the alternative sub-plan on EPQ recheck.
>
> I have done some editing and some small revisions on this patch.
> Here's what I came up with. The revisions are mostly cosmetic, but I
> revised it a bit so that the signature of GetForeignPlan need not
> change. Also, I made nodeForeignScan.c do some of the outer plan
> handling automatically, and I fixed the compile breaks in
> contrib/file_fdw and contrib/postgres_fdw.
>
> Comments/review/testing are very welcome.

Applied on HEAD with no error. Regression tests of core, postgres_fdw and file_fdw finished with no error. (I haven't done any further testing.)

nodeScan.c:

The comments in nodeScan.c look way clearer. Thank you for rewriting.

nodeForeignscan.c:

Is this a mistake?

> @@ -205,6 +218,11 @@ ExecInitForeignScan(ForeignScan *node, EState *estate, int eflags)
>  	scanstate->fdwroutine = fdwroutine;
>  	scanstate->fdw_state = NULL;
>
> +	/* Initialize any outer plan. */
-> +	if (outerPlanState(scanstate))
+> +	if (outerPlanState(node))
> +		outerPlanState(scanstate) =

createplan.c, planmain.h:

I agree with reverting the signature of GetForeignPlan.

fdwapi.h:

The reverting of the additional parameter of ForeignScan leaves only a change of indentation of the last parameter.

fdwhandler.sgml:

This is easy for me to understand.

regards,
--
Kyotaro Horiguchi
NTT Open Source Software Center
Sorry, I made a mistake.

At Wed, 02 Dec 2015 10:29:17 +0900 (Tokyo Standard Time), Kyotaro HORIGUCHI <horiguchi.kyotaro@lab.ntt.co.jp> wrote in <20151202.102917.50152198.horiguchi.kyotaro@lab.ntt.co.jp>
> Hello, thank you for editing.
>
> At Tue, 1 Dec 2015 14:56:54 -0500, Robert Haas <robertmhaas@gmail.com> wrote in <CA+TgmoY+1Cq0bjXBP+coeKtkOMbpUMVQsfL2fJQY+ws7Nu=wgg@mail.gmail.com>
> > On Thu, Nov 26, 2015 at 12:04 AM, Kouhei Kaigai <kaigai@ak.jp.nec.com> wrote:
> > > This patch is not tested by actual FDW extensions, so it is helpful
> > > to enhance postgres_fdw to run the alternative sub-plan on EPQ recheck.
> >
> > I have done some editing and some small revisions on this patch.
> > Here's what I came up with. The revisions are mostly cosmetic, but I
> > revised it a bit so that the signature of GetForeignPlan need not
> > change. Also, I made nodeForeignScan.c do some of the outer plan
> > handling automatically, and I fixed the compile breaks in
> > contrib/file_fdw and contrib/postgres_fdw.
> >
> > Comments/review/testing are very welcome.
>
> Applied on HEAD with no error. Regression tests of core, postgres_fdw and
> file_fdw finished with no error.
>
> nodeScan.c:
>
> The comments in nodeScan.c look much clearer. Thank you for rewriting them.
>
> nodeForeignscan.c:
>
> Is this a mistake?
>
> > @@ -205,6 +218,11 @@ ExecInitForeignScan(ForeignScan *node, EState *estate, int eflags)
> >  	scanstate->fdwroutine = fdwroutine;
> >  	scanstate->fdw_state = NULL;
> >
> > +	/* Initialize any outer plan. */
> - > +	if (outerPlanState(scanstate))
> + > +	if (outerPlanState(node))
> > +		outerPlanState(scanstate) =

No, the above is wrong. It should be:

- > +	if (outerPlanState(scanstate))
+ > +	if (outerPlan(node))
> +		outerPlanState(scanstate) =

That is, the condition should test outerPlan(node), not outerPlanState(node).

> createplan.c, planmain.h:
>
> I agree with reverting the signature of GetForeignPlan.
>
> fdwapi.h:
>
> Reverting the additional parameter of ForeignScan leaves
> only a change in the indentation of the last parameter.
>
> fdwhandler.sgml:
>
> This is easy for me to understand. Thank you.

--
Kyotaro Horiguchi
NTT Open Source Software Center
On 2015/12/02 1:41, Robert Haas wrote:
> On Thu, Nov 26, 2015 at 7:59 AM, Etsuro Fujita
> <fujita.etsuro@lab.ntt.co.jp> wrote:
>>> The attached patch adds: Path *fdw_outerpath field to ForeignPath node.
>>> FDW driver can set arbitrary but one path-node here.
>>> After that, this path-node shall be transformed to plan-node by
>>> createplan.c, then passed to FDW driver using GetForeignPlan callback.

>> I understand this, as I also did the same thing in my patches, but actually,
>> that seems a bit complicated to me. Instead, could we keep the
>> fdw_outerpath in the fdw_private of a ForeignPath node when creating the
>> path node during GetForeignPaths, and then create an outerplan accordingly
>> from the fdw_outerpath stored into the fdw_private during GetForeignPlan, by
>> using create_plan_recurse there? I think that that would make the core
>> involvement much simpler.

> I can't see how it's going to get much simpler than this. The core
> code is well under a hundred lines, and it all looks pretty
> straightforward to me. All of our existing path and plan types keep
> lists of paths and plans separate from other kinds of data, and I
> don't think we're going to win any awards for deviating from that
> principle here.

One thing I can think of is that we can keep both the structure of a ForeignPath node and the API of create_foreignscan_path as-is. The latter is a good thing for FDW authors. And IIUC the patch you posted today, I think we could make create_foreignscan_plan a bit simpler too. Ie, in your patch, you modified that function as follows:

@@ -2129,7 +2134,9 @@ create_foreignscan_plan(PlannerInfo *root, ForeignPath *best_path,
 	 */
 	scan_plan = rel->fdwroutine->GetForeignPlan(root, rel, rel_oid,
 												best_path,
-												tlist, scan_clauses);
+												tlist,
+												scan_clauses);
+	outerPlan(scan_plan) = fdw_outerplan;

I think that would be OK, but I think we would have to do a bit more here about the fdw_outerplan's targetlist and qual; I think that the targetlist needs to be changed to fdw_scan_tlist, as in the patch [1], and that it'd be better to change the qual to remote conditions, ie, quals not in the scan_plan's scan.plan.qual, to avoid duplicate evaluation of local conditions. (In the patch [1], I didn't do anything about the qual because the current postgres_fdw join pushdown patch assumes that all the scan_plan's scan.plan.qual are pushed down.) Or, FDW authors might want to do something about fdw_recheck_quals for a foreign-join while creating the fdw_outerplan. So if we do that during GetForeignPlan, I think we could make create_foreignscan_plan a bit simpler, or provide flexibility to FDW authors.

>> @@ -85,6 +86,18 @@ ForeignRecheck(ForeignScanState *node, TupleTableSlot *slot)
>>
>>  	ResetExprContext(econtext);
>>
>> +	/*
>> +	 * FDW driver has to recheck visibility of EPQ tuple towards
>> +	 * the scan qualifiers once it gets pushed down.
>> +	 * In addition, if this node represents a join sub-tree, not
>> +	 * a scan, FDW driver is also responsible to reconstruct
>> +	 * a joined tuple according to the primitive EPQ tuples.
>> +	 */
>> +	if (fdwroutine->RecheckForeignScan)
>> +	{
>> +		if (!fdwroutine->RecheckForeignScan(node, slot))
>> +			return false;
>> +	}
>>
>> Maybe I'm missing something, but I think we should let FDW do the work if
>> scanrelid==0, not just if fdwroutine->RecheckForeignScan is given. (And if
>> scanrelid==0 and fdwroutine->RecheckForeignScan is not given, we should
>> abort the transaction.)

> That would be unnecessarily restrictive. On the one hand, even if
> scanrelid != 0, the FDW can decide that it prefers to do the rechecks
> using RecheckForeignScan rather than fdw_recheck_quals. For most
> FDWs, I expect using fdw_recheck_quals to be more convenient, but
> there may be cases where somebody prefers to use RecheckForeignScan,
> and allowing that costs nothing.

I suppose that the flexibility would probably be a good thing, but I'm a little bit concerned that it might be rather confusing to FDW authors. Maybe I'm missing something, though.

Best regards,
Etsuro Fujita

[1] http://www.postgresql.org/message-id/5624D583.10202@lab.ntt.co.jp
> On Thu, Nov 26, 2015 at 12:04 AM, Kouhei Kaigai <kaigai@ak.jp.nec.com> wrote:
> > The attached patch adds: Path *fdw_outerpath field to ForeignPath node.
> > FDW driver can set arbitrary but one path-node here.
> > After that, this path-node shall be transformed to plan-node by
> > createplan.c, then passed to FDW driver using GetForeignPlan callback.
> > We expect FDW driver set this plan-node on lefttree (a.k.a outerPlan).
> > The Plan->outerPlan is a common field, so the patch size becomes relatively
> > small. FDW driver can initialize this plan at BeginForeignScan, then
> > execute this sub-plan-tree on demand.
> >
> > Remaining portions are as previous version. ExecScanFetch is revised
> > to call recheckMtd always when scanrelid==0, then FDW driver can get
> > control using RecheckForeignScan callback.
> > It allows FDW driver to handle (1) EPQ recheck on underlying scan nodes,
> > (2) reconstruction of joined tuple, and (3) EPQ recheck on join clauses,
> > by its preferable implementation - including execution of an alternative
> > sub-plan.
> >
> >> There seems to be no changes to make_foreignscan. Is that OK?
> >>
> > create_foreignscan_path(), not only make_foreignscan().
> >
> > This patch is not tested by actual FDW extensions, so it is helpful
> > to enhance postgres_fdw to run the alternative sub-plan on EPQ recheck.
>
> I have done some editing and some small revisions on this patch.
> Here's what I came up with. The revisions are mostly cosmetic, but I
> revised it a bit so that the signature of GetForeignPlan need not
> change.
>
Thanks for the revision. (I could not be online for a few days, sorry.)

> Also, I made nodeForeignScan.c do some of the outer plan
> handling automatically,
>
It's OK for me. We may omit initialization/shutdown of the sub-plan when it is not actually needed, even if the FDW driver sets it up. However, that is a very tiny advantage.

> and I fixed the compile breaks in
> contrib/file_fdw and contrib/postgres_fdw.
>
Sorry, I didn't fix up the contrib side.

> Comments/review/testing are very welcome.
>
One small point:

@@ -3755,7 +3762,6 @@ make_foreignscan(List *qptlist,
 	/* cost will be filled in by create_foreignscan_plan */
 	plan->targetlist = qptlist;
 	plan->qual = qpqual;
-	plan->lefttree = NULL;
 	plan->righttree = NULL;
 	node->scan.scanrelid = scanrelid;
 	/* fs_server will be filled in by create_foreignscan_plan */

Although it is harmless, I prefer that this line be kept, because callers of make_foreignscan() expect a ForeignScan node with an empty lefttree, even if it is filled in later.

Best regards,
--
NEC Business Creation Division / PG-Strom Project
KaiGai Kohei <kaigai@ak.jp.nec.com>
> On 2015/12/02 1:41, Robert Haas wrote:
> > On Thu, Nov 26, 2015 at 7:59 AM, Etsuro Fujita
> > <fujita.etsuro@lab.ntt.co.jp> wrote:
> >>> The attached patch adds: Path *fdw_outerpath field to ForeignPath node.
> >>> FDW driver can set arbitrary but one path-node here.
> >>> After that, this path-node shall be transformed to plan-node by
> >>> createplan.c, then passed to FDW driver using GetForeignPlan callback.
>
> >> I understand this, as I also did the same thing in my patches, but actually,
> >> that seems a bit complicated to me. Instead, could we keep the
> >> fdw_outerpath in the fdw_private of a ForeignPath node when creating the
> >> path node during GetForeignPaths, and then create an outerplan accordingly
> >> from the fdw_outerpath stored into the fdw_private during GetForeignPlan, by
> >> using create_plan_recurse there? I think that that would make the core
> >> involvement much simpler.
>
> > I can't see how it's going to get much simpler than this. The core
> > code is well under a hundred lines, and it all looks pretty
> > straightforward to me. All of our existing path and plan types keep
> > lists of paths and plans separate from other kinds of data, and I
> > don't think we're going to win any awards for deviating from that
> > principle here.
>
> One thing I can think of is that we can keep both the structure of a
> ForeignPath node and the API of create_foreignscan_path as-is. The
> latter is a good thing for FDW authors. And IIUC the patch you posted
> today, I think we could make create_foreignscan_plan a bit simpler too.
> Ie, in your patch, you modified that function as follows:
>
> @@ -2129,7 +2134,9 @@ create_foreignscan_plan(PlannerInfo *root, ForeignPath *best_path,
>  	 */
>  	scan_plan = rel->fdwroutine->GetForeignPlan(root, rel, rel_oid,
>  												best_path,
> -												tlist, scan_clauses);
> +												tlist,
> +												scan_clauses);
> +	outerPlan(scan_plan) = fdw_outerplan;
>
> I think that would be OK, but I think we would have to do a bit more
> here about the fdw_outerplan's targetlist and qual; I think that the
> targetlist needs to be changed to fdw_scan_tlist, as in the patch [1],
>
Hmm... you are right. The sub-plan shall generate a tuple according to the fdw_scan_tlist, if valid. Do you think a surgical operation to apply an alternative target-list is better than build_path_tlist()?

> and that it'd be better to change the qual to remote conditions, ie,
> quals not in the scan_plan's scan.plan.qual, to avoid duplicate
> evaluation of local conditions. (In the patch [1], I didn't do anything
> about the qual because the current postgres_fdw join pushdown patch
> assumes that all the scan_plan's scan.plan.qual are pushed down.)
> Or, FDW authors might want to do something about fdw_recheck_quals for a
> foreign-join while creating the fdw_outerplan. So if we do that during
> GetForeignPlan, I think we could make create_foreignscan_plan a bit
> simpler, or provide flexibility to FDW authors.
>
So, you suggest it is better to pass the fdw_outerplan to the GetForeignPlan callback, to allow the FDW to adjust the target-list and quals of the sub-plan. I think that is a reasonable argument. Only the FDW knows which qualifiers are executable on the remote side, so it is not easy to remove the qualifiers to be executed only on the host side from the sub-plan tree.

> >> @@ -85,6 +86,18 @@ ForeignRecheck(ForeignScanState *node, TupleTableSlot *slot)
> >>
> >>  	ResetExprContext(econtext);
> >>
> >> +	/*
> >> +	 * FDW driver has to recheck visibility of EPQ tuple towards
> >> +	 * the scan qualifiers once it gets pushed down.
> >> +	 * In addition, if this node represents a join sub-tree, not
> >> +	 * a scan, FDW driver is also responsible to reconstruct
> >> +	 * a joined tuple according to the primitive EPQ tuples.
> >> +	 */
> >> +	if (fdwroutine->RecheckForeignScan)
> >> +	{
> >> +		if (!fdwroutine->RecheckForeignScan(node, slot))
> >> +			return false;
> >> +	}
> >>
> >> Maybe I'm missing something, but I think we should let FDW do the work if
> >> scanrelid==0, not just if fdwroutine->RecheckForeignScan is given. (And if
> >> scanrelid==0 and fdwroutine->RecheckForeignScan is not given, we should
> >> abort the transaction.)
>
> > That would be unnecessarily restrictive. On the one hand, even if
> > scanrelid != 0, the FDW can decide that it prefers to do the rechecks
> > using RecheckForeignScan rather than fdw_recheck_quals. For most
> > FDWs, I expect using fdw_recheck_quals to be more convenient, but
> > there may be cases where somebody prefers to use RecheckForeignScan,
> > and allowing that costs nothing.
>
> I suppose that the flexibility would probably be a good thing, but I'm a
> little bit concerned that that might be rather confusing to FDW authors.
>
We expect FDW authors, like Hanada-san, to have deep knowledge about PostgreSQL internals. It is not a feature for SQL newbies.

Thanks,
--
NEC Business Creation Division / PG-Strom Project
KaiGai Kohei <kaigai@ak.jp.nec.com>

> Maybe I'm missing something, though.
>
> Best regards,
> Etsuro Fujita
>
> [1] http://www.postgresql.org/message-id/5624D583.10202@lab.ntt.co.jp
On 2015/12/02 1:53, Robert Haas wrote:
> On Fri, Nov 27, 2015 at 1:33 AM, Etsuro Fujita
> <fujita.etsuro@lab.ntt.co.jp> wrote:
>> 	Plan	   *plan = &node->scan.plan;
>>
>> @@ -3755,7 +3763,7 @@ make_foreignscan(List *qptlist,
>>  	/* cost will be filled in by create_foreignscan_plan */
>>  	plan->targetlist = qptlist;
>>  	plan->qual = qpqual;
>> -	plan->lefttree = NULL;
>> +	plan->lefttree = fdw_outerplan;
>>  	plan->righttree = NULL;
>>  	node->scan.scanrelid = scanrelid;
>>
>> I think that that would break the EXPLAIN output.

> In what way? EXPLAIN recurses into the left and right trees of every
> plan node regardless of what type it is, so superficially I feel like
> this ought to just work. What problem do you foresee?
>
> I do think that ExecInitForeignScan ought to be changed to
> ExecInitNode on its outer plan if present rather than leaving that to
> the FDW's BeginForeignScan method.

IIUC, I think the EXPLAIN output for, eg,

select localtab.* from localtab, ft1, ft2 where localtab.a = ft1.a and ft1.a = ft2.a for update

would be something like this:

LockRows
  ->  Nested Loop
        Join Filter: (ft1.a = localtab.a)
        ->  Seq Scan on localtab
        ->  ForeignScan on ft1/ft2-foreign-join
              ->  Nested Loop
                    Join Filter: (ft1.a = ft2.a)
                    ->  Foreign Scan on ft1
                    ->  Foreign Scan on ft2

The subplan below the Foreign Scan on the foreign-join seems odd to me. One option to avoid that is to handle the subplan as in my patch [2], which I created to address your comment that we should not break the equivalence discussed below. I'm not sure that the patch's handling of chgParam for the subplan is a good idea, though.

>> One option to avoid that
>> is to set the fdw_outerplan in ExecInitForeignScan as in my patch [1], or
>> BeginForeignScan as you proposed. That breaks the equivalence that the Plan
>> tree and the PlanState tree should be mirror images of each other, but I
>> think that that break would be harmless.

> I'm not sure how many times I have to say this, but we are not doing
> that. I will not commit any patch that does that, and I will
> vigorously argue against anyone else committing such a patch either.
> That *would* break EXPLAIN, because EXPLAIN relies on being able to
> walk the PlanState tree and find all the Plan nodes from the
> corresponding PlanState nodes. Now you might think that it would be
> OK to omit a plan node that we decided we weren't ever going to
> execute, but today we don't do that, and I don't think we should. I
> think it could be very confusing if EXPLAIN and EXPLAIN ANALYZE show
> different sets of plan nodes for the same query. Quite apart from
> EXPLAIN, there are numerous other places that assume that they can
> walk the PlanState tree and find all the Plan nodes. Breaking that
> assumption would be bad news.

Agreed. Thanks for the explanation!

Best regards,
Etsuro Fujita

[2] http://www.postgresql.org/message-id/5624D583.10202@lab.ntt.co.jp
On 2015/12/02 14:54, Kouhei Kaigai wrote:
>> On 2015/12/02 1:41, Robert Haas wrote:
>>> On Thu, Nov 26, 2015 at 7:59 AM, Etsuro Fujita
>>> <fujita.etsuro@lab.ntt.co.jp> wrote:
>>>>> The attached patch adds: Path *fdw_outerpath field to ForeignPath node.
>>>>> FDW driver can set arbitrary but one path-node here.
>>>>> After that, this path-node shall be transformed to plan-node by
>>>>> createplan.c, then passed to FDW driver using GetForeignPlan callback.
>>>> I understand this, as I also did the same thing in my patches, but actually,
>>>> that seems a bit complicated to me. Instead, could we keep the
>>>> fdw_outerpath in the fdw_private of a ForeignPath node when creating the
>>>> path node during GetForeignPaths, and then create an outerplan accordingly
>>>> from the fdw_outerpath stored into the fdw_private during GetForeignPlan, by
>>>> using create_plan_recurse there? I think that that would make the core
>>>> involvement much simpler.
>>> I can't see how it's going to get much simpler than this. The core
>>> code is well under a hundred lines, and it all looks pretty
>>> straightforward to me. All of our existing path and plan types keep
>>> lists of paths and plans separate from other kinds of data, and I
>>> don't think we're going to win any awards for deviating from that
>>> principle here.
>> One thing I can think of is that we can keep both the structure of a
>> ForeignPath node and the API of create_foreignscan_path as-is. The
>> latter is a good thing for FDW authors. And IIUC the patch you posted
>> today, I think we could make create_foreignscan_plan a bit simpler too.
>> Ie, in your patch, you modified that function as follows:
>>
>> @@ -2129,7 +2134,9 @@ create_foreignscan_plan(PlannerInfo *root, ForeignPath *best_path,
>>  	 */
>>  	scan_plan = rel->fdwroutine->GetForeignPlan(root, rel, rel_oid,
>>  												best_path,
>> -												tlist, scan_clauses);
>> +												tlist,
>> +												scan_clauses);
>> +	outerPlan(scan_plan) = fdw_outerplan;
>>
>> I think that would be OK, but I think we would have to do a bit more
>> here about the fdw_outerplan's targetlist and qual; I think that the
>> targetlist needs to be changed to fdw_scan_tlist, as in the patch [1],

> Hmm... you are right. The sub-plan shall generate a tuple according to
> the fdw_scan_tlist, if valid. Do you think a surgical operation to apply
> an alternative target-list is better than build_path_tlist()?

Sorry, I'm not sure about that. I suggested changing it to fdw_scan_tlist just because that's simple.

>> and that it'd be better to change the qual to remote conditions, ie,
>> quals not in the scan_plan's scan.plan.qual, to avoid duplicate
>> evaluation of local conditions. (In the patch [1], I didn't do anything
>> about the qual because the current postgres_fdw join pushdown patch
>> assumes that all the scan_plan's scan.plan.qual are pushed down.)
>> Or, FDW authors might want to do something about fdw_recheck_quals for a
>> foreign-join while creating the fdw_outerplan. So if we do that during
>> GetForeignPlan, I think we could make create_foreignscan_plan a bit
>> simpler, or provide flexibility to FDW authors.

> So, you suggest it is better to pass the fdw_outerplan to the GetForeignPlan
> callback, to allow the FDW to adjust the target-list and quals of the sub-plan.

I think that is one option for us.
Another option, which I proposed above, is 1) store an fdw_outerpath in the fdw_private when creating the ForeignPath node in GetForeignPaths, and then 2) create an fdw_outerplan from the fdw_outerpath stored in the fdw_private when creating the ForeignScan node in GetForeignPlan, by using create_plan_recurse in GetForeignPlan. (To do so, I was thinking of making that function extern.) One good point about that is that we can keep the API of create_foreignscan_path as-is, which I think would be a good thing for FDW authors that don't care about join pushdown.

> I think that is a reasonable argument. Only the FDW knows which qualifiers
> are executable on the remote side, so it is not easy to remove the qualifiers
> to be executed only on the host side from the sub-plan tree.

Yeah, we could provide the flexibility to FDW authors.

>>>> @@ -85,6 +86,18 @@ ForeignRecheck(ForeignScanState *node, TupleTableSlot *slot)
>>>>
>>>>  	ResetExprContext(econtext);
>>>>
>>>> +	/*
>>>> +	 * FDW driver has to recheck visibility of EPQ tuple towards
>>>> +	 * the scan qualifiers once it gets pushed down.
>>>> +	 * In addition, if this node represents a join sub-tree, not
>>>> +	 * a scan, FDW driver is also responsible to reconstruct
>>>> +	 * a joined tuple according to the primitive EPQ tuples.
>>>> +	 */
>>>> +	if (fdwroutine->RecheckForeignScan)
>>>> +	{
>>>> +		if (!fdwroutine->RecheckForeignScan(node, slot))
>>>> +			return false;
>>>> +	}
>>>>
>>>> Maybe I'm missing something, but I think we should let FDW do the work if
>>>> scanrelid==0, not just if fdwroutine->RecheckForeignScan is given. (And if
>>>> scanrelid==0 and fdwroutine->RecheckForeignScan is not given, we should
>>>> abort the transaction.)
>>> That would be unnecessarily restrictive. On the one hand, even if
>>> scanrelid != 0, the FDW can decide that it prefers to do the rechecks
>>> using RecheckForeignScan rather than fdw_recheck_quals. For most
>>> FDWs, I expect using fdw_recheck_quals to be more convenient, but
>>> there may be cases where somebody prefers to use RecheckForeignScan,
>>> and allowing that costs nothing.
>> I suppose that the flexibility would probably be a good thing, but I'm a
>> little bit concerned that that might be rather confusing to FDW authors.
> We expect FDW authors, like Hanada-san, to have deep knowledge about
> PostgreSQL internals. It is not a feature for SQL newbies.

That's right!

Best regards,
Etsuro Fujita
On 2015/12/02 1:54, Robert Haas wrote:
> On Fri, Nov 27, 2015 at 1:25 AM, Kouhei Kaigai <kaigai@ak.jp.nec.com> wrote:
>>> Sorry, I don't understand this. In my understanding, fdw_recheck_quals
>>> can be defined for a foreign join, regardless of the join type,
>> Yes, "can be defined", but will not be workable if either side of the
>> joined tuple is NULL because of an outer join. SQL functions return
>> NULL on NULL input prior to evaluation, and ExecQual() treats that result
>> as FALSE. However, a joined tuple that has NULL fields may be a valid tuple.
>>
>> We don't need to care about unmatched tuples in the case of INNER JOIN.
> This is a really good point, and a very strong argument for the design
> KaiGai has chosen here.

Maybe my explanation was not enough. Sorry about that. But I mean that we define fdw_recheck_quals for a foreign join as the quals that 1) were extracted by extract_actual_join_clauses as "otherclauses" (rinfo->is_pushed_down=true) and that 2) were pushed down to the remote server, not the scan quals relevant to all the base tables involved in the foreign join. So under this definition, I think fdw_recheck_quals for a foreign join will be workable, regardless of the join type.

Best regards,
Etsuro Fujita
On Tue, Dec 1, 2015 at 10:20 PM, Etsuro Fujita <fujita.etsuro@lab.ntt.co.jp> wrote:
> One thing I can think of is that we can keep both the structure of a
> ForeignPath node and the API of create_foreignscan_path as-is. The latter
> is a good thing for FDW authors. And IIUC the patch you posted today, I
> think we could make create_foreignscan_plan a bit simpler too. Ie, in your
> patch, you modified that function as follows:
>
> @@ -2129,7 +2134,9 @@ create_foreignscan_plan(PlannerInfo *root, ForeignPath *best_path,
>  	 */
>  	scan_plan = rel->fdwroutine->GetForeignPlan(root, rel, rel_oid,
>  												best_path,
> -												tlist, scan_clauses);
> +												tlist,
> +												scan_clauses);
> +	outerPlan(scan_plan) = fdw_outerplan;
>
> I think that would be OK, but I think we would have to do a bit more here
> about the fdw_outerplan's targetlist and qual; I think that the targetlist
> needs to be changed to fdw_scan_tlist, as in the patch [1], and that it'd be
> better to change the qual to remote conditions, ie, quals not in the
> scan_plan's scan.plan.qual, to avoid duplicate evaluation of local
> conditions. (In the patch [1], I didn't do anything about the qual because
> the current postgres_fdw join pushdown patch assumes that all the
> scan_plan's scan.plan.qual are pushed down.) Or, FDW authors might want to
> do something about fdw_recheck_quals for a foreign-join while creating the
> fdw_outerplan. So if we do that during GetForeignPlan, I think we could
> make create_foreignscan_plan a bit simpler, or provide flexibility to FDW
> authors.

It's certainly true that we need the alternative plan's tlist to match that of the main plan; otherwise, it's going to be difficult for the FDW to make use of that alternative subplan to fill its slot, which is kinda the point of all this. However, I'm quite reluctant to introduce code into create_foreignscan_plan() that forces the subplan's tlist to match that of the main plan. For one thing, that would likely foreclose the possibility of an FDW ever using the outer plan for any purpose other than EPQ rechecks. It may be hard to imagine what else you'd do with the outer plan as things are today, but right now the two halves of the patch - letting FDWs have an outer subplan, and providing them with a way of overriding the EPQ recheck behavior - are technically independent. Putting tlist-altering behavior into create_foreignscan_plan() ties those two things together irrevocably.

Instead, I think we should go the opposite direction and pass the outerplan to GetForeignPlan after all. I was lulled into a false sense of security by the realization that every FDW that uses this feature MUST want to do outerPlan(scan_plan) = fdw_outerplan. That's true, but irrelevant. The point is that the FDW might want to do something additional, like frob the outer plan's tlist, and it can't do that if we don't pass it fdw_outerplan. So we should do that, after all.

Updated patch attached. This fixes a couple of whitespace issues that were pointed out, also.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
On 2015/12/05 5:15, Robert Haas wrote:
> On Tue, Dec 1, 2015 at 10:20 PM, Etsuro Fujita
> <fujita.etsuro@lab.ntt.co.jp> wrote:
>> One thing I can think of is that we can keep both the structure of a
>> ForeignPath node and the API of create_foreignscan_path as-is. The latter
>> is a good thing for FDW authors. And IIUC the patch you posted today, I
>> think we could make create_foreignscan_plan a bit simpler too. Ie, in your
>> patch, you modified that function as follows:
>>
>> @@ -2129,7 +2134,9 @@ create_foreignscan_plan(PlannerInfo *root, ForeignPath *best_path,
>>  	 */
>>  	scan_plan = rel->fdwroutine->GetForeignPlan(root, rel, rel_oid,
>>  												best_path,
>> -												tlist, scan_clauses);
>> +												tlist,
>> +												scan_clauses);
>> +	outerPlan(scan_plan) = fdw_outerplan;
>>
>> I think that would be OK, but I think we would have to do a bit more here
>> about the fdw_outerplan's targetlist and qual; I think that the targetlist
>> needs to be changed to fdw_scan_tlist, as in the patch [1], and that it'd be
>> better to change the qual to remote conditions, ie, quals not in the
>> scan_plan's scan.plan.qual, to avoid duplicate evaluation of local
>> conditions. (In the patch [1], I didn't do anything about the qual because
>> the current postgres_fdw join pushdown patch assumes that all the
>> scan_plan's scan.plan.qual are pushed down.) Or, FDW authors might want to
>> do something about fdw_recheck_quals for a foreign-join while creating the
>> fdw_outerplan. So if we do that during GetForeignPlan, I think we could
>> make create_foreignscan_plan a bit simpler, or provide flexibility to FDW
>> authors.

> It's certainly true that we need the alternative plan's tlist to match
> that of the main plan; otherwise, it's going to be difficult for the
> FDW to make use of that alternative subplan to fill its slot, which is
> kinda the point of all this.

OK.

> However, I'm quite reluctant to
> introduce code into create_foreignscan_plan() that forces the
> subplan's tlist to match that of the main plan. For one thing, that
> would likely foreclose the possibility of an FDW ever using the outer
> plan for any purpose other than EPQ rechecks. It may be hard to
> imagine what else you'd do with the outer plan as things are today,
> but right now the two halves of the patch - letting FDWs have an outer
> subplan, and providing them with a way of overriding the EPQ recheck
> behavior - are technically independent. Putting tlist-altering
> behavior into create_foreignscan_plan() ties those two things together
> irrevocably.

Agreed.

> Instead, I think we should go the opposite direction and pass the
> outerplan to GetForeignPlan after all. I was lulled into a false sense
> of security by the realization that every FDW that uses this feature
> MUST want to do outerPlan(scan_plan) = fdw_outerplan. That's true,
> but irrelevant. The point is that the FDW might want to do something
> additional, like frob the outer plan's tlist, and it can't do that if
> we don't pass it fdw_outerplan. So we should do that, after all.

As I proposed upthread, another idea would be to 1) store an fdw_outerpath in the fdw_private list of a ForeignPath node, and then 2) create an fdw_outerplan from *the fdw_outerpath stored into the fdw_private* in GetForeignPlan. One good thing for this is that we keep the API of create_foreignscan_path as-is. What do you think about that?

> Updated patch attached. This fixes a couple of whitespace issues that
> were pointed out, also.

Thanks for updating the patch!

Best regards,
Etsuro Fujita
On Mon, Dec 7, 2015 at 12:25 AM, Etsuro Fujita <fujita.etsuro@lab.ntt.co.jp> wrote:
>> Instead, I think we should go the opposite direction and pass the
>> outerplan to GetForeignPlan after all. I was lulled into a false sense
>> of security by the realization that every FDW that uses this feature
>> MUST want to do outerPlan(scan_plan) = fdw_outerplan. That's true,
>> but irrelevant. The point is that the FDW might want to do something
>> additional, like frob the outer plan's tlist, and it can't do that if
>> we don't pass it fdw_outerplan. So we should do that, after all.
>
> As I proposed upthread, another idea would be to 1) store an
> fdw_outerpath in the fdw_private list of a ForeignPath node, and then 2) to
> create an fdw_outerplan from *the fdw_outerpath stored into
> the fdw_private* in GetForeignPlan. One good thing for this is that we keep
> the API of create_foreignscan_path as-is. What do you think about that?

I don't think it's a good idea, per what I said in the first paragraph of this email:

http://www.postgresql.org/message-id/CA+TgmoZ5G+ZGPh3STMGM6cWgTOywz3N1PjSw6Lvhz31ofgLZVw@mail.gmail.com

I think the core system likely needs visibility into where paths and plans are present in node trees, and putting them somewhere inside fdw_private would be going in the opposite direction.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
Robert Haas <robertmhaas@gmail.com> writes: > I think the core system likely needs visibility into where paths and > plans are present in node trees, and putting them somewhere inside > fdw_private would be going in the opposite direction. Absolutely. You don't really want FDWs having to take responsibility for setrefs.c processing of their node trees, for example. This is why e.g. ForeignScan has both fdw_exprs and fdw_private. I'm not too concerned about whether we have to adjust FDW-related APIs as we go along. It's been clear from the beginning that we'd have to do that, and we are nowhere near a point where we should promise that we're done doing so. regards, tom lane
On 2015/12/08 3:06, Tom Lane wrote:
> Robert Haas <robertmhaas@gmail.com> writes:
>> I think the core system likely needs visibility into where paths and
>> plans are present in node trees, and putting them somewhere inside
>> fdw_private would be going in the opposite direction.
>
> Absolutely. You don't really want FDWs having to take responsibility
> for setrefs.c processing of their node trees, for example. This is why
> e.g. ForeignScan has both fdw_exprs and fdw_private.
>
> I'm not too concerned about whether we have to adjust FDW-related APIs
> as we go along. It's been clear from the beginning that we'd have to
> do that, and we are nowhere near a point where we should promise that
> we're done doing so.

OK, I'd vote for Robert's idea, then. I'd like to discuss the next thing about his patch. As I mentioned in [1], the following change in the patch will break the EXPLAIN output.

@@ -205,6 +218,11 @@ ExecInitForeignScan(ForeignScan *node, EState *estate, int eflags)
 	scanstate->fdwroutine = fdwroutine;
 	scanstate->fdw_state = NULL;
 
+	/* Initialize any outer plan. */
+	if (outerPlanState(scanstate))
+		outerPlanState(scanstate) =
+			ExecInitNode(outerPlan(node), estate, eflags);
+

As pointed out by Horiguchi-san, that's not correct, though; we should initialize the outer plan if outerPlan(node) != NULL, not outerPlanState(scanstate) != NULL. Attached is an updated version of his patch. I'm also attaching an updated version of the postgres_fdw join pushdown patch. You can find the breaking examples by doing the regression tests in the postgres_fdw patch.
Please apply the patches in the following order:

epq-recheck-v6-efujita (attached)
usermapping_matching.patch in [2]
add_GetUserMappingById.patch in [2]
foreign_join_v16_efujita2.patch (attached)

As I proposed upthread, I think we could fix that by handling the outer plan as in the patch [3]: a) the core initializes the outer plan and stores it somewhere in the ForeignScanState node, not in the lefttree of the ForeignScanState node, during ExecInitForeignScan, and b) when the RecheckForeignScan routine gets called, the FDW extracts the plan from the given ForeignScanState node and executes it. What do you think about that?

Best regards,
Etsuro Fujita

[1] http://www.postgresql.org/message-id/565EA539.1080703@lab.ntt.co.jp
[2] http://www.postgresql.org/message-id/CAEZqfEe9KGy=1_waGh2rgZPg0o4pqgD+iauYaj8wTze+CYJUHg@mail.gmail.com
[3] http://www.postgresql.org/message-id/5624D583.10202@lab.ntt.co.jp
On Tue, Dec 8, 2015 at 5:49 AM, Etsuro Fujita <fujita.etsuro@lab.ntt.co.jp> wrote:
> On 2015/12/08 3:06, Tom Lane wrote:
>> Robert Haas <robertmhaas@gmail.com> writes:
>>> I think the core system likely needs visibility into where paths and
>>> plans are present in node trees, and putting them somewhere inside
>>> fdw_private would be going in the opposite direction.
>
>> Absolutely. You don't really want FDWs having to take responsibility
>> for setrefs.c processing of their node trees, for example. This is why
>> e.g. ForeignScan has both fdw_exprs and fdw_private.
>>
>> I'm not too concerned about whether we have to adjust FDW-related APIs
>> as we go along. It's been clear from the beginning that we'd have to
>> do that, and we are nowhere near a point where we should promise that
>> we're done doing so.
>
> OK, I'd vote for Robert's idea, then. I'd like to discuss the next
> thing about his patch. As I mentioned in [1], the following change in
> the patch will break the EXPLAIN output.
>
> @@ -205,6 +218,11 @@ ExecInitForeignScan(ForeignScan *node, EState *estate, int eflags)
>  	scanstate->fdwroutine = fdwroutine;
>  	scanstate->fdw_state = NULL;
>
> +	/* Initialize any outer plan. */
> +	if (outerPlanState(scanstate))
> +		outerPlanState(scanstate) =
> +			ExecInitNode(outerPlan(node), estate, eflags);
> +
>
> As pointed out by Horiguchi-san, that's not correct, though; we should
> initialize the outer plan if outerPlan(node) != NULL, not
> outerPlanState(scanstate) != NULL. Attached is an updated version of
> his patch.

Oops, good catch.

> I'm also attaching an updated version of the postgres_fdw
> join pushdown patch.

Is that based on Ashutosh's version of the patch, or are the two of you developing independent of each other? We should avoid dueling patches if possible.

> You can find the breaking examples by doing the
> regression tests in the postgres_fdw patch. Please apply the patches in
> the following order:
>
> epq-recheck-v6-efujita (attached)
> usermapping_matching.patch in [2]
> add_GetUserMappingById.patch in [2]
> foreign_join_v16_efujita2.patch (attached)
>
> As I proposed upthread, I think we could fix that by handling the outer
> plan as in the patch [3]; a) the core initializes the outer plan and
> stores it into somewhere in the ForeignScanState node, not the lefttree
> of the ForeignScanState node, during ExecInitForeignScan, and b) when
> the RecheckForeignScan routine gets called, the FDW extracts the plan
> from the given ForeignScanState node and executes it. What do you think
> about that?

I think the actual regression test outputs are fine, and that your desire to suppress part of the plan tree from showing up in the EXPLAIN output is misguided. I like it just the way it is. To prevent user confusion, I think that when we add support to postgres_fdw for this we might also want to add some documentation explaining how to interpret this EXPLAIN output, but I don't think there's any problem with the output itself.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
On 2015/12/09 1:13, Robert Haas wrote:
> On Tue, Dec 8, 2015 at 5:49 AM, Etsuro Fujita
> <fujita.etsuro@lab.ntt.co.jp> wrote:
>> I'd like to discuss the next thing about his patch. As I mentioned in
>> [1], the following change in the patch will break the EXPLAIN output.
>>
>> @@ -205,6 +218,11 @@ ExecInitForeignScan(ForeignScan *node, EState *estate, int eflags)
>>  	scanstate->fdwroutine = fdwroutine;
>>  	scanstate->fdw_state = NULL;
>>
>> +	/* Initialize any outer plan. */
>> +	if (outerPlanState(scanstate))
>> +		outerPlanState(scanstate) =
>> +			ExecInitNode(outerPlan(node), estate, eflags);
>> +
>>
>> As pointed out by Horiguchi-san, that's not correct, though; we should
>> initialize the outer plan if outerPlan(node) != NULL, not
>> outerPlanState(scanstate) != NULL. Attached is an updated version of
>> his patch. I'm also attaching an updated version of the postgres_fdw
>> join pushdown patch.

> Is that based on Ashutosh's version of the patch, or are the two of
> you developing independent of each other? We should avoid dueling
> patches if possible.

That's not based on his version. I'll add the changes I've made to his patch. IIUC, his version is an updated version of Hanada-san's original patches that I've modified, so I guess that I could do that easily. (I've added a helper function for creating a local join execution plan for a given foreign join, but that was rushed work, so I'll rewrite it.)

>> You can find the breaking examples by doing the regression tests in
>> the postgres_fdw patch. Please apply the patches in the following order:
>>
>> epq-recheck-v6-efujita (attached)
>> usermapping_matching.patch in [2]
>> add_GetUserMappingById.patch in [2]
>> foreign_join_v16_efujita2.patch (attached)
>>
>> As I proposed upthread, I think we could fix that by handling the outer
>> plan as in the patch [3]; a) the core initializes the outer plan and
>> stores it into somewhere in the ForeignScanState node, not the lefttree
>> of the ForeignScanState node, during ExecInitForeignScan, and b) when
>> the RecheckForeignScan routine gets called, the FDW extracts the plan
>> from the given ForeignScanState node and executes it. What do you think
>> about that?

> I think the actual regression test outputs are fine, and that your
> desire to suppress part of the plan tree from showing up in the
> EXPLAIN output is misguided. I like it just the way it is. To
> prevent user confusion, I think that when we add support to
> postgres_fdw for this we might also want to add some documentation
> explaining how to interpret this EXPLAIN output, but I don't think
> there's any problem with the output itself.

I'm not sure that that's a good idea. One reason is that I think it would be more confusing to users when more than two foreign tables are involved in a foreign join, as shown in the following example; note that the outer plans will be shown recursively. Another reason is that there is no consistency between the costs of the outer plans and that of the main plan.
postgres=# explain verbose select * from foo, bar, baz where foo.a = bar.a and bar.a = baz.a for update;
                                                                      QUERY PLAN
------------------------------------------------------------------------------------------------------------------------------------------------------
 LockRows  (cost=100.00..100.45 rows=15 width=96)
   Output: foo.a, bar.a, baz.a, foo.*, bar.*, baz.*
   ->  Foreign Scan  (cost=100.00..100.30 rows=15 width=96)
         Output: foo.a, bar.a, baz.a, foo.*, bar.*, baz.*
         Relations: ((public.foo) INNER JOIN (public.bar)) INNER JOIN (public.baz)
         Remote SQL: SELECT l.a1, l.a2, l.a3, l.a4, r.a1, r.a2 FROM (SELECT l.a1, l.a2, r.a1, r.a2 FROM (SELECT l.a9, ROW(l.a9) FROM (SELECT a a9 FROM public.foo FOR UPDATE) l) l (a1, a2) INNER JOIN (SELECT r.a9, ROW(r.a9) FROM (SELECT a a9 FROM public.bar FOR UPDATE) r) r (a1, a2) ON ((l.a1 = r.a1))) l (a1, a2, a3, a4) INNER JOIN (SELECT r.a9, ROW(r.a9) FROM (SELECT a a9 FROM public.baz FOR UPDATE) r) r (a1, a2) ON ((l.a1 = r.a1))
         ->  Hash Join  (cost=272.13..272.69 rows=15 width=96)
               Output: foo.a, foo.*, bar.a, bar.*, baz.a, baz.*
               Hash Cond: (foo.a = baz.a)
               ->  Foreign Scan  (cost=100.00..100.04 rows=2 width=64)
                     Output: foo.a, foo.*, bar.a, bar.*
                     Relations: (public.foo) INNER JOIN (public.bar)
                     Remote SQL: SELECT l.a1, l.a2, r.a1, r.a2 FROM (SELECT l.a9, ROW(l.a9) FROM (SELECT a a9 FROM public.foo FOR UPDATE) l) l (a1, a2) INNER JOIN (SELECT r.a9, ROW(r.a9) FROM (SELECT a a9 FROM public.bar FOR UPDATE) r) r (a1, a2) ON ((l.a1 = r.a1))
                     ->  Nested Loop  (cost=200.00..202.18 rows=2 width=64)
                           Output: foo.a, foo.*, bar.a, bar.*
                           Join Filter: (foo.a = bar.a)
                           ->  Foreign Scan on public.foo  (cost=100.00..101.06 rows=2 width=32)
                                 Output: foo.a, foo.*
                                 Remote SQL: SELECT a FROM public.foo FOR UPDATE
                           ->  Materialize  (cost=100.00..101.07 rows=2 width=32)
                                 Output: bar.a, bar.*
                                 ->  Foreign Scan on public.bar  (cost=100.00..101.06 rows=2 width=32)
                                       Output: bar.a, bar.*
                                       Remote SQL: SELECT a FROM public.bar FOR UPDATE
               ->  Hash  (cost=153.86..153.86 rows=1462 width=32)
                     Output: baz.a, baz.*
                     ->  Foreign Scan on public.baz  (cost=100.00..153.86 rows=1462 width=32)
                           Output: baz.a, baz.*
                           Remote SQL: SELECT a FROM public.baz FOR UPDATE
(29 rows)

Best regards,
Etsuro Fujita
On Tue, Dec 8, 2015 at 10:00 PM, Etsuro Fujita <fujita.etsuro@lab.ntt.co.jp> wrote: >> I think the actual regression test outputs are fine, and that your >> desire to suppress part of the plan tree from showing up in the >> EXPLAIN output is misguided. I like it just the way it is. To >> prevent user confusion, I think that when we add support to >> postgres_fdw for this we might also want to add some documentation >> explaining how to interpret this EXPLAIN output, but I don't think >> there's any problem with the output itself. > > I'm not sure that that's a good idea. one reason for that is I think that > that would be more confusing to users when more than two foreign tables are > involved in a foreign join as shown in the following example. Note that the > outer plans will be shown recursively. Another reason is there is no > consistency between the costs of the outer plans and that of the main plan. I still don't really see a problem here, but, regardless, the solution can't be to hide nodes that are in fact present from the user. We can talk about making further changes here, but hiding the nodes altogether is categorically out in my mind. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
> On Tue, Dec 8, 2015 at 10:00 PM, Etsuro Fujita
> <fujita.etsuro@lab.ntt.co.jp> wrote:
>>> I think the actual regression test outputs are fine, and that your
>>> desire to suppress part of the plan tree from showing up in the
>>> EXPLAIN output is misguided. I like it just the way it is. To
>>> prevent user confusion, I think that when we add support to
>>> postgres_fdw for this we might also want to add some documentation
>>> explaining how to interpret this EXPLAIN output, but I don't think
>>> there's any problem with the output itself.
>>
>> I'm not sure that that's a good idea. One reason for that is I think that
>> that would be more confusing to users when more than two foreign tables are
>> involved in a foreign join as shown in the following example. Note that the
>> outer plans will be shown recursively. Another reason is there is no
>> consistency between the costs of the outer plans and that of the main plan.
>
> I still don't really see a problem here, but, regardless, the solution
> can't be to hide nodes that are in fact present from the user. We can
> talk about making further changes here, but hiding the nodes
> altogether is categorically out in my mind.

Fujita-san,

If you really want to hide the alternative sub-plan, you can move the outer planstate into some private field in BeginForeignScan, then kick ExecProcNode() from the ForeignRecheck callback yourself. EXPLAIN walks down the sub-plan only if outerPlanState(planstate) is valid, so as long as your extension keeps the planstate privately, it is not visible to EXPLAIN.

Of course, I don't recommend it.

-- 
NEC Business Creation Division / PG-Strom Project
KaiGai Kohei <kaigai@ak.jp.nec.com>
On 2015/12/09 13:26, Kouhei Kaigai wrote:
>> On Tue, Dec 8, 2015 at 10:00 PM, Etsuro Fujita
>> <fujita.etsuro@lab.ntt.co.jp> wrote:
>>>> I think the actual regression test outputs are fine, and that your
>>>> desire to suppress part of the plan tree from showing up in the
>>>> EXPLAIN output is misguided. I like it just the way it is. To
>>>> prevent user confusion, I think that when we add support to
>>>> postgres_fdw for this we might also want to add some documentation
>>>> explaining how to interpret this EXPLAIN output, but I don't think
>>>> there's any problem with the output itself.
>>> I'm not sure that that's a good idea. One reason for that is I think that
>>> that would be more confusing to users when more than two foreign tables are
>>> involved in a foreign join as shown in the following example. Note that the
>>> outer plans will be shown recursively. Another reason is there is no
>>> consistency between the costs of the outer plans and that of the main plan.
>> I still don't really see a problem here, but, regardless, the solution
>> can't be to hide nodes that are in fact present from the user. We can
>> talk about making further changes here, but hiding the nodes
>> altogether is categorically out in my mind.
> If you really want to hide the alternative sub-plan, you can move the
> outer planstate onto somewhere private field on BeginForeignScan,
> then kick ExecProcNode() at the ForeignRecheck callback by itself.
> Explain walks down the sub-plan if outerPlanState(planstate) is
> valid. So, as long as your extension keeps the planstate privately,
> it is not visible from the EXPLAIN.
>
> Of course, I don't recommend it.

Sorry, my explanation might not have been enough, but I'm not saying to hide the subplan. I think it would be better to show the subplan somewhere in the EXPLAIN output, but I'm not sure that it's a good idea to show it in the current form. We have two plan trees; one for normal query execution and another for EvalPlanQual testing.
I think it'd be better to show the EXPLAIN output the way that allows users to easily identify each of the plan trees. Best regards, Etsuro Fujita
On Wed, Dec 9, 2015 at 3:22 AM, Etsuro Fujita <fujita.etsuro@lab.ntt.co.jp> wrote:
> Sorry, my explanation might not have been enough, but I'm not saying to
> hide the subplan. I think it would be better to show the subplan
> somewhere in the EXPLAIN output, but I'm not sure that it's a good idea
> to show that in the current form. We have two plan trees; one for normal
> query execution and another for EvalPlanQual testing. I think it'd be
> better to show the EXPLAIN output the way that allows users to easily
> identify each of the plan trees.

It's hard to do that because we don't identify that internally anywhere. Like I said before, the possibility of a ForeignScan having an outer subplan is formally independent of the new EPQ stuff, and I'd prefer to maintain that separation and just address this with documentation.

Getting this bug fixed has been one of the more exhausting experiences of my involvement with PostgreSQL, and to be honest, I think I'd like to stop spending too much time on this now and work on getting the feature that this is intended to support working. Right now, the only people who can have an opinion on this topic are those who are following this thread in detail, and there really aren't that many of those. If we get the feature - join pushdown for postgres_fdw - working, then we might get some feedback from users about what they like about it or don't, and certainly if this is a frequent complaint then that bolsters the case for doing something about it, and possibly also helps us figure out what that thing should be. On the other hand, if we don't get the feature because we're busy debating interface details related to this patch, then none of these details matter anyway because nobody except developers is actually running the code in question.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
On Thu, Dec 10, 2015 at 1:32 AM, Robert Haas <robertmhaas@gmail.com> wrote:
> On Wed, Dec 9, 2015 at 3:22 AM, Etsuro Fujita
> <fujita.etsuro@lab.ntt.co.jp> wrote:
>> Sorry, my explanation might not have been enough, but I'm not saying to
>> hide the subplan. I think it would be better to show the subplan
>> somewhere in the EXPLAIN output, but I'm not sure that it's a good idea
>> to show that in the current form. We have two plan trees; one for normal
>> query execution and another for EvalPlanQual testing. I think it'd be
>> better to show the EXPLAIN output the way that allows users to easily
>> identify each of the plan trees.
>
> It's hard to do that because we don't identify that internally
> anywhere. Like I said before, the possibility of a ForeignScan having
> an outer subplan is formally independent of the new EPQ stuff, and I'd
> prefer to maintain that separation and just address this with
> documentation.

Fujita-san, others, could this be addressed with documentation?

> Getting this bug fixed has been one of the more exhausting experiences
> of my involvement with PostgreSQL, and to be honest, I think I'd like
> to stop spending too much time on this now and work on getting the
> feature that this is intended to support working. Right now, the only
> people who can have an opinion on this topic are those who are
> following this thread in detail, and there really aren't that many of
> those.

I'd count that as mainly 3 people, you included :)

> If we get the feature - join pushdown for postgres_fdw -
> working, then we might get some feedback from users about what they
> like about it or don't, and certainly if this is a frequent complaint
> then that bolsters the case for doing something about it, and possibly
> also helps us figure out what that thing should be. On the other
> hand, if we don't get the feature because we're busy debating
> interface details related to this patch, then none of these details
> matter anyway because nobody except developers is actually running the
> code in question.

As this debate continues, I think that moving this patch to the next CF would make the most sense, then. So done this way.

-- 
Michael
On 2015/12/22 15:24, Michael Paquier wrote: > On Thu, Dec 10, 2015 at 1:32 AM, Robert Haas <robertmhaas@gmail.com> wrote: >> If we get the feature - join pushdown for postgres_fdw - >> working, then we might get some feedback from users about what they >> like about it or don't, and certainly if this is a frequent complaint >> then that bolsters the case for doing something about it, and possibly >> also helps us figure out what that thing should be. On the other >> hand, if we don't get the feature because we're busy debating >> interface details related to this patch, then none of these details >> matter anyway because nobody except developer is actually running the >> code in question. > > As this debate continues, I think that moving this patch to the next > CF would make the most sense then.. So done this way. Perhaps, this ended (?) with the following commit: http://git.postgresql.org/gitweb/?p=postgresql.git;a=commit;h=385f337c9f39b21dca96ca4770552a10a6d5af24 Thanks, Amit
On Tue, Dec 22, 2015 at 3:52 PM, Amit Langote <Langote_Amit_f8@lab.ntt.co.jp> wrote: > On 2015/12/22 15:24, Michael Paquier wrote: >> As this debate continues, I think that moving this patch to the next >> CF would make the most sense then.. So done this way. > > Perhaps, this ended (?) with the following commit: > > http://git.postgresql.org/gitweb/?p=postgresql.git;a=commit;h=385f337c9f39b21dca96ca4770552a10a6d5af24 Ah, thanks! What has been committed is actually more or less epq-recheck-v6-efujita.patch posted upthread, I'll mark the patch as committed then. -- Michael
On Tue, Dec 22, 2015 at 2:00 AM, Michael Paquier <michael.paquier@gmail.com> wrote: > On Tue, Dec 22, 2015 at 3:52 PM, Amit Langote > <Langote_Amit_f8@lab.ntt.co.jp> wrote: >> On 2015/12/22 15:24, Michael Paquier wrote: >>> As this debate continues, I think that moving this patch to the next >>> CF would make the most sense then.. So done this way. >> >> Perhaps, this ended (?) with the following commit: >> >> http://git.postgresql.org/gitweb/?p=postgresql.git;a=commit;h=385f337c9f39b21dca96ca4770552a10a6d5af24 > > Ah, thanks! What has been committed is actually more or less > epq-recheck-v6-efujita.patch posted upthread, I'll mark the patch as > committed then. +1. And thanks. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company