Re: Foreign join pushdown vs EvalPlanQual - Mailing list pgsql-hackers
From | Kyotaro HORIGUCHI |
---|---|
Subject | Re: Foreign join pushdown vs EvalPlanQual |
Date | |
Msg-id | 20151001.153807.70233777.horiguchi.kyotaro@lab.ntt.co.jp |
In response to | Re: Foreign join pushdown vs EvalPlanQual (Kouhei Kaigai <kaigai@ak.jp.nec.com>) |
Responses | Re: Foreign join pushdown vs EvalPlanQual |
List | pgsql-hackers |
Hello, I think I have caught up with this thread.

> > So, if we wanted to fix this in a way that preserves the spirit of
> > what's there now, it seems to me that we'd want the FDW to return
> > something that's like a whole row reference, but represents the output
> > of the foreign join rather than some underlying base table. And then
> > get the EPQ machinery to have the evaluation of the ForeignScan for
> > the join, when it happens in an EPQ context, to return that tuple.
> > But I don't really have a good idea how to do that.
> >
> > More thought seems needed here...
> >
> Alternative built-in join execution?
> Once it is executed under the EPQ context, built-in join node fetches
> a tuple from both of inner and outer side for each. It is eventually
> fetched from the EPQ slot, then the alternative join produce a result
> tuple.

That seems quite similar to what Fujita-san is currently trying: somehow
*replacing* the "foreign join" scan node with an alternative local join
plan during EPQ.

I think what Robert is saying is that a "foreign join" scan should behave
completely like an ordinary scan node in the executor. The current
framework for foreign join pushdown seems a bit tricky because it only
partially emulates a local join on top of foreign scans. That mixture
seems to be the root cause of this problem.

1. Somehow run a local join over the current EPQ tuples supplied by the
   "foreign join" scan.

   1.1 Somehow detect that EPQ is running and switch to the alternative
       plan in ExecScanFetch or somewhere else.

   1.2 Replace the "foreign join" scan node with the alternative local
       join node at ExecInit time. (I don't like this.)

   1.3 An in-core alternative local join executor for join pushdown?

2. Make the "foreign join" scan plan node fully compliant with the
   executor semantics of an ordinary scan node. In other words, the node
   has a corresponding RTE_RELATION RTE, is marked with ROW_MARK_COPY for
   locking, and returns a slot whose tlist contains the join result
   columns plus a whole-row var over them.
   Then, EvalPlanQualFetchRowMarks gets that whole-row var and stores it
   in the epqTuple for the corresponding *relid*.

I prefer option 2, but I don't have a good idea how to do that yet,
either.

> In case when FDW is not designed to handle join by itself, it is
> a reasonable fallback I think.
>
> I expect FDW driver needs to handle EPQ recheck in the case below:
> * ForeignScan on base relation and it uses late row locking.

I think this is indisputable.

> * ForeignScan on join relation, even if early locking.

This could be unnecessary if the "foreign join" scan node could have its
own ROW_MARK_COPY rowmark.

regards,

At Thu, 1 Oct 2015 02:15:29 +0000, Kouhei Kaigai <kaigai@ak.jp.nec.com> wrote in <9A28C8860F777E439AA12E8AEA7694F80114D442@BPXM15GP.gisp.nec.co.jp>
> > -----Original Message-----
> > From: pgsql-hackers-owner@postgresql.org
> > [mailto:pgsql-hackers-owner@postgresql.org] On Behalf Of Robert Haas
> > Sent: Wednesday, September 30, 2015 6:55 AM
> > To: Etsuro Fujita
> > Cc: Kaigai Kouhei(海外 浩平); PostgreSQL-development; 花田茂
> > Subject: Re: [HACKERS] Foreign join pushdown vs EvalPlanQual
> >
> > On Mon, Sep 28, 2015 at 11:15 PM, Etsuro Fujita
> > <fujita.etsuro@lab.ntt.co.jp> wrote:
> > > I thought the same thing [1]. While I thought it was relatively easy to
> > > make changes to RefetchForeignRow that way for the foreign table case
> > > (scanrelid>0), I was not sure how hard it would be to do so for the foreign
> > > join case (scanrelid==0). So, I proposed to leave that changes for 9.6.
> > > I'll have a rethink on this issue along the lines of that approach.
> >
> > Well, I spent some more time looking at this today, and testing it out
> > using a fixed-up version of your foreign_join_v16 patch, and I decided
> > that RefetchForeignRow is basically a red herring. That's only used
> > for FDWs that do late row locking, but postgres_fdw (and probably many
> > others) do early row locking, in which case RefetchForeignRow never
> > gets called.
> > Instead, the row is treated as a "non-locked source row"
> > by ExecLockRows (even though it is in fact locked) and is re-fetched
> > by EvalPlanQualFetchRowMarks. We should probably update the comment
> > about non-locked source rows to mention the case of FDWs that do early
> > row locking.
> >
> Indeed, select_rowmark_type() says ROW_MARK_COPY if GetForeignRowMarkType
> callback is not defined.
>
> > Anyway, everything appears to work OK up to this point: we correctly
> > retrieve the saved whole-rows from the foreign side and call
> > EvalPlanQualSetTuple on each one, setting es_epqTuple[rti - 1] and
> > es_epqTupleSet[rti - 1]. So far, so good. Now we call
> > EvalPlanQualNext, and that's where we get into trouble. We've got the
> > already-locked tuples from the foreign side and those tuples CANNOT
> > have gone away or been modified because we have already locked them.
> > So, all the foreign join needs to do is return the same tuple that it
> > returned before: the EPQ recheck was triggered by some *other* table
> > involved in the plan, not our table. A local table also involved in
> > the query, or conceivably a foreign table that does late row locking,
> > could have had something change under it after the row was fetched,
> > but in postgres_fdw that can't happen because we locked the row up
> > front. And thus, again, all we need to do is re-return the same
> > tuple. But we don't have that. Instead, the ROW_MARK_COPY logic has
> > caused us to preserve a copy of each *baserel* tuple.
> >
> > Now, this is as sad as can be. Early row locking has huge advantages
> > for FDWs, both in terms of minimizing server round trips and also
> > because the FDW doesn't really need to do anything about EPQ. Sure,
> > it's inefficient to carry around whole-row references, but it makes
> > life easy for the FDW author.
> >
> I got the point. Is it helpful to add description why ROW_MARK_COPY
> does not need recheck on both of local/remote tuples?
> http://www.postgresql.org/docs/devel/static/fdw-row-locking.html
>
> > So, if we wanted to fix this in a way that preserves the spirit of
> > what's there now, it seems to me that we'd want the FDW to return
> > something that's like a whole row reference, but represents the output
> > of the foreign join rather than some underlying base table. And then
> > get the EPQ machinery to have the evaluation of the ForeignScan for
> > the join, when it happens in an EPQ context, to return that tuple.
> > But I don't really have a good idea how to do that.
> >
> > More thought seems needed here...
> >
> Alternative built-in join execution?
> Once it is executed under the EPQ context, built-in join node fetches
> a tuple from both of inner and outer side for each. It is eventually
> fetched from the EPQ slot, then the alternative join produce a result
> tuple.
> In case when FDW is not designed to handle join by itself, it is
> a reasonable fallback I think.
>
> I expect FDW driver needs to handle EPQ recheck in the case below:
> * ForeignScan on base relation and it uses late row locking.
> * ForeignScan on join relation, even if early locking.

--
Kyotaro Horiguchi
NTT Open Source Software Center