Re: Foreign join pushdown vs EvalPlanQual - Mailing list pgsql-hackers
From | Kouhei Kaigai |
---|---|
Subject | Re: Foreign join pushdown vs EvalPlanQual |
Date | |
Msg-id | 9A28C8860F777E439AA12E8AEA7694F80114D442@BPXM15GP.gisp.nec.co.jp |
In response to | Re: Foreign join pushdown vs EvalPlanQual (Robert Haas <robertmhaas@gmail.com>) |
Responses | Re: Foreign join pushdown vs EvalPlanQual; Re: Foreign join pushdown vs EvalPlanQual |
List | pgsql-hackers |
> -----Original Message-----
> From: pgsql-hackers-owner@postgresql.org
> [mailto:pgsql-hackers-owner@postgresql.org] On Behalf Of Robert Haas
> Sent: Wednesday, September 30, 2015 6:55 AM
> To: Etsuro Fujita
> Cc: Kaigai Kouhei(海外 浩平); PostgreSQL-development; 花田茂
> Subject: Re: [HACKERS] Foreign join pushdown vs EvalPlanQual
>
> On Mon, Sep 28, 2015 at 11:15 PM, Etsuro Fujita
> <fujita.etsuro@lab.ntt.co.jp> wrote:
> > I thought the same thing [1]. While I thought it was relatively easy to
> > make changes to RefetchForeignRow that way for the foreign table case
> > (scanrelid>0), I was not sure how hard it would be to do so for the foreign
> > join case (scanrelid==0). So, I proposed to leave that changes for 9.6.
> > I'll have a rethink on this issue along the lines of that approach.
>
> Well, I spent some more time looking at this today, and testing it out
> using a fixed-up version of your foreign_join_v16 patch, and I decided
> that RefetchForeignRow is basically a red herring. That's only used
> for FDWs that do late row locking, but postgres_fdw (and probably many
> others) do early row locking, in which case RefetchForeignRow never
> gets called. Instead, the row is treated as a "non-locked source row"
> by ExecLockRows (even though it is in fact locked) and is re-fetched
> by EvalPlanQualFetchRowMarks. We should probably update the comment
> about non-locked source rows to mention the case of FDWs that do early
> row locking.

Indeed, select_rowmark_type() chooses ROW_MARK_COPY if the
GetForeignRowMarkType callback is not defined.

> Anyway, everything appears to work OK up to this point: we correctly
> retrieve the saved whole-rows from the foreign side and call
> EvalPlanQualSetTuple on each one, setting es_epqTuple[rti - 1] and
> es_epqTupleSet[rti - 1]. So far, so good. Now we call
> EvalPlanQualNext, and that's where we get into trouble.
> We've got the
> already-locked tuples from the foreign side and those tuples CANNOT
> have gone away or been modified because we have already locked them.
> So, all the foreign join needs to do is return the same tuple that it
> returned before: the EPQ recheck was triggered by some *other* table
> involved in the plan, not our table. A local table also involved in
> the query, or conceivably a foreign table that does late row locking,
> could have had something change under it after the row was fetched,
> but in postgres_fdw that can't happen because we locked the row up
> front. And thus, again, all we need to do is re-return the same
> tuple. But we don't have that. Instead, the ROW_MARK_COPY logic has
> caused us to preserve a copy of each *baserel* tuple.
>
> Now, this is as sad as can be. Early row locking has huge advantages
> for FDWs, both in terms of minimizing server round trips and also
> because the FDW doesn't really need to do anything about EPQ. Sure,
> it's inefficient to carry around whole-row references, but it makes
> life easy for the FDW author.

I got the point. Would it be helpful to add a description of why
ROW_MARK_COPY does not need a recheck on both local and remote tuples?

http://www.postgresql.org/docs/devel/static/fdw-row-locking.html

> So, if we wanted to fix this in a way that preserves the spirit of
> what's there now, it seems to me that we'd want the FDW to return
> something that's like a whole row reference, but represents the output
> of the foreign join rather than some underlying base table. And then
> get the EPQ machinery to have the evaluation of the ForeignScan for
> the join, when it happens in an EPQ context, to return that tuple.
> But I don't really have a good idea how to do that.
>
> More thought seems needed here...

How about an alternative built-in join execution?
Once it is executed under the EPQ context, the built-in join node fetches
one tuple from each of its inner and outer sides.
Each tuple is eventually fetched from the EPQ slot, and then the alternative
join produces a result tuple.
When the FDW is not designed to handle the join by itself, I think this is a
reasonable fallback.

I expect the FDW driver needs to handle the EPQ recheck in the cases below:
* ForeignScan on a base relation that uses late row locking.
* ForeignScan on a join relation, even with early row locking.

Thanks,

--
NEC Business Creation Division / PG-Strom Project
KaiGai Kohei <kaigai@ak.jp.nec.com>
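
The two decisions discussed in this message can be sketched as a small toy model. This is only an illustration in Python, not PostgreSQL code: the function names `choose_rowmark_type` and `fdw_must_handle_epq_recheck` are hypothetical, the row mark types are plain strings, and the `scanrelid` convention (greater than 0 for a foreign table, 0 for a foreign join) is taken from the quoted text above.

```python
from typing import Callable, Optional

ROW_MARK_COPY = "ROW_MARK_COPY"


def choose_rowmark_type(
    get_foreign_rowmark_type: Optional[Callable[[], str]],
) -> str:
    """Toy stand-in for the foreign-table default described in the message."""
    if get_foreign_rowmark_type is not None:
        # The FDW supplied a callback, so it picks the row mark type itself.
        return get_foreign_rowmark_type()
    # No callback defined: ROW_MARK_COPY is chosen by default.
    return ROW_MARK_COPY


def fdw_must_handle_epq_recheck(scanrelid: int, late_row_locking: bool) -> bool:
    """Encode the two cases listed at the end of the message (assumed model)."""
    if scanrelid == 0:
        # ForeignScan on a join relation: recheck needed even with early locking.
        return True
    # ForeignScan on a base relation: recheck needed only with late row locking.
    return late_row_locking
```

Under this model, a base-relation scan with early row locking is the only combination where the FDW is not expected to take part in the EPQ recheck.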