Re: Foreign join pushdown vs EvalPlanQual - Mailing list pgsql-hackers
From | Robert Haas |
---|---|
Subject | Re: Foreign join pushdown vs EvalPlanQual |
Date | |
Msg-id | CA+TgmoYAz=vpDn_EVBcBzTa=JUiHBGmtpreJoP+A=p80mkt0UQ@mail.gmail.com |
In response to | Re: Foreign join pushdown vs EvalPlanQual (Etsuro Fujita <fujita.etsuro@lab.ntt.co.jp>) |
Responses | Re: Foreign join pushdown vs EvalPlanQual, Re: Foreign join pushdown vs EvalPlanQual |
List | pgsql-hackers |
On Mon, Sep 28, 2015 at 11:15 PM, Etsuro Fujita <fujita.etsuro@lab.ntt.co.jp> wrote:
> I thought the same thing [1]. While I thought it was relatively easy to
> make changes to RefetchForeignRow that way for the foreign table case
> (scanrelid>0), I was not sure how hard it would be to do so for the foreign
> join case (scanrelid==0). So, I proposed to leave that changes for 9.6.
> I'll have a rethink on this issue along the lines of that approach.

Well, I spent some more time looking at this today, and testing it out using a fixed-up version of your foreign_join_v16 patch, and I decided that RefetchForeignRow is basically a red herring. That's only used for FDWs that do late row locking, but postgres_fdw (and probably many others) do early row locking, in which case RefetchForeignRow never gets called. Instead, the row is treated as a "non-locked source row" by ExecLockRows (even though it is in fact locked) and is re-fetched by EvalPlanQualFetchRowMarks. We should probably update the comment about non-locked source rows to mention the case of FDWs that do early row locking.

Anyway, everything appears to work OK up to this point: we correctly retrieve the saved whole-rows from the foreign side and call EvalPlanQualSetTuple on each one, setting es_epqTuple[rti - 1] and es_epqTupleSet[rti - 1]. So far, so good.

Now we call EvalPlanQualNext, and that's where we get into trouble. We've got the already-locked tuples from the foreign side and those tuples CANNOT have gone away or been modified because we have already locked them. So, all the foreign join needs to do is return the same tuple that it returned before: the EPQ recheck was triggered by some *other* table involved in the plan, not our table. A local table also involved in the query, or conceivably a foreign table that does late row locking, could have had something change under it after the row was fetched, but in postgres_fdw that can't happen because we locked the row up front.
And thus, again, all we need to do is re-return the same tuple. But we don't have that. Instead, the ROW_MARK_COPY logic has caused us to preserve a copy of each *baserel* tuple.

Now, this is as sad as can be. Early row locking has huge advantages for FDWs, both in terms of minimizing server round trips and also because the FDW doesn't really need to do anything about EPQ. Sure, it's inefficient to carry around whole-row references, but it makes life easy for the FDW author.

So, if we wanted to fix this in a way that preserves the spirit of what's there now, it seems to me that we'd want the FDW to return something that's like a whole-row reference, but represents the output of the foreign join rather than some underlying base table. And then we'd get the EPQ machinery to make the ForeignScan for the join, when it is evaluated in an EPQ context, return that tuple. But I don't really have a good idea how to do that. More thought seems needed here...

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company