Re: Foreign join pushdown vs EvalPlanQual - Mailing list pgsql-hackers

From Etsuro Fujita
Subject Re: Foreign join pushdown vs EvalPlanQual
Msg-id 560BA7DF.7020307@lab.ntt.co.jp
In response to Re: Foreign join pushdown vs EvalPlanQual  (Robert Haas <robertmhaas@gmail.com>)
List pgsql-hackers
On 2015/09/30 6:55, Robert Haas wrote:
> On Mon, Sep 28, 2015 at 11:15 PM, Etsuro Fujita
> <fujita.etsuro@lab.ntt.co.jp> wrote:
>> I thought the same thing [1].  While I thought it was relatively easy to
>> make changes to RefetchForeignRow that way for the foreign table case
>> (scanrelid>0), I was not sure how hard it would be to do so for the foreign
>> join case (scanrelid==0).  So, I proposed to leave those changes for 9.6.
>> I'll have a rethink on this issue along the lines of that approach.
>
> Well, I spent some more time looking at this today, and testing it out
> using a fixed-up version of your foreign_join_v16 patch, and I decided
> that RefetchForeignRow is basically a red herring.  That's only used
> for FDWs that do late row locking, but postgres_fdw (and probably many
> others) do early row locking, in which case RefetchForeignRow never
> gets called. Instead, the row is treated as a "non-locked source row"
> by ExecLockRows (even though it is in fact locked) and is re-fetched
> by EvalPlanQualFetchRowMarks.  We should probably update the comment
> about non-locked source rows to mention the case of FDWs that do early
> row locking.
>
> Anyway, everything appears to work OK up to this point: we correctly
> retrieve the saved whole-rows from the foreign side and call
> EvalPlanQualSetTuple on each one, setting es_epqTuple[rti - 1] and
> es_epqTupleSet[rti - 1].  So far, so good.  Now we call
> EvalPlanQualNext, and that's where we get into trouble.  We've got the
> already-locked tuples from the foreign side and those tuples CANNOT
> have gone away or been modified because we have already locked them.
> So, all the foreign join needs to do is return the same tuple that it
> returned before: the EPQ recheck was triggered by some *other* table
> involved in the plan, not our table.  A local table also involved in
> the query, or conceivably a foreign table that does late row locking,
> could have had something change under it after the row was fetched,
> but in postgres_fdw that can't happen because we locked the row up
> front.  And thus, again, all we need to do is re-return the same
> tuple.  But we don't have that.  Instead, the ROW_MARK_COPY logic has
> caused us to preserve a copy of each *baserel* tuple.
>
> Now, this is as sad as can be.  Early row locking has huge advantages
> for FDWs, both in terms of minimizing server round trips and also
> because the FDW doesn't really need to do anything about EPQ.  Sure,
> it's inefficient to carry around whole-row references, but it makes
> life easy for the FDW author.
>
> So, if we wanted to fix this in a way that preserves the spirit of
> what's there now, it seems to me that we'd want the FDW to return
> something that's like a whole row reference, but represents the output
> of the foreign join rather than some underlying base table.  And then
> get the EPQ machinery to have the evaluation of the ForeignScan for
> the join, when it happens in an EPQ context, to return that tuple.
> But I don't really have a good idea how to do that.

I'd prefer a general solution.  Couldn't we extend that idea so that
foreign tables involved in a foreign join are allowed to use rowmark
methods other than ROW_MARK_COPY, e.g., ROW_MARK_EXCLUSIVE?
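
For reference, the indexing problem described in the quoted text can be modeled roughly as follows. This is a toy sketch, not PostgreSQL code: every name with a toy_/Toy prefix is made up for illustration, and the real executor structures are considerably more involved. It only shows that saved copies are stored per base relation at index rti - 1, so a scan node with scanrelid == 0 (a pushed-down join) has no slot of its own to return a joined tuple from.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define TOY_N_RELS 2            /* hypothetical: two base foreign tables, rti 1 and 2 */

typedef struct ToyTuple
{
    const char *data;           /* stand-in for a saved whole-row value */
} ToyTuple;

/* Toy analogue of the es_epqTuple / es_epqTupleSet arrays mentioned above. */
typedef struct ToyEState
{
    ToyTuple   *epq_tuple[TOY_N_RELS];
    bool        epq_tuple_set[TOY_N_RELS];
} ToyEState;

/* Save the copy for base relation rti (1-based), as the ROW_MARK_COPY path does. */
static void
toy_epq_set_tuple(ToyEState *es, int rti, ToyTuple *tup)
{
    assert(rti >= 1 && rti <= TOY_N_RELS);
    es->epq_tuple[rti - 1] = tup;
    es->epq_tuple_set[rti - 1] = true;
}

/*
 * Look up the saved tuple for a scan node.  Returns NULL when the scan has
 * no addressable slot, which is the case for a join scan (scanrelid == 0).
 */
static ToyTuple *
toy_epq_fetch(const ToyEState *es, int scanrelid)
{
    if (scanrelid < 1 || scanrelid > TOY_N_RELS)
        return NULL;            /* no per-join slot exists in this model */
    if (!es->epq_tuple_set[scanrelid - 1])
        return NULL;
    return es->epq_tuple[scanrelid - 1];
}
```

In this model, toy_epq_fetch(&es, 1) and toy_epq_fetch(&es, 2) return the saved base-relation copies, while toy_epq_fetch(&es, 0) returns NULL: the joined tuple itself was never saved anywhere.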

Best regards,
Etsuro Fujita