Re: Foreign join pushdown vs EvalPlanQual - Mailing list pgsql-hackers

From Etsuro Fujita
Subject Re: Foreign join pushdown vs EvalPlanQual
Date
Msg-id 56249FA0.3010001@lab.ntt.co.jp
Whole thread Raw
In response to Re: Foreign join pushdown vs EvalPlanQual  (Robert Haas <robertmhaas@gmail.com>)
Responses Re: Foreign join pushdown vs EvalPlanQual  (Robert Haas <robertmhaas@gmail.com>)
List pgsql-hackers
On 2015/10/17 12:22, Robert Haas wrote:
> On Fri, Oct 16, 2015 at 9:51 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>> Robert Haas <robertmhaas@gmail.com> writes:
>>> Both you and Etsuro Fujita are proposing to fix this problem by
>>> somehow making it the FDW's problem to reconstruct the tuple
>>> previously produced by the join from whole-row images of the baserels.
>>> But that's not looking back far enough: why are we asking for
>>> whole-row images of the baserels when what we really want is a
>>> whole-row image of the output of the join?  The output of the join is
>>> what we need to re-return.

>> There are multiple components to the requirement though:

>> 3. If so, form a join row and return that.  Else return NULL.

> Not check.
>
> Suppose we've got two foreign tables ft1 and ft2, using postgres_fdw.
> There is a local table t.  The user does something like UPDATE t SET
> ... FROM ft1, ft2 WHERE t = ft1.a AND ft1.b = ft2.b AND ....  The
> query planner generates something like:
>
> Update
> -> Join
>    -> Scan on t
>    -> Foreign Scan on <ft1, ft2>
>
> If an EPQ recheck occurs, the only thing that matters is that the
> Foreign Scan return the right output row (or possibly now rows, if the
> row it would have formed no longer matches the quals).  It doesn't
> matter how it does this.  Let's say the columns actually needed by the
> query from the ft1-ft2 join are ft1.a, ft1.b, ft2.a, and ft2.b.
> Currently, the output of the foreign scan is something like: ft1.a,
> ft1.b, ft2.a, ft.b, ft1.*, ft2.*.  The EPQ recheck has access to ft1.*
> and ft2.*, but it's not straightforward for postgres_fdw to regenerate
> the join tuple from that.  Maybe the pushed-down was a left join,
> maybe it was a right join, maybe it was a full join.  So some of the
> columns could have gone to NULL.  To figure it out, you need to build
> a secondary plan tree that mimics the structure of the join you pushed
> down, which is kinda hairy.

As Tom mentioned, just recomputing the original join tuple is not good 
enough.  We would need to rejoin the test tuples for the baserels even 
if ROW_MARK_COPY is in use.  Consider:

A=# BEGIN;
A=# UPDATE t SET a = a + 1 WHERE b = 1;
B=# SELECT * from t, ft1, ft2     WHERE t.a = ft1.a AND t.b = ft2.b AND ft1.c = ft2.c FOR UPDATE;
A=# COMMIT;

where the plan for the SELECT FOR UPDATE is

LockRows
-> Nested Loop   -> Seq Scan on t   -> Foreign Scan on <ft1, ft2>        Remote SQL: SELECT * FROM ft1 JOIN ft2 WHERE
ft1.c= ft2.c AND 
 
ft1.a = $1 AND ft2.b = $2

If an EPQ recheck is invoked by the A's UPDATE, just recomputing the 
original join tuple from the whole-row image that you proposed would 
output an incorrect result in the EQP recheck since the value a in the 
updated version of a to-be-joined tuple in t would no longer match the 
value ft1.a extracted from the whole-row image if the A's UPDATE has 
committed successfully.  So I think we would need to rejoin the tuples 
populated from the whole-row images for the baserels ft1 and ft2, by 
executing the secondary plan with the new parameter values for a and b.

As for the secondary plan, I think we could create the corresponding 
local join execution path during GetForeignJoinPaths, (1) by looking at 
the pathlist of the joinrel RelOptInfo, which would have already 
contained some local join execution paths, as does the patch, or (2) by 
calling a helper function that creates a local join execution path from 
given outer/inner paths selected from the pathlists of the 
outerrel/innerrel RelOptInfos, as proposed be KaiGai-san before.  ISTM 
that the latter would be better, so I plan to propose such a function as 
part of the postgres_fdw join pushdown patch for 9.6.

> This example is of the early row locking case, but I think the story
> is about the same if the FDW wants to do late row locking instead.  If
> there's an EPQ recheck, it could issue individual row re-fetches
> against every base table and then re-do all the joins that it pushed
> down locally.  But it would be faster and cleaner, I think, to send
> one query to the remote side that re-fetches all the rows at once, and
> whose target list is exactly what we need, rather than whole row
> targetlists for each baserel that then have to be rejiggered on our
> side.

I agree with you on that point.  (In fact, I thought that too!)  But 
considering that many FDWs including postgres_fdw use early row locking 
(ie, ROW_MARK_COPY) currently, I'd like to leave that for future work.

>> I think what Kaigai-san and Etsuro-san are after is trying to find a way
>> to reuse some of the existing EPQ machinery to help with that.  This may
>> not be practical, or it may end up being messier than a standalone
>> implementation; but it's not silly on its face to want to reuse some of
>> that code.

> Yeah, I think we're all in agreement that reusing as much of the EPQ
> machinery as is sensible is something we should do.  We are not in
> agreement on which parts of it need to be changed or extended.

Agreed.

Best regards,
Etsuro Fujita




pgsql-hackers by date:

Previous
From: Pavel Stehule
Date:
Subject: Re: [PATCH] SQL function to report log message
Next
From: Kouhei Kaigai
Date:
Subject: Re: Foreign join pushdown vs EvalPlanQual