Re: Foreign join pushdown vs EvalPlanQual - Mailing list pgsql-hackers

From Kouhei Kaigai
Subject Re: Foreign join pushdown vs EvalPlanQual
Date
Msg-id 9A28C8860F777E439AA12E8AEA7694F80115A29E@BPXM15GP.gisp.nec.co.jp
Whole thread Raw
In response to Foreign join pushdown vs EvalPlanQual  (Etsuro Fujita <fujita.etsuro@lab.ntt.co.jp>)
List pgsql-hackers
> On Fri, Oct 16, 2015 at 9:51 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> > Robert Haas <robertmhaas@gmail.com> writes:
> >> Both you and Etsuro Fujita are proposing to fix this problem by
> >> somehow making it the FDW's problem to reconstruct the tuple
> >> previously produced by the join from whole-row images of the baserels.
> >> But that's not looking back far enough: why are we asking for
> >> whole-row images of the baserels when what we really want is a
> >> whole-row image of the output of the join?  The output of the join is
> >> what we need to re-return.
> >
> > There are multiple components to the requirement though:
> >
> > 1. Recheck the rows that were in the baserels and possibly fetch updated
> > versions of them.  (Once we're in EPQ, we want the most recent row
> > versions, not what the query snapshot can see.)
> 
> Check.  But postgres_fdw, and probably quite a few other FDWs, use
> early row locking.  So ROW_MARK_COPY is in use and we need not worry
> about refetching rows.
> 
> > 2. Apply relevant restriction clauses and see if the updated row versions
> > still pass the clauses.
> 
> Check.
> 
> > 3. If so, form a join row and return that.  Else return NULL.
> 
> Not check.
> 
> Suppose we've got two foreign tables ft1 and ft2, using postgres_fdw.
> There is a local table t.  The user does something like UPDATE t SET
> ... FROM ft1, ft2 WHERE t = ft1.a AND ft1.b = ft2.b AND ....  The
> query planner generates something like:
> 
> Update
> -> Join
>   -> Scan on t
>   -> Foreign Scan on <ft1, ft2>
> 
> If an EPQ recheck occurs, the only thing that matters is that the
> Foreign Scan return the right output row (or possibly now rows, if the
> row it would have formed no longer matches the quals).  It doesn't
> matter how it does this.  Let's say the columns actually needed by the
> query from the ft1-ft2 join are ft1.a, ft1.b, ft2.a, and ft2.b.
> Currently, the output of the foreign scan is something like: ft1.a,
> ft1.b, ft2.a, ft.b, ft1.*, ft2.*.  The EPQ recheck has access to ft1.*
> and ft2.*, but it's not straightforward for postgres_fdw to regenerate
> the join tuple from that.  Maybe the pushed-down was a left join,
> maybe it was a right join, maybe it was a full join.  So some of the
> columns could have gone to NULL.  To figure it out, you need to build
> a secondary plan tree that mimics the structure of the join you pushed
> down, which is kinda hairy.
>
In case of outer join, do we need to care about join-clause, unlike
scan qualifiers?

Rows filled-up by NULLs appears when here is no matched tuple on other
side. It means any rows in the relation of non-NUllable side are visible
regardless of join-clause, even though it may be or may not be matched
with the latest rows refetched based on the latest values.

Example)
remote table: ft1
id |  val
---+-------1 | 'aaa'2 | 'bbb'3 | 'ccc'

remote table: ft2
id |  val
---+-------2 | 'xxx'3 | 'yyy'4 | 'zzz'

If remote join query is: SELECT *, ft1.*, ft2.* FROM ft1 LEFT JOIN ft2 ON ft1.id = ft2.id WHERE ft1.id < 3;
its expected result is: ft1.id | ft1.val | ft2.id | ft2.val |  ft1.*  |  ft2.*  |
-------+---------+--------+---------+---------+---------+  1    |  'aaa'  |  null  |  null   |(1,'aaa')|  null   |   2
 |  'bbb'  |   2    |  'xxx'  |(2,'bbb')|(2,'xxx')|
 

The non-NULLs side (ft1 in this case) are visible regardless of the join-
clause, as long as tuples in ft1 satisfies the scan-qualifier (ft1.id < 3).

FDW/CSP knows the type of joins that should be responsible, so it can skip
evaluation of join-clauses but apply only scan-qualifiers on base relation's
tuple.

> Think how much easier your life would be if you hadn't bothered
> fetching ft1.* and ft2.*, which aren't so handy in this case, and had
> instead made the output of the foreign scan ft1.a, ft1.b, ft2.a,
> ft2.b, ROW(ft1.a, ft1.b, ft2.a, ft2.b) -- and that the output of that
> ROW() operation was stored in an EPQ slot.  Now, you don't need the
> secondary plan tree any more.  You've got all the data you need right
> in your hand.  The values inside the ROW() constructor were evaluated
> after accounting for the goes-to-NULL effects of any pushed-down
> joins.
> 
> This example is of the early row locking case, but I think the story
> is about the same if the FDW wants to do late row locking instead.  If
> there's an EPQ recheck, it could issue individual row re-fetches
> against every base table and then re-do all the joins that it pushed
> down locally.  But it would be faster and cleaner, I think, to send
> one query to the remote side that re-fetches all the rows at once, and
> whose target list is exactly what we need, rather than whole row
> targetlists for each baserel that then have to be rejiggered on our
> side.
>
Which approach is more reasonable?

In case of early row locking, FDW ensures all the rows involved in
the join is protected by concurrent accesses. So, no need to concern
about refetching from the remote side.
On the other hands, in case of late row locking, we need to pay attention
whether a part of (or all) base relations are updated by the concurrent
accesses. In this case, joined-tuple is no longer valid, so we may need
to fetch the joined-tuple from the remote side during rechecking.
Probably, relevant rowid (ctid system column in postgres_fdw) enables to
identify the tuples to be fetched from the remote side effectively, so it
shall not be a heavy query, however, it needs to run a remote query once.

If we reconstruct a joined tuple from the base relations kept in EPQ slot,
it needs additional reconstruction cost if early row locking case (disadvantage),
however, no need to run remote join again in late row locking situation
because base relation's tuples are already fetched by the infrastructure
(advantage). The local reconstruction approach also has an advantage -
that does not need to enhance existing EPQ slot mechanism so much.
All this approach needs EPQ slot holds tuple of the base relation.

Please correct me, if I misunderstand your proposition.

> > I think what Kaigai-san and Etsuro-san are after is trying to find a way
> > to reuse some of the existing EPQ machinery to help with that.  This may
> > not be practical, or it may end up being messier than a standalone
> > implementation; but it's not silly on its face to want to reuse some of
> > that code.
> 
> Yeah, I think we're all in agreement that reusing as much of the EPQ
> machinery as is sensible is something we should do.  We are not in
> agreement on which parts of it need to be changed or extended.
>
Yes. I'd also like to reuse existing EPQ infrastructure as long as we can.

Thanks,
--
NEC Business Creation Division / PG-Strom Project
KaiGai Kohei <kaigai@ak.jp.nec.com>


pgsql-hackers by date:

Previous
From: Etsuro Fujita
Date:
Subject: Re: Foreign join pushdown vs EvalPlanQual
Next
From: Etsuro Fujita
Date:
Subject: Re: Foreign join pushdown vs EvalPlanQual