Re: Foreign join pushdown vs EvalPlanQual - Mailing list pgsql-hackers

From Robert Haas
Subject Re: Foreign join pushdown vs EvalPlanQual
Date
Msg-id CA+TgmoZoVYMj+feq1Bz3BDCMERRktkcTQpyD-e8Lzdv1Dphn_A@mail.gmail.com
Whole thread Raw
In response to Re: Foreign join pushdown vs EvalPlanQual  (Tom Lane <tgl@sss.pgh.pa.us>)
Responses Re: Foreign join pushdown vs EvalPlanQual  (Etsuro Fujita <fujita.etsuro@lab.ntt.co.jp>)
List pgsql-hackers
On Fri, Oct 16, 2015 at 9:51 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Robert Haas <robertmhaas@gmail.com> writes:
>> Both you and Etsuro Fujita are proposing to fix this problem by
>> somehow making it the FDW's problem to reconstruct the tuple
>> previously produced by the join from whole-row images of the baserels.
>> But that's not looking back far enough: why are we asking for
>> whole-row images of the baserels when what we really want is a
>> whole-row image of the output of the join?  The output of the join is
>> what we need to re-return.
>
> There are multiple components to the requirement though:
>
> 1. Recheck the rows that were in the baserels and possibly fetch updated
> versions of them.  (Once we're in EPQ, we want the most recent row
> versions, not what the query snapshot can see.)

Check.  But postgres_fdw, and probably quite a few other FDWs, use
early row locking.  So ROW_MARK_COPY is in use and we need not worry
about refetching rows.

> 2. Apply relevant restriction clauses and see if the updated row versions
> still pass the clauses.

Check.

> 3. If so, form a join row and return that.  Else return NULL.

Not check.

Suppose we've got two foreign tables ft1 and ft2, using postgres_fdw.
There is a local table t.  The user does something like UPDATE t SET
... FROM ft1, ft2 WHERE t = ft1.a AND ft1.b = ft2.b AND ....  The
query planner generates something like:

Update
-> Join -> Scan on t -> Foreign Scan on <ft1, ft2>

If an EPQ recheck occurs, the only thing that matters is that the
Foreign Scan return the right output row (or possibly now rows, if the
row it would have formed no longer matches the quals).  It doesn't
matter how it does this.  Let's say the columns actually needed by the
query from the ft1-ft2 join are ft1.a, ft1.b, ft2.a, and ft2.b.
Currently, the output of the foreign scan is something like: ft1.a,
ft1.b, ft2.a, ft.b, ft1.*, ft2.*.  The EPQ recheck has access to ft1.*
and ft2.*, but it's not straightforward for postgres_fdw to regenerate
the join tuple from that.  Maybe the pushed-down was a left join,
maybe it was a right join, maybe it was a full join.  So some of the
columns could have gone to NULL.  To figure it out, you need to build
a secondary plan tree that mimics the structure of the join you pushed
down, which is kinda hairy.

Think how much easier your life would be if you hadn't bothered
fetching ft1.* and ft2.*, which aren't so handy in this case, and had
instead made the output of the foreign scan ft1.a, ft1.b, ft2.a,
ft2.b, ROW(ft1.a, ft1.b, ft2.a, ft2.b) -- and that the output of that
ROW() operation was stored in an EPQ slot.  Now, you don't need the
secondary plan tree any more.  You've got all the data you need right
in your hand.  The values inside the ROW() constructor were evaluated
after accounting for the goes-to-NULL effects of any pushed-down
joins.

This example is of the early row locking case, but I think the story
is about the same if the FDW wants to do late row locking instead.  If
there's an EPQ recheck, it could issue individual row re-fetches
against every base table and then re-do all the joins that it pushed
down locally.  But it would be faster and cleaner, I think, to send
one query to the remote side that re-fetches all the rows at once, and
whose target list is exactly what we need, rather than whole row
targetlists for each baserel that then have to be rejiggered on our
side.

> I think what Kaigai-san and Etsuro-san are after is trying to find a way
> to reuse some of the existing EPQ machinery to help with that.  This may
> not be practical, or it may end up being messier than a standalone
> implementation; but it's not silly on its face to want to reuse some of
> that code.

Yeah, I think we're all in agreement that reusing as much of the EPQ
machinery as is sensible is something we should do.  We are not in
agreement on which parts of it need to be changed or extended.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



pgsql-hackers by date:

Previous
From: Stephen Frost
Date:
Subject: Re: [PATCH v3] GSSAPI encryption support
Next
From: Amit Kapila
Date:
Subject: Re: Dangling Client Backend Process