Re: Foreign join pushdown vs EvalPlanQual - Mailing list pgsql-hackers

From Kouhei Kaigai
Subject Re: Foreign join pushdown vs EvalPlanQual
Date
Msg-id 9A28C8860F777E439AA12E8AEA7694F801157911@BPXM15GP.gisp.nec.co.jp
In response to Re: Foreign join pushdown vs EvalPlanQual  (Kyotaro HORIGUCHI <horiguchi.kyotaro@lab.ntt.co.jp>)
Responses Re: Foreign join pushdown vs EvalPlanQual  (Etsuro Fujita <fujita.etsuro@lab.ntt.co.jp>)
List pgsql-hackers
> -----Original Message-----
> From: Kyotaro HORIGUCHI [mailto:horiguchi.kyotaro@lab.ntt.co.jp]
> Sent: Wednesday, October 14, 2015 4:40 PM
> To: Kaigai Kouhei(海外 浩平)
> Cc: fujita.etsuro@lab.ntt.co.jp; pgsql-hackers@postgresql.org;
> shigeru.hanada@gmail.com; robertmhaas@gmail.com
> Subject: Re: [HACKERS] Foreign join pushdown vs EvalPlanQual
>
> Hello,
>
> At Wed, 14 Oct 2015 03:07:31 +0000, Kouhei Kaigai <kaigai@ak.jp.nec.com> wrote
> in <9A28C8860F777E439AA12E8AEA7694F801157077@BPXM15GP.gisp.nec.co.jp>
> > > I noticed that the approach using a column to populate the foreign
> > > scan's slot directly wouldn't work well in some cases.  For example,
> > > consider:
> > >
> > > SELECT * FROM verysmall v LEFT JOIN (bigft1 JOIN bigft2 ON bigft1.x =
> > > bigft2.x) ON v.q = bigft1.q AND v.r = bigft2.r FOR UPDATE OF v;
> > >
> > > The best plan is presumably something like this as you said before:
> > >
> > > LockRows
> > > -> Nested Loop
> > >     -> Seq Scan on verysmall v
> > >     -> Foreign Scan on bigft1 and bigft2
> > >          Remote SQL: SELECT * FROM bigft1 JOIN bigft2 ON bigft1.x =
> > > bigft2.x AND bigft1.q = $1 AND bigft2.r = $2
> > >
> > > Consider the EvalPlanQual testing to see if the updated version of a
> > > tuple in v satisfies the query.  If we use the column in the testing, we
> > > would get the wrong results in some cases.
>
> I have a basic (or maybe silly) question. Is it true that the
> join-inner (the foreignscan in the example) is re-executed with
> the modified value of v.r?  I observed for a join case among only
> local tables that previously fetched tuples for the inner are
> simply reused regardless of join types. Even when a refetch
> happens (I haven't confirmed but it would occur in the case of no
> security quals), the tuple is pointed to by ctid so the re-join
> between local and remote would fail. Is this wrong?
>
Let's dive into ExecNestLoop().
When nl_NeedNewOuter is true, ExecProcNode(outerPlan) is called, and then
ExecReScan(innerPlan) is called with the new param-info delivered from the
outer tuple.

nl_NeedNewOuter is reset just after ExecProcNode(outerPlan). It is set
again once a new outer tuple is needed, that is, when the inner scan
reaches the end of the relation or when a semi-join finds a matching tuple.
If a semi-join has returned a joined tuple and an EPQ recheck is then
applied, the next call can run ExecProcNode(outerPlan) and reset the
inner plan state.
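
As a rough illustration of the control flow described above, here is a
minimal, self-contained sketch in C. It is a toy model, not the executor
source: ToyNestLoop, toy_init(), toy_next() and toy_count_matches() are
hypothetical names, tuples are plain ints, and "rescan" just means
rewinding the inner array. Only the handling of the need_new_outer flag
is meant to mirror nl_NeedNewOuter.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy stand-in for a nested-loop node; NOT PostgreSQL's NestLoopState. */
typedef struct ToyNestLoop
{
    const int  *outer;          /* outer relation, one int per tuple */
    size_t      n_outer;
    size_t      outer_pos;
    const int  *inner;          /* inner relation, one int per tuple */
    size_t      n_inner;
    size_t      inner_pos;
    int         param;          /* value handed down from the outer tuple */
    bool        need_new_outer; /* plays the role of nl_NeedNewOuter */
    bool        semi_join;      /* stop after the first match per outer */
    int         rescans;        /* how often the inner side was rescanned */
} ToyNestLoop;

void
toy_init(ToyNestLoop *nl, const int *outer, size_t n_outer,
         const int *inner, size_t n_inner, bool semi_join)
{
    assert(nl != NULL);
    nl->outer = outer;
    nl->n_outer = n_outer;
    nl->outer_pos = 0;
    nl->inner = inner;
    nl->n_inner = n_inner;
    nl->inner_pos = 0;
    nl->param = 0;
    nl->need_new_outer = true;  /* the first call must fetch an outer tuple */
    nl->semi_join = semi_join;
    nl->rescans = 0;
}

/* Returns true and stores the matched value, or false when the join is done. */
bool
toy_next(ToyNestLoop *nl, int *result)
{
    for (;;)
    {
        if (nl->need_new_outer)
        {
            if (nl->outer_pos >= nl->n_outer)
                return false;                   /* outer side exhausted */
            nl->param = nl->outer[nl->outer_pos++];
            nl->need_new_outer = false;         /* reset right after the fetch */
            nl->inner_pos = 0;                  /* "rescan" with the new param */
            nl->rescans++;
        }

        if (nl->inner_pos >= nl->n_inner)
        {
            nl->need_new_outer = true;          /* inner scan reached its end */
            continue;
        }

        if (nl->inner[nl->inner_pos++] == nl->param)
        {
            if (nl->semi_join)
                nl->need_new_outer = true;      /* one match is enough */
            *result = nl->param;
            return true;
        }
    }
}

/* Runs the whole join; reports the match count and the number of rescans. */
int
toy_count_matches(const int *outer, size_t n_outer,
                  const int *inner, size_t n_inner,
                  bool semi_join, int *rescans)
{
    ToyNestLoop nl;
    int         value;
    int         count = 0;

    toy_init(&nl, outer, n_outer, inner, n_inner, semi_join);
    while (toy_next(&nl, &value))
        count++;
    *rescans = nl.rescans;
    return count;
}
```

In this model the inner side is rewound only when a new outer tuple is
fetched, which is the point in question: nothing in the loop itself
triggers a rescan merely because the current outer tuple was replaced
by a newer version.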

That is what I can say from the existing code.
I doubt whether this behavior is right for EPQ rechecks. In the above
scenario, the relation verysmall is updated by a concurrent session,
so the param-info has to be updated.

However, the implementation does not look to me as if it pays attention
to this. If ExecNestLoop() is called in the EPQ recheck context, it needs
to call ExecProcNode() on both the outer and inner plans to ensure that
the joined tuple is visible against the latest state.
Of course, the underlying scan plans for the base relations never advance
the scan pointer. Each just returns the tuple in its EPQ slot, and I want
ExecNestLoop() to evaluate whether these tuples satisfy the join clause.


> > In this case, does ForeignScan have to be reset prior to ExecProcNode()?
> > Once ExecReScanForeignScan() gets called by ExecNestLoop(), it marks the
> > EPQ slot as invalid. So, more or less, ForeignScan needs to kick the remote
> > join again based on the new parameter that comes from the latest verysmall
> > tuple. Please correct me if I don't understand correctly.
>
> So, no rescan would happen for these cases, I think. ReScan seems
> to be kicked only for the new (next) outer tuple that causes a
> change of parameter, but not kicked for EPQ. I might have
> misunderstood you.
>
> > In the unparameterized ForeignScan case, the cached join-tuple works
> > well because it is independent of verysmall.
>
>
> > Once again, if the FDW driver is responsible for constructing the
> > join-tuple from the base relations' tuples cached in the EPQ slots, this
> > case doesn't need to kick the remote query again, because all the materials
> > to construct the join-tuple are already held locally. Right?
>
> It is definitely right and should be doable. But I think the
> point we are arguing here is what the desirable behavior is.
>
In the scanrelid==0 case, ForeignScan/CustomScan is expected to
behave as if a local join existed there. This requires ForeignScan to
generate a joined tuple as the result of the remote join, which may contain
multiple junk TLEs to carry whole-row references to the base foreign tables.
By that criterion, the desirable behavior is clear:

1. FDW/CSP picks up the base relations' tuples from the EPQ slots. Each
   slot shall have been set up by a whole-row reference under early
   row-lock semantics, or by RefetchForeignRow under late row-lock
   semantics.

2. Fill up ss_ScanTupleSlot according to the xxx_scan_tlist. We may be
   able to provide a common support function here, because this list
   keeps the relation between each attribute of the joined tuple and
   its source column.

3. Apply the join clauses and base restrictions that were pushed down.
   setrefs.c initializes the expressions kept in fdw_exprs/custom_exprs
   to run on the ss_ScanTupleSlot, so that is the easiest way to check
   them here.

4. If the joined tuple is still visible after step 3, FDW/CSP returns
   the joined tuple. Otherwise, it returns an empty slot.
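
The four steps above can be sketched as a small, self-contained C model.
Everything here is an assumption made for illustration: ToyTuple,
ToyTlistEntry, ToyQual, toy_recheck_join() and toy_example_qual() are
hypothetical names, tuples are arrays of ints, and NULL extension for
outer joins is ignored. It is not the FDW API, only the shape of the
recheck, including a parameter supplied by the outer scan.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define TOY_MAX_ATTRS 8

/* Toy base-relation tuple as it might sit in an EPQ slot. */
typedef struct ToyTuple
{
    bool        valid;                  /* false means the slot is empty */
    int         attrs[TOY_MAX_ATTRS];
} ToyTuple;

/* One column of the joined tuple: which relation and attribute feed it. */
typedef struct ToyTlistEntry
{
    int         rel;                    /* index into the EPQ slot array */
    int         attno;                  /* attribute within that tuple */
} ToyTlistEntry;

/* Pushed-down qual, evaluated on the joined tuple plus outer parameters. */
typedef bool (*ToyQual) (const int *scan_tuple, const int *params);

/*
 * Returns true and fills scan_tuple if the joined tuple still passes the
 * quals; returns false to stand for "return an empty slot".
 */
bool
toy_recheck_join(const ToyTuple *epq_slots, size_t n_rels,
                 const ToyTlistEntry *scan_tlist, size_t n_cols,
                 ToyQual qual, const int *params, int *scan_tuple)
{
    size_t      i;

    /* Step 1: pick up the base tuples from the EPQ slots. */
    for (i = 0; i < n_rels; i++)
    {
        if (!epq_slots[i].valid)
            return false;               /* simplification: no NULL extension */
    }

    /* Step 2: build the joined tuple according to the scan tlist. */
    for (i = 0; i < n_cols; i++)
    {
        assert((size_t) scan_tlist[i].rel < n_rels);
        assert(scan_tlist[i].attno >= 0 && scan_tlist[i].attno < TOY_MAX_ATTRS);
        scan_tuple[i] = epq_slots[scan_tlist[i].rel].attrs[scan_tlist[i].attno];
    }

    /* Steps 3 and 4: apply the pushed-down quals and report the outcome. */
    return qual(scan_tuple, params);
}

/*
 * Example qual for the query in this thread, with the joined tuple laid
 * out as [bigft1.x, bigft1.q, bigft2.x, bigft2.r]:
 *   bigft1.x = bigft2.x AND bigft1.q = $1 AND bigft2.r = $2
 */
bool
toy_example_qual(const int *scan_tuple, const int *params)
{
    return scan_tuple[0] == scan_tuple[2] &&
           scan_tuple[1] == params[0] &&
           scan_tuple[3] == params[1];
}
```

With base tuples (x=10, q=1) and (x=10, r=2), the recheck passes for
parameters {1, 2} and fails for {1, 3}, without any remote query being
issued in this model.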

This behavior is entirely compatible with the case where a local join sits
in place of the ForeignScan/CustomScan with scanrelid==0.

Even if the remote join is parameterized by another relation, we can simply
use the param-info delivered from the corresponding outer scan at step 3.
EState should already have the updated parameters, so the FDW driver needs
to care about nothing.

It is a much less invasive approach to the existing EPQ recheck
mechanism. I cannot understand why Fujita-san never "tries" this approach.

Thanks,
--
NEC Business Creation Division / PG-Strom Project
KaiGai Kohei <kaigai@ak.jp.nec.com>


