Re: Foreign join pushdown vs EvalPlanQual - Mailing list pgsql-hackers

From Kouhei Kaigai
Subject Re: Foreign join pushdown vs EvalPlanQual
Msg-id 9A28C8860F777E439AA12E8AEA7694F80114DBFB@BPXM15GP.gisp.nec.co.jp
In response to Re: Foreign join pushdown vs EvalPlanQual  (Kyotaro HORIGUCHI <horiguchi.kyotaro@lab.ntt.co.jp>)
Responses Re: Foreign join pushdown vs EvalPlanQual  (Kyotaro HORIGUCHI <horiguchi.kyotaro@lab.ntt.co.jp>)
List pgsql-hackers
> -----Original Message-----
> From: Kyotaro HORIGUCHI [mailto:horiguchi.kyotaro@lab.ntt.co.jp]
> Sent: Friday, October 02, 2015 1:28 PM
> To: Kaigai Kouhei(海外 浩平)
> Cc: fujita.etsuro@lab.ntt.co.jp; robertmhaas@gmail.com;
> pgsql-hackers@postgresql.org; shigeru.hanada@gmail.com
> Subject: Re: [HACKERS] Foreign join pushdown vs EvalPlanQual
>
> Hello,
>
> At Fri, 2 Oct 2015 03:10:01 +0000, Kouhei Kaigai <kaigai@ak.jp.nec.com> wrote
> in <9A28C8860F777E439AA12E8AEA7694F80114DAEC@BPXM15GP.gisp.nec.co.jp>
> > > > As long as FDW author can choose their best way to produce a joined
> > > > tuple, it may be worth to investigate.
> > > >
> > > > My comments are:
> > > > * ForeignRecheck is the best location to call RefetchForeignJoinRow
> > > >   when scanrelid==0, not ExecScanFetch. Why you try to add special
> > > >   case for FDW in the common routine.
> > > > * It is FDW's choice where the remote join tuple is kept, even though
> > > >   most of FDW will keep it on the private field of ForeignScanState.
> > >
> > > I think that scanrelid == 0 means that the node in focus is not a
> > > scan node in current executor
> > > > semantics. EvalPlanQualFetchRowMarks fetches the possibly
> > > > modified row then EvalPlanQualNext does recheck for the new
> > > > row. Those are the roles of each function.
> > >
> > > In this criteria, recheck routines are not the place for
> > > refetching.  EvalPlanQualFetchRowMarks is that.
> > >
> > I never say FDW should refetch tuples on the recheck routine.
> > All I suggest is, projection to generate a joined tuple and
> > recheck according to the qualifier pushed down are role of
> > FDW driver, because it knows the best strategy to do the job.
>
> I have no objection that rechecking is FDW's job.
>
> I think you are thinking that all ROW_MARK_COPY base rows are
> held in ss_ScanTupleSlot so simply calling recheckMtd on the slot
> gives enough data to the function. (EPQState would also be needed
> to retrieve, though..) Right?
>
Not in ss_ScanTupleSlot. When scanrelid==0, that slot is initialized
according to fdw_scan_tlist, regardless of the base foreign relations'
definitions.
My expectation is that the FDW callback constructs tts_values/tts_isnull
of ss_ScanTupleSlot from the tuples preloaded in the EPQ slots and the
underlying projection. Only the FDW driver knows the best way to
construct this result tuple.
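
To illustrate the idea, here is a minimal, self-contained sketch of
such a recheck for an inner join of two base relations. It is a toy
model, not PostgreSQL code: MockTuple, MockEState, MockJoinState,
MockJoinSlot and mock_recheck_foreign_join() are all hypothetical
stand-ins, and the join condition (outer.id = inner.id) is an
assumption chosen only for the example. The epq_tuple[] array plays
the role of estate->es_epqTuple[], and the values[]/isnull[] arrays
play the role of tts_values/tts_isnull.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins; these are NOT PostgreSQL's real structures. */
typedef struct MockTuple
{
    int         id;
    int         payload;
} MockTuple;

#define MOCK_MAX_RELS 8

typedef struct MockEState
{
    /* Models es_epqTuple[]: one preloaded tuple per base relation (rti - 1). */
    const MockTuple *epq_tuple[MOCK_MAX_RELS];
} MockEState;

typedef struct MockJoinState
{
    /* FDW-private knowledge of which base relations were joined remotely. */
    int         outer_rti;
    int         inner_rti;
} MockJoinState;

typedef struct MockJoinSlot
{
    /* Models tts_values / tts_isnull of the scan tuple slot. */
    int         values[3];
    bool        isnull[3];
    bool        valid;
} MockJoinSlot;

/*
 * Rebuild the joined tuple from the EPQ slots of the base relations.
 * Returns true if the join condition still holds, false otherwise.
 */
static bool
mock_recheck_foreign_join(const MockEState *estate,
                          const MockJoinState *js,
                          MockJoinSlot *slot)
{
    const MockTuple *outer;
    const MockTuple *inner;

    assert(js->outer_rti >= 1 && js->outer_rti <= MOCK_MAX_RELS);
    assert(js->inner_rti >= 1 && js->inner_rti <= MOCK_MAX_RELS);

    /* Step 1: fetch the preloaded tuples; this is only a pointer reference. */
    outer = estate->epq_tuple[js->outer_rti - 1];
    inner = estate->epq_tuple[js->inner_rti - 1];

    slot->valid = false;
    if (outer == NULL || inner == NULL)
        return false;

    /* Step 2: re-evaluate the (assumed) pushed-down condition outer.id = inner.id. */
    if (outer->id != inner->id)
        return false;

    /* Step 3: project the joined tuple into the result slot. */
    slot->values[0] = outer->id;
    slot->values[1] = outer->payload;
    slot->values[2] = inner->payload;
    slot->isnull[0] = slot->isnull[1] = slot->isnull[2] = false;
    slot->valid = true;
    return true;
}
```

The sketch covers only the inner-join case; an outer join would also
have to handle a missing tuple on the nullable side.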

You can pull out the EState reference from the PlanState portion of
the ForeignScanState, so nothing needs to be changed.

> All the underlying foreign tables should be marked as
> ROW_MARK_COPY to call recheckMtd safely. And somehow it required
> to know what column stores what base tuple.
>
Even with ROW_MARK_REFERENCE and late locking, the tuple to be rechecked
is already loaded into estate->es_epqTuple[], isn't it?
The recheck routine does not need to care about the row-mark policy.

> > It looks to me that all of them make the problem more complicated.
> > I have never heard why it is difficult for the "foreign-join" scan
> > node to construct a joined tuple using the EPQ slots that are
> > already loaded.
> >
> > Regardless of early or late locking, the EPQ slots of the base
> > relations are already filled, aren't they?
>
> recheckMtd needs to take EState as a parameter?
>
No.

> > All mission of the "foreign-join" scan node is return a joined
> > tuple as if it was executed by local join logic.
> > Local join consumes two tuples then generate one tuple.
> > The "foreign-join" scan node can perform equivalently, even if it
> > is under EPQ recheck context.
> >
> > So, job of FDW driver is...
> > Step-1) Fetch tuples from the EPQ slots of the base foreign relation
> >         to be joined. Please note that it is just a pointer reference.
> > Step-2) Try to join these two (or more) tuples according to the
> >         join condition (only FDW knows because it is kept in private)
> > Step-3) If result is valid, FDW driver makes a projection from these
> >         tuples, then return it.
> >
> > If you concern about re-invention of the code for each FDW, core
> > can provide a utility routine to cover 95% of FDW structure.
> >
> > I want to keep EvalPlanQualFetchRowMarks per base relation basis.
> > It is a bad choice to consider join at this point.
>
>
> > > > Apart from FDW requirement, custom-scan/join needs recheckMtd is
> > > > called when scanrelid==0 to avoid assertion fail. I hope FDW has
> > > > symmetric structure, however, not a mandatory requirement for me.
> > >
> > > It wouldn't be needed if EvalPlanQualFetchRowMarks works as
> > > > expected. Is this wrong?
> > >
> > Yes, it does not work.
> > Expected behavior EvalPlanQualFetchRowMarks is to load the tuple
> > to be rechecked onto EPQ slot, using heap_fetch or copied image.
> > It is per base relation basis.
>
> Hmm. What I said by "works as expected" is that the function
> stores the tuple for the "foreign join" scan node. If it doesn't,
> you're right.
>
Which of the EPQ slots would hold the joined tuple?
scanrelid is zero, and we have no identifier for the join planstate.

> > Who can provide a projection to generate joined tuple?
> > It is a job of individual plan-state-node to be walked on during
> > EvalPlanQualNext().
>
> EvalPlanQualNext simply does recheck tuples stored in epqTuples,
> which are designed to be provided by EvalPlanQualFetchRowMarks.
>
> I think that that premise shouldn't be broken for convenience...
>
Am I looking at something different, or misunderstanding?
EvalPlanQualNext() walks down the entire subtree of the Lock node.
(epqstate->planstate is the entire subplan of the Lock node.)

    TupleTableSlot *
    EvalPlanQualNext(EPQState *epqstate)
    {
        MemoryContext oldcontext;
        TupleTableSlot *slot;

        oldcontext = MemoryContextSwitchTo(epqstate->estate->es_query_cxt);
        slot = ExecProcNode(epqstate->planstate);
        MemoryContextSwitchTo(oldcontext);

        return slot;
    }

If and when relation joins are kept in the sub-plan, ExecProcNode()
performs the projection for the join, doesn't it?

Why would projection by the join not be a part of EvalPlanQualNext()?
It is the core of its job. Without the projection by the join, the upper
node cannot recheck the tuples coming from its child nodes.
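
As a rough illustration of that point, the toy model below walks a
two-level plan tree from the top, the way EvalPlanQualNext() hands the
whole subplan to ExecProcNode(). Everything in it is hypothetical
(MockPlanState, MockRow, mock_exec_proc_node() and
mock_eval_plan_qual_next() are not executor APIs), and the join has no
condition, purely to keep the sketch short. Each scan node returns its
single preloaded EPQ tuple once; the join node pulls from both children
and does the projection itself.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical node kinds; not the executor's real node tags. */
typedef enum { MOCK_SCAN, MOCK_JOIN } MockNodeTag;

typedef struct MockRow
{
    int         a;          /* column from the outer side (or the scan itself) */
    int         b;          /* column from the inner side (join output only) */
    bool        valid;      /* false means "no more rows" */
} MockRow;

typedef struct MockPlanState
{
    MockNodeTag tag;
    struct MockPlanState *outer;    /* join only */
    struct MockPlanState *inner;    /* join only */
    int         epq_value;          /* scan only: value preloaded in its EPQ slot */
    bool        epq_done;           /* scan only: EPQ tuple already returned */
} MockPlanState;

/* Recursive dispatch, standing in for ExecProcNode(). */
static MockRow
mock_exec_proc_node(MockPlanState *node)
{
    MockRow     row = {0, 0, false};
    MockRow     o;
    MockRow     i;

    if (node->tag == MOCK_SCAN)
    {
        /* Under EPQ, a scan returns its preloaded tuple exactly once. */
        if (!node->epq_done)
        {
            node->epq_done = true;
            row.a = node->epq_value;
            row.valid = true;
        }
        return row;
    }

    /* Join node: consume one tuple from each child, then project. */
    o = mock_exec_proc_node(node->outer);
    i = mock_exec_proc_node(node->inner);
    if (o.valid && i.valid)
    {
        row.a = o.a;
        row.b = i.a;
        row.valid = true;
    }
    return row;
}

/* Stand-in for EvalPlanQualNext(): just run the top of the subplan. */
static MockRow
mock_eval_plan_qual_next(MockPlanState *top)
{
    return mock_exec_proc_node(top);
}
```

The caller never asks for a pre-joined tuple here; the joined row only
exists because the join node produced it while the tree was walked.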

Thanks,
--
NEC Business Creation Division / PG-Strom Project
KaiGai Kohei <kaigai@ak.jp.nec.com>




