Re: Foreign join pushdown vs EvalPlanQual - Mailing list pgsql-hackers

From Kouhei Kaigai
Subject Re: Foreign join pushdown vs EvalPlanQual
Date
Msg-id 9A28C8860F777E439AA12E8AEA7694F80116F7AF@BPXM15GP.gisp.nec.co.jp
Whole thread Raw
In response to Re: Foreign join pushdown vs EvalPlanQual  (Etsuro Fujita <fujita.etsuro@lab.ntt.co.jp>)
Responses Re: Foreign join pushdown vs EvalPlanQual  (Kyotaro HORIGUCHI <horiguchi.kyotaro@lab.ntt.co.jp>)
Re: Foreign join pushdown vs EvalPlanQual  (Etsuro Fujita <fujita.etsuro@lab.ntt.co.jp>)
List pgsql-hackers
> -----Original Message-----
> From: Etsuro Fujita [mailto:fujita.etsuro@lab.ntt.co.jp]
> Sent: Thursday, November 12, 2015 6:54 PM
> To: Kaigai Kouhei(海外 浩平); Robert Haas
> Cc: Tom Lane; Kyotaro HORIGUCHI; pgsql-hackers@postgresql.org; Shigeru Hanada
> Subject: Re: [HACKERS] Foreign join pushdown vs EvalPlanQual
> 
> Robert and Kaigai-san,
> 
> Sorry, I sent in an unfinished email.
> 
> On 2015/11/12 15:30, Kouhei Kaigai wrote:
> >> On 2015/11/12 2:53, Robert Haas wrote:
> >>> On Sun, Nov 8, 2015 at 11:13 PM, Etsuro Fujita
> >>> <fujita.etsuro@lab.ntt.co.jp> wrote:
> >>>> To test this change, I think we should update the postgres_fdw patch so as
> >>>> to add the RecheckForeignScan.
> >>>>
> >>>> Having said that, as I said previously, I don't see much value in adding
> the
> >>>> callback routine, to be honest.  I know KaiGai-san considers that that would
> >>>> be useful for custom joins, but I don't think that that would be useful even
> >>>> for foreign joins, because I think that in case of foreign joins, the
> >>>> practical implementation of that routine in FDWs would be to create a
> >>>> secondary plan and execute that plan by performing ExecProcNode, as my patch
> >>>> does [1].  Maybe I'm missing something, though.
> 
> >>> I really don't see why you're fighting on this point.  Making this a
> >>> generic feature will require only a few extra lines of code for FDW
> >>> authors.  If this were going to cause some great inconvenience for FDW
> >>> authors, then I'd agree it isn't worth it.  But I see zero evidence
> >>> that this is actually the case.
> 
> >> Really?  I think there would be not a little burden on an FDW author;
> >> when postgres_fdw delegates to the subplan to the remote server, for
> >> example, it would need to create a remote join query by looking at
> >> tuples possibly fetched and stored in estate->es_epqTuple[], send the
> >> query and receive the result during the callback routine.
> 
> > I cannot understand why it is the only solution.
> 
> I didn't say that.
> 
> >> Furthermore,
> >> what I'm most concerned about is that wouldn't be efficient. So, my
> 
> > You have to add "because ..." sentence here because I and Robert
> > think a little inefficiency is not a problem.
> 
> Sorry, my explanation was not enough.  The reason for that is that in
> the above postgres_fdw case for example, the overhead in sending the
> query to the remote end and transferring the result to the local end
> would not be negligible.  Yeah, we might be able to apply a special
> handling for the improved efficiency when using early row locking, but
> otherwise can we do the same thing?
>
It is trade-off. Late locking semantics allows to lock relatively smaller
number of remote rows, it will take extra latency.
Also, it became clear we have a challenge to pull a joined tuple at once.

> > Please don't start the sentence from "I think ...". We all knows
> > your opinion, but what I've wanted to see is "the reason why my
> > approach is valuable is ...".
> 
> I didn't say that my approach is *valuable* either.  What I think is, I
> see zero evidence that there is a good use-case for an FDW to do
> something other than doing an ExecProcNode in the callback routine, as I
> said below, so I don't see the need to add such a routine while that
> would cause maybe not a large, but not a little burden for writing such
> a routine on FDW authors.
>
It is quite natural because we cannot predicate what kind of extension
is implemented on FDW interface. You might know the initial version of
PG-Strom is implemented on FDW (about 4 years before...). If I would
continue to stick FDW, it became a FDW driver with own join engine.
(cstore_fdw may potentially support own join logic on top of their
columnar storage for instance?)

From the standpoint of interface design, if we would not admit flexibility
of implementation unless community don't see a working example, a reasonable
tactics *for extension author* is to follow the interface restriction even
if it is not best approach from his standpoint.
It does not mean the approach by majority is also best for the minority.
It just requires the minority a compromise.

> > Nobody prohibits postgres_fdw performs a secondary join here.
> > All you need to do is, picking up a sub-plan tree from FDW's private
> > field then call ExecProcNode() inside the callback.
> 
> >> As I said before, I know that KaiGai-san considers that
> >> that approach would be useful for custom joins.  But I see zero evidence
> >> that there is a good use-case for an FDW.
> 
> >>>  From my point of view I'm now
> >>> thinking this solution has two parts:
> >>>
> >>> (1) Let foreign scans have inner and outer subplans.  For this
> >>> purpose, we only need one, but it's no more work to enable both, so we
> >>> may as well.  If we had some reason, we could add a list of subplans
> >>> of arbitrary length, but there doesn't seem to be an urgent need for
> >>> that.
> 
> I did the same thing in an earlier version of the patch I posted.
> Although I agreed on Robert's comment "The Plan tree and the PlanState
> tree should be mirror images of each other; breaking that equivalence
> will cause confusion, at least.", I think that that would make code much
> simpler, especially the code for setting chgParam for inner/outer
> subplans.  But one thing I'm concerned about is enable both inner and
> outer plans, because I think that that would make the planner
> postprocessing complicated, depending on what the foreign scans do by
> the inner/outer subplans.  Is it worth doing so?  Maybe I'm missing
> something, though.
>
If you persuade other person who has different opinion, you need to
explain why was it complicated, how much complicated and what was
the solution you tried at that time.
The "complicated" is a subjectively-based term. At least, we don't
share your experience, so it is hard to understand the how complexity.

I guess it is similar to what built-in logic is usually doing, thus,
it should not be a problem we cannot solve. A utility routine FDW
driver can call will solve the issue (even if it is not supported
on v9.5 yet).

> >>> (2) Add a recheck callback.
> >>>
> >>> If the foreign data wrapper wants to adopt the solution you're
> >>> proposing, the recheck callback can call
> >>> ExecProcNode(outerPlanState(node)).  I don't think this should end up
> >>> being more than a few lines of code, although of course we should
> >>> verify that.
> 
> Yeah, I think FDWs would probably need to create a subplan accordingly
> at planning time, and then initializing/closing the plan at execution
> time.  I think we could facilitate subplan creation by providing helper
> functions for that, though.
>
I can agree with we ought to provide a utility routine to construct
a local alternative subplan, however, we are in beta2 stage for v9.5.
So, I'd like to suggest only callback on v9.5 (FDW driver can handle
its subplan by itself, no need to path the backend), then design the
utility routine for this.

Thanks,
--
NEC Business Creation Division / PG-Strom Project
KaiGai Kohei <kaigai@ak.jp.nec.com>


pgsql-hackers by date:

Previous
From: Peter Geoghegan
Date:
Subject: Generalizing SortSupport for text to work with char(n), bytea, and alternative opclasses
Next
From: Amit Kapila
Date:
Subject: Re: Parallel Seq Scan