Re: Foreign join pushdown vs EvalPlanQual - Mailing list pgsql-hackers

From Etsuro Fujita
Subject Re: Foreign join pushdown vs EvalPlanQual
Date
Msg-id 564461AC.6050400@lab.ntt.co.jp
In response to Re: Foreign join pushdown vs EvalPlanQual  (Kouhei Kaigai <kaigai@ak.jp.nec.com>)
Responses Re: Foreign join pushdown vs EvalPlanQual  (Kouhei Kaigai <kaigai@ak.jp.nec.com>)
List pgsql-hackers
Robert and Kaigai-san,

Sorry, I sent an unfinished email.

On 2015/11/12 15:30, Kouhei Kaigai wrote:
>> On 2015/11/12 2:53, Robert Haas wrote:
>>> On Sun, Nov 8, 2015 at 11:13 PM, Etsuro Fujita
>>> <fujita.etsuro@lab.ntt.co.jp> wrote:
>>>> To test this change, I think we should update the postgres_fdw patch so as
>>>> to add the RecheckForeignScan.
>>>>
>>>> Having said that, as I said previously, I don't see much value in adding the
>>>> callback routine, to be honest.  I know KaiGai-san considers that that would
>>>> be useful for custom joins, but I don't think that that would be useful even
>>>> for foreign joins, because I think that in case of foreign joins, the
>>>> practical implementation of that routine in FDWs would be to create a
>>>> secondary plan and execute that plan by performing ExecProcNode, as my patch
>>>> does [1].  Maybe I'm missing something, though.

>>> I really don't see why you're fighting on this point.  Making this a
>>> generic feature will require only a few extra lines of code for FDW
>>> authors.  If this were going to cause some great inconvenience for FDW
>>> authors, then I'd agree it isn't worth it.  But I see zero evidence
>>> that this is actually the case.

>> Really?  I think there would be not a little burden on an FDW author;
>> when postgres_fdw delegates to the subplan to the remote server, for
>> example, it would need to create a remote join query by looking at
>> tuples possibly fetched and stored in estate->es_epqTuple[], send the
>> query and receive the result during the callback routine.

> I cannot understand why it is the only solution.

I didn't say that.

>> Furthermore,
>> what I'm most concerned about is that wouldn't be efficient. So, my

> You have to add "because ..." sentence here because I and Robert
> think a little inefficiency is not a problem.

Sorry, my explanation was insufficient.  The reason is that, in the
postgres_fdw case above for example, the overhead of sending the query
to the remote end and transferring the result back to the local end
would not be negligible.  We might be able to add special handling to
improve efficiency when using early row locking, but could we do the
same thing otherwise?

> Please don't start the sentence from "I think ...". We all knows
> your opinion, but what I've wanted to see is "the reason why my
> approach is valuable is ...".

I didn't say that my approach is *valuable* either.  My point is that I
see zero evidence of a good use-case for an FDW to do anything in the
callback routine other than call ExecProcNode, as I said below.  So I
don't see the need to add such a routine, given that writing it would
place a burden on FDW authors that is perhaps not large, but not
trivial either.

> Nobody prohibits postgres_fdw performs a secondary join here.
> All you need to do is, picking up a sub-plan tree from FDW's private
> field then call ExecProcNode() inside the callback.

>> As I said before, I know that KaiGai-san considers that
>> that approach would be useful for custom joins.  But I see zero evidence
>> that there is a good use-case for an FDW.

>>>  From my point of view I'm now
>>> thinking this solution has two parts:
>>>
>>> (1) Let foreign scans have inner and outer subplans.  For this
>>> purpose, we only need one, but it's no more work to enable both, so we
>>> may as well.  If we had some reason, we could add a list of subplans
>>> of arbitrary length, but there doesn't seem to be an urgent need for
>>> that.

I did the same thing in an earlier version of the patch I posted.
Although I agreed with Robert's comment "The Plan tree and the PlanState
tree should be mirror images of each other; breaking that equivalence
will cause confusion, at least.", I think that doing so would make the
code much simpler, especially the code for setting chgParam for
inner/outer subplans.  But one thing I'm concerned about is enabling
both inner and outer plans, because I think that would complicate the
planner postprocessing, depending on what the foreign scans do with the
inner/outer subplans.  Is it worth doing so?  Maybe I'm missing
something, though.

>>> (2) Add a recheck callback.
>>>
>>> If the foreign data wrapper wants to adopt the solution you're
>>> proposing, the recheck callback can call
>>> ExecProcNode(outerPlanState(node)).  I don't think this should end up
>>> being more than a few lines of code, although of course we should
>>> verify that.

Yeah, I think FDWs would probably need to create a subplan accordingly
at planning time, and then initialize and close that plan at execution
time.  I think we could facilitate subplan creation by providing helper
functions for that, though.
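
To make the shape of such a callback concrete, here is a toy,
self-contained C model of a recheck routine that simply delegates to a
local join subplan.  It is only a sketch under stated assumptions: it is
not PostgreSQL code, every Toy*/toy_* name is hypothetical, and
toy_exec_local_join() merely stands in for calling ExecProcNode() on the
outer subplan.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy stand-ins for executor structures; NOT the real PostgreSQL types. */
typedef struct ToyRow
{
    int key;
    int val;
} ToyRow;

typedef struct ToySlot
{
    bool has_tuple;
    int  key;
    int  outer_val;
    int  inner_val;
} ToySlot;

/* Local join subplan over the rows re-fetched for the recheck. */
typedef struct ToyJoinState
{
    const ToyRow *epq_outer;    /* re-fetched row of the first relation */
    const ToyRow *epq_inner;    /* re-fetched row of the second relation */
    bool          done;         /* at most one candidate pair per recheck */
} ToyJoinState;

typedef struct ToyForeignScanState
{
    ToyJoinState *outer_plan;   /* plays the role of outerPlanState(node) */
} ToyForeignScanState;

/* Stands in for ExecProcNode() on the subplan: true if a row is produced. */
static bool
toy_exec_local_join(ToyJoinState *js, ToySlot *out)
{
    if (js->done || js->epq_outer == NULL || js->epq_inner == NULL)
        return false;
    js->done = true;
    if (js->epq_outer->key != js->epq_inner->key)
        return false;           /* join qual fails for the re-fetched rows */
    out->has_tuple = true;
    out->key = js->epq_outer->key;
    out->outer_val = js->epq_outer->val;
    out->inner_val = js->epq_inner->val;
    return true;
}

/* Hypothetical recheck callback: run the local subplan, copy its result. */
static bool
toy_recheck_foreign_scan(ToyForeignScanState *node, ToySlot *slot)
{
    ToySlot result = {0};

    assert(node->outer_plan != NULL);
    if (!toy_exec_local_join(node->outer_plan, &result))
        return false;           /* the joined row no longer qualifies */
    *slot = result;             /* hand the rebuilt join row to the caller */
    return true;
}
```

In this model the callback itself stays at a few lines; the work an FDW
would actually have to do is building the subplan at planning time and
initializing and closing it at execution time.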

Best regards,
Etsuro Fujita



