Re: Hooking at standard_join_search (Was: Re: Foreign join pushdown vs EvalPlanQual) - Mailing list pgsql-hackers

From Etsuro Fujita
Subject Re: Hooking at standard_join_search (Was: Re: Foreign join pushdown vs EvalPlanQual)
Date
Msg-id 55EFD222.8060701@lab.ntt.co.jp
Whole thread Raw
In response to Re: Hooking at standard_join_search (Was: Re: Foreign join pushdown vs EvalPlanQual)  (Robert Haas <robertmhaas@gmail.com>)
Responses Re: Hooking at standard_join_search (Was: Re: Foreign join pushdown vs EvalPlanQual)
List pgsql-hackers
On 2015/09/09 3:53, Robert Haas wrote:
> On Tue, Sep 8, 2015 at 5:35 AM, Etsuro Fujita
> <fujita.etsuro@lab.ntt.co.jp> wrote:
>> On 2015/09/04 0:33, Robert Haas wrote:
>>> I'm worried that trawling through that
>>> SpecialJoinInfo data will end up needing to duplicate much of
>>> make_join_rel and add_paths_to_joinrel.  For example, consider:
>>>
>>> SELECT * FROM verysmall v JOIN (bigft1 FULL JOIN bigft2 ON bigft1.x =
>>> bigft2.x) ON v.q = bigft1.q AND v.r = bigft2.r;
>>>
>>> The best path for this plan is presumably something like this:
>>>
>>> Nested Loop
>>> -> Seq Scan on verysmall v
>>> -> Foreign Scan on bigft1 and bigft2
>>>       Remote SQL: SELECT * FROM bigft1 FULL JOIN bigft2 ON bigft1.x =
>>> bigft2.x AND bigft1.q = $1 AND bigft2.r = $2
>>>
>>> Now, how is the FDW going to figure out that it needs to generate this
>>> parameterized path without duplicating this code from
>>> add_paths_to_joinrel?
>>>
>>>       /*
>>>        * Decide whether it's sensible to generate parameterized paths for
>>> this
>>>        * joinrel, and if so, which relations such paths should require.
>>> There
>>>        * is usually no need to create a parameterized result path unless
>>> there
>>> ...
>>>
>>> Maybe there's a very simple answer to this question and I'm just not
>>> seeing it, but I really don't see how that's going to work.

>> Why don't you look at the "regular" (local join execution) paths that were
>> already generated.  I think that if we called the FDW at a proper hook
>> location, the FDW could probably find a regular path in rel->pathlist of the
>> join rel (bigft1, bigft2) that possibly generates something like:
>>
>> Nested Loop
>>    -> Seq Scan on verysmall v
>>    -> Nested Loop
>>         Join Filter: (bigft1.a = bigft2.a)
>>         -> Foreign Scan on bigft1
>>              Remote SQL: SELECT * FROM bigft1 WHERE bigft1.q = $1
>>         -> Foreign Scan on bigft2
>>              Remote SQL: SELECT * FROM bigft2 WHERE bigft2.r = $2
>>
>>  From the parameterization of the regular nestloop path for joining bigft1
>> and bigft2 locally, I think that the FDW could find that it's sensible to
>> generate the foreign-join path for (bigft1, bigft2) with the
>> parameterization.

> But that path might have already been discarded on the basis of cost.
> I think Tom's idea is better: let the FDW consult some state cached
> for this purpose in the RelOptInfo.

Do you have an idea of what information would be collected into the 
state and how the FDW would derive parameterizations to consider 
producing pushed-down joins with from that information?  What I'm 
concerned about that is to reduce the number of parameterizations to 
consider, to reduce overhead in costing the corresponding queries.  I'm 
missing something, though.

Best regards,
Etsuro Fujita




pgsql-hackers by date:

Previous
From: Haribabu Kommi
Date:
Subject: Re: Parallel Seq Scan
Next
From: Pavel Stehule
Date:
Subject: Re: On-demand running query plans using auto_explain and signals