Re: [HACKERS] why not parallel seq scan for slow functions - Mailing list pgsql-hackers

From Jeff Janes
Subject Re: [HACKERS] why not parallel seq scan for slow functions
Msg-id CAMkU=1ymvFbTCYFgzj45_EMzBg=ddQ_m2j3cObzU=vywqttf-A@mail.gmail.com
In response to Re: [HACKERS] why not parallel seq scan for slow functions  (Amit Kapila <amit.kapila16@gmail.com>)
Responses Re: [HACKERS] why not parallel seq scan for slow functions  (Amit Kapila <amit.kapila16@gmail.com>)
List pgsql-hackers
On Wed, Jul 12, 2017 at 7:08 PM, Amit Kapila <amit.kapila16@gmail.com> wrote:
> On Wed, Jul 12, 2017 at 11:20 PM, Jeff Janes <jeff.janes@gmail.com> wrote:
> > On Tue, Jul 11, 2017 at 10:25 PM, Amit Kapila <amit.kapila16@gmail.com>
> > wrote:
> >>
> >> On Wed, Jul 12, 2017 at 1:50 AM, Jeff Janes <jeff.janes@gmail.com> wrote:
> >> > On Mon, Jul 10, 2017 at 9:51 PM, Dilip Kumar <dilipbalaut@gmail.com>
> >> > wrote:
> >> >>
> >> >> So because of this high projection cost the seqpath and parallel path
> >> >> both have fuzzily same cost but seqpath is winning because it's
> >> >> parallel safe.
> >> >
> >> >
> >> > I think you are correct.  However, unless parallel_tuple_cost is set
> >> > very
> >> > low, apply_projection_to_path never gets called with the Gather path as
> >> > an
> >> > argument.  It gets ruled out at some earlier stage, presumably because
> >> > it
> >> > assumes the projection step cannot make it win if it is already behind
> >> > by
> >> > enough.
> >> >
> >>
> >> I think that is genuine because tuple communication cost is very high.
> >
> >
> > Sorry, I don't know which you think is genuine, the early pruning or my
> > complaint about the early pruning.
> >
>
> Early pruning.  See, currently, we don't have a way to maintain both
> parallel and non-parallel paths till later stage and then decide which
> one is better. If we want to maintain both parallel and non-parallel
> paths, it can increase planning cost substantially in the case of
> joins.  Now, surely it can have benefit in many cases, so it is a
> worthwhile direction to pursue.

If I understand correctly, we do have a way; it just can lead to an exponential explosion of paths, so we are afraid to use it, correct?  If I just lobotomize the path domination code (making the test at pathnode.c line 466 always false):

                if (JJ_all_paths==0 && costcmp != COSTS_DIFFERENT)

then it keeps the parallel plan and later chooses it (after applying your other patch in this thread) as the overall best plan.  It doesn't even slow down "make installcheck-parallel" by very much, which I guess just means the regression tests don't have a lot of complex joins.

But what is an acceptable solution?  Is there a heuristic for when retaining a parallel path could be helpful, the same way there is for fast-start paths?  It seems like the best thing would be to include the evaluation costs in the first place at this step.

Why is the path-cost domination code run before the cost of the function evaluation is included?  Is that because the information needed to compute it is not available at that point, or because it would be too slow to include it at that point? Or just because no one thought it important to do?

Cheers,

Jeff
