
Re: [HACKERS] why not parallel seq scan for slow functions - Mailing list pgsql-hackers

From	Jeff Janes
Subject	Re: [HACKERS] why not parallel seq scan for slow functions
Date	August 2, 2017 20:42:40
Msg-id	CAMkU=1ymvFbTCYFgzj45_EMzBg=ddQ_m2j3cObzU=vywqttf-A@mail.gmail.com
In response to	Re: [HACKERS] why not parallel seq scan for slow functions (Amit Kapila <amit.kapila16@gmail.com>)
Responses	Re: [HACKERS] why not parallel seq scan for slow functions
List	pgsql-hackers


On Wed, Jul 12, 2017 at 7:08 PM, Amit Kapila <amit.kapila16@gmail.com> wrote:

On Wed, Jul 12, 2017 at 11:20 PM, Jeff Janes <jeff.janes@gmail.com> wrote:
> On Tue, Jul 11, 2017 at 10:25 PM, Amit Kapila <amit.kapila16@gmail.com>
> wrote:
>>
>> On Wed, Jul 12, 2017 at 1:50 AM, Jeff Janes <jeff.janes@gmail.com> wrote:
>> > On Mon, Jul 10, 2017 at 9:51 PM, Dilip Kumar <dilipbalaut@gmail.com>
>> > wrote:
>> >>
>> >> So because of this high projection cost, the seqpath and the parallel
>> >> path have fuzzily the same cost, but the seqpath wins because it's
>> >> parallel safe.
>> >
>> >
>> > I think you are correct. However, unless parallel_tuple_cost is set very
>> > low, apply_projection_to_path never gets called with the Gather path as an
>> > argument. It gets ruled out at some earlier stage, presumably because it
>> > assumes the projection step cannot make it win if it is already behind by
>> > enough.
>> >
>>
>> I think that is genuine because tuple communication cost is very high.
>
>
> Sorry, I don't know which you think is genuine, the early pruning or my
> complaint about the early pruning.
>

Early pruning. See, currently we don't have a way to keep both
parallel and non-parallel paths until a later stage and then decide which
one is better. If we want to keep both parallel and non-parallel
paths, it can increase planning cost substantially in the case of
joins. Still, it can surely be beneficial in many cases, so it is a
worthwhile direction to pursue.

If I understand correctly, we do have a way; it can just lead to an exponential explosion problem, so we are afraid to use it. Is that right? If I just lobotomize the path-domination code (make pathnode.c line 466 always test false):

if (JJ_all_paths==0 && costcmp != COSTS_DIFFERENT)

Then it keeps the parallel plan and later chooses it as the overall best plan (after applying your other patch in this thread). It doesn't even slow down "make installcheck-parallel" by very much, which I guess just means the regression tests don't have a lot of complex joins.
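For readers less familiar with pathnode.c, the pruning being discussed can be modelled roughly as below. This is a minimal sketch under stated assumptions, not the real add_path: `ToyPath`, `compare_costs_fuzzily`, and `would_discard` are hypothetical simplifications. The enum names and the 1.01 fuzz factor mirror `PathCostComparison` and `STD_FUZZ_FACTOR` in pathnode.c, the comparison is modelled on `compare_path_costs_fuzzily` (without its startup-cost eligibility check), and the `keep_all_paths` flag plays the role of the `JJ_all_paths` hack. Pathkeys, row counts, and parameterization, which the real code also compares, are ignored.

```c
#include <stdbool.h>

/* Hypothetical stand-in for a planner Path: only the fields this sketch needs. */
typedef struct ToyPath {
    double startup_cost;
    double total_cost;
    bool   parallel_safe;
} ToyPath;

/* Same names as PathCostComparison in pathnode.c. */
typedef enum {
    COSTS_EQUAL,     /* fuzzily the same on both costs */
    COSTS_BETTER1,   /* p1 dominates p2 on cost */
    COSTS_BETTER2,   /* p2 dominates p1 on cost */
    COSTS_DIFFERENT  /* neither dominates: each wins on one cost */
} PathCostComparison;

#define TOY_FUZZ_FACTOR 1.01  /* mirrors STD_FUZZ_FACTOR */

static PathCostComparison
compare_costs_fuzzily(const ToyPath *p1, const ToyPath *p2, double fuzz)
{
    if (p1->total_cost > p2->total_cost * fuzz)
    {
        /* p1 is fuzzily worse on total cost; does it win on startup? */
        if (p2->startup_cost > p1->startup_cost * fuzz)
            return COSTS_DIFFERENT;
        return COSTS_BETTER2;
    }
    if (p2->total_cost > p1->total_cost * fuzz)
    {
        /* p2 is fuzzily worse on total cost; does it win on startup? */
        if (p1->startup_cost > p2->startup_cost * fuzz)
            return COSTS_DIFFERENT;
        return COSTS_BETTER1;
    }
    /* Total costs are fuzzily equal: fall back to startup cost. */
    if (p2->startup_cost > p1->startup_cost * fuzz)
        return COSTS_BETTER1;
    if (p1->startup_cost > p2->startup_cost * fuzz)
        return COSTS_BETTER2;
    return COSTS_EQUAL;
}

/*
 * Would 'loser' be pruned in favour of 'winner'?  A path that is no worse
 * on cost only dominates if it is also at least as parallel safe.  On a
 * full tie this sketch simply prefers 'winner'.  keep_all_paths mimics the
 * JJ_all_paths hack by skipping pruning entirely.
 */
static bool
would_discard(const ToyPath *winner, const ToyPath *loser, bool keep_all_paths)
{
    PathCostComparison cmp;

    if (keep_all_paths)
        return false;

    cmp = compare_costs_fuzzily(winner, loser, TOY_FUZZ_FACTOR);
    if (cmp == COSTS_BETTER1 || cmp == COSTS_EQUAL)
        return winner->parallel_safe >= loser->parallel_safe;
    return false;  /* COSTS_BETTER2 or COSTS_DIFFERENT: keep loser */
}
```

With hypothetical numbers, a parallel-safe serial path at startup 0 / total 10000 and a non-parallel-safe Gather path at startup 1000 / total 10050 have fuzzily equal total costs, so the serial path wins on startup cost and `would_discard(&seq, &gather, false)` returns true. Passing `true` for `keep_all_paths` keeps both, which is the effect the hack above has.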

But what is an acceptable solution? Is there a heuristic for when retaining a parallel path could be helpful, the same way there is for fast-start paths? It seems like the best thing would be to include the evaluation costs in the first place at this step.

Why is the path-cost domination code run before the cost of the function evaluation is included? Is that because the information needed to compute it is not available at that point, or because it would be too slow to include it at that point? Or just because no one thought it important to do?
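To make that last question concrete, here is a toy cost model showing how the ordering can flip once per-row function evaluation is counted. It is a hypothetical sketch, not costsize.c's actual formulas: `CostInputs`, `serial_cost`, and `gather_cost` are made-up names, `divisor` is a simplified stand-in for the planner's parallel divisor, and it assumes the slow function is evaluated in the workers below the Gather. The fields `parallel_setup_cost` and `parallel_tuple_cost` mirror the GUCs of the same names (defaults 1000 and 0.1).

```c
#include <stdbool.h>

/* Hypothetical inputs for a simplified single-table cost model. */
typedef struct CostInputs {
    double scan_cost;            /* cost of scanning the table serially */
    double rows;                 /* rows emitted by the scan */
    double func_cost_per_row;    /* per-row cost of the slow target-list function */
    double divisor;              /* simplified parallel divisor */
    double parallel_setup_cost;  /* mirrors the GUC (default 1000) */
    double parallel_tuple_cost;  /* mirrors the GUC (default 0.1) */
} CostInputs;

/* Serial seq scan, optionally including the projection cost. */
static double
serial_cost(const CostInputs *c, bool include_projection)
{
    double cost = c->scan_cost;

    if (include_projection)
        cost += c->rows * c->func_cost_per_row;
    return cost;
}

/* Gather over a parallel seq scan, optionally including the projection cost. */
static double
gather_cost(const CostInputs *c, bool include_projection)
{
    double cost = c->parallel_setup_cost
                + c->scan_cost / c->divisor
                + c->rows * c->parallel_tuple_cost;  /* tuple transfer to leader */

    if (include_projection)
        /* Assumes the function runs in the workers, so its cost is divided. */
        cost += c->rows * c->func_cost_per_row / c->divisor;
    return cost;
}
```

With `CostInputs c = {1500.0, 100000.0, 1.0, 2.0, 1000.0, 0.1}`, the serial path costs 1500 versus about 11750 for Gather when the projection is ignored, so a domination check at that point keeps only the serial path. Counting the projection gives 101500 versus about 61750, and the parallel plan would win if it were still around.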

Cheers,

Jeff
