Re: [HACKERS] why not parallel seq scan for slow functions - Mailing list pgsql-hackers

From Jeff Janes
Subject Re: [HACKERS] why not parallel seq scan for slow functions
Date
Msg-id CAMkU=1wDRf7ayn0c+j5GR57UfKsMkikn1fhgQNwQnr+ApEXHcQ@mail.gmail.com
In response to Re: [HACKERS] why not parallel seq scan for slow functions  (Amit Kapila <amit.kapila16@gmail.com>)
List pgsql-hackers
On Tue, Jul 11, 2017 at 10:25 PM, Amit Kapila <amit.kapila16@gmail.com> wrote:
> On Wed, Jul 12, 2017 at 1:50 AM, Jeff Janes <jeff.janes@gmail.com> wrote:
>> On Mon, Jul 10, 2017 at 9:51 PM, Dilip Kumar <dilipbalaut@gmail.com> wrote:
>>>
>>> So because of this high projection cost the seqpath and parallel path
>>> both have fuzzily same cost but seqpath is winning because it's
>>> parallel safe.
>>
>>
>> I think you are correct.  However, unless parallel_tuple_cost is set very
>> low, apply_projection_to_path never gets called with the Gather path as an
>> argument.  It gets ruled out at some earlier stage, presumably because it
>> assumes the projection step cannot make it win if it is already behind by
>> enough.
>>
>
> I think that is genuine because tuple communication cost is very high.

Sorry, I don't know which you think is genuine: the early pruning, or my complaint about the early pruning.

I agree that the communication cost is high, which is why I don't want to have to set parallel_tuple_cost very low.  For example, to get the benefit of your patch, I have to set parallel_tuple_cost to 0.0049 or less (in my real-world case, not the dummy test case I posted, although the numbers are around the same for that one too).  But with a setting that low, all kinds of other things also start using parallel plans, even if they don't benefit from them and are harmed.

I realize we need to do some aggressive pruning to avoid an exponential explosion in planning time, but in this case it has some rather unfortunate consequences.  I wanted to explore it, but I can't figure out where this particular pruning is taking place.

By the time we get to planner.c line 1787, current_rel->pathlist already does not contain the parallel plan if parallel_tuple_cost >= 0.0050, so the pruning is happening earlier than that.
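
To make the concern concrete, here is a toy cost model in Python. It is only a sketch under stated assumptions, not the planner's actual code: the function names, the scan cost of 20000, the 1,000,000 rows, and the per-row projection cost of 0.25 are all made up for illustration, and the model simply divides all work among the participating processes. It computes the largest parallel_tuple_cost at which a Gather plan would still beat the serial plan, once when the comparison ignores the projection cost and once when it includes it.

```python
# Toy model only: every name and number here is a hypothetical illustration,
# not taken from the PostgreSQL source or from the test case in this thread.

def parallel_divisor(workers: int) -> float:
    """Effective number of processes sharing the work.

    Assumption: the leader contributes less as workers are added
    and contributes nothing once 0.3 * workers reaches 1.
    """
    leader = max(0.0, 1.0 - 0.3 * workers)
    return workers + leader


def break_even_tuple_cost(scan_cost: float, rows: int,
                          proj_cost_per_row: float, workers: int,
                          parallel_setup_cost: float = 1000.0) -> float:
    """Largest per-tuple transfer cost at which Gather still wins.

    Serial cost:  work
    Gather cost:  work / divisor + setup + tuple_cost * rows
    where work = scan_cost + rows * proj_cost_per_row.
    Setting the two equal and solving for tuple_cost gives the result.
    """
    work = scan_cost + rows * proj_cost_per_row
    saved = work - work / parallel_divisor(workers)
    return (saved - parallel_setup_cost) / rows


# Comparison made before the expensive projection is costed.
before = break_even_tuple_cost(20000.0, 1_000_000, 0.0, workers=2)
# Comparison made after the expensive projection is costed.
after = break_even_tuple_cost(20000.0, 1_000_000, 0.25, workers=2)

print(f"break-even before projection: {before:.4f}")
print(f"break-even after projection:  {after:.4f}")
```

With these made-up numbers the break-even value is about 0.011 when the paths are compared before the projection is costed, and about 0.157 when they are compared after. At the default parallel_tuple_cost of 0.1, the Gather path would lose the first comparison and win the second, so a path discarded at the earlier stage never gets the chance to win.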

 
> If your table is reasonable large then you might want to try by
> increasing parallel workers (Alter Table ... Set (parallel_workers =
> ..))


Setting parallel_workers to 8 changes the threshold for the parallel plan to even be considered from parallel_tuple_cost <= 0.0049 to <= 0.0076.  So it is going in the correct direction, but not by enough to matter.
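
A small Python sketch shows why adding workers can only move such a threshold so far. Again this is a toy model under assumptions, not the planner's code: the scan cost of 20000 and the 1,000,000 rows are hypothetical, the helper names are invented, and the projection cost is left out because the comparison in question seems to happen before it is counted.

```python
# Toy model only: hypothetical names and numbers, not PostgreSQL's costing code.

def parallel_divisor(workers: int) -> float:
    """Assumed effective process count; the leader helps less as workers grow."""
    leader = max(0.0, 1.0 - 0.3 * workers)
    return workers + leader


def break_even_tuple_cost(scan_cost: float, rows: int, workers: int,
                          parallel_setup_cost: float = 1000.0) -> float:
    """Largest per-tuple transfer cost at which Gather still beats a serial scan."""
    saved = scan_cost - scan_cost / parallel_divisor(workers)
    return (saved - parallel_setup_cost) / rows


scan_cost, rows = 20000.0, 1_000_000

# Break-even threshold for a few worker counts.
thresholds = {w: break_even_tuple_cost(scan_cost, rows, w) for w in (2, 4, 8)}

# Upper bound: even with unlimited workers the scan cost cannot drop below zero.
ceiling = (scan_cost - 1000.0) / rows

for w, t in thresholds.items():
    print(f"workers={w}: break-even parallel_tuple_cost = {t:.4f}")
print(f"ceiling with unlimited workers = {ceiling:.4f}")
```

In this model the threshold rises with the worker count but can never exceed (scan_cost - parallel_setup_cost) / rows, because the per-tuple transfer cost is paid for every row no matter how many workers share the scan.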
 
Cheers,

Jeff
