
From: Tomas Vondra
Subject: Re: crashes due to setting max_parallel_workers=0
Msg-id: e96d3112-26dd-d77d-7f8f-19b512ec4a02@2ndquadrant.com
In response to: Re: crashes due to setting max_parallel_workers=0 (Rushabh Lathia <rushabh.lathia@gmail.com>)
List: pgsql-hackers
On 03/25/2017 05:18 PM, Rushabh Lathia wrote:
>
>
> On Sat, Mar 25, 2017 at 7:01 PM, Peter Eisentraut
> <peter.eisentraut@2ndquadrant.com> wrote:
>
>     On 3/25/17 09:01, David Rowley wrote:
>     > On 25 March 2017 at 23:09, Rushabh Lathia <rushabh.lathia@gmail.com> wrote:
>     >> Also, another point which I think we should fix: when someone sets
>     >> max_parallel_workers = 0, we should also set
>     >> max_parallel_workers_per_gather to zero. That way we can avoid
>     >> generating the gather path with max_parallel_workers = 0.
>     > I see that it was actually quite useful that it works the way it does.
>     > If it had worked the same as max_parallel_workers_per_gather, then
>     > likely Tomas would never have found this bug.
>
>     Another problem is that the GUC system doesn't really support cases
>     where the validity of one setting depends on the current value of
>     another setting.  So each individual setting needs to be robust against
>     cases of related settings being nonsensical.
>
>
> Okay.
>
> About the original issue reported by Tomas, I did more debugging and
> found that the problem was that gather_merge_clear_slots() was not
> returning the cleared slot when nreaders is zero (i.e. when
> nworkers_launched = 0). Because of that, the scan continued even after
> all the tuples were exhausted, and ended up crashing the server in
> gather_merge_getnext(). In the patch I also added an Assert to
> gather_merge_getnext() that the index must be less than nreaders + 1
> (the leader).
>
> PFA simple patch to fix the problem.
>

I think there are two issues at play here - the first one is that we
still produce parallel plans even with max_parallel_workers=0, and the
second one is the crash in GatherMerge when nworkers=0.
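
If I understand the failure mode correctly, it is roughly the following
(a simplified stand-alone model with made-up names, not the actual
nodeGatherMerge.c code):

/*
 * With zero launched workers, the "clear the slots" step never hands back
 * an empty leader slot, so the caller never sees end-of-scan and keeps
 * fetching past the last tuple.
 */
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>

typedef struct { bool empty; } ToySlot;    /* stand-in for TupleTableSlot */

typedef struct
{
    int     nreaders;       /* number of workers actually launched */
    ToySlot leader_slot;
} ToyGMState;

/* buggy shape: bails out for nreaders == 0 and returns no slot at all */
static ToySlot *
clear_slots_buggy(ToyGMState *gm)
{
    if (gm->nreaders == 0)
        return NULL;
    gm->leader_slot.empty = true;
    return &gm->leader_slot;
}

/* fixed shape: always hand back the cleared (empty) leader slot */
static ToySlot *
clear_slots_fixed(ToyGMState *gm)
{
    gm->leader_slot.empty = true;
    return &gm->leader_slot;
}

int
main(void)
{
    ToyGMState gm = {0, {false}};
    int        i = 0;       /* reader index: 0 is the leader here */

    assert(clear_slots_buggy(&gm) == NULL);    /* caller keeps scanning */
    assert(clear_slots_fixed(&gm)->empty);     /* caller sees end-of-scan */

    /* the Assert from the patch is of this flavour: a reader index must
     * stay below nreaders + 1, i.e. the launched workers plus the leader */
    assert(i < gm.nreaders + 1);

    printf("done\n");
    return 0;
}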

Your patch fixes the latter (thanks for looking into it), which is
obviously a good thing - getting 0 workers on a busy system is quite
possible, because all the parallel workers may already be chewing on
some other query.

But it seems a bit futile to produce the parallel plan in the first 
place, because with max_parallel_workers=0 we can't possibly get any 
parallel workers ever. I wonder why compute_parallel_worker() only looks
at max_parallel_workers_per_gather, i.e. why it doesn't also do:
   parallel_workers = Min(parallel_workers, max_parallel_workers);

Perhaps this was discussed and is actually intentional, though.
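
Just to illustrate what I mean, here is a stand-alone sketch with
stand-in GUC variables (not a patch against the real
compute_parallel_worker()):

#include <stdio.h>

#define Min(x, y)  ((x) < (y) ? (x) : (y))

/* stand-ins for the real GUC variables */
static int max_parallel_workers_per_gather = 2;
static int max_parallel_workers = 0;

static int
compute_parallel_worker_sketch(int parallel_workers)
{
    /* existing behaviour: clamp by the per-gather limit */
    parallel_workers = Min(parallel_workers, max_parallel_workers_per_gather);

    /* the extra clamp: with max_parallel_workers = 0 this drops the path
     * to 0 workers, so no parallel plan would be generated at all */
    parallel_workers = Min(parallel_workers, max_parallel_workers);

    return parallel_workers;
}

int
main(void)
{
    printf("%d\n", compute_parallel_worker_sketch(4));     /* prints 0 */
    return 0;
}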

Regarding handling this at the GUC level - I agree with Peter that 
that's not a good idea. I suppose we could check the values in the GUC
check/assign hooks, but what we don't have is a way to undo the changes
in all the GUCs. That is, if I do
   SET max_parallel_workers = 0;
   SET max_parallel_workers = 16;

I expect to end up with just max_parallel_workers GUC changed and 
nothing else.
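
For example, a cross-GUC check hook would look roughly like this (a
stand-alone sketch with stand-in types and a hypothetical hook name, not
something I'm actually proposing):

#include <stdbool.h>
#include <stdio.h>

typedef int GucSource;                   /* stand-in for the real enum */

static int max_parallel_workers = 8;     /* stand-in for the GUC variable */

/*
 * Hypothetical check hook for max_parallel_workers_per_gather.  It can
 * reject values above the current max_parallel_workers, but the result
 * depends on the order in which the two settings are applied, and an
 * assign hook that silently zeroed the other GUC could not be undone
 * when max_parallel_workers is later raised again.
 */
static bool
check_parallel_per_gather(int *newval, void **extra, GucSource source)
{
    (void) extra;
    (void) source;
    return (*newval <= max_parallel_workers);
}

int
main(void)
{
    int newval = 4;

    printf("%s\n", check_parallel_per_gather(&newval, NULL, 0) ? "ok" : "rejected");
    return 0;
}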

regards

-- 
Tomas Vondra                  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


