Re: crashes due to setting max_parallel_workers=0 - Mailing list pgsql-hackers

From Rushabh Lathia
Subject Re: crashes due to setting max_parallel_workers=0
Date
Msg-id CAGPqQf2aGRD=Mx_8R7ZE7DV-nUE0gGoUX9qWJKxNZBmRR59n0w@mail.gmail.com
Whole thread Raw
In response to crashes due to setting max_parallel_workers=0  (Tomas Vondra <tomas.vondra@2ndquadrant.com>)
Responses Re: crashes due to setting max_parallel_workers=0  (Rushabh Lathia <rushabh.lathia@gmail.com>)
Re: crashes due to setting max_parallel_workers=0  (Robert Haas <robertmhaas@gmail.com>)
List pgsql-hackers


On Mon, Mar 27, 2017 at 3:43 AM, Tomas Vondra <tomas.vondra@2ndquadrant.com> wrote:
On 03/25/2017 05:18 PM, Rushabh Lathia wrote:


On Sat, Mar 25, 2017 at 7:01 PM, Peter Eisentraut
<peter.eisentraut@2ndquadrant.com
<mailto:peter.eisentraut@2ndquadrant.com>> wrote:

    On 3/25/17 09:01, David Rowley wrote:
    > On 25 March 2017 at 23:09, Rushabh Lathia <rushabh.lathia@gmail.com <mailto:rushabh.lathia@gmail.com>> wrote:
    >> Also another point which I think we should fix is, when someone set
    >> max_parallel_workers = 0, we should also set the
    >> max_parallel_workers_per_gather
    >> to zero. So that way it we can avoid generating the gather path with
    >> max_parallel_worker = 0.
    > I see that it was actually quite useful that it works the way it does.
    > If it had worked the same as max_parallel_workers_per_gather, then
    > likely Tomas would never have found this bug.

    Another problem is that the GUC system doesn't really support cases
    where the validity of one setting depends on the current value of
    another setting.  So each individual setting needs to be robust against
    cases of related settings being nonsensical.


Okay.

About the original issue reported by Tomas, I did more debugging and
found that - problem was gather_merge_clear_slots() was not returning
the clear slot when nreader is zero (means nworkers_launched = 0).
Due to the same scan was continue even all the tuple are exhausted,
and then end up with server crash at gather_merge_getnext(). In the patch
I also added the Assert into gather_merge_getnext(), about the index
should be less then the nreaders + 1 (leader).

PFA simple patch to fix the problem.


I think there are two issues at play, here - the first one is that we still produce parallel plans even with max_parallel_workers=0, and the second one is the crash in GatherMerge when nworkers=0.

Your patch fixes the latter (thanks for looking into it), which is obviously a good thing - getting 0 workers on a busy system is quite possible, because all the parallel workers can be already chewing on some other query.


Thanks.
 
But it seems a bit futile to produce the parallel plan in the first place, because with max_parallel_workers=0 we can't possibly get any parallel workers ever. I wonder why compute_parallel_worker() only looks at max_parallel_workers_per_gather, i.e. why shouldn't it do:

   parallel_workers = Min(parallel_workers, max_parallel_workers);


I agree with you here. Producing the parallel plan when max_parallel_workers = 0 is wrong. But rather then your suggested fix, I think that we should do something like:
  
    /*
     * In no case use more than max_parallel_workers_per_gather or
     * max_parallel_workers.
     */
    parallel_workers = Min(parallel_workers, Min(max_parallel_workers, max_parallel_workers_per_gather));

 
Perhaps this was discussed and is actually intentional, though.


Yes, I am not quite sure about this.

Regarding handling this at the GUC level - I agree with Peter that that's not a good idea. I suppose we could deal with checking the values in the GUC check/assign hooks, but what we don't have is a way to undo the changes in all the GUCs. That is, if I do

   SET max_parallel_workers = 0;
   SET max_parallel_workers = 16;

I expect to end up with just max_parallel_workers GUC changed and nothing else.

regards

--
Tomas Vondra                  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



--
Rushabh Lathia

pgsql-hackers by date:

Previous
From: Alvaro Herrera
Date:
Subject: Re: pg_get_statisticsextdef() is not quite the full shilling
Next
From: Kyotaro HORIGUCHI
Date:
Subject: Re: free space map and visibility map