Thread: Re: [HACKERS] strange parallel query behavior after OOM crashes

Re: [HACKERS] strange parallel query behavior after OOM crashes

From
Kuntal Ghosh
Date:
On Thu, Apr 6, 2017 at 6:50 AM, Robert Haas <robertmhaas@gmail.com> wrote:
> On Wed, Apr 5, 2017 at 8:17 PM, Neha Khatri <nehakhatri5@gmail.com> wrote:
>> The problem here seem to be the change in the max_parallel_workers value
>> while the parallel workers are still under execution. So this poses two
>> questions:
>>
>> 1. From usecase point of view, why could there be a need to tweak the
>> max_parallel_workers exactly at the time when the parallel workers are at
>> play.
>> 2. Could there be a restriction on tweaking of max_parallel_workers while
>> the parallel workers are at play? At least do not allow setting the
>> max_parallel_workers less than the current # of active parallel workers.
>
> Well, that would be letting the tail wag the dog.  The maximum value
> of max_parallel_workers is only 1024, and what we're really worried
> about here is seeing a value near PG_UINT32_MAX, which leaves a lot of
> daylight.  How about just creating a #define that's used by guc.c as
> the maximum for the GUC, and here we assert that we're <= that value?
>
I've added a GUC parameter named MAX_PARALLEL_WORKER_LIMIT and
asserted that the absolute difference between the two counts <= that
value, i.e.,
abs((int)(register_count - terminate_count)) <= MAX_PARALLEL_WORKER_LIMIT;

Because, it's possible that register count has overflowed but
terminate count has not yet overflowed. As a result, the difference in
unsigned integer will be near UINT32_MAX. Hence, we need the absolute
difference after typecasting the same to integer. This value should be
less than the max_parallel_workers upper limit.

I've attached both the patches for better accessibility. PFA.

-- 
Thanks & Regards,
Kuntal Ghosh
EnterpriseDB: http://www.enterprisedb.com

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Attachment

Re: [HACKERS] strange parallel query behavior after OOM crashes

From
Tomas Vondra
Date:
On 04/10/2017 01:39 PM, Kuntal Ghosh wrote:
> On Thu, Apr 6, 2017 at 6:50 AM, Robert Haas <robertmhaas@gmail.com> wrote:
>> On Wed, Apr 5, 2017 at 8:17 PM, Neha Khatri <nehakhatri5@gmail.com> wrote:
>>> The problem here seem to be the change in the max_parallel_workers value
>>> while the parallel workers are still under execution. So this poses two
>>> questions:
>>>
>>> 1. From usecase point of view, why could there be a need to tweak the
>>> max_parallel_workers exactly at the time when the parallel workers are at
>>> play.
>>> 2. Could there be a restriction on tweaking of max_parallel_workers while
>>> the parallel workers are at play? At least do not allow setting the
>>> max_parallel_workers less than the current # of active parallel workers.
>>
>> Well, that would be letting the tail wag the dog.  The maximum value
>> of max_parallel_workers is only 1024, and what we're really worried
>> about here is seeing a value near PG_UINT32_MAX, which leaves a lot of
>> daylight.  How about just creating a #define that's used by guc.c as
>> the maximum for the GUC, and here we assert that we're <= that value?
>>
> I've added a GUC parameter named MAX_PARALLEL_WORKER_LIMIT and
> asserted that the absolute difference between the two counts <= that
> value, i.e.,
> abs((int)(register_count - terminate_count)) <= MAX_PARALLEL_WORKER_LIMIT;
>
> Because, it's possible that register count has overflowed but
> terminate count has not yet overflowed. As a result, the difference in
> unsigned integer will be near UINT32_MAX. Hence, we need the absolute
> difference after typecasting the same to integer. This value should be
> less than the max_parallel_workers upper limit.
>
> I've attached both the patches for better accessibility. PFA.
>

At first I was like 'WTF? Why do we need a new GUC just becase of an 
assert?' but you're actually not adding a new GUC parameter, you're 
adding a constant which is then used as a max value for max for the two 
existing parallel GUCs.

I think this is fine.

regards

-- 
Tomas Vondra                  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



Re: [HACKERS] strange parallel query behavior after OOM crashes

From
Robert Haas
Date:
On Mon, Apr 10, 2017 at 7:17 PM, Tomas Vondra
<tomas.vondra@2ndquadrant.com> wrote:
> At first I was like 'WTF? Why do we need a new GUC just becase of an
> assert?' but you're actually not adding a new GUC parameter, you're adding a
> constant which is then used as a max value for max for the two existing
> parallel GUCs.
>
> I think this is fine.

I think it is pretty odd-looking.  As written, it computes an unsigned
-- and therefore necessarily non-negative -- value into a signed --
and thus possibly neative -- value only to pass it back to abs() to
make sure it's not negative:

+       Assert(!parallel ||
abs((int)(BackgroundWorkerData->parallel_register_count -
+                 BackgroundWorkerData->parallel_terminate_count)) <=
+                               MAX_PARALLEL_WORKER_LIMIT);

I think we can just say

Assert(!parallel || BackgroundWorkerData->parallel_register_count -
BackgroundWorkerData->parallel_terminate_count <= MAX_PARALLEL_WORKER_LIMIT);

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: [HACKERS] strange parallel query behavior after OOM crashes

From
Robert Haas
Date:
On Tue, Apr 11, 2017 at 12:15 PM, Robert Haas <robertmhaas@gmail.com> wrote:
> On Mon, Apr 10, 2017 at 7:17 PM, Tomas Vondra
> <tomas.vondra@2ndquadrant.com> wrote:
>> At first I was like 'WTF? Why do we need a new GUC just becase of an
>> assert?' but you're actually not adding a new GUC parameter, you're adding a
>> constant which is then used as a max value for max for the two existing
>> parallel GUCs.
>>
>> I think this is fine.
>
> I think it is pretty odd-looking.  As written, it computes an unsigned
> -- and therefore necessarily non-negative -- value into a signed --
> and thus possibly neative -- value only to pass it back to abs() to
> make sure it's not negative:
>
> +       Assert(!parallel ||
> abs((int)(BackgroundWorkerData->parallel_register_count -
> +
>                   BackgroundWorkerData->parallel_terminate_count)) <=
> +                               MAX_PARALLEL_WORKER_LIMIT);
>
> I think we can just say
>
> Assert(!parallel || BackgroundWorkerData->parallel_register_count -
> BackgroundWorkerData->parallel_terminate_count <= MAX_PARALLEL_WORKER_LIMIT);

Actually, there's an even simpler way: stick it inside the if () block
where we return false if we're outside the limit.  Then we don't need
to test the "parallel" bool either, because it's already known to be
true.  Committed that way.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company