Re: strange parallel query behavior after OOM crashes - Mailing list pgsql-hackers

From Kuntal Ghosh
Subject Re: strange parallel query behavior after OOM crashes
Date
Msg-id CAGz5QCK7TjMukGnu4v9cUd1huJOrZ0G1zc7vbgxKzshfP65x4A@mail.gmail.com
In response to Re: strange parallel query behavior after OOM crashes  (Tomas Vondra <tomas.vondra@2ndquadrant.com>)
Responses Re: strange parallel query behavior after OOM crashes  (Tomas Vondra <tomas.vondra@2ndquadrant.com>)
List pgsql-hackers
On Wed, Apr 5, 2017 at 4:13 PM, Tomas Vondra
<tomas.vondra@2ndquadrant.com> wrote:
>>>
>>> The comment says that the counters are allowed to overflow, i.e. after a
>>> long uptime you might get these values
>>>
>>>      parallel_register_count = UINT_MAX + 1 = 0
>>>      parallel_terminate_count = UINT_MAX
>>>
>>> which is fine, because C handles the overflow during subtraction and
>>> so
>>>
>>>      parallel_register_count - parallel_terminate_count = 1
>>>
>>> But the assert is not doing subtraction, it's comparing the values
>>> directly:
>>>
>>>      Assert(parallel_register_count >= parallel_terminate_count);
>>>
>>> and the (perfectly valid) values trivially violate this comparison.
>>>
>> Thanks for the explanation. So, we can't use the above assert
>> statement. Even the following assert statement will not be helpful:
>> Assert(parallel_register_count - parallel_terminate_count >= 0);
>> Because, it'll fail to track the scenario when parallel_register_count
>> is not overflowed, still less than parallel_terminate_count. :(
>>
>
> Actually, that assert would work, because C does handle overflows on uint
> values during subtraction just fine. That is,
>
>     (UINT_MAX+x) - UINT_MAX = x
>
> Also, the comment about overflows before BackgroundWorkerArray claims this
> is the case.
>
Agreed on the overflowed case. But, my concern is when an overflow has
not yet happened:

Suppose,
uint parallel_register_count = 1; /* Didn't overflow */
uint parallel_terminate_count = 2; /* Didn't overflow */

Assert(parallel_register_count - parallel_terminate_count >= 0);
We want the assert statement to fail here, but I think it won't: the
unsigned subtraction wraps around to UINT_MAX instead of producing -1,
and that value is still greater than 0, no?
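
To make the concern concrete, here is a minimal C sketch of the three
checks under discussion. The function names and the MAX_ACTIVE_WORKERS
bound are hypothetical, introduced only for illustration; they are not
taken from the PostgreSQL source. The bounded-difference check is one
possible alternative, not necessarily what should be committed.

```c
#include <limits.h>
#include <stdbool.h>

/* Hypothetical upper bound on simultaneously active parallel workers.
 * This value is an assumption made for the sketch. */
#define MAX_ACTIVE_WORKERS 1024u

/* The direct comparison from the original assert. It gives a false
 * alarm once reg has wrapped around and term has not. */
bool direct_compare(unsigned int reg, unsigned int term)
{
    return reg >= term;
}

/* The "difference >= 0" variant. An unsigned difference is never
 * negative, so this is true for every input and can never fire. */
bool diff_nonnegative(unsigned int reg, unsigned int term)
{
    return reg - term >= 0;
}

/* Bounded difference: tolerates wraparound (the difference stays
 * small) and catches term > reg (the difference wraps to a huge
 * value that exceeds the bound). */
bool diff_bounded(unsigned int reg, unsigned int term)
{
    return reg - term <= MAX_ACTIVE_WORKERS;
}
```

With reg = 1 and term = 2, diff_nonnegative() still returns true because
1u - 2u evaluates to UINT_MAX, while diff_bounded() returns false. With
reg = 0 (just wrapped) and term = UINT_MAX, direct_compare() returns
false even though the counters are valid, and diff_bounded() returns
true because the difference is 1.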



-- 
Thanks & Regards,
Kuntal Ghosh
EnterpriseDB: http://www.enterprisedb.com


