Re: strange parallel query behavior after OOM crashes - Mailing list pgsql-hackers

From Kuntal Ghosh
Subject Re: strange parallel query behavior after OOM crashes
Date
Msg-id CAGz5QCJXswKiv+eocT1r=xaGV4uyXcorL0PgU6tqzyC60QTG1g@mail.gmail.com
Whole thread Raw
In response to Re: strange parallel query behavior after OOM crashes  (Tomas Vondra <tomas.vondra@2ndquadrant.com>)
List pgsql-hackers
On Wed, Apr 5, 2017 at 3:07 PM, Tomas Vondra
<tomas.vondra@2ndquadrant.com> wrote:
>
>
> On 04/05/2017 09:05 AM, Kuntal Ghosh wrote:
>>
>> AFAICU, during crash recovery, we wait for all non-syslogger children
>> to exit, then reset shmem(call BackgroundWorkerShmemInit) and perform
>> StartupDataBase. While starting the startup process we check if any
>> bgworker is scheduled for a restart. If a bgworker is crashed and not
>> meant for a restart(parallel worker in our case), we call
>> ForgetBackgroundWorker() to discard it.
>>
>
> OK, so essentially we reset the counters, getting
>
>     parallel_register_count = 0
>     parallel_terminate_count = 0
>
> and then ForgetBackgroundWworker() runs and increments the terminate_count,
> breaking the logic?
>
> And you're using (parallel_register_count > 0) to detect whether we're still
> in the init phase or not. Hmmm.
>
Yes. But, as Robert suggested up in the thread, we should not use
(parallel_register_count = 0) as an alternative to define a bgworker
crash. Hence, I've added an argument named 'wasCrashed' in
ForgetBackgroundWorker to indicate a bgworker crash.

>>>
>>> In any case, the comment right before BackgroundWorkerArray says this:
>>>
>>>   * These counters can of course overflow, but it's not important here
>>>   * since the subtraction will still give the right number.
>>>
>>> which means that this assert
>>>
>>> +       Assert(BackgroundWorkerData->parallel_register_count >=
>>> +               BackgroundWorkerData->parallel_terminate_count);
>>>
>>> is outright broken, just like any other attempts to rely on simple
>>> comparisons of these two counters, no?
>>>
>> IIUC, the assert statements ensures that register count should always
>> be greater than or equal to the terminate count.
>> Whereas, the comment refers to the fact that register count and
>> terminate count indicate the total number of parallel workers spawned
>> and terminated respectively since the server has been started. At
>> every session, we're not resetting the counts, hence they can
>> overflow. But, their substraction should give you the number of active
>> parallel worker at any instance.
>> So, I'm not able to see how the assert is broken w.r.t the aforesaid
>> comment. Am I missing something here?
>>
>
> The comment says that the counters are allowed to overflow, i.e. after a
> long uptime you might get these values
>
>     parallel_register_count = UINT_MAX + 1 = 1
>     parallel_terminate_count = UINT_MAX
>
> which is fine, because the C handles the overflow during subtraction and so
>
>     parallel_register_count - parallel_terminate_count = 1
>
> But the assert is not doing subtraction, it's comparing the values directly:
>
>     Assert(parallel_register_count >= parallel_terminate_count);
>
> and the (perfectly valid) values trivially violate this comparison.
>
Thanks for the explanation. So, we can't use the above assert
statement. Even the following assert statement will not be helpful:
Assert(parallel_register_count - parallel_terminate_count >= 0);
Because, it'll fail to track the scenario when parallel_register_count
is not overflowed, still less than parallel_terminate_count. :(

It seems to me the only thing we can make sure is
(parallel_register_count == parallel_terminate_count == 0) in
ForgetBackgroundWorker in case of a crash.

-- 
Thanks & Regards,
Kuntal Ghosh
EnterpriseDB: http://www.enterprisedb.com



pgsql-hackers by date:

Previous
From: Amit Langote
Date:
Subject: Re: Adding support for Default partition in partitioning
Next
From: Tomas Vondra
Date:
Subject: Re: strange parallel query behavior after OOM crashes