Re: strange parallel query behavior after OOM crashes - Mailing list pgsql-hackers

From Tomas Vondra
Subject Re: strange parallel query behavior after OOM crashes
Date
Msg-id 5b81a91d-7c3a-4b5b-38da-b4efb2ba9acb@2ndquadrant.com
Whole thread Raw
In response to Re: strange parallel query behavior after OOM crashes  (Kuntal Ghosh <kuntalghosh.2007@gmail.com>)
List pgsql-hackers
On 04/05/2017 04:09 PM, Kuntal Ghosh wrote:
> On Wed, Apr 5, 2017 at 7:31 PM, Robert Haas <robertmhaas@gmail.com> wrote:
>> On Wed, Apr 5, 2017 at 6:36 AM, Kuntal Ghosh <kuntalghosh.2007@gmail.com> wrote:
>>> Yes. But, as Robert suggested up in the thread, we should not use
>>> (parallel_register_count = 0) as an alternative to define a bgworker
>>> crash. Hence, I've added an argument named 'wasCrashed' in
>>> ForgetBackgroundWorker to indicate a bgworker crash.
>>
>> Did you intend to attach that patch to this email?
>>
> Actually, I'm confused how we should ensure (register_count >
> terminate_count) invariant. I think there can be a system crash what
> Tomas has suggested up in the thread.
>
> Assert(parallel_register_count - parallel_terminate_count <=
> max_parallel_workers);
> Backend 1 > SET max_parallel_worker = 8;
> Backend 1 > Execute a long running parallel query q1 with number of
> parallel worker spawned is say, 4.
> Backend 2> SET max_parallel_worker = 3;
> Backend 2 > Try to execute any parallel query q2 with number of
> parallel worker spawned > 0.
>
> The above assert statement will bring down the server unnecessarily
> while executing q2. If the assert statement was not there, it could
> have gone ahead without launching any workers.
>

Ah, right. I forgot max_parallel_workers may be changed in session. I 
think we can use max_worker_processes instead:
  Assert(parallel_register_count - parallel_terminate_count <= 
max_worker_processes);

The whole point is that if parallel_terminate_count exceeds 
parallel_register_count, the subtraction wraps around to a value close 
to UINT_MAX. All we need is an maximum possible delta between the two 
values to detect this, and max_worker_processes seems fine. We could 
also use UINT_MAX/2 for example.

regards

-- 
Tomas Vondra                  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



pgsql-hackers by date:

Previous
From: Robert Haas
Date:
Subject: Re: [sqlsmith] Unpinning error in parallel worker
Next
From: Tomas Vondra
Date:
Subject: Re: strange parallel query behavior after OOM crashes