Re: strange parallel query behavior after OOM crashes - Mailing list pgsql-hackers

From	Kuntal Ghosh
Subject	Re: strange parallel query behavior after OOM crashes
Date	April 6, 2017 12:34:13
Msg-id	CAGz5QCLP9kdDHk=zBUs-5+V1AGARXPFOi=AA1Z9JxRpnt0rmqQ@mail.gmail.com
In response to	Re: strange parallel query behavior after OOM crashes (Amit Kapila <amit.kapila16@gmail.com>)
List	pgsql-hackers

On Wed, Apr 5, 2017 at 6:49 PM, Amit Kapila <amit.kapila16@gmail.com> wrote:
> On Wed, Apr 5, 2017 at 12:35 PM, Kuntal Ghosh
> <kuntalghosh.2007@gmail.com> wrote:
>> On Tue, Apr 4, 2017 at 11:22 PM, Tomas Vondra
>>> I'm probably missing something, but I don't quite understand how these
>>> values actually survive the crash. I mean, what I observed is OOM followed
>>> by a restart, so shouldn't BackgroundWorkerShmemInit() simply reset the
>>> values back to 0? Or do we call ForgetBackgroundWorker() after the crash for
>>> some reason?
>> AFAICU, during crash recovery, we wait for all non-syslogger children
>> to exit, then reset shmem(call BackgroundWorkerShmemInit) and perform
>> StartupDataBase. While starting the startup process we check if any
>> bgworker is scheduled for a restart.
>>
>
> In general, your theory appears right, but can you check how it
> behaves in standby server because there is a difference in how the
> startup process behaves during master and standby startup?  In master,
> it stops after recovery whereas in standby it will keep on running to
> receive WAL.
>
While performing StartupDataBase, both the master and the standby server
behave the same way until the postmaster spawns the startup process.
On the master, the startup process completes its job and exits. As a result,
reaper is called, which in turn calls maybe_start_bgworker().
On a standby, after getting a valid snapshot, the startup process sends
the postmaster a signal to enable connections. The signal handler in the
postmaster calls maybe_start_bgworker().
In maybe_start_bgworker(), if we find a crashed bgworker (crashed_at !=
0) that is marked never-restart (BGW_NEVER_RESTART), we call
ForgetBackgroundWorker() to forget the bgworker process.

I've attached a patch that adds an argument to
ForgetBackgroundWorker() to indicate a crashed situation. Based on
that, we can take the necessary actions. I've not included the Assert
statement in this patch.
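
To make the sequence above concrete, here is a minimal, self-contained
sketch. It is not the attached patch and not PostgreSQL source: every
Sketch*/sketch_* name is hypothetical, and it assumes that the "necessary
action" is to skip the parallel terminate-count increment when the worker
is being forgotten because of a crash. It only models the ordering
described above: shmem counters are reset during crash recovery, while
the postmaster's private worker entry survives and is forgotten later.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-ins; these are NOT the real PostgreSQL definitions. */
#define SKETCH_NEVER_RESTART (-1)

typedef struct SketchShmem
{
    uint32_t parallel_register_count;
    uint32_t parallel_terminate_count;
} SketchShmem;

typedef struct SketchWorker
{
    bool in_use;        /* still on the postmaster's private list */
    bool is_parallel;   /* counted in the parallel worker counters */
    int  restart_time;  /* SKETCH_NEVER_RESTART means never restart */
    long crashed_at;    /* 0 means the worker has not crashed */
} SketchWorker;

/* Models the shmem reset done during crash recovery. */
static void
sketch_shmem_init(SketchShmem *shm)
{
    shm->parallel_register_count = 0;
    shm->parallel_terminate_count = 0;
}

/* Assumed behaviour: only a non-crash forget bumps the terminate count. */
static void
sketch_forget_worker(SketchShmem *shm, SketchWorker *w, bool crashed)
{
    if (w->is_parallel && !crashed)
        shm->parallel_terminate_count++;
    w->in_use = false;
}

/* Forget crashed never-restart workers, as described for maybe_start_bgworker(). */
static void
sketch_maybe_start(SketchShmem *shm, SketchWorker *workers, size_t n,
                   bool pass_crashed)
{
    for (size_t i = 0; i < n; i++)
    {
        SketchWorker *w = &workers[i];

        if (w->in_use && w->crashed_at != 0 &&
            w->restart_time == SKETCH_NEVER_RESTART)
            sketch_forget_worker(shm, w, pass_crashed);
    }
}

/* Returns terminate_count - register_count after a simulated crash restart. */
static int
sketch_count_skew(bool pass_crashed)
{
    SketchShmem  shm;
    SketchWorker w = { true, true, SKETCH_NEVER_RESTART, 0 };

    sketch_shmem_init(&shm);
    shm.parallel_register_count++;  /* parallel worker registered */
    w.crashed_at = 1;               /* crash: private entry survives */
    sketch_shmem_init(&shm);        /* counters reset during recovery */
    sketch_maybe_start(&shm, &w, 1, pass_crashed);
    assert(!w.in_use);              /* the worker was forgotten either way */

    return (int) shm.parallel_terminate_count -
           (int) shm.parallel_register_count;
}
```

Under these assumptions, sketch_count_skew(false) leaves the terminate
count one ahead of the register count, while sketch_count_skew(true)
leaves the two counters equal.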


-- 
Thanks & Regards,
Kuntal Ghosh
EnterpriseDB: http://www.enterprisedb.com

Attachment

0001-Fix-parallel-worker-counts-after-a-crash_v1.patch
