Re: strange parallel query behavior after OOM crashes - Mailing list pgsql-hackers
From: Tomas Vondra
Subject: Re: strange parallel query behavior after OOM crashes
Msg-id: 24a143e0-c9bb-8f4f-1472-9d6faae4c92e@2ndquadrant.com
In response to: Re: strange parallel query behavior after OOM crashes (Kuntal Ghosh <kuntalghosh.2007@gmail.com>)
Responses: Re: strange parallel query behavior after OOM crashes
List: pgsql-hackers
On 04/05/2017 09:05 AM, Kuntal Ghosh wrote:
> On Tue, Apr 4, 2017 at 11:22 PM, Tomas Vondra
> <tomas.vondra@2ndquadrant.com> wrote:
>> On 04/04/2017 06:52 PM, Robert Haas wrote:
>>> On Mon, Apr 3, 2017 at 6:08 AM, Kuntal Ghosh
>>> <kuntalghosh.2007@gmail.com> wrote:
>>>> On Fri, Mar 31, 2017 at 6:50 PM, Robert Haas <robertmhaas@gmail.com>
>>>> wrote:
>>>>> On Thu, Mar 30, 2017 at 4:35 PM, Kuntal Ghosh
>>>>> <kuntalghosh.2007@gmail.com> wrote:
>>>>>> 2. the server restarts automatically and initializes
>>>>>> BackgroundWorkerData->parallel_register_count and
>>>>>> BackgroundWorkerData->parallel_terminate_count in shared memory.
>>>>>> After that, it calls ForgetBackgroundWorker, which increments
>>>>>> parallel_terminate_count.
>>>>>
>>>>> Hmm. So this seems like the root of the problem. Presumably those
>>>>> things need to be reset AFTER forgetting any background workers from
>>>>> before the crash.
>>>>>
>>>> IMHO, the fix would be not to increase the terminated parallel worker
>>>> count whenever ForgetBackgroundWorker is called due to a bgworker
>>>> crash. I've attached a patch for the same. PFA.
>>>
>>> While I'm not opposed to that approach, I don't think this is a good
>>> way to implement it. If you want to pass an explicit flag to
>>> ForgetBackgroundWorker telling it whether or not it should perform
>>> the increment, fine. But with what you've got here, you're
>>> essentially relying on "spooky action at a distance". It would be
>>> easy for future code changes to break this, not realizing that
>>> somebody's got a hard dependency on 0 having a specific meaning.
>>>
>> I'm probably missing something, but I don't quite understand how these
>> values actually survive the crash. I mean, what I observed is OOM
>> followed by a restart, so shouldn't BackgroundWorkerShmemInit() simply
>> reset the values back to 0? Or do we call ForgetBackgroundWorker()
>> after the crash for some reason?
> AFAICU, during crash recovery, we wait for all non-syslogger children
> to exit, then reset shmem (call BackgroundWorkerShmemInit) and perform
> StartupDataBase. While starting the startup process, we check whether
> any bgworker is scheduled for a restart. If a bgworker has crashed and
> is not meant for a restart (a parallel worker in our case), we call
> ForgetBackgroundWorker() to discard it.
>

OK, so essentially we reset the counters, getting

    parallel_register_count  = 0
    parallel_terminate_count = 0

and then ForgetBackgroundWorker() runs and increments the
terminate_count, breaking the logic? And you're using
(parallel_register_count > 0) to detect whether we're still in the init
phase or not. Hmmm.

>> In any case, the comment right before BackgroundWorkerArray says this:
>>
>>  * These counters can of course overflow, but it's not important here
>>  * since the subtraction will still give the right number.
>>
>> which means that this assert
>>
>> +   Assert(BackgroundWorkerData->parallel_register_count >=
>> +          BackgroundWorkerData->parallel_terminate_count);
>>
>> is outright broken, just like any other attempt to rely on simple
>> comparisons of these two counters, no?
>>
> IIUC, the assert statement ensures that the register count should
> always be greater than or equal to the terminate count.
> Whereas, the comment refers to the fact that the register count and
> terminate count indicate the total number of parallel workers spawned
> and terminated, respectively, since the server was started. We're not
> resetting the counts at every session, hence they can overflow. But
> their subtraction should give you the number of active parallel
> workers at any instant.
> So, I'm not able to see how the assert is broken w.r.t. the aforesaid
> comment. Am I missing something here?
>

The comment says that the counters are allowed to overflow, i.e.
after a long uptime you might get these values

    parallel_register_count  = UINT_MAX + 1 = 0
    parallel_terminate_count = UINT_MAX

which is fine, because C handles the overflow during subtraction, and so

    parallel_register_count - parallel_terminate_count = 1

But the assert is not doing a subtraction, it's comparing the values
directly:

    Assert(parallel_register_count >= parallel_terminate_count);

and these (perfectly valid) values trivially violate this comparison.

regards

--
Tomas Vondra                  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services