Re: bgworker crashed or not? - Mailing list pgsql-hackers

From Robert Haas
Subject Re: bgworker crashed or not?
Date
Msg-id CA+TgmoZifBwcv22Gu+Zj-pmZmZG6e-orWuL5SP0_SgHgdHPJ6w@mail.gmail.com
Whole thread Raw
In response to Re: bgworker crashed or not?  (Petr Jelinek <petr@2ndquadrant.com>)
Responses Re: bgworker crashed or not?
List pgsql-hackers
On Tue, May 6, 2014 at 8:25 PM, Petr Jelinek <petr@2ndquadrant.com> wrote:
>> What I'm inclined to do is change the logic so that:
>>
>> (1) After a crash-and-restart sequence, zero rw->rw_crashed_at, so
>> that anything which is still registered gets restarted immediately.
>
> Yes, that's quite obvious change which I missed completely :).

I've committed the portion of your patch which does this, with pretty
extensive changes.  I moved the function which resets the crash times
to bgworker.c, renamed it, and made it just reset all of the crash
times unconditionally; there didn't seem to be any point in skipping
the irrelevant ones, so it seemed best to keep things simple.  I also
moved the call from reaper() where you had it to
PostmasterStateMachine(), because the old placement did not work for
me when I carried out the following steps:

1. Kill a background worker with a 60-second restart time using
SIGTERM, so that it does exit(1).
2. Before it restarts, kill a regular backend with SIGQUIT.

Let me know if you think I've got it wrong.

>> (2) If a shmem-connected backend fails to release the deadman switch
>> or exits with an exit code other than 0 or 1, we crash-and-restart.  A
>> non-shmem-connected backend never causes a crash-and-restart.
>
> +1

I did this as a separate commit,
e2ce9aa27bf20eff2d991d0267a15ea5f7024cd7, just moving the check for
ReleasePostmasterChildSlot inside the if statement.   Then I realized
that was bogus, so I just pushed
eee6cf1f337aa488a20e9111df446cdad770e645 to fix that.  Hopefully it's
OK now.

>> (3) When a background worker exits without triggering a
>> crash-and-restart, an exit code of precisely 0 causes the worker to be
>> unregistered; any other exit code has no special effect, so
>> bgw_restart_time controls.
>
> +1

This isn't done yet.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



pgsql-hackers by date:

Previous
From: Tom Lane
Date:
Subject: Re: Wanted: jsonb on-disk representation documentation
Next
From: Jeff Janes
Date:
Subject: Re: proposal: Set effective_cache_size to greater of .conf value, shared_buffers