Re: Properly handle OOM death? - Mailing list pgsql-general

From Justin Pryzby
Subject Re: Properly handle OOM death?
Msg-id ZVI112aVNCHOQgfF@pryzbyj2023
In response to Re: Properly handle OOM death?  ("Peter J. Holzer" <hjp-pgsql@hjp.at>)
List pgsql-general
On Mon, Mar 13, 2023 at 06:43:01PM +0100, Peter J. Holzer wrote:
> On 2023-03-13 09:21:18 -0800, Israel Brewster wrote:
> > I’m running a postgresql 13 database on an Ubuntu 20.04 VM that is a bit more
> > memory constrained than I would like, such that every week or so the various
> > processes running on the machine will align badly and the OOM killer will kick
> > in, killing off postgresql, as per the following journalctl output:
> > 
> > Mar 12 04:04:23 novarupta systemd[1]: postgresql@13-main.service: A process of this unit has been killed by the OOM killer.
> > Mar 12 04:04:32 novarupta systemd[1]: postgresql@13-main.service: Failed with result 'oom-kill'.
> > Mar 12 04:04:32 novarupta systemd[1]: postgresql@13-main.service: Consumed 5d 17h 48min 24.509s CPU time.
> > 
> > And the service is no longer running.
> 
> I might be misreading this, but it looks to me that systemd detects that
> *some* process in the group was killed by the oom killer and stops the
> service.

Yeah.

I found this old message on Google.  I'm surprised there aren't more
similar complaints about this.  It's as Peter said: systemd (sometimes)
actively *stops* the cluster after an OOM kill, when it would've come
back online on its own if the init (supervisor) process hadn't
interfered.

My solution was to set:
/usr/lib/systemd/system/postgresql@.service OOMPolicy=continue
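
For anyone who would rather not edit the packaged unit file, here is a
minimal sketch of the same setting as a drop-in.  The drop-in path is an
assumption on my part (the usual systemd override location for a template
unit), and OOMPolicy= requires a systemd version that supports it.

```ini
# /etc/systemd/system/postgresql@.service.d/override.conf  (assumed path)
[Service]
# Do not stop the whole unit when the kernel OOM killer kills one of
# its processes; leave recovery to the postmaster.
OOMPolicy=continue
```

After creating the file, run "systemctl daemon-reload".  If the setting
was picked up, "systemctl show -p OOMPolicy postgresql@13-main.service"
should report "continue"; if it does not, a restart of the service may
be needed.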

I suggest that the default unit files should do likewise.

-- 
Justin


