Re: Properly handle OOM death? - Mailing list pgsql-general

From Peter J. Holzer
Subject Re: Properly handle OOM death?
Msg-id 20230313201629.5d6nkptfxy3qs5fr@hjp.at
In response to Re: Properly handle OOM death?  (Israel Brewster <ijbrewster@alaska.edu>)
Responses Re: Properly handle OOM death?  (Israel Brewster <ijbrewster@alaska.edu>)
List pgsql-general
On 2023-03-13 09:55:50 -0800, Israel Brewster wrote:
> On Mar 13, 2023, at 9:43 AM, Peter J. Holzer <hjp-pgsql@hjp.at> wrote:
> > On 2023-03-13 09:21:18 -0800, Israel Brewster wrote:
> >> I’m running a postgresql 13 database on an Ubuntu 20.04 VM that is a bit more
> >> memory constrained than I would like, such that every week or so the various
> >> processes running on the machine will align badly and the OOM killer will kick
> >> in, killing off postgresql, as per the following journalctl output:
> >>
> >> Mar 12 04:04:23 novarupta systemd[1]: postgresql@13-main.service: A process of
> >> this unit has been killed by the OOM killer.
> >> Mar 12 04:04:32 novarupta systemd[1]: postgresql@13-main.service: Failed with
> >> result 'oom-kill'.
> >> Mar 12 04:04:32 novarupta systemd[1]: postgresql@13-main.service: Consumed 5d
> >> 17h 48min 24.509s CPU time.
> >>
> >> And the service is no longer running.
> >
> > I might be misreading this, but it looks to me that systemd detects that
> > *some* process in the group was killed by the oom killer and stops the
> > service.
> >
> > Can you check which process was actually killed? If it's not the
> > postmaster, setting OOMScoreAdjust is probably useless.
> >
> > (I tried searching the web for the error messages and didn't find
> > anything useful)
>
> Your guess is as good as (if not better than) mine. I can find the PID
> of the killed process in the system log, but without knowing what the
> PID of postmaster and the child processes were prior to the kill, I’m
> not sure that helps much.

The syslog should contain a list of all tasks prior to the kill. For
example, I just provoked an OOM kill on my laptop and the syslog
contains (among lots of others) these lines:

Mar 13 21:00:36 trintignant kernel: [112024.084117] [   2721]   126  2721    54563     2042   163840      555          -900 postgres
Mar 13 21:00:36 trintignant kernel: [112024.084123] [   2873]   126  2873    18211       85   114688      594             0 postgres
Mar 13 21:00:36 trintignant kernel: [112024.084128] [   2941]   126  2941    54592     1231   147456      565             0 postgres
Mar 13 21:00:36 trintignant kernel: [112024.084134] [   2942]   126  2942    54563      535   143360      550             0 postgres
Mar 13 21:00:36 trintignant kernel: [112024.084139] [   2943]   126  2943    54563     1243   139264      548             0 postgres
Mar 13 21:00:36 trintignant kernel: [112024.084145] [   2944]   126  2944    54798      561   147456      545             0 postgres
Mar 13 21:00:36 trintignant kernel: [112024.084150] [   2945]   126  2945    54563      215   131072      551             0 postgres
Mar 13 21:00:36 trintignant kernel: [112024.084156] [   2956]   126  2956    18718      506   122880      553             0 postgres
Mar 13 21:00:36 trintignant kernel: [112024.084161] [   2957]   126  2957    54672      269   139264      546             0 postgres

That's less helpful than it could be since all the postgres processes
are just listed as "postgres" without arguments. However, it is very
likely that the first one is actually the postmaster, because it has the
lowest pid (and the other pids follow closely) and it has an OOM score
adjustment (oom_score_adj) of -900, as set in the systemd service file.

So I could compare the PID of the killed process with this list (in my
case the killed process wasn't one of them but a test program which just
allocates lots of memory).
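
As a rough sketch of that comparison, the following Python snippet parses
task-dump lines like the ones above, guesses the postmaster, and classifies
a killed PID. It assumes the column layout emitted by recent kernels (pid,
uid, tgid, total_vm, rss, pgtables_bytes, swapents, oom_score_adj, name);
older kernels use a different layout. The function names and the heuristic
"lowest-numbered postgres PID with a negative oom_score_adj" are
illustrative assumptions, not something the kernel or PostgreSQL guarantees.

```python
import re
from typing import List, NamedTuple, Optional


class Task(NamedTuple):
    pid: int
    uid: int
    oom_score_adj: int
    name: str


# Assumed layout of one line of the kernel's OOM task dump:
# [ pid ] uid tgid total_vm rss pgtables_bytes swapents oom_score_adj name
# The kernel timestamp "[112024.084117]" contains a dot, so it cannot be
# mistaken for the "[   2721]" pid field.
TASK_RE = re.compile(
    r"\[\s*(?P<pid>\d+)\]\s+(?P<uid>\d+)\s+(?P<tgid>\d+)\s+"
    r"(?P<total_vm>\d+)\s+(?P<rss>\d+)\s+(?P<pgtables>\d+)\s+"
    r"(?P<swapents>\d+)\s+(?P<adj>-?\d+)\s+(?P<name>\S+)"
)


def parse_tasks(lines: List[str]) -> List[Task]:
    """Extract one Task per log line that matches the task-dump layout."""
    tasks = []
    for line in lines:
        m = TASK_RE.search(line)
        if m:
            tasks.append(
                Task(int(m["pid"]), int(m["uid"]), int(m["adj"]), m["name"])
            )
    return tasks


def guess_postmaster(tasks: List[Task], name: str = "postgres") -> Optional[Task]:
    """Heuristic: lowest-PID process of that name with a negative adjustment."""
    candidates = [t for t in tasks if t.name == name and t.oom_score_adj < 0]
    return min(candidates, key=lambda t: t.pid) if candidates else None


def classify(killed_pid: int, tasks: List[Task], name: str = "postgres") -> str:
    """Say whether the killed PID was the postmaster, a child, or unrelated."""
    postmaster = guess_postmaster(tasks, name)
    if postmaster is not None and killed_pid == postmaster.pid:
        return "postmaster"
    if any(t.pid == killed_pid and t.name == name for t in tasks):
        return "postgres child"
    return "not postgres"


# Two lines taken from the log excerpt above.
sample_log = [
    "Mar 13 21:00:36 trintignant kernel: [112024.084117] [   2721]   126"
    "  2721    54563     2042   163840      555          -900 postgres",
    "Mar 13 21:00:36 trintignant kernel: [112024.084123] [   2873]   126"
    "  2873    18211       85   114688      594             0 postgres",
]
sample_tasks = parse_tasks(sample_log)
```

With the real log, the lines would come from the syslog or `journalctl -k`
output around the time of the kill, and the killed PID from the kernel's
"Killed process" message.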

        hp

--
   _  | Peter J. Holzer    | Story must make more sense than reality.
|_|_) |                    |
| |   | hjp@hjp.at         |    -- Charles Stross, "Creative writing
__/   | http://www.hjp.at/ |       challenge!"
