Thread: Properly handle OOM death?
I’m running a postgresql 13 database on an Ubuntu 20.04 VM that is a bit more memory constrained than I would like, such that every week or so the various processes running on the machine will align badly and the OOM killer will kick in, killing off postgresql, as per the following journalctl output:
Mar 12 04:04:23 novarupta systemd[1]: postgresql@13-main.service: A process of this unit has been killed by the OOM killer.
Mar 12 04:04:32 novarupta systemd[1]: postgresql@13-main.service: Failed with result 'oom-kill'.
Mar 12 04:04:32 novarupta systemd[1]: postgresql@13-main.service: Consumed 5d 17h 48min 24.509s CPU time.
And the service is no longer running.
When this happens, I go in and restart the postgresql service, and everything is happy again for the next week or two.
Obviously this is not a good situation. Which leads to two questions:
1) is there some tweaking I can do in the postgresql config itself to prevent the situation from occurring in the first place?
2) My first thought was to simply have systemd restart postgresql whenever it is killed like this, which is easy enough. Then I looked at the default unit file, and found these lines:
# prevent OOM killer from choosing the postmaster (individual backends will
# reset the score to 0)
OOMScoreAdjust=-900
# restarting automatically will prevent "pg_ctlcluster ... stop" from working,
# so we disable it here. Also, the postmaster will restart by itself on most
# problems anyway, so it is questionable if one wants to enable external
# automatic restarts.
#Restart=on-failure
Which seems to imply that the OOM killer should only be killing off individual backends, not the entire cluster to begin with - which should be fine. And also that adding the Restart=on-failure option is probably not the greatest idea. Which makes me wonder what is really going on?
Thanks.
---
Israel Brewster
Software Engineer
Alaska Volcano Observatory
Geophysical Institute - UAF
2156 Koyukuk Drive
Fairbanks AK 99775-7320
Work: 907-474-5172
cell: 907-328-9145
On 3/13/23 10:21 AM, Israel Brewster wrote:
> I’m running a postgresql 13 database on an Ubuntu 20.04 VM that is a bit
> more memory constrained than I would like [...]
> Which seems to imply that the OOM killer should only be killing off
> individual backends, not the entire cluster to begin with - which should
> be fine. And also that adding the restart=on-failure option is probably
> not the greatest idea. Which makes me wonder what is really going on?

You might want to read:

https://www.postgresql.org/docs/current/kernel-resources.html#LINUX-MEMORY-OVERCOMMIT

--
Adrian Klaver
adrian.klaver@aklaver.com
> On Mar 13, 2023, at 9:28 AM, Adrian Klaver <adrian.klaver@aklaver.com> wrote:
>
> You might want to read:
>
> https://www.postgresql.org/docs/current/kernel-resources.html#LINUX-MEMORY-OVERCOMMIT
Good information, thanks. One thing there confuses me though. It says:
Another approach, which can be used with or without altering vm.overcommit_memory, is to set the process-specific OOM score adjustment value for the postmaster process to -1000, thereby guaranteeing it will not be targeted by the OOM killer
Isn’t that exactly what the "OOMScoreAdjust=-900" line in the unit file does, though (except with a score of -900 rather than -1000)?
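For what it’s worth, one way to check what the postmaster is actually running with is to read the score back out of /proc. A quick sketch, assuming the stock Ubuntu 13/main data directory (the path may differ on other installs):

# first line of postmaster.pid is the postmaster's PID
PG_PID=$(sudo head -1 /var/lib/postgresql/13/main/postmaster.pid)
sudo cat /proc/$PG_PID/oom_score_adj   # should read -900 with the stock unit file
sudo cat /proc/$PG_PID/oom_score       # the score the kernel actually ranks on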
---
Israel Brewster
Software Engineer
Alaska Volcano Observatory
Geophysical Institute - UAF
2156 Koyukuk Drive
Fairbanks AK 99775-7320
Work: 907-474-5172
cell: 907-328-9145
On 3/13/23 13:21, Israel Brewster wrote:
> Which seems to imply that the OOM killer should only be killing off
> individual backends, not the entire cluster to begin with - which should
> be fine. And also that adding the restart=on-failure option is probably
> not the greatest idea. Which makes me wonder what is really going on?

First, are you running with a cgroup memory.limit set (e.g. in a container)?

Assuming no, see:

https://www.postgresql.org/docs/current/kernel-resources.html#LINUX-MEMORY-OVERCOMMIT

That will tell you:

1/ Turn off memory overcommit: "Although this setting will not prevent the OOM killer from being invoked altogether, it will lower the chances significantly and will therefore lead to more robust system behavior."

2/ Set /proc/self/oom_score_adj to -1000 rather than -900 (OOMScoreAdjust=-1000): the value -1000 is important as it is a "magic" value which prevents the process from being selected by the OOM killer (see: https://elixir.bootlin.com/linux/latest/source/include/uapi/linux/oom.h#L6) whereas -900 just makes it less likely.

All that said, even if the individual backend gets killed, the postmaster will still go into crash recovery. So while technically postgres does not restart, the effect is much the same. So see #1 above as your best protection.

HTH,

Joe

--
Joe Conway
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com
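For reference, 1/ above comes down to a couple of sysctls. A minimal sketch of applying and persisting it - the drop-in file name is arbitrary, and vm.overcommit_ratio=80 is only an illustration to be tuned for the actual RAM/swap mix:

# apply immediately
sysctl -w vm.overcommit_memory=2
sysctl -w vm.overcommit_ratio=80    # with mode 2, commit limit = swap + this % of RAM

# persist across reboots
printf 'vm.overcommit_memory = 2\nvm.overcommit_ratio = 80\n' > /etc/sysctl.d/90-overcommit.conf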
> On Mar 13, 2023, at 9:36 AM, Joe Conway <mail@joeconway.com> wrote:
>
> First, are you running with a cgroup memory.limit set (e.g. in a container)?
Not sure, actually. I *think* I set it up as a full VM though, not a container. I’ll have to double-check that.
> Assuming no, see:
>
> https://www.postgresql.org/docs/current/kernel-resources.html#LINUX-MEMORY-OVERCOMMIT
>
> That will tell you:
>
> 1/ Turn off memory overcommit: "Although this setting will not prevent the OOM killer from being invoked altogether, it will lower the chances significantly and will therefore lead to more robust system behavior."
>
> 2/ Set /proc/self/oom_score_adj to -1000 rather than -900 (OOMScoreAdjust=-1000): the value -1000 is important as it is a "magic" value which prevents the process from being selected by the OOM killer (see: https://elixir.bootlin.com/linux/latest/source/include/uapi/linux/oom.h#L6) whereas -900 just makes it less likely.
...and that answers the question I just sent about the above-linked page 😄 Thanks!
> All that said, even if the individual backend gets killed, the postmaster will still go into crash recovery. So while technically postgres does not restart, the effect is much the same. So see #1 above as your best protection.
Interesting. Makes sense though. Thanks!
---
Israel Brewster
Software Engineer
Alaska Volcano Observatory
Geophysical Institute - UAF
2156 Koyukuk Drive
Fairbanks AK 99775-7320
Work: 907-474-5172
cell: 907-328-9145
On 2023-03-13 09:21:18 -0800, Israel Brewster wrote:
> Mar 12 04:04:23 novarupta systemd[1]: postgresql@13-main.service: A process of
> this unit has been killed by the OOM killer.
> Mar 12 04:04:32 novarupta systemd[1]: postgresql@13-main.service: Failed with
> result 'oom-kill'.
> Mar 12 04:04:32 novarupta systemd[1]: postgresql@13-main.service: Consumed 5d
> 17h 48min 24.509s CPU time.
>
> And the service is no longer running.

I might be misreading this, but it looks to me that systemd detects that
*some* process in the group was killed by the oom killer and stops the
service.

Can you check which process was actually killed? If it's not the
postmaster, setting OOMScoreAdjust is probably useless.

(I tried searching the web for the error messages and didn't find
anything useful)

> # restarting automatically will prevent "pg_ctlcluster ... stop" from working,
> # so we disable it here.

I never call pg_ctlcluster directly, so that probably wouldn't be a good
reason for me.

> # Also, the postmaster will restart by itself on most
> # problems anyway, so it is questionable if one wants to enable external
> # automatic restarts.
> #Restart=on-failure

So I'd try this despite the comment.

        hp

--
   _  | Peter J. Holzer    | Story must make more sense than reality.
|_|_) |                    |
| |   | hjp@hjp.at         | -- Charles Stross, "Creative writing
__/   | http://www.hjp.at/ |    challenge!"
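For reference, the kernel logs the victim and its score when it kills something, so a search along these lines should show which process was chosen (a sketch - the exact message wording varies a bit between kernel versions):

# OOM-killer activity from the kernel ring buffer / journal
journalctl -k --since "7 days ago" | grep -iE 'out of memory|killed process|oom_score_adj'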
> On Mar 13, 2023, at 9:43 AM, Peter J. Holzer <hjp-pgsql@hjp.at> wrote:
>
> I might be misreading this, but it looks to me that systemd detects that
> *some* process in the group was killed by the oom killer and stops the
> service.
>
> Can you check which process was actually killed? If it's not the
> postmaster, setting OOMScoreAdjust is probably useless.
>
> (I tried searching the web for the error messages and didn't find
> anything useful)

Your guess is as good as (if not better than) mine. I can find the PID of the killed process in the system log, but without knowing what the PID of postmaster and the child processes were prior to the kill, I’m not sure that helps much. Though for what it’s worth, I do note the following about all the kill logs:

1) They reference a “Memory cgroup out of memory”, which refers back to the opening comment on Joe Conway’s message - this would imply to me that I *AM* running with a cgroup memory.limit set. Not sure how that changes things?

2) All the entries contain the line "oom_score_adj:0", which would seem to imply that the postmaster, with its -900 score is not being directly targeted by the OOM killer.

> I never call pg_ctlcluster directly, so that probably wouldn't be a good
> reason for me.

Valid point, unless something under-the-hood needs to call it?

---
Israel Brewster
Software Engineer
Alaska Volcano Observatory
Geophysical Institute - UAF
2156 Koyukuk Drive
Fairbanks AK 99775-7320
Work: 907-474-5172
cell: 907-328-9145
On 3/13/23 13:55, Israel Brewster wrote:
> 1) They reference a “Memory cgroup out of memory”, which refers back
> to the opening comment on Joe Conway’s message - this would imply to
> me that I *AM* running with a cgroup memory.limit set. Not sure how
> that changes things?

cgroup memory limit is enforced regardless of the actual host level memory pressure. As an example, if your host VM has 128 GB of memory, but your cgroup memory limit is 512MB, you will get an OOM kill when the sum memory usage of all of your postgres processes (and anything else sharing the same cgroup) exceeds 512 MB, even if the host VM has nothing else going on consuming memory.

You can check if a memory limit is set by reading the corresponding virtual file, e.g.:

8<-------------------
# cat /sys/fs/cgroup/memory/system.slice/postgresql.service/memory.limit_in_bytes
9223372036854710272
8<-------------------

A few notes:

1/ The specific path to memory.limit_in_bytes might vary, but this example is the default for the RHEL 8 postgresql 10 RPM.

2/ The value above, 9223372036854710272, basically means "no limit" has been set.

3/ The example assumes cgroup v1. There are very few distros that enable cgroup v2 by default, and generally I have not seen much cgroup v2 usage in the wild (although I strongly recommend it), but if you are using cgroup v2 the names have changed. You can check by doing:

8<--cgroupv2 enabled-----------------
# stat -fc %T /sys/fs/cgroup/
cgroup2fs
8<--cgroupv1 enabled-----------------
# stat -fc %T /sys/fs/cgroup/
tmpfs
8<-------------------

> 2) All the entries contain the line "oom_score_adj:0", which would
> seem to imply that the postmaster, with its -900 score is not being
> directly targeted by the OOM killer.

Sounds correct

--
Joe Conway
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com
> On Mar 13, 2023, at 10:37 AM, Joe Conway <mail@joeconway.com> wrote:
>
> You can check if a memory limit is set by reading the corresponding virtual file, e.g.:
>
> 8<-------------------
> # cat /sys/fs/cgroup/memory/system.slice/postgresql.service/memory.limit_in_bytes
> 9223372036854710272
> 8<-------------------
>
> 1/ The specific path to memory.limit_in_bytes might vary, but this example is the default for the RHEL 8 postgresql 10 RPM.

Not finding that file specifically (this is probably too much info, but…):

root@novarupta:~# ls /sys/fs/cgroup/system.slice/
root@novarupta:~# ls /sys/fs/cgroup/system.slice/system-postgresql.slice/
[long listings snipped - both show the cgroup v2 style names (memory.current, memory.high, memory.max, memory.swap.*, ...); there is no memory.limit_in_bytes anywhere]
root@novarupta:~# ls /sys/fs/cgroup/system.slice/system-postgresql.slice/postgresql@13-main.service/
cgroup.controllers  cgroup.events  cgroup.freeze  cgroup.max.depth  cgroup.max.descendants  cgroup.procs
cgroup.stat  cgroup.subtree_control  cgroup.threads  cgroup.type  cpu.pressure  cpu.stat  io.pressure
memory.current  memory.events  memory.events.local  memory.high  memory.low  memory.max  memory.min
memory.numa_stat  memory.oom.group  memory.pressure  memory.stat  memory.swap.current  memory.swap.events
memory.swap.high  memory.swap.max  pids.current  pids.events  pids.max

> 2/ The value above, 9223372036854710272 basically means "no limit" has been set.
>
> 3/ The example assumes cgroup v1. There are very few distros that enable cgroup v2 by default, and generally I have not seen much cgroup v2 usage in the wild (although I strongly recommend it), but if you are using cgroup v2 the names have changed. You can check by doing:
>
> 8<--cgroupv2 enabled-----------------
> # stat -fc %T /sys/fs/cgroup/
> cgroup2fs
> 8<--cgroupv1 enabled-----------------
> # stat -fc %T /sys/fs/cgroup/
> tmpfs
> 8<-------------------

Looks like V2:

root@novarupta:~# stat -fc %T /sys/fs/cgroup/
cgroup2fs
root@novarupta:~# lsb_release -a
No LSB modules are available.
Distributor ID: Ubuntu
Description:    Ubuntu 20.04.3 LTS
Release:        20.04
Codename:       focal

---
Israel Brewster
Software Engineer
Alaska Volcano Observatory
Geophysical Institute - UAF
2156 Koyukuk Drive
Fairbanks AK 99775-7320
Work: 907-474-5172
cell: 907-328-9145
On 3/13/23 14:50, Israel Brewster wrote:
> Looks like V2:
>
> root@novarupta:~# stat -fc %T /sys/fs/cgroup/
> cgroup2fs

Interesting -- it does indeed look like you are using cgroup v2

So the file you want to look at in that case is:

8<-----------
cat /sys/fs/cgroup/system.slice/system-postgresql.slice/postgresql@14.service/memory.max
4294967296

cat /sys/fs/cgroup/system.slice/system-postgresql.slice/postgresql@14.service/memory.high
3221225472
8<-----------

If the value comes back as "max" it means no limit is set.

In this example (on my Linux Mint machine with a custom systemd unit file) I have memory.max set to 4G and memory.high set to 3G.

The value of memory.max determines when the OOM killer will strike. The value of memory.high will determine when the kernel goes into aggressive memory reclaim (trying to avoid memory.max and thus an OOM kill).

The corresponding/relevant systemd unit file parameters are:

8<-----------
MemoryAccounting=yes
MemoryHigh=3G
MemoryMax=4G
8<-----------

There are other ways that memory.max may get set, but it seems most likely that the systemd unit file is doing it (if it is in fact set).

--
Joe Conway
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com
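For reference, if one did want to put an explicit cap on the 13/main cluster, a systemd drop-in is the usual mechanism rather than editing the packaged unit file. A sketch, with purely illustrative values:

# systemctl edit postgresql@13-main
# creates a drop-in such as /etc/systemd/system/postgresql@13-main.service.d/override.conf
[Service]
MemoryAccounting=yes
MemoryHigh=3G
MemoryMax=4G

# then: systemctl daemon-reload && systemctl restart postgresql@13-main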
> On Mar 13, 2023, at 11:10 AM, Joe Conway <mail@joeconway.com> wrote:
>
> So the file you want to look at in that case is:
> [...]
> If the value comes back as "max" it means no limit is set.

This does, in fact, appear to be the case here:

root@novarupta:~# cat /sys/fs/cgroup/system.slice/system-postgresql.slice/postgresql@13-main.service/memory.max
max
root@novarupta:~# cat /sys/fs/cgroup/system.slice/system-postgresql.slice/postgresql@13-main.service/memory.high
max
root@novarupta:~#

which would presumably indicate that it’s a system level limit being exceeded, rather than a postgresql specific one? The syslog specifically says "Memory cgroup out of memory", if that means something (this is my first exposure to cgroups, if you couldn’t tell).

---
Israel Brewster
Software Engineer
Alaska Volcano Observatory
Geophysical Institute - UAF
2156 Koyukuk Drive
Fairbanks AK 99775-7320
Work: 907-474-5172
cell: 907-328-9145
On Mon, Mar 13, 2023 at 1:21 PM Israel Brewster <ijbrewster@alaska.edu> wrote:
> I’m running a postgresql 13 database on an Ubuntu 20.04 VM that is a bit more
> memory constrained than I would like [...]

Related, we (a FOSS project) used to have a Linux server with a LAMP stack on GoDaddy. The machine provided a website and wiki. It was very low-end. I think it had 512MB or 1 GB RAM and no swap file. And no way to enable a swap file (part of an upsell). We paid about $2 a month for it.

MySQL was killed several times a week. It corrupted the database on a regular basis. We had to run the database repair tools daily.

We eventually switched to Ionos for hosting. We got a VM with more memory and a swap file for about $5 a month. No more OOM kills.

If possible, you might want to add more memory (or a swap file) to the machine. It will help sidestep the OOM problem.

You can also add vm.overcommit_memory = 2 to stop Linux from oversubscribing memory. The machine will act like a Solaris box rather than a Linux box (which takes some getting used to). Also see https://serverfault.com/questions/606185/how-does-vm-overcommit-memory-work .

Jeff
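For reference, adding a modest swap file on Ubuntu is only a few commands - a sketch, with the size and path purely illustrative (and note this generally only works on a full VM, not inside a container):

fallocate -l 2G /swapfile    # or: dd if=/dev/zero of=/swapfile bs=1M count=2048
chmod 600 /swapfile
mkswap /swapfile
swapon /swapfile
echo '/swapfile none swap sw 0 0' >> /etc/fstab    # persist across reboots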
On 3/13/23 15:18, Israel Brewster wrote:
> root@novarupta:~# cat /sys/fs/cgroup/system.slice/system-postgresql.slice/postgresql@13-main.service/memory.max
> max
> root@novarupta:~# cat /sys/fs/cgroup/system.slice/system-postgresql.slice/postgresql@13-main.service/memory.high
> max
>
> which would presumably indicate that it’s a system level limit being
> exceeded, rather than a postgresql specific one?

Yep

> The syslog specifically says "Memory cgroup out of memory", if that means
> something (this is my first exposure to cgroups, if you couldn’t tell).

I am not entirely sure, but without actually testing it I suspect that since memory.max = high (that is, the limit is whatever the host has available) the OOM kill is technically a cgroup OOM kill even though it is effectively a host level memory pressure event.

Did you try setting "vm.overcommit_memory=2"?

--
Joe Conway
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com
On 2023-03-13 09:55:50 -0800, Israel Brewster wrote:
> Your guess is as good as (if not better than) mine. I can find the PID
> of the killed process in the system log, but without knowing what the
> PID of postmaster and the child processes were prior to the kill, I’m
> not sure that helps much.

The syslog should contain a list of all tasks prior to the kill. For
example, I just provoked an OOM kill on my laptop and the syslog
contains (among lots of others) these lines:

Mar 13 21:00:36 trintignant kernel: [112024.084117] [   2721]   126  2721    54563     2042   163840      555   -900 postgres
Mar 13 21:00:36 trintignant kernel: [112024.084123] [   2873]   126  2873    18211       85   114688      594      0 postgres
Mar 13 21:00:36 trintignant kernel: [112024.084128] [   2941]   126  2941    54592     1231   147456      565      0 postgres
Mar 13 21:00:36 trintignant kernel: [112024.084134] [   2942]   126  2942    54563      535   143360      550      0 postgres
Mar 13 21:00:36 trintignant kernel: [112024.084139] [   2943]   126  2943    54563     1243   139264      548      0 postgres
Mar 13 21:00:36 trintignant kernel: [112024.084145] [   2944]   126  2944    54798      561   147456      545      0 postgres
Mar 13 21:00:36 trintignant kernel: [112024.084150] [   2945]   126  2945    54563      215   131072      551      0 postgres
Mar 13 21:00:36 trintignant kernel: [112024.084156] [   2956]   126  2956    18718      506   122880      553      0 postgres
Mar 13 21:00:36 trintignant kernel: [112024.084161] [   2957]   126  2957    54672      269   139264      546      0 postgres

That's less helpful than it could be since all the postgres processes
are just listed as "postgres" without arguments. However, it is very
likely that the first one is actually the postmaster, because it has the
lowest pid (and the other pids follow closely) and it has an OOM score
of -900 as set in the systemd service file.

So I could compare the PID of the killed process with this list (in my
case the killed process wasn't one of them but a test program which just
allocates lots of memory).

        hp

--
   _  | Peter J. Holzer    | Story must make more sense than reality.
|_|_) |                    |
| |   | hjp@hjp.at         | -- Charles Stross, "Creative writing
__/   | http://www.hjp.at/ |    challenge!"
> On Mar 13, 2023, at 11:42 AM, Joe Conway <mail@joeconway.com> wrote:
>
> Did you try setting "vm.overcommit_memory=2"?

Yeah:

root@novarupta:~# sysctl -w vm.overcommit_memory=2
sysctl: setting key "vm.overcommit_memory", ignoring: Read-only file system

I’m thinking I wound up with a container rather than a full VM after all - and as such, the best solution may be to migrate to a full VM with some swap space available to avoid the issue in the first place. I’ll have to get in touch with the sys admin for that though.

---
Israel Brewster
Software Engineer
Alaska Volcano Observatory
Geophysical Institute - UAF
2156 Koyukuk Drive
Fairbanks AK 99775-7320
Work: 907-474-5172
cell: 907-328-9145
On 3/13/23 16:18, Israel Brewster wrote:
>> On Mar 13, 2023, at 11:42 AM, Joe Conway <mail@joeconway.com> wrote:
>> I am not entirely sure, but without actually testing it I suspect
>> that since memory.max = high (that is, the limit is whatever the
>> host has available) the OOM kill is technically a cgroup OOM kill
>> even though it is effectively a host level memory pressure event.

Sorry, actually meant "memory.max = max" here

>> Did you try setting "vm.overcommit_memory=2"?
>
> root@novarupta:~# sysctl -w vm.overcommit_memory=2
> sysctl: setting key "vm.overcommit_memory", ignoring: Read-only file system
>
> I’m thinking I wound up with a container rather than a full VM after
> all - and as such, the best solution may be to migrate to a full VM
> with some swap space available to avoid the issue in the first place.
> I’ll have to get in touch with the sys admin for that though.

Hmm, well big +1 for having swap turned on, but I recommend setting "vm.overcommit_memory=2" even so.

--
Joe Conway
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com
> On Mar 13, 2023, at 12:16 PM, Peter J. Holzer <hjp-pgsql@hjp.at> wrote:
>
> The syslog should contain a list of all tasks prior to the kill.
> [...]
> That's less helpful than it could be since all the postgres processes
> are just listed as "postgres" without arguments. However, it is very
> likely that the first one is actually the postmaster, because it has the
> lowest pid (and the other pids follow closely) and it has an OOM score
> of -900 as set in the systemd service file.
>
> So I could compare the PID of the killed process with this list (in my
> case the killed process wasn't one of them but a test program which just
> allocates lots of memory).

1) The entries in my syslog all refer to an R process, not a postgresql process at all
2) The ‘Killed process’ entry *does* actually have the process name in it - it’s just that since the process name was “R”, I wasn’t making the connection 😄
> On Mar 13, 2023, at 12:25 PM, Joe Conway <mail@joeconway.com> wrote:
>
> Hmm, well big +1 for having swap turned on, but I recommend setting "vm.overcommit_memory=2" even so.

Makes sense. Presumably with a full VM I won’t get the “Read-only file system” error when trying to do so. Thanks!

---
Israel Brewster
Software Engineer
Alaska Volcano Observatory
Geophysical Institute - UAF
2156 Koyukuk Drive
Fairbanks AK 99775-7320
Work: 907-474-5172
cell: 907-328-9145
On 13.03.23 21:25, Joe Conway wrote:
> Hmm, well big +1 for having swap turned on, but I recommend setting
> "vm.overcommit_memory=2" even so.

I've snipped out the context here, since my advice is very unspecific: do use swap only as a safety net. Once your system starts swapping, performance goes down the toilet.

*t
On Sat, Mar 18, 2023 at 6:02 PM Tomas Pospisek <tpo2@sourcepole.ch> wrote:
> I've snipped out the context here, since my advice is very unspecific:
> do use swap only as a safety net. Once your system starts swapping
> performance goes down the toilet.

To use swap as a safety net, set swappiness to a low value, like 2. Two will keep most data in RAM and reduce (but not eliminate) spilling to the file system.

I have a bunch of old ARM dev boards that are resource constrained. They use SD cards, which have a limited lifetime based on writes. I give the boards a 1 GB swap file to avoid OOM kills when running the compiler on C++ programs. And I configure them with a swappiness of 2 to reduce swapping.

Jeff
On 3/18/23 18:02, Tomas Pospisek wrote:
> I've snipped out the context here, since my advice is very unspecific:
> do use swap only as a safety net. Once your system starts swapping
> performance goes down the toilet.

While I agree with this statement in principle, it is exactly the notion that "once your system starts swapping performance goes down the toilet" that leads people to conclude that having lots of memory and disabling swap will solve all their problems. Because of how the Linux kernel works, you should, IMHO, always have some swap available.

For more on why, see: https://chrisdown.name/2018/01/02/in-defence-of-swap.html

--
Joe Conway
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com
On Mon, Mar 13, 2023 at 06:43:01PM +0100, Peter J. Holzer wrote:
> On 2023-03-13 09:21:18 -0800, Israel Brewster wrote:
> > Mar 12 04:04:23 novarupta systemd[1]: postgresql@13-main.service: A process of this unit has been killed by the OOM killer.
> > Mar 12 04:04:32 novarupta systemd[1]: postgresql@13-main.service: Failed with result 'oom-kill'.
> > Mar 12 04:04:32 novarupta systemd[1]: postgresql@13-main.service: Consumed 5d 17h 48min 24.509s CPU time.
> >
> > And the service is no longer running.
>
> I might be misreading this, but it looks to me that systemd detects that
> *some* process in the group was killed by the oom killer and stops the
> service.

Yeah. I found this old message on google. I'm surprised there aren't more, similar complaints about this.

It's as Peter said: it (sometimes) causes systemd to actively *stop* the cluster after OOM, when it would've come back online on its own if the init (supervisor) process didn't interfere.

My solution was to set:

/usr/lib/systemd/system/postgresql@.service
OOMPolicy=continue

I suggest that the default unit files should do likewise.

--
Justin
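For reference, the same effect can be had without touching the packaged unit file, via a drop-in (a sketch; the drop-in file name is whatever systemctl edit creates, and whether to also enable Restart=on-failure is the judgment call discussed upthread):

# systemctl edit postgresql@13-main
# creates e.g. /etc/systemd/system/postgresql@13-main.service.d/override.conf
[Service]
# don't fail the whole unit when one backend is OOM-killed; let the postmaster recover
OOMPolicy=continue
# optionally also restart the cluster if the postmaster itself dies
#Restart=on-failure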