Thread: BUG #18469: OOM occurs and backend processes are kept in Zombie state.


From: PG Bug reporting form
Date:
The following bug has been logged on the website:

Bug reference:      18469
Logged by:          song yutao
Email address:      2986538596@qq.com
PostgreSQL version: 12.16
Operating system:   Linux
Description:

I was running a heavy, sustained workload on a server running PostgreSQL
12.16. Memory consumption kept growing until the OS ran out of memory, and
the OOM killer killed some backend connection processes that were using too
much memory. However, these processes were not fully cleaned up and remained
in zombie state. Meanwhile, the whole database appears to be stuck, and
connection attempts via psql time out.

Below is the process status after the OOM occurred:
Ruby     7822  0.0  0.6  4485088  110940  ?      May06  10:24  /usr/pgsql/bin/postmaster -D /var/lib/pgsql/data
Ruby     7874  0.3  0.0        0       0     Zs  May06  33:30  [postmaster] <defunct>
Ruby     7893  0.0  0.0        0       0     Zs  May06   3:34  [postmaster] <defunct>
Ruby     7919  0.0  0.0    70592    4344     Ss  May06   3:27  postgres: stats collector
Ruby     9061  0.0  0.1  4485000   17836  ?  Ss  May06   3:19  postgres: walwriter
Ruby     9062  0.0  0.0  4486544    2428  ?  Ss  May06   0:03  postgres: autovacuum launcher
Ruby     9063  0.0  0.0    66364     992  ?  Ss  May06   1:27  postgres: archiver   last was 00000002000002C5000000FB
Ruby     9064  0.0  0.0  4486384    3280  ?  Ss  May06   0:00  postgres: logical replication launcher
Ruby    14403  0.1  0.0  4487084    3788  ?  Ss  May06  18:53  postgres: walsender rdsRepl 192.168.13.78(43284) streaming
Ruby  2170474  0.0  0.0                          May11   0:05  [postmaster] <defunct>
Ruby  2170401  0.0  0.0                          May11   0:05  [postmaster] <defunct>

I would like to know whether the postmaster process is stuck because of
these zombie processes.


PG Bug reporting form <noreply@postgresql.org> writes:
> I was running a heavy, sustained workload on a server running PostgreSQL
> 12.16. Memory consumption kept growing until the OS ran out of memory, and
> the OOM killer killed some backend connection processes that were using too
> much memory. However, these processes were not fully cleaned up and remained
> in zombie state. Meanwhile, the whole database appears to be stuck, and
> connection attempts via psql time out.

It sounds to me like the OOM killer decided to kill the postmaster
process, rather than the child process(es) that were actually eating
memory.  That's *extremely* unhelpful behavior.  There is some advice
in our manual about configuring your system to not do that.
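
For reference, the relevant advice is in the "Linux Memory Overcommit"
section of the manual: set vm.overcommit_memory to 2, and/or start the
postmaster with an oom_score_adj of -1000 so that the OOM killer will not
pick it. Below is a minimal sketch of how one might check both settings on
a running system, assuming Linux procfs. The helper names (read_int,
oom_report) and the proc_root parameter are made up for illustration and
are not part of PostgreSQL.

```python
from pathlib import Path


def read_int(path):
    """Return the integer stored in a procfs file, or None if unreadable."""
    try:
        return int(Path(path).read_text().strip())
    except (OSError, ValueError):
        return None


def oom_report(postmaster_pid, proc_root="/proc"):
    """Summarize the OOM-related settings for one postmaster PID.

    proc_root is a hypothetical parameter that exists only so the function
    can be pointed at a fake directory tree when testing.
    """
    root = Path(proc_root)
    overcommit = read_int(root / "sys" / "vm" / "overcommit_memory")
    score_adj = read_int(root / str(postmaster_pid) / "oom_score_adj")
    return {
        "overcommit_memory": overcommit,
        # Mode 2 disables heuristic overcommit, as the manual suggests.
        "strict_overcommit": overcommit == 2,
        "postmaster_oom_score_adj": score_adj,
        # -1000 exempts the process from the OOM killer entirely.
        "postmaster_protected": score_adj == -1000,
    }


if __name__ == "__main__":
    import sys

    print(oom_report(int(sys.argv[1])))
```

Run it with the postmaster's PID as the only argument. If neither
"strict_overcommit" nor "postmaster_protected" comes back true, the
postmaster is a candidate for the OOM killer like any other process.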

> Below is the process status after the OOM occurred:
> Ruby     7822  0.0  0.6  4485088  110940  ?      May06  10:24  /usr/pgsql/bin/postmaster -D /var/lib/pgsql/data

It's not clear to me where this postmaster process came from,
but it appears to be younger than the other postgres-related
processes you're showing, so they are not its children.
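
A zombie is a process that has already exited but whose parent has not yet
collected its exit status, so a quick way to see which process is supposed
to be reaping each <defunct> entry is to look up its parent PID. The
following is only a sketch, assuming Linux procfs; parse_stat, find_zombies
and the proc_root parameter are hypothetical names used for illustration.

```python
from pathlib import Path


def parse_stat(text):
    """Split a /proc/<pid>/stat line into (pid, comm, state, ppid)."""
    # comm is wrapped in parentheses and may itself contain spaces or ')',
    # so locate it via the first '(' and the last ')'.
    lpar, rpar = text.index("("), text.rindex(")")
    pid = int(text[:lpar])
    comm = text[lpar + 1:rpar]
    fields = text[rpar + 1:].split()
    return pid, comm, fields[0], int(fields[1])


def find_zombies(proc_root="/proc"):
    """Return sorted (pid, comm, ppid, parent_comm) for each zombie found."""
    procs = {}
    for entry in Path(proc_root).iterdir():
        if not entry.name.isdigit():
            continue
        try:
            pid, comm, state, ppid = parse_stat((entry / "stat").read_text())
        except (OSError, ValueError, IndexError):
            continue  # entry vanished or was unreadable during the scan
        procs[pid] = (comm, state, ppid)
    return sorted(
        # "?" marks a parent PID that was not found in the scan.
        (pid, comm, ppid, procs.get(ppid, ("?",))[0])
        for pid, (comm, state, ppid) in procs.items()
        if state == "Z"
    )


if __name__ == "__main__":
    for pid, comm, ppid, parent in find_zombies():
        print(f"zombie {pid} ({comm}) parent {ppid} ({parent})")
```

Comparing the parent PIDs this reports against the PID of the running
postmaster shows whether the defunct entries are its children at all.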

I'd manually nuke all of these processes and start fresh.

            regards, tom lane