PostgreSQL 15.5 stops processing user queries - Mailing list pgsql-general

From Andrey Zhidenkov
Subject PostgreSQL 15.5 stops processing user queries
Date
Msg-id CAN=gQ4A9Bq9=jri7qt8PUt4x7n576nZ6fqSpVx2DsdC=64uHeA@mail.gmail.com
Responses Re: PostgreSQL 15.5 stops processing user queries
List pgsql-general
Hello all,

We have encountered an issue with our PostgreSQL 15.5 installation. The PostgreSQL server
periodically falls into a state in which it accepts new connections but doesn't execute any
queries. A session that runs a query (even just "SELECT 1") hangs and cannot be terminated
via SIGINT. The corresponding Linux process is in state "S", and it is not terminated even
when the PostgreSQL master process is stopped. It doesn't matter how we connect to the
database: both TCP and Unix socket sessions hang. Existing sessions, however, seem to still
be able to execute queries (once we managed to connect through pgBouncer, which seemed to
reuse an existing connection to the database).
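
For anyone who wants to check the process state the same way, here is a minimal sketch that
reads it from /proc/<pid>/stat. It assumes Linux; the names parse_stat_state and
process_state are illustrative helpers, not part of any PostgreSQL tooling. State "S" is an
interruptible sleep, meaning the process is waiting for an event.

```python
from pathlib import Path

# Single-letter process state codes as documented in proc(5).
STATE_NAMES = {
    "R": "running",
    "S": "interruptible sleep",
    "D": "uninterruptible sleep (usually I/O)",
    "Z": "zombie",
    "T": "stopped",
}


def parse_stat_state(stat_line: str) -> str:
    """Return the state letter from the contents of /proc/<pid>/stat."""
    # The command name is wrapped in parentheses and may itself contain
    # spaces or ')', so split on the LAST ')' instead of on whitespace.
    _, _, rest = stat_line.rpartition(")")
    return rest.split()[0]


def process_state(pid: int) -> str:
    """Read the current state letter of a live process (Linux only)."""
    return parse_stat_state(Path(f"/proc/{pid}/stat").read_text())


if __name__ == "__main__":
    import sys

    # Usage: python procstate.py <pid> [<pid> ...]
    for pid in map(int, sys.argv[1:]):
        state = process_state(pid)
        print(pid, state, STATE_NAMES.get(state, "other"))
```

Running it against the PID of a hung session prints the state letter together with a short
description.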

Here is a stack trace from gdb for one of the frozen sessions:

(gdb) bt 10
#0  0x00007f6d31dbd378 in poll () from /lib64/libc.so.6
#1  0x00007f6d3286aee1 in pqSocketCheck.part.2 () from /usr/pgsql-15/lib/libpq.so.5
#2  0x00007f6d3286b054 in pqWaitTimed () from /usr/pgsql-15/lib/libpq.so.5
#3  0x00007f6d32867848 in PQgetResult () from /usr/pgsql-15/lib/libpq.so.5
#4  0x0000000000411320 in ExecQueryAndProcessResults (query=query@entry=0x23a68b0 "select 1;", elapsed_msec=elapsed_msec@entry=0x7ffc2b5840a8, svpt_gone_p=svpt_gone_p@entry=0x7ffc2b5840a7,
    is_watch=is_watch@entry=false, opt=opt@entry=0x0, printQueryFout=printQueryFout@entry=0x0) at common.c:1426
#5  0x000000000040feb9 in SendQuery (query=0x23a68b0 "select 1;") at common.c:1117
#6  0x000000000040627b in main (argc=<optimized out>, argv=<optimized out>) at startup.c:384

We're using glibc-2.28-236.0.1.el8.7.x86_64 on this machine and PostgreSQL 15.5:

postgres=# select version();
                                                 version                                                
---------------------------------------------------------------------------------------------------------
 PostgreSQL 15.5 on x86_64-pc-linux-gnu, compiled by gcc (GCC) 8.5.0 20210514 (Red Hat 8.5.0-20), 64-bit
(1 row)

We tried recreating the cluster from scratch on new hardware using a logical dump, but it
didn't help. Unfortunately, we could not reproduce the issue: it appears to occur randomly,
and when it happens only a PostgreSQL restart helps. We also have a number of machines that
run the same version of PostgreSQL, but the problem occurs on only one cluster, so maybe it
is somehow related to queries that are specific to this cluster.
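
Since the hang occurs randomly, a periodic probe could at least record when it starts. The
following is only a sketch under assumptions: it opens a new session with psql, runs
"SELECT 1", and reports "hung" if no result arrives within a timeout. The function names,
the 5 second timeout, and the default "dbname=postgres" connection string are illustrative
and not taken from our setup.

```python
import subprocess
import sys


def probe(cmd, timeout_s=5.0):
    """Run cmd and return 'ok', 'failed', or 'hung' if it exceeds timeout_s."""
    try:
        result = subprocess.run(
            cmd,
            stdout=subprocess.DEVNULL,
            stderr=subprocess.DEVNULL,
            timeout=timeout_s,  # on expiry the child is killed and reaped
        )
    except subprocess.TimeoutExpired:
        return "hung"
    return "ok" if result.returncode == 0 else "failed"


def psql_probe_cmd(conninfo):
    """Build a psql command line that opens a new session and runs SELECT 1."""
    # -X: skip psqlrc, -At: unaligned tuples-only output, -c: run one command
    return ["psql", "-X", "-At", "-d", conninfo, "-c", "SELECT 1"]


if __name__ == "__main__":
    # Assumed default connection string; pass a real one as the first argument.
    conninfo = sys.argv[1] if len(sys.argv) > 1 else "dbname=postgres"
    print(probe(psql_probe_cmd(conninfo)))
```

Run from cron or a loop, a "hung" result marks the moment at which new sessions stop getting
answers, which could then be matched against server activity at that time.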

For reference, we also run Patroni 2.1.4 on this cluster (I'm not sure whether it is related).
We checked the PostgreSQL logs, of course, and there are no messages that appear to be related
to the issue.

We will really appreciate any help, thanks!

--
With best regards, Andrei Zhidenkov.
