Re: FW: query pg_stat_ssl hang 100%cpu - Mailing list pgsql-bugs

From Michael Paquier
Subject Re: FW: query pg_stat_ssl hang 100%cpu
Date
Msg-id ZPlL0MVFmzKPPgSQ@paquier.xyz
In response to FW: query pg_stat_ssl hang 100%cpu  ("James Pang (chaolpan)" <chaolpan@cisco.com>)
Responses RE: FW: query pg_stat_ssl hang 100%cpu
List pgsql-bugs
On Thu, Sep 07, 2023 at 01:35:00AM +0000, James Pang (chaolpan) wrote:
>     PGv14.8, OS RHEL8, no SSL enabled in this database. We have a
>     lot of client sessions that check their SSL state with this query;
>     all other sessions complete very quickly, but one session hangs
>     at 100% CPU for tens of hours, and even pg_terminate_backend does
>     not stop it. This looks abnormal.
>
>    select ssl from pg_stat_ssl where pid=pg_backend_pid();

This is hard to act on without more details, or ideally a reproducible
and self-contained test case.  Even a Java program based on the JDBC
driver would be fine for me, for example, if it helps in digging into
what you are seeing.

> #0  ensure_record_cache_typmod_slot_exists (typmod=0) at typcache.c:1714
> #1  0x000000000091185b in assign_record_type_typmod (tupDesc=<optimized out>, tupDesc@entry=0x27bc738) at typcache.c:2001
> #2  0x000000000091df03 in internal_get_result_type (funcid=<optimized out>, call_expr=<optimized out>, rsinfo=<optimized out>,
>     resultTypeId=<optimized out>, resultTupleDesc=0x7ffc9dff8cd0) at funcapi.c:393
> #3  0x000000000091e263 in get_expr_result_type (expr=expr@entry=0x2792798, resultTypeId=resultTypeId@entry=0x7ffc9dff8ccc,
>     resultTupleDesc=resultTupleDesc@entry=0x7ffc9dff8cd0) at funcapi.c:230
> #4  0x00000000006a2fa5 in ExecInitFunctionScan (node=node@entry=0x273afa8, estate=estate@entry=0x269e948, eflags=eflags@entry=16) at nodeFunctionscan.c:370
> #5  0x000000000069084e in ExecInitNode (node=node@entry=0x273afa8, estate=estate@entry=0x269e948, eflags=eflags@entry=16) at execProcnode.c:255
> #6  0x000000000068a96d in InitPlan (eflags=16, queryDesc=0x273b2d8) at execMain.c:936
> #7  standard_ExecutorStart (queryDesc=0x273b2d8, eflags=16) at execMain.c:263
> #8  0x00007f67c2821d5d in pgss_ExecutorStart (queryDesc=0x273b2d8, eflags=<optimized out>) at pg_stat_statements.c:965
> #9  0x00000000007fc226 in PortalStart (portal=portal@entry=0x26848b8, params=params@entry=0x0, eflags=eflags@entry=0, snapshot=snapshot@entry=0x0)
>     at pquery.c:514
> #10 0x00000000007fa27f in exec_bind_message (input_message=0x7ffc9dff90d0) at postgres.c:1995
> #11 PostgresMain (argc=argc@entry=1, argv=argv@entry=0x7ffc9dff9370, dbname=<optimized out>, username=<optimized out>) at postgres.c:4552
> #12 0x000000000077a4ea in BackendRun (port=<optimized out>, port=<optimized out>) at postmaster.c:4537
> #13 BackendStartup (port=<optimized out>) at postmaster.c:4259
> #14 ServerLoop () at postmaster.c:1745
> #15 0x000000000077b363 in PostmasterMain (argc=argc@entry=5, argv=argv@entry=0x256abc0) at postmaster.c:1417
> #16 0x00000000004fec63 in main (argc=5, argv=0x256abc0) at main.c:209

This stack refers to a code path where we check that some of the
type-related data associated with a record is available, but it does
not say exactly where the loop happens, so...  Are we looking at a
loop in the execution of the function from which the information of
pg_stat_ssl is retrieved (aka pg_stat_get_activity())?  Or is the
type cache somehow broken because of the extended query protocol?
It is not really possible to see any evidence based on the
information provided, though it gives a few hints that can help.
FWIW, I have not heard of an issue like that in the field.
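
To make frame #0 of the backtrace (ensure_record_cache_typmod_slot_exists, called with typmod=0) a bit more concrete, here is a simplified, hypothetical C sketch of the general pattern such a routine follows: make sure an array indexed by record typmod has a slot for the requested typmod, growing it by doubling. The SlotCache struct and ensure_slot_exists() are names made up for illustration; this is not the code in typcache.c. It only shows why a doubling loop of this shape depends on a non-zero starting length. Whether the reported hang actually sits in such a growth step cannot be told from the backtrace alone.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical stand-in for a typmod-indexed record cache.
 * Illustration only; this is NOT the code in typcache.c. */
typedef struct SlotCache
{
    void  **slots;   /* one entry per record typmod, NULL = unused */
    int32_t len;     /* number of allocated entries in slots */
} SlotCache;

/* Ensure slots[typmod] is a valid index, growing the array by doubling.
 * Returns 0 on success, -1 on bad input or allocation failure. */
static int
ensure_slot_exists(SlotCache *cache, int32_t typmod)
{
    if (typmod < 0)
        return -1;

    /* First call: allocate an initial, zeroed array of 64 entries. */
    if (cache->slots == NULL)
    {
        cache->slots = calloc(64, sizeof(void *));
        if (cache->slots == NULL)
            return -1;
        cache->len = 64;
    }

    if (typmod >= cache->len)
    {
        int32_t newlen = cache->len;
        void  **grown;

        /* Doubling only makes progress from a non-zero length: if slots
         * were set but len had stayed 0, newlen would remain 0 and the
         * loop below would spin forever, burning CPU. */
        assert(newlen > 0);
        while (typmod >= newlen)
        {
            if (newlen > INT32_MAX / 2)
                return -1;      /* doubling again would overflow */
            newlen *= 2;
        }

        grown = realloc(cache->slots, (size_t) newlen * sizeof(void *));
        if (grown == NULL)
            return -1;
        /* Clear the newly added tail so unused slots read as NULL. */
        for (int32_t i = cache->len; i < newlen; i++)
            grown[i] = NULL;
        cache->slots = grown;
        cache->len = newlen;
    }
    return 0;
}
```

For example, starting from an empty cache, requesting typmod 0 allocates 64 slots, and a later request for typmod 1000 grows the array to 1024 entries.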

The first thing I would do is update to 14.9, which is the latest
version of Postgres available for this major version.
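
To confirm which minor release a server is actually running before and after such an update, the server_version_num setting (for example from SHOW server_version_num;) can be decoded: for PostgreSQL 10 and later it is major * 10000 + minor, so 14.8 reports 140008 and 14.9 reports 140009. The small C helpers below are a hypothetical sketch of that arithmetic, not part of any PostgreSQL API.

```c
#include <assert.h>
#include <stdbool.h>

/* For PostgreSQL 10 and later, server_version_num is
 * major * 10000 + minor, e.g. 14.8 -> 140008, 14.9 -> 140009. */
static int
pg_major(int server_version_num)
{
    return server_version_num / 10000;
}

static int
pg_minor(int server_version_num)
{
    return server_version_num % 10000;
}

/* Hypothetical helper: true if the server runs the given major version
 * with a minor release older than target_minor. */
static bool
older_minor_than(int server_version_num, int major, int target_minor)
{
    return pg_major(server_version_num) == major &&
           pg_minor(server_version_num) < target_minor;
}
```

With the version from the report, older_minor_than(140008, 14, 9) is true, while a server already on 14.9 (140009) is not flagged.
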
--
Michael
