Re: FW: query pg_stat_ssl hang 100%cpu - Mailing list pgsql-bugs

From Michael Paquier
Subject Re: FW: query pg_stat_ssl hang 100%cpu
Date
Msg-id ZPqLfAk8n/HrB6G4@paquier.xyz
Whole thread Raw
In response to Re: FW: query pg_stat_ssl hang 100%cpu  (Thomas Munro <thomas.munro@gmail.com>)
Responses Re: FW: query pg_stat_ssl hang 100%cpu
List pgsql-bugs
On Thu, Sep 7, 2023 at 10:39 PM James Pang (chaolpan)
<chaolpan(at)cisco(dot)com> wrote:
> (gdb) p RecordCacheArray
> $1 = (TupleDesc *) 0x7f5fac365d90
> (gdb) p RecordIdentifierArray
> $2 = (uint64 *) 0x0

Oh, yeah, this stack is broken.  You have been really unlucky to hit
that.  This can randomly cause any session to get stuck, and no need
for the extended query protocol here.

(I am not sure how, but my email server has somewhat not been able to
get the previous messages from James.  Anyway.)

On Fri, Sep 08, 2023 at 11:45:51AM +1200, Thomas Munro wrote:
> I think the lazy fix would be to re-order those allocations.  A
> marginally more elegant fix would be to merge the arrays, as in the
> attached.  Thoughts?

So, ensure_record_cache_typmod_slot_exists() would allocate the
initial RecordCacheArray and if it fails on the second one it keeps
RecordCacheArrayLen at 0.  When coming back again to this code path,
the second part of the routine causes an infinite loop because the
allocation has been done, but the tracked length is 0.  Fun.

This is broken since 4b93f57 where the second array has been
introduced.  Getting that fixed before 11 is EOL is nice as it was
introduced there, so good timing.

There is a repalloc_extended(), but I cannot get excited to use
MCXT_ALLOC_NO_OOM in this code path if there is a simpler method to
avoid this issue with a single allocation for the all information set.

+static RecordCacheArrayEntry * RecordCacheArray = NULL;

pgindent is annoyed by that..  typedefs.list has been updated in your
patch, so I guess that you missed one extra indentation after this is
refreshed.

Note that RememberToFreeTupleDescAtEOX() does something similar to the
type cache, and it uses the same approach as your patch.

+1 to your proposal of using a struct for the entries in the cache.
--
Michael

Attachment

pgsql-bugs by date:

Previous
From: PG Bug reporting form
Date:
Subject: BUG #18096: In edge-triggered epoll and kqueue, PQconsumeInput/PQisBusy are insufficient for correct async ops.
Next
From: PG Bug reporting form
Date:
Subject: BUG #18097: Immutable expression not allowed in generated at