Re: Pg stuck at 100% cpu, for multiple days - Mailing list pgsql-hackers

From Joe Conway
Subject Re: Pg stuck at 100% cpu, for multiple days
Msg-id 257d9bd3-6cd4-4307-2a6d-f78a5b9eba7d@joeconway.com
In response to Re: Pg stuck at 100% cpu, for multiple days  (Justin Pryzby <pryzby@telsasoft.com>)
Responses Re: Pg stuck at 100% cpu, for multiple days  (Tom Lane <tgl@sss.pgh.pa.us>)
Re: Pg stuck at 100% cpu, for multiple days  (hubert depesz lubaczewski <depesz@depesz.com>)
List pgsql-hackers
On 8/30/21 3:34 PM, Justin Pryzby wrote:
> On Mon, Aug 30, 2021 at 09:09:20PM +0200, Laurenz Albe wrote:
>> On Mon, 2021-08-30 at 17:18 +0200, hubert depesz lubaczewski wrote:
>> > The thing is - I can't close it with pg_terminate_backend(), and I'd
>> > rather not kill -9, as it will, I think, close all other connections,
>> > and this is prod server.
>> 
>> Of course the cause should be fixed, but to serve your immediate need:
> 
> You might save a coredump of the process using gdb gcore before killing it, in
> case someone thinks how to debug it next month.
> 
> Depending on your OS, you might have to do something special to get shared
> buffers included in the dump (or excluded, if that's what's desirable).
> 
> I wonder how far up the stacktrace it's stuck?
> You could set a breakpoint on LogicalDecodingProcessRecord and then "c"ontinue,
> and see if it hits the breakpoint in a few seconds.  If not, try the next
> frame until you know which one is being called repeatedly.
> 
> Maybe CHECK_FOR_INTERRUPTS() should be added somewhere...

The spot in the backtrace...

#0  hash_seq_search (status=status@entry=0xffffdd90f380) at ./build/../src/backend/utils/hash/dynahash.c:1448

...is in the middle of this while loop:
8<-----------------------------------------
     while ((curElem = segp[segment_ndx]) == NULL)
     {
         /* empty bucket, advance to next */
         if (++curBucket > max_bucket)
         {
             status->curBucket = curBucket;
             hash_seq_term(status);
             return NULL;  /* search is done */
         }
         if (++segment_ndx >= ssize)
         {
             segment_num++;
             segment_ndx = 0;
             segp = hashp->dir[segment_num];
         }
     }
8<-----------------------------------------

It would be interesting to step through a few times to see whether it
is really stuck in that loop. That would be consistent with both the
100% CPU and the lack of interrupt checks, I think.
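
For anyone who wants to reason about the loop outside the server, here
is a rough, self-contained toy model of it. This is only a sketch and
not the dynahash code: the names (toy_seq_search, toy_scan_count, Scan,
Elem), the tiny table size, and the "skipped" counter are all made up
for illustration. It only mirrors the shape of the loop quoted above:
one empty bucket is skipped per iteration, and the scan ends once
curBucket passes the maximum bucket number.

```c
#include <stddef.h>
#include <string.h>

#define SSIZE      4                    /* buckets per segment (toy value) */
#define NSEGS      3                    /* segments in the toy directory */
#define MAX_BUCKET (NSEGS * SSIZE - 1)  /* highest valid bucket number */

typedef struct Elem { struct Elem *next; int key; } Elem;

typedef struct Scan {
    unsigned      curBucket;  /* next bucket to examine */
    Elem         *curEntry;   /* next entry in the current chain, if any */
    unsigned long skipped;    /* hypothetical counter: empty buckets passed */
} Scan;

static Elem *dir[NSEGS][SSIZE];         /* toy segmented bucket directory */

/* Return the next element of the scan, or NULL when the scan is done. */
static Elem *toy_seq_search(Scan *st)
{
    Elem    *cur;
    Elem   **segp;
    unsigned bucket, seg_num, seg_ndx;

    /* Continue within the current bucket chain, if there is one. */
    if ((cur = st->curEntry) != NULL) {
        st->curEntry = cur->next;
        if (st->curEntry == NULL)
            ++st->curBucket;
        return cur;
    }

    bucket = st->curBucket;
    if (bucket > MAX_BUCKET)
        return NULL;                    /* scan already finished */

    seg_num = bucket / SSIZE;
    seg_ndx = bucket % SSIZE;
    segp = dir[seg_num];

    /* Same shape as the quoted loop: advance past empty buckets. */
    while ((cur = segp[seg_ndx]) == NULL) {
        st->skipped++;
        if (++bucket > MAX_BUCKET) {
            st->curBucket = bucket;
            return NULL;                /* search is done */
        }
        if (++seg_ndx >= SSIZE) {
            seg_num++;
            seg_ndx = 0;
            segp = dir[seg_num];
        }
    }

    st->curEntry = cur->next;
    if (st->curEntry == NULL)
        ++bucket;
    st->curBucket = bucket;
    return cur;
}

/* Build a sparse toy table (entries in buckets 5 and 9), scan it fully,
 * and report how many entries were found and empty buckets skipped. */
static int toy_scan_count(unsigned long *skipped)
{
    static Elem a, b;
    Scan st = {0, NULL, 0};
    int  found = 0;

    memset(dir, 0, sizeof(dir));
    a.next = NULL; a.key = 1;
    b.next = NULL; b.key = 2;
    dir[5 / SSIZE][5 % SSIZE] = &a;
    dir[9 / SSIZE][9 % SSIZE] = &b;

    while (toy_seq_search(&st) != NULL)
        found++;

    *skipped = st.skipped;
    return found;
}
```

In this model a full scan costs one loop iteration per empty bucket, so
the work grows with the number of buckets rather than the number of
entries. When stepping through the real backend, the analogous thing to
watch would be whether curBucket keeps increasing toward max_bucket
between steps.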

Joe

-- 
Crunchy Data - http://crunchydata.com
PostgreSQL Support for Secure Enterprises
Consulting, Training, & Open Source Development


