Re: Pg stuck at 100% cpu, for multiple days - Mailing list pgsql-general
From | Joe Conway |
---|---|
Subject | Re: Pg stuck at 100% cpu, for multiple days |
Date | |
Msg-id | cb86c11d-9c9d-d7ac-8261-c06fba3a6612@joeconway.com |
In response to | Re: Pg stuck at 100% cpu, for multiple days (hubert depesz lubaczewski <depesz@depesz.com>) |
Responses | Re: Pg stuck at 100% cpu, for multiple days |
List | pgsql-general |
On 8/30/21 10:36 AM, hubert depesz lubaczewski wrote:
> Anyway - it's 12.6 on aarm64. Couple of days there was replication
> slot started, and now it seems to be stuck.

> #0 hash_seq_search (status=status@entry=0xffffdd90f380) at ./build/../src/backend/utils/hash/dynahash.c:1448
> #1 0x0000aaaac3042060 in RelfilenodeMapInvalidateCallback (arg=<optimized out>, relid=105496194) at ./build/../src/backend/utils/cache/relfilenodemap.c:64
> #2 0x0000aaaac3033aa4 in LocalExecuteInvalidationMessage (msg=0xffff9b66eec8) at ./build/../src/backend/utils/cache/inval.c:595
> #3 0x0000aaaac2ec8274 in ReorderBufferExecuteInvalidations (rb=0xaaaac326bb00 <errordata>, txn=0xaaaac326b998 <formatted_start_time>, txn=0xaaaac326b998 <formatted_start_time>) at ./build/../src/backend/replication/logical/reorderbuffer.c:2149
> #4 ReorderBufferCommit (rb=0xaaaac326bb00 <errordata>, xid=xid@entry=2668396569, commit_lsn=187650393290540, end_lsn=<optimized out>, commit_time=commit_time@entry=683222349268077, origin_id=origin_id@entry=0, origin_lsn=origin_lsn@entry=0) at ./build/../src/backend/replication/logical/reorderbuffer.c:1770
> #5 0x0000aaaac2ebd314 in DecodeCommit (xid=2668396569, parsed=0xffffdd90f7e0, buf=0xffffdd90f960, ctx=0xaaaaf5d396a0) at ./build/../src/backend/replication/logical/decode.c:640
> #6 DecodeXactOp (ctx=ctx@entry=0xaaaaf5d396a0, buf=0xffffdd90f960, buf@entry=0xffffdd90f9c0) at ./build/../src/backend/replication/logical/decode.c:248
> #7 0x0000aaaac2ebd42c in LogicalDecodingProcessRecord (ctx=0xaaaaf5d396a0, record=0xaaaaf5d39938) at ./build/../src/backend/replication/logical/decode.c:117
> #8 0x0000aaaac2ecfdfc in XLogSendLogical () at ./build/../src/backend/replication/walsender.c:2840
> #9 0x0000aaaac2ed2228 in WalSndLoop (send_data=send_data@entry=0xaaaac2ecfd98 <XLogSendLogical>) at ./build/../src/backend/replication/walsender.c:2189
> #10 0x0000aaaac2ed2efc in StartLogicalReplication (cmd=0xaaaaf5d175a8) at ./build/../src/backend/replication/walsender.c:1133
> #11 exec_replication_command (cmd_string=cmd_string@entry=0xaaaaf5c0eb00 "START_REPLICATION SLOT cdc LOGICAL 1A2D/4B3640 (\"proto_version\" '1', \"publication_names\" 'cdc')") at ./build/../src/backend/replication/walsender.c:1549
> #12 0x0000aaaac2f258a4 in PostgresMain (argc=<optimized out>, argv=argv@entry=0xaaaaf5c78cd8, dbname=<optimized out>, username=<optimized out>) at ./build/../src/backend/tcop/postgres.c:4257
> #13 0x0000aaaac2eac338 in BackendRun (port=0xaaaaf5c68070, port=0xaaaaf5c68070) at ./build/../src/backend/postmaster/postmaster.c:4484
> #14 BackendStartup (port=0xaaaaf5c68070) at ./build/../src/backend/postmaster/postmaster.c:4167
> #15 ServerLoop () at ./build/../src/backend/postmaster/postmaster.c:1725
> #16 0x0000aaaac2ead364 in PostmasterMain (argc=<optimized out>, argv=<optimized out>) at ./build/../src/backend/postmaster/postmaster.c:1398
> #17 0x0000aaaac2c3ca5c in main (argc=5, argv=0xaaaaf5c07720) at ./build/../src/backend/main/main.c:228
>
> The thing is - I can't close it with pg_terminate_backend(), and I'd
> rather not kill -9, as it will, I think, close all other connections,
> and this is prod server.

> still makes me ask: why does Pg end up in such place, where it
> doesn't do any syscalls, doesn't accept pg_terminate_backend(), and
> is using 100% of cpu?

src/backend/utils/hash/dynahash.c:1448 is in the middle of a while loop,
which is apparently not exiting. There is no check for interrupts in
there and it is a fairly tight loop which would explain both symptoms.

As to how it got that way, I have to assume data corruption or a bug of
some sort. I would repost the details to hackers for better visibility.

Joe

--
Crunchy Data - http://crunchydata.com
PostgreSQL Support for Secure Enterprises
Consulting, Training, & Open Source Development
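
To make the reasoning above concrete, here is a minimal sketch in C of why a tight loop with no interrupt check would show both symptoms. It is not PostgreSQL's dynahash code: Node, chain_search, InterruptCheck and interrupt_after are hypothetical names invented for illustration, and the cyclic chain is only an assumed stand-in for "data corruption or a bug of some sort". The optional callback plays the role that an interrupt check would play inside such a loop.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical chain element; NOT PostgreSQL's hash entry layout. */
typedef struct Node
{
    struct Node *next;
    int          key;
} Node;

/* Hypothetical stand-in for "is an interrupt pending?". */
typedef bool (*InterruptCheck) (void *arg);

/*
 * Walk a chain looking for key.
 * Returns 1 if found, 0 if the chain ended without a match,
 * and -1 if the interrupt check asked us to stop.
 *
 * With check == NULL, a chain that loops back on itself and does not
 * contain key is walked forever: no syscalls, 100% CPU, and no point
 * at which a termination request could be noticed.
 */
int
chain_search(const Node *head, int key, InterruptCheck check, void *arg)
{
    for (const Node *cur = head; cur != NULL; cur = cur->next)
    {
        if (cur->key == key)
            return 1;
        /* The escape hatch a tight loop without interrupt checks lacks. */
        if (check != NULL && check(arg))
            return -1;
    }
    return 0;
}

/* Simulated interrupt: reports "pending" once the budget is used up. */
bool
interrupt_after(void *arg)
{
    long *remaining = (long *) arg;

    return --(*remaining) <= 0;
}
```

Calling chain_search on a cyclic chain with interrupt_after and a small budget returns -1 once the budget runs out; the same call with a NULL check never returns, which matches a backend that burns CPU and ignores pg_terminate_backend().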