Re: Backend handling replication slot stuck using 100% cpu, unkillable - Mailing list pgsql-bugs

From shveta malik
Subject Re: Backend handling replication slot stuck using 100% cpu, unkillable
Date
Msg-id CAJpy0uBNYcvVupUPzjmOv=g8xyobC_RLA-uSrWbxpmfkZ1cFAw@mail.gmail.com
Whole thread Raw
In response to Re: Backend handling replication slot stuck using 100% cpu, unkillable  (hubert depesz lubaczewski <depesz@depesz.com>)
Responses Re: Backend handling replication slot stuck using 100% cpu, unkillable
List pgsql-bugs
On Mon, Jul 3, 2023 at 6:28 PM hubert depesz lubaczewski
<depesz@depesz.com> wrote:
>
> On Mon, Jul 03, 2023 at 01:36:32PM +0200, hubert depesz lubaczewski wrote:
> > Hi,
> > we are using debezium to get change data from Pg.
> >
> > This particular pg is 12.9, and will be soon upgrade to 14.something
> > (this thursday).
>
> So, i installed dbgsym for this pg, and this bny accident upgraded pg to
> 12.14.
>
> Now I do have debug symbols, though, so backtrace can be more
> informative.
>
> I ran:
>
> for i in 1 2 3; do date; sudo gdb -batch -p 8938 -ex bt; sleep 30; echo; done
>
> And got this:
>
> #v+
> Mon Jul  3 12:50:14 UTC 2023
> [Thread debugging using libthread_db enabled]
> Using host libthread_db library "/lib/aarch64-linux-gnu/libthread_db.so.1".
> hash_seq_search (status=status@entry=0xfffff0173f40) at ./build/../src/backend/utils/hash/dynahash.c:1439
> 1439    ./build/../src/backend/utils/hash/dynahash.c: No such file or directory.
> #0  hash_seq_search (status=status@entry=0xfffff0173f40) at ./build/../src/backend/utils/hash/dynahash.c:1439
> #1  0x0000ffffa1bc8714 in rel_sync_cache_publication_cb (arg=<optimized out>, cacheid=<optimized out>,
hashvalue=<optimizedout>) at ./build/../src/backend/replication/pgoutput/pgoutput.c:665 
> #2  0x0000aaaab4bfcef4 in CallSyscacheCallbacks (cacheid=47, hashvalue=1542357812) at
./build/../src/backend/utils/cache/inval.c:1520
> #3  0x0000aaaab4a91884 in ReorderBufferExecuteInvalidations (rb=0xffffa4ac5308 <malloc+160>, txn=0xfffff0174240,
txn=0xfffff0174240)at ./build/../src/backend/replication/logical/reorderbuffer.c:2187 
> #4  ReorderBufferCommit (rb=0xffffa4ac5308 <malloc+160>, xid=xid@entry=2741814901, commit_lsn=187650155969544,
end_lsn=<optimizedout>, commit_time=commit_time@entry=741514150878208, origin_id=origin_id@entry=0,
origin_lsn=origin_lsn@entry=0)at ./build/../src/backend/replication/logical/reorderbuffer.c:1816 
> #5  0x0000aaaab4a869bc in DecodeCommit (xid=2741814901, parsed=0xfffff0174390, buf=0xfffff0174510,
ctx=0xaaaad5e1df00)at ./build/../src/backend/replication/logical/decode.c:654 
> #6  DecodeXactOp (ctx=ctx@entry=0xaaaad5e1df00, buf=0xfffff0174510, buf@entry=0xfffff0174570) at
./build/../src/backend/replication/logical/decode.c:249
> #7  0x0000aaaab4a86ad4 in LogicalDecodingProcessRecord (ctx=0xaaaad5e1df00, record=0xaaaad5e1e198) at
./build/../src/backend/replication/logical/decode.c:117
> #8  0x0000aaaab4a996ec in XLogSendLogical () at ./build/../src/backend/replication/walsender.c:2883
> #9  0x0000aaaab4a9bbb0 in WalSndLoop (send_data=send_data@entry=0xaaaab4a99688 <XLogSendLogical>) at
./build/../src/backend/replication/walsender.c:2232
> #10 0x0000aaaab4a9c674 in StartLogicalReplication (cmd=0xaaaad5e47f90) at
./build/../src/backend/replication/walsender.c:1134
> #11 exec_replication_command (cmd_string=cmd_string@entry=0xaaaad5d1db00 "START_REPLICATION SLOT \"slot_name\"
LOGICAL1D6C/92965050 (\"proto_version\" '1', \"publication_names\" 'xxx')") at
./build/../src/backend/replication/walsender.c:1602
> #12 0x0000aaaab4af0c08 in PostgresMain (argc=<optimized out>, argv=argv@entry=0xaaaad5d7aaf8, dbname=<optimized out>,
username=<optimizedout>) at ./build/../src/backend/tcop/postgres.c:4289 
> #13 0x0000aaaab4a759a8 in BackendRun (port=0xaaaad5d76150, port=0xaaaad5d76150) at
./build/../src/backend/postmaster/postmaster.c:4517
> #14 BackendStartup (port=0xaaaad5d76150) at ./build/../src/backend/postmaster/postmaster.c:4200
> #15 ServerLoop () at ./build/../src/backend/postmaster/postmaster.c:1725
> #16 0x0000aaaab4a769d4 in PostmasterMain (argc=<optimized out>, argv=<optimized out>) at
./build/../src/backend/postmaster/postmaster.c:1398
> #17 0x0000aaaab480355c in main (argc=5, argv=0xaaaad5d16720) at ./build/../src/backend/main/main.c:228
>
> Mon Jul  3 12:50:45 UTC 2023
> [Thread debugging using libthread_db enabled]
> Using host libthread_db library "/lib/aarch64-linux-gnu/libthread_db.so.1".
> 0x0000aaaab4c1e11c in hash_search_with_hash_value (hashp=0xaaaad5dea900, keyPtr=0xfffff0173f5c,
keyPtr@entry=0xfffff0173f7c,hashvalue=1107843932, action=action@entry=HASH_FIND, foundPtr=foundPtr@entry=0x0) at
./build/../src/backend/utils/hash/dynahash.c:949
> 949     ./build/../src/backend/utils/hash/dynahash.c: No such file or directory.
> #0  0x0000aaaab4c1e11c in hash_search_with_hash_value (hashp=0xaaaad5dea900, keyPtr=0xfffff0173f5c,
keyPtr@entry=0xfffff0173f7c,hashvalue=1107843932, action=action@entry=HASH_FIND, foundPtr=foundPtr@entry=0x0) at
./build/../src/backend/utils/hash/dynahash.c:949
> #1  0x0000aaaab4c1e79c in hash_search (hashp=<optimized out>, keyPtr=keyPtr@entry=0xfffff0173f7c,
action=action@entry=HASH_FIND,foundPtr=foundPtr@entry=0x0) at ./build/../src/backend/utils/hash/dynahash.c:911 
> #2  0x0000aaaab4c07180 in RelationCacheInvalidateEntry (relationId=<optimized out>) at
./build/../src/backend/utils/cache/relcache.c:2820
> #3  0x0000aaaab4bfcfe0 in LocalExecuteInvalidationMessage (msg=0xffffa033ab88) at
./build/../src/backend/utils/cache/inval.c:603
> #4  0x0000aaaab4a91884 in ReorderBufferExecuteInvalidations (rb=0xffffa4ac5308 <malloc+160>, txn=0xfffff0174240,
txn=0xfffff0174240)at ./build/../src/backend/replication/logical/reorderbuffer.c:2187 
> #5  ReorderBufferCommit (rb=0xffffa4ac5308 <malloc+160>, xid=xid@entry=2741814901, commit_lsn=187650155969544,
end_lsn=<optimizedout>, commit_time=commit_time@entry=741514150878208, origin_id=origin_id@entry=0,
origin_lsn=origin_lsn@entry=0)at ./build/../src/backend/replication/logical/reorderbuffer.c:1816 
> #6  0x0000aaaab4a869bc in DecodeCommit (xid=2741814901, parsed=0xfffff0174390, buf=0xfffff0174510,
ctx=0xaaaad5e1df00)at ./build/../src/backend/replication/logical/decode.c:654 
> #7  DecodeXactOp (ctx=ctx@entry=0xaaaad5e1df00, buf=0xfffff0174510, buf@entry=0xfffff0174570) at
./build/../src/backend/replication/logical/decode.c:249
> #8  0x0000aaaab4a86ad4 in LogicalDecodingProcessRecord (ctx=0xaaaad5e1df00, record=0xaaaad5e1e198) at
./build/../src/backend/replication/logical/decode.c:117
> #9  0x0000aaaab4a996ec in XLogSendLogical () at ./build/../src/backend/replication/walsender.c:2883
> #10 0x0000aaaab4a9bbb0 in WalSndLoop (send_data=send_data@entry=0xaaaab4a99688 <XLogSendLogical>) at
./build/../src/backend/replication/walsender.c:2232
> #11 0x0000aaaab4a9c674 in StartLogicalReplication (cmd=0xaaaad5e47f90) at
./build/../src/backend/replication/walsender.c:1134
> #12 exec_replication_command (cmd_string=cmd_string@entry=0xaaaad5d1db00 "START_REPLICATION SLOT \"slot_name\"
LOGICAL1D6C/92965050 (\"proto_version\" '1', \"publication_names\" 'xxx')") at
./build/../src/backend/replication/walsender.c:1602
> #13 0x0000aaaab4af0c08 in PostgresMain (argc=<optimized out>, argv=argv@entry=0xaaaad5d7aaf8, dbname=<optimized out>,
username=<optimizedout>) at ./build/../src/backend/tcop/postgres.c:4289 
> #14 0x0000aaaab4a759a8 in BackendRun (port=0xaaaad5d76150, port=0xaaaad5d76150) at
./build/../src/backend/postmaster/postmaster.c:4517
> #15 BackendStartup (port=0xaaaad5d76150) at ./build/../src/backend/postmaster/postmaster.c:4200
> #16 ServerLoop () at ./build/../src/backend/postmaster/postmaster.c:1725
> #17 0x0000aaaab4a769d4 in PostmasterMain (argc=<optimized out>, argv=<optimized out>) at
./build/../src/backend/postmaster/postmaster.c:1398
> #18 0x0000aaaab480355c in main (argc=5, argv=0xaaaad5d16720) at ./build/../src/backend/main/main.c:228
>
> Mon Jul  3 12:51:16 UTC 2023
> [Thread debugging using libthread_db enabled]
> Using host libthread_db library "/lib/aarch64-linux-gnu/libthread_db.so.1".
> hash_seq_search (status=status@entry=0xfffff0173f40) at ./build/../src/backend/utils/hash/dynahash.c:1439
> 1439    ./build/../src/backend/utils/hash/dynahash.c: No such file or directory.
> #0  hash_seq_search (status=status@entry=0xfffff0173f40) at ./build/../src/backend/utils/hash/dynahash.c:1439
> #1  0x0000ffffa1bc8714 in rel_sync_cache_publication_cb (arg=<optimized out>, cacheid=<optimized out>,
hashvalue=<optimizedout>) at ./build/../src/backend/replication/pgoutput/pgoutput.c:665 
> #2  0x0000aaaab4bfcef4 in CallSyscacheCallbacks (cacheid=47, hashvalue=3011071378) at
./build/../src/backend/utils/cache/inval.c:1520
> #3  0x0000aaaab4a91884 in ReorderBufferExecuteInvalidations (rb=0xffffa4ac5308 <malloc+160>, txn=0xfffff0174240,
txn=0xfffff0174240)at ./build/../src/backend/replication/logical/reorderbuffer.c:2187 
> #4  ReorderBufferCommit (rb=0xffffa4ac5308 <malloc+160>, xid=xid@entry=2741814901, commit_lsn=187650155969544,
end_lsn=<optimizedout>, commit_time=commit_time@entry=741514150878208, origin_id=origin_id@entry=0,
origin_lsn=origin_lsn@entry=0)at ./build/../src/backend/replication/logical/reorderbuffer.c:1816 
> #5  0x0000aaaab4a869bc in DecodeCommit (xid=2741814901, parsed=0xfffff0174390, buf=0xfffff0174510,
ctx=0xaaaad5e1df00)at ./build/../src/backend/replication/logical/decode.c:654 
> #6  DecodeXactOp (ctx=ctx@entry=0xaaaad5e1df00, buf=0xfffff0174510, buf@entry=0xfffff0174570) at
./build/../src/backend/replication/logical/decode.c:249
> #7  0x0000aaaab4a86ad4 in LogicalDecodingProcessRecord (ctx=0xaaaad5e1df00, record=0xaaaad5e1e198) at
./build/../src/backend/replication/logical/decode.c:117
> #8  0x0000aaaab4a996ec in XLogSendLogical () at ./build/../src/backend/replication/walsender.c:2883
> #9  0x0000aaaab4a9bbb0 in WalSndLoop (send_data=send_data@entry=0xaaaab4a99688 <XLogSendLogical>) at
./build/../src/backend/replication/walsender.c:2232
> #10 0x0000aaaab4a9c674 in StartLogicalReplication (cmd=0xaaaad5e47f90) at
./build/../src/backend/replication/walsender.c:1134
> #11 exec_replication_command (cmd_string=cmd_string@entry=0xaaaad5d1db00 "START_REPLICATION SLOT \"slot_name\"
LOGICAL1D6C/92965050 (\"proto_version\" '1', \"publication_names\" 'xxx')") at
./build/../src/backend/replication/walsender.c:1602
> #12 0x0000aaaab4af0c08 in PostgresMain (argc=<optimized out>, argv=argv@entry=0xaaaad5d7aaf8, dbname=<optimized out>,
username=<optimizedout>) at ./build/../src/backend/tcop/postgres.c:4289 
> #13 0x0000aaaab4a759a8 in BackendRun (port=0xaaaad5d76150, port=0xaaaad5d76150) at
./build/../src/backend/postmaster/postmaster.c:4517
> #14 BackendStartup (port=0xaaaad5d76150) at ./build/../src/backend/postmaster/postmaster.c:4200
> #15 ServerLoop () at ./build/../src/backend/postmaster/postmaster.c:1725
> #16 0x0000aaaab4a769d4 in PostmasterMain (argc=<optimized out>, argv=<optimized out>) at
./build/../src/backend/postmaster/postmaster.c:1398
> #17 0x0000aaaab480355c in main (argc=5, argv=0xaaaad5d16720) at ./build/../src/backend/main/main.c:228
> #v-
>
> Based on suggestion from IRC, i tried "return 0" and contiunue in gdb session.
>
> backtrace changed to:
>
> #v+
> [Thread debugging using libthread_db enabled]
> Using host libthread_db library "/lib/aarch64-linux-gnu/libthread_db.so.1".
> 0x0000aaaab4bfcf84 in LocalExecuteInvalidationMessage (msg=0xffff9ff33e18) at
./build/../src/backend/utils/cache/inval.c:578
> 578     ./build/../src/backend/utils/cache/inval.c: No such file or directory.
> #0  0x0000aaaab4bfcf84 in LocalExecuteInvalidationMessage (msg=0xffff9ff33e18) at
./build/../src/backend/utils/cache/inval.c:578
> #1  0x0000aaaab4a91884 in ReorderBufferExecuteInvalidations (rb=0xffffa4ac5308 <malloc+160>, txn=0xfffff0174240,
txn=0xfffff0174240)at ./build/../src/backend/replication/logical/reorderbuffer.c:2187 
> #2  ReorderBufferCommit (rb=0xffffa4ac5308 <malloc+160>, xid=xid@entry=2741814901, commit_lsn=187650155969544,
end_lsn=<optimizedout>, commit_time=commit_time@entry=741514150878208, origin_id=origin_id@entry=0,
origin_lsn=origin_lsn@entry=0)at ./build/../src/backend/replication/logical/reorderbuffer.c:1816 
> #3  0x0000aaaab4a869bc in DecodeCommit (xid=2741814901, parsed=0xfffff0174390, buf=0xfffff0174510,
ctx=0xaaaad5e1df00)at ./build/../src/backend/replication/logical/decode.c:654 
> #4  DecodeXactOp (ctx=ctx@entry=0xaaaad5e1df00, buf=0xfffff0174510, buf@entry=0xfffff0174570) at
./build/../src/backend/replication/logical/decode.c:249
> #5  0x0000aaaab4a86ad4 in LogicalDecodingProcessRecord (ctx=0xaaaad5e1df00, record=0xaaaad5e1e198) at
./build/../src/backend/replication/logical/decode.c:117
> #6  0x0000aaaab4a996ec in XLogSendLogical () at ./build/../src/backend/replication/walsender.c:2883
> #7  0x0000aaaab4a9bbb0 in WalSndLoop (send_data=send_data@entry=0xaaaab4a99688 <XLogSendLogical>) at
./build/../src/backend/replication/walsender.c:2232
> #8  0x0000aaaab4a9c674 in StartLogicalReplication (cmd=0xaaaad5e47f90) at
./build/../src/backend/replication/walsender.c:1134
> #9  exec_replication_command (cmd_string=cmd_string@entry=0xaaaad5d1db00 "START_REPLICATION SLOT
\"data_access_platform_cdc\"LOGICAL 1D6C/92965050 (\"proto_version\" '1', \"publication_names\" 'cdc')") at
./build/../src/backend/replication/walsender.c:1602
> #10 0x0000aaaab4af0c08 in PostgresMain (argc=<optimized out>, argv=argv@entry=0xaaaad5d7aaf8, dbname=<optimized out>,
username=<optimizedout>) at ./build/../src/backend/tcop/postgres.c:4289 
> #11 0x0000aaaab4a759a8 in BackendRun (port=0xaaaad5d76150, port=0xaaaad5d76150) at
./build/../src/backend/postmaster/postmaster.c:4517
> #12 BackendStartup (port=0xaaaad5d76150) at ./build/../src/backend/postmaster/postmaster.c:4200
> #13 ServerLoop () at ./build/../src/backend/postmaster/postmaster.c:1725
> #14 0x0000aaaab4a769d4 in PostmasterMain (argc=<optimized out>, argv=<optimized out>) at
./build/../src/backend/postmaster/postmaster.c:1398
> #15 0x0000aaaab480355c in main (argc=5, argv=0xaaaad5d16720) at ./build/../src/backend/main/main.c:228
> #v-
>
> and then it went back to hash_seq_search  :(
>
> Anything I can do about it?
>

Please check email thread [1]. Going by the stack trace shared by you,
this problem looks exactly the same. I feel upgrading to PG14 will
probably solve this issue.

[1]: https://www.postgresql.org/message-id/flat/17716-1fe42e7b44fc2f25%40postgresql.org

thanks
Shveta



pgsql-bugs by date:

Previous
From: Thomas Munro
Date:
Subject: Re: BUG #17949: Adding an index introduces serialisation anomalies.
Next
From: Sandeep Thakkar
Date:
Subject: Re: BUG #18012: Installer fails to run .bat files when they are registered to Notepad++