Alexander Korotkov <aekorotkov@gmail.com> writes:
> I think this indicates unfinished intention to wait for checkpoint
> completion. But I think both cases (checkpoint finished and
> unfinished) should work correctly. So, I believe there is a backend
> problem. I'm trying to reproduce this locally. Sorry for the
> confusion.
I've found it not hugely difficult to reproduce: on two different
Mac M4 machines, repeated runs of 046_checkpoint_logical_slot.pl
hung up after 30 or so tries. Backtrace of the stuck query
looks like
(lldb) bt
* thread #1, queue = 'com.apple.main-thread', stop reason = signal SIGSTOP
* frame #0: 0x00000001836b01c8 libsystem_kernel.dylib`__semwait_signal + 8
frame #1: 0x000000018358c6f4 libsystem_c.dylib`nanosleep + 220
frame #2: 0x000000010070d358 postgres`pg_usleep(microsec=1000) at pgsleep.c:50:10 [opt]
frame #3: 0x00000001002a0508 postgres`read_local_xlog_page_guts(state=0x0000000126057298, targetPagePtr=687882240,
reqLen=277,targetRecPtr=<unavailable>, cur_page="\U00000018\xd1\U00000005", wait_for_wal=true) at xlogutils.c:950:4
[opt]
frame #4: 0x0000000100298f0c postgres`ReadPageInternal(state=0x0000000126057298, pageptr=687882240, reqLen=277) at
xlogreader.c:1061:12[opt]
frame #5: 0x0000000100298300 postgres`XLogReadAhead [inlined] XLogDecodeNextRecord(state=0x0000000126057298,
nonblocking=<unavailable>)at xlogreader.c:727:14 [opt]
frame #6: 0x00000001002980b4 postgres`XLogReadAhead(state=0x0000000126057298, nonblocking=false) at
xlogreader.c:974:11[opt]
frame #7: 0x0000000100297fd4 postgres`XLogReadRecord(state=0x0000000126057298, errormsg=0x000000016fc24d08) at
xlogreader.c:405:3[opt]
frame #8: 0x00000001004ee514 postgres`pg_logical_slot_get_changes_guts(fcinfo=<unavailable>, confirm=true,
binary=<unavailable>)at logicalfuncs.c:257:13 [opt]
frame #9: 0x00000001004ee1e8 postgres`pg_logical_slot_get_changes(fcinfo=<unavailable>) at logicalfuncs.c:333:9
[opt]
frame #10: 0x00000001003a77ec postgres`ExecMakeTableFunctionResult(setexpr=0x000000012603da40,
econtext=<unavailable>,argContext=<unavailable>, expectedDesc=0x00000001260472b0, randomAccess=false) at
execSRF.c:234:13[opt]
frame #11: 0x00000001003ba02c postgres`FunctionNext(node=0x000000012603d6d0) at nodeFunctionscan.c:94:5 [opt]
frame #12: 0x00000001003b4894 postgres`fetch_input_tuple [inlined] ExecProcNode(node=0x000000012603d6d0) at
executor.h:313:9[opt]
frame #13: 0x00000001003b4878 postgres`fetch_input_tuple(aggstate=0x000000012603cf28) at nodeAgg.c:563:10 [opt]
frame #14: 0x00000001003b2eb4 postgres`ExecAgg [inlined] agg_retrieve_direct(aggstate=0x000000012603cf28) at
nodeAgg.c:2450:17[opt]
frame #15: 0x00000001003b2ab8 postgres`ExecAgg(pstate=0x000000012603cf28) at nodeAgg.c:2265:14 [opt]
so apparently it's waiting patiently for WAL that will never appear.
(I do see the 'checkpoint complete' log message before the shutdown
in this particular run.) I still have this debugger session open
if there's any particular data you'd like me to look at.
regards, tom lane