Re: pgsql: Improve runtime and output of tests for replication slots checkp - Mailing list pgsql-committers

From Tom Lane
Subject Re: pgsql: Improve runtime and output of tests for replication slots checkp
Date
Msg-id 2593441.1750462258@sss.pgh.pa.us
Whole thread Raw
In response to Re: pgsql: Improve runtime and output of tests for replication slots checkp  (Alexander Korotkov <aekorotkov@gmail.com>)
List pgsql-committers
Alexander Korotkov <aekorotkov@gmail.com> writes:
> I think this indicates unfinished intention to wait for checkpoint
> completion.  But I think both cases (checkpoint finished and
> unfinished) should work correctly.  So, I believe there is a backend
> problem.  I'm trying to reproduce this locally.  Sorry for the
> confusion.

I've found it not hugely difficult to reproduce: on two different
Mac M4 machines, repeated runs of 046_checkpoint_logical_slot.pl
hung up after 30 or so tries.  Backtrace of the stuck query
looks like

 (lldb) bt
* thread #1, queue = 'com.apple.main-thread', stop reason = signal SIGSTOP
  * frame #0: 0x00000001836b01c8 libsystem_kernel.dylib`__semwait_signal + 8
    frame #1: 0x000000018358c6f4 libsystem_c.dylib`nanosleep + 220
    frame #2: 0x000000010070d358 postgres`pg_usleep(microsec=1000) at pgsleep.c:50:10 [opt]
    frame #3: 0x00000001002a0508 postgres`read_local_xlog_page_guts(state=0x0000000126057298, targetPagePtr=687882240,
reqLen=277,targetRecPtr=<unavailable>, cur_page="\U00000018\xd1\U00000005", wait_for_wal=true) at xlogutils.c:950:4
[opt]
    frame #4: 0x0000000100298f0c postgres`ReadPageInternal(state=0x0000000126057298, pageptr=687882240, reqLen=277) at
xlogreader.c:1061:12[opt] 
    frame #5: 0x0000000100298300 postgres`XLogReadAhead [inlined] XLogDecodeNextRecord(state=0x0000000126057298,
nonblocking=<unavailable>)at xlogreader.c:727:14 [opt] 
    frame #6: 0x00000001002980b4 postgres`XLogReadAhead(state=0x0000000126057298, nonblocking=false) at
xlogreader.c:974:11[opt] 
    frame #7: 0x0000000100297fd4 postgres`XLogReadRecord(state=0x0000000126057298, errormsg=0x000000016fc24d08) at
xlogreader.c:405:3[opt] 
    frame #8: 0x00000001004ee514 postgres`pg_logical_slot_get_changes_guts(fcinfo=<unavailable>, confirm=true,
binary=<unavailable>)at logicalfuncs.c:257:13 [opt] 
    frame #9: 0x00000001004ee1e8 postgres`pg_logical_slot_get_changes(fcinfo=<unavailable>) at logicalfuncs.c:333:9
[opt]
    frame #10: 0x00000001003a77ec postgres`ExecMakeTableFunctionResult(setexpr=0x000000012603da40,
econtext=<unavailable>,argContext=<unavailable>, expectedDesc=0x00000001260472b0, randomAccess=false) at
execSRF.c:234:13[opt] 
    frame #11: 0x00000001003ba02c postgres`FunctionNext(node=0x000000012603d6d0) at nodeFunctionscan.c:94:5 [opt]
    frame #12: 0x00000001003b4894 postgres`fetch_input_tuple [inlined] ExecProcNode(node=0x000000012603d6d0) at
executor.h:313:9[opt] 
    frame #13: 0x00000001003b4878 postgres`fetch_input_tuple(aggstate=0x000000012603cf28) at nodeAgg.c:563:10 [opt]
    frame #14: 0x00000001003b2eb4 postgres`ExecAgg [inlined] agg_retrieve_direct(aggstate=0x000000012603cf28) at
nodeAgg.c:2450:17[opt] 
    frame #15: 0x00000001003b2ab8 postgres`ExecAgg(pstate=0x000000012603cf28) at nodeAgg.c:2265:14 [opt]

so apparently it's waiting patiently for WAL that will never appear.
(I do see the 'checkpoint complete' log message before the shutdown
in this particular run.)  I still have this debugger session open
if there's any particular data you'd like me to look at.

            regards, tom lane



pgsql-committers by date:

Previous
From: Alexander Korotkov
Date:
Subject: Re: pgsql: Improve runtime and output of tests for replication slots checkp
Next
From: Tom Lane
Date:
Subject: Re: pgsql: Improve runtime and output of tests for replication slots checkp