On Tue, Feb 18, 2025 at 2:21 AM Michael Paquier <michael@paquier.xyz> wrote:
>
> On Mon, Feb 17, 2025 at 11:25:05AM -0500, Tom Lane wrote:
> > This timeout failure on hachi looks suspicious as well:
> >
> > https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=hachi&dt=2025-02-17%2003%3A05%3A03
> >
> > Might be relevant that they are both aarch64?
>
> Just logged into the host.  The logs of the timed-out run are still
> around, and the last information I can see is from lastcommand.log,
> which seems to have frozen at the point when the timed-out run began
> its vacuuming work:
> ok 73 + index_including_gist 353 ms
> # parallel group (16 tests): create_cast errors create_aggregate drop_if_exists infinite_recurse
>
> gokiburi is on the same host, and it is currently frozen while
> trying to fetch a WAL buffer.  One of the stack traces:
> #2 0x000000000084ec48 in WaitEventSetWaitBlock (set=0xd34ce0,
> cur_timeout=-1, occurred_events=0xffffffffadd8, nevents=1) at
> latch.c:1571
> #3 WaitEventSetWait (set=0xd34ce0, timeout=-1,
> occurred_events=occurred_events@entry=0xffffffffadd8,
> nevents=nevents@entry=1, wait_event_info=<optimized out>,
> wait_event_info@entry=134217781) at latch.c:1519
> #4 0x000000000084e964 in WaitLatch (latch=<optimized out>,
> wakeEvents=wakeEvents@entry=33, timeout=timeout@entry=-1,
> wait_event_info=wait_event_info@entry=134217781) at latch.c:538
> #5 0x000000000085d2f8 in ConditionVariableTimedSleep
> (cv=0xffffec0799b0, timeout=-1, wait_event_info=134217781) at
> condition_variable.c:163
> #6 0x000000000085d1ec in ConditionVariableSleep
> (cv=0xfffffffffffffffc, wait_event_info=1) at condition_variable.c:98
> #7 0x000000000055f4f4 in AdvanceXLInsertBuffer
> (upto=upto@entry=112064880, tli=tli@entry=1, opportunistic=false) at
> xlog.c:2224
> #8 0x0000000000568398 in GetXLogBuffer (ptr=ptr@entry=112064880,
> tli=tli@entry=1) at xlog.c:1710
> #9 0x000000000055c650 in CopyXLogRecordToWAL (write_len=80,
> isLogSwitch=false, rdata=0xcc49b0 <hdr_rdt>, StartPos=<optimized out>,
> EndPos=<optimized out>, tli=1) at xlog.c:1245
> #10 XLogInsertRecord (rdata=rdata@entry=0xcc49b0 <hdr_rdt>,
> fpw_lsn=fpw_lsn@entry=112025520, flags=0 '\000', num_fpi=<optimized
> out>, num_fpi@entry=0, topxid_included=false) at xlog.c:928
> #11 0x000000000056b870 in XLogInsert (rmid=rmid@entry=16 '\020',
> info=<optimized out>, info@entry=16 '\020') at xloginsert.c:523
> #12 0x0000000000537acc in addLeafTuple (index=0xffffebf32950,
> state=0xffffffffd5e0, leafTuple=0xe43870, current=<optimized out>,
> parent=<optimized out>,
>
> So, yes, something looks really wrong with this patch.  It sounds
> plausible to me that some other buildfarm animals could be stuck
> without their owners knowing about it.  It's proving to be a good idea
> to force a timeout value in the configuration file of these animals.

Tom, Michael, thank you for the information.
The patch will be tested more thoroughly before the next attempt.
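
For context, here is a rough sketch of the waiter/waker pattern around WAL
insert buffer initialization that the condition variable in frames #5-#7 is
meant to implement.  This is a simplified illustration only, not the actual
patch code: the helper function names, the XLogCtl fields (InitializedUpTo,
InitializedUpToCondVar) and the wait event name are placeholders for this
sketch.

/*
 * Simplified sketch, placeholder names -- not the actual patch code.
 * A backend that needs a WAL buffer page still being initialized by
 * someone else waits on a condition variable until the shared
 * "initialized up to" pointer has advanced far enough.
 */

/* Waiter side, roughly what frame #7 (AdvanceXLInsertBuffer) is doing. */
static void
WaitForXLogBufferInit(XLogRecPtr upto)
{
    for (;;)
    {
        /* Re-read the shared progress pointer on every iteration. */
        if (pg_atomic_read_u64(&XLogCtl->InitializedUpTo) >= upto)
            break;

        /*
         * First call prepares to sleep and returns immediately so the
         * condition is rechecked; later calls actually sleep.
         */
        ConditionVariableSleep(&XLogCtl->InitializedUpToCondVar,
                               WAIT_EVENT_WAL_BUFFER_INIT /* placeholder */);
    }
    ConditionVariableCancelSleep();
}

/* Waker side: advance the pointer, then wake every waiter. */
static void
PublishXLogBufferInit(XLogRecPtr new_initialized_upto)
{
    pg_atomic_write_u64(&XLogCtl->InitializedUpTo, new_initialized_upto);
    ConditionVariableBroadcast(&XLogCtl->InitializedUpToCondVar);
}

The failure mode to look for in a hang like the one above is a code path
that advances the pointer without broadcasting, or a waiter whose condition
can never become true for its target LSN; either leaves the backend stuck
in ConditionVariableSleep() exactly as in the backtrace.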
------
Regards,
Alexander Korotkov
Supabase