suspicious lockup on widowbird in AdvanceXLInsertBuffer (could it be due to 6a2275b8953?) - Mailing list pgsql-hackers

From	Tomas Vondra
Subject	suspicious lockup on widowbird in AdvanceXLInsertBuffer (could it be due to 6a2275b8953?)
Date	February 27, 2025 01:08:57
Msg-id	67f7132d-3923-47a6-9de2-5b7d86ddb73f@vondra.me
Responses	Re: suspicious lockup on widowbird in AdvanceXLInsertBuffer (could it be due to 6a2275b8953?)
List	pgsql-hackers

Hi,

I have noticed that one of my buildfarm machines - widowbird - has not
reported any results since February 17. It seems to be stuck somewhere
in amcheck:

$ ps ax | grep postgres
1180067 ?        Ss     0:02 /mnt/data/buildfarm/buildroot/HEAD/inst/bin/postgres -D data-C
1180069 ?        Ss     0:00 postgres: checkpointer
1180070 ?        Ss     0:00 postgres: background writer
1180072 ?        Ss     0:00 postgres: walwriter
1180073 ?        Ss     0:01 postgres: autovacuum launcher
1180074 ?        Ss     0:00 postgres: logical replication launcher
1180107 ?        Ss     0:05 postgres: buildfarm contrib_regression_amcheck [local] INSERT
1180111 ?        Ss     0:00 postgres: autovacuum worker
1180134 ?        Ss     0:00 postgres: autovacuum worker
1180135 ?        Ss     0:00 postgres: autovacuum worker
1374029 pts/0    S+     0:00 grep --color=auto postgres

So there's PID 1180107, executing an insert, but not progressing. The
backtrace looks like this (first couple lines, full backtrace attached):

#0  0x0000007fa64b8ddc in __GI_epoll_pwait (epfd=5, events=0x55ad6285a8,
maxevents=1, timeout=timeout@entry=-1, set=set@entry=0x0) at
../sysdeps/unix/sysv/linux/epoll_pwait.c:42
#1  0x0000007fa64b8fe8 in epoll_wait (epfd=<optimized out>,
events=<optimized out>, maxevents=<optimized out>, timeout=timeout@entry=-1)
    at ../sysdeps/unix/sysv/linux/epoll_wait.c:32
#2  0x000000558f043588 in WaitEventSetWaitBlock (nevents=1,
occurred_events=0x7ff8ed4e18, cur_timeout=-1, set=0x55ad628540) at
latch.c:1571
#3  WaitEventSetWait (set=0x55ad628540, timeout=timeout@entry=-1,
occurred_events=occurred_events@entry=0x7ff8ed4e18,
nevents=nevents@entry=1,
    wait_event_info=wait_event_info@entry=134217781) at latch.c:1519
#4  0x000000558f043778 in WaitLatch (latch=<optimized out>,
wakeEvents=wakeEvents@entry=33, timeout=timeout@entry=-1,
wait_event_info=wait_event_info@entry=134217781)
    at latch.c:538
#5  0x000000558f052274 in ConditionVariableTimedSleep (cv=0x7f9ac9deb0,
timeout=timeout@entry=-1,
wait_event_info=wait_event_info@entry=134217781) at condition_variable.c:163
#6  0x000000558f05286c in ConditionVariableTimedSleep
(wait_event_info=134217781, timeout=-1, cv=<optimized out>) at
condition_variable.c:135
#7  0x000000558ed2fc90 in AdvanceXLInsertBuffer
(upto=upto@entry=608174080, tli=tli@entry=1,
opportunistic=opportunistic@entry=false) at xlog.c:2224
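
For reference, the numeric arguments in frames #2 to #5 can be decoded by hand. Below is a minimal sketch, assuming the wait-class constants and WL_* flag bits match those in the PostgreSQL headers (wait_classes.h and the latch header); the helper names decode_wait_event_info and decode_wake_events are made up for illustration. It only splits the values into class and index. It does not say which event in the class index 53 is, since that depends on the generated enum of the build in question.

```python
# Wait-class prefixes, assumed to match PostgreSQL's wait_classes.h.
# Verify against the source tree being debugged.
WAIT_CLASSES = {
    0x01000000: "LWLock",
    0x03000000: "Lock",
    0x04000000: "BufferPin",
    0x05000000: "Activity",
    0x06000000: "Client",
    0x07000000: "Extension",
    0x08000000: "IPC",
    0x09000000: "Timeout",
    0x0A000000: "IO",
}

# Wake-event flag bits, assumed to match the WL_* defines in the latch header.
WL_FLAGS = {
    1 << 0: "WL_LATCH_SET",
    1 << 1: "WL_SOCKET_READABLE",
    1 << 2: "WL_SOCKET_WRITEABLE",
    1 << 3: "WL_TIMEOUT",
    1 << 4: "WL_POSTMASTER_DEATH",
    1 << 5: "WL_EXIT_ON_PM_DEATH",
}


def decode_wait_event_info(value):
    """Split wait_event_info into (class name, event index within the class)."""
    class_id = value & 0xFF000000   # top byte selects the wait class
    event_id = value & 0x0000FFFF   # low 16 bits index the event in that class
    return WAIT_CLASSES.get(class_id, "unknown"), event_id


def decode_wake_events(mask):
    """Return the names of the WL_* flags set in a wakeEvents bitmask."""
    return [name for bit, name in WL_FLAGS.items() if mask & bit]


if __name__ == "__main__":
    print(decode_wait_event_info(134217781))  # ('IPC', 53)
    print(decode_wake_events(33))  # ['WL_LATCH_SET', 'WL_EXIT_ON_PM_DEATH']
```

Under these assumptions the backend is sleeping on an IPC-class wait event with no timeout, woken only by its latch being set or by postmaster death, which is consistent with waiting on a condition variable that nobody signals.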

So, it's stuck in AdvanceXLInsertBuffer ... interesting. Another
interesting fact is it's testing 75dfde13639, which is just a couple
commits after 6a2275b895:

    commit 6a2275b8953a4462d44daf001bdd60b3d48f0946
    Author: Alexander Korotkov <akorotkov@postgresql.org>
    Date:   Mon Feb 17 04:19:01 2025 +0200

    Get rid of WALBufMappingLock

    Allow multiple backends to initialize WAL buffers concurrently.
    This way `MemSet((char *) NewPage, 0, XLOG_BLCKSZ);` can run in
    parallel without taking a single LWLock in exclusive mode.

    ...

which reworked AdvanceXLInsertBuffer() quite a bit, it seems. OTOH the
last (successful) run on widowbird was on eaf502747b, which already
includes 6a2275b895, so maybe it's unrelated.

Is there something else I could collect from the stuck instance, before
I restart it?

regards

-- 
Tomas Vondra

Attachment

widowbird.log
