Non-reproducible AIO failure - Mailing list pgsql-hackers

From Tom Lane
Subject Non-reproducible AIO failure
Date
Msg-id 1500090.1745443021@sss.pgh.pa.us
Whole thread Raw
Responses Re: Non-reproducible AIO failure
List pgsql-hackers
My buildfarm animal sifaka just failed like this [1]:

TRAP: failed Assert("aio_ret->result.status != PGAIO_RS_UNKNOWN"), File: "bufmgr.c", Line: 1605, PID: 79322
0   postgres                            0x0000000100e3df2c ExceptionalCondition + 108
1   postgres                            0x0000000100cbb154 WaitReadBuffers + 616
2   postgres                            0x0000000100f072bc read_stream_next_buffer.cold.1 + 184
3   postgres                            0x0000000100cb6dd8 read_stream_next_buffer + 300
4   postgres                            0x00000001009b75b4 heap_fetch_next_buffer + 136
5   postgres                            0x00000001009ad8c4 heapgettup + 368
6   postgres                            0x00000001009ad364 heap_getnext + 104
7   postgres                            0x00000001009b9770 heapam_index_build_range_scan + 700
8   postgres                            0x00000001009ef4b8 spgbuild + 480
9   postgres                            0x0000000100a41acc index_build + 428
10  postgres                            0x0000000100a43d14 reindex_index + 752
11  postgres                            0x0000000100ad09e0 ExecReindex + 1908
12  postgres                            0x0000000100d04628 ProcessUtilitySlow + 1300
13  postgres                            0x0000000100d03754 standard_ProcessUtility + 1528
14  pg_stat_statements.dylib            0x000000010158d7a0 pgss_ProcessUtility + 632
15  postgres                            0x0000000100d02cec PortalRunUtility + 184
16  postgres                            0x0000000100d02354 PortalRunMulti + 236
17  postgres                            0x0000000100d01d94 PortalRun + 456
18  postgres                            0x0000000100d00e10 exec_simple_query + 1308
...
2025-04-23 16:08:28.834 EDT [78788:4] LOG:  client backend (PID 79322) was terminated by signal 6: Abort trap: 6
2025-04-23 16:08:28.834 EDT [78788:5] DETAIL:  Failed process was running: reindex index spgist_point_idx;


Scraping the buildfarm database confirmed that there have been no
other similar failures since AIO went in, and unsurprisingly some
attempts to reproduce it manually didn't bear fruit.  So I'm
suspecting a race condition with a very narrow window, but I don't
have an idea where to look.  Any thoughts on how to gather more
data?

            regards, tom lane

[1]
https://buildfarm.postgresql.org/cgi-bin/show_stage_log.pl?nm=sifaka&dt=2025-04-23%2020%3A03%3A24&stg=recovery-check



pgsql-hackers by date:

Previous
From: Matheus Alcantara
Date:
Subject: Re: extension_control_path and "directory"
Next
From: "Robin Haberkorn"
Date:
Subject: Re: [PATCH] contrib/xml2: xslt_process() should report XSLT-related error details