Re: Non-reproducible AIO failure - Mailing list pgsql-hackers

From Tom Lane
Subject Re: Non-reproducible AIO failure
Msg-id 2989628.1748056632@sss.pgh.pa.us
In response to Non-reproducible AIO failure  (Tom Lane <tgl@sss.pgh.pa.us>)
List pgsql-hackers
Alexander Lakhin <exclusion@gmail.com> writes:
> FWIW, that Assert has just triggered on another mac:
> https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=indri&dt=2025-05-23%2020%3A30%3A07

Yeah, I was just looking at that too.  There is a corefile
from that crash, but lldb seems unable to extract anything
from it :-(.  There is a low-resolution stack trace in the
postmaster log though:

0   postgres                            0x0000000105299c84 ExceptionalCondition + 108
1   postgres                            0x00000001051159ac WaitReadBuffers + 616
2   postgres                            0x00000001053611ec read_stream_next_buffer.cold.1 + 184
3   postgres                            0x0000000105111630 read_stream_next_buffer + 300
4   postgres                            0x0000000104e0b994 heap_fetch_next_buffer + 136
5   postgres                            0x0000000104e018f4 heapgettup_pagemode + 204
6   postgres                            0x0000000104e02010 heap_getnextslot + 84
7   postgres                            0x0000000104faebb4 SeqNext + 160


Of note here is that indri and sifaka run on the same host ---
indri uses some MacPorts packages while sifaka doesn't, but
that shouldn't have anything to do with our AIO code.

I trawled the buildfarm database and confirmed that these two crashes
are our only similar reports (grepping for "PGAIO_RS_UNKNOWN"):

 sysname | branch |      snapshot       |     stage     |                                                      l
---------+--------+---------------------+---------------+-------------------------------------------------------------------------------------------------------------
 indri   | HEAD   | 2025-05-23 20:30:07 | recoveryCheck | TRAP: failed Assert("aio_ret->result.status != PGAIO_RS_UNKNOWN"),File: "bufmgr.c", Line: 1605, PID: 20931
 sifaka  | HEAD   | 2025-04-23 20:03:24 | recoveryCheck | TRAP: failed Assert("aio_ret->result.status != PGAIO_RS_UNKNOWN"),File: "bufmgr.c", Line: 1605, PID: 79322
(2 rows)
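
For anyone wanting to run a similar search over locally saved logs,
here is a minimal sketch. It is not the query used above (that ran
against the buildfarm database); it only assumes the logs are available
as plain text containing TRAP lines in the format shown in the table.
The names TRAP_RE and find_traps are hypothetical, chosen for
illustration.

```python
import re

# Matches assertion-failure lines of the form shown in the table above.
# Whitespace after each comma is optional, since the reported lines
# vary in spacing.
TRAP_RE = re.compile(
    r'TRAP: failed Assert\("(?P<expr>[^"]*)"\),\s*'
    r'File: "(?P<file>[^"]+)",\s*'
    r'Line: (?P<line>\d+),\s*'
    r'PID: (?P<pid>\d+)'
)


def find_traps(log_text, needle="PGAIO_RS_UNKNOWN"):
    """Return one dict per TRAP line whose assertion text contains needle."""
    hits = []
    for m in TRAP_RE.finditer(log_text):
        if needle in m.group("expr"):
            hits.append({
                "expr": m.group("expr"),
                "file": m.group("file"),
                "line": int(m.group("line")),
                "pid": int(m.group("pid")),
            })
    return hits


if __name__ == "__main__":
    # Sample text: the first line is copied from the report above,
    # the second is a made-up non-matching line.
    sample = (
        'TRAP: failed Assert("aio_ret->result.status != PGAIO_RS_UNKNOWN"),'
        'File: "bufmgr.c", Line: 1605, PID: 20931\n'
        'LOG:  an unrelated log line\n'
    )
    for hit in find_traps(sample):
        print(hit["file"], hit["line"], hit["pid"])
```

Passing a different needle (or an empty string to match every
assertion failure) makes the same function usable for other Asserts.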

So it seems that "very low-probability issue in our Mac AIO code" is
the most probable description.

            regards, tom lane


