From: Andres Freund
Subject: Re: Non-reproducible AIO failure
Msg-id: d5cheojbap2frawwyi66csyuj5k7eevblsmxywbgfhrqtijpkc@hyvfisha3fyy
In response to: Re: Non-reproducible AIO failure (Andres Freund <andres@anarazel.de>)
List: pgsql-hackers
Hi,

Thanks for investigating this.


On 2025-06-03 08:00:01 +0300, Alexander Lakhin wrote:
> 02.06.2025 09:00, Alexander Lakhin wrote:
> > With additional logging (the patch is attached), I can see the following:
> > ...
> > !!!pgaio_io_reclaim [63817]| ioh: 0x1046b5660, ioh->op: 1, ioh->state: 6, ioh->result: 8192, ioh->num_callbacks: 2
> > !!!AsyncReadBuffers [63817] (1)| blocknum: 18, ioh: 0x1046b5660,
> > ioh->op: 0, ioh->state: 1, ioh->result: 0, ioh->num_callbacks: 0
> > 2025-06-02 00:45:11.391 EDT [63817:14] pg_regress/brin LOG:
> > !!!pgaio_io_before_start| ioh: 0x1046b5660, ioh->op: 1, ioh->state: 1,
> > ioh->result: 0, ioh->num_callbacks: 2
> > 
> > TRAP: failed Assert("ioh->op == PGAIO_OP_INVALID"), File: "aio_io.c", Line: 164, PID: 63817
> 
> I've added logging of the ioh->generation field at the beginning and at
> the end of pgaio_io_reclaim() (the complete patch is attached) and got
> even more interesting output:
> ...
> !!!pgaio_io_reclaim [25175]| ioh: 0x104c3e1a0, ioh->op: 1, ioh->state: 6,
> ioh->result: 8192, ioh->num_callbacks: 2, ioh->generation: 21692
> !!!pgaio_io_reclaim [25175]| ioh: 0x104c3e1a0, ioh->op: 0, ioh->generation: 21693
> !!!AsyncReadBuffers [25175] (1)| blocknum: 40, ioh: 0x104c3e1a0, ioh->op: 0,
> ioh->state: 1, ioh->result: 0, ioh->num_callbacks: 0, ioh->generation: 21693
> !!!pgaio_io_start_readv [25175]| ioh: 0x104c3e1a0, ioh->op: 1, ioh->generation: 21693

IO starts.


> !!!pgaio_io_reclaim [25176]| ioh: 0x104c3e080, ioh->op: 1, ioh->state: 6,
> ioh->result: 8192, ioh->num_callbacks: 2, ioh->generation: 21408
> !!!pgaio_io_reclaim [25176]| ioh: 0x104c3e080, ioh->op: 0, ioh->generation: 21409
> !!!AsyncReadBuffers [25176] (1)| blocknum: 49, ioh: 0x104c3e080, ioh->op: 0,
> ioh->state: 1, ioh->result: 0, ioh->num_callbacks: 0, ioh->generation: 21409
> !!!pgaio_io_start_readv [25176]| ioh: 0x104c3e080, ioh->op: 1, ioh->generation: 21409

(other backend / other IO)


> !!!pgaio_io_reclaim [25175]| ioh: 0x104c3e1a0, ioh->op: 1, ioh->state: 6,
> ioh->result: 8192, ioh->num_callbacks: 2, ioh->generation: 21693
> !!!pgaio_io_reclaim [25175]| ioh: 0x104c3e1a0, ioh->op: 0, ioh->generation: 21694

It completes and is marked as idle, as we'd expect.


> !!!AsyncReadBuffers [25175] (1)| blocknum: 41, ioh: 0x104c3e1a0, ioh->op: 0,
> ioh->state: 1, ioh->result: 0, ioh->num_callbacks: 0, ioh->generation: 21694

And here, before starting a new IO, the op is 0 (i.e. PGAIO_OP_INVALID), as we
would expect.
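
For readers decoding the logged values: schematically, the relevant
definitions look roughly like this (a simplification for illustration, not a
verbatim copy of the AIO headers):

typedef enum PgAioOp
{
	PGAIO_OP_INVALID = 0,		/* no operation assigned, handle is idle */
	PGAIO_OP_READV = 1,			/* a readv has been assigned */
	/* ... */
} PgAioOp;

/* and pgaio_io_before_start() checks, before a new IO may be set up: */
Assert(ioh->op == PGAIO_OP_INVALID);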


> 2025-06-03 00:19:09.282 EDT [25175:1] LOG: !!!pgaio_io_before_start| ioh:
> 0x104c3e1a0, ioh->op: 1, ioh->state: 1, ioh->result: 0, ioh->num_callbacks:
> 2, ioh->generation: 21694

But here it somehow turned to 1 (PGAIO_OP_READV), despite there not being any
"normal" reason for that. We know that the IO wasn't actually started as
otherwise pgaio_io_start_readv() would have logged that fact.

Which then immediately leads to an assert failure:

> 2025-06-03 00:19:09.282 EDT [25175:2] STATEMENT:  explain (analyze, format 'json')
>           select count(*) from join_foo
>             left join (select b1.id, b1.t from join_bar b1 join join_bar b2 using (id)) ss
>             on join_foo.id < ss.id + 1 and join_foo.id > ss.id - 1;
> 
> TRAP: failed Assert("ioh->op == PGAIO_OP_INVALID"), File: "aio_io.c", Line: 164, PID: 25175
> 
> 0   postgres                            0x0000000103563550 ExceptionalCondition + 236
> 1   postgres                            0x000000010326a260 pgaio_io_before_start + 516
> 2   postgres                            0x0000000103269fb4 pgaio_io_start_readv + 36
> 3   postgres                            0x0000000103289144 FileStartReadV + 252
> 4   postgres                            0x00000001032dc424 mdstartreadv + 668
> 5   postgres                            0x00000001032dfa0c smgrstartreadv + 116
> 6   postgres                            0x00000001032723b8 AsyncReadBuffers + 2028
> 7   postgres                            0x0000000103271354 StartReadBuffersImpl + 1196
> 8   postgres                            0x0000000103270e98 StartReadBuffers + 64
> 9   postgres                            0x000000010326e4a0 read_stream_start_pending_read + 1204
> 10  postgres                            0x000000010326db84 read_stream_look_ahead + 812
> 11  postgres                            0x000000010326d71c read_stream_next_buffer + 2356

I don't see how this can happen.

There is no macOS-specific code in AIO as of now, so why aren't we seeing
this on any other platform?

IME the theory that there is something outside of a subsystem randomly
overwriting memory is almost always wrong, but I'm kinda starting to suspect
it here nonetheless.


I wonder if it's worth trying to put a fair bit of "padding" memory
before/after AIO's shared memory region(s). If that were to stop the crashes,
it'd likely mean something unrelated is randomly writing there. Unfortunately,
continuing to crash wouldn't necessarily prove the problem is AIO-specific.
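
Rough sketch of the padding idea, just to illustrate (pgaio_init_padding,
pgaio_verify_padding and the constants are invented; the check would be
called from convenient AIO entry points, e.g. pgaio_io_before_start()):

#include "postgres.h"
#include "storage/shmem.h"

#define AIO_PAD_SIZE	8192
#define AIO_PAD_BYTE	0x7F

static char *aio_pad_before;
static char *aio_pad_after;

/* at shmem init time: bracket AIO's allocations with canary-filled pads */
static void
pgaio_init_padding(void)
{
	aio_pad_before = ShmemAlloc(AIO_PAD_SIZE);
	memset(aio_pad_before, AIO_PAD_BYTE, AIO_PAD_SIZE);

	/* ... AIO's own ShmemAlloc() calls would go in between ... */

	aio_pad_after = ShmemAlloc(AIO_PAD_SIZE);
	memset(aio_pad_after, AIO_PAD_BYTE, AIO_PAD_SIZE);
}

/* cheap integrity check, to catch clobbers of the pads early */
static void
pgaio_verify_padding(void)
{
	for (int i = 0; i < AIO_PAD_SIZE; i++)
	{
		if (aio_pad_before[i] != (char) AIO_PAD_BYTE ||
			aio_pad_after[i] != (char) AIO_PAD_BYTE)
			elog(PANIC, "AIO shmem padding clobbered at offset %d", i);
	}
}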

Hm. We have had issues with wrong size calculations of shmem regions before;
what about writing a patch that puts mprotect(PROT_NONE) guard pages
before/after each ShmemAlloc'd region?

Greetings,

Andres Freund


