Re: Non-reproducible AIO failure - Mailing list pgsql-hackers

From Thomas Munro
Subject Re: Non-reproducible AIO failure
Date
Msg-id CA+hUKGK6ujMT5myrEkgQ+n-N3rquZA4haHfJszQVe4ofHd6z6A@mail.gmail.com
Whole thread Raw
In response to Re: Non-reproducible AIO failure  (Konstantin Knizhnik <knizhnik@garret.ru>)
List pgsql-hackers
On Sun, Aug 24, 2025 at 5:32 AM Konstantin Knizhnik <knizhnik@garret.ru> wrote:
> On 20/08/2025 9:00 PM, Alexander Lakhin wrote:
> > for i in {1..10}; do np=$((20 + $RANDOM % 10)); echo "iteration $i:
> > $np"; time parallel -j40 --linebuffer --tag  /tmp/repro-AIO-Assert.sh
> > {} ::: `seq $np` || break; sleep $(($RANDOM % 20)); done; echo -e "\007"
>
>
> Unfortunately I was not able to reproduce the problem on my MacBook M4
> Pro (which should be identical with yours) using your instructions: I
> tried five times but in all cases 10 iterations are completed normally.

I can't reproduce it either.  I think the sequence is:

IO worker: pgaio_io_update_state(ioh, PGAIO_HS_COMPLETED_SHARED)
Backend:   pgaio_io_wait(ioh, generation)
             pgaio_io_was_recycled(ioh, generation, &state)
               case for state == PGAIO_HS_COMPLETED_SHARED
                 pgaio_io_reclaim(boh)
                   pgaio_io_update_state(ioh, PGAIO_HS_IDLE)
                   ioh->op = PGAIO_OP_INVALID;
Backend:   pgaio_io_acquire_nb()
             Assert(ioh->state == PGAIO_HS_IDLE) *fails*

pgaio_io_update_state() is essentially:

    pg_write_barrier();
    ioh->state = new_state;

I looked at the code generated by a non-optimised build:

(lldb) dis --name pgaio_io_update_state
...
postgres[0x100687fc4] <+372>: dmb    ish                    XXX memory barrier
postgres[0x100687fc8] <+376>: ldur   w10, [x29, #-0xc]
postgres[0x100687fcc] <+380>: ldur   x9, [x29, #-0x8]
postgres[0x100687fd0] <+384>: ldrh   w8, [x9]               XXX load
state + target
postgres[0x100687fd4] <+388>: ldrb   w11, [x9, #0x2]        XXX load op
postgres[0x100687fd8] <+392>: orr    w8, w8, w11, lsl #16
postgres[0x100687fdc] <+396>: and    w10, w10, #0xff
postgres[0x100687fe0] <+400>: and    w8, w8, #0xffffff00
postgres[0x100687fe4] <+404>: orr    w10, w8, w10
postgres[0x100687fe8] <+408>: lsr    w8, w10, #16
postgres[0x100687fec] <+412>: strh   w10, [x9]              XXX store
state + target
postgres[0x100687ff0] <+416>: strb   w8, [x9, #0x2]         XXX store op
postgres[0x100687ff4] <+420>: ldp    x29, x30, [sp, #0x60]
postgres[0x100687ff8] <+424>: add    sp, sp, #0x70
postgres[0x100687ffc] <+428>: ret

Even the -O3 version does that, it just uses fancier instructions:

postgres[0x100687c08] <+316>: dmb    ish
postgres[0x100687c0c] <+320>: ldurh  w8, [x20, #0x1]
postgres[0x100687c10] <+324>: bfi    w19, w8, #8, #16
postgres[0x100687c14] <+328>: lsr    w8, w8, #8
postgres[0x100687c18] <+332>: strb   w8, [x20, #0x2]       XXX clobber op
postgres[0x100687c1c] <+336>: strh   w19, [x20]

Could it be that the store buffer was flushed between the two stores,
pgaio_io_was_recycled() saw the new state, pgaio_io_reclaim() assigned
ioh->op = PGAIO_OP_INVALID and it was flushed to L1, and then finally
the time travelling op value was flushed and clobbered it?

Apple clang 17 output (= LLVM 19) with optimisation is smart enough to
use a single store:

postgres[0x1002f0ae8] <+308>: dmb    ish
postgres[0x1002f0aec] <+312>: strb   w19, [x20]

That's also how open source clang 17 compiles it if you rip out the
bitfield.  I guess if you do that, you won't be able to reproduce
this, Alexander?  Something like:

 struct PgAioHandle
 {
        /* all state updates should go through pgaio_io_update_state() */
-       PgAioHandleState state:8;
+       uint8           state;

        /* what are we operating on */
-       PgAioTargetID target:8;
+       uint8           target;

        /* which IO operation */
-       PgAioOp         op:8;
+       uint8           op;

Curiously, Apple clang 17 without optimisation does some amount of
unnecessary clobbering, but not the whole bitset and it's only a
single 16 bit store.

postgres[0x10057bfb0] <+348>: dmb    ish
postgres[0x10057bfb4] <+352>: ldur   w10, [x29, #-0xc]
postgres[0x10057bfb8] <+356>: ldur   x9, [x29, #-0x8]
postgres[0x10057bfbc] <+360>: ldrh   w8, [x9]
postgres[0x10057bfc0] <+364>: and    w10, w10, #0xff
postgres[0x10057bfc4] <+368>: and    w8, w8, #0xffffff00
postgres[0x10057bfc8] <+372>: orr    w8, w8, w10
postgres[0x10057bfcc] <+376>: strh   w8, [x9]



pgsql-hackers by date:

Previous
From: Srinath Reddy Sadipiralla
Date:
Subject: Re: Potential problem in commit f777d773878 and 4f7f7b03758
Next
From: Andres Freund
Date:
Subject: Re: Test instability when pg_dump orders by OID