Re: Non-reproducible AIO failure - Mailing list pgsql-hackers
From | Thomas Munro |
---|---|
Subject | Re: Non-reproducible AIO failure |
Date | |
Msg-id | CA+hUKGK6ujMT5myrEkgQ+n-N3rquZA4haHfJszQVe4ofHd6z6A@mail.gmail.com Whole thread Raw |
In response to | Re: Non-reproducible AIO failure (Konstantin Knizhnik <knizhnik@garret.ru>) |
List | pgsql-hackers |
On Sun, Aug 24, 2025 at 5:32 AM Konstantin Knizhnik <knizhnik@garret.ru> wrote: > On 20/08/2025 9:00 PM, Alexander Lakhin wrote: > > for i in {1..10}; do np=$((20 + $RANDOM % 10)); echo "iteration $i: > > $np"; time parallel -j40 --linebuffer --tag /tmp/repro-AIO-Assert.sh > > {} ::: `seq $np` || break; sleep $(($RANDOM % 20)); done; echo -e "\007" > > > Unfortunately I was not able to reproduce the problem on my MacBook M4 > Pro (which should be identical with yours) using your instructions: I > tried five times but in all cases 10 iterations are completed normally. I can't reproduce it either. I think the sequence is: IO worker: pgaio_io_update_state(ioh, PGAIO_HS_COMPLETED_SHARED) Backend: pgaio_io_wait(ioh, generation) pgaio_io_was_recycled(ioh, generation, &state) case for state == PGAIO_HS_COMPLETED_SHARED pgaio_io_reclaim(boh) pgaio_io_update_state(ioh, PGAIO_HS_IDLE) ioh->op = PGAIO_OP_INVALID; Backend: pgaio_io_acquire_nb() Assert(ioh->state == PGAIO_HS_IDLE) *fails* pgaio_io_update_state() is essentially: pg_write_barrier(); ioh->state = new_state; I looked at the code generated by a non-optimised build: (lldb) dis --name pgaio_io_update_state ... postgres[0x100687fc4] <+372>: dmb ish XXX memory barrier postgres[0x100687fc8] <+376>: ldur w10, [x29, #-0xc] postgres[0x100687fcc] <+380>: ldur x9, [x29, #-0x8] postgres[0x100687fd0] <+384>: ldrh w8, [x9] XXX load state + target postgres[0x100687fd4] <+388>: ldrb w11, [x9, #0x2] XXX load op postgres[0x100687fd8] <+392>: orr w8, w8, w11, lsl #16 postgres[0x100687fdc] <+396>: and w10, w10, #0xff postgres[0x100687fe0] <+400>: and w8, w8, #0xffffff00 postgres[0x100687fe4] <+404>: orr w10, w8, w10 postgres[0x100687fe8] <+408>: lsr w8, w10, #16 postgres[0x100687fec] <+412>: strh w10, [x9] XXX store state + target postgres[0x100687ff0] <+416>: strb w8, [x9, #0x2] XXX store op postgres[0x100687ff4] <+420>: ldp x29, x30, [sp, #0x60] postgres[0x100687ff8] <+424>: add sp, sp, #0x70 postgres[0x100687ffc] <+428>: ret Even the -O3 version does that, it just uses fancier instructions: postgres[0x100687c08] <+316>: dmb ish postgres[0x100687c0c] <+320>: ldurh w8, [x20, #0x1] postgres[0x100687c10] <+324>: bfi w19, w8, #8, #16 postgres[0x100687c14] <+328>: lsr w8, w8, #8 postgres[0x100687c18] <+332>: strb w8, [x20, #0x2] XXX clobber op postgres[0x100687c1c] <+336>: strh w19, [x20] Could it be that the store buffer was flushed between the two stores, pgaio_io_was_recycled() saw the new state, pgaio_io_reclaim() assigned ioh->op = PGAIO_OP_INVALID and it was flushed to L1, and then finally the time travelling op value was flushed and clobbered it? Apple clang 17 output (= LLVM 19) with optimisation is smart enough to use a single store: postgres[0x1002f0ae8] <+308>: dmb ish postgres[0x1002f0aec] <+312>: strb w19, [x20] That's also how open source clang 17 compiles it if you rip out the bitfield. I guess if you do that, you won't be able to reproduce this, Alexander? Something like: struct PgAioHandle { /* all state updates should go through pgaio_io_update_state() */ - PgAioHandleState state:8; + uint8 state; /* what are we operating on */ - PgAioTargetID target:8; + uint8 target; /* which IO operation */ - PgAioOp op:8; + uint8 op; Curiously, Apple clang 17 without optimisation does some amount of unnecessary clobbering, but not the whole bitset and it's only a single 16 bit store. postgres[0x10057bfb0] <+348>: dmb ish postgres[0x10057bfb4] <+352>: ldur w10, [x29, #-0xc] postgres[0x10057bfb8] <+356>: ldur x9, [x29, #-0x8] postgres[0x10057bfbc] <+360>: ldrh w8, [x9] postgres[0x10057bfc0] <+364>: and w10, w10, #0xff postgres[0x10057bfc4] <+368>: and w8, w8, #0xffffff00 postgres[0x10057bfc8] <+372>: orr w8, w8, w10 postgres[0x10057bfcc] <+376>: strh w8, [x9]
pgsql-hackers by date: