On Thu, Aug 28, 2025 at 11:08 AM Andres Freund <andres@anarazel.de> wrote:
> On 2025-08-26 16:59:54 +0300, Konstantin Knizhnik wrote:
> > Still it is not quite clear to me how bitfields can cause this issue.
>
> Same.

Here's what I speculated after reading the generated asm[1]:
"Could it be that the store buffer was flushed between the two stores,
pgaio_io_was_recycled() saw the new state, pgaio_io_reclaim() assigned
ioh->op = PGAIO_OP_INVALID and it was flushed to L1, and then finally
the time travelling op value was flushed and clobbered it?"
To put it in terms of cache line modes and store buffer operations,
the IO worker does this, omitting uninteresting instructions:
postgres[0x100687fd0] <+384>: ldrh w8, [x9] ; load state, target
postgres[0x100687fd4] <+388>: ldrb w11, [x9, #0x2] ; load op
... build new state + target in w10 ....
postgres[0x100687fec] <+412>: strh w10, [x9] ; state <- COMPLETED_SHARED
postgres[0x100687ff0] <+416>: strb w8, [x9, #0x2] ; op <- PGAIO_OP_READV
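
For context, here is a minimal C sketch of the kind of source that can
lead to a store pair like the one above. It assumes state, target and
op are adjacent 8-bit bitfields; BitfieldHandle and bitfield_set_state()
are hypothetical names for illustration, not the real PgAioHandle code.
In C11 terms a run of adjacent bitfields is a single memory location,
so the compiler is allowed to write op back while updating state.

```c
#include <assert.h>

/* Hypothetical layout: three adjacent 8-bit bitfields (an assumption). */
typedef struct BitfieldHandle
{
    unsigned int state:8;
    unsigned int target:8;
    unsigned int op:8;
} BitfieldHandle;

/*
 * The source only assigns to state.  Because adjacent bitfields form one
 * memory location, the compiler may implement this as a load of the
 * neighbouring fields followed by stores that rewrite them unchanged.
 */
void
bitfield_set_state(BitfieldHandle *h, unsigned int new_state)
{
    h->state = new_state;
}
```

Single-threaded, rewriting op with its old value is harmless, which is
why nothing looks wrong at the C level.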
My speculation was that the two stores hit memory in separate store
buffer flushes for whatever reason (something to do with a context
switch, or maybe the store buffer is just full, or... who knows,
doesn't matter). Cache line exclusive mode is released and
re-acquired in between the two flushes. I know that merely executing
a store instruction doesn't acquire cache line exclusive mode; I'm
talking specifically about store buffer flush operations here, which
must.
In that window, pgaio_io_wait() in the owner backend sees state ==
COMPLETED_SHARED and runs pgaio_io_reclaim() which then stores and
flushes op <- PGAIO_OP_INVALID. The IO worker core's second flush
operation is delayed because it has to wait to re-acquire the cache
line in exclusive mode, ie wait for the owner backend to release it,
and then it clobbers op with the old value, later producing this
assertion failure in the owner backend:
TRAP: failed Assert("ioh->op == PGAIO_OP_INVALID"), File: "aio_io.c",
Line: 167, PID: 56420
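
The speculated order of events can be replayed deterministically in
plain C. This is only a single-threaded model of the interleaving, not
a reproduction of the hardware behaviour: SimHandle, sim_replay() and
the SIM_* values are hypothetical stand-ins, and each assignment to
"mem" represents one store buffer flush reaching the cache line.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical values; only the concepts come from the thread. */
enum { SIM_OP_INVALID = 0, SIM_OP_READV = 1 };
enum { SIM_STATE_IN_PROGRESS = 1, SIM_STATE_COMPLETED_SHARED = 2 };

/* Three adjacent bytes: the strh covers state+target, the strb covers op. */
typedef struct SimHandle
{
    uint8_t state;
    uint8_t target;
    uint8_t op;
} SimHandle;

/*
 * Replay the worker's two flushes and the owner's reclaim on one copy of
 * the handle.  If split_flushes is nonzero, the worker's op flush lands
 * after the owner's reclaim.  Returns the final value of op.
 */
uint8_t
sim_replay(int split_flushes)
{
    SimHandle mem = { SIM_STATE_IN_PROGRESS, 0, SIM_OP_READV };

    /* IO worker: ldrb, remember the current op */
    uint8_t worker_op = mem.op;

    /* IO worker flush 1: strh, state <- COMPLETED_SHARED */
    mem.state = SIM_STATE_COMPLETED_SHARED;

    /* IO worker flush 2, if it is not delayed: strb, op <- old op */
    if (!split_flushes)
        mem.op = worker_op;

    /* Owner: sees COMPLETED_SHARED and reclaims, op <- INVALID */
    if (mem.state == SIM_STATE_COMPLETED_SHARED)
        mem.op = SIM_OP_INVALID;

    /* IO worker flush 2, delayed: clobbers the owner's store */
    if (split_flushes)
        mem.op = worker_op;

    return mem.op;
}
```

With sim_replay(0) the owner's store is the last one and op ends up
invalid; with sim_replay(1) the stale READV value wins, which is the
condition the assertion above would trip on.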
To put it another way, this could only work correctly if the IO worker
did: store op (unnecessarily because bitfield blah blah), dmb ish,
store state, pairing with the owner's dmb ishld, load state. But we
have no control over that: the fact that the compiler generated two
stores for our byte-sized assignment is invisible from C.
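
For comparison, this is roughly what the well-ordered version would
look like if op and state were separate objects written with explicit
C11 atomics. It is a sketch under that assumption, not how the AIO code
is written; SketchHandle and the sketch_* functions are hypothetical.
On AArch64 the release and acquire fences are typically emitted as
dmb ish and dmb ishld respectively.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdint.h>

enum { SKETCH_OP_INVALID = 0, SKETCH_OP_READV = 1 };
enum { SKETCH_IN_PROGRESS = 0, SKETCH_COMPLETED_SHARED = 1 };

/* Hypothetical handle with op and state as distinct memory locations. */
typedef struct SketchHandle
{
    _Atomic uint8_t state;
    _Atomic uint8_t op;
} SketchHandle;

/* Worker: store op, release fence, then publish the new state. */
void
sketch_worker_complete(SketchHandle *h, uint8_t op)
{
    atomic_store_explicit(&h->op, op, memory_order_relaxed);
    atomic_thread_fence(memory_order_release);
    atomic_store_explicit(&h->state, SKETCH_COMPLETED_SHARED,
                          memory_order_relaxed);
}

/*
 * Owner: load state, acquire fence, then reset op.  Returns 1 if the
 * handle was reclaimed, 0 if it was not yet completed.
 */
int
sketch_owner_try_reclaim(SketchHandle *h)
{
    if (atomic_load_explicit(&h->state, memory_order_relaxed) !=
        SKETCH_COMPLETED_SHARED)
        return 0;
    atomic_thread_fence(memory_order_acquire);
    atomic_store_explicit(&h->op, SKETCH_OP_INVALID, memory_order_relaxed);
    return 1;
}
```

Here the worker's op store happens-before the owner's op store whenever
the owner observes COMPLETED_SHARED, so the owner's value cannot be
overwritten by it. The bitfield case differs precisely because the op
store is emitted after the state store.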
Or if this sequence is not possible, what exactly prevents it?
[1] https://www.postgresql.org/message-id/CA%2BhUKGK6ujMT5myrEkgQ%2Bn-N3rquZA4haHfJszQVe4ofHd6z6A%40mail.gmail.com