Re: Non-reproducible AIO failure - Mailing list pgsql-hackers

From Thomas Munro
Subject Re: Non-reproducible AIO failure
Date
Msg-id CA+hUKGKpLbWZT2DDy=hP4fX_c9X88KRV=AwNb5UzLs-xsgNdRg@mail.gmail.com
Whole thread Raw
In response to Re: Non-reproducible AIO failure  (Konstantin Knizhnik <knizhnik@garret.ru>)
Responses Re: Non-reproducible AIO failure
List pgsql-hackers
On Mon, Aug 25, 2025 at 6:11 AM Konstantin Knizhnik <knizhnik@garret.ru> wrote:
> In theory even replacing bitfield with in should not
> avoid race condition, because they are still shared the same cache line.

I'm no expert in this stuff, but that's not my understanding of how it
works.  Plain stores to normal memory go into the store buffer and are
eventually flushed to the memory hierarchy, but all modifications that
reach the cache hierarchy have a consistent view of memory created by
the cache coherency protocol (in ARM's case MOESI[1]): only one core
can change a cache line at a time while it has exclusive access (with
some optimisations, owner mode, snooping, etc but AFAIK that doesn't
change the basic consistency).  While modifying memory contents, the
core has the most up to date contents of the cache line and it doesn't
touch any other bytes.  What I mean is that cores don't make changes
to bytes that you didn't explicitly change with a store instruction,
it's just the order and timing that the contents of the store buffer
hits the memory system that is in question.  So I guess the key thing
here is that there is more than one store.

If the Apple's fork of LLVM 17 generated the same code as the MacPorts
LLVM 17 output I disassembled above, then I guess this double store
with the Apple toolchain until recently, but if you upgraded Xcode in
the past few months[2] then it would have stopped happening, which is
probably why Alexander had to use Homebrew's clang 17 to show the
problem.  (For the record, what I disassembled earlier was from
MacPorts's package for clang 17.0.6, so I guess it's remotely possible
that some difference between Homebrew's package and MacPorts' package
is causing my attempts to reproduce it to fail; it claims to be the
same version though and I don't expect them to patch anything.).

I don't understand these things, but maybe this commit or something
near it caused the change:


https://github.com/llvm/llvm-project/blob/b64499c6ef73881378b7e767b31c0fa06e3fe7b4/llvm/lib/Transforms/Scalar/EarlyCSE.cpp

> I tried to create small test reproducing this data flow.
> But still failed to find and problem with this test.
> May be somebody can change this test to be more similar with real AIO
> dataflows?

(lldb) dis --name update_state
a.out`update_state:
a.out[0x100000548] <+0>:  sub    sp, sp, #0x10
a.out[0x10000054c] <+4>:  str    x0, [sp, #0x8]
a.out[0x100000550] <+8>:  str    w1, [sp, #0x4]
a.out[0x100000554] <+12>: dmb    ish
a.out[0x100000558] <+16>: ldrb   w10, [sp, #0x4]
a.out[0x10000055c] <+20>: ldr    x9, [sp, #0x8]
a.out[0x100000560] <+24>: ldr    w8, [x9]
a.out[0x100000564] <+28>: and    w8, w8, #0xffffff00
a.out[0x100000568] <+32>: orr    w8, w8, w10
a.out[0x10000056c] <+36>: str    w8, [x9]           <--- single store
a.out[0x100000570] <+40>: add    sp, sp, #0x10
a.out[0x100000574] <+44>: ret

I think you need to add the following uint8 member to the struct, so
that it has to store 24 bits with two stores (assuming you are testing
with LLVM 17 and not LLVM 19 which chooses to clobber only the first
16 bits anyway for some reason).  In that code it's storing 32 bits at
once so although it's still clobbering op, it's doing it atomically so
nothing goes wrong.

If that doesn't work, maybe try adding a random number of other
stores, to try to find the timing that overflows the store queue
between the two stores, or maybe it has to do with delivering signals
or context switches that cause memory barriers, or who knows...  Hmm,
yeah maybe it's actually context switches that are doing this, as
Alexander is running a huge number of processes.

[1] https://en.wikipedia.org/wiki/MOESI_protocol
[2] https://bfbot.cputube.org/project/apple_xcode/



pgsql-hackers by date:

Previous
From: Daniel Gustafsson
Date:
Subject: Re: No error checking when reading from file using zstd in pg_dump
Next
From: Noah Misch
Date:
Subject: Re: Inval reliability, especially for inplace updates