On Mon, Aug 25, 2025 at 11:42 AM Nico Williams <nico@cryptonector.com> wrote:
> I think the issue is that if the compiler decides to coalesce what we
> think of as distinct (but neighboring) bitfields, then when you update
> one of the bitfields you could be updating the other with stale data
> from an earlier read where the cached stale data is cached in a
> _register_. Thus the fact that the cache line should have the most up
> to date data for that other field is irrelevant because the stale data
> is in a _register_.
Yeah. It so happens that the other bitfield members are never updated
by other backends in our particular usage, or another way to put, in
states after STAGED which are entirely managed by the life-long owner
of the PgAioHandle, and we have paired memory barriers to make sure we
always see changes to state after all other data changes, so in our
usage we'd always reload that, both at compiler barrier and memory
barrier level (the latter being the thing that is being defeated by
this multiple store re-writing neighbouring bitfields thing, it's not
a stale-value-in-register problem). But I'm sure you're right in
general.
> The very fact that this can happen, that the C specs allow it, argues
> that one must never have adjacent distinct (for some value of
> "distinct") bitfields for anything that requires atomics.
C doesn't even say this should be safe for plain integers, only
sig_atomic_t and the newer _Atomic types. (I guess word addressable
architectures like Alpha and Cray that needed bitfield-esque
load-modify-store for small types at one end, and microcontrollers
might have the opposite problem. C11 lets you query that but you have
to go through the appropriate types.)
> struct { PgAioHandleState v:8; } state;
This preserves type safety and compiles to strb two properties we
want, but it seems to waste space (look at the offsets for the
stores):
a.out[0x1000005f8] <+140>: ldr x8, [sp, #0x8]
a.out[0x1000005fc] <+144>: strb wzr, [x8, #0x8]
a.out[0x100000600] <+148>: ldr x8, [sp, #0x8]
a.out[0x100000604] <+152>: strb wzr, [x8, #0x4]
I was wracking my brain for ways to keep the desired type safety,
atomicity, AND storage size. The best I could come up with was GCC
pack attributes, but I couldn't figure out how to do that in MSVC
without using C++, so I gave up.
The only other thing I could think of is that if you put padding
after the bitfields, it should be able to generate a single 32 bit
store (as Konstantin's race.c shows). But that'd be baking in shaky
knowledge the possible dirtiness other bitfields that I mentioned in
parentheses above a little too hard, on top of a new shaky assumption
about how compilers implement bitfield merging and assignment, so I
abandoned that thought quickly too.