Re: Non-reproducible AIO failure - Mailing list pgsql-hackers

From Konstantin Knizhnik
Subject Re: Non-reproducible AIO failure
Date
Msg-id f9e85423-4d1e-4106-a468-d348dee4610d@garret.ru
Whole thread Raw
In response to Re: Non-reproducible AIO failure  (Andres Freund <andres@anarazel.de>)
Responses Re: Non-reproducible AIO failure
List pgsql-hackers


On 26/08/2025 3:37 AM, Andres Freund wrote:
Hi,

I'm a bit confused by this focus on bitfields - both Alexander and Konstantin
stated they could reproduce the issue without the bitfields.


Sorry if I am not correct, but it seems that the problem was never reproduced with replaced bitfields.
I have once again looked through all this thread (it is really long;) and find just one place when you wrote, that you was able to reproduce the problem without bitfields. But as far as understand it was the different problem which was fixed by your patch v1-0001-aio-Add-missing-memory-barrier-when-waiting-for-I.patch

Still it is not quite clear to me how bitfields can cause this issue.
Yes, fitfields are updated all together - when you update one field, all other bitfields are also rewritten.
But actually compiler (at least some versions) does quite straightforward thing (especially without optimization):
1. Loads all bitfields.
2. Update required bitfield.
3. Stores all bitfields.

It is very unlikely that some stale version is stored in registers - without -O optimizer is using registers only locally and doesn't try to keep some variables in registers. Also not clear how we can read some stale value from memory if we perform write barrier before updating status (in pgaio_io_update_state).
It it impossible that `op` and `state` fields belongs to different cache lines because PgAioHandle is aligned on 8. So doesn;t matter whether there are bitfields or bytes, this fields should be  propagated between cores all together.
Certainly, extraction of bitfields is not an atomic operations (as we see in assembler, it is done using two loads).
So we can observe inconsistent state of this two fields.

Let's consider fields `op` and `state`. Let's their initial state is `op=1`m `state=1`.
The we do update `op=2` and call pgaio_io_update_state(2). Some other task can see (op=1, state=1)  or (op=2, state=1) or (op=2, state=2) but not (op=1, state=2). 

The problem can happen only if some other task tries to update `op` without prior checking for `state`.
In this case we can get (op=3,state=1) as well as (op=2,state=2). 
But `op` field is updated in just two places: pgaio_io_reclaim and pgaio_io_stage. And in both cases we first check 'state' value.

What makes me concern is what happen in case of branch misprediction.
Assume that we have code:

if (state == 1) {
    op = 2;
} else { 
    op = 3;
}

It seems to be absolutely legal, but what will happen in case if with speculative execution processor start executing first branch and does assignment op=2, but then state is loaded and used to be not equal to 1, so processor has to undone this execution. But for some moment we we can have state (state=0 and op=2) which can be observed by other task and cause assertion failure.
Is it really possible?

But we have observed the generated code being pretty grotty and it's caused
more than enough confusion - so let's just replace them with plain uint8's and
cast in switches.

+1

May be I am wrong, but it seems to me that after add-moissing-memory-barrier patch was applied nobody reproduced assertion failure with replaced bitfields.

pgsql-hackers by date:

Previous
From: Michael Banck
Date:
Subject: Re: Dead code in ps_status.c
Next
From: Andrei Lepikhov
Date:
Subject: Redundant parameter in the get_useful_pathkeys_for_relation