Re: Non-reproducible AIO failure - Mailing list pgsql-hackers
From | Alexander Lakhin |
---|---|
Subject | Re: Non-reproducible AIO failure |
Date | |
Msg-id | 92b33ab2-0596-40fe-9db6-a6d821d08e8a@gmail.com Whole thread Raw |
In response to | Re: Non-reproducible AIO failure (Andres Freund <andres@anarazel.de>) |
List | pgsql-hackers |
Hello Andres and Tom,
06.06.2025 22:37, Andres Freund wrote:
06.06.2025 22:37, Andres Freund wrote:
On 2025-06-06 15:21:13 -0400, Tom Lane wrote:It's a mistake to think that this is a compiler bug. The C standard explicitly allows compilers to use word-wide operations to access bit-field struct members. Such accesses may fetch or overwrite other bit-fields in the same word, using a non-atomic read/modify/write sequence. I don't have a recent C standard at hand, but I found this quote on the web [1]: The C Standard, 3.17, paragraph 3 [ISO/IEC 9899:2024], states Note 2 to entry: A bit-field and an adjacent non-bit-field member are in separate memory locations. The same applies to two bit-fields, if one is declared inside a nested structure declaration and the other is not, or if the two are separated by a zero-length bit-field declaration, or if they are separated by a non-bit-field member declaration ... So it's our code that is busted. No doubt, what is happening is that process A is fetching two fields, modifying one of them, and storing the word back (with the observed value of the other field) concurrently with some other process trying to update the other field. So the other process's update is lost.There shouldn't be any concurrent accesses here, so I don't really see how the above would explain the problem (the IO can only ever be modified by one backend, initially the "owning backend", then, when submitted, by the IO worker, and then again by the backend).
FWIW, I tried the idea Matthias proposed upthread:
--- a/src/include/storage/aio_internal.h
+++ b/src/include/storage/aio_internal.h
@@ -98,16 +98,17 @@ struct PgAioHandle
/* all state updates should go through pgaio_io_update_state() */
PgAioHandleState state:8;
+ /* bitfield of PgAioHandleFlags */
+ uint8 flags;
+
/* what are we operating on */
PgAioTargetID target:8;
+ uint8 num_callbacks;
+
/* which IO operation */
PgAioOp op:8;
- /* bitfield of PgAioHandleFlags */
- uint8 flags;
-
- uint8 num_callbacks;
/* using the proper type here would use more space */
uint8 callbacks[PGAIO_HANDLE_MAX_CALLBACKS];
and got 120 iterations passed cleanly. Without this change, I got failures
on iterations 41, 5, 20, Not a proof, just an observation...
May be the op field gets broken because of a (concurrent?) write to
not that field, by to state or target in some other place?
I also tried to reproduce the same on an ARM server again with clang 18,
using -O2 and -O3, and we with Konstantin didn't see such a sophisticated
assembly for pgaio_io_stage() — the code is much simpler and the failure
is not reproduced. The triplet there is:
PostgreSQL 18beta1 on aarch64-unknown-linux-gnu, compiled by Ubuntu clang version 18.1.3 (1ubuntu1), 64-bit
On that my MacBook, the triplet is:
PostgreSQL 18beta1 on aarch64-apple-darwin23.5.0, compiled by Apple clang version 16.0.0 (clang-1600.0.26.6), 64-bit
indri:
PostgreSQL 18beta1 on aarch64-apple-darwin24.5.0, compiled by Apple clang version 17.0.0 (clang-1700.0.13.3), 64-bit
sifaka:
PostgreSQL 18devel on aarch64-apple-darwin24.4.0, compiled by Apple clang version 17.0.0 (clang-1700.0.13.3), 64-bit
So it looks like "aarch64-apple" enables some extra optimizations with
bitfields...
By the way, there are also some other places with adjacent bitfields, e, g.:
typedef union PgAioTargetData
{
struct
{
RelFileLocator rlocator; /* physical relation identifier */
BlockNumber blockNum; /* blknum relative to begin of reln */
BlockNumber nblocks;
ForkNumber forkNum:8; /* don't waste 4 byte for four values */
bool is_temp:1; /* proc can be inferred by owning AIO */
bool skip_fsync:1;
} smgr;
and
typedef struct PgStat_KindInfo
{
/*
* Do a fixed number of stats objects exist for this kind of stats (e.g.
* bgwriter stats) or not (e.g. tables).
*/
bool fixed_amount:1;
/*
* Can stats of this kind be accessed from another database? Determines
* whether a stats object gets included in stats snapshots.
*/
bool accessed_across_databases:1;
/* Should stats be written to the on-disk stats file? */
bool write_to_file:1;
...
Best regards,
Alexander Lakhin
pgsql-hackers by date: