Re: Non-reproducible AIO failure - Mailing list pgsql-hackers

From Konstantin Knizhnik
Subject Re: Non-reproducible AIO failure
Msg-id 8678425d-50d0-4fcd-94e2-b92e711bf8f0@garret.ru
In response to Re: Non-reproducible AIO failure  (Thomas Munro <thomas.munro@gmail.com>)
Responses Re: Non-reproducible AIO failure
List pgsql-hackers
On 09/06/2025 2:05 am, Thomas Munro wrote:
> On Sat, Jun 7, 2025 at 6:47 AM Andres Freund <andres@anarazel.de> wrote:
>> On 2025-06-06 14:03:12 +0300, Konstantin Knizhnik wrote:
>>> There is really essential difference in code generated by clang 15 (working)
>>> and 16 (not working).
>> There also are code gen differences between upstream clang 17 and apple's
>> clang, which is based on llvm 17 as well (I've updated the toolchain, it
>> repros with that as well).
> Just for the record, Apple clang 17 (self-reported clobbered version)
> is said to be based on LLVM 19[1].  For a long time it was off by one
> but tada now it's apparently two.  Might be relevant if people are
> comparing generated code up that close....
>
> . o O (I wonder if one could corroborate that by running "strings" on
> upstream clang binaries (as compiled by MacPorts/whatever) for each
> major version and finding new strings, ie strings that don't appear in
> earlier major versions, and then seeing which ones are present in
> Apple's clang binaries...  What a silly problem.)
>
> [1] https://en.wikipedia.org/wiki/Xcode#Xcode_15.0_-_16.x_(since_visionOS_support)


Some updates: I was able to reproduce the problem on my Mac with the old
clang (15.0), but only with optimization disabled (CFLAGS=-O0).
So it is very unlikely to be a bug in the compiler.

Why is it easier to reproduce in a debug build? Maybe because of timing.
Or maybe because without optimization the compiler does stupid things:
it loads all three bitfields from memory into a register (one half word
+ one byte), does some manipulations on this register, and writes it
back to memory. Can the register somehow be clobbered between the read
and the write (for example, by a signal handler)? Very unlikely...
So I still do not have any good hypothesis.

But with the bitfields replaced with uint8, the bug no longer reproduces.
Maybe we should just make this change (which seems to be a good thing
in any case)?






