Hello,
I noticed a strange issue that I can only reproduce with clang:
clang version 19.1.7 (RESF 19.1.7-2.module+el8.10.0+1965+112b558b) on
the devel branch (SHA: 137935bd1167a94b0bfea7239033f1ba1a1d95bb).
initdb fails with a control file checksum mismatch. I added some debug
prints in a small patch and recorded the postgres process with rr. I have
uploaded the rr archive (created with rr pack, then tarred up) [1].
$ initdb -n -d -D /usr/local/pgsql/data &> initdb.out
2025-06-11 16:19:54.343 UTC [3070] LOG: WriteControlFile crc = 3457434907, algo is avx = 1
...
2025-06-11 16:19:54.343 UTC [3070] LOG: ReadControlFile crc = 3457434907, ControlFile->crc = 3457434907, algo is avx = 1
...
2025-06-11 16:19:54.346 UTC [3070] LOG: update_controlfile crc = 2065009488, algo is avx = 1
...
2025-06-11 16:20:13.914 UTC [3070] LOG: update_controlfile crc = 3406554082, algo is avx = 1
...
2025-06-11 16:20:13.920 UTC [3070] LOG: update_controlfile crc = 1234673735, algo is avx = 1
...
2025-06-11 16:20:13.923 UTC [3070] NOTICE: database system is shut down
...
2025-06-11 16:20:13.923 UTC [3070] DEBUG: proc_exit(-1): 0 callbacks to make
ok
performing post-bootstrap initialization ... 2025-06-11 16:20:13.984 UTC [3072] LOG: ReadControlFile crc = 2925279607, ControlFile->crc = 1234673735, algo is avx = 1
2025-06-11 16:20:13.984 UTC [3072] FATAL: incorrect checksum in control file
child process exited with exit code 1
initdb: data directory "/usr/local/pgsql/data" not removed at user's request
Note that this only reproduces with clang-19 at -O0 and NOT at -O3. I
haven't tried other versions of clang.
OTOH, gcc-14 works fine at both -O0 and -O3, with the AVX-512 CRC
implementation getting picked in both cases.
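In case it helps narrow things down outside the server, here is a minimal sketch of a standalone cross-check. It is not part of my debug patch. It assumes the problem sits in the CRC computation itself rather than in how the control file is written. All names (crc32c_fn, crc32c_bitwise, crc32c_bytewise, crc32c_count_mismatches) are hypothetical. The table-driven function is only a stand-in candidate; in a real test one would pass the vectorized function there instead. It sweeps buffer lengths and start offsets, since a vectorized path typically handles heads, tails and alignment differently from a scalar one.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

/* Hypothetical type mirroring the signature printed in the rr session. */
typedef uint32_t (*crc32c_fn) (uint32_t crc, const void *data, size_t len);

/* Reference: bit-at-a-time CRC-32C (reflected Castagnoli polynomial). */
static uint32_t
crc32c_bitwise(uint32_t crc, const void *data, size_t len)
{
    const unsigned char *p = data;

    while (len-- > 0)
    {
        crc ^= *p++;
        for (int k = 0; k < 8; k++)
            crc = (crc >> 1) ^ (0x82F63B78u & (0u - (crc & 1u)));
    }
    return crc;
}

/* Stand-in candidate: byte-at-a-time table lookup, table built on first use. */
static uint32_t
crc32c_bytewise(uint32_t crc, const void *data, size_t len)
{
    static uint32_t table[256];
    static int      table_ready = 0;
    const unsigned char *p = data;

    if (!table_ready)
    {
        for (uint32_t i = 0; i < 256; i++)
            table[i] = crc32c_bitwise(i, "", 0), table[i] = i;
        for (uint32_t i = 0; i < 256; i++)
            for (int k = 0; k < 8; k++)
                table[i] = (table[i] >> 1) ^ (0x82F63B78u & (0u - (table[i] & 1u)));
        table_ready = 1;
    }
    while (len-- > 0)
        crc = table[(crc ^ *p++) & 0xFF] ^ (crc >> 8);
    return crc;
}

/* One-shot CRC: start from all ones, invert at the end. */
static uint32_t
crc32c_oneshot(crc32c_fn fn, const void *data, size_t len)
{
    return fn(0xFFFFFFFFu, data, len) ^ 0xFFFFFFFFu;
}

/*
 * Compare a candidate against the bitwise reference for every length in
 * [0, maxlen] at every start offset in [0, maxoff]. Returns the number of
 * mismatches, or -1 if the requested sweep does not fit in the buffer.
 */
static int
crc32c_count_mismatches(crc32c_fn candidate, size_t maxlen, size_t maxoff)
{
    static unsigned char buf[2048];
    int         mismatches = 0;

    if (maxlen + maxoff > sizeof(buf))
        return -1;
    for (size_t i = 0; i < sizeof(buf); i++)
        buf[i] = (unsigned char) (i * 131u + 7u);   /* arbitrary fill pattern */

    for (size_t off = 0; off <= maxoff; off++)
    {
        for (size_t len = 0; len <= maxlen; len++)
        {
            uint32_t    expect = crc32c_bitwise(0xFFFFFFFFu, buf + off, len);
            uint32_t    got = candidate(0xFFFFFFFFu, buf + off, len);

            if (expect != got && mismatches++ == 0)
                printf("first mismatch: off=%zu len=%zu expect=%08x got=%08x\n",
                       off, len, (unsigned) expect, (unsigned) got);
        }
    }
    return mismatches;
}
```

Calling crc32c_count_mismatches(crc32c_bytewise, 512, 16) should return 0. With the real vectorized function as the candidate, and built with the same clang-19 -O0 flags, any non-zero count would point at the CRC code rather than at the control file handling. The sweep limits 512 and 16 are arbitrary choices.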
Environment:
(1) Configure options:
./configure --prefix=/usr/local/pgsql --with-python --enable-depend --without-icu --enable-debug CFLAGS='-O0 -fno-omit-frame-pointer' CC=clang
(2) Config log shows:
configure:18262: checking for vectorized CRC-32C
configure:18268: result: AVX-512 with runtime check
pgac_cv_avx512_pclmul_intrinsics=yes
pgac_cv_xsave_intrinsics=yes
(3) Confirmation that the AVX-512 CRC implementation is selected at runtime:
(rr) f
#0 WriteControlFile () at xlog.c:4386
4386 ControlFile->pg_control_version = PG_CONTROL_VERSION;
(rr) p pg_comp_crc32c
$1 = (pg_crc32c (*)(pg_crc32c, const void *, size_t)) 0xc8b8b0 <pg_comp_crc32c_avx512>
(4) This is running in a VM with:
Rocky Linux release 8.10 (Green Obsidian)
16 vCPUs
Hypervisor: VMware ESXi, 8.0.3, 24022510
Model: PowerEdge R650
Processor Type: Intel(R) Xeon(R) Gold 6330 CPU @ 2.00GHz
vCenter:
Version: 8.0.3
Build: 24322831
I tried both vSAN and local storage, and it made no difference.
There is a known vSAN bug with invalid checksums + AVX-512, but that was
fixed in an older version [2] (and besides, the issue reproduces with local storage too).
Please let me know if there is any other info I can provide.
Regards,
Deep (VMware)