Re: Tomas Vondra
> >> So I'm leaning to adjust pg_numa_init() to also check EPERM, per the
> >> attached patch. It still calls numa_available(), so that we don't
> >> silently miss future libnuma changes.
> >>
> >> Can you check this makes it work inside the docker container?
> >
> > Yes your patch works. (Sorry I meant to test earlier, but RL...)
>
> Thanks. I've pushed the fix (and backpatched to 18).
It looks like we are not done here yet :(
postgresql-18 is failing here intermittently with this diff:
12:20:24 --- /build/reproducible-path/postgresql-18-18.1/src/test/regress/expected/numa.out 2025-11-10 21:52:06.000000000+0000
12:20:24 +++ /build/reproducible-path/postgresql-18-18.1/build/src/test/regress/results/numa.out 2025-12-11 11:20:22.618989603+0000
12:20:24 @@ -6,8 +6,4 @@
12:20:24 -- switch to superuser
12:20:24 \c -
12:20:24 SELECT COUNT(*) >= 0 AS ok FROM pg_shmem_allocations_numa;
12:20:24 - ok
12:20:24 -----
12:20:24 - t
12:20:24 -(1 row)
12:20:24 -
12:20:24 +ERROR: invalid NUMA node id outside of allowed range [0, 0]: -2
That's REL_18_STABLE @ 580b5c, with the Debian packaging on top.
I've seen it on unstable/amd64, unstable/arm64, and Ubuntu
questing/amd64, where libnuma should take care of this itself, without
the extra patch in PG. There was another case on bullseye/amd64 which
has the old libnuma.
It's been frequent enough that it killed 4 out of the 10 builds
currently visible on
https://jengus.postgresql.org/job/postgresql-18-binaries-snapshot/.
(Though to be fair, only one distribution/arch combination was failing
for each of them.)
There is also one instance of it in
https://jengus.postgresql.org/job/postgresql-19-binaries-snapshot/
I currently have no idea what's happening.
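The only lead I can offer is a guess, not a diagnosis: the -2 in the error looks like a negated errno rather than a node id. move_pages(2) documents that each entry of its status array is either a node number or a negated errno, and -ENOENT there means the page is not present. Whether the value in the error above really comes from that status array is an assumption on my part. The helper below (describe_numa_status is a made-up name, not anything in PG or libnuma) just decodes such a value:

```python
import errno
import os


def describe_numa_status(status: int) -> str:
    """Decode a per-page status value as documented for move_pages(2).

    Assumption: the value follows the move_pages(2) convention, where
    non-negative values are NUMA node ids and negative values are
    negated errno codes. This helper is hypothetical and only for
    reading the number in the error message.
    """
    if status >= 0:
        return f"node {status}"
    code = -status
    # errno.errorcode maps numeric errno values to their symbolic names.
    name = errno.errorcode.get(code, "unknown errno")
    return f"-{name} ({os.strerror(code)})"


if __name__ == "__main__":
    # The value from the failing regression test output.
    print(describe_numa_status(-2))
```

If that reading is right, the query hit a page that was not resident at the time, which would at least fit the intermittent nature of the failure.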
Christoph