On 12/16/25 15:48, Christoph Berg wrote:
> Re: To Tomas Vondra
>> I've managed to reproduce it once, running this loop on
>> 18-as-of-today. It errored out after a few hundred iterations:
>>
>> while psql -c 'SELECT COUNT(*) >= 0 AS ok FROM pg_shmem_allocations_numa'; do :; done
>>
>> 2025-12-16 11:49:35.982 UTC [621807] myon@postgres ERROR: invalid NUMA node id outside of allowed range [0, 0]: -2
>> 2025-12-16 11:49:35.982 UTC [621807] myon@postgres STATEMENT: SELECT COUNT(*) >= 0 AS ok FROM pg_shmem_allocations_numa
>>
>> That was on the apt.pg.o amd64 build machine while a few things were
>> just building. Maybe ENOENT "The page is not present" means something
>> was just swapped out because the machine was under heavy load.
>
> I played a bit more with it.
>
> * It seems to trigger only once for a running cluster; triggering it
> again needs a restart.
> * If it doesn't trigger within the first 30s, it probably never will.
> * It seems easier to trigger on a system that is under load (I started
> a few pgmodeler compile runs (C++) in parallel).
>
> But none of that answers the "why".
>
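For reference, when move_pages(2) is called with nodes == NULL it only queries page placement. Each entry of the status array is then either the node the page currently resides on or a negative errno, and -ENOENT (-2 on Linux) is documented as "The page is not present". The error above is that -2 being range-checked as a node id. Here is a minimal sketch of telling the cases apart instead. classify_numa_status() and its enum are hypothetical names for illustration, not PostgreSQL code, and the sketch does not settle what the views should report for such pages.

```c
#include <assert.h>
#include <errno.h>

/* Hypothetical classification of one move_pages(2) status entry,
 * as filled in when the call is made with nodes == NULL (query only). */
typedef enum
{
    NUMA_STATUS_NODE,        /* 0..max_node: NUMA node the page resides on */
    NUMA_STATUS_NOT_PRESENT, /* -ENOENT: "The page is not present" */
    NUMA_STATUS_INVALID      /* other errno values or out-of-range node ids */
} numa_status_kind;

static numa_status_kind
classify_numa_status(int status, int max_node)
{
    assert(max_node >= 0);

    if (status >= 0 && status <= max_node)
        return NUMA_STATUS_NODE;
    if (status == -ENOENT)
        return NUMA_STATUS_NOT_PRESENT;
    return NUMA_STATUS_INVALID;
}
```

With max_node = 0 as in the log above, classify_numa_status(-2, 0) yields NUMA_STATUS_NOT_PRESENT rather than an out-of-range error. A caller could then count such pages separately, or touch them and retry.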
Hmmm, so this is interesting. I tried this on my workstation (with a
single NUMA node), and I see this:
1) right after opening a connection, I get this:

test=# select numa_node, count(*) from pg_buffercache_numa group by 1;
 numa_node | count
-----------+-------
         0 |   290
        -2 | 32478
(2 rows)

2) but a select from pg_shmem_allocations_numa works fine:

test=# select numa_node, count(*) from pg_shmem_allocations_numa group by 1;
 numa_node | count
-----------+-------
         0 |    72
(1 row)

3) and if I repeat the pg_buffercache_numa query, it now works:

test=# select numa_node, count(*) from pg_buffercache_numa group by 1;
 numa_node | count
-----------+-------
         0 | 32768
(1 row)

That's a bit strange. I have no idea why this is happening. If I
reconnect, I start getting the failures again.
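One hypothesis consistent with the ENOENT reading, and worth checking, concerns the freshly forked backend. It may not have faulted the shared buffers into its own page tables yet, because the kernel does not necessarily copy page table entries of shared mappings at fork(). If so, move_pages(2) would report those pages as not present until the backend touches them. This standalone sketch tries to test that outside PostgreSQL. The parent maps anonymous shared memory (roughly what the default shared_memory_type = mmap does, ignoring huge pages), faults it in, and forks. The child then counts -ENOENT pages before and after reading each page. fork_probe() and count_not_present() are hypothetical helpers. The raw syscall may fail with ENOSYS (no NUMA support) or EPERM (seccomp), in which case the sketch returns -1.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <errno.h>
#include <stdlib.h>
#include <sys/mman.h>
#include <sys/syscall.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Count pages in [base, base + npages * pagesz) for which move_pages(2),
 * called with nodes == NULL (query only), reports -ENOENT.  Returns -1 if
 * the syscall itself fails, e.g. ENOSYS without NUMA or EPERM under seccomp. */
static int
count_not_present(char *base, size_t npages, size_t pagesz)
{
    void  **pages = calloc(npages, sizeof(void *));
    int    *status = calloc(npages, sizeof(int));
    int     n = -1;

    if (pages != NULL && status != NULL)
    {
        for (size_t i = 0; i < npages; i++)
            pages[i] = base + i * pagesz;
        if (syscall(SYS_move_pages, 0, (unsigned long) npages, pages,
                    NULL, status, 0) >= 0)
        {
            n = 0;
            for (size_t i = 0; i < npages; i++)
                if (status[i] == -ENOENT)
                    n++;
        }
    }
    free(pages);
    free(status);
    return n;
}

/* Hypothetical experiment: shared anonymous mapping, faulted in by the
 * parent, probed by a forked child before and after it reads each page.
 * Returns 0 and fills *before / *after on success, -1 on any failure. */
static int
fork_probe(size_t npages, int *before, int *after)
{
    size_t  pagesz = (size_t) sysconf(_SC_PAGESIZE);
    size_t  len = (npages + 1) * pagesz;
    /* the first page holds the child's two results; the rest is probed */
    char   *map = mmap(NULL, len, PROT_READ | PROT_WRITE,
                       MAP_SHARED | MAP_ANONYMOUS, -1, 0);
    int     rc = -1;

    if (map == MAP_FAILED)
        return -1;

    volatile int *results = (volatile int *) map;
    char   *data = map + pagesz;

    for (size_t i = 0; i < npages; i++)
        data[i * pagesz] = 1;   /* parent faults every data page in */

    pid_t   pid = fork();

    if (pid == 0)
    {
        results[0] = count_not_present(data, npages, pagesz);
        for (size_t i = 0; i < npages; i++)
            (void) ((volatile char *) data)[i * pagesz];   /* child reads each page */
        results[1] = count_not_present(data, npages, pagesz);
        _exit(0);
    }
    if (pid > 0)
    {
        int     wstatus;

        if (waitpid(pid, &wstatus, 0) == pid && WIFEXITED(wstatus) &&
            results[0] >= 0 && results[1] >= 0)
        {
            *before = results[0];
            *after = results[1];
            rc = 0;
        }
    }
    munmap(map, len);
    return rc;
}
```

If fork_probe(4, &before, &after) returns 0 with before == 4 and after == 0, that would be consistent with the hypothesis; before == 0 would argue against it. The sketch models neither huge pages nor whatever pg_shmem_allocations_numa does internally.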
regards
--
Tomas Vondra