Re: failed NUMA pages inquiry status: Operation not permitted - Mailing list pgsql-hackers

From Tomas Vondra
Subject Re: failed NUMA pages inquiry status: Operation not permitted
Msg-id 183fe9ab-6010-4cca-b648-1deca332ce2a@vondra.me
In response to Re: failed NUMA pages inquiry status: Operation not permitted  (Christoph Berg <myon@debian.org>)
Responses Re: failed NUMA pages inquiry status: Operation not permitted
List pgsql-hackers
On 12/16/25 15:48, Christoph Berg wrote:
> Re: To Tomas Vondra
>> I've managed to reproduce it once, running this loop on
>> 18-as-of-today. It errored out after a few hundred iterations:
>>
>> while psql -c 'SELECT COUNT(*) >= 0 AS ok FROM pg_shmem_allocations_numa'; do :; done
>>
>> 2025-12-16 11:49:35.982 UTC [621807] myon@postgres ERROR:  invalid NUMA node id outside of allowed range [0, 0]: -2
>> 2025-12-16 11:49:35.982 UTC [621807] myon@postgres STATEMENT:  SELECT COUNT(*) >= 0 AS ok FROM pg_shmem_allocations_numa
>>
>> That was on the apt.pg.o amd64 build machine while a few things were
>> just building. Maybe ENOENT "The page is not present" means something
>> was just swapped out because the machine was under heavy load.
> 
> I played a bit more with it.
> 
> * It seems to trigger only once for a running cluster. Triggering it
>   again needs a restart.
> * If it doesn't trigger within the first 30s, it probably never will
> * It seems easier to trigger on a system that is under load (I started
>   a few pgmodeler compile runs in parallel (C++))
> 
> But none of that answers the "why".
> 

Hmmm, so this is interesting. I tried this on my workstation (with a
single NUMA node), and I see this:

1) right after opening a connection, I get this

test=# select numa_node, count(*) from pg_buffercache_numa group by 1;
 numa_node | count
-----------+-------
         0 |   290
        -2 | 32478
(2 rows)


2) but a select from pg_shmem_allocations_numa works fine

test=# select numa_node, count(*) from pg_shmem_allocations_numa group by 1;
 numa_node | count
-----------+-------
         0 |    72
(1 row)


3) and if I repeat the pg_buffercache_numa query, it now works

test=# select numa_node, count(*) from pg_buffercache_numa group by 1;
 numa_node | count
-----------+-------
         0 | 32768
(1 row)


That's a bit strange. I have no idea why this is happening. If I
reconnect, I start getting the failures again.
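For context on the -2 values above: -2 is -ENOENT, the per-page status that move_pages(2) reports in query mode (nodes == NULL) for a page that is not present. The C program below is a minimal sketch for illustration only, not PostgreSQL code. It assumes Linux, calls the raw syscall so no libnuma is needed, maps one MAP_SHARED | MAP_ANONYMOUS page, and queries its status before and after writing to it; on a NUMA-enabled kernel the first query should report -ENOENT and the second a real node id. The helper names query_page_status() and demo_page_status() are hypothetical. The status reflects the calling process's own page tables, which is consistent with the per-connection behaviour seen here but does not by itself explain it.

```c
/* Illustrative sketch (hypothetical helpers): query a page's NUMA node via move_pages(2). */
#define _GNU_SOURCE
#include <errno.h>
#include <stddef.h>
#include <sys/mman.h>
#include <sys/syscall.h>
#include <unistd.h>

/*
 * Query mode of move_pages(2): with nodes == NULL nothing is moved and the
 * kernel only fills status[] with each page's node id, or a negative errno
 * such as -ENOENT when the page is not present in the calling process.
 * Returns 0 on success, or -errno if the syscall itself failed
 * (e.g. -EPERM under a restrictive seccomp profile, -ENOSYS without NUMA).
 */
static int query_page_status(void *addr, int *status)
{
    void *pages[1] = { addr };
    long rc = syscall(SYS_move_pages, 0L /* pid 0 = self */, 1UL, pages,
                      (const int *) NULL, status, 0L /* flags */);

    return rc < 0 ? -errno : 0;
}

/*
 * Map one shared anonymous page (assumed to be similar to the anonymous
 * shared memory PostgreSQL uses by default on Linux), optionally write to it
 * so it is faulted in for this process, then query its status.
 * Returns 0 on success and stores the page's status in *status.
 */
int demo_page_status(int touch_first, int *status)
{
    long pagesize = sysconf(_SC_PAGESIZE);
    char *p = mmap(NULL, (size_t) pagesize, PROT_READ | PROT_WRITE,
                   MAP_SHARED | MAP_ANONYMOUS, -1, 0);

    if (p == MAP_FAILED)
        return -errno;

    if (touch_first)
        p[0] = 1;               /* fault the page in for this process */

    int rc = query_page_status(p, status);

    munmap(p, (size_t) pagesize);
    return rc;
}
```

A caller should check the return value before interpreting *status: a nonzero result means the syscall itself was refused (for example -EPERM inside a container), which is a different failure from a page status of -ENOENT.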


regards

-- 
Tomas Vondra



