Re: Draft for basic NUMA observability - Mailing list pgsql-hackers

From Bertrand Drouvot
Subject Re: Draft for basic NUMA observability
Date
Msg-id Z/FhOOCmTxuB2h0b@ip-10-97-1-34.eu-west-3.compute.internal
In response to Re: Draft for basic NUMA observability  (Tomas Vondra <tomas@vondra.me>)
List pgsql-hackers
Hi,

On Sat, Apr 05, 2025 at 04:33:28PM +0200, Tomas Vondra wrote:
> On 4/5/25 15:23, Tomas Vondra wrote:
> > I was thinking we'd change the definition of the existing page_num
> > column, i.e. it wouldn't be 0..N sequence for each buffer, but a global
> > page ID. But I don't know if this would be useful in practice.
> > 
> 
> See the attached v25 with a draft of this in patch 0003.

I see, thanks for sharing. I think that's useful because that could help
identify which buffers share the same OS page.

> While working on this, I realized it's probably wrong to use TYPEALIGN()
> to calculate the OS page pointer. The code did this:
> 
>     os_page_ptrs[idx]
>         = (char *) TYPEALIGN(os_page_size,
>                              buffptr + (os_page_size * j));
> 
> but TYPEALIGN() rounds "up". Let's assume we have 1KB buffers and 4KB
> memory pages, and that the first buffer is aligned to 4kB (i.e. it
> starts right at the beginning of a memory page). Then we expect to get
> page_num sequence:
> 
>     0, 0, 0, 0, 1, 1, 1, 1, 2, 2, 2, 2, ...
> 
> with 4 buffers per memory page. But we get this:
> 
>     0, 1, 1, 1, 1, 2, 2, 2, 2, ...
> 
> So I've changed this to TYPEALIGN_DOWN(), which fixes the result.

Good catch, that makes complete sense.
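
To make the rounding difference concrete, here is a minimal standalone sketch. It is not code from the patch: ALIGN_UP()/ALIGN_DOWN() are local stand-ins modeled on TYPEALIGN()/TYPEALIGN_DOWN(), and page_num_up(), page_num_down() and their parameters are hypothetical names. It assumes 1KB buffers, 4KB memory pages, and a page-aligned first buffer, as in the example above.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Local stand-ins modeled on PostgreSQL's TYPEALIGN()/TYPEALIGN_DOWN()
 * (an assumption for illustration); ALIGNVAL must be a power of two.
 */
#define ALIGN_UP(ALIGNVAL, LEN) \
	(((uintptr_t) (LEN) + ((ALIGNVAL) - 1)) & ~((uintptr_t) ((ALIGNVAL) - 1)))
#define ALIGN_DOWN(ALIGNVAL, LEN) \
	((uintptr_t) (LEN) & ~((uintptr_t) ((ALIGNVAL) - 1)))

/* Page number of buffer i when its start address is rounded up. */
static int
page_num_up(uintptr_t base, size_t buf_size, size_t page_size, int i)
{
	uintptr_t	addr = base + (size_t) i * buf_size;

	return (int) ((ALIGN_UP(page_size, addr) - base) / page_size);
}

/* Page number of buffer i when its start address is rounded down. */
static int
page_num_down(uintptr_t base, size_t buf_size, size_t page_size, int i)
{
	uintptr_t	addr = base + (size_t) i * buf_size;

	return (int) ((ALIGN_DOWN(page_size, addr) - base) / page_size);
}
```

With base = 0x10000, buf_size = 1024 and page_size = 4096, calling page_num_up() for i = 0..8 gives 0, 1, 1, 1, 1, 2, 2, 2, 2, while page_num_down() gives 0, 0, 0, 0, 1, 1, 1, 1, 2, i.e. the expected 4 buffers per memory page.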

But now I see some rows with page_num < 0:

postgres=# select page_num,node_id,count(*) from pg_buffercache_numa group by page_num,node_id order by 1 limit 4;
 page_num | node_id | count
----------+---------+-------
       -1 |       0 |   386
        0 |       1 |  1024
        1 |       0 |  1024
        2 |       1 |  1024

I think that can be solved this way:

-  startptr = (char *) BufferGetBlock(1);
+  startptr = (char *) TYPEALIGN_DOWN(os_page_size, (char *) BufferGetBlock(1));

so that startptr is aligned to the same boundaries. But I guess that we'll
have the same question as the following one:

> The one question I have about this is whether we know the pointer
> returned by TYPEALIGN_DOWN() is valid. It's before ent->location (or
> before the first shared buffer) ...

Yeah, I'm not 100% sure about that... Maybe for safety we could use TYPEALIGN_DOWN()
for the reporting only, and use the actual buffer address when pg_numa_touch_mem_if_required()
is called (to avoid touching "invalid" memory)?
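
As a rough sketch of that idea (not code from the patch), the two addresses could be kept separately. PageAddrs, split_addrs() and page_offset() are hypothetical names, and ALIGN_DOWN() is a local stand-in modeled on TYPEALIGN_DOWN(). The sketch also shows why an unaligned startptr leads to a negative offset for the first OS page.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Local stand-in modeled on TYPEALIGN_DOWN(); ALIGNVAL must be a power of two. */
#define ALIGN_DOWN(ALIGNVAL, LEN) \
	((uintptr_t) (LEN) & ~((uintptr_t) ((ALIGNVAL) - 1)))

/* Hypothetical pair of addresses kept for one (buffer, OS page) entry. */
typedef struct PageAddrs
{
	uintptr_t	report_addr;	/* page-aligned, only used to compute page_num */
	uintptr_t	touch_addr;		/* inside the buffer, so known to be mapped */
} PageAddrs;

/* Derive both addresses from an address located inside a shared buffer. */
static PageAddrs
split_addrs(uintptr_t addr, size_t page_size)
{
	PageAddrs	r;

	r.report_addr = ALIGN_DOWN(page_size, addr);
	r.touch_addr = addr;		/* same OS page as report_addr */
	return r;
}

/* Signed byte offset of a reported page address relative to startptr. */
static int64_t
page_offset(uintptr_t startptr, uintptr_t report_addr)
{
	return (int64_t) report_addr - (int64_t) startptr;
}
```

For example, with 4KB pages and a first buffer at 0x10400, the reported page address is 0x10000. Relative to an unaligned startptr of 0x10400 the offset is -1024, while relative to the aligned startptr 0x10000 it is 0. The touched address stays at 0x10400, inside the buffer and on the same OS page.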

Regards,

-- 
Bertrand Drouvot
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com


