RE: Draft for basic NUMA observability - Mailing list pgsql-hackers

From Shinoda, Noriyoshi (SXD Japan FSI)
Subject RE: Draft for basic NUMA observability
Date
Msg-id DM4PR84MB1734308EB741A6ECFF040C27EEAA2@DM4PR84MB1734.NAMPRD84.PROD.OUTLOOK.COM
In response to Re: Draft for basic NUMA observability  (Tomas Vondra <tomas@vondra.me>)
Responses Re: Draft for basic NUMA observability
List pgsql-hackers
Hi, 

Thanks for developing this great feature. 
The manual says that the 'size' column of the pg_shmem_allocations_numa view is 'int4', but the implementation is
'int8'.
 
The attached small patch fixes the manual.

Regards,
Noriyoshi Shinoda

-----Original Message-----
From: Tomas Vondra <tomas@vondra.me> 
Sent: Tuesday, April 8, 2025 6:59 AM
To: Jakub Wartak <jakub.wartak@enterprisedb.com>
Cc: Bertrand Drouvot <bertranddrouvot.pg@gmail.com>; Andres Freund <andres@anarazel.de>; Alvaro Herrera <alvherre@alvh.no-ip.org>; Nazir Bilal Yavuz <byavuz81@gmail.com>; PostgreSQL Hackers <pgsql-hackers@postgresql.org>
Subject: Re: Draft for basic NUMA observability



On 4/7/25 23:50, Jakub Wartak wrote:
> On Mon, Apr 7, 2025 at 11:27 PM Tomas Vondra <tomas@vondra.me> wrote:
>>
>> Hi,
>>
>> I've pushed all three parts of v29, with some additional corrections 
>> (picked lower OIDs, bumped catversion, fixed commit messages).
> 
> Hi Tomas, great, awesome! (this is an awesome feeling)! Thank You for 
> such incredible support on the last mile of this and also to Bertrand 
> (for persistence!), Andres and Alvaro for lots of babysitting.
> 

Glad I could help, thanks for the patch.

>> AFAIK v29 fixed this, the end pointer calculations were wrong. With 
>> that it passed for me with/without THP, different block sizes etc.
> 
> Yeah, that was a typo, I've started writing about v28, but then in the 
> middle of that v29 landed and I still was chasing that finding, I've 
> just forgotten to bump this.
> 
>> We don't align buffers to os_page_size, we align them to 
>> PG_IO_ALIGN_SIZE, which is 4kB or so. And it's determined at compile 
>> time, while THP is determined when starting the cluster.
> [..]
>> Right, this is because that's where the THP boundary happens to be. 
>> And that one "duplicate" entry is for a buffer that happens to span 
>> two pages. This is *exactly* the misalignment of blocks and pages 
>> that I was wondering about earlier, and with the fixed endptr 
>> calculation we handle that just fine.
>>
>> No opinion on the alignment - maybe we should do that, but it's not 
>> something this patch needs to worry about.
> 
> Agreed. I was even wondering if there are other drawbacks of the 
> situation, but other than not reporting duplicates here in this 
> pg_buffercache view, I cannot identify anything worthwhile.
> 

Well, the drawback is that accessing the buffer may require hitting two
different NUMA nodes. I'm not 100% sure it can actually happen, though.
The buffer should be initialized as a whole, so it should go to the same
node. But maybe it could be "split" by THP migration, or something like
that.

In any case, that's not caused by this patch, and it's less serious with
huge pages - it only affects buffers on the boundaries. But with the
small 4K pages it can happen for *every* buffer.
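
To make the difference between page sizes concrete, here is a minimal
sketch that counts how many buffers touch more than one OS page. It is
not PostgreSQL code: the helper names (pages_touched,
count_multi_page_buffers) are hypothetical, and the inputs are
assumptions chosen for illustration: 8kB blocks (the default BLCKSZ), a
buffer array that starts 4kB-aligned but not 2MB-aligned, and 2MB as the
huge page size.

```python
BLOCK_SIZE = 8192              # assumed: default PostgreSQL block size (BLCKSZ)
SMALL_PAGE = 4096              # regular 4K OS page
HUGE_PAGE = 2 * 1024 * 1024    # assumed: 2MB huge page


def pages_touched(start, size, page_size):
    """Number of OS pages covered by the byte range [start, start + size)."""
    first_page = start // page_size
    # The end pointer is exclusive, so the last byte is at start + size - 1.
    last_page = (start + size - 1) // page_size
    return last_page - first_page + 1


def count_multi_page_buffers(base, nbuffers, block_size, page_size):
    """Count buffers whose memory spans more than one OS page."""
    return sum(
        1
        for i in range(nbuffers)
        if pages_touched(base + i * block_size, block_size, page_size) > 1
    )


# Hypothetical layout: 1024 buffers (8MB), array starts at a 4kB offset.
base = 4096
nbuffers = 1024

print(count_multi_page_buffers(base, nbuffers, BLOCK_SIZE, SMALL_PAGE))
print(count_multi_page_buffers(base, nbuffers, BLOCK_SIZE, HUGE_PAGE))
```

Under these assumptions the first count is 1024 (every 8kB buffer covers
two 4K pages) and the second is 4 (only the buffers straddling a 2MB
boundary). Whether the two pages of a buffer actually end up on
different NUMA nodes is a separate question that this sketch does not
answer.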
 


regards

--
Tomas Vondra




