Re: Add os_page_num to pg_buffercache - Mailing list pgsql-hackers

From Bertrand Drouvot
Subject Re: Add os_page_num to pg_buffercache
Date
Msg-id aR2NqVj9ON8PbOVD@ip-10-97-1-34.eu-west-3.compute.internal
Whole thread Raw
In response to Re: Add os_page_num to pg_buffercache  (Michael Paquier <michael@paquier.xyz>)
List pgsql-hackers
Hi,

On Tue, Nov 18, 2025 at 02:39:36PM +0900, Michael Paquier wrote:
> On Fri, Aug 22, 2025 at 08:48:57AM +0000, Bertrand Drouvot wrote:
> > Yeah, I think so. Thanks for the ping, done in attached.
> 
> The patch has been marked as ready for committer, and I see the value
> in providing a view where it is possible to know to which OS page a
> given shared buffer is linked to, based on the OS page size and our
> shared buffer size.  No problem with that.

Thanks for looking at it!

> Now, I am not really a fan of the duplication created here, where most
> of the code pg_buffercache_os_pages() is a plain copy-paste of
> pg_buffercache_numa_pages(), with the difference being that we want
> only os_page_status via a call to pg_numa_query_pages(),

Agree. 0001 helps to avoid code duplication but I agree that we could do more.

> It seems to me that it would be simpler to make the allocations and
> information of os_page_ptrs and os_page_status conditional depending
> on the result of pg_numa_init(), and that we could just fill numa_node
> with NULLs if numa is not available in the environment where the query
> is run.  This includes making pg_numa_touch_mem_if_required()
> conditional, of course, not called when pg_numa_init() fails.

That's a good idea and I think we have 2 options here.

Option 1:

We could simply create the pg_buffercache_os_pages view on top of the
pg_buffercache_numa one. The cons I can think of is that, when numa is available,
then pg_buffercache_os_pages would pay the extra cost that also make
pg_buffercache_numa slow.

Then there is no real benefits for adding a new view, we could just keep
pg_buffercache_numa and fill numa_node with NULLs if numa is not available and
document also the use case (with an example) when numa is not available.

That would achieve the main goal.

Option 2:

Still make changes in pg_buffercache_numa_pages() and fill with NULL when 
numa is not available. Then create an helper to do the mapping buffers to OS
pages without any NUMA specific operations.

That way we could create a dedicated view pg_buffercache_os_pages on top of
a new function. No code duplication and the new view would not get the extra
cost if numa is available.

> Or is
> there a strong reason where it would be better to rely on an
> elog(ERROR) if numa(3) fails, based on numa_available()?  The purpose
> of these views being monitoring, it is usually easier in my experience
> to rely on NULLs rather than facing periodic errors when we don't know
> something.  That makes JOINs more predictible, for one.

Yeah I don't think that's an issue to fill with NULL instead of generating errors.
Specially as it will give added value. 

> Please note that I don't mind the extra view pg_buffercache_os_pages
> that can provide some information that's transparent cross-platform,
> but let's make it something that calls pg_buffercache_numa_pages()
> instead.

So, I think we have Option 1 and Option 2. Option 1 is very simple and Option
2 would allow to get the desired information with no performance penalties when
numa is availble.

I'm tempted to vote for 1 as I'm not sure the larger code refactoring of option
2 is worth the benefits. Thoughts?

> Note 2: Nice catch about the description of os_page_num, applied this
> one separately.

Thanks!

Regards,

-- 
Bertrand Drouvot
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com



pgsql-hackers by date:

Previous
From: shveta malik
Date:
Subject: Re: POC: enable logical decoding when wal_level = 'replica' without a server restart
Next
From: Peter Eisentraut
Date:
Subject: Re: Update timezone to C99