Re: Add os_page_num to pg_buffercache - Mailing list pgsql-hackers

From Bertrand Drouvot
Subject Re: Add os_page_num to pg_buffercache
Msg-id aGQOMPEENZc/2fJm@ip-10-97-1-34.eu-west-3.compute.internal
In response to Re: Add os_page_num to pg_buffercache  (Tomas Vondra <tomas@vondra.me>)
Responses Re: Add os_page_num to pg_buffercache
List pgsql-hackers
Hi,

On Tue, Jul 01, 2025 at 04:31:01PM +0200, Tomas Vondra wrote:
> On 7/1/25 15:45, Bertrand Drouvot wrote:
> 
> I took a quick look on this,

Thanks for looking at it!

> and I doubt we want to change the schema of
> pg_buffercache like this. Adding columns is fine, but it seems rather
> wrong to change the cardinality. The view is meant to be 1:1 mapping for
> buffers, but now suddenly it's 1:1 with memory pages. Or rather (buffer,
> page), to be precise.
> 
> I think this will break a lot of monitoring queries, and possibly in a
> very subtle way - especially on systems with huge pages, where most
> buffers will have one row, but then a buffer that happens to be split on
> two pages will have two rows. That seems not great.
> 
> IMHO it'd be better to have a new view for this info, something like
> pg_buffercache_pages, or something like that.

That's a good point, fully agree!

> But I'm also starting to question if the patch really is that useful.
> Sure, people may not have NUMA support enabled (e.g. on non-linux
> platforms), and even if they do the _numa view is quite expensive.
> 

Yeah, it's not meant for day-to-day activity; it's more for configuration
testing and for development work and testing.

For example, if I set BLCKSZ to 8KB and enable huge pages (2MB), then I would
expect no buffer to be spread across two pages.

But what I actually see is:

SELECT
    pages_per_buffer,
    COUNT(*) as buffer_count
FROM (
    SELECT bufferid, COUNT(*) as pages_per_buffer
    FROM pg_buffercache
    GROUP BY bufferid
) subq
GROUP BY pages_per_buffer
ORDER BY pages_per_buffer;

 pages_per_buffer | buffer_count
------------------+--------------
                1 |       261120
                2 |         1024

This is due to the shared buffers being aligned on PG_IO_ALIGN_SIZE rather than
on the huge page size, so some buffers straddle a huge page boundary.
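
To make the arithmetic behind those counts concrete, here is a minimal sketch
that is not taken from the patch. count_split_buffers() is a hypothetical
helper, and the start offsets passed to it are assumptions: the real address of
the buffer pool depends on the shared memory layout. It walks the buffers and
counts those whose first and last byte fall on different OS pages.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Count the buffers that touch more than one OS page.
 *
 * start     - assumed byte offset of the first buffer, measured from an OS
 *             page boundary (hypothetical, not read from a running server)
 * nbuffers  - number of buffers in the pool
 * blcksz    - size of one buffer in bytes (BLCKSZ)
 * page_size - OS page size in bytes (2MB for the huge pages used here)
 */
size_t
count_split_buffers(uint64_t start, size_t nbuffers,
                    uint64_t blcksz, uint64_t page_size)
{
    size_t      split = 0;

    assert(blcksz > 0 && page_size > 0);

    for (size_t i = 0; i < nbuffers; i++)
    {
        uint64_t    first_byte = start + (uint64_t) i * blcksz;
        uint64_t    last_byte = first_byte + blcksz - 1;

        /* A buffer is split when its two ends land on different pages. */
        if (first_byte / page_size != last_byte / page_size)
            split++;
    }

    return split;
}
```

With 262144 buffers of 8192 bytes and 2MB pages, an assumed start offset of
4096 bytes past a 2MB boundary gives 1024 split buffers, which is consistent
with the counts above. A start offset of 0, or any multiple of 8192, gives 0.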

If I change the alignment to 2MB:

BufferManagerShmemInit(void)

        /* Align buffer pool on IO page size boundary. */
        BufferBlocks = (char *)
-               TYPEALIGN(PG_IO_ALIGN_SIZE,
+               TYPEALIGN(2 * 1024 * 1024,
                                  ShmemInitStruct("Buffer Blocks",
-                                                                 NBuffers * (Size) BLCKSZ + PG_IO_ALIGN_SIZE,
+                                                                 NBuffers * (Size) BLCKSZ + (2 * 1024 * 1024),
                                                                  &foundBufs));

Then I get:

 pages_per_buffer | buffer_count
------------------+--------------
                1 |       262144
(1 row)


So we've been able to see that some buffers were spread across pages because
the shared buffers are aligned on PG_IO_ALIGN_SIZE, and that once the alignment
is changed to 2MB, no buffer is spread across pages anymore.
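
For reference, the rounding that the alignment step relies on can be sketched
as below. align_up() is a hypothetical stand-in written for illustration, not
the TYPEALIGN macro itself, and it assumes the alignment is a power of two.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Round addr up to the next multiple of align.
 * align must be a nonzero power of two (e.g. 4096 or 2 * 1024 * 1024).
 */
uintptr_t
align_up(uintptr_t addr, uintptr_t align)
{
    assert(align != 0 && (align & (align - 1)) == 0);

    /* Add align - 1, then clear the low bits to land on a multiple. */
    return (addr + (align - 1)) & ~(align - 1);
}
```

Because rounding up can move the start forward by up to align - 1 bytes, the
allocation in the diff above requests that many extra bytes (one full
alignment unit) so that the aligned pool still fits.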

I think that it helps "visualize" the effect of some configuration or code
changes.

What are your thoughts?

Regards,

-- 
Bertrand Drouvot
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com


