Re: pg_buffercache: Add per-relation summary stats - Mailing list pgsql-hackers

From Heikki Linnakangas
Subject Re: pg_buffercache: Add per-relation summary stats
Msg-id 7ab3914f-da59-4c1f-b809-225637b586e8@iki.fi
In response to Re: pg_buffercache: Add per-relation summary stats  (Ashutosh Bapat <ashutosh.bapat.oss@gmail.com>)
List pgsql-hackers
On 28/03/2026 06:18, Ashutosh Bapat wrote:
> In parallel, Palak Chaturvedi and I developed a quick patch to
> modernise pg_buffercache_pages() and use a tuplestore, so that it
> doesn't have to rely on NBuffers staying the same between the start of
> the scan, when memory is allocated, and the end of the scan - a
> condition that becomes possible with buffer cache resizing. It seems
> to improve the timings by about 10-30% on my laptop for a 128MB
> buffer cache. Without this patch, the time taken to execute Lukas's
> query varies between 10-15ms on my laptop; with this patch it varies
> between 8-9ms, so the timing is more stable as a side effect. It's not
> the 10x improvement that we are looking for, but it looks like a step
> in the right direction. The improvement seems to come purely from
> avoiding the creation of a heap tuple. I wonder if there are places
> higher up in the execution tree where full heap tuples get formed
> again instead of continuing to use minimal tuples, or where we perform
> some extra actions that are not required.
> 
> I didn't dig into the history to find out why we didn't modernize
> pg_buffercache_pages(). I don't see any hazard though.

Committed this modernization patch, thanks!

It would be nice to have a proper row-at-a-time mode that avoids 
materializing the result, but collecting all the data in a temporary 
array first is clearly worse than putting the rows into the tuplestore 
directly. The only reason I can think of to prefer a temporary array 
like that is to get a more consistent snapshot of all the buffers, by 
keeping the time spent scanning the buffers as short as possible. But 
we're not getting a consistent view anyway; it's just a matter of 
degree.
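For readers following along, the modernization being discussed is the usual materialized-SRF pattern. This is a minimal, hypothetical sketch of that pattern using the InitMaterializedSRF() and tuplestore_putvalues() APIs from the PostgreSQL server; the function and column details here are illustrative, not the committed pg_buffercache code:

#include "postgres.h"
#include "fmgr.h"
#include "funcapi.h"
#include "storage/bufmgr.h"

Datum
my_buffercache_pages(PG_FUNCTION_ARGS)
{
    ReturnSetInfo *rsinfo = (ReturnSetInfo *) fcinfo->resultinfo;
    Datum       values[8];
    bool        nulls[8];

    /* Sets up rsinfo->setResult (a tuplestore) and rsinfo->setDesc. */
    InitMaterializedSRF(fcinfo, 0);

    for (int i = 0; i < NBuffers; i++)
    {
        memset(nulls, 0, sizeof(nulls));
        values[0] = Int32GetDatum(i + 1);        /* bufferid */
        /* ... fill the remaining columns from the buffer header ... */

        /*
         * Appends a minimal tuple straight into the tuplestore; no full
         * HeapTuple is formed, which is where the speedup comes from.
         */
        tuplestore_putvalues(rsinfo->setResult, rsinfo->setDesc,
                             values, nulls);
    }
    return (Datum) 0;
}

Because each row is appended as it is produced, the function no longer needs a snapshot of NBuffers taken up front to size a temporary array.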

I wondered about this in pg_buffercache_pages.c:

>     /*
>      * To smoothly support upgrades from version 1.0 of this extension
>      * transparently handle the (non-)existence of the pinning_backends
>      * column. We unfortunately have to get the result type for that... - we
>      * can't use the result type determined by the function definition without
>      * potentially crashing when somebody uses the old (or even wrong)
>      * function definition though.
>      */
>     if (get_call_result_type(fcinfo, NULL, &expected_tupledesc) != TYPEFUNC_COMPOSITE)
>         elog(ERROR, "return type must be a row type");
> 
>     if (expected_tupledesc->natts < NUM_BUFFERCACHE_PAGES_MIN_ELEM ||
>         expected_tupledesc->natts > NUM_BUFFERCACHE_PAGES_ELEM)
>         elog(ERROR, "incorrect number of output arguments");

I guess it's still needed, if you have pg_upgraded all the way from 1.0. 
To test that, I created this view to match the old 1.0 definition:

CREATE VIEW public.legacy_pg_buffercache AS
  SELECT bufferid,
     relfilenode,
     reltablespace,
     reldatabase,
     relforknumber,
     relblocknumber,
     isdirty,
     usagecount
    FROM public.pg_buffercache_pages()
      p(bufferid integer, relfilenode oid, reltablespace oid,
        reldatabase oid, relforknumber smallint, relblocknumber bigint,
        isdirty boolean, usagecount smallint);

"select * from public.legacy_pg_buffercache" still works, so all good.
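To illustrate what the natts check quoted above buys us, here is a hypothetical sketch of how the per-row code can adapt to an older SQL-level definition, emitting the pinning_backends column only when the caller's function definition declares it. Variable names and surrounding structure are illustrative, not the committed code:

/*
 * expected_tupledesc came from get_call_result_type(), as in the
 * quoted fragment. With a pre-1.1 definition, natts is smaller and
 * the extra column is simply never filled or emitted.
 */
int     natts = expected_tupledesc->natts;

memset(nulls, 0, sizeof(nulls));
values[0] = Int32GetDatum(bufferid);
/* ... columns 1..7 as in the 1.0 definition ... */

if (natts == NUM_BUFFERCACHE_PAGES_ELEM)
    values[8] = Int32GetDatum(pinning_backends);    /* 1.1+ only */

/* The tuple is then built with exactly natts attributes. */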

- Heikki


