Re: pg_buffercache: Add per-relation summary stats - Mailing list pgsql-hackers

From Ashutosh Bapat
Subject Re: pg_buffercache: Add per-relation summary stats
Date
Msg-id CAExHW5to8LTCfo+zSqMV06JUnY8RfDSm9qjUP0w55voJE4tLiA@mail.gmail.com
Whole thread
In response to Re: pg_buffercache: Add per-relation summary stats  (Heikki Linnakangas <hlinnaka@iki.fi>)
List pgsql-hackers
On Tue, Apr 7, 2026 at 6:37 PM Heikki Linnakangas <hlinnaka@iki.fi> wrote:
>
> On 28/03/2026 06:18, Ashutosh Bapat wrote:
> > Parallely myself and Palak Chaturvedi developed a quick patch to
> > modernise pg_buffercache_pages() and use tuplestore so that it doesn't
> > have to rely on NBuffers being the same between start of the scan,
> > when memory allocated, when the scan ends - a condition possible with
> > resizing buffer cache. It seems to improve the timings by about 10-30%
> > on my laptop for 128MB buffercache size. Without this patch the time
> > taken to execute Lukas's query varies between 10-15ms on my laptop.
> > With this patch it varies between 8-9ms. So the timing is more stable
> > as a side effect. It's not a 10x improvement that we are looking for
> > but it looks like a step in the right direction. That improvement
> > seems to come purely because we avoid creating a heap tuple. I wonder
> > if there are some places up in the execution tree where full
> > heaptuples get formed again instead of continuing to use minimal
> > tuples or places where we perform some extra actions that are not
> > required.
> >
> > I didn't dig into the history to find out why we didn't modernize
> > pg_buffercache_pages(). I don't see any hazard though.
>
> Committed this modernization patch, thanks!
>
> It would be nice to have a proper row-at-a-time mode that would avoid
> materializing the result, but collecting all the data in a temporary
> array is clearly worse than just putting them to the tuplestore
> directly. The only reason I can think of why we'd prefer to use a
> temporary array like that is to get a more consistent snapshot of all
> the buffers, by keeping the time spent scanning the buffers as short as
> possible. But we're not getting a consistent view anyway, it's just a
> matter of degree.
>

Thanks a lot. Makes code in buffer resizing a bit simpler esp. code
changes in this module. Probably it won't need any code changes now in
the buffer resizing patches.

> I wondered about this in pg_buffercache_pages.c:
>
> >       /*
> >        * To smoothly support upgrades from version 1.0 of this extension
> >        * transparently handle the (non-)existence of the pinning_backends
> >        * column. We unfortunately have to get the result type for that... - we
> >        * can't use the result type determined by the function definition without
> >        * potentially crashing when somebody uses the old (or even wrong)
> >        * function definition though.
> >        */
> >       if (get_call_result_type(fcinfo, NULL, &expected_tupledesc) != TYPEFUNC_COMPOSITE)
> >               elog(ERROR, "return type must be a row type");
> >
> >       if (expected_tupledesc->natts < NUM_BUFFERCACHE_PAGES_MIN_ELEM ||
> >               expected_tupledesc->natts > NUM_BUFFERCACHE_PAGES_ELEM)
> >               elog(ERROR, "incorrect number of output arguments");
>
> I guess it's still needed, if you have pg_upgraded all the way from 1.0.
> To test that, I created this view to match the old 1.0 definition:
>
> CREATE VIEW public.legacy_pg_buffercache AS
>   SELECT bufferid,
>      relfilenode,
>      reltablespace,
>      reldatabase,
>      relforknumber,
>      relblocknumber,
>      isdirty,
>      usagecount
>     FROM public.pg_buffercache_pages() p(bufferid integer, relfilenode
> oid, reltablespace oid, reldatabase oid, relforknumber smallint,
> relblocknumber bigint, isdirty boolean, usagecount smallint);
>
> "select * from public.legacy_pg_buffercache" still works, so all good.

Yeah. At some point we should get rid of this code, but it would
require some "upgrade" action from the customer as well.

--
Best Wishes,
Ashutosh Bapat



pgsql-hackers by date:

Previous
From: Andres Freund
Date:
Subject: Re: Implement waiting for wal lsn replay: reloaded
Next
From: Heikki Linnakangas
Date:
Subject: Re: Better shared data structure management and resizable shared data structures