Re: pg_buffercache: Add per-relation summary stats - Mailing list pgsql-hackers

From Andres Freund
Subject Re: pg_buffercache: Add per-relation summary stats
Msg-id bdljrq3k5wm2jzxntpbenzbyozjx4syf43taodgxasc6qbqooo@nvnstsdlhboe
In response to Re: pg_buffercache: Add per-relation summary stats  (Heikki Linnakangas <hlinnaka@iki.fi>)
Responses Re: pg_buffercache: Add per-relation summary stats
List pgsql-hackers
Hi,

On 2026-04-07 16:07:45 +0300, Heikki Linnakangas wrote:
> On 28/03/2026 06:18, Ashutosh Bapat wrote:
> > In parallel, Palak Chaturvedi and I developed a quick patch to
> > modernise pg_buffercache_pages() and use a tuplestore, so that it
> > doesn't have to rely on NBuffers being the same between the start of
> > the scan, when memory is allocated, and the end of the scan - a
> > condition that can be violated when the buffer cache is resized. It
> > seems to improve the timings by about 10-30% on my laptop for a 128MB
> > buffer cache. Without this patch the time taken to execute Lukas's
> > query varies between 10-15ms on my laptop; with the patch it varies
> > between 8-9ms, so the timing is also more stable as a side effect.
> > It's not the 10x improvement we are looking for, but it looks like a
> > step in the right direction. That improvement seems to come purely
> > from avoiding the creation of a heap tuple. I wonder if there are
> > places up in the execution tree where full heap tuples get formed
> > again instead of continuing to use minimal tuples, or places where we
> > perform some extra actions that are not required.

I don't think that's the reason for the improvement - tuplestore_putvalues()
forms a minimal tuple, and the cost to form a minimal tuple and a heap tuple
aren't meaningfully different.

I think the problem is that we materialize row-mode SRFs into a tuplestore
when they appear in the FROM list.  You can easily see this even with just
generate_series():

postgres[1520825][1]=# SELECT count(*) FROM generate_series(1, 1000000);
┌─────────┐
│  count  │
├─────────┤
│ 1000000 │
└─────────┘
(1 row)

Time: 117.939 ms
postgres[1520825][1]=# SELECT count(*) FROM (SELECT generate_series(1, 1000000));
┌─────────┐
│  count  │
├─────────┤
│ 1000000 │
└─────────┘
(1 row)

Time: 58.914 ms
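

One way to see where the difference comes from (a sketch; the exact plan
shape depends on version and settings) is to compare the plans: the
FROM-list form goes through a Function Scan node, whose SRF output is
collected into a tuplestore, while the select-list form is evaluated by a
ProjectSet node, which streams rows:

EXPLAIN (COSTS OFF) SELECT count(*) FROM generate_series(1, 1000000);
EXPLAIN (COSTS OFF) SELECT count(*) FROM (SELECT generate_series(1, 1000000));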


Of course, because pg_buffercache_pages() is archaically defined without
defining its output columns, you can't actually use it in the select list.

But that can be fixed:

CREATE FUNCTION pg_buffercache_pages_fast(OUT bufferid integer, OUT relfilenode oid, OUT reltablespace oid,
         OUT reldatabase oid, OUT relforknumber int2, OUT relblocknumber int8, OUT isdirty bool,
         OUT usagecount int2, OUT pinning_backends int4)
RETURNS SETOF RECORD
AS '$libdir/pg_buffercache', 'pg_buffercache_pages'
LANGUAGE C PARALLEL SAFE;

60GB of s_b, mostly filled, with 257c8231bf97a77378f6fedb826b1243f0a41612
reverted.

SELECT count(*) FROM (SELECT pg_buffercache_pages_fast());
Time: 1518.704 ms (00:01.519)

SELECT count(*) FROM pg_buffercache_pages_fast();
Time: 2008.101 ms (00:02.008)
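
For reference, the select-list form returns a composite value per row; if
the individual columns are needed, it can still be expanded with a wrapping
subquery (a sketch, assuming the pg_buffercache_pages_fast() definition
above):

SELECT (f).bufferid, (f).isdirty
FROM (SELECT pg_buffercache_pages_fast() AS f) AS s;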


> > I didn't dig into the history to find out why we didn't modernize
> > pg_buffercache_pages(). I don't see any hazard though.
>
> Committed this modernization patch, thanks!
>
> It would be nice to have a proper row-at-a-time mode that would avoid
> materializing the result, but collecting all the data in a temporary array
> is clearly worse than just putting them to the tuplestore directly. The only
> reason I can think of why we'd prefer to use a temporary array like that is
> to get a more consistent snapshot of all the buffers, by keeping the time
> spent scanning the buffers as short as possible. But we're not getting a
> consistent view anyway, it's just a matter of degree.

Seems like a reasonably large difference in degree whether you have a snapshot
collected in one loop, or you do things like spilling a tuplestore to disk in
between.
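
The spilling is easy to provoke: the tuplestore built for a FROM-list SRF
is bounded by work_mem, so with a small enough setting it goes to disk
mid-scan (a sketch, numbers arbitrary):

SET work_mem = '64kB';
SELECT count(*) FROM generate_series(1, 1000000);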

Greetings,

Andres Freund


