Re: pg_buffercache: Add per-relation summary stats - Mailing list pgsql-hackers
| From | Andres Freund |
|---|---|
| Subject | Re: pg_buffercache: Add per-relation summary stats |
| Date | |
| Msg-id | bdljrq3k5wm2jzxntpbenzbyozjx4syf43taodgxasc6qbqooo@nvnstsdlhboe |
| In response to | Re: pg_buffercache: Add per-relation summary stats (Heikki Linnakangas <hlinnaka@iki.fi>) |
| Responses | Re: pg_buffercache: Add per-relation summary stats |
| List | pgsql-hackers |
Hi,
On 2026-04-07 16:07:45 +0300, Heikki Linnakangas wrote:
> On 28/03/2026 06:18, Ashutosh Bapat wrote:
> > In parallel, Palak Chaturvedi and I developed a quick patch to
> > modernise pg_buffercache_pages() and use a tuplestore, so that it
> > doesn't have to rely on NBuffers staying the same between the start of
> > the scan, when memory is allocated, and the end of the scan - a
> > condition that becomes possible with buffer cache resizing. It seems
> > to improve the timings by about 10-30% on my laptop for a 128MB
> > buffercache size. Without this patch the time taken to execute Lukas's
> > query varies between 10-15ms on my laptop; with this patch it varies
> > between 8-9ms, so the timing is more stable as a side effect. It's not
> > the 10x improvement we are looking for, but it looks like a step in
> > the right direction. That improvement seems to come purely from
> > avoiding the creation of a heap tuple. I wonder if there are places up
> > in the execution tree where full heap tuples get formed again instead
> > of continuing to use minimal tuples, or where we perform some extra
> > actions that are not required.
I don't think that's the reason for the improvement - tuplestore_putvalues()
forms a minimal tuple, and the costs of forming a minimal tuple and a heap
tuple aren't meaningfully different.
I think the problem is that we materialize row-mode SRFs into a tuplestore if
they are in the FROM list. You can easily see this even with just
generate_series():
postgres[1520825][1]=# SELECT count(*) FROM generate_series(1, 1000000);
┌─────────┐
│ count │
├─────────┤
│ 1000000 │
└─────────┘
(1 row)
Time: 117.939 ms
postgres[1520825][1]=# SELECT count(*) FROM (SELECT generate_series(1, 1000000));
┌─────────┐
│ count │
├─────────┤
│ 1000000 │
└─────────┘
(1 row)
Time: 58.914 ms
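The materialization is visible in the plans (a sketch - exact plan shapes
vary by version): the FROM-list form runs as a Function Scan, which
materializes a row-mode SRF into a tuplestore, while the select-list form
goes through ProjectSet and streams the rows:

-- sketch: compare the plan shapes of the two forms
EXPLAIN SELECT count(*) FROM generate_series(1, 1000000);
EXPLAIN SELECT count(*) FROM (SELECT generate_series(1, 1000000));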
Of course, because pg_buffercache_pages() is archaically defined without
declaring its output columns, you can't actually use it in the select list.
But that can be fixed:
CREATE FUNCTION pg_buffercache_pages_fast(
    OUT bufferid integer, OUT relfilenode oid, OUT reltablespace oid,
    OUT reldatabase oid, OUT relforknumber int2, OUT relblocknumber int8,
    OUT isdirty bool, OUT usagecount int2, OUT pinning_backends int4)
RETURNS SETOF RECORD
AS '$libdir/pg_buffercache', 'pg_buffercache_pages'
LANGUAGE C PARALLEL SAFE;
60GB of s_b, mostly filled, with commit
257c8231bf97a77378f6fedb826b1243f0a41612 reverted:
SELECT count(*) FROM (SELECT pg_buffercache_pages_fast());
Time: 1518.704 ms (00:01.519)
SELECT count(*) FROM pg_buffercache_pages_fast();
Time: 2008.101 ms (00:02.008)
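For reference, a per-relation rollup then works directly against the
function; this is a sketch adapted from the example query in the
pg_buffercache documentation, using the pg_buffercache_pages_fast()
definition above:

-- sketch: per-relation buffer counts, adapted from the pg_buffercache docs
SELECT c.relname, count(*) AS buffers
FROM pg_buffercache_pages_fast() b
JOIN pg_class c ON b.relfilenode = pg_relation_filenode(c.oid)
WHERE b.reldatabase IN
      (0, (SELECT oid FROM pg_database WHERE datname = current_database()))
GROUP BY c.relname
ORDER BY 2 DESC
LIMIT 10;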
> > I didn't dig into the history to find out why we didn't modernize
> > pg_buffercache_pages(). I don't see any hazard though.
>
> Committed this modernization patch, thanks!
>
> It would be nice to have a proper row-at-a-time mode that would avoid
> materializing the result, but collecting all the data in a temporary array
> is clearly worse than just putting the rows into the tuplestore directly.
> The only reason I can think of why we'd prefer a temporary array like that
> is to get a more consistent snapshot of all the buffers, by keeping the
> time spent scanning the buffers as short as possible. But we're not getting
> a consistent view anyway, it's just a matter of degree.
Seems like a reasonably large difference in degree whether you have a snapshot
collected in one tight loop, or whether you do things like spilling a
tuplestore to disk in between.
Greetings,
Andres Freund