Re: monitoring usage count distribution - Mailing list pgsql-hackers

From Andres Freund
Subject Re: monitoring usage count distribution
Date
Msg-id 20230404232919.uibzbhjdylk3mlvp@awork3.anarazel.de
In response to Re: monitoring usage count distribution  (Robert Haas <robertmhaas@gmail.com>)
Responses Re: monitoring usage count distribution
List pgsql-hackers
Hi,

On 2023-04-04 14:31:36 -0400, Robert Haas wrote:
> On Mon, Jan 30, 2023 at 6:30 PM Nathan Bossart <nathandbossart@gmail.com> wrote:
> > My colleague Jeremy Schneider (CC'd) was recently looking into usage count
> > distributions for various workloads, and he mentioned that it would be nice
> > to have an easy way to do $SUBJECT.  I've attached a patch that adds a
> > pg_buffercache_usage_counts() function.  This function returns a row per
> > possible usage count with some basic information about the corresponding
> > buffers.
> >
> >     postgres=# SELECT * FROM pg_buffercache_usage_counts();
> >      usage_count | buffers | dirty | pinned
> >     -------------+---------+-------+--------
> >                0 |       0 |     0 |      0
> >                1 |    1436 |   671 |      0
> >                2 |     102 |    88 |      0
> >                3 |      23 |    21 |      0
> >                4 |       9 |     7 |      0
> >                5 |     164 |   106 |      0
> >     (6 rows)
> >
> > This new function provides essentially the same information as
> > pg_buffercache_summary(), but pg_buffercache_summary() only shows the
> > average usage count for the buffers in use.  If there is interest in this
> > idea, another approach to consider could be to alter
> > pg_buffercache_summary() instead.
> 
> I'm skeptical that pg_buffercache_summary() is a good idea at all

Why? It's about two orders of magnitude faster than querying the equivalent
data by aggregating in SQL. And knowing how many free and dirty buffers there
are over time is quite useful to monitor and to correlate with performance
issues.
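
For comparison, a rough sketch of what "aggregating in SQL" could look like
against the pg_buffercache view. The column names are those of the view; the
output aliases are chosen here to mirror pg_buffercache_summary(). This is an
illustration, not a query taken from the thread or the patch.

```sql
-- Assumes the pg_buffercache extension is installed in the current database.
-- Unused buffers have NULL in every column except bufferid.
SELECT count(*) FILTER (WHERE relfilenode IS NOT NULL) AS buffers_used,
       count(*) FILTER (WHERE relfilenode IS NULL)     AS buffers_unused,
       count(*) FILTER (WHERE isdirty)                 AS buffers_dirty,
       count(*) FILTER (WHERE pinning_backends > 0)    AS buffers_pinned,
       avg(usagecount)                                 AS usagecount_avg  -- avg() skips NULLs
FROM pg_buffercache;
```

A query like this has to materialize one row per buffer before aggregating,
which is the cost the summary function is meant to avoid.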


> but having it display the average usage count seems like a particularly poor
> idea. That information is almost meaningless.

I agree there are more meaningful ways to represent the data, but I don't
agree that it's almost meaningless. It can give you a rough estimate of
whether data in s_b (shared_buffers) is referenced or not.


> Replacing that with a six-element integer array would be a clear improvement
> and, IMHO, better than adding yet another function to the extension.

I'd have no issue with that.
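
To illustrate the shape such a six-element array might take, the query below
builds one from the pg_buffercache view, with one element per usage count from
0 to 5. The alias usagecount_buffers is hypothetical; this is only a sketch of
the idea, not the proposed implementation.

```sql
-- Assumes the pg_buffercache extension is installed.
-- generate_series(0, 5) guarantees all six slots exist, even when empty.
-- Unused buffers have a NULL usagecount, so they are not counted in slot 0.
SELECT array_agg(buffers ORDER BY usage_count) AS usagecount_buffers
FROM (
    SELECT uc.usage_count, count(b.bufferid) AS buffers
    FROM generate_series(0, 5) AS uc(usage_count)
    LEFT JOIN pg_buffercache AS b ON b.usagecount = uc.usage_count
    GROUP BY uc.usage_count
) AS per_count;
```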

Greetings,

Andres Freund


