On Mon, 30 Jan 2023 at 18:31, Nathan Bossart <nathandbossart@gmail.com> wrote:
>
> My colleague Jeremy Schneider (CC'd) was recently looking into usage count
> distributions for various workloads, and he mentioned that it would be nice
> to have an easy way to do $SUBJECT. I've attached a patch that adds a
> pg_buffercache_usage_counts() function. This function returns a row per
> possible usage count with some basic information about the corresponding
> buffers.
>
> postgres=# SELECT * FROM pg_buffercache_usage_counts();
> usage_count | buffers | dirty | pinned
> -------------+---------+-------+--------
> 0 | 0 | 0 | 0
> 1 | 1436 | 671 | 0
> 2 | 102 | 88 | 0
> 3 | 23 | 21 | 0
> 4 | 9 | 7 | 0
> 5 | 164 | 106 | 0
> (6 rows)
>
> This new function provides essentially the same information as
> pg_buffercache_summary(), but pg_buffercache_summary() only shows the
> average usage count for the buffers in use. If there is interest in this
> idea, another approach to consider could be to alter
> pg_buffercache_summary() instead.
Tom expressed skepticism that there's wide interest here. It seems as
much from the lack of response. But perhaps that's just because people
don't understand what the importance of this info is -- I certainly
don't :)
I feel like the original sin here is having the function return an
aggregate data. If it returned the raw data then people could slice,
dice, and aggregate the data in any ways they want using SQL. And
perhaps people would come up with queries that have more readily
interpretable important information?
Obviously there are performance questions in that but I suspect they
might be solvable given how small the data for each buffer are.
Just as a warning though -- if nobody was interested in this patch
please don't take my comments as a recommendation that you spend a lot
of time developing a more complex version in the same direction
without seeing if anyone agrees with my suggestion :)
--
greg