Thread: monitoring usage count distribution
My colleague Jeremy Schneider (CC'd) was recently looking into usage count distributions for various workloads, and he mentioned that it would be nice to have an easy way to do $SUBJECT. I've attached a patch that adds a pg_buffercache_usage_counts() function. This function returns a row per possible usage count with some basic information about the corresponding buffers. postgres=# SELECT * FROM pg_buffercache_usage_counts(); usage_count | buffers | dirty | pinned -------------+---------+-------+-------- 0 | 0 | 0 | 0 1 | 1436 | 671 | 0 2 | 102 | 88 | 0 3 | 23 | 21 | 0 4 | 9 | 7 | 0 5 | 164 | 106 | 0 (6 rows) This new function provides essentially the same information as pg_buffercache_summary(), but pg_buffercache_summary() only shows the average usage count for the buffers in use. If there is interest in this idea, another approach to consider could be to alter pg_buffercache_summary() instead. Thoughts? -- Nathan Bossart Amazon Web Services: https://aws.amazon.com
Attachment
On Mon, 30 Jan 2023 at 18:31, Nathan Bossart <nathandbossart@gmail.com> wrote: > > My colleague Jeremy Schneider (CC'd) was recently looking into usage count > distributions for various workloads, and he mentioned that it would be nice > to have an easy way to do $SUBJECT. I've attached a patch that adds a > pg_buffercache_usage_counts() function. This function returns a row per > possible usage count with some basic information about the corresponding > buffers. > > postgres=# SELECT * FROM pg_buffercache_usage_counts(); > usage_count | buffers | dirty | pinned > -------------+---------+-------+-------- > 0 | 0 | 0 | 0 > 1 | 1436 | 671 | 0 > 2 | 102 | 88 | 0 > 3 | 23 | 21 | 0 > 4 | 9 | 7 | 0 > 5 | 164 | 106 | 0 > (6 rows) > > This new function provides essentially the same information as > pg_buffercache_summary(), but pg_buffercache_summary() only shows the > average usage count for the buffers in use. If there is interest in this > idea, another approach to consider could be to alter > pg_buffercache_summary() instead. Tom expressed skepticism that there's wide interest here. It seems as much from the lack of response. But perhaps that's just because people don't understand what the importance of this info is -- I certainly don't :) I feel like the original sin here is having the function return an aggregate data. If it returned the raw data then people could slice, dice, and aggregate the data in any ways they want using SQL. And perhaps people would come up with queries that have more readily interpretable important information? Obviously there are performance questions in that but I suspect they might be solvable given how small the data for each buffer are. Just as a warning though -- if nobody was interested in this patch please don't take my comments as a recommendation that you spend a lot of time developing a more complex version in the same direction without seeing if anyone agrees with my suggestion :) -- greg
On Mon, Jan 30, 2023 at 6:30 PM Nathan Bossart <nathandbossart@gmail.com> wrote: > My colleague Jeremy Schneider (CC'd) was recently looking into usage count > distributions for various workloads, and he mentioned that it would be nice > to have an easy way to do $SUBJECT. I've attached a patch that adds a > pg_buffercache_usage_counts() function. This function returns a row per > possible usage count with some basic information about the corresponding > buffers. > > postgres=# SELECT * FROM pg_buffercache_usage_counts(); > usage_count | buffers | dirty | pinned > -------------+---------+-------+-------- > 0 | 0 | 0 | 0 > 1 | 1436 | 671 | 0 > 2 | 102 | 88 | 0 > 3 | 23 | 21 | 0 > 4 | 9 | 7 | 0 > 5 | 164 | 106 | 0 > (6 rows) > > This new function provides essentially the same information as > pg_buffercache_summary(), but pg_buffercache_summary() only shows the > average usage count for the buffers in use. If there is interest in this > idea, another approach to consider could be to alter > pg_buffercache_summary() instead. I'm skeptical that pg_buffercache_summary() is a good idea at all, but having it display the average usage count seems like a particularly poor idea. That information is almost meaningless. Replacing that with a six-element integer array would be a clear improvement and, IMHO, better than adding yet another function to the extension. -- Robert Haas EDB: http://www.enterprisedb.com
Robert Haas <robertmhaas@gmail.com> writes: > On Mon, Jan 30, 2023 at 6:30 PM Nathan Bossart <nathandbossart@gmail.com> wrote: >> My colleague Jeremy Schneider (CC'd) was recently looking into usage count >> distributions for various workloads, and he mentioned that it would be nice >> to have an easy way to do $SUBJECT. > I'm skeptical that pg_buffercache_summary() is a good idea at all, but > having it display the average usage count seems like a particularly > poor idea. That information is almost meaningless. Replacing that with > a six-element integer array would be a clear improvement and, IMHO, > better than adding yet another function to the extension. I had not realized that pg_buffercache_summary() is new in v16, but since it is, we still have time to rethink its definition. +1 for de-aggregating --- I agree that the overall average is unlikely to have much value. regards, tom lane
On Tue, Apr 4, 2023 at 2:40 PM Tom Lane <tgl@sss.pgh.pa.us> wrote: > > Robert Haas <robertmhaas@gmail.com> writes: > > On Mon, Jan 30, 2023 at 6:30 PM Nathan Bossart <nathandbossart@gmail.com> wrote: > >> My colleague Jeremy Schneider (CC'd) was recently looking into usage count > >> distributions for various workloads, and he mentioned that it would be nice > >> to have an easy way to do $SUBJECT. > > > I'm skeptical that pg_buffercache_summary() is a good idea at all, but > > having it display the average usage count seems like a particularly > > poor idea. That information is almost meaningless. Replacing that with > > a six-element integer array would be a clear improvement and, IMHO, > > better than adding yet another function to the extension. > > I had not realized that pg_buffercache_summary() is new in v16, > but since it is, we still have time to rethink its definition. > +1 for de-aggregating --- I agree that the overall average is > unlikely to have much value. So, I have used pg_buffercache_summary() to give me a high-level idea of the usage count when I am benchmarking a particular workload -- and I would have found it harder to look at 6 rows instead of 1. That being said, having six rows is more versatile as you could aggregate it yourself easily. - Melanie
Hi, On 2023-04-04 14:14:36 -0400, Greg Stark wrote: > Tom expressed skepticism that there's wide interest here. It seems as > much from the lack of response. But perhaps that's just because people > don't understand what the importance of this info is -- I certainly > don't :) pg_buffercache has exposed the raw data for a long time. The problem is that it's way too slow to look at that way. Greetings, Andres Freund
Hi, On 2023-04-04 14:31:36 -0400, Robert Haas wrote: > On Mon, Jan 30, 2023 at 6:30 PM Nathan Bossart <nathandbossart@gmail.com> wrote: > > My colleague Jeremy Schneider (CC'd) was recently looking into usage count > > distributions for various workloads, and he mentioned that it would be nice > > to have an easy way to do $SUBJECT. I've attached a patch that adds a > > pg_buffercache_usage_counts() function. This function returns a row per > > possible usage count with some basic information about the corresponding > > buffers. > > > > postgres=# SELECT * FROM pg_buffercache_usage_counts(); > > usage_count | buffers | dirty | pinned > > -------------+---------+-------+-------- > > 0 | 0 | 0 | 0 > > 1 | 1436 | 671 | 0 > > 2 | 102 | 88 | 0 > > 3 | 23 | 21 | 0 > > 4 | 9 | 7 | 0 > > 5 | 164 | 106 | 0 > > (6 rows) > > > > This new function provides essentially the same information as > > pg_buffercache_summary(), but pg_buffercache_summary() only shows the > > average usage count for the buffers in use. If there is interest in this > > idea, another approach to consider could be to alter > > pg_buffercache_summary() instead. > > I'm skeptical that pg_buffercache_summary() is a good idea at all Why? It's about two orders of magnitude faster than querying the equivalent data by aggregating in SQL. And knowing how many free and dirty buffers are over time is something quite useful to monitor / correlate with performance issues. > but having it display the average usage count seems like a particularly poor > idea. That information is almost meaningless. I agree there are more meaningful ways to represent the data, but I don't agree that it's almost meaningless. It can give you a rough estimate of whether data in s_b is referenced or not. > Replacing that with a six-element integer array would be a clear improvement > and, IMHO, better than adding yet another function to the extension. I'd have no issue with that. Greetings, Andres Freund
On Tue, Apr 4, 2023 at 7:29 PM Andres Freund <andres@anarazel.de> wrote: > > I'm skeptical that pg_buffercache_summary() is a good idea at all > > Why? It's about two orders of magnitude faster than querying the equivalent > data by aggregating in SQL. And knowing how many free and dirty buffers are > over time is something quite useful to monitor / correlate with performance > issues. Well, OK, fair point. > > but having it display the average usage count seems like a particularly poor > > idea. That information is almost meaningless. > > I agree there are more meaningful ways to represent the data, but I don't > agree that it's almost meaningless. It can give you a rough estimate of > whether data in s_b is referenced or not. I might have overstated my case. > > Replacing that with a six-element integer array would be a clear improvement > > and, IMHO, better than adding yet another function to the extension. > > I'd have no issue with that. Cool. -- Robert Haas EDB: http://www.enterprisedb.com
On Wed, Apr 05, 2023 at 09:44:58AM -0400, Robert Haas wrote: > On Tue, Apr 4, 2023 at 7:29 PM Andres Freund <andres@anarazel.de> wrote: >> > Replacing that with a six-element integer array would be a clear improvement >> > and, IMHO, better than adding yet another function to the extension. >> >> I'd have no issue with that. > > Cool. The six-element array approach won't show the number of dirty and pinned buffers for each usage count, but I'm not sure that's a deal-breaker. Barring objections, I'll post an updated patch shortly with that approach. -- Nathan Bossart Amazon Web Services: https://aws.amazon.com
On Wed, Apr 5, 2023 at 1:51 PM Nathan Bossart <nathandbossart@gmail.com> wrote: > On Wed, Apr 05, 2023 at 09:44:58AM -0400, Robert Haas wrote: > > On Tue, Apr 4, 2023 at 7:29 PM Andres Freund <andres@anarazel.de> wrote: > >> > Replacing that with a six-element integer array would be a clear improvement > >> > and, IMHO, better than adding yet another function to the extension. > >> > >> I'd have no issue with that. > > > > Cool. > > The six-element array approach won't show the number of dirty and pinned > buffers for each usage count, but I'm not sure that's a deal-breaker. > Barring objections, I'll post an updated patch shortly with that approach. Right, well, I would personally be OK with 6 rows too, but I don't know what other people want. I think either this or that is better than average. -- Robert Haas EDB: http://www.enterprisedb.com
Robert Haas <robertmhaas@gmail.com> writes: > On Wed, Apr 5, 2023 at 1:51 PM Nathan Bossart <nathandbossart@gmail.com> wrote: >> The six-element array approach won't show the number of dirty and pinned >> buffers for each usage count, but I'm not sure that's a deal-breaker. >> Barring objections, I'll post an updated patch shortly with that approach. > Right, well, I would personally be OK with 6 rows too, but I don't > know what other people want. I think either this or that is better > than average. Seems to me that six rows would be easier to aggregate manually. An array column seems less SQL-ish and harder to manipulate. regards, tom lane
Hi, On 2023-04-05 15:00:20 -0400, Robert Haas wrote: > On Wed, Apr 5, 2023 at 1:51 PM Nathan Bossart <nathandbossart@gmail.com> wrote: > > On Wed, Apr 05, 2023 at 09:44:58AM -0400, Robert Haas wrote: > > > On Tue, Apr 4, 2023 at 7:29 PM Andres Freund <andres@anarazel.de> wrote: > > >> > Replacing that with a six-element integer array would be a clear improvement > > >> > and, IMHO, better than adding yet another function to the extension. > > >> > > >> I'd have no issue with that. > > > > > > Cool. > > > > The six-element array approach won't show the number of dirty and pinned > > buffers for each usage count, but I'm not sure that's a deal-breaker. > > Barring objections, I'll post an updated patch shortly with that approach. > > Right, well, I would personally be OK with 6 rows too, but I don't > know what other people want. I think either this or that is better > than average. I would not mind having a separate function returning 6 rows, if we really want that, but making pg_buffercache_summary() return 6 rows would imo make it less useful for getting a quick overview. At least I am not that quick summing up multple rows, just to get a quick overview over the number of dirty rows. Greetings, Andres Freund
On Wed, Apr 05, 2023 at 03:07:10PM -0400, Tom Lane wrote: > Seems to me that six rows would be easier to aggregate manually. > An array column seems less SQL-ish and harder to manipulate. +1 -- Nathan Bossart Amazon Web Services: https://aws.amazon.com
On Wed, Apr 05, 2023 at 12:09:21PM -0700, Andres Freund wrote: > I would not mind having a separate function returning 6 rows, if we really > want that, but making pg_buffercache_summary() return 6 rows would imo make it > less useful for getting a quick overview. At least I am not that quick summing > up multple rows, just to get a quick overview over the number of dirty rows. This is what v1-0001 does. We could probably make pg_buffercache_summary a view on pg_buffercache_usage_counts, too, but that doesn't strike me as tremendously important. -- Nathan Bossart Amazon Web Services: https://aws.amazon.com
Nathan Bossart <nathandbossart@gmail.com> writes: > On Wed, Apr 05, 2023 at 12:09:21PM -0700, Andres Freund wrote: >> I would not mind having a separate function returning 6 rows, if we really >> want that, but making pg_buffercache_summary() return 6 rows would imo make it >> less useful for getting a quick overview. At least I am not that quick summing >> up multple rows, just to get a quick overview over the number of dirty rows. > This is what v1-0001 does. We could probably make pg_buffercache_summary a > view on pg_buffercache_usage_counts, too, but that doesn't strike me as > tremendously important. Having two functions doesn't seem unreasonable to me either. Robert spoke against it to start with, does he still want to advocate for that? regards, tom lane
On Wed, Apr 5, 2023 at 4:16 PM Tom Lane <tgl@sss.pgh.pa.us> wrote: > Nathan Bossart <nathandbossart@gmail.com> writes: > > On Wed, Apr 05, 2023 at 12:09:21PM -0700, Andres Freund wrote: > >> I would not mind having a separate function returning 6 rows, if we really > >> want that, but making pg_buffercache_summary() return 6 rows would imo make it > >> less useful for getting a quick overview. At least I am not that quick summing > >> up multple rows, just to get a quick overview over the number of dirty rows. > > > This is what v1-0001 does. We could probably make pg_buffercache_summary a > > view on pg_buffercache_usage_counts, too, but that doesn't strike me as > > tremendously important. > > Having two functions doesn't seem unreasonable to me either. > Robert spoke against it to start with, does he still want to > advocate for that? My position is that if we replace the average usage count with something that gives a count for each usage count, that's a win. I don't have a strong opinion on an array vs. a result set vs. some other way of doing that. If we leave the average usage count in there and add yet another function to give the detail, I tend to think that's not a great plan, but I'll desist if everyone else thinks otherwise. -- Robert Haas EDB: http://www.enterprisedb.com
Robert Haas <robertmhaas@gmail.com> writes: > On Wed, Apr 5, 2023 at 4:16 PM Tom Lane <tgl@sss.pgh.pa.us> wrote: >> Having two functions doesn't seem unreasonable to me either. >> Robert spoke against it to start with, does he still want to >> advocate for that? > My position is that if we replace the average usage count with > something that gives a count for each usage count, that's a win. I > don't have a strong opinion on an array vs. a result set vs. some > other way of doing that. If we leave the average usage count in there > and add yet another function to give the detail, I tend to think > that's not a great plan, but I'll desist if everyone else thinks > otherwise. There seems to be enough support for the existing summary function definition to leave it as-is; Andres likes it for one, and I'm not excited about trying to persuade him he's wrong. But a second slightly-less-aggregated summary function is clearly useful as well. So I'm now thinking that we do want the patch as-submitted. (Caveat: I've not read the patch, just the description.) regards, tom lane
On Thu, Apr 06, 2023 at 01:32:35PM -0400, Tom Lane wrote: > There seems to be enough support for the existing summary function > definition to leave it as-is; Andres likes it for one, and I'm not > excited about trying to persuade him he's wrong. But a second > slightly-less-aggregated summary function is clearly useful as well. > So I'm now thinking that we do want the patch as-submitted. > (Caveat: I've not read the patch, just the description.) In case we want to do both, here's a 0002 that changes usagecount_avg to an array of usage counts. -- Nathan Bossart Amazon Web Services: https://aws.amazon.com
Attachment
Nathan Bossart <nathandbossart@gmail.com> writes: > On Thu, Apr 06, 2023 at 01:32:35PM -0400, Tom Lane wrote: >> There seems to be enough support for the existing summary function >> definition to leave it as-is; Andres likes it for one, and I'm not >> excited about trying to persuade him he's wrong. But a second >> slightly-less-aggregated summary function is clearly useful as well. >> So I'm now thinking that we do want the patch as-submitted. >> (Caveat: I've not read the patch, just the description.) > In case we want to do both, here's a 0002 that changes usagecount_avg to an > array of usage counts. I'm not sure if there is consensus for 0002, but I reviewed and pushed 0001. I made one non-cosmetic change: it no longer skips invalid buffers. Otherwise, the row for usage count 0 would be pretty useless. Also it seemed to me that sum(buffers) ought to agree with the shared_buffers setting. regards, tom lane
On Fri, Apr 07, 2023 at 02:29:31PM -0400, Tom Lane wrote: > I'm not sure if there is consensus for 0002, but I reviewed and pushed > 0001. I made one non-cosmetic change: it no longer skips invalid > buffers. Otherwise, the row for usage count 0 would be pretty useless. > Also it seemed to me that sum(buffers) ought to agree with the > shared_buffers setting. Makes sense. Thanks! -- Nathan Bossart Amazon Web Services: https://aws.amazon.com