Re: Summary function for pg_buffercache - Mailing list pgsql-hackers

From Melih Mutlu
Subject Re: Summary function for pg_buffercache
Msg-id CAGPVpCRtjkm9jVAq6ND3NPTBAvs6mLYUu9fTm6ZgNh0FXNBU=g@mail.gmail.com
In response to Re: Summary function for pg_buffercache  (Melih Mutlu <m.melihmutlu@gmail.com>)
Responses Re: Summary function for pg_buffercache
List pgsql-hackers
Hi,

> Also I suggest changing the names of the columns in order to make them consistent with the rest of the system. If you consider pg_stat_activity and family [1] you will notice that the columns are named (entity)_(property), e.g. backend_xid, backend_type, client_addr, etc. So instead of used_buffers and unused_buffers the naming should be buffers_used and buffers_unused.
>
> [1]: https://www.postgresql.org/docs/current/monitoring-stats.html
 
I changed these names and updated the patch.

> However I have somewhat mixed feelings about avg_usagecount. Generally AVG() is a relatively useless metric for monitoring. What if the user wants MIN(), MAX() or, let's say, a 99th percentile? I suggest splitting it into usagecount_min, usagecount_max and usagecount_sum. AVG() can be derived as usagecount_sum / used_buffers.

Wouldn't usagecount_max almost always be 5, since BM_MAX_USAGE_COUNT is set to 5 in buf_internals.h? I'm not sure how much usagecount_min would add either.
A usagecount is always an integer between 0 and 5; it's not unbounded. I think the 99th percentile would be much better than the average if strong outliers could occur. But in this case, I feel an average value would be sufficiently useful as well.
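Because the usage count is bounded, a percentile would not even need sorting: six counters, one per possible value, are enough to answer it exactly. A minimal C sketch of that idea, assuming the bound of 5 taken from the BM_MAX_USAGE_COUNT value mentioned above; `usagecount_percentile` and `SKETCH_MAX_USAGE_COUNT` are hypothetical names, not part of pg_buffercache, and the nearest-rank definition of a percentile is my choice for illustration.

```c
/* Assumed bound, mirroring the BM_MAX_USAGE_COUNT value (5) cited in this thread. */
#define SKETCH_MAX_USAGE_COUNT 5

/*
 * Hypothetical helper: nearest-rank percentile over a usage-count histogram.
 * hist[c] is the number of buffers whose usage count equals c.
 * pct must be in 1..100. Returns -1 for an empty histogram or an invalid pct.
 */
int
usagecount_percentile(const unsigned long long hist[SKETCH_MAX_USAGE_COUNT + 1],
                      unsigned int pct)
{
    unsigned long long total = 0;

    for (int c = 0; c <= SKETCH_MAX_USAGE_COUNT; c++)
        total += hist[c];
    if (total == 0 || pct == 0 || pct > 100)
        return -1;

    /* Rank of the target buffer: ceil(pct * total / 100), in integer math. */
    unsigned long long rank = (pct * total + 99) / 100;
    unsigned long long seen = 0;

    /* Walk the histogram until the cumulative count reaches the rank. */
    for (int c = 0; c <= SKETCH_MAX_USAGE_COUNT; c++)
    {
        seen += hist[c];
        if (seen >= rank)
            return c;
    }
    return SKETCH_MAX_USAGE_COUNT; /* not reached: rank <= total */
}
```

For a histogram of {10, 20, 30, 20, 15, 5} buffers at usage counts 0 through 5, this returns 2 for the 50th percentile and 5 for the 99th.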
usagecount_sum would actually be useful, since the average can be derived from it. If you think the sum of usagecounts is meaningful by itself, it makes sense to include it. Otherwise, wouldn't showing the averaged value directly be more useful?
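To make the sum-versus-average trade-off concrete: exposing usagecount_sum next to buffers_used loses nothing, because the average is one division away, and sums can also be added together across several calls or partitions while averages cannot. A minimal C sketch, assuming only buffers that hold a page contribute to the sum (matching the usagecount_sum / used_buffers formula suggested above); `SketchBuffer`, `SketchSummary`, `sketch_summarize` and `sketch_usagecount_avg` are hypothetical stand-ins, not pg_buffercache's code.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical per-buffer input; not PostgreSQL's BufferDesc. */
typedef struct SketchBuffer
{
    bool valid;      /* buffer currently holds a page */
    int  usagecount; /* 0..5 in this discussion */
} SketchBuffer;

/* Hypothetical result, using the (entity)_(property) column names. */
typedef struct SketchSummary
{
    long long buffers_used;
    long long buffers_unused;
    long long usagecount_sum;
} SketchSummary;

/* Count used/unused buffers and sum the usage counts of used ones. */
SketchSummary
sketch_summarize(const SketchBuffer *bufs, size_t nbufs)
{
    SketchSummary s = {0, 0, 0};

    for (size_t i = 0; i < nbufs; i++)
    {
        if (bufs[i].valid)
        {
            s.buffers_used++;
            s.usagecount_sum += bufs[i].usagecount;
        }
        else
            s.buffers_unused++;
    }
    return s;
}

/* Average derived from the sum; 0.0 when no buffer is in use. */
double
sketch_usagecount_avg(SketchSummary s)
{
    return s.buffers_used > 0
        ? (double) s.usagecount_sum / (double) s.buffers_used
        : 0.0;
}
```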

Aleksander, do you still think the average usagecount is a bit useless? Or does it make sense to you to keep it like this?

>> I suggest we focus on saving the memory first and then think about the
>> performance, if necessary.
>
> Personally I think the locks part is at least as important - it's what makes
> the production impact higher.

I agree that it's important given its high impact. I'm not sure how to avoid undefined behaviour without locks, though.
Even with locks, performance is much better than aggregating over the pg_buffercache view. But is it good enough for production?
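One possible direction for reading without locks, sketched in C11 under explicit assumptions: if a buffer's flags and usage count were packed into a single 32-bit word, one atomic load per buffer would make each individual read well-defined (no torn read, no data race), at the cost of the totals not being a consistent snapshot, since other backends may change buffers while the loop runs. The bit layout, `SKETCH_*` macros and `sketch_summarize_lockfree` are hypothetical and are not copied from buf_internals.h.

```c
#include <stdatomic.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical packed layout: usage count in bits 18..21, "valid" flag in bit 24. */
#define SKETCH_USAGECOUNT_SHIFT 18
#define SKETCH_USAGECOUNT_MASK  (0xFu << SKETCH_USAGECOUNT_SHIFT)
#define SKETCH_VALID_BIT        (1u << 24)

typedef struct SketchLockFreeSummary
{
    long long buffers_used;
    long long buffers_unused;
    long long usagecount_sum;
} SketchLockFreeSummary;

SketchLockFreeSummary
sketch_summarize_lockfree(_Atomic uint32_t *states, size_t nbufs)
{
    SketchLockFreeSummary s = {0, 0, 0};

    for (size_t i = 0; i < nbufs; i++)
    {
        /*
         * Exactly one atomic load per buffer: flags and usage count come
         * from the same word, so they agree with each other. Nothing stops
         * other buffers from changing between iterations, so the result
         * is approximate rather than a point-in-time snapshot.
         */
        uint32_t st = atomic_load_explicit(&states[i], memory_order_relaxed);

        if (st & SKETCH_VALID_BIT)
        {
            s.buffers_used++;
            s.usagecount_sum += (st & SKETCH_USAGECOUNT_MASK) >> SKETCH_USAGECOUNT_SHIFT;
        }
        else
            s.buffers_unused++;
    }
    return s;
}
```

Whether that approximation is acceptable for a monitoring function is exactly the production question above; the sketch only shows that avoiding locks need not mean undefined behaviour.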


Thanks,
Melih
 