On 1/11/19 7:40 PM, Robert Haas wrote:
> On Fri, Jan 11, 2019 at 5:21 AM Magnus Hagander <magnus@hagander.net> wrote:
>> Would it make sense to add a column to pg_stat_database showing
>> the total number of checksum errors that have occurred in a database?
>>
>> It's really a ">1 means it's bad", but it's a lot easier to monitor
>> that in the statistics views, and given how much a lot of people
>> set their systems out to log, it's far too easy to miss individual
>> checksum mismatches in the logs.
>>
>> If we track it at the database level, I don't think the overhead
>> of adding one more counter would be very high either.
>
> It's probably not the ideal way to track it. If you have a terabyte or
> fifty of data, and you see that you have some checksum failures, good
> luck finding the offending blocks.
>
Isn't that somewhat similar to deadlocks, which we also track in pg_stat_database? The number of deadlocks is rather useless on its own; you need to dive into the server log to find the details. Same for checksum errors.
It is a bit similar, yeah. Though a checksum counter is really a "you need to look at fixing this right away" signal, more so than deadlocks are. But yes, the fact that we already track deadlocks there is a good example. (Of course, I believe I added that one at some point as well, so I'm clearly biased there.)
> But I'm tentatively in favor of your proposal anyway, because it's
> pretty simple and cheap and might help people, and doing something
> noticeably better is probably annoyingly complicated.
>
+1
Yeah, that's the idea behind it -- it's cheap, and it's an early-warning indicator.
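
To make the early-warning use a bit more concrete, here is a rough sketch of what a monitoring check on such a counter could look like. It is only an illustration under assumptions: the column name checksum_failures and the helper new_checksum_failures are hypothetical, since nothing in this thread has settled on a name, and the function works on plain dicts so it does not depend on any particular database driver.

```python
# Hypothetical query a monitoring agent might poll. The column name
# "checksum_failures" is an assumption; the proposal does not name it.
QUERY = "SELECT datname, checksum_failures FROM pg_stat_database"


def new_checksum_failures(previous, current):
    """Return {datname: increase} for databases whose counter went up.

    Both arguments map a database name to its cumulative counter value,
    for example as read from two successive polls of the statistics view.
    """
    alerts = {}
    for datname, count in current.items():
        baseline = previous.get(datname, 0)
        if count >= baseline:
            increase = count - baseline
        else:
            # The counter dropped, so statistics were presumably reset;
            # everything counted since the reset is new.
            increase = count
        if increase > 0:
            alerts[datname] = increase
    return alerts
```

Any non-empty result means "go read the server log now", which is the same workflow as for the deadlocks counter: the view says that something happened, the log says where.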