Re: Checksum errors in pg_stat_database - Mailing list pgsql-hackers

From Magnus Hagander
Subject Re: Checksum errors in pg_stat_database
Date
Msg-id CABUevExzWLiQrAf-UohmTn5seLcNet0X5-HVP9Wq_TRRm_43xw@mail.gmail.com
In response to Re: Checksum errors in pg_stat_database  (Tomas Vondra <tomas.vondra@2ndquadrant.com>)
Responses Re: Checksum errors in pg_stat_database
List pgsql-hackers
On Fri, Jan 11, 2019 at 9:20 PM Tomas Vondra <tomas.vondra@2ndquadrant.com> wrote:



On 1/11/19 7:40 PM, Robert Haas wrote:
> On Fri, Jan 11, 2019 at 5:21 AM Magnus Hagander <magnus@hagander.net> wrote:
>> Would it make sense to add a column to pg_stat_database showing
>> the total number of checksum errors that have occurred in a database?
>>
>> It's really a ">0 means it's bad", but it's a lot easier to monitor
>> that in the statistics views, and given how much a lot of people
>> set their systems up to log, it's far too easy to miss individual
>> checksum mismatches in the logs.
>>
>> If we track it at the database level, I don't think the overhead
>> of adding one more counter would be very high either.
>
> It's probably not the ideal way to track it.  If you have a terabyte or
> fifty of data, and you see that you have some checksum failures, good
> luck finding the offending blocks.
>

Isn't that somewhat similar to deadlocks, which we also track in
pg_stat_database? The number of deadlocks is rather useless on its own,
you need to dive into the server log to find the details. Same for
checksum errors.

It is a bit similar, yeah. Though a nonzero checksum counter is much more of a "you need to look at fixing this right away" signal than a deadlock count is. But yes, the fact that we already track deadlocks there is a good example. (Of course, I believe I added that one at some point as well, so I'm clearly biased there.)


> But I'm tentatively in favor of your proposal anyway, because it's
> pretty simple and cheap and might help people, and doing something
> noticeably better is probably annoyingly complicated.
>

+1

Yeah, that's the idea behind it: it's cheap, and it works as an early-warning indicator.
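
As a rough sketch of how a monitoring check could use such a counter: the snippet below flags every database whose count is nonzero. The column name checksum_failures and the query text are assumptions for illustration (the thread only proposes "a column" without naming it), and the sample rows are made up; a real check would run the query through whatever database driver is in use.

```python
# Hypothetical query: the column name "checksum_failures" is an assumption,
# since the proposal in this thread does not name the new column.
CHECKSUM_QUERY = "SELECT datname, checksum_failures FROM pg_stat_database"


def databases_with_checksum_failures(rows):
    """Return (datname, count) pairs for databases with a nonzero counter.

    `rows` is an iterable of (datname, count) tuples, as the query above
    would return them. A count of None is treated as "no failures recorded".
    """
    return [(name, count) for name, count in rows if count is not None and count > 0]


if __name__ == "__main__":
    # Made-up rows standing in for a real query result.
    sample_rows = [("postgres", 0), ("app", 3), ("template1", None)]
    for name, count in databases_with_checksum_failures(sample_rows):
        # The counter only says that something is wrong; the details
        # (which block, which relation) still have to come from the server log.
        print(f"ALERT: database {name} reports {count} checksum failure(s)")
```

This mirrors the point made above: the counter is only the cheap early warning, and finding the offending blocks still means going to the logs.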

--
