Re: Checksum errors in pg_stat_database - Mailing list pgsql-hackers

From	Magnus Hagander
Subject	Re: Checksum errors in pg_stat_database
Date	December 11, 2022 20:18:42
Msg-id	CABUevExGXxStJaM0hLQY_kht_S3HnszgVH1=zk0xcx5ccz7tBQ@mail.gmail.com
In response to	Re: Checksum errors in pg_stat_database ("Drouvot, Bertrand" <bertranddrouvot.pg@gmail.com>)
Responses	Re: Checksum errors in pg_stat_database
List	pgsql-hackers

On Thu, Dec 8, 2022 at 2:35 PM Drouvot, Bertrand <bertranddrouvot.pg@gmail.com> wrote:

On 4/2/19 7:06 PM, Magnus Hagander wrote:
> On Tue, Apr 2, 2019 at 8:47 AM Michael Paquier <michael@paquier.xyz> wrote:
>
> On Tue, Apr 02, 2019 at 07:43:12AM +0200, Julien Rouhaud wrote:
> > On Tue, Apr 2, 2019 at 6:56 AM Michael Paquier <michael@paquier.xyz> wrote:
> >> One thing which is not
> >> proposed on this patch, and I am fine with it as a first draft, is
> >> that we don't have any information about the broken block number and
> >> the file involved. My gut tells me that we'd want a separate view,
> >> like pg_stat_checksums_details with one tuple per (dboid, rel, fork,
> >> blck) to be complete. But that's just for future work.
> >
> > That could indeed be nice.
>
> Actually, backpedaling on this one... pg_stat_checksums_details may
> be a bad idea as we could finish with one row per broken block. If
> a corruption is spreading quickly, pgstat would not be able to sustain
> that amount of objects. Having pg_stat_checksums would allow us to
> plug in more data easily based on the last failure state:
> - last relid of failure
> - last fork type of failure
> - last block number of failure.
> Not saying to do that now, but having that in pg_stat_database does
> not seem very natural to me. And on top of that we would have an
> extra row full of NULLs for shared objects in pg_stat_database if we
> adopt the unique view approach... I find that rather ugly.
>
>
> I think that tracking each and every block is of course a non-starter, as you've noticed.

I think that's less of a concern now that the stats collector process is gone and the stats are kept in shared memory. What do you think?

It would be less of a concern, yes, but I think it would still be a concern. If you have a large amount of corruption, you could quickly get to millions of rows to keep track of, which would definitely be a problem in shared memory as well, wouldn't it?

But perhaps we could keep a list of "the last 100 checksum failures" or something like that?
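
To make the "last 100 checksum failures" idea concrete, here is a rough sketch of a fixed-size ring buffer that overwrites the oldest entry once it is full, so memory use stays constant however fast corruption spreads. Everything in it is an assumption for illustration: the struct and function names, the field types and the size constant are hypothetical and not taken from any patch, and the locking that a real shared-memory structure would need is left out.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical capacity, taken from the "last 100" suggestion. */
#define CHECKSUM_FAILURE_LOG_SIZE 100

/* Hypothetical record of one failure; field names are illustrative only. */
typedef struct ChecksumFailure
{
    uint32_t dboid;   /* database OID */
    uint32_t relid;   /* relation OID */
    int      fork;    /* fork type */
    uint32_t blkno;   /* block number */
} ChecksumFailure;

/* Fixed-size log: memory use does not grow with the number of failures. */
typedef struct ChecksumFailureLog
{
    ChecksumFailure entries[CHECKSUM_FAILURE_LOG_SIZE];
    uint64_t        total;   /* failures ever recorded, including overwritten */
} ChecksumFailureLog;

static void
failure_log_init(ChecksumFailureLog *log)
{
    assert(log != NULL);
    log->total = 0;
}

/* Record a failure, overwriting the oldest entry once the log is full. */
static void
failure_log_record(ChecksumFailureLog *log, ChecksumFailure f)
{
    assert(log != NULL);
    log->entries[log->total % CHECKSUM_FAILURE_LOG_SIZE] = f;
    log->total++;
}

/* Number of entries currently retained (at most the capacity). */
static size_t
failure_log_count(const ChecksumFailureLog *log)
{
    assert(log != NULL);
    if (log->total < CHECKSUM_FAILURE_LOG_SIZE)
        return (size_t) log->total;
    return CHECKSUM_FAILURE_LOG_SIZE;
}

/* Fetch the i-th most recent failure (0 = newest); NULL if out of range. */
static const ChecksumFailure *
failure_log_get(const ChecksumFailureLog *log, size_t i)
{
    if (i >= failure_log_count(log))
        return NULL;
    return &log->entries[(log->total - 1 - i) % CHECKSUM_FAILURE_LOG_SIZE];
}
```

With something like this, the total counter keeps climbing while only the most recent entries stay available for inspection, so a view could report both the overall number of failures and a bounded sample of where they happened.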

Magnus Hagander
Me: https://www.hagander.net/
Work: https://www.redpill-linpro.com/
