Re: Online checksums verification in the backend - Mailing list pgsql-hackers

From Julien Rouhaud
Subject Re: Online checksums verification in the backend
Date
Msg-id 20200311071823.GA4643@nol
Whole thread Raw
In response to Re: Online checksums verification in the backend  (Julien Rouhaud <rjuju123@gmail.com>)
Responses Re: Online checksums verification in the backend
Re: Online checksums verification in the backend
List pgsql-hackers
On Tue, Dec 10, 2019 at 11:12:34AM +0100, Julien Rouhaud wrote:
> On Tue, Dec 10, 2019 at 3:26 AM Michael Paquier <michael@paquier.xyz> wrote:
> >
> > On Mon, Dec 09, 2019 at 07:02:43PM +0100, Julien Rouhaud wrote:
> > > On Mon, Dec 9, 2019 at 5:21 PM Robert Haas <robertmhaas@gmail.com> wrote:
> > >> Some people might prefer notices, because you can get those while the
> > >> thing is still running, rather than a result set, which you will only
> > >> see when the query finishes. Other people might prefer an SRF, because
> > >> they want to have the data in structured form so that they can
> > >> postprocess it. Not sure what you mean by "more globally."
> > >
> > > I meant having the results available system-wide, not only to the
> > > caller.  I think that emitting a log/notice level should always be
> > > done on top on whatever other communication facility we're using.
> >
> > The problem of notice and logs is that they tend to be ignored.  Now I
> > don't see no problems either in adding something into the logs which
> > can be found later on for parsing on top of a SRF returned by the
> > caller which includes all the corruption details, say with pgbadger
> > or your friendly neighborhood grep.  I think that any backend function
> > should also make sure to call pgstat_report_checksum_failure() to
> > report a report visible at database-level in the catalogs, so as it is
> > possible to use that as a cheap high-level warning.  The details of
> > the failures could always be dug from the logs or the result of the
> > function itself after finding out that something is wrong in
> > pg_stat_database.
>
> I agree that adding extra information in the logs and calling
> pgstat_report_checksum_failure is a must do, and I changed that
> locally.  However, I doubt that the logs is the right place to find
> the details of corrupted blocks.  There's no guarantee that the file
> will be accessible to the DBA, nor that the content won't get
> truncated by the time it's needed.  I really think that corruption is
> important enough to justify more specific location.


The cfbot reported a build failure, so here's a rebased v2 which also contains
the pg_stat_report_failure() call and extra log info.

Attachment

pgsql-hackers by date:

Previous
From: Craig Ringer
Date:
Subject: Re: [PATCH] Skip llvm bytecode generation if LLVM is missing
Next
From: Fujii Masao
Date:
Subject: Re: Some problems of recovery conflict wait events