Re: Online checksums verification in the backend - Mailing list pgsql-hackers

From Masahiko Sawada
Subject Re: Online checksums verification in the backend
Date
Msg-id CA+fd4k5_gxSrcjXoxBk_47vB+xYGKzvcLGtwaLCbcBmW3v5oGw@mail.gmail.com
Whole thread Raw
In response to Re: Online checksums verification in the backend  (Julien Rouhaud <rjuju123@gmail.com>)
Responses Re: Online checksums verification in the backend  (Julien Rouhaud <rjuju123@gmail.com>)
List pgsql-hackers
On Wed, 11 Mar 2020 at 16:18, Julien Rouhaud <rjuju123@gmail.com> wrote:
>
> On Tue, Dec 10, 2019 at 11:12:34AM +0100, Julien Rouhaud wrote:
> > On Tue, Dec 10, 2019 at 3:26 AM Michael Paquier <michael@paquier.xyz> wrote:
> > >
> > > On Mon, Dec 09, 2019 at 07:02:43PM +0100, Julien Rouhaud wrote:
> > > > On Mon, Dec 9, 2019 at 5:21 PM Robert Haas <robertmhaas@gmail.com> wrote:
> > > >> Some people might prefer notices, because you can get those while the
> > > >> thing is still running, rather than a result set, which you will only
> > > >> see when the query finishes. Other people might prefer an SRF, because
> > > >> they want to have the data in structured form so that they can
> > > >> postprocess it. Not sure what you mean by "more globally."
> > > >
> > > > I meant having the results available system-wide, not only to the
> > > > caller.  I think that emitting a log/notice level should always be
> > > > done on top on whatever other communication facility we're using.
> > >
> > > The problem of notice and logs is that they tend to be ignored.  Now I
> > > don't see no problems either in adding something into the logs which
> > > can be found later on for parsing on top of a SRF returned by the
> > > caller which includes all the corruption details, say with pgbadger
> > > or your friendly neighborhood grep.  I think that any backend function
> > > should also make sure to call pgstat_report_checksum_failure() to
> > > report a report visible at database-level in the catalogs, so as it is
> > > possible to use that as a cheap high-level warning.  The details of
> > > the failures could always be dug from the logs or the result of the
> > > function itself after finding out that something is wrong in
> > > pg_stat_database.
> >
> > I agree that adding extra information in the logs and calling
> > pgstat_report_checksum_failure is a must do, and I changed that
> > locally.  However, I doubt that the logs is the right place to find
> > the details of corrupted blocks.  There's no guarantee that the file
> > will be accessible to the DBA, nor that the content won't get
> > truncated by the time it's needed.  I really think that corruption is
> > important enough to justify more specific location.
>
>
> The cfbot reported a build failure, so here's a rebased v2 which also contains
> the pg_stat_report_failure() call and extra log info.

In addition to comments from Michael-san, here are my comments:

1.
+   if (!is_member_of_role(GetUserId(), DEFAULT_ROLE_READ_SERVER_FILES))
+       ereport(ERROR,
+               (errcode(ERRCODE_INSUFFICIENT_PRIVILEGE),
+                errmsg("only superuser or a member of the
pg_read_server_files role may use this function")));
+
+   if (!DataChecksumsEnabled())
+       elog(ERROR, "Data checksums are not enabled");

I think it's better to reverse the order of the above checks.

2.
+#define CRF_COLS           5   /* Number of output arguments in the SRF */

Should it be SRF_COLS?

3.
+static void
+check_delay_point(void)
+{
+   /* Always check for interrupts */
+   CHECK_FOR_INTERRUPTS();
+
+   /* Nap if appropriate */
+   if (!InterruptPending && VacuumCostBalance >= VacuumCostLimit)
+   {
+       int         msec;
+
+       msec = VacuumCostDelay * VacuumCostBalance / VacuumCostLimit;
+       if (msec > VacuumCostDelay * 4)
+           msec = VacuumCostDelay * 4;
+
+       pg_usleep(msec * 1000L);
+
+       VacuumCostBalance = 0;
+
+       /* Might have gotten an interrupt while sleeping */
+       CHECK_FOR_INTERRUPTS();
+   }
+}

Even if we use vacuum delay for this function, I think we need to set
VacuumDelayActive and return if it's false, or it's better to just
return if VacuumCostDelay == 0.

4.
+static void
+check_all_relations(TupleDesc tupdesc, Tuplestorestate *tupstore,
+                         ForkNumber forknum)

I also agree with Michael-san to remove this function. Instead we can
check all relations by:

select pg_check_relation(oid) from pg_class;

6.
Other typos

s/dirted/dirtied/
s/explictly/explicitly/

Regards,

-- 
Masahiko Sawada            http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



pgsql-hackers by date:

Previous
From: Justin Pryzby
Date:
Subject: Re: Berserk Autovacuum (let's save next Mandrill)
Next
From: Suraj Kharage
Date:
Subject: Re: backup manifests