Re: Checksum errors in pg_stat_database - Mailing list pgsql-hackers

From Julien Rouhaud
Subject Re: Checksum errors in pg_stat_database
Msg-id CAOBaU_ZTFe-xrYNVu4pDPsi3JUOoncXBm5=hh7ocv_gd-hxDig@mail.gmail.com
In response to Re: Checksum errors in pg_stat_database  (Julien Rouhaud <rjuju123@gmail.com>)
Responses Re: Checksum errors in pg_stat_database
List pgsql-hackers
On Wed, Mar 13, 2019 at 4:53 PM Julien Rouhaud <rjuju123@gmail.com> wrote:
>
> On Sun, Mar 10, 2019 at 1:13 PM Julien Rouhaud <rjuju123@gmail.com> wrote:
> >
> > On Sat, Mar 9, 2019 at 7:58 PM Julien Rouhaud <rjuju123@gmail.com> wrote:
> > >
> > > On Sat, Mar 9, 2019 at 7:50 PM Magnus Hagander <magnus@hagander.net> wrote:
> > > >
> > > > On Sat, Mar 9, 2019 at 10:41 AM Julien Rouhaud <rjuju123@gmail.com> wrote:
> > > >>
> > > >> Sorry, I have new comments again after a little bit more thinking.
> > > >> I'm wondering if we can do something about shared objects while we're
> > > >> at it.  They don't belong to any database, so it's a little bit
> > > >> orthogonal to this proposal, but it seems quite important to track
> > > >> errors on those too!
> > > >>
> > > >> What about adding a new field in PgStat_GlobalStats for that?  We can
> > > >> use the same lastDir to easily detect such objects and slightly adapt
> > > >> sendFile again, which seems quite straightforward.
> > >
> > > > Question is then what number that should show -- only the checksum
> > > > counter in non-database-fields, or the total number across the cluster?
> > >
> > > I'd say only for non-database-fields errors, especially if we can
> > > reset each counter separately.  If necessary, we can add a new view
> > > to give a global overview of checksum errors for DBA convenience.
> >
> > I'm considering adding a new PgStat_ChecksumStats for that purpose
> > instead, but I don't know if that's acceptable to do so in the last
> > commitfest.  It seems worthwhile to add it eventually, since we'll
> > probably end up having more things to report to users related to
> > checksums.  Online enabling of checksums could be the most immediate
> > potential target.
>
> I wasn't aware that we were already storing information about shared
> objects in PgStat_StatDBEntry, with InvalidOid as the databaseid
> (though we don't have any system views that actually show
> information for such objects).
>
> As a result I ended up simply adding counters for the total number of
> checks and the timestamp of the last failure in PgStat_StatDBEntry,
> making the attached patch very lightweight.  I moved all the
> checksum-related counters out of pg_stat_database into a new
> pg_stat_checksum view.  This avoids making pg_stat_database too wide,
> and also allows displaying information about shared objects in this
> new view (some of the other counters don't really make sense for
> shared objects or could break existing monitoring queries).  While at
> it, I tried to add a little bit of documentation on checksum
> monitoring.
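
To make the layout described in the quoted paragraph concrete, here is a
minimal, self-contained C sketch.  It is not the patch from this thread and
does not use PostgreSQL's real definitions: ChecksumEntry, lookup_entry,
report_checksums, SHARED_OBJECTS_OID, the field names and the fixed-size
array are all hypothetical stand-ins.  The only ideas taken from the message
are that each per-database entry carries checksum counters plus the timestamp
of the last failure, and that shared objects are tracked under an invalid
database OID.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-ins; PostgreSQL's own Oid/InvalidOid are not used here. */
typedef uint32_t Oid;
#define SHARED_OBJECTS_OID ((Oid) 0)    /* plays the role of InvalidOid */
#define MAX_ENTRIES 8

/* Simplified entry; the real PgStat_StatDBEntry has many more fields. */
typedef struct ChecksumEntry
{
    int     in_use;             /* slot allocated? */
    Oid     databaseid;         /* SHARED_OBJECTS_OID for shared objects */
    int64_t checksum_checks;    /* total number of checks performed */
    int64_t checksum_failures;  /* number of failed checks */
    int64_t last_failure;       /* timestamp of last failure, 0 = never */
} ChecksumEntry;

/* Find the entry for databaseid; optionally allocate it in a free slot. */
static ChecksumEntry *
lookup_entry(ChecksumEntry *table, Oid databaseid, int create)
{
    ChecksumEntry *free_slot = NULL;

    assert(table != NULL);
    for (size_t i = 0; i < MAX_ENTRIES; i++)
    {
        if (table[i].in_use && table[i].databaseid == databaseid)
            return &table[i];
        if (!table[i].in_use && free_slot == NULL)
            free_slot = &table[i];
    }
    if (!create || free_slot == NULL)
        return NULL;

    free_slot->in_use = 1;
    free_slot->databaseid = databaseid;
    free_slot->checksum_checks = 0;
    free_slot->checksum_failures = 0;
    free_slot->last_failure = 0;
    return free_slot;
}

/* Accumulate counters; returns 0 on success, -1 if the table is full. */
static int
report_checksums(ChecksumEntry *table, Oid databaseid,
                 int64_t checks, int64_t failures, int64_t now)
{
    ChecksumEntry *entry = lookup_entry(table, databaseid, 1);

    if (entry == NULL)
        return -1;
    entry->checksum_checks += checks;
    if (failures > 0)
    {
        entry->checksum_failures += failures;
        entry->last_failure = now;  /* only failures move the timestamp */
    }
    return 0;
}
```

With this shape, a view along the lines of the proposed pg_stat_checksum
could plausibly expose one row per entry, including the one kept for shared
objects.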

and of course I forgot to attach the patch.

Attachment
