Re: Checksum errors in pg_stat_database - Mailing list pgsql-hackers

From Julien Rouhaud
Subject Re: Checksum errors in pg_stat_database
Date
Msg-id CAOBaU_bxJTyOdR2Up_yjoCs06wmF=e9YJKMx_DR=1P+h840Bqw@mail.gmail.com
Whole thread Raw
In response to Re: Checksum errors in pg_stat_database  (Julien Rouhaud <rjuju123@gmail.com>)
Responses Re: Checksum errors in pg_stat_database
List pgsql-hackers
On Sat, Mar 9, 2019 at 7:58 PM Julien Rouhaud <rjuju123@gmail.com> wrote:
>
> On Sat, Mar 9, 2019 at 7:50 PM Magnus Hagander <magnus@hagander.net> wrote:
> >
> > On Sat, Mar 9, 2019 at 10:41 AM Julien Rouhaud <rjuju123@gmail.com> wrote:
> >>
> >> Sorry, I have again new comments after a little bit more thinking.
> >> I'm wondering if we can do something about shared objects while we're
> >> at it.  They don't belong to any database, so it's a little bit
> >> orthogonal to this proposal, but it seems quite important to track
> >> error on those too!
> >>
> >> What about adding a new field in PgStat_GlobalStats for that?  We can
> >> use the same lastDir to easily detect such objects and slightly adapt
> >> sendFile again, which seems quite straightforward.
>
> > Question is then what number that should show -- only the checksum counter in non-database-fields, or the total
numberacross the cluster?
 
>
> I'd say only for non-database-fields errors, especially if we can
> reset each counters separately.  If necessary, we can add a new view
> to give a global overview of checksum errors for DBA convenience.

So, after reading current implementation, I don't think that
PgStat_GlobalStats is the right place.  It's already enough of a mess,
and clearly pg_stat_reset_shared('bgwriter') would not make any sense
if it did reset the shared relations checksum errors (though arguably
the fact that's it's resetting checkpointer stats right now is hardly
better), and handling a different target to reset part of
PgStat_GlobalStats counters would be an ugly kludge.

I'm considering adding a new PgStat_ChecksumStats for that purpose
instead, but I don't know if that's acceptable to do so in the last
commitfest.  It seems worthwhile to add it eventually, since we'll
probably end up having more things to report to users related to
checksum.  Online enabling of checksum could be the most immediate
potential target.

If that's acceptable, I was thinking this new stat could have those
fields with the first drop:

- number of non-db-related checksum checks done
- number of non-db-related checksum checks failed
- last stats reset

(and adding the number of checks for db-related blocks done with the
current checksum errors counter).  Maybe also adding a
pg_checksum_stats view that would summarise all the counters in one
place.

It'll obviously add some traffic to the stats collector, but I'd hope
not too much, since BufferAlloc shouldn't be that frequent, and
BASE_BACKUP reports stats only once per file.


pgsql-hackers by date:

Previous
From: Fabien COELHO
Date:
Subject: Re: [HACKERS] Re: [COMMITTERS] pgsql: Remove pgbench "progress" testpending solution of its timing is (fwd)
Next
From: Dean Rasheed
Date:
Subject: Re: [HACKERS] PATCH: multivariate histograms and MCV lists