Re: Checksum errors in pg_stat_database - Mailing list pgsql-hackers

From Magnus Hagander
Subject Re: Checksum errors in pg_stat_database
Date
Msg-id CABUevEwcfsiyvXGUQ=e4MrdFMmJc2P4Z955tUjHRsYvx43rV=g@mail.gmail.com
Whole thread Raw
In response to Re: Checksum errors in pg_stat_database  (Julien Rouhaud <rjuju123@gmail.com>)
Responses Re: Checksum errors in pg_stat_database
List pgsql-hackers
On Sat, Mar 9, 2019 at 10:41 AM Julien Rouhaud <rjuju123@gmail.com> wrote:
On Sat, Mar 9, 2019 at 9:34 AM Julien Rouhaud <rjuju123@gmail.com> wrote:
>
> On Sat, Mar 9, 2019 at 12:35 AM Magnus Hagander <magnus@hagander.net> wrote:
> >
> > On Mon, Mar 4, 2019 at 11:31 AM Julien Rouhaud <rjuju123@gmail.com> wrote:
> >>
> >> On Fri, Feb 22, 2019 at 3:01 PM Magnus Hagander <magnus@hagander.net> wrote:
> >> >
> >> > It tracks things that happen in the general backends. Possibly we should also consider counting the errors actually found when running base backups? OTOH, that part of the code doesn't really track things like databases (as it operates just on the raw data directory underneath), so that implementation would definitely not be as clean...
> >>
> >> Sorry I just realized that I totally forgot this part of the thread.
> >>
> >> While it's true that we operate on raw directory, I see that sendDir()
> >> already setup a isDbDir var, and if this is true lastDir should
> >> contain the oid of the underlying database.  Wouldn't it be enough to
> >> call sendFile() using this, something like (untested):
> >>
> >> if (!sizeonly)
> >> - sent = sendFile(pathbuf, pathbuf + basepathlen + 1, &statbuf, true);
> >> + sent = sendFile(pathbuf, pathbuf + basepathlen + 1, &statbuf, true,
> >> isDbDir ? pg_atoi(lastDir+1, 4) : InvalidOid);
> >>
> >> and accordingly report any checksum error from sendFile()?
> >
> > That seems it was easy enough. PFA an updated patch that does this, and also rebased so it doesn't conflict on oid.
> >

Sorry, I have again new comments after a little bit more thinking.
I'm wondering if we can do something about shared objects while we're
at it.  They don't belong to any database, so it's a little bit
orthogonal to this proposal, but it seems quite important to track
error on those too!

What about adding a new field in PgStat_GlobalStats for that?  We can
use the same lastDir to easily detect such objects and slightly adapt
sendFile again, which seems quite straightforward.

Ah, didn't spot that one until after I pushed :/ Sorry about that.

Hmm. That's an interesting thought. And then add a column to pg_stat_bgwriter, I assume? (Which is an ever increasingly bad name for the view, but that's unrelated to this)

Question is then what number that should show -- only the checksum counter in non-database-fields, or the total number across the cluster?

--

pgsql-hackers by date:

Previous
From: Magnus Hagander
Date:
Subject: Re: Checksum errors in pg_stat_database
Next
From: Julien Rouhaud
Date:
Subject: Re: Checksum errors in pg_stat_database