On Sat, Mar 9, 2019 at 10:41 AM Julien Rouhaud <rjuju123@gmail.com> wrote:
On Sat, Mar 9, 2019 at 9:34 AM Julien Rouhaud <rjuju123@gmail.com> wrote: > > On Sat, Mar 9, 2019 at 12:35 AM Magnus Hagander <magnus@hagander.net> wrote: > > > > On Mon, Mar 4, 2019 at 11:31 AM Julien Rouhaud <rjuju123@gmail.com> wrote: > >> > >> On Fri, Feb 22, 2019 at 3:01 PM Magnus Hagander <magnus@hagander.net> wrote: > >> > > >> > It tracks things that happen in the general backends. Possibly we should also consider counting the errors actually found when running base backups? OTOH, that part of the code doesn't really track things like databases (as it operates just on the raw data directory underneath), so that implementation would definitely not be as clean... > >> > >> Sorry I just realized that I totally forgot this part of the thread. > >> > >> While it's true that we operate on raw directory, I see that sendDir() > >> already setup a isDbDir var, and if this is true lastDir should > >> contain the oid of the underlying database. Wouldn't it be enough to > >> call sendFile() using this, something like (untested): > >> > >> if (!sizeonly) > >> - sent = sendFile(pathbuf, pathbuf + basepathlen + 1, &statbuf, true); > >> + sent = sendFile(pathbuf, pathbuf + basepathlen + 1, &statbuf, true, > >> isDbDir ? pg_atoi(lastDir+1, 4) : InvalidOid); > >> > >> and accordingly report any checksum error from sendFile()? > > > > That seems it was easy enough. PFA an updated patch that does this, and also rebased so it doesn't conflict on oid. > >
Sorry, I have again new comments after a little bit more thinking. I'm wondering if we can do something about shared objects while we're at it. They don't belong to any database, so it's a little bit orthogonal to this proposal, but it seems quite important to track error on those too!
What about adding a new field in PgStat_GlobalStats for that? We can use the same lastDir to easily detect such objects and slightly adapt sendFile again, which seems quite straightforward.
Ah, didn't spot that one until after I pushed :/ Sorry about that.
Hmm. That's an interesting thought. And then add a column to pg_stat_bgwriter, I assume? (Which is an ever increasingly bad name for the view, but that's unrelated to this)
Question is then what number that should show -- only the checksum counter in non-database-fields, or the total number across the cluster?