Thread: pgsql: Track block level checksum failures in pg_stat_database

pgsql: Track block level checksum failures in pg_stat_database

From
Magnus Hagander
Date:
Track block level checksum failures in pg_stat_database

This adds a column that counts how many checksum failures have occurred
on files belonging to a specific database. Both checksum failures
during normal backend processing and those created when a base backup
detects a checksum failure are counted.

Author: Magnus Hagander
Reviewed by: Julien Rouhaud

Branch
------
master

Details
-------
https://git.postgresql.org/pg/commitdiff/6b9e875f7286d8535bff7955e5aa3602e188e436

Modified Files
--------------
doc/src/sgml/monitoring.sgml         |  5 ++++
src/backend/catalog/system_views.sql |  1 +
src/backend/postmaster/pgstat.c      | 56 ++++++++++++++++++++++++++++++++++++
src/backend/replication/basebackup.c | 16 +++++++----
src/backend/storage/page/bufpage.c   |  3 ++
src/backend/utils/adt/pgstatfuncs.c  | 15 ++++++++++
src/include/catalog/catversion.h     |  2 +-
src/include/catalog/pg_proc.dat      |  4 +++
src/include/pgstat.h                 | 18 +++++++++++-
src/test/regress/expected/rules.out  |  1 +
10 files changed, 114 insertions(+), 7 deletions(-)


Re: pgsql: Track block level checksum failures in pg_stat_database

From
Tomas Vondra
Date:
On Sat, Mar 09, 2019 at 06:48:11PM +0000, Magnus Hagander wrote:
>Track block level checksum failures in pg_stat_database
>
>This adds a column that counts how many checksum failures have occurred
>on files belonging to a specific database. Both checksum failures
>during normal backend processing and those created when a base backup
>detects a checksum failure are counted.
>

It seems this commit forgot to add PgStat_MsgChecksumFailure to the
PgStat_Msg union.


regards

-- 
Tomas Vondra                  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




Re: pgsql: Track block level checksum failures in pg_stat_database

From
Julien Rouhaud
Date:
On Tue, Apr 30, 2019 at 6:33 PM Tomas Vondra
<tomas.vondra@2ndquadrant.com> wrote:
>
> On Sat, Mar 09, 2019 at 06:48:11PM +0000, Magnus Hagander wrote:
> >Track block level checksum failures in pg_stat_database
> >
> >This adds a column that counts how many checksum failures have occurred
> >on files belonging to a specific database. Both checksum failures
> >during normal backend processing and those created when a base backup
> >detects a checksum failure are counted.
> >
>
> It seems this commit forgot to add PgStat_MsgChecksumFailure to the
> PgStat_Msg union.

Oops, indeed.  That's embarrassing.  Trivial fix attached if that helps.

Attachment

Re: pgsql: Track block level checksum failures in pg_stat_database

From
Tom Lane
Date:
Julien Rouhaud <rjuju123@gmail.com> writes:
> On Tue, Apr 30, 2019 at 6:33 PM Tomas Vondra
> <tomas.vondra@2ndquadrant.com> wrote:
>> It seems this commit forgot to add PgStat_MsgChecksumFailure to the
>> PgStat_Msg union.

> Oops, indeed.  That's embarrassing.  Trivial fix attached if that helps.

Seems like this exposes a generic weakness in the way pgstat.c is coded.
I'm inclined to adjust the switch logic in PgstatCollectorMain along
the lines of

-                    pgstat_recv_inquiry((PgStat_MsgInquiry *) &msg, len);
+                    pgstat_recv_inquiry(&msg.msg_inquiry, len);

so that you *couldn't* forget to extend the union, as long as you
followed the existing code's style.

            regards, tom lane



Re: pgsql: Track block level checksum failures in pg_stat_database

From
Julien Rouhaud
Date:
On Tue, Apr 30, 2019 at 11:56 PM Tom Lane <tgl@sss.pgh.pa.us> wrote:
>
> Julien Rouhaud <rjuju123@gmail.com> writes:
> > On Tue, Apr 30, 2019 at 6:33 PM Tomas Vondra
> > <tomas.vondra@2ndquadrant.com> wrote:
> >> It seems this commit forgot to add PgStat_MsgChecksumFailure to the
> >> PgStat_Msg union.
>
> > Oops, indeed.  That's embarrassing.  Trivial fix attached if that helps.
>
> Seems like this exposes a generic weakness in the way pgstat.c is coded.
> I'm inclined to adjust the switch logic in PgstatCollectorMain along
> the lines of
>
> -                    pgstat_recv_inquiry((PgStat_MsgInquiry *) &msg, len);
> +                    pgstat_recv_inquiry(&msg.msg_inquiry, len);
>
> so that you *couldn't* forget to extend the union, as long as you
> followed the existing code's style.

Right, I was also wondering why there wasn't compile error in such
case.  I'll send an updated patch tomorrow, as I usually regret any
code I write after midnight.



Re: pgsql: Track block level checksum failures in pg_stat_database

From
Julien Rouhaud
Date:
On Wed, May 1, 2019 at 12:48 AM Julien Rouhaud <rjuju123@gmail.com> wrote:
>
> On Tue, Apr 30, 2019 at 11:56 PM Tom Lane <tgl@sss.pgh.pa.us> wrote:
> >
> > Julien Rouhaud <rjuju123@gmail.com> writes:
> > > On Tue, Apr 30, 2019 at 6:33 PM Tomas Vondra
> > > <tomas.vondra@2ndquadrant.com> wrote:
> > >> It seems this commit forgot to add PgStat_MsgChecksumFailure to the
> > >> PgStat_Msg union.
> >
> > > Oops, indeed.  That's embarrassing.  Trivial fix attached if that helps.
> >
> > Seems like this exposes a generic weakness in the way pgstat.c is coded.
> > I'm inclined to adjust the switch logic in PgstatCollectorMain along
> > the lines of
> >
> > -                    pgstat_recv_inquiry((PgStat_MsgInquiry *) &msg, len);
> > +                    pgstat_recv_inquiry(&msg.msg_inquiry, len);
> >
> > so that you *couldn't* forget to extend the union, as long as you
> > followed the existing code's style.
>
> I'll send an updated patch tomorrow

Here's an updated version.  It turns out that PgStat_MsgTempFile had
also been forgotten and never noticed.

Attachment

Re: pgsql: Track block level checksum failures in pg_stat_database

From
Magnus Hagander
Date:


On Wed, May 1, 2019 at 9:56 AM Julien Rouhaud <rjuju123@gmail.com> wrote:
On Wed, May 1, 2019 at 12:48 AM Julien Rouhaud <rjuju123@gmail.com> wrote:
>
> On Tue, Apr 30, 2019 at 11:56 PM Tom Lane <tgl@sss.pgh.pa.us> wrote:
> >
> > Julien Rouhaud <rjuju123@gmail.com> writes:
> > > On Tue, Apr 30, 2019 at 6:33 PM Tomas Vondra
> > > <tomas.vondra@2ndquadrant.com> wrote:
> > >> It seems this commit forgot to add PgStat_MsgChecksumFailure to the
> > >> PgStat_Msg union.
> >
> > > Oops, indeed.  That's embarrassing.  Trivial fix attached if that helps.
> >
> > Seems like this exposes a generic weakness in the way pgstat.c is coded.
> > I'm inclined to adjust the switch logic in PgstatCollectorMain along
> > the lines of
> >
> > -                    pgstat_recv_inquiry((PgStat_MsgInquiry *) &msg, len);
> > +                    pgstat_recv_inquiry(&msg.msg_inquiry, len);
> >
> > so that you *couldn't* forget to extend the union, as long as you
> > followed the existing code's style.
>
> I'll send an updated patch tomorrow

Here's an updated version.  It turns out that PgStat_MsgTempFile had
also been forgotten and never noticed.

Looks good to me in general

I don't see the reasoning behind changing the name from msg_autovacuum to msg_autovacuum_start anywhere, perhaps I missed a part of the discussion? Was that a change intended to be part of it?

Re: pgsql: Track block level checksum failures in pg_stat_database

From
Julien Rouhaud
Date:
On Wed, May 1, 2019 at 12:04 PM Magnus Hagander <magnus@hagander.net> wrote:
>
> Looks good to me in general
>
> I don't see the reasoning behind changing the name from msg_autovacuum to msg_autovacuum_start anywhere, perhaps I
misseda part of the discussion? Was that a change intended to be part of it?
 

That wasn't discussed, I should have mentioned it sorry.  I just
changed it for consistency while at it, as it seemed a little bit
ambiguous or error prone:

PgStat_MsgAutovacStart -> msg_autovacuum
PgStat_MsgVacuum -> msg_vacuum

As the member of the union weren't used anywhere, I thought it's ok to
rename it now they're used.  I'm not strongly attached to this change,
so feel free to discard it.



Re: pgsql: Track block level checksum failures in pg_stat_database

From
Magnus Hagander
Date:


On Wed, May 1, 2019 at 12:12 PM Julien Rouhaud <rjuju123@gmail.com> wrote:
On Wed, May 1, 2019 at 12:04 PM Magnus Hagander <magnus@hagander.net> wrote:
>
> Looks good to me in general
>
> I don't see the reasoning behind changing the name from msg_autovacuum to msg_autovacuum_start anywhere, perhaps I missed a part of the discussion? Was that a change intended to be part of it?

That wasn't discussed, I should have mentioned it sorry.  I just
changed it for consistency while at it, as it seemed a little bit
ambiguous or error prone:

PgStat_MsgAutovacStart -> msg_autovacuum
PgStat_MsgVacuum -> msg_vacuum

As the member of the union weren't used anywhere, I thought it's ok to
rename it now they're used.  I'm not strongly attached to this change,
so feel free to discard it.

Yeah, that does make sense. I just wanted to make sure I wasn't missing some discussion about it.

Pushed, thanks.

--

Re: pgsql: Track block level checksum failures in pg_stat_database

From
Julien Rouhaud
Date:
On Wed, May 1, 2019 at 12:35 PM Magnus Hagander <magnus@hagander.net> wrote:
>
> On Wed, May 1, 2019 at 12:12 PM Julien Rouhaud <rjuju123@gmail.com> wrote:
>>
>> On Wed, May 1, 2019 at 12:04 PM Magnus Hagander <magnus@hagander.net> wrote:
>> >
>> > Looks good to me in general
>> >
>> > I don't see the reasoning behind changing the name from msg_autovacuum to msg_autovacuum_start anywhere, perhaps I
misseda part of the discussion? Was that a change intended to be part of it?
 
>>
>> That wasn't discussed, I should have mentioned it sorry.  I just
>> changed it for consistency while at it, as it seemed a little bit
>> ambiguous or error prone:
>>
>> PgStat_MsgAutovacStart -> msg_autovacuum
>> PgStat_MsgVacuum -> msg_vacuum
>>
>> As the member of the union weren't used anywhere, I thought it's ok to
>> rename it now they're used.  I'm not strongly attached to this change,
>> so feel free to discard it.
>
>
> Yeah, that does make sense. I just wanted to make sure I wasn't missing some discussion about it.

ok :)

> Pushed, thanks.

Thanks!