Thread: pgsql: Track block level checksum failures in pg_stat_database
Track block level checksum failures in pg_stat_database This adds a column that counts how many checksum failures have occurred on files belonging to a specific database. Both checksum failures during normal backend processing and those created when a base backup detects a checksum failure are counted. Author: Magnus Hagander Reviewed by: Julien Rouhaud Branch ------ master Details ------- https://git.postgresql.org/pg/commitdiff/6b9e875f7286d8535bff7955e5aa3602e188e436 Modified Files -------------- doc/src/sgml/monitoring.sgml | 5 ++++ src/backend/catalog/system_views.sql | 1 + src/backend/postmaster/pgstat.c | 56 ++++++++++++++++++++++++++++++++++++ src/backend/replication/basebackup.c | 16 +++++++---- src/backend/storage/page/bufpage.c | 3 ++ src/backend/utils/adt/pgstatfuncs.c | 15 ++++++++++ src/include/catalog/catversion.h | 2 +- src/include/catalog/pg_proc.dat | 4 +++ src/include/pgstat.h | 18 +++++++++++- src/test/regress/expected/rules.out | 1 + 10 files changed, 114 insertions(+), 7 deletions(-)
On Sat, Mar 09, 2019 at 06:48:11PM +0000, Magnus Hagander wrote: >Track block level checksum failures in pg_stat_database > >This adds a column that counts how many checksum failures have occurred >on files belonging to a specific database. Both checksum failures >during normal backend processing and those created when a base backup >detects a checksum failure are counted. > It seems this commit forgot to add PgStat_MsgChecksumFailure to the PgStat_Msg union. regards -- Tomas Vondra http://www.2ndQuadrant.com PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
On Tue, Apr 30, 2019 at 6:33 PM Tomas Vondra <tomas.vondra@2ndquadrant.com> wrote: > > On Sat, Mar 09, 2019 at 06:48:11PM +0000, Magnus Hagander wrote: > >Track block level checksum failures in pg_stat_database > > > >This adds a column that counts how many checksum failures have occurred > >on files belonging to a specific database. Both checksum failures > >during normal backend processing and those created when a base backup > >detects a checksum failure are counted. > > > > It seems this commit forgot to add PgStat_MsgChecksumFailure to the > PgStat_Msg union. Oops, indeed. That's embarrassing. Trivial fix attached if that helps.
Attachment
Julien Rouhaud <rjuju123@gmail.com> writes: > On Tue, Apr 30, 2019 at 6:33 PM Tomas Vondra > <tomas.vondra@2ndquadrant.com> wrote: >> It seems this commit forgot to add PgStat_MsgChecksumFailure to the >> PgStat_Msg union. > Oops, indeed. That's embarrassing. Trivial fix attached if that helps. Seems like this exposes a generic weakness in the way pgstat.c is coded. I'm inclined to adjust the switch logic in PgstatCollectorMain along the lines of - pgstat_recv_inquiry((PgStat_MsgInquiry *) &msg, len); + pgstat_recv_inquiry(&msg.msg_inquiry, len); so that you *couldn't* forget to extend the union, as long as you followed the existing code's style. regards, tom lane
On Tue, Apr 30, 2019 at 11:56 PM Tom Lane <tgl@sss.pgh.pa.us> wrote: > > Julien Rouhaud <rjuju123@gmail.com> writes: > > On Tue, Apr 30, 2019 at 6:33 PM Tomas Vondra > > <tomas.vondra@2ndquadrant.com> wrote: > >> It seems this commit forgot to add PgStat_MsgChecksumFailure to the > >> PgStat_Msg union. > > > Oops, indeed. That's embarrassing. Trivial fix attached if that helps. > > Seems like this exposes a generic weakness in the way pgstat.c is coded. > I'm inclined to adjust the switch logic in PgstatCollectorMain along > the lines of > > - pgstat_recv_inquiry((PgStat_MsgInquiry *) &msg, len); > + pgstat_recv_inquiry(&msg.msg_inquiry, len); > > so that you *couldn't* forget to extend the union, as long as you > followed the existing code's style. Right, I was also wondering why there wasn't compile error in such case. I'll send an updated patch tomorrow, as I usually regret any code I write after midnight.
On Wed, May 1, 2019 at 12:48 AM Julien Rouhaud <rjuju123@gmail.com> wrote: > > On Tue, Apr 30, 2019 at 11:56 PM Tom Lane <tgl@sss.pgh.pa.us> wrote: > > > > Julien Rouhaud <rjuju123@gmail.com> writes: > > > On Tue, Apr 30, 2019 at 6:33 PM Tomas Vondra > > > <tomas.vondra@2ndquadrant.com> wrote: > > >> It seems this commit forgot to add PgStat_MsgChecksumFailure to the > > >> PgStat_Msg union. > > > > > Oops, indeed. That's embarrassing. Trivial fix attached if that helps. > > > > Seems like this exposes a generic weakness in the way pgstat.c is coded. > > I'm inclined to adjust the switch logic in PgstatCollectorMain along > > the lines of > > > > - pgstat_recv_inquiry((PgStat_MsgInquiry *) &msg, len); > > + pgstat_recv_inquiry(&msg.msg_inquiry, len); > > > > so that you *couldn't* forget to extend the union, as long as you > > followed the existing code's style. > > I'll send an updated patch tomorrow Here's an updated version. It turns out that PgStat_MsgTempFile had also been forgotten and never noticed.
Attachment
On Wed, May 1, 2019 at 9:56 AM Julien Rouhaud <rjuju123@gmail.com> wrote:
On Wed, May 1, 2019 at 12:48 AM Julien Rouhaud <rjuju123@gmail.com> wrote:
>
> On Tue, Apr 30, 2019 at 11:56 PM Tom Lane <tgl@sss.pgh.pa.us> wrote:
> >
> > Julien Rouhaud <rjuju123@gmail.com> writes:
> > > On Tue, Apr 30, 2019 at 6:33 PM Tomas Vondra
> > > <tomas.vondra@2ndquadrant.com> wrote:
> > >> It seems this commit forgot to add PgStat_MsgChecksumFailure to the
> > >> PgStat_Msg union.
> >
> > > Oops, indeed. That's embarrassing. Trivial fix attached if that helps.
> >
> > Seems like this exposes a generic weakness in the way pgstat.c is coded.
> > I'm inclined to adjust the switch logic in PgstatCollectorMain along
> > the lines of
> >
> > - pgstat_recv_inquiry((PgStat_MsgInquiry *) &msg, len);
> > + pgstat_recv_inquiry(&msg.msg_inquiry, len);
> >
> > so that you *couldn't* forget to extend the union, as long as you
> > followed the existing code's style.
>
> I'll send an updated patch tomorrow
Here's an updated version. It turns out that PgStat_MsgTempFile had
also been forgotten and never noticed.
I don't see the reasoning behind changing the name from msg_autovacuum to msg_autovacuum_start anywhere, perhaps I missed a part of the discussion? Was that a change intended to be part of it?
--
On Wed, May 1, 2019 at 12:04 PM Magnus Hagander <magnus@hagander.net> wrote: > > Looks good to me in general > > I don't see the reasoning behind changing the name from msg_autovacuum to msg_autovacuum_start anywhere, perhaps I misseda part of the discussion? Was that a change intended to be part of it? That wasn't discussed, I should have mentioned it sorry. I just changed it for consistency while at it, as it seemed a little bit ambiguous or error prone: PgStat_MsgAutovacStart -> msg_autovacuum PgStat_MsgVacuum -> msg_vacuum As the member of the union weren't used anywhere, I thought it's ok to rename it now they're used. I'm not strongly attached to this change, so feel free to discard it.
On Wed, May 1, 2019 at 12:12 PM Julien Rouhaud <rjuju123@gmail.com> wrote:
On Wed, May 1, 2019 at 12:04 PM Magnus Hagander <magnus@hagander.net> wrote:
>
> Looks good to me in general
>
> I don't see the reasoning behind changing the name from msg_autovacuum to msg_autovacuum_start anywhere, perhaps I missed a part of the discussion? Was that a change intended to be part of it?
That wasn't discussed, I should have mentioned it sorry. I just
changed it for consistency while at it, as it seemed a little bit
ambiguous or error prone:
PgStat_MsgAutovacStart -> msg_autovacuum
PgStat_MsgVacuum -> msg_vacuum
As the member of the union weren't used anywhere, I thought it's ok to
rename it now they're used. I'm not strongly attached to this change,
so feel free to discard it.
Yeah, that does make sense. I just wanted to make sure I wasn't missing some discussion about it.
Pushed, thanks.
On Wed, May 1, 2019 at 12:35 PM Magnus Hagander <magnus@hagander.net> wrote: > > On Wed, May 1, 2019 at 12:12 PM Julien Rouhaud <rjuju123@gmail.com> wrote: >> >> On Wed, May 1, 2019 at 12:04 PM Magnus Hagander <magnus@hagander.net> wrote: >> > >> > Looks good to me in general >> > >> > I don't see the reasoning behind changing the name from msg_autovacuum to msg_autovacuum_start anywhere, perhaps I misseda part of the discussion? Was that a change intended to be part of it? >> >> That wasn't discussed, I should have mentioned it sorry. I just >> changed it for consistency while at it, as it seemed a little bit >> ambiguous or error prone: >> >> PgStat_MsgAutovacStart -> msg_autovacuum >> PgStat_MsgVacuum -> msg_vacuum >> >> As the member of the union weren't used anywhere, I thought it's ok to >> rename it now they're used. I'm not strongly attached to this change, >> so feel free to discard it. > > > Yeah, that does make sense. I just wanted to make sure I wasn't missing some discussion about it. ok :) > Pushed, thanks. Thanks!