Re: Why our counters need to be time-based WAS: WIP: cross column correlation ... - Mailing list pgsql-hackers

From Josh Berkus
Subject Re: Why our counters need to be time-based WAS: WIP: cross column correlation ...
Date
Msg-id 4D6BE3C6.1000609@agliodbs.com
In response to Re: WIP: cross column correlation ...  (Robert Haas <robertmhaas@gmail.com>)
List pgsql-hackers
> Well, what we have now is a bunch of counters in pg_stat_all_tables
> and pg_statio_all_tables. 

Right. What I'm saying is that those aren't good enough, and never have
been. Counters without a time basis are pretty much useless for
performance monitoring/management (Baron Schwartz has a blog post
talking about this, but I can't find it right now).

Take, for example, a problem I was recently grappling with for Nagios.
I'd like to check whether tables are getting autoanalyzed often enough.
After all, autovac can fall behind, and we'd want to be alerted to that.

The problem is, in order to measure whether or not autoanalyze is
behind, you need to count how many inserts, updates, and deletes have
happened since the last autoanalyze.  pg_stat_user_tables just gives us
the counters since the last reset ... and the reset time isn't even
stored in PostgreSQL.
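
To make the arithmetic concrete, here is a minimal sketch of what an
external tool has to do today: sample the cumulative counters twice,
record its own timestamps, and difference them. The field names
n_tup_ins, n_tup_upd and n_tup_del match columns in pg_stat_user_tables;
the Snapshot class and change_rate function are hypothetical names made
up for illustration, and no database connection is involved.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Tuple


@dataclass(frozen=True)
class Snapshot:
    """One sample of a table's cumulative counters (hypothetical helper)."""
    taken_at: datetime   # timestamp recorded by the monitoring tool itself
    n_tup_ins: int
    n_tup_upd: int
    n_tup_del: int

    @property
    def modified(self) -> int:
        # Total row changes recorded since the counters were last reset.
        return self.n_tup_ins + self.n_tup_upd + self.n_tup_del


def change_rate(prev: Snapshot, curr: Snapshot) -> Tuple[int, float]:
    """Return (rows modified, rows modified per second) between two samples."""
    elapsed = (curr.taken_at - prev.taken_at).total_seconds()
    if elapsed <= 0:
        raise ValueError("snapshots must be in chronological order")
    delta = curr.modified - prev.modified
    if delta < 0:
        # Counters went backwards, so the stats were reset between samples
        # and the difference is meaningless.
        raise ValueError("counters were reset between snapshots")
    return delta, delta / elapsed


t0 = datetime(2011, 2, 28, 12, 0, 0)
prev = Snapshot(t0, n_tup_ins=1000, n_tup_upd=200, n_tup_del=50)
curr = Snapshot(t0 + timedelta(minutes=10),
                n_tup_ins=1600, n_tup_upd=500, n_tup_del=150)
rows, per_second = change_rate(prev, curr)
```

The point of the sketch is that both the timestamps and the previous
sample live outside the database; nothing in the counters themselves
says what period they cover.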

This means that, without adding external tools like pg_statsinfo, we
can't autotune autoanalyze at all.

There are quite a few other examples where the counters could contribute
to autotuning and DBA performance monitoring if only they were
time-based. As it is, they're useful for finding unused indexes and
that's about it.

One possibility, of course, would be to take pg_statsinfo and make it
part of core.  There are a couple of disadvantages to that: (1) the
storage and extra objects required, which would then require us to add
extra management routines as well; and (2) pg_statsinfo only stores
top-level view history, meaning that it wouldn't be very adaptable to
improvements we make in system views in the future.

On the other hand, anything which increases the size of pg_statistic
would be a nightmare.

One possible compromise solution might be to implement code for the
stats collector to automatically reset the stats at a given clock
interval.  If we combined this with keeping the reset time, and keeping
a snapshot of the stats from the last clock tick (and their reset time),
that would be "good enough" for most monitoring.
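
As a rough illustration of that idea (not a proposal for how the stats
collector would actually implement it), the sketch below keeps one
window of counters, resets it once the interval has elapsed, and retains
the previous window together with its reset time. Every name here
(Window, IntervalCounters, incr, last_rate) is hypothetical. It also
simplifies: the reset is checked lazily on the next increment rather
than on a real clock tick, and several idle intervals collapse into a
single rollover.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional


@dataclass
class Window:
    """Counters accumulated since reset_time (seconds on any monotonic scale)."""
    reset_time: float
    counts: Dict[str, int] = field(default_factory=dict)


class IntervalCounters:
    """Counters that reset every `interval` seconds and keep the last window."""

    def __init__(self, interval: float, now: float) -> None:
        self.interval = interval
        self.current = Window(reset_time=now)
        self.previous: Optional[Window] = None  # snapshot from the last tick

    def _maybe_tick(self, now: float) -> None:
        # Roll the current window into `previous` once the interval is over.
        if now - self.current.reset_time >= self.interval:
            self.previous = self.current
            self.current = Window(reset_time=now)

    def incr(self, name: str, now: float, amount: int = 1) -> None:
        self._maybe_tick(now)
        self.current.counts[name] = self.current.counts.get(name, 0) + amount

    def last_rate(self, name: str) -> Optional[float]:
        """Events per second over the last completed window, if there is one."""
        if self.previous is None:
            return None
        elapsed = self.current.reset_time - self.previous.reset_time
        return self.previous.counts.get(name, 0) / elapsed


counters = IntervalCounters(interval=60.0, now=0.0)
counters.incr("n_tup_ins", now=10.0, amount=120)
counters.incr("n_tup_ins", now=30.0, amount=60)
rate_before_tick = counters.last_rate("n_tup_ins")  # no completed window yet
counters.incr("n_tup_ins", now=61.0)                # triggers the rollover
rate_after_tick = counters.last_rate("n_tup_ins")
```

Because each window carries its own reset time, a monitoring check can
turn the previous window's counts into a rate without storing any
history of its own.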

--
Josh Berkus
PostgreSQL Experts Inc.
http://www.pgexperts.com

