Thread: What do people like to monitor (or in other words, what might be nice in pgsnmpd)?

Work is beginning on pgsnmpd v 2.0, and I figured it would be a good
time to ask folks what they typically like to monitor, so we can make
sure pgsnmpd instruments it properly. The current version of pgsnmpd
supports something called RDBMS-MIB, which is a set of data designed
to be applicable to any relational database, so it doesn't get very
PostgreSQL-specific. The next version will augment that with
PGSQL-MIB, which we have yet to write.

PGSQL-MIB should contain data elements for, ideally, anything specific
to the database that someone could possibly want to monitor in a
generic PostgreSQL installation within reason. Things like CPU load,
available disk space, total system memory, etc. would not be included,
because they're not PostgreSQL specific, but things like CPU and
memory usage of individual PostgreSQL processes are very good
candidates for inclusion in PGSQL-MIB. Current plans have us including
SNMP representations of all the statistics tables as well as the
system catalogs, runtime information about PostgreSQL processes (such
as CPU and RAM usage), shared memory usage information, and
potentially mechanisms to easily include administrator-specified
queries and generate SNMP traps based on LISTEN/NOTIFY.

So please respond, if you feel so inclined, describing things you like
to monitor in your PostgreSQL instances as well as things you would
like to be able to easily monitor in a more ideal world. Many thanks,
and apologies for any breach of netiquette I may have committed in
posting to two lists simultaneously.

- Josh Tolley

Josh Tolley wrote:

> So please respond, if you feel so inclined, describing things you like
> to monitor in your PostgreSQL instances as well as things you would
> like to be able to easily monitor in a more ideal world.

I can think of a few things I'd like to be able to monitor...

Connection usage:
- total number of connections
- number of idle vs active connections
- total number per user/database
- number of idle vs active connections per user/database

I'm not entirely sure whether to split on user or on database or maybe both?
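
One way to sidestep the user-versus-database question is to count on the
finest key and roll up from there. A minimal Python sketch, assuming the
caller has already fetched one (user, database, is_idle) tuple per backend
(for example from pg_stat_activity); the tuple shape and the function name
are hypothetical, not anything pgsnmpd defines:

```python
from collections import Counter


def summarize_connections(sessions):
    """Count sessions in total, by state, per user, per database, and combined.

    `sessions` is an iterable of (user, database, is_idle) tuples. That
    shape is an assumption made for this sketch.
    """
    total = 0
    by_state = Counter()
    by_user = Counter()
    by_database = Counter()
    by_user_db_state = Counter()
    for user, database, is_idle in sessions:
        state = "idle" if is_idle else "active"
        total += 1
        by_state[state] += 1
        by_user[user] += 1
        by_database[database] += 1
        # The finest-grained key; the coarser counts above are roll-ups of it.
        by_user_db_state[(user, database, state)] += 1
    return {
        "total": total,
        "by_state": dict(by_state),
        "by_user": dict(by_user),
        "by_database": dict(by_database),
        "by_user_db_state": dict(by_user_db_state),
    }


if __name__ == "__main__":
    sample = [
        ("alice", "shop", True),
        ("alice", "shop", False),
        ("bob", "shop", False),
        ("bob", "logs", True),
    ]
    print(summarize_connections(sample))
```

Keeping the combined (user, database, state) counter means either split can
be exposed later without sampling the backends a second time.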

Also interesting: the number of queries that take more than an arbitrary
amount of time to complete. Maybe per user/database?
I suppose this number is only interesting on an uncongested database
server. Otherwise there will be queries passing that threshold that
normally wouldn't, because they have to wait for the real troublemakers
to finish.

--
Alban Hertroys
alban@magproductions.nl

I'd like to know the age of the oldest running transaction, i.e. to
look out for old idle-in-transaction sessions that are holding up
vacuuming.

Info on the shared buffers like % used, % that hasn't been updated or
seen in x minutes / hours / days.

% used on various tablespaces

connection stats: how many clients connected, by what accounts, %
failed auths, stale connections harvested by tcp_keepalive timeouts...

I usually monitor blks_read and blks_hit (the block-level stats). When
the latter is high, I can see that shared memory is doing a good job;
when the former is high, that tells me something too (blocks are not
being found in shared buffers).
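
Since blks_read and blks_hit are cumulative counters, a monitor would
normally look at the change between two polls rather than the raw values.
A minimal Python sketch of that calculation; the function name and the
(blks_read, blks_hit) sample layout are assumptions for illustration:

```python
def hit_ratio(prev, curr):
    """Return the shared-buffer hit ratio between two samples.

    Each sample is a (blks_read, blks_hit) pair of cumulative counters.
    Returns None when there was no block activity in the interval or when
    a counter went backwards (for example after a statistics reset).
    """
    reads = curr[0] - prev[0]
    hits = curr[1] - prev[1]
    if reads < 0 or hits < 0:
        return None
    total = reads + hits
    if total == 0:
        return None
    return hits / total


if __name__ == "__main__":
    # 100 new reads and 900 new hits between the two polls.
    print(hit_ratio((100, 900), (200, 1800)))
```

Note that a block counted in blks_read may still have come from the
operating system's cache, so a low ratio does not necessarily mean
physical disk reads.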

Also, the database-wide number of commits and rollbacks (btw, Slony has a
habit of calling ROLLBACK when it has done nothing -- I wonder if calling
ROLLBACK instead of COMMIT on a SELECT-only transaction is such a win?  It
certainly blurs the picture for me. ;)

And the number of clients waiting on a lock.

By the way, one nice thing to have would be counters that record how much
time it took to load a page into shared memory (less than 1ms, <2ms, <4ms,
<8ms, <16ms and so on). That could help with fine-tuning things like vacuum
cost/delay and so on.  I've seen it somewhere in Oraclish stats tables.
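
The doubling buckets described above can be sketched in a few lines of
Python. This only illustrates the bucketing arithmetic; the function
names and the default of eight buckets are assumptions, and nothing like
this exists in PostgreSQL's own statistics today:

```python
def bucket_index(elapsed_ms, num_buckets=8):
    """Map a load time in milliseconds to a power-of-two bucket.

    Bucket 0 holds times <1ms, bucket 1 holds <2ms, bucket 2 holds <4ms,
    and so on. The last bucket catches everything slower.
    """
    upper = 1.0
    for i in range(num_buckets - 1):
        if elapsed_ms < upper:
            return i
        upper *= 2
    return num_buckets - 1


def record(histogram, elapsed_ms):
    """Increment the counter for the bucket that elapsed_ms falls into."""
    histogram[bucket_index(elapsed_ms, len(histogram))] += 1


if __name__ == "__main__":
    histogram = [0] * 8
    for t in (0.5, 1.0, 3.0, 15.9, 16.0, 1000.0):
        record(histogram, t)
    print(histogram)
```

Because each bucket is a plain counter that only ever increases, it could
be polled the same way as any other cumulative statistic.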

Regards,
    Dawid

On 8/2/07, Gavin M. Roy <gmr@myyearbook.com> wrote:
> Are you contemplating providing access to data that's currently not stored
> in the pg_ catalog tables?  I currently monitor the statio data,
> transactions per second, and active/idle backends.  Things that I think
> would be useful would be average query execution time, longest execution
> time, etc.  Other pie in the sky ideas would include current level of total
> bloat in a database, total size on disk of a database broken down by tables,
> indexes, etc.
>
> Regards,
>
> Gavin

My own goal is to have pgsnmpd able, as much as possible, to fill the
same role as the set of scripts an arbitrary PostgreSQL DBA sets up on a
typical production server. That includes statistics tables and catalog
tables, but certainly isn't limited to just that. So things like
categorizing total sessions in interesting and useful ways (for
instance, # of idle connections, # of active connections, max
transaction length, etc.) are certainly within pgsnmpd's purview.

In short, all the suggestions you listed are useful, and provided the
framework allows us to get reasonably good values for them, worthy of
implementation in pgsnmpd. Thanks.

-Josh

Josh Tolley wrote:
> On 8/2/07, Gavin M. Roy <gmr@myyearbook.com> wrote:
> > Are you contemplating providing access to data that's currently not stored
> > in the pg_ catalog tables?  I currently monitor the statio data,
> > transactions per second, and active/idle backends.  Things that I think
> > would be useful would be average query execution time, longest execution
> > time, etc.  Other pie in the sky ideas would include current level of total
> > bloat in a database, total size on disk of a database broken down by tables,
> > indexes, etc.
>
> My own goal is to have pgsnmpd able, as much as possible, to fill the
> same role the set of scripts an arbitrary PostgreSQL DBA sets up on a
> typical production server. That includes statistics tables and catalog
> tables, but certainly isn't limited to just that. So doing things like
> categorizing total sessions in interesting and useful ways (for
> instance, # of idle connections, # of active connections, max
> transaction length, etc.) are certainly within pgsnmpd's purview.

More ideas: autovacuum metrics, for example how long since the last
vacuum of tables, age(pg_class.relfrozenxid), how many dead tuples there
are, pg_class.relpages (do tables shrink, grow or stay constant-size?),
etc.
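
As a rough illustration of the "shrink, grow or stay constant-size"
question, here is a minimal Python sketch that classifies a table from
successive pg_class.relpages samples. The function name, the sampling
scheme and the tolerance parameter are assumptions for illustration only:

```python
def relpages_trend(samples, tolerance=0):
    """Classify a table's size trend from successive relpages samples.

    `samples` is a list of page counts, oldest first. A net change within
    +/- `tolerance` pages is reported as "constant".
    """
    if len(samples) < 2:
        return "unknown"
    change = samples[-1] - samples[0]
    if change > tolerance:
        return "growing"
    if change < -tolerance:
        return "shrinking"
    return "constant"


if __name__ == "__main__":
    print(relpages_trend([1200, 1250, 1400]))
    print(relpages_trend([1200, 1195, 1203], tolerance=10))
```

Keep in mind that relpages is only refreshed by operations such as VACUUM
and ANALYZE, so the samples can lag behind the table's real size.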

--
Alvaro Herrera                          Developer, http://www.PostgreSQL.org/
"Happiness is not tomorrow. Happiness is now"

Hmm... also data such as what the background writer is currently doing,
where we are in checkpoint segments, how close we are to checkpoint
timeouts, etc.

On Wed, 2007-08-01 at 20:41 -0600, Josh Tolley wrote:
> So please respond, if you feel so inclined, describing things you like
> to monitor in your PostgreSQL instances as well as things you would
> like to be able to easily monitor in a more ideal world. Many thanks,
> and apologies for any breach of netiquette I may have committed in
> posting to two lists simultaneously.

I think there's also a related question here: can we develop
implementations of these measurements that satisfy a lot of DBAs?

For instance, when I measure idle transactions, I poll periodically for
any transactions that have been idle for more than 1 minute. That's
simple and probably useful to a lot of DBAs to catch certain types of
problems. This would probably be useful as a trap, or could be polled.
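
The periodic check described above might look roughly like this in Python.
It is a sketch under assumptions: the caller supplies one
(pid, idle_in_transaction, idle_since) tuple per backend, and both that
tuple shape and the function name are hypothetical:

```python
from datetime import datetime, timedelta


def long_idle_transactions(sessions, now, threshold=timedelta(minutes=1)):
    """Return the sorted pids of sessions idle in a transaction too long.

    `sessions` is an iterable of (pid, idle_in_transaction, idle_since)
    tuples, where idle_since is a datetime. Only sessions that have been
    idle in a transaction for strictly more than `threshold` are reported.
    """
    return sorted(
        pid
        for pid, idle_in_txn, idle_since in sessions
        if idle_in_txn and now - idle_since > threshold
    )


if __name__ == "__main__":
    now = datetime(2007, 8, 2, 12, 0, 0)
    sessions = [
        (101, True, now - timedelta(seconds=90)),
        (102, True, now - timedelta(seconds=30)),
        (103, False, now - timedelta(minutes=10)),
    ]
    print(long_idle_transactions(sessions, now))
```

A poller could expose the length of the returned list as a gauge, while a
trap-style notifier could fire whenever the list is non-empty.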

However, some of the ideas, like trying to come up with numbers that
represent the amount of time queries are waiting on locks, or the
behavior of checkpoints/bgwriter, aren't as obvious to me. If one person
posts their script to monitor one of these things, will other DBAs want
to use the same instrumentation, or would they end up reinventing it
anyway? Can the numbers be effectively graphed with something like
OpenNMS on a 5-minute poll interval, and maybe have effective thresholds
for notifications?

I think -- even aside from pgsnmpd -- a lot of people would be
interested in seeing a variety of monitoring/notification scripts used
by other DBAs.

Also, here are some relevant pgfoundry projects:
http://pgfoundry.org/projects/nagiosplugins/
http://pgfoundry.org/projects/pgtools/

Regards,
    Jeff Davis


Are you contemplating providing access to data that's currently not stored
in the pg_ catalog tables?  I currently monitor the statio data,
transactions per second, and active/idle backends.  Things that I think
would be useful would be average query execution time, longest execution
time, etc.  Other pie in the sky ideas would include current level of total
bloat in a database, total size on disk of a database broken down by tables,
indexes, etc.

Regards,

Gavin
