Re: Expose checkpoint start/finish times into SQL. - Mailing list pgsql-patches

From Greg Smith
Subject Re: Expose checkpoint start/finish times into SQL.
Date
Msg-id Pine.GSO.4.64.0804032331110.13806@westnet.com
Whole thread Raw
In response to Re: Expose checkpoint start/finish times into SQL.  (Robert Treat <xzilla@users.sourceforge.net>)
Responses Re: Expose checkpoint start/finish times into SQL.
Re: Expose checkpoint start/finish times into SQL.
List pgsql-patches
On Thu, 3 Apr 2008, Robert Treat wrote:

> You can plug a single item graphed over time into things like rrdtool to
> get good trending information. And it's often easier to do this using
> sql interfaces to get the data than pulling it out of log files (almost
> like the db was designed for that :-)

The pg_stat_bgwriter value for buffers_checkpoint was intentionally
implemented in 8.3 such that it jumps in one big lump when the checkpoint
is done.  While it's not the ideal interface for what you're looking for,
the reason for that is to made it possible to build a "when was the last
checkpoint finished?" interface via some remote monitoring tool just by
determining the last time that the value jumped upwards.  You can easily
see them just by graphing that value, it shouldn't be too hard to teach
something with rrdtool guts to find them.

Since checkpoints have a fairly predictable duration in 8.3, as long as
you catch the start or end of them you can make a resonable guess where
the other side was.  The case you're trying to avoid here, the system
going a long time without checkpointing, can be implemented by looking for
a begin or end regularly, you don't need to track both.  As long as
there's a checkpoint finish "pulse" in buffers_checkpoint showing up
regularly you're fine.  The only situation I can think of where this might
be problematic is where the system has been idle enough to not have any
buffers to write at checkpoint time, but I recall a code path there where
checkpoints stop altogether unless there's been activity so even tracking
the time may not change that.

Ultimately a lot of the other questions you might ask (i.e. "how many
buffers have been written per hour by checkpoints?") require processing
the numbers in this way anyway, and I thought this implementation was good
enough to monitor the situation you're trying to avoid--presuming you're
using some sort of moderately powerful remote monitoring tool.  Theo's
patch would make it easier to answer with a simple command which has some
value; a little SQL in a cron job would be good enough to trigger an alert
rather than needing a real monitoring probe.

--
* Greg Smith gsmith@gregsmith.com http://www.gregsmith.com Baltimore, MD

pgsql-patches by date:

Previous
From: Tom Lane
Date:
Subject: Re: Expose checkpoint start/finish times into SQL.
Next
From: Bruce Momjian
Date:
Subject: Re: Expose checkpoint start/finish times into SQL.