Robert Treat wrote:
> 1) Alert if checkpointing stops occurring within a reasonable time frame (note
> there are failure cases and normal use cases where this might occur) (also
> note I'll agree, this isn't common, but the results are pretty disastrous if
> it does happen)
What are the normal use cases where this would occur? I can't think of any.
As for failure cases, there are a million things that can go wrong in a
system, and only a few of them will show up as "checkpoints stopped
happening", so I'm not excited about adding a facility to monitor just
that.
More hooks for monitoring purposes in general would be nice, and I would
like to see them exposed as SQL functions, but I'd like to see a much
bigger proposal for that.
> 2) Can be graphed over time (using rrdtool and others) for trending checkpoint
> activity
Hmm. You'd need the historical data to do that properly. In particular,
if two checkpoints happen within a single polling interval, you'd miss
one of them. Setting log_checkpoints=on, with CSV log output, seems like
a better approach for that.
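To illustrate the polling problem, here is a rough sketch in Python. The checkpoint times, the 60-second polling interval, and the function name polled_view are all made up for the example; it only models a poller that can see the time of the most recent checkpoint and nothing else.

```python
def polled_view(checkpoint_times, poll_interval, end_time):
    """Return the distinct 'last checkpoint' values a periodic poller sees.

    checkpoint_times must be sorted ascending. All values are in seconds
    and are hypothetical; nothing here talks to a real server.
    """
    seen = []
    t = poll_interval
    while t <= end_time:
        # Checkpoints that had completed by the time of this poll.
        done = [c for c in checkpoint_times if c <= t]
        last = done[-1] if done else None
        # Record the value only when it differs from the previous poll.
        if last is not None and (not seen or seen[-1] != last):
            seen.append(last)
        t += poll_interval
    return seen


# Hypothetical history: three checkpoints in quick succession, then one later.
checkpoints = [40, 70, 95, 310]
observed = polled_view(checkpoints, poll_interval=60, end_time=360)
missed = [c for c in checkpoints if c not in observed]
print("observed:", observed)
print("missed:", missed)
```

With these made-up numbers the checkpoint at 70 seconds never shows up in the polled data, because the one at 95 seconds has already replaced it by the next poll. A log-based approach records every checkpoint regardless of when you read the log.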
> 3) Ease monitoring of checkpoints across pitr setups
How is that different from monitoring in a non-pitr setup?
> 4) Help determine if your checkpoints are being timeout driven or segment
> driven, or if you need to look at those settings
Seems like the log_checkpoints output is more useful for that, as it
directly tells you what triggered the checkpoint.
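As a sketch of what I mean, the following Python counts checkpoints by trigger from the log. It assumes plain-text log lines containing "checkpoint starting:" followed by flag words, with "xlog" meaning segment-driven and "time" meaning timeout-driven; the sample lines are hand-written for the example rather than copied from a real server, and count_triggers is a hypothetical helper name. For CSV output you would parse the file with the csv module first and feed in only the message column.

```python
import re
from collections import Counter

# Matches the message logged when a checkpoint begins; the flag words follow.
START_RE = re.compile(r"checkpoint starting:(?P<flags>.*)$")


def count_triggers(lines):
    """Count checkpoint starts by cause: 'xlog', 'time', or 'other'."""
    counts = Counter()
    for line in lines:
        m = START_RE.search(line.rstrip("\n"))
        if not m:
            continue  # not a checkpoint-start message
        flags = m.group("flags").split()
        if "xlog" in flags:
            counts["xlog"] += 1      # triggered by WAL segment consumption
        elif "time" in flags:
            counts["time"] += 1      # triggered by checkpoint_timeout
        else:
            counts["other"] += 1     # e.g. manual or shutdown checkpoints
    return counts


# Hand-written sample lines, for illustration only.
sample = [
    "LOG:  database system is ready to accept connections",
    "LOG:  checkpoint starting: time",
    "LOG:  checkpoint starting: xlog",
    "LOG:  checkpoint starting: xlog",
    "LOG:  checkpoint starting: immediate force wait",
]
counts = count_triggers(sample)
print(dict(counts))
```

If "xlog" dominates, checkpoints are mostly being forced by WAL volume, which is the cue to look at checkpoint_segments.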
> 5) Determine the number of log files that will need to be replayed in the
> event of a crash
If you have a requirement for that, just set checkpoint_segments
accordingly. And again, you can already get that information in the
logs with log_checkpoints=on.
> 6) Helps give an indication of whether you should enter a manual checkpoint
> before issuing a pg_start_backup call
If you want the backup to begin immediately, just do a manual checkpoint
unconditionally. It'll finish quickly if there hasn't been much activity
since the last one. We talked about adding a new variant of
pg_start_backup() that does that automatically (or rather, just hurries
the current checkpoint) in the 8.3 cycle, but didn't do it in the end.
Perhaps we should add that, after all?
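In the meantime, the unconditional approach is easy to script. Below is a minimal sketch in Python, assuming a DB-API style cursor that uses %s placeholders (as psycopg2 does) and a superuser connection; start_backup_now and RecordingCursor are names invented for the example, and the stand-in cursor only records statements so the sketch runs without a server.

```python
def start_backup_now(cursor, label):
    """Force a checkpoint, then call pg_start_backup(label).

    'cursor' is assumed to be a DB-API cursor on a superuser connection.
    Returns whatever pg_start_backup() returns (the backup start location).
    """
    # Manual checkpoint first; it is cheap if little has changed since the
    # last one, and it leaves pg_start_backup() with less work to wait for.
    cursor.execute("CHECKPOINT")
    cursor.execute("SELECT pg_start_backup(%s)", (label,))
    return cursor.fetchone()[0]


class RecordingCursor:
    """Stand-in cursor for the example; records statements, runs nothing."""

    def __init__(self):
        self.statements = []

    def execute(self, sql, params=None):
        self.statements.append((sql, params))

    def fetchone(self):
        return ("placeholder-location",)  # not a real WAL location


cur = RecordingCursor()
result = start_backup_now(cur, "nightly")
print(cur.statements)
```

With a real connection you would pass its cursor instead of RecordingCursor and remember to call pg_stop_backup() once the file copy is done.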
--
Heikki Linnakangas
EnterpriseDB http://www.enterprisedb.com