On Tue, May 01, 2012 at 02:19:16PM +0100, Peter Geoghegan wrote:
> Currently the following informal categories of error are bunched
> together at ERROR severity:
>
> * Integrity constraint violations
> * Very serious situations, like running out of disk space
> * Serious disasters that often relate to hardware failure, like "xlog
> flush request %X/%X is not satisfied --- flushed only to %X/%X"
> * Errors that if seen relate to a bug within PostgreSQL, with obscure
> error messages, as from most of the elog calls within the planner, for
> example.
>
> The first category of error is something that the DBA will often see
> very frequently. The latter 3 are situations which I'd like to be
> woken up in the middle of the night to respond to. We ought to be
> facilitating monitoring tools (including very simple ones like grep),
> so that they can make this very important practical distinction. The
> hard part is replacing the severity level of many existing
> elog/ereport call sites, but that's not much of a problem, really.

I agree that some means to mechanically distinguish these cases would
constitute a significant boon for admin monitoring. Note, however, that the
same split appears at other severity levels:

  FATAL, routine: terminating connection due to conflict with recovery
  FATAL, critical: incorrect checksum in control file
  WARNING, routine: nonstandard use of escape in a string literal
  WARNING, critical: locallock table corrupted

Under this approach, we'd need at least three new severity levels (a
critical variant of each of ERROR, FATAL and WARNING) to cover the
necessary messages.