Re: Log file monitoring and event notification - Mailing list pgsql-general

From Antman, Jason (CMG-Atlanta)
Subject Re: Log file monitoring and event notification
Msg-id 534056CC.6090009@coxinc.com
In response to Log file monitoring and event notification  (Andy Colson <andy@squeakycode.net>)
List pgsql-general
General thought:

It's entirely possible my current Postgres environment is missing
something (I'm an automation engineer, not a DBA - most of my postgres
knowledge has been learned on the job or from Google), but we actively
monitor the receive and replay lag (i.e. comparing
pg_current_xlog_location() on the master to
pg_last_xlog_receive_location() and pg_last_xlog_replay_location() on
the slaves) and alert off of that. We don't use any logs for replication
alerts.
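
As a rough illustration of that comparison (a sketch only, not the check
described above): the location functions return text such as
'16/B374D848'. The code below assumes the two hex halves form a single
64-bit byte position, which should hold for 9.3 and later. The function
names, the 16 MB threshold, and the Nagios-style exit codes are all
made up for the example.

```python
def lsn_to_bytes(lsn: str) -> int:
    """Convert an xlog location such as '16/B374D848' to a byte position.

    Assumption: high and low halves combine into one linear 64-bit value.
    """
    high, low = lsn.split("/")
    return (int(high, 16) << 32) + int(low, 16)


def lag_bytes(master_lsn: str, standby_lsn: str) -> int:
    """Bytes by which the standby trails the master (never negative)."""
    return max(0, lsn_to_bytes(master_lsn) - lsn_to_bytes(standby_lsn))


def check_lag(master_lsn, receive_lsn, replay_lsn, warn_bytes=16 * 1024 * 1024):
    """Return (exit_code, message): 0 = OK, 1 = WARNING.

    master_lsn  would come from pg_current_xlog_location() on the master,
    receive_lsn from pg_last_xlog_receive_location() on the slave,
    replay_lsn  from pg_last_xlog_replay_location() on the slave.
    """
    receive_lag = lag_bytes(master_lsn, receive_lsn)
    replay_lag = lag_bytes(master_lsn, replay_lsn)
    status = 1 if max(receive_lag, replay_lag) > warn_bytes else 0
    label = "WARNING" if status else "OK"
    return status, f"{label}: receive lag {receive_lag} B, replay lag {replay_lag} B"
```

Fetching the three values (psql, a driver, whatever) is left out on
purpose; the point is only that the alert is computed from the
locations, not from the logs.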

We *do*, however, monitor postgres logs for other things. We use Nagios
(actually Icinga) as our monitoring system, and there's a nice Perl
plugin available online called check_logfiles
(http://exchange.nagios.org/directory/Plugins/Log-Files/check_logfiles/details)
that handles alerting on regular expressions in a log file, and also
very nicely handles file rotation (even compression), and is highly
configurable (including perl hook scripts to run if a match is found).
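
To make the general idea concrete, here is a minimal sketch of
"remember where you stopped, match a regex against whatever is new".
It is not how check_logfiles is implemented, and it does far less: the
function name, the state dict, and the rotation heuristic (inode
changed or file shrank) are assumptions for the example, and it does
not go back for the tail of a rotated-away or compressed file.

```python
import os
import re


def scan_new_lines(path, pattern, state):
    """Return lines added to `path` since the last call that match `pattern`.

    `state` is a dict holding the inode and byte offset seen last time and
    is updated in place. If the inode changed or the file shrank, the file
    is assumed to have been rotated and is read again from the start.
    """
    regex = re.compile(pattern)
    st = os.stat(path)
    rotated = state.get("inode") != st.st_ino or st.st_size < state.get("offset", 0)
    offset = 0 if rotated else state.get("offset", 0)

    matches = []
    with open(path, "rb") as fh:
        fh.seek(offset)
        for raw in fh:
            line = raw.decode("utf-8", errors="replace").rstrip("\n")
            if regex.search(line):
                matches.append(line)
        # Remember how far we got so the next call starts here.
        state["offset"] = fh.tell()
    state["inode"] = st.st_ino
    return matches
```

A real run would persist `state` between invocations (a small file
next to the log, say), which is one of the things the plugin already
takes care of.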

In the easiest case (like if you're not using a real monitoring system),
you could just configure this script, run it however you want (cron?)
and if it exits non-zero, mail the output.
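
A sketch of such a wrapper, under some assumptions: the command to run
is passed in as an argument list, the addresses are placeholders, and
`send_via_local_mta` assumes an MTA listening on localhost port 25.
None of these names come from check_logfiles itself.

```python
import smtplib
import subprocess
from email.message import EmailMessage


def run_check(command):
    """Run `command` (a list of arguments) and return (exit_code, output)."""
    proc = subprocess.run(command, stdout=subprocess.PIPE,
                          stderr=subprocess.STDOUT, text=True)
    return proc.returncode, proc.stdout


def build_alert(command, exit_code, output, sender, recipient):
    """Build the notification mail for a check that exited non-zero."""
    msg = EmailMessage()
    msg["From"] = sender
    msg["To"] = recipient
    msg["Subject"] = f"check exited {exit_code}: {command[0]}"
    msg.set_content(output or "(no output)")
    return msg


def send_via_local_mta(msg):
    """Hand the message to a local MTA (assumed to be on localhost:25)."""
    with smtplib.SMTP("localhost") as smtp:
        smtp.send_message(msg)


def run_and_notify(command, sender, recipient, send=send_via_local_mta):
    """Run the check; mail its output only if it exits non-zero."""
    exit_code, output = run_check(command)
    if exit_code != 0:
        send(build_alert(command, exit_code, output, sender, recipient))
    return exit_code
```

From cron this would be one call to `run_and_notify([...], sender,
recipient)` per interval; passing a different `send` function makes it
easy to try out without a mail server.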

In terms of embedding things in Postgres, I'm a staunch believer that
for performance and reliability, something like alerting shouldn't be
embedded in the application itself but should be handled by an external
(and easily replaceable) component. It's easy enough to do with
logging_collector, or to do with syslog (AFAIK the worry about not
capturing everything is only if you're shipping syslog over the network,
not if you're running a syslogd on the same host as postgres and writing
the logs locally).

From a systems management/monitoring standpoint, I'd much rather see
something in postgres that sends detailed, well-structured log messages
to a message queue than put the alerting logic in it (syslog works with
everything, but it's so horribly obsolete).

-Jason

On 04/05/14 11:47, Andy Colson wrote:
> Hi All.
>
> I've started using replication, and I'd like to monitor my logs for
> any errors or problems.  I don't want to do it manually, and I'm not
> interested in stats (a la PgBadger).
>
> What I'd like is this: the instant PG logs "FATAL: wal segment already
> removed" (or some such bad thing), I get an email.
>
> 1st: is anyone using a program that does something like this? What do
> you use?  How do you like it?
>
> My thinking has been along these lines:
>
>  + log to syslog doesn't really help, and I recall seeing somewhere
> "syslog may not capture everything".  I still have monitoring and log
> rotation problems.
>
>  + log to stderr and write my own collector works, but then I have to
> duplicate what logging_collector already does (rotating, truncating,
> age, size, etc).  Too much work.
>
>  + log with logging_collector, then write a thing to figure out what
> file it's writing to and tail it, watch for rotation, etc. This is just
> messy.
>
> If there isn't a program already available (which I've searched for,
> believe me), I'd like to get feedback on extending logging_collector
> with some lua scriptable event notification.
>
> Lua is small, fast, and mostly easy to embed.  It would allow an admin
> to customize whatever kind of monitoring they want.  When an event
> matches logging_collector would spawn off a different app to handle
> the event notification.  The app would be launched in the background
> and forgotten about so that logging isn't delayed.
>
> I'm thinking:
>
> function checkLine(item)
>   if item:find('FATAL') then
>      launch('/usr/bin/mynotify.pl', item)
>   end
> end
>
> Logging_collector would then do something like (forgive the perl
> pseudo code):
>
> ... regular log file rotation stuff ..
> open OUT
> while ($line = <stderr>)
> {
>   checkLine($line);
>   print OUT $line;
> }
>
> ... etc, etc ...
>
> Lua could also have other handy events defined:
>     OnLogRotate(newFile)
>     OnStartup()
>     OnShutdown()
>
>
> Lua can also keep state, so maybe you don't want to email on the first
> FATAL, but on the third.
>
> local cc = 0
> function checkLine(item)
>   if item:find('FATAL') then
>      cc = cc + 1
>      if cc > 2 then
>        launch('/usr/bin/mynotify.pl', item)
>        cc = 0
>      end
>   end
> end
>
> Thoughts?
>
> -Andy
>
>


--

Jason Antman | Systems Engineer | CMGdigital
jason.antman@coxinc.com | p: 678-645-4155

