Thread: Log file monitoring and event notification
Hi All.

I've started using replication, and I'd like to monitor my logs for any errors or problems. I don't want to do it manually, and I'm not interested in stats (a la PgBadger).

What I'd like is this: the instant PG logs "FATAL: wal segment already removed" (or some such bad thing), I get an email.

1st: is anyone using a program that does something like this? What do you use? How do you like it?

My thinking has been along these lines:

+ log to syslog doesn't really help, and I recall seeing somewhere "syslog may not capture everything". I still have monitoring and log rotation problems.

+ log to stderr and write my own collector works, but then I have to duplicate what logging_collector already does (rotating, truncating, age, size, etc). Too much work.

+ log with logging_collector, then write a thing to figure out what file it's writing to and tail it, watch for rotation, etc. This is just messy.

If there isn't a program already available (which I've searched for, believe me), I'd like to get feedback on extending logging_collector with some Lua-scriptable event notification.

Lua is small, fast, and mostly easy to embed. It would allow an admin to customize whatever kind of monitoring they want. When an event matches, logging_collector would spawn off a different app to handle the event notification. The app would be launched in the background and forgotten about so that logging isn't delayed.

I'm thinking:

    function checkLine(item)
        if item:find('FATAL') then
            launch('/usr/bin/mynotify.pl', item)
        end
    end

Logging_collector would then do something like (forgive the perl pseudo code):

    ... regular log file rotation stuff ..
    open OUT
    while ($line = <stderr>)
    {
        checkLine($line);
        print OUT $line;
    }
    ... etc, etc ...

Lua could also have other handy events defined:

    OnLogRotate(newFile)
    OnStartup()
    OnShutdown()

Lua can also keep state, so maybe you don't want to email on the first FATAL, but on the third.

    local cc = 0
    function checkLine(item)
        if item:find('FATAL') then
            cc = cc + 1
            if cc > 2 then
                launch('/usr/bin/mynotify.pl', item)
                cc = 0
            end
        end
    end

Thoughts?

-Andy
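For comparison, the third option in the list above (follow whatever file logging_collector is currently writing and cope with rotation) can be sketched outside the server in a few lines of Python. This is only a rough illustration, not part of the proposal: the class name LogFollower and the FATAL/PANIC pattern are made up, and it assumes log file names sort chronologically, as the default log_filename pattern postgresql-%Y-%m-%d_%H%M%S.log does.

```python
import glob
import os
import re


class LogFollower:
    """Poll a log directory and return new lines that match a pattern.

    Hypothetical helper: assumes file names sort in chronological order.
    """

    def __init__(self, log_dir, pattern=r"\b(FATAL|PANIC)\b", glob_pat="*.log"):
        self.log_dir = log_dir
        self.regex = re.compile(pattern)
        self.glob_pat = glob_pat
        self.path = None   # file currently being followed
        self.offset = 0    # bytes of that file already consumed

    @staticmethod
    def _read_from(path, offset):
        """Return (complete lines found after offset, new offset)."""
        with open(path, "rb") as fh:
            fh.seek(offset)
            data = fh.read()
        # Consume only up to the last newline, so a half-written line is
        # picked up whole on the next poll.
        end = data.rfind(b"\n") + 1
        lines = data[:end].decode("utf-8", errors="replace").splitlines()
        return lines, offset + end

    def poll(self):
        """Return matching lines written since the previous poll."""
        files = sorted(glob.glob(os.path.join(self.log_dir, self.glob_pat)))
        if not files:
            return []
        if self.path is None:
            self.path = files[-1]          # start with the newest file
        hits = []
        # Finish the current file first, then walk through any newer ones.
        for path in [f for f in files if f >= self.path]:
            if path != self.path:
                self.path, self.offset = path, 0   # rotated to a new file
            if os.path.getsize(path) < self.offset:
                self.offset = 0                    # file was truncated in place
            lines, self.offset = self._read_from(path, self.offset)
            hits.extend(line for line in lines if self.regex.search(line))
        return hits
```

Calling poll() from a loop with a short sleep, or from cron with the path and offset saved between runs, would give roughly the "tail it, watch for rotation" behaviour; what to do with the returned lines (email, pager, counter) is left to the caller.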
General thought:

It's entirely possible my current Postgres environment is missing something (I'm an automation engineer, not a DBA - most of my postgres knowledge has been learned on the job or from Google), but we actively monitor the receive and replay lag (i.e. comparing pg_current_xlog_location() on the master to pg_last_xlog_receive_location() and pg_last_xlog_replay_location() on the slaves) and alert off of that. We don't use any logs for replication alerts.

We *do*, however, monitor postgres logs for other things. We use Nagios (actually Icinga) as our monitoring system, and there's a nice Perl plugin available online called check_logfiles (http://exchange.nagios.org/directory/Plugins/Log-Files/check_logfiles/details) that handles alerting on regular expressions in a log file, and also very nicely handles file rotation (even compression), and is highly configurable (including perl hook scripts to run if a match is found). In the easiest case (like if you're not using a real monitoring system), you could just configure this script, run it however you want (cron?) and if it exits non-zero, mail the output.

In terms of embedding things in Postgres, I'm a staunch believer that for performance and reliability, something like alerting shouldn't be embedded in the application itself but should be handled by an external (and easily replaceable) component. It's easy enough to do with logging_collector, or to do with syslog (AFAIK the worry about not capturing everything is only if you're shipping syslog over the network, not if you're running a syslogd on the same host as postgres and writing the logs locally).

From a systems management/monitoring standpoint, I'd much rather see something in postgres that sends detailed, well-structured log messages to a message queue than put the alerting logic in it (syslog works with everything, but it's so horribly obsolete).

-Jason

On 04/05/14 11:47, Andy Colson wrote:
> Hi All.
> [...]

-- 
Jason Antman | Systems Engineer | CMGdigital
jason.antman@coxinc.com | p: 678-645-4155
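The lag comparison Jason describes comes down to subtracting two WAL locations. As a rough sketch (not his actual check), the Python below converts locations of the form "16/B374D848" to byte positions and flags a standby that has fallen too far behind. The function names and the 16 MB threshold are invented for illustration, and fetching the three values with the SQL functions he names is left out. The plain hex arithmetic should hold for PostgreSQL 9.3 and later; on 9.2 and later the server-side pg_xlog_location_diff() can do the subtraction for you, and older releases numbered WAL segments slightly differently.

```python
def lsn_to_bytes(lsn):
    """Convert a WAL location such as '16/B374D848' to a byte position."""
    hi, lo = lsn.split("/")
    return (int(hi, 16) << 32) + int(lo, 16)


def lag_bytes(master_lsn, standby_lsn):
    """Bytes of WAL the standby is behind the master (negative if ahead)."""
    return lsn_to_bytes(master_lsn) - lsn_to_bytes(standby_lsn)


def check_lag(master_lsn, receive_lsn, replay_lsn, max_bytes=16 * 1024 * 1024):
    """Return a list of alert strings; an empty list means all is well.

    max_bytes is a hypothetical threshold (one default-size WAL segment).
    """
    alerts = []
    for name, lsn in (("receive", receive_lsn), ("replay", replay_lsn)):
        lag = lag_bytes(master_lsn, lsn)
        if lag > max_bytes:
            alerts.append("%s lag is %d bytes (limit %d)" % (name, lag, max_bytes))
    return alerts
```

A monitoring plugin would typically map an empty list to OK and anything else to WARNING or CRITICAL.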
On Sat, Apr 5, 2014 at 8:47 AM, Andy Colson <andy@squeakycode.net> wrote:
> Hi All.
>
> I've started using replication, and I'd like to monitor my logs for any errors or problems. I don't want to do it manually, and I'm not interested in stats (a la PgBadger).
>
> What I'd like, is the instant PG logs: "FATAL: wal segment already removed" (or some such bad thing), I'd like to get an email.
>
> 1st: is anyone using a program that does something like this? What do you use? How do you like it?
Tail 'n' Mail from Bucardo might be what you're after: http://bucardo.org/wiki/Tail_n_mail
On 04/05/2014 08:47 AM, Andy Colson wrote:
> Hi All.
>
> I've started using replication, and I'd like to monitor my logs for
> any errors or problems. I don't want to do it manually, and I'm not
> interested in stats (a la PgBadger).
>
> What I'd like, is the instant PG logs: "FATAL: wal segment already
> removed" (or some such bad thing), I'd like to get an email....

As one component of our monitoring, we route logging through syslog, which sends all messages to one location for use by PgBadger and friends and simultaneously sends any message with a WARN or higher priority to a separate temporary "postgresql_trouble.log".

A cron job checks this file periodically (currently every 5 minutes) for content. If the file has content, the script sends the appropriate emails and truncates the trouble log.

Cheers,
Steve
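The cron-driven check Steve describes might look something like the Python sketch below. It is a guess at the shape of such a script, not his actual one: the function names, the subject line, and the localhost SMTP relay are all assumptions. The log is truncated only after the send succeeds, so a mail failure leaves the content in place for the next run. Lines that syslog appends between the read and the truncate can still be lost.

```python
import smtplib
from email.message import EmailMessage


def build_alert(content, sender, recipient):
    """Wrap the trouble-log content in an email message."""
    msg = EmailMessage()
    msg["Subject"] = "PostgreSQL trouble log: %d line(s)" % len(content.splitlines())
    msg["From"] = sender
    msg["To"] = recipient
    msg.set_content(content)
    return msg


def smtp_send(msg, host="localhost"):
    """Default delivery: hand the message to a local SMTP relay (assumed)."""
    with smtplib.SMTP(host) as smtp:
        smtp.send_message(msg)


def check_and_mail(path, sender, recipient, send=smtp_send):
    """Mail the trouble log if it has content, then truncate it.

    Returns True if a message was sent, False if there was nothing to report.
    """
    try:
        with open(path, "r+", encoding="utf-8", errors="replace") as fh:
            content = fh.read()
            if not content.strip():
                return False
            send(build_alert(content, sender, recipient))
            fh.truncate(0)   # only reached if send() did not raise
    except FileNotFoundError:
        return False
    return True
```

Run from cron every few minutes, a non-empty file produces one email per interval rather than one per log line, which also acts as a crude rate limit.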