Thread: Industrial-Strength Logging

Industrial-Strength Logging

From: Tim Holloway

It's rude, it's crude, but it's finally here!

The mods for the logging subsystem have been posted to pgsql-patches
for your amusement and edification.

Or whatever.

At the moment it does nothing but start itself up, channel a message
every time a backend starts, and leave some slop hanging around the
system when you shut down, but I hope that its potential will shine
through anyway.
  regards,
    Tim Holloway


Re: Industrial-Strength Logging

From: Giles Lean

------- Blind-Carbon-Copy

To: pgsql-hackers@postgresql.org
Subject: Re: Industrial-Strength Logging
In-reply-to: <Pine.LNX.4.21.0006021547090.406-100000@localhost.localdomain>
Date: Sat, 03 Jun 2000 22:59:34 +1000
Message-ID: <16919.960037174@nemeton.com.au>
From: Giles Lean <giles@nemeton.com.au>


On Sat, 3 Jun 2000 01:48:33 +0200 (CEST)  Peter Eisentraut wrote:

> Yeah, let's have another logging discussion... :)

Mmm, seems popular.  There was a mention on -ports and -general a
couple of weeks ago, and here we are (were) on -patches.  I'm moving
this discussion to -hackers (hope that's a good choice) since that is
where Tim Holloway's proposals were discussed late last year.

A start point I found in the archives for Tim's proposal is:
http://www.postgresql.org/mhonarc/pgsql-hackers/1999-10/msg00747.html

I'm not proposing anything that advanced. In particular, I'm not
discussing the -content- of log messages at all.  For now it would be
nice to see the logging mechanism improved; changing or improving the
content can be another project.

I don't discuss the current logging implementation except to note that
a backend postgres process's logging depends on whether the process is
running under postmaster, whether it has a controlling terminal,
whether a -o option was provided, and whether postgres was compiled to
use syslog.  Maybe that functionality can be simplified a bit ... ;-)

One more thing I don't discuss is how the debug log level is set.
Certainly something more sophisticated and dynamically variable than
the current command line method would be nice, but that too can be a
discussion for another day; it isn't much related to -how- the error
messages are tucked away.

Typical logging methods
=======================

(a)(i)  write to standard error with redirection to a file

    Pro:
    - what the code (mostly) does currently
    - very easy to set up, just redirect standard error at startup
    - efficient and low overhead

    Con:
    - can't rotate log files
    - problematic when there is an I/O error or the filesystem the log
      file is on fills up

(a)(ii) write to standard error, with standard error piped to another
        process

    Pro:
    - administrator chooses between (i) and (ii) and can change this
      via shutdown and restart, no recompilation needed
    - no code changes to backend programs
    - clean separation of functionality
    - choice of backend logging programs
      o Bernstein's logtools
      o Apache's rotatelogs
      o swatch
      o logsurfer
      o ...

    Con:
    - backend can block if the logging process is not reading log
      messages fast enough (can the backends generate enough data for
      this to be a problem in practice?)
    - reliability of message logging is dependent on the log
      process
    - log messages can be lost if the log process aborts, or is not
      started (solution: make postmaster responsible for starting and
      restarting the log process)
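
    To make the mechanics concrete, here is a minimal untested sketch
    of how postmaster could wire this up at startup; "pg_logger" is a
    placeholder name for whatever log program is chosen:

    #include <stdlib.h>
    #include <sys/types.h>
    #include <unistd.h>

    /*
     * Sketch: start a log process and point stderr at it.  Backends
     * inherit this stderr when postmaster forks them.
     */
    static void
    start_log_process(void)
    {
        int     pipefd[2];
        pid_t   pid;

        if (pipe(pipefd) < 0 || (pid = fork()) < 0)
            exit(1);

        if (pid == 0)
        {
            /* child: become the log process, reading the pipe */
            dup2(pipefd[0], STDIN_FILENO);
            close(pipefd[0]);
            close(pipefd[1]);
            execlp("pg_logger", "pg_logger", (char *) NULL);
            _exit(1);           /* exec failed */
        }

        /* parent: all stderr output now goes down the pipe */
        dup2(pipefd[1], STDERR_FILENO);
        close(pipefd[0]);
        close(pipefd[1]);
    }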

(b) write to named log file(s)

    One way to allow rotation of log files is for the backend
    processes to know what log files they write to, and to have them
    open those files directly without shell redirection.  There is some
    support for this with the postgres -o option, but no support
    for rotating these files that I have seen so far.

    In the simplest case, the backend processes open the log file
    when they start and close it when they exit.  This allows rotation
    of the log file by moving it and waiting for all the currently
    running backend processes to finish.

    Pro:
    - relatively simple code change
    - still efficient and low overhead

    Con:
    - backend processes can run for some time, and postmaster runs
      indefinitely, so at least postmaster needs to know about log
      file rotation
    - doesn't help much for I/O errors or full filesystem

    To address these limitations some applications open their log file
    for each message and then close it afterward:

    Pro:
    - nothing holds the log file open for long
    - still efficient and low overhead for the actual writing of the
      log file

    Con:
    - all error logging has to be via a log routine.  This would be
      elog(), but there is some use of fprintf(stderr, ...) around the
      place that would want to be changed

    - there will be some efficiency hit for the open() and close()
      calls. This won't be -too- bad since the operating system's
      inode cache (or local equivalent) should contain an entry for
      the log file, but it is still two more system calls.
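
    The open-per-message variant is only a few lines; a sketch (the
    real version would of course live inside elog()):

    #include <fcntl.h>
    #include <string.h>
    #include <unistd.h>

    /*
     * Sketch: hold the log file open only for the duration of one
     * message.  O_APPEND keeps concurrent writers from interleaving
     * mid-line for small writes on local filesystems.
     */
    static void
    log_message(const char *path, const char *msg)
    {
        int     fd = open(path, O_WRONLY | O_APPEND | O_CREAT, 0600);

        if (fd < 0)
            return;             /* nowhere to complain to */
        write(fd, msg, strlen(msg));
        close(fd);
    }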

    Another way to handle rotation with long running processes is to
    signal them to re-open their log file, like syslogd is managed:

    Pro:
    - it's a solution

    Con:
    - more code in the backend processes
    - more communication with the backend processes
    - more complication
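
    For reference, a sketch of what the signal-driven re-open would
    look like in each long-running process (signal-safety details
    glossed over; the handler would be installed with
    signal(SIGHUP, sighup_handler) at startup):

    #include <fcntl.h>
    #include <signal.h>
    #include <unistd.h>

    static volatile sig_atomic_t reopen_requested = 0;
    static int  logfd = -1;

    static void
    sighup_handler(int signo)
    {
        /* defer the real work to a safe point in the main loop */
        reopen_requested = 1;
    }

    /* called from the main loop before each write to logfd */
    static void
    maybe_reopen_log(const char *logpath)
    {
        if (!reopen_requested)
            return;
        reopen_requested = 0;
        if (logfd >= 0)
            close(logfd);
        logfd = open(logpath, O_WRONLY | O_APPEND | O_CREAT, 0600);
    }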

(c) log via some logging facility such as syslogd

    Pro:
    - people know syslogd
    - syslogd "looks like" the right answer

    Con:
    - messages are sent to it typically via UDP, so it's easy for them
      to get lost
    - syslogd is not coded robustly, and can hang, stop listening to
      input sources, stop writing to output files, and put wrong
      timestamps on messages
    - using non-UDP transport to syslogd is non-portable (some systems
      allow Unix domain sockets, some named pipes, some neither)
    - syslogd can be hard to secure (e.g. to stop it listening on
      particular network interfaces)
    - do you know what your syslog(3) routine does when it gets a
      write(2) error?
    - having syslogd write to a pipe is not supported by all (any?)
      vendors, so on-the-fly processing of error messages is not
      possible; messages have to be written to the filesystem
    - Unix specific(?)

(d) log into the database

    Pro:
    - we've got a nice tool, let's use it

    Con:
    - chicken-and-egg problem of logging serious/fatal errors
    - infinite recursion problem of logging messages that cause more errors
    - error messages are predominantly text messages, very susceptible
      to processing with perl, grep and friends.  It isn't crystal
      clear that having them inside the database helps a lot
    - using the database is friendly to a DBA, but quite possibly not
      to the system administrator who very possibly knows no SQL but
      who (on a Unix system, at least) has many text log files to look
      after
    - postmaster doesn't access tables according to the previous
      discussion

Recommendations
===============

From here on it's definitely all IMHO!  Your mileage may vary.

To restate the methods I discussed:

    (a)(i)  standard error to file
       (ii) standard error piped to a process
    (b) named log file(s)
    (c) syslogd
    (d) database

I would recommend (a)(ii), with (a)(i) available for anyone who wants
it.  (Someone who has high load 9-5 but who can shut down daily might
be happy writing directly to a log file, for example.)

Using the database is too complex/fragile, using syslogd fails too
often, and having every process involved manage log files as in (b)
seems like more work than is necessary unless/until logging via a
single process turns out to be a bottleneck.

There has been a suggestion to use the Apache "rotatelogs" program to
do the logging.  That program is perhaps a little -too- simplistic:

- no timestamp option
- no way to make ad-hoc requests for log file rotation
- write errors cause it to exit (... oops!)
- no fsync option

On the other hand the program is only 100 lines.  Writing a robust
version isn't rocket science, and if postmaster is taught to start it,
then pg_ctl looks like a natural place to control log file rotation
and possibly one day log levels as well.
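
As a strawman, here is an untested sketch of the core loop of such a
program (daily rotation hardwired; no fsync, timestamp, or signal
handling yet):

    #include <fcntl.h>
    #include <stdio.h>
    #include <time.h>
    #include <unistd.h>

    #define ROTATE_SECS (24 * 60 * 60)  /* rotate daily */

    int
    main(int argc, char **argv)
    {
        const char *base = (argc > 1) ? argv[1] : "pglog";
        char        path[1024];
        char        buf[8192];
        time_t      opened = 0;
        int         fd = -1;
        ssize_t     n;

        while ((n = read(STDIN_FILENO, buf, sizeof(buf))) > 0)
        {
            time_t  now = time(NULL);

            if (fd < 0 || now - opened >= ROTATE_SECS)
            {
                int     newfd;

                snprintf(path, sizeof(path), "%s.%ld", base, (long) now);
                newfd = open(path, O_WRONLY | O_APPEND | O_CREAT, 0600);
                if (newfd >= 0)
                {
                    if (fd >= 0)
                        close(fd);  /* release the old file only now */
                    fd = newfd;
                    opened = now;
                }
                /* on open failure: keep the old file, retry next time */
            }
            if (fd >= 0)
                write(fd, buf, n);  /* partial writes glossed over */
        }
        return 0;
    }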

Regards,

Giles

P.S. Yes, I've some coding time to offer.  I'll wait and see what is
liked before I do anything though. ;-)





------- End of Blind-Carbon-Copy

Re: Re: Industrial-Strength Logging

From: Tom Lane

Giles Lean <giles@nemeton.com.au> writes:
>> Yeah, let's have another logging discussion... :)
>
> [ good summary of different approaches: ]
>     (a)(i)  standard error to file
>        (ii) standard error piped to a process
>     (b) named log file(s)
>     (c) syslogd
>     (d) database
> I would recommend (a)(ii), with (a)(i) available for anyone who wants
> it.  (Someone who has high load 9-5 but who can shut down daily might
> be happy writing directly to a log file, for example.)

You mentioned the issue of trying to deal with out-of-disk-space errors
for the log file, but there is another kind of resource exhaustion
problem that should also be taken into account.  Namely, inability to
open the log file due to EMFILE (no kernel filetable slots left) errors.
This is fresh in my mind because I just finished making some fixes to
make Postgres more robust in the full-filetable scenario.  It's quite
easy for a Postgres installation to run the kernel out of filetable
slots if the admin has set a large MaxBackends limit without increasing
the kernel's NFILE parameter enough to cope.  So this isn't a very
farfetched scenario, and we ought to take care that our logging
mechanism doesn't break down when it happens.

You mentioned that case (b) has a popular variant of opening and closing
the logfile for each message.  I think this would be the most prone to
EMFILE failures, since the backends wouldn't normally be holding the
logfile open.  In the other cases the logfile or log pipe is held open
continually by each backend so there's no risk at that point.  Of
course, the downstream logging daemon in cases (a)(ii) and (c) might
suffer EMFILE at the time that it's trying to rotate to a new logfile.
I doubt we can expect that syslogd has a good strategy for coping with
this :-(.  If the daemon is of our own making, the first thought that
comes to mind is to hold the previous logfile open until after we
successfully open the new one.  If we get a failure on opening the new
file, we just keep logging into the old one, while periodically trying
to rotate again.
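
In outline (sketch only):

    #include <fcntl.h>
    #include <unistd.h>

    static int  logfd = -1;     /* currently open log file */

    /*
     * Release the old descriptor only once the replacement is known
     * good; on failure keep the old file and let the caller retry.
     */
    static void
    try_rotate(const char *newpath)
    {
        int     newfd = open(newpath, O_WRONLY | O_APPEND | O_CREAT, 0600);

        if (newfd < 0)
            return;
        if (logfd >= 0)
            close(logfd);
        logfd = newfd;
    }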

The recovery strategy for individual backends faced with EMFILE failures
is to close inessential files until the open() request succeeds.  (There
are normally plenty of inessential open files, since most backend I/O
goes through VFDs managed by fd.c, and any of those that are physically
open can be closed at need.)  If we use case (b) then a backend that
finds itself unable to open a log file could try to recover that way.
However there are two problems with it: one, we might be unable to log
startup failures under EMFILE conditions (since there might well be no
open VFDs in a newly-started backend, especially if the system is in
filetable trouble), and two, there's some risk of circularity problems
if fd.c is itself trying to write a log message and has to be called
back by elog.c.
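
The retry loop itself is simple; in this sketch release_one_file() is a
made-up stand-in for asking fd.c to close one inessential file, and the
problems above are in what it depends on:

    #include <errno.h>
    #include <fcntl.h>

    /*
     * hypothetical: close one physically-open but inessential file
     * (a VFD, in fd.c's terms); returns 0 when none are left
     */
    extern int release_one_file(void);

    static int
    open_with_retry(const char *path, int flags, int mode)
    {
        int     fd;

        while ((fd = open(path, flags, mode)) < 0 &&
               (errno == EMFILE || errno == ENFILE))
        {
            if (!release_one_file())
                break;          /* nothing left to free; give up */
        }
        return fd;
    }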

Case (d), logging to a database table, would be OK in the face of EMFILE
during normal operation, but again I worry about the prospect of being
unable to log startup failures.  (Actually, there's a more serious
problem with it for startup failures: a backend cannot be expected to do
database writes until it's pretty fully up to speed.  Between that and
the fact the postmaster can't write to tables either, I think we can
reject case (d) for our purposes.)

So from this point of view, it again seems that case (a)(i) or (a)(ii)
is the best alternative, so long as the logging daemon is coded not to
give up its handle for an old log file until it's successfully acquired
a new one.


Seems like the next step should be for someone to take a close look at
the several available log-daemon packages and see which of them looks
like the best bet for our purposes.  (I assume there's no good reason
to roll our own from scratch...)
        regards, tom lane


Re: Re: Industrial-Strength Logging

From: Giles Lean

On Sat, 03 Jun 2000 12:36:58 -0400  Tom Lane wrote:

> You mentioned the issue of trying to deal with out-of-disk-space errors
> for the log file, but there is another kind of resource exhaustion
> problem that should also be taken into account.  Namely, inability to
> open the log file due to EMFILE (no kernel filetable slots left)
> errors.

s/EMFILE/ENFILE/, but yes.

> I doubt we can expect that syslogd has a good strategy for coping with
> this :-(.
    if ((f->f_file = open(p, O_WRONLY|O_APPEND, 0)) < 0) {
            f->f_type = F_UNUSED;
            logerror(p);

'nuff said.

> If the daemon is of our own making, the first thought that
> comes to mind is to hold the previous logfile open until after we
> successfully open the new one.  If we get a failure on opening the new
> file, we just keep logging into the old one, while periodically trying
> to rotate again.

Costs an extra file descriptor, but it's only one and it's temporary.
I can't see anything better to do.

> Seems like the next step should be for someone to take a close look at
> the several available log-daemon packages and see which of them looks
> like the best bet for our purposes.  (I assume there's no good reason
> to roll our own from scratch...)

Suggestions anyone?  I like Bernstein's tools, but they're not freely
distributable enough to be integrated into Postgres.  The Apache
program would be an OK starting place ... if 100 lines of not quite
right code is better than a blank page.

The other tools I mentioned -- swatch and logsurfer -- are really
analysis programs, and while someone might use them for on-the-fly
data reduction they're not really what we want as a default.  Anyone?

It wouldn't hurt to discuss the requirements a little bit more here,
too.  There is a compile-time option to postgres to add timestamps.
Should the log program do that instead?  Should the log writing
program have any responsibility for filtering?
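
(Timestamping in the log program would only be a few lines; a sketch,
assuming line-at-a-time input:)

    #include <stdio.h>
    #include <time.h>

    /* Sketch: stamp each line as the log program copies it out. */
    static void
    write_stamped(FILE *out, const char *line)
    {
        char    stamp[32];
        time_t  now = time(NULL);

        strftime(stamp, sizeof(stamp), "%Y-%m-%d %H:%M:%S",
                 localtime(&now));
        fprintf(out, "%s %s", stamp, line);
    }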

Bernstein's "multilog" can do this sort of thing, although I don't
expect the interface to be to many peoples' liking:
   http://cr.yp.to/daemontools/multilog.html

Regards,

Giles

P.S. I sent to -hackers two copies of the message that Tom was replying
to; I saw neither come back.  Very peculiar, since one was sent from an
address subscribed to pgsql-loophole and one from an address actually
subscribed to the list.


Re: Industrial-Strength Logging

From: Kovacs Zoltan Sandor

I realized that 7.0.2 still doesn't have documentation about 'database
recovery'.  Searching the PostgreSQL web server, I found some
worthwhile comments.  What were the results of this discussion?  My
poor idea is to create a text file built from the debugging messages
(via syslog and pg_options) and then, after a crash, resend all the
queries from that text file.  I have no idea how to send the original
timestamps to the database if a recovery is needed (but this is only
the first problem; I'm sure there will be many).  What solution do you
suggest?  Does anybody plan to write documentation about this?

TIA,
Zoltan