Re: Re: Industrial-Strength Logging - Mailing list pgsql-hackers
From | Tom Lane |
---|---|
Subject | Re: Re: Industrial-Strength Logging |
Date | |
Msg-id | 4426.960050218@sss.pgh.pa.us |
In response to | Re: Industrial-Strength Logging (Giles Lean <giles@nemeton.com.au>) |
Responses | Re: Re: Industrial-Strength Logging |
List | pgsql-hackers |
Giles Lean <giles@nemeton.com.au> writes:
>> Yeah, let's have another logging discussion... :)
>
> [ good summary of different approaches: ]
> (a)(i) standard error to file
>    (ii) standard error piped to a process
> (b) named log file(s)
> (c) syslogd
> (d) database

> I would recommend (a)(ii), with (a)(i) available for anyone who wants
> it. (Someone who has high load 9-5 but who can shut down daily might
> be happy writing directly to a log file, for example.)

You mentioned the issue of trying to deal with out-of-disk-space errors for the log file, but there is another kind of resource exhaustion problem that should also be taken into account. Namely, inability to open the log file due to EMFILE (no kernel filetable slots left) errors.

This is fresh in my mind because I just finished making some fixes to make Postgres more robust in the full-filetable scenario. It's quite easy for a Postgres installation to run the kernel out of filetable slots if the admin has set a large MaxBackends limit without increasing the kernel's NFILE parameter enough to cope. So this isn't a very farfetched scenario, and we ought to take care that our logging mechanism doesn't break down when it happens.

You mentioned that case (b) has a popular variant of opening and closing the logfile for each message. I think this would be the most prone to EMFILE failures, since the backends wouldn't normally be holding the logfile open. In the other cases the logfile or log pipe is held open continually by each backend so there's no risk at that point.

Of course, the downstream logging daemon in cases (a)(ii) and (c) might suffer EMFILE at the time that it's trying to rotate to a new logfile. I doubt we can expect that syslogd has a good strategy for coping with this :-(. If the daemon is of our own making, the first thought that comes to mind is to hold the previous logfile open until after we successfully open the new one. If we get a failure on opening the new file, we just keep logging into the old one, while periodically trying to rotate again.

The recovery strategy for individual backends faced with EMFILE failures is to close inessential files until the open() request succeeds. (There are normally plenty of inessential open files, since most backend I/O goes through VFDs managed by fd.c, and any of those that are physically open can be closed at need.) If we use case (b) then a backend that finds itself unable to open a log file could try to recover that way. However there are two problems with it: one, we might be unable to log startup failures under EMFILE conditions (since there might well be no open VFDs in a newly-started backend, especially if the system is in filetable trouble), and two, there's some risk of circularity problems if fd.c is itself trying to write a log message and has to be called back by elog.c.

Case (d), logging to a database table, would be OK in the face of EMFILE during normal operation, but again I worry about the prospect of being unable to log startup failures. (Actually, there's a more serious problem with it for startup failures: a backend cannot be expected to do database writes until it's pretty fully up to speed. Between that and the fact the postmaster can't write to tables either, I think we can reject case (d) for our purposes.)

So from this point of view, it again seems that case (a)(i) or (a)(ii) is the best alternative, so long as the logging daemon is coded not to give up its handle for an old log file until it's successfully acquired a new one.

Seems like the next step should be for someone to take a close look at the several available log-daemon packages and see which of them looks like the best bet for our purposes. (I assume there's no good reason to roll our own from scratch...)

regards, tom lane
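For illustration only, the rotation rule described above (keep the old log file open until the new one has been opened successfully, and retry later on failure) might look roughly like the following Python sketch. It is not code from Postgres or from any existing log daemon; the class name `RotatingLog`, its `rotation_pending` flag, and the `opener` parameter are all hypothetical. The message speaks of EMFILE; POSIX also defines ENFILE for a full system-wide file table, and the sketch assumes both should be treated the same way.

```python
import errno
import os


class RotatingLog:
    """Hypothetical append-only log that keeps its old file open
    until a replacement has actually been acquired."""

    def __init__(self, path):
        # The first open has no older handle to fall back on, so it must succeed.
        self.path = path
        self.fd = os.open(path, os.O_WRONLY | os.O_APPEND | os.O_CREAT, 0o600)
        self.rotation_pending = False

    def write(self, message):
        # Always writes to whichever descriptor is currently held.
        os.write(self.fd, message.encode("utf-8") + b"\n")

    def rotate(self, new_path, opener=os.open):
        """Try to switch to new_path; return True on success.

        `opener` is an assumption of this sketch: it exists only so that a
        failing open() can be simulated without exhausting real descriptors.
        """
        try:
            new_fd = opener(new_path, os.O_WRONLY | os.O_APPEND | os.O_CREAT, 0o600)
        except OSError as exc:
            if exc.errno in (errno.EMFILE, errno.ENFILE):
                # File table is full: keep logging to the old file and
                # let the caller retry the rotation later.
                self.rotation_pending = True
                return False
            raise
        # Only after the new file is open is the old handle given up.
        old_fd, self.fd = self.fd, new_fd
        self.path = new_path
        os.close(old_fd)
        self.rotation_pending = False
        return True
```

A daemon built this way would call `rotate()` on its usual schedule and, while `rotation_pending` is set, try again periodically. Messages written in the meantime still land in the old file, so nothing is lost while the file table is full.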