Thread: Feature request: separate logging

Feature request: separate logging

From
otheus uibk
Date:
A glaring weakness in PostgreSQL for production systems is that the administrator has no way of controlling what types of logs go where. There are at least two types of logs: errors and statement logs. (I could also add: connection, syntax error, query duration, audit.) It has become increasingly important in the Linux world, with its over-reliance on systemd, that we admins be able to distinguish between these log outputs. Systemd wants to control all the daemons and services run on a Linux host. To do this effectively, it needs to capture the stdout and stderr, or possibly parse the external logfile, of the daemons it spawns. The benefits of systemd's journald subsystem include being able to automatically identify daemons that fail to start properly. (The merits of systemd are beyond the scope of this discussion; it is now the all-but-ubiquitous standard of Linux distributions, and has become nearly impossible to avoid.)

The PostgreSQL startup process could greatly benefit from systemd. As it is now, PGDG distributes PostgreSQL with init scripts which do very little to check whether PostgreSQL bombed shortly after startup. I have improved upon that script to do some smart detection of a failed startup, but the result is having two different files to monitor. On the one hand, I want to use PostgreSQL's built-in file or CSV logging. On the other, I need to check the output of the startup process. These logs have different formats, especially for the smart DBA who wants to customize the log prefix with a more usable timestamp and other fields. Logging to external files is nice because PostgreSQL rotates the logs for us automatically. (External log rotation is problematic and risky because it requires a configuration reload, which may lead to undesirable side effects. Right?)

One alternative is to capture everything into the local logging system, or to send all to stdout/stderr and capture this with journald (or runit's svlogd or something). But then you have the following problem: if I enable statement-level logging or set log_min_duration_statement=0, the in-memory journald will quickly be overrun with statement logs. The journald subsystem will then become useless to the other daemons and subsystems.

One solution is proposed in these forums: https://www.postgresql.org/message-id/flat/etPan.53397e77.643c9869.1a0%40palos#etPan.53397e77.643c9869.1a0@palos
While pgbadger has some nice features, it doesn't really solve the problem of allowing PostgreSQL to be used with systemd+journald.

What I do today is configure PostgreSQL to write csvlogs. Stdout/stderr are captured by journald. A custom Perl script with the Text::CSV module and tail -F semantics continuously processes the csvlog file, ignores query, DML, and detail log lines, and sends the rest via syslog() (which journald then handles).
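
For concreteness, here is a rough Python sketch of that workflow (the original script uses Perl with Text::CSV). The csvlog path and column positions are assumptions; check them against your server version's documented csvlog column list before relying on them.

```python
"""Follow a PostgreSQL csvlog, drop statement/detail noise, forward the rest
to syslog (which journald then handles).  Path and column indexes are
assumptions, not verified against any particular server version."""
import csv
import io
import syslog
import time

CSVLOG = "/var/lib/pgsql/data/log/postgresql.csv"  # assumed path
SEVERITY_COL, MESSAGE_COL = 11, 13                 # assumed csvlog layout
SKIP = {"LOG"}  # statement/duration/detail noise typically arrives at LOG severity

def follow(path):
    """Yield lines appended to path, tail -F style (rotation not handled here)."""
    with open(path, newline="") as f:
        f.seek(0, io.SEEK_END)
        while True:
            line = f.readline()
            if line:
                yield line
            else:
                time.sleep(0.5)

def wanted(row):
    """Keep rows that have a message field and a severity not in SKIP."""
    return len(row) > MESSAGE_COL and row[SEVERITY_COL] not in SKIP

def forward(path):
    # csv.reader pulls extra lines from its input when a quoted field spans
    # newlines, so feeding it follow() reassembles whole multi-line records.
    for row in csv.reader(follow(path)):
        if wanted(row):
            syslog.syslog(syslog.LOG_INFO, f"{row[SEVERITY_COL]}: {row[MESSAGE_COL]}")
```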

It's not the right way.

I would like to see PostgreSQL have the ability to:

   1. Write to a csvlog with one set of selectors.
   2. Write to stdout/stderr with a different set of selectors (no statement, no autovacuum, etc.) using a purely line-oriented output that:

       2.1. Has the kind of detail contained in the CSV. Currently, the log_line_prefix option does not offer some of the information provided in the CSV logs. Really, the CSV log should simply be an implementation of the log prefix.
       2.2. Collapses multi-line queries into one line (newlines and tabs are escaped with backslashes or the x1B character).
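
For what it's worth, point 2.2 is easy to make reversible if the escape character itself is escaped first. A minimal sketch in Python (this is not an existing PostgreSQL option, just an illustration of the escaping scheme):

```python
# Collapse a multi-line statement into one log line by backslash-escaping,
# reversibly: the backslash itself must be escaped first.
def collapse(text: str) -> str:
    return text.replace("\\", "\\\\").replace("\n", "\\n").replace("\t", "\\t")

def expand(line: str) -> str:
    """Invert collapse(): decode \\n, \\t and \\\\ back to the originals."""
    out, i = [], 0
    while i < len(line):
        if line[i] == "\\" and i + 1 < len(line):
            out.append({"n": "\n", "t": "\t", "\\": "\\"}.get(line[i + 1], line[i + 1]))
            i += 2
        else:
            out.append(line[i])
            i += 1
    return "".join(out)
```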
  
Finally, if these changes can be implemented, is it possible to backport them to prior versions, say 9.1 and up? If I wrote a patch, under what conditions would the patch be accepted for inclusion in official releases of older versions?


--

Re: Feature request: separate logging

From
Guillaume Lelarge
Date:

Le 18 nov. 2016 2:03 PM, "otheus uibk" <otheus.uibk@gmail.com> a écrit :
>
> [...]
>
> One solution is proposed in these forums: https://www.postgresql.org/message-id/flat/etPan.53397e77.643c9869.1a0%40palos#etPan.53397e77.643c9869.1a0@palos
>
> [...]
>
> Finally, if these changes can be implemented, is it possible to backport them to prior versions, say 9.1 and up? If I wrote a patch, under what conditions would the patch be accepted for inclusion in official releases of older versions?
>

Only bug fixes are accepted in older releases. What you're proposing isn't a bug fix.

Re: Feature request: separate logging

From
Michael Paquier
Date:
On Sun, Nov 20, 2016 at 12:56 PM, Guillaume Lelarge
<guillaume@lelarge.info> wrote:
> Le 18 nov. 2016 2:03 PM, "otheus uibk" <otheus.uibk@gmail.com> a écrit :
>> Finally, if these changes can be implemented, is it possible to backport
>> them to prior versions, say 9.1 and up? If I wrote a patch, under what
>> conditions would the patch be accepted for inclusion in official releases of
>> older versions?
>>
>
> Only bug fixes are accepted in older releases. What you're proposing isn't a
> bug fix.

To put it in other words, if you are willing to write a patch that
would be accepted by upstream, the first version where it could land
is Postgres 10. Currently this version is still in development and the
last commit fest will be in March. See more details here:
https://wiki.postgresql.org/wiki/Submitting_a_Patch

If you are willing to get something done in older versions, it may be
interesting to look at the hook in elog.c and try to implement what
you are looking for as a plugin loaded by the server at startup using
shared_preload_libraries. Reading roughly what you are writing above,
I guess that it could be possible: the elog hook offers ways to
redirect message strings generated by the server the way you want.

I wrote such a thing some time ago to generate logs in a JSON
format; you may want to look at it for more ideas:
https://github.com/michaelpq/pg_plugins/tree/master/jsonlog
--
Michael


Re: Feature request: separate logging

From
Greg Stark
Date:
On 18 November 2016 at 13:00, otheus uibk <otheus.uibk@gmail.com> wrote:
> What I do today is to configure postgresql to write csvlogs. Stdout/stderr
> are captured by journald. A custom perl script with the Text::CSV module and
> tail -F semantics continuously processes the csvlog file, ignores query,
> dml, and detail log lines,  and sends the rest via syslog() (which journald
> then handles).

Postgres supports syslog directly, and syslogd supports directing logs
to various destinations. If you want to filter the logs and send
them to different servers, you can interpose a remote syslogd server to
do exactly what you want.
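
Concretely, that could look something like the following. The two postgresql.conf settings are real GUCs; the rsyslog rules and file paths are only illustrative of the filtering idea, not a recommended configuration:

```conf
# postgresql.conf -- send server logs to syslog on a dedicated facility
log_destination = 'syslog'
syslog_facility = 'LOCAL0'

# /etc/rsyslog.d/30-postgres.conf (RainerScript; paths are examples):
# siphon statement lines into their own file, keep the rest in the main log
if $syslogfacility-text == 'local0' then {
    if $msg contains 'statement: ' then {
        action(type="omfile" file="/var/log/postgresql/statements.log")
        stop
    }
    action(type="omfile" file="/var/log/postgresql/postgresql.log")
    stop
}
```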

I think using a pipe, or writing to a file and then having a program
parse the text file back into messages and parse the fields out, is
fundamentally just a huge waste of programmer time and cycles, as well
as error-prone and susceptible to security problems. Much better to
just have options for Postgres to generate logs in the right format
and send them with the right protocol to begin with.

>    1. Write to a csvlog with one set of selectors
>    2. Write to stdout/stderr a different set of selectors (no statement, no
> autovacuum, etc)

Being able to send different messages to different places isn't a bad
idea. But everyone's going to have a different idea of what should go
in which bucket, so this will need more thought about the details.

Perhaps we could get away with just using the error class (the first
two characters of the SQL error code, see
src/backend/utils/errcodes.h), but that doesn't help with warnings and
lower-level messages. And some of those warnings are pretty important
operational tips, like raising checkpoint parameters or autovacuum
parameters.
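
To make the error-class idea concrete, here is a tiny sketch of routing by the first two characters of the SQLSTATE. The class set and severity rules are invented for illustration; this is not a PostgreSQL API:

```python
# Route a log record to a destination by its SQLSTATE error class (the
# first two characters of the five-character code).  The mapping below is
# purely illustrative -- every site would want a different one.
NOISY_CLASSES = {"00"}  # e.g. the "successful completion" class

def destination(sqlstate: str, severity: str) -> str:
    if severity in ("PANIC", "FATAL", "ERROR"):
        return "stderr"   # always surface real errors
    if sqlstate[:2] in NOISY_CLASSES:
        return "csvlog"   # bury high-volume noise in the csvlog
    return "csvlog" if severity == "LOG" else "stderr"
```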

>        2.1. has the kind of detail contained in the CSV.  Currently, the
> log-prefix option does not offer some of the information provided in the CSV
> logs. Really, the CSV log should simply be an implementation of the
> log-prefix.
>        2.2. Collapses multi-lined queries into one line (newlines and tabs
> are escaped with backslashes or the x1B character).

CSV specifies exactly how to handle newlines and quoting and if you're
not happy with that format -- and I would agree with you -- there are
a myriad of other standard formats such as JSON and msgpack. There's
no need to invent an almost-CSV with most of the problems of CSV
except one. One question immediately arises -- how do you plan to
escape the x1B character? (And before you say it's unlikely to appear
in the data consider that one of the main uses for csv logs is to load
them into Postgres so...)
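
The point about CSV already handling newlines is easy to demonstrate: a conforming writer quotes fields containing newlines, and a conforming reader reassembles them, so no extra escape character is needed. A quick sketch with Python's csv module:

```python
import csv
import io

# A quoted CSV field may contain raw newlines and tabs; a conforming reader
# reassembles the record, so no x1B-style escaping is required.
msg = "SELECT *\nFROM t\nWHERE a = 1"
buf = io.StringIO()
csv.writer(buf).writerow(["ERROR", msg])
(row,) = csv.reader(io.StringIO(buf.getvalue()))
assert row == ["ERROR", msg]
```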

I feel your pain; I'm trying to get logstash or fluentd working here
too, and I'm amazed they don't have a correct CSV parser. It seems
like such a basic requirement for something designed to handle logs,
so it's quite mysterious to me. Both have the same dumpster fire
of a multiline parser that depends on recognizing continuation lines
with a regexp.


> Finally, if these changes can be implemented, is it possible to backport
> them to prior versions, say 9.1 and up? If I wrote a patch, under what
> conditions would the patch be accepted for inclusion in official releases of
> older versions?

The only way to support older versions would be to publish it
separately as an extension like the jsonlog extension. There's a hook
for logging so it should be possible. But it might not be easy. The
existing jsonlog extension has some quirky bits to deal with messages
at startup for example.


-- 
greg