Thread: [HACKERS] WIP: Separate log file for extension

[HACKERS] WIP: Separate log file for extension

From
Antonin Houska
Date:
Attached is a draft patch to allow extension to write log messages to a
separate file. It introduces a concept of a "log stream". The extension's
shared library gets its stream assigned by calling this function from
_PG_init()

    my_stream_id = get_log_stream("my_extension", &my_log_stream);

Then it's supposed to change some of its attributes

    adjust_log_stream_attr(&stream->filename, "my_extension.log");

and to use the stream id in ereport() calls

    ereport(LOG, (errmsg("Hello world"), errstream(my_stream_id)));

The EXEC_BACKEND mechanism makes initialization of the log streams by
postmaster child processes non-trivial. I decided to extend
save_backend_variables() and restore_backend_variables() accordingly. Maybe
someone has better idea.

pgaudit seems to be the most obvious use case for this enhancement, but it
might be useful for many other extensions.

--
Antonin Houska
Cybertec Schönig & Schönig GmbH
Gröhrmühlgasse 26
A-2700 Wiener Neustadt
Web: http://www.postgresql-support.de, http://www.cybertec.at


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Attachment

Re: [HACKERS] WIP: Separate log file for extension

From
Craig Ringer
Date:
On 25 August 2017 at 15:12, Antonin Houska <ah@cybertec.at> wrote:
Attached is a draft patch to allow extension to write log messages to a
separate file.

I like the idea a lot. I'm not so sure about the approach.

How will this play with syslog? csvlog? etc?

I wonder if a level of indirection is appropriate here, where extensions (or postgres subsystems, even) provide a log stream label. Then the logging backed takes care of using that appropriately for the logging mechanism in use; for logging to file that'd generally be separate files.  Same for CSVlog. Other mechanisms could be left as stubs initially.

So the outcome would be the same, just without the assumption of specific file name and output mechanism baked in.

--
 Craig Ringer                   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services

Re: [HACKERS] WIP: Separate log file for extension

From
Antonin Houska
Date:
Craig Ringer <craig@2ndquadrant.com> wrote:

> On 25 August 2017 at 15:12, Antonin Houska <ah@cybertec.at> wrote:
>
> How will this play with syslog? csvlog? etc?

There's nothing special about csvlog: the LogStream structure has a
"destination" field, so if particular extension wants this kind of output, it
simply sets the LOG_DESTINATION_CSVLOG bit here.

LOG_DESTINATION_SYSLOG is a problem because (AFAIK) a single process can
maintain no more than one connection to the syslog server. I don't like the
idea of closing the current connection and opening a different one whenever
the next ereport() is associated with a different stream than the previous
was.

As for LOG_DESTINATION_EVENTLOG, I haven't checked yet.

> I wonder if a level of indirection is appropriate here, where extensions (or
> postgres subsystems, even) provide a log stream label. Then the logging
> backed takes care of using that appropriately for the logging mechanism in
> use; for logging to file that'd generally be separate files. Same for
> CSVlog. Other mechanisms could be left as stubs initially.
>
> So the outcome would be the same, just without the assumption of specific
> file name and output mechanism baked in.

The example I shown in my previous email was the simplest case I could think
of. But it does not mean that the file name has to be hard-coded in the
extension. Instead of setting the LogStream.filename field, you can pass a
pointer to this field to DefineCustomStringVariable() function, so the
specific GUC can control the value.

--
Antonin Houska
Cybertec Schönig & Schönig GmbH
Gröhrmühlgasse 26
A-2700 Wiener Neustadt
Web: http://www.postgresql-support.de, http://www.cybertec.at



Re: [HACKERS] WIP: Separate log file for extension

From
Masahiko Sawada
Date:
On Fri, Aug 25, 2017 at 4:12 PM, Antonin Houska <ah@cybertec.at> wrote:
> Attached is a draft patch to allow extension to write log messages to a
> separate file.

Does it allow postgres core code to write log messages to multiple log
files as well? I can imagine a use case where we want to write log
messages to separate log files by error level or purpose (server log
and SQL log etc).

Regards,

--
Masahiko Sawada
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center



Re: [HACKERS] WIP: Separate log file for extension

From
Antonin Houska
Date:
Masahiko Sawada <sawada.mshk@gmail.com> wrote:

> On Fri, Aug 25, 2017 at 4:12 PM, Antonin Houska <ah@cybertec.at> wrote:
> > Attached is a draft patch to allow extension to write log messages to a
> > separate file.
>
> Does it allow postgres core code to write log messages to multiple log
> files as well? I can imagine a use case where we want to write log
> messages to separate log files by error level or purpose (server log
> and SQL log etc).

At low level it should work as long as several log streams are reserved for
the core code. However, as every single stream needs multiple variables, I
have no idea how to configure those additional streams w/o adding many new GUC
variables to the core.

Also the discussion what should (not) be logged separate would probably be
difficult.

Specifically to log SQL statements to a separate file, I think (but not
verified) that you can write an extension that reserves a stream for itself
and also uses emit_log_hook to recognize SQL statements among the logged
messages. If the hook adjusts the syslogger_stream field of ErrorData
accordingly, the message should end up in the separate file.

--
Antonin Houska
Cybertec Schönig & Schönig GmbH
Gröhrmühlgasse 26
A-2700 Wiener Neustadt
Web: http://www.postgresql-support.de, http://www.cybertec.at



Re: [HACKERS] WIP: Separate log file for extension

From
Tomas Vondra
Date:

On 08/28/2017 11:23 AM, Antonin Houska wrote:
> Craig Ringer <craig@2ndquadrant.com> wrote:
> 
>> On 25 August 2017 at 15:12, Antonin Houska <ah@cybertec.at> wrote:
>>
>> How will this play with syslog? csvlog? etc?
> 
> There's nothing special about csvlog: the LogStream structure has a
> "destination" field, so if particular extension wants this kind of output, it
> simply sets the LOG_DESTINATION_CSVLOG bit here.
> 

I assume Craig's point was more about CSVLOG requiring log_collector=on.
So what will happen if the PostgreSQL is started without the collector,
and an extension attempts to use LOG_DESTINATION_CSVLOG? Or will it
start a separate collector for each such extension?

regards

-- 
Tomas Vondra                  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] WIP: Separate log file for extension

From
Magnus Hagander
Date:
On Fri, Aug 25, 2017 at 12:12 AM, Antonin Houska <ah@cybertec.at> wrote:
Attached is a draft patch to allow extension to write log messages to a
separate file. It introduces a concept of a "log stream". The extension's
shared library gets its stream assigned by calling this function from
_PG_init()

        my_stream_id = get_log_stream("my_extension", &my_log_stream);

I like this idea in general.

 
Then it's supposed to change some of its attributes

        adjust_log_stream_attr(&stream->filename, "my_extension.log");

This, however, seems to be wrong.

The logfile name does not belong in the extension, it belongs in the configuration file. I think the extension should set it's "stream id" or whatever you want to call it, and then it should be possible to control in postgresql.conf where that log is sent.

Also, what if this extension is loaded on demand in a session and not via shared_preload_libraries? It looks like the syslogger only gets the list of configured streams when it starts?

In short, I think the solution should be more generic, and not "just for extensions".

I completely missed this thread when I did my quick-wip at https://www.postgresql.org/message-id/flat/CABUevExztL0GORyWM9S4tR_Ft3FmJbRaxQdxj+BQZjpvmRurdw@mail.gmail.com#CABUevExztL0GORyWM9S4tR_Ft3FmJbRaxQdxj+BQZjpvmRurdw@mail.gmail.com  -- some of the changes made were close enough that I got the two confused :) Based on the feedback of that one, have you done any performance checks?


--

Re: [HACKERS] WIP: Separate log file for extension

From
Antonin Houska
Date:
Tomas Vondra <tomas.vondra@2ndquadrant.com> wrote:

> On 08/28/2017 11:23 AM, Antonin Houska wrote:
> > Craig Ringer <craig@2ndquadrant.com> wrote:
> >
> >> On 25 August 2017 at 15:12, Antonin Houska <ah@cybertec.at> wrote:
> >>
> >> How will this play with syslog? csvlog? etc?
> >
> > There's nothing special about csvlog: the LogStream structure has a
> > "destination" field, so if particular extension wants this kind of output, it
> > simply sets the LOG_DESTINATION_CSVLOG bit here.
> >
>
> I assume Craig's point was more about CSVLOG requiring log_collector=on.
> So what will happen if the PostgreSQL is started without the collector,
> and an extension attempts to use LOG_DESTINATION_CSVLOG? Or will it
> start a separate collector for each such extension?

The patch does not try to change this behavior. So there'll be no csvlog file
and only the plain (not csv-formatted) message will be written to console.

--
Antonin Houska
Cybertec Schönig & Schönig GmbH
Gröhrmühlgasse 26
A-2700 Wiener Neustadt
Web: http://www.postgresql-support.de, http://www.cybertec.at


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] WIP: Separate log file for extension

From
Antonin Houska
Date:
Magnus Hagander <magnus@hagander.net> wrote:

> > On Fri, Aug 25, 2017 at 12:12 AM, Antonin Houska <ah@cybertec.at> wrote:

>
> I like this idea in general.
>
>  Then it's supposed to change some of its attributes
>
> >  adjust_log_stream_attr(&stream->filename, "my_extension.log");
>
> This, however, seems to be wrong.
>
> The logfile name does not belong in the extension, it belongs in the
> configuration file. I think the extension should set it's "stream id" or
> whatever you want to call it, and then it should be possible to control in
> postgresql.conf where that log is sent.

Doesn't the last paragraph of

https://www.postgresql.org/message-id/11412.1503912190%40localhost

address your concerns?

> Also, what if this extension is loaded on demand in a session and not via
> shared_preload_libraries? It looks like the syslogger only gets the list of
> configured streams when it starts?

Yes, the syslogger gets the list of streams only when it starts, so the
extension that wants to use this feature needs to provide the file information
via shared_preload_libraries. I consider it sufficient because various
existing logging-related GUCs also can't be changed on-the-fly.

> In short, I think the solution should be more generic, and not "just for extensions".

o.k. Any idea about dividing the streams into categories? Should they for
example correspond somehow to categories of GUC variables?

> I completely missed this thread when I did my quick-wip at
>
https://www.postgresql.org/message-id/flat/CABUevExztL0GORyWM9S4tR_Ft3FmJbRaxQdxj+BQZjpvmRurdw@mail.gmail.com#CABUevExztL0GORyWM9S4tR_Ft3FmJbRaxQdxj+BQZjpvmRurdw@mail.gmail.com
> -- some of the changes made were close enough that I got the two confused :)
> Based on the feedback of that one, have you done any performance checks?

I don't expect mere routing of messages into multiple files to bring any
overhead. I'll run some tests, just out of curiosity.

--
Antonin Houska
Cybertec Schönig & Schönig GmbH
Gröhrmühlgasse 26
A-2700 Wiener Neustadt
Web: http://www.postgresql-support.de, http://www.cybertec.at


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] WIP: Separate log file for extension

From
Antonin Houska
Date:
Antonin Houska <ah@cybertec.at> wrote:

> Magnus Hagander <magnus@hagander.net> wrote:
>
> > > On Fri, Aug 25, 2017 at 12:12 AM, Antonin Houska <ah@cybertec.at> wrote:
>
> >
> > I like this idea in general.
> >
> >  Then it's supposed to change some of its attributes
> >
> > >  adjust_log_stream_attr(&stream->filename, "my_extension.log");
> >
> > This, however, seems to be wrong.
> >
> > The logfile name does not belong in the extension, it belongs in the
> > configuration file. I think the extension should set it's "stream id" or
> > whatever you want to call it, and then it should be possible to control in
> > postgresql.conf where that log is sent.
>
> Doesn't the last paragraph of
>
> https://www.postgresql.org/message-id/11412.1503912190%40localhost
>
> address your concerns?

Besides a new version of the patch, an example extension is attached that uses
the feature.

> > Also, what if this extension is loaded on demand in a session and not via
> > shared_preload_libraries? It looks like the syslogger only gets the list of
> > configured streams when it starts?
>
> Yes, the syslogger gets the list of streams only when it starts, so the
> extension that wants to use this feature needs to provide the file information
> via shared_preload_libraries. I consider it sufficient because various
> existing logging-related GUCs also can't be changed on-the-fly.
>
> > In short, I think the solution should be more generic, and not "just for extensions".

statement_log.diff demonstrates how the feature can be used by various
subsystems of PG core. Please consider it an example rather than part of the
"separate log patch". Even if there were no other design questions, there's
too much copy & paste in guc.c. I have no good idea right now how to improve
this part.

> o.k. Any idea about dividing the streams into categories? Should they for
> example correspond somehow to categories of GUC variables?
>
> > I completely missed this thread when I did my quick-wip at
> >
https://www.postgresql.org/message-id/flat/CABUevExztL0GORyWM9S4tR_Ft3FmJbRaxQdxj+BQZjpvmRurdw@mail.gmail.com#CABUevExztL0GORyWM9S4tR_Ft3FmJbRaxQdxj+BQZjpvmRurdw@mail.gmail.com
> > -- some of the changes made were close enough that I got the two confused :)
> > Based on the feedback of that one, have you done any performance checks?
>
> I don't expect mere routing of messages into multiple files to bring any
> overhead. I'll run some tests, just out of curiosity.

After having read the thread on your patch I think that the reason you were
asked to evaluate performance was that your patch can possibly make syslogger
a bottleneck. In contrast, my patch does not prevent user from disabling the
syslogger if it (the syslogger) seems to cause performance issues.

--
Antonin Houska
Cybertec Schönig & Schönig GmbH
Gröhrmühlgasse 26
A-2700 Wiener Neustadt
Web: http://www.postgresql-support.de, http://www.cybertec.at


Attachment

Re: [HACKERS] WIP: Separate log file for extension

From
Tom Lane
Date:
Antonin Houska <ah@cybertec.at> writes:
> After having read the thread on your patch I think that the reason you were
> asked to evaluate performance was that your patch can possibly make syslogger
> a bottleneck. In contrast, my patch does not prevent user from disabling the
> syslogger if it (the syslogger) seems to cause performance issues.

Just to clarify that: we know that in workloads that emit lots of log
output, the log collector *already is* a bottleneck; there are reports
that some people can't use it because it's too slow.  See e.g. towards
the end of this thread:

https://www.postgresql.org/message-id/flat/CABUevExztL0GORyWM9S4tR_Ft3FmJbRaxQdxj%2BBQZjpvmRurdw%40mail.gmail.com

and particularly the referenced thread from 2011.  (I seem to recall other
reports but didn't have much luck finding them.)

I'm quite concerned by the proposed patch, and not even so much any
performance issues; what bothers me is that it adds complexity into a
portion of the system where we can ill afford it.  Bugs in the logging
mechanism compromise one's ability to have any faith in tracking down
other bugs.  The difficulty of reconfiguring the logger on the fly
is another reason to not want more configuration options for it.
On the whole, therefore, I'd just as soon not go there --- especially
seeing that there's been little field demand for such features.

            regards, tom lane


Re: [HACKERS] WIP: Separate log file for extension

From
Jing Wang
Date:
Hi Antonin,


1. check-world run error as following:

for extra in contrib/adminpack; do make -C '../..'/$extra DESTDIR='/db/pgmaster/postgres'/tmp_install install >>'/db/pgmaster/postgres'/tmp_install/log/install.log || exit; done
In file included from ../../src/include/c.h:1135:0,
                 from ../../src/include/postgres.h:47,
                 from adminpack.c:15:
adminpack.c: In function ‘convert_and_check_filename’:
adminpack.c:84:38: error: ‘Log_directory’ undeclared (first use in this function)
    (!logAllowed || !is_absolute_path(Log_directory) ||

2. In function get_log_stream

The following code
if (strcmp(id, stream->id))
    ereport(ERROR,
            (errmsg("log stream with id \"%s\" already exists", id)));

should be " if (strcmp(id,stream->id)==0)" ?

3. In function pipeThread

csvlog_file can't be found but referred by the following statements:

if (ftell(syslog_file) >= Log_RotationSize * 1024L ||
    (csvlog_file != NULL &&


4. In logfile_open function
    LogStream  *stream = &log_streams[stream_id];
    int         file_mode = stream->file_mode;

It may be better to be as following:
int         file_mode = log_streams[stream_id].file_mode


--
Regards,
Jing Wang
Fujitsu Australia

Re: Re: [HACKERS] WIP: Separate log file for extension

From
David Steele
Date:
Hi Antonin,

On 1/10/18 5:38 PM, Tom Lane wrote:
> Antonin Houska <ah@cybertec.at> writes:
>> After having read the thread on your patch I think that the reason you were
>> asked to evaluate performance was that your patch can possibly make syslogger
>> a bottleneck. In contrast, my patch does not prevent user from disabling the
>> syslogger if it (the syslogger) seems to cause performance issues.
> 
> Just to clarify that: we know that in workloads that emit lots of log
> output, the log collector *already is* a bottleneck; there are reports
> that some people can't use it because it's too slow.  See e.g. towards
> the end of this thread:
> 
> https://www.postgresql.org/message-id/flat/CABUevExztL0GORyWM9S4tR_Ft3FmJbRaxQdxj%2BBQZjpvmRurdw%40mail.gmail.com
> 
> and particularly the referenced thread from 2011.  (I seem to recall other
> reports but didn't have much luck finding them.)
> 
> I'm quite concerned by the proposed patch, and not even so much any
> performance issues; what bothers me is that it adds complexity into a
> portion of the system where we can ill afford it.  Bugs in the logging
> mechanism compromise one's ability to have any faith in tracking down
> other bugs.  The difficulty of reconfiguring the logger on the fly
> is another reason to not want more configuration options for it.
> On the whole, therefore, I'd just as soon not go there --- especially
> seeing that there's been little field demand for such features.

I think this feature would be useful, especially for an extension like
pgaudit.  It's a request I hear fairly frequently.

However, there doesn't seem to be consensus that this is a viable
approach, so marked as Returned with Feedback for this CF.

This may be too invasive a feature to be a good fit for the last PG11 CF
in March but I hope you keep working on the idea.

Regards,
-- 
-David
david@pgmasters.net