Thread: [HACKERS] WIP: Separate log file for extension
Attached is a draft patch to allow extension to write log messages to a separate file. It introduces a concept of a "log stream". The extension's shared library gets its stream assigned by calling this function from _PG_init() my_stream_id = get_log_stream("my_extension", &my_log_stream); Then it's supposed to change some of its attributes adjust_log_stream_attr(&stream->filename, "my_extension.log"); and to use the stream id in ereport() calls ereport(LOG, (errmsg("Hello world"), errstream(my_stream_id))); The EXEC_BACKEND mechanism makes initialization of the log streams by postmaster child processes non-trivial. I decided to extend save_backend_variables() and restore_backend_variables() accordingly. Maybe someone has better idea. pgaudit seems to be the most obvious use case for this enhancement, but it might be useful for many other extensions. -- Antonin Houska Cybertec Schönig & Schönig GmbH Gröhrmühlgasse 26 A-2700 Wiener Neustadt Web: http://www.postgresql-support.de, http://www.cybertec.at -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Attachment
On 25 August 2017 at 15:12, Antonin Houska <ah@cybertec.at> wrote:
Attached is a draft patch to allow extension to write log messages to a
separate file.
I like the idea a lot. I'm not so sure about the approach.
How will this play with syslog? csvlog? etc?
I wonder if a level of indirection is appropriate here, where extensions (or postgres subsystems, even) provide a log stream label. Then the logging backed takes care of using that appropriately for the logging mechanism in use; for logging to file that'd generally be separate files. Same for CSVlog. Other mechanisms could be left as stubs initially.
So the outcome would be the same, just without the assumption of specific file name and output mechanism baked in.
Craig Ringer <craig@2ndquadrant.com> wrote: > On 25 August 2017 at 15:12, Antonin Houska <ah@cybertec.at> wrote: > > How will this play with syslog? csvlog? etc? There's nothing special about csvlog: the LogStream structure has a "destination" field, so if particular extension wants this kind of output, it simply sets the LOG_DESTINATION_CSVLOG bit here. LOG_DESTINATION_SYSLOG is a problem because (AFAIK) a single process can maintain no more than one connection to the syslog server. I don't like the idea of closing the current connection and opening a different one whenever the next ereport() is associated with a different stream than the previous was. As for LOG_DESTINATION_EVENTLOG, I haven't checked yet. > I wonder if a level of indirection is appropriate here, where extensions (or > postgres subsystems, even) provide a log stream label. Then the logging > backed takes care of using that appropriately for the logging mechanism in > use; for logging to file that'd generally be separate files. Same for > CSVlog. Other mechanisms could be left as stubs initially. > > So the outcome would be the same, just without the assumption of specific > file name and output mechanism baked in. The example I shown in my previous email was the simplest case I could think of. But it does not mean that the file name has to be hard-coded in the extension. Instead of setting the LogStream.filename field, you can pass a pointer to this field to DefineCustomStringVariable() function, so the specific GUC can control the value. -- Antonin Houska Cybertec Schönig & Schönig GmbH Gröhrmühlgasse 26 A-2700 Wiener Neustadt Web: http://www.postgresql-support.de, http://www.cybertec.at
On Fri, Aug 25, 2017 at 4:12 PM, Antonin Houska <ah@cybertec.at> wrote: > Attached is a draft patch to allow extension to write log messages to a > separate file. Does it allow postgres core code to write log messages to multiple log files as well? I can imagine a use case where we want to write log messages to separate log files by error level or purpose (server log and SQL log etc). Regards, -- Masahiko Sawada NIPPON TELEGRAPH AND TELEPHONE CORPORATION NTT Open Source Software Center
Masahiko Sawada <sawada.mshk@gmail.com> wrote: > On Fri, Aug 25, 2017 at 4:12 PM, Antonin Houska <ah@cybertec.at> wrote: > > Attached is a draft patch to allow extension to write log messages to a > > separate file. > > Does it allow postgres core code to write log messages to multiple log > files as well? I can imagine a use case where we want to write log > messages to separate log files by error level or purpose (server log > and SQL log etc). At low level it should work as long as several log streams are reserved for the core code. However, as every single stream needs multiple variables, I have no idea how to configure those additional streams w/o adding many new GUC variables to the core. Also the discussion what should (not) be logged separate would probably be difficult. Specifically to log SQL statements to a separate file, I think (but not verified) that you can write an extension that reserves a stream for itself and also uses emit_log_hook to recognize SQL statements among the logged messages. If the hook adjusts the syslogger_stream field of ErrorData accordingly, the message should end up in the separate file. -- Antonin Houska Cybertec Schönig & Schönig GmbH Gröhrmühlgasse 26 A-2700 Wiener Neustadt Web: http://www.postgresql-support.de, http://www.cybertec.at
On 08/28/2017 11:23 AM, Antonin Houska wrote: > Craig Ringer <craig@2ndquadrant.com> wrote: > >> On 25 August 2017 at 15:12, Antonin Houska <ah@cybertec.at> wrote: >> >> How will this play with syslog? csvlog? etc? > > There's nothing special about csvlog: the LogStream structure has a > "destination" field, so if particular extension wants this kind of output, it > simply sets the LOG_DESTINATION_CSVLOG bit here. > I assume Craig's point was more about CSVLOG requiring log_collector=on. So what will happen if the PostgreSQL is started without the collector, and an extension attempts to use LOG_DESTINATION_CSVLOG? Or will it start a separate collector for each such extension? regards -- Tomas Vondra http://www.2ndQuadrant.com PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
On Fri, Aug 25, 2017 at 12:12 AM, Antonin Houska <ah@cybertec.at> wrote:
-- Attached is a draft patch to allow extension to write log messages to a
separate file. It introduces a concept of a "log stream". The extension's
shared library gets its stream assigned by calling this function from
_PG_init()
my_stream_id = get_log_stream("my_extension", &my_log_stream);
I like this idea in general.
Then it's supposed to change some of its attributes
adjust_log_stream_attr(&stream->filename, "my_extension.log");
This, however, seems to be wrong.
The logfile name does not belong in the extension, it belongs in the configuration file. I think the extension should set it's "stream id" or whatever you want to call it, and then it should be possible to control in postgresql.conf where that log is sent.
Also, what if this extension is loaded on demand in a session and not via shared_preload_libraries? It looks like the syslogger only gets the list of configured streams when it starts?
In short, I think the solution should be more generic, and not "just for extensions".
I completely missed this thread when I did my quick-wip at https://www.postgresql.org/message-id/flat/CABUevExztL0GORyWM9S4tR_Ft3FmJbRaxQdxj+BQZjpvmRurdw@mail.gmail.com#CABUevExztL0GORyWM9S4tR_Ft3FmJbRaxQdxj+BQZjpvmRurdw@mail.gmail.com -- some of the changes made were close enough that I got the two confused :) Based on the feedback of that one, have you done any performance checks?
Tomas Vondra <tomas.vondra@2ndquadrant.com> wrote: > On 08/28/2017 11:23 AM, Antonin Houska wrote: > > Craig Ringer <craig@2ndquadrant.com> wrote: > > > >> On 25 August 2017 at 15:12, Antonin Houska <ah@cybertec.at> wrote: > >> > >> How will this play with syslog? csvlog? etc? > > > > There's nothing special about csvlog: the LogStream structure has a > > "destination" field, so if particular extension wants this kind of output, it > > simply sets the LOG_DESTINATION_CSVLOG bit here. > > > > I assume Craig's point was more about CSVLOG requiring log_collector=on. > So what will happen if the PostgreSQL is started without the collector, > and an extension attempts to use LOG_DESTINATION_CSVLOG? Or will it > start a separate collector for each such extension? The patch does not try to change this behavior. So there'll be no csvlog file and only the plain (not csv-formatted) message will be written to console. -- Antonin Houska Cybertec Schönig & Schönig GmbH Gröhrmühlgasse 26 A-2700 Wiener Neustadt Web: http://www.postgresql-support.de, http://www.cybertec.at -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Magnus Hagander <magnus@hagander.net> wrote: > > On Fri, Aug 25, 2017 at 12:12 AM, Antonin Houska <ah@cybertec.at> wrote: > > I like this idea in general. > > Then it's supposed to change some of its attributes > > > adjust_log_stream_attr(&stream->filename, "my_extension.log"); > > This, however, seems to be wrong. > > The logfile name does not belong in the extension, it belongs in the > configuration file. I think the extension should set it's "stream id" or > whatever you want to call it, and then it should be possible to control in > postgresql.conf where that log is sent. Doesn't the last paragraph of https://www.postgresql.org/message-id/11412.1503912190%40localhost address your concerns? > Also, what if this extension is loaded on demand in a session and not via > shared_preload_libraries? It looks like the syslogger only gets the list of > configured streams when it starts? Yes, the syslogger gets the list of streams only when it starts, so the extension that wants to use this feature needs to provide the file information via shared_preload_libraries. I consider it sufficient because various existing logging-related GUCs also can't be changed on-the-fly. > In short, I think the solution should be more generic, and not "just for extensions". o.k. Any idea about dividing the streams into categories? Should they for example correspond somehow to categories of GUC variables? > I completely missed this thread when I did my quick-wip at > https://www.postgresql.org/message-id/flat/CABUevExztL0GORyWM9S4tR_Ft3FmJbRaxQdxj+BQZjpvmRurdw@mail.gmail.com#CABUevExztL0GORyWM9S4tR_Ft3FmJbRaxQdxj+BQZjpvmRurdw@mail.gmail.com > -- some of the changes made were close enough that I got the two confused :) > Based on the feedback of that one, have you done any performance checks? I don't expect mere routing of messages into multiple files to bring any overhead. I'll run some tests, just out of curiosity. -- Antonin Houska Cybertec Schönig & Schönig GmbH Gröhrmühlgasse 26 A-2700 Wiener Neustadt Web: http://www.postgresql-support.de, http://www.cybertec.at -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Antonin Houska <ah@cybertec.at> wrote: > Magnus Hagander <magnus@hagander.net> wrote: > > > > On Fri, Aug 25, 2017 at 12:12 AM, Antonin Houska <ah@cybertec.at> wrote: > > > > > I like this idea in general. > > > > Then it's supposed to change some of its attributes > > > > > adjust_log_stream_attr(&stream->filename, "my_extension.log"); > > > > This, however, seems to be wrong. > > > > The logfile name does not belong in the extension, it belongs in the > > configuration file. I think the extension should set it's "stream id" or > > whatever you want to call it, and then it should be possible to control in > > postgresql.conf where that log is sent. > > Doesn't the last paragraph of > > https://www.postgresql.org/message-id/11412.1503912190%40localhost > > address your concerns? Besides a new version of the patch, an example extension is attached that uses the feature. > > Also, what if this extension is loaded on demand in a session and not via > > shared_preload_libraries? It looks like the syslogger only gets the list of > > configured streams when it starts? > > Yes, the syslogger gets the list of streams only when it starts, so the > extension that wants to use this feature needs to provide the file information > via shared_preload_libraries. I consider it sufficient because various > existing logging-related GUCs also can't be changed on-the-fly. > > > In short, I think the solution should be more generic, and not "just for extensions". statement_log.diff demonstrates how the feature can be used by various subsystems of PG core. Please consider it an example rather than part of the "separate log patch". Even if there were no other design questions, there's too much copy & paste in guc.c. I have no good idea right now how to improve this part. > o.k. Any idea about dividing the streams into categories? Should they for > example correspond somehow to categories of GUC variables? > > > I completely missed this thread when I did my quick-wip at > > https://www.postgresql.org/message-id/flat/CABUevExztL0GORyWM9S4tR_Ft3FmJbRaxQdxj+BQZjpvmRurdw@mail.gmail.com#CABUevExztL0GORyWM9S4tR_Ft3FmJbRaxQdxj+BQZjpvmRurdw@mail.gmail.com > > -- some of the changes made were close enough that I got the two confused :) > > Based on the feedback of that one, have you done any performance checks? > > I don't expect mere routing of messages into multiple files to bring any > overhead. I'll run some tests, just out of curiosity. After having read the thread on your patch I think that the reason you were asked to evaluate performance was that your patch can possibly make syslogger a bottleneck. In contrast, my patch does not prevent user from disabling the syslogger if it (the syslogger) seems to cause performance issues. -- Antonin Houska Cybertec Schönig & Schönig GmbH Gröhrmühlgasse 26 A-2700 Wiener Neustadt Web: http://www.postgresql-support.de, http://www.cybertec.at
Attachment
Antonin Houska <ah@cybertec.at> writes: > After having read the thread on your patch I think that the reason you were > asked to evaluate performance was that your patch can possibly make syslogger > a bottleneck. In contrast, my patch does not prevent user from disabling the > syslogger if it (the syslogger) seems to cause performance issues. Just to clarify that: we know that in workloads that emit lots of log output, the log collector *already is* a bottleneck; there are reports that some people can't use it because it's too slow. See e.g. towards the end of this thread: https://www.postgresql.org/message-id/flat/CABUevExztL0GORyWM9S4tR_Ft3FmJbRaxQdxj%2BBQZjpvmRurdw%40mail.gmail.com and particularly the referenced thread from 2011. (I seem to recall other reports but didn't have much luck finding them.) I'm quite concerned by the proposed patch, and not even so much any performance issues; what bothers me is that it adds complexity into a portion of the system where we can ill afford it. Bugs in the logging mechanism compromise one's ability to have any faith in tracking down other bugs. The difficulty of reconfiguring the logger on the fly is another reason to not want more configuration options for it. On the whole, therefore, I'd just as soon not go there --- especially seeing that there's been little field demand for such features. regards, tom lane
Hi Antonin,
1. check-world run error as following:
for extra in contrib/adminpack; do make -C '../..'/$extra DESTDIR='/db/pgmaster/postgres'/tmp_install install >>'/db/pgmaster/postgres'/tmp_install/log/install.log || exit; done
In file included from ../../src/include/c.h:1135:0,
from ../../src/include/postgres.h:47,
from adminpack.c:15:
adminpack.c: In function ‘convert_and_check_filename’:
adminpack.c:84:38: error: ‘Log_directory’ undeclared (first use in this function)
(!logAllowed || !is_absolute_path(Log_directory) ||
2. In function get_log_stream
The following code
if (strcmp(id, stream->id))
ereport(ERROR,
(errmsg("log stream with id \"%s\" already exists", id)));
should be " if (strcmp(id,stream->id)==0)" ?
3. In function pipeThread
csvlog_file can't be found but referred by the following statements:
if (ftell(syslog_file) >= Log_RotationSize * 1024L ||
(csvlog_file != NULL &&
4. In logfile_open function
LogStream *stream = &log_streams[stream_id];
int file_mode = stream->file_mode;
It may be better to be as following:
int file_mode = log_streams[stream_id].file_mode
1. check-world run error as following:
for extra in contrib/adminpack; do make -C '../..'/$extra DESTDIR='/db/pgmaster/postgres'/tmp_install install >>'/db/pgmaster/postgres'/tmp_install/log/install.log || exit; done
In file included from ../../src/include/c.h:1135:0,
from ../../src/include/postgres.h:47,
from adminpack.c:15:
adminpack.c: In function ‘convert_and_check_filename’:
adminpack.c:84:38: error: ‘Log_directory’ undeclared (first use in this function)
(!logAllowed || !is_absolute_path(Log_directory) ||
2. In function get_log_stream
The following code
if (strcmp(id, stream->id))
ereport(ERROR,
(errmsg("log stream with id \"%s\" already exists", id)));
should be " if (strcmp(id,stream->id)==0)" ?
3. In function pipeThread
csvlog_file can't be found but referred by the following statements:
if (ftell(syslog_file) >= Log_RotationSize * 1024L ||
(csvlog_file != NULL &&
4. In logfile_open function
LogStream *stream = &log_streams[stream_id];
int file_mode = stream->file_mode;
It may be better to be as following:
int file_mode = log_streams[stream_id].file_mode
--
Regards,
Jing Wang
Fujitsu Australia
Hi Antonin, On 1/10/18 5:38 PM, Tom Lane wrote: > Antonin Houska <ah@cybertec.at> writes: >> After having read the thread on your patch I think that the reason you were >> asked to evaluate performance was that your patch can possibly make syslogger >> a bottleneck. In contrast, my patch does not prevent user from disabling the >> syslogger if it (the syslogger) seems to cause performance issues. > > Just to clarify that: we know that in workloads that emit lots of log > output, the log collector *already is* a bottleneck; there are reports > that some people can't use it because it's too slow. See e.g. towards > the end of this thread: > > https://www.postgresql.org/message-id/flat/CABUevExztL0GORyWM9S4tR_Ft3FmJbRaxQdxj%2BBQZjpvmRurdw%40mail.gmail.com > > and particularly the referenced thread from 2011. (I seem to recall other > reports but didn't have much luck finding them.) > > I'm quite concerned by the proposed patch, and not even so much any > performance issues; what bothers me is that it adds complexity into a > portion of the system where we can ill afford it. Bugs in the logging > mechanism compromise one's ability to have any faith in tracking down > other bugs. The difficulty of reconfiguring the logger on the fly > is another reason to not want more configuration options for it. > On the whole, therefore, I'd just as soon not go there --- especially > seeing that there's been little field demand for such features. I think this feature would be useful, especially for an extension like pgaudit. It's a request I hear fairly frequently. However, there doesn't seem to be consensus that this is a viable approach, so marked as Returned with Feedback for this CF. This may be too invasive a feature to be a good fit for the last PG11 CF in March but I hope you keep working on the idea. Regards, -- -David david@pgmasters.net