Thread: Patch review for logging hooks (CF 2012-01)

Patch review for logging hooks (CF 2012-01)

From

Marti Raudsepp

Date:

17 January 2012, 13:36:44

Hi,

Here's my review for the "logging hooks" patch queued for the 2012-01
CommitFest by Martin Pihlak.

Submission review
----
The patch is in context diff format and applies fine. Tests are not
included and don't seem practical for this patch.

More documentation would always be nice, but as other hooks aren't
documented either, it seems that's acceptable. The function prototype
and ErrorData structure are largely self-documenting.

Usability review
----
> Do we want that?
> Do we already have it?

The patch aims to allow custom processing of log messages. There are
two slightly overlapping features already in Postgres: csvlog and
syslog. Both seem to have their drawbacks; csvlog is written to disk
first, and has to be processed in the background in batches. Syslog is
pretty universal, but the output format is more limited, and details
of a single report are split up into several lines. Also passing extra
data for parsing via log_line_prefix seems hacky and failure prone.

The proposal also allows more flexible filtering/instrumentation of
log messages, which is not available with current methods.

> Does the patch actually implement that?

The hooked EmitErrorReport function is responsible for calling other
log handlers, so all relevant logging and context information is
available.

> Does it follow SQL spec, or the community-agreed behavior?

So far, a few people have stated that this sort of a logging hook is
desirable. Nobody has opposed it so far.

Feature test
----
I tested the hook using Martin's own pg_logforward extension:
https://github.com/mpihlak/pg_logforward

I verified that the hook works for messages with DEBUG, LOG, NOTICE,
ERROR and FATAL levels, from the backend as well as postmaster.

> Are there corner cases the author has failed to consider?

Whether the hook is called depends on both the 'client_min_messages'
and 'log_min_messages' settings because of this optimization in
errstart:/* Skip processing effort if non-error message will not be output */if (elevel < ERROR && !output_to_server &&
!output_to_client)   return false;
 

This will certainly be surprising to users. I think making it depend
*only* on output_to_server (and thus log_min_messages) would be more
predictable.

Coding review
----
> Does it follow the project coding guidelines?

There's a minor whitespace problem. When declaring variables, and the
data type is longer than 12 characters, just use 1 single space
character to delimit the variable name, instead of tab:

extern emit_log_hook_type    emit_log_hook;

> Will it work on Windows/BSD etc?

I see that other hooks are declared with PGDLLIMPORT. I presume that
this is necessary on Windows:
extern PGDLLIMPORT emit_log_hook_type emit_log_hook;

> Are the comments sufficient and accurate?

I think the hook warrants a comment that, whether the messages will be
seen, depends on the log_min_messages setting.

----

PS: This is my first patch review ever, any feedback would be welcome.

Regards,
Marti

Re: Patch review for logging hooks (CF 2012-01)

From

Martin Pihlak

Date:

17 January 2012, 17:07:46

On 01/17/2012 07:35 PM, Marti Raudsepp wrote:
> Here's my review for the "logging hooks" patch queued for the 2012-01
> CommitFest by Martin Pihlak.
>

Thanks for reviewing!

> There's a minor whitespace problem. When declaring variables, and the
> data type is longer than 12 characters, just use 1 single space
> character to delimit the variable name, instead of tab:
>

Fixed both in .h and .c

> I see that other hooks are declared with PGDLLIMPORT. I presume that
> this is necessary on Windows:
> extern PGDLLIMPORT emit_log_hook_type emit_log_hook;

Indeed, fixed now.

> I think the hook warrants a comment that, whether the messages will be
> seen, depends on the log_min_messages setting.
>

Comment added.

regards,
Martin

Attachment

logging-hooks.patch

Re: Patch review for logging hooks (CF 2012-01)

From

Marti Raudsepp

Date:

17 January 2012, 17:40:53

Hi!

On Tue, Jan 17, 2012 at 23:07, Martin Pihlak <martin.pihlak@gmail.com> wrote:
>> I think the hook warrants a comment that, whether the messages will be
>> seen, depends on the log_min_messages setting.
>
> Comment added.

Nice :)

It seems you missed a comment, that the current implementation is also
affected by client_min_messages. I think that being affected by
client-specific settings is surprising. I would put the
if(emit_log_hook) inside the existing if(edata->output_to_server)
condition. Unless you have some reason to do it this way?

Regards,
Marti

Re: Patch review for logging hooks (CF 2012-01)

From

Fujii Masao

Date:

17 January 2012, 21:57:06

On Wed, Jan 18, 2012 at 2:35 AM, Marti Raudsepp <marti@juffo.org> wrote:
>> Are there corner cases the author has failed to consider?

The hook can be executed by various processes since it's in
EmitErrorReport(). OTOH, log messages are written to the log
file by one process like syslogger (if you use csvlog or stderr)
or syslog process (if you use syslog). So ISTM that there is no
guarantee that the order of log messages processed by the
hook is same as that of messages written to the log file. For
example, imagine the case where two backends call EmitErrorReport()
at the same time. Is this OK? If not, the hook might need to be
in syslogger.

EmitErrorReport() can be called before shared library like Martin's
pg_logforward is preloaded. Which means that the hook may miss
some log messages. Is this OK?

Regards,

-- 
Fujii Masao
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center

Re: Patch review for logging hooks (CF 2012-01)

From

Martin Pihlak

Date:

18 January 2012, 12:13:03

On 01/18/2012 03:56 AM, Fujii Masao wrote:
> or syslog process (if you use syslog). So ISTM that there is no
> guarantee that the order of log messages processed by the
> hook is same as that of messages written to the log file. For
> example, imagine the case where two backends call EmitErrorReport()
> at the same time. Is this OK? If not, the hook might need to be
> in syslogger.

For high volume logging I'd avoid going through the syslogger. One
big issue with syslogger is that it creates a choke point - everything
has to pass through it, and if it cannot keep up it starts stalling
the backends. Also, in EmitErrorReport the hook gets to have access
to the actual ErrorData structure -- that makes filtering and looking
at message content much simpler.

> EmitErrorReport() can be called before shared library like Martin's
> pg_logforward is preloaded. Which means that the hook may miss
> some log messages. Is this OK?
> 

True, even though the shared or local preload libraries are loaded
quite early in PostmasterMain/PostgresMain there is a chance that
some messages (mostly initialization failures) are missed. I guess
there is no way to avoid this with module based approach. But IMHO
this is acceptable -- when the backend fails to initialise, it cannot
be  assumed that the logging subsystem is set up properly. I guess it
deserves at least a comment though. Thanks for pointing this out.

regards,
Martin

Re: Patch review for logging hooks (CF 2012-01)

From

Andrew Dunstan

Date:

18 January 2012, 12:28:11


On 01/18/2012 11:12 AM, Martin Pihlak wrote:
> On 01/18/2012 03:56 AM, Fujii Masao wrote:
>> or syslog process (if you use syslog). So ISTM that there is no
>> guarantee that the order of log messages processed by the
>> hook is same as that of messages written to the log file. For
>> example, imagine the case where two backends call EmitErrorReport()
>> at the same time. Is this OK? If not, the hook might need to be
>> in syslogger.
> For high volume logging I'd avoid going through the syslogger. One
> big issue with syslogger is that it creates a choke point - everything
> has to pass through it, and if it cannot keep up it starts stalling
> the backends. Also, in EmitErrorReport the hook gets to have access
> to the actual ErrorData structure -- that makes filtering and looking
> at message content much simpler.
>
>

Hmm, interesting. I don't think I've encountered a situation where 
backends would actually stall. But in any case, I don't think we have to 
be that deterministic. The only thing that needs to be absolutely 
guaranteed is that the log messages from a given backend are in order. 
Some slight fuzz between backends seems acceptable.

cheers

andrew

Re: Patch review for logging hooks (CF 2012-01)

From

Alvaro Herrera

Date:

18 January 2012, 12:42:05

Excerpts from Andrew Dunstan's message of mié ene 18 13:27:40 -0300 2012:
>
> On 01/18/2012 11:12 AM, Martin Pihlak wrote:
> > On 01/18/2012 03:56 AM, Fujii Masao wrote:
> >> or syslog process (if you use syslog). So ISTM that there is no
> >> guarantee that the order of log messages processed by the
> >> hook is same as that of messages written to the log file. For
> >> example, imagine the case where two backends call EmitErrorReport()
> >> at the same time. Is this OK? If not, the hook might need to be
> >> in syslogger.
> > For high volume logging I'd avoid going through the syslogger. One
> > big issue with syslogger is that it creates a choke point - everything
> > has to pass through it, and if it cannot keep up it starts stalling
> > the backends. Also, in EmitErrorReport the hook gets to have access
> > to the actual ErrorData structure -- that makes filtering and looking
> > at message content much simpler.
>
> Hmm, interesting. I don't think I've encountered a situation where
> backends would actually stall.

You have to have really high velocity for this to happen.  At least one
customer of ours has suffered this problem (I vaguely recall a second
case but I'm not really sure), having had to switch to syslog (which
uses lossy sockets, with the advantage that it doesn't cause stalls).

> But in any case, I don't think we have to
> be that deterministic. The only thing that needs to be absolutely
> guaranteed is that the log messages from a given backend are in order.
> Some slight fuzz between backends seems acceptable.

Agreed.

--
Álvaro Herrera <alvherre@commandprompt.com>
The PostgreSQL Company - Command Prompt, Inc.
PostgreSQL Replication, Consulting, Custom Development, 24x7 support

Re: Patch review for logging hooks (CF 2012-01)

From

Martin Pihlak

Date:

18 January 2012, 16:56:32

On 01/17/2012 11:40 PM, Marti Raudsepp wrote:
> It seems you missed a comment, that the current implementation is also
> affected by client_min_messages. I think that being affected by
> client-specific settings is surprising. I would put the
> if(emit_log_hook) inside the existing if(edata->output_to_server)
> condition. Unless you have some reason to do it this way?
>

I have no strong feelings about this -- if the behaviour seems
surprising, lets remove it. We need to keep the "if" separate
though -- the hook might want to omit the message from server
log so the "output_to_server" needs to be rechecked.

Updated patch attached.

regards,
Martin

Attachment

logging-hooks.patch

Re: Patch review for logging hooks (CF 2012-01)

From

Marti Raudsepp

Date:

18 January 2012, 17:23:50

On Wed, Jan 18, 2012 at 22:56, Martin Pihlak <martin.pihlak@gmail.com> wrote:
> We need to keep the "if" separate
> though -- the hook might want to omit the message from server
> log so the "output_to_server" needs to be rechecked.

Oh, yes makes sense.

The updated patch looks good, marking as 'Ready for Committer'

Regards,
Marti

Re: Patch review for logging hooks (CF 2012-01)

From

Greg Smith

Date:

20 January 2012, 03:47:37

On 01/18/2012 04:23 PM, Marti Raudsepp wrote:
> The updated patch looks good, marking as 'Ready for Committer'

Patches without documentation are never ready for commit.  For this one, 
I'm not sure if that should be in the form of a reference example in 
contrib, or just something that documents that the hook exists and what 
the ground rules are for grabbing it.

-- 
Greg Smith   2ndQuadrant US    greg@2ndQuadrant.com   Baltimore, MD
PostgreSQL Training, Services, and 24x7 Support www.2ndQuadrant.com

Re: Patch review for logging hooks (CF 2012-01)

From

Robert Haas

Date:

20 January 2012, 11:02:09

On Fri, Jan 20, 2012 at 2:47 AM, Greg Smith <greg@2ndquadrant.com> wrote:
>> The updated patch looks good, marking as 'Ready for Committer'
>
> Patches without documentation are never ready for commit.  For this one, I'm
> not sure if that should be in the form of a reference example in contrib, or
> just something that documents that the hook exists and what the ground rules
> are for grabbing it.

Hooks are frequently not documented, and we only sometimes even bother
to include an example in contrib.  We should probably at least have a
working example for testing purposes, though, whether or not we end up
committing it.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

Re: Patch review for logging hooks (CF 2012-01)

From

Marti Raudsepp

Date:

20 January 2012, 11:19:56

On Fri, Jan 20, 2012 at 17:01, Robert Haas <robertmhaas@gmail.com> wrote:
> We should probably at least have a
> working example for testing purposes, though, whether or not we end up
> committing it.

Martin Pihlak sent a short description of how to test the patch with
his pg_logforward module:
http://archives.postgresql.org/pgsql-hackers/2012-01/msg00790.php

That's what I used when reviewing, too.

Regards,
Marti

Re: Patch review for logging hooks (CF 2012-01)

From

Andrew Dunstan

Date:

04 March 2012, 11:45:51

On 01/20/2012 10:01 AM, Robert Haas wrote:
> On Fri, Jan 20, 2012 at 2:47 AM, Greg Smith<greg@2ndquadrant.com>  wrote:
>>> The updated patch looks good, marking as 'Ready for Committer'
>> Patches without documentation are never ready for commit.  For this one, I'm
>> not sure if that should be in the form of a reference example in contrib, or
>> just something that documents that the hook exists and what the ground rules
>> are for grabbing it.
> Hooks are frequently not documented, and we only sometimes even bother
> to include an example in contrib.  We should probably at least have a
> working example for testing purposes, though, whether or not we end up
> committing it.
>

I'm just looking at this patch, and I agree, it should be testable. I'm 
wondering if it wouldn't be a good idea to have a module or set of 
modules for demonstrating and testing bits of the API that we expose. 
src/test/api or something similar? I'm not sure how we'd automate a test 
for this case, though. I guess we could use something like pg_logforward 
and have a UDP receiver catch the messages and write them to a file. 
Something like that should be possible to rig up in Perl. But all that 
seems a lot of work at this stage of the game. So the question is do we 
want to commit this patch without it?

cheers

andrew

Re: Patch review for logging hooks (CF 2012-01)

From

Robert Haas

Date:

05 March 2012, 12:00:39

On Sun, Mar 4, 2012 at 10:45 AM, Andrew Dunstan <andrew@dunslane.net> wrote:
> I'm just looking at this patch, and I agree, it should be testable. I'm
> wondering if it wouldn't be a good idea to have a module or set of modules
> for demonstrating and testing bits of the API that we expose. src/test/api
> or something similar? I'm not sure how we'd automate a test for this case,
> though. I guess we could use something like pg_logforward and have a UDP
> receiver catch the messages and write them to a file. Something like that
> should be possible to rig up in Perl. But all that seems a lot of work at
> this stage of the game. So the question is do we want to commit this patch
> without it?

The latest version of this patch looks sound to me.  We haven't
insisted on having even a sample application for every hook before,
let alone a regression test, so I don't think this patch needs one
either.  Now, it might be fairly said that we ought to have regression
tests for a lot more things than we do right now, but that's basically
a limitation of our regression-testing environment which the author of
this patch shouldn't be obliged to fix.

So my vote is to go ahead and commit it.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

Re: Patch review for logging hooks (CF 2012-01)

From

Tom Lane

Date:

05 March 2012, 13:08:56

Robert Haas <robertmhaas@gmail.com> writes:
> On Sun, Mar 4, 2012 at 10:45 AM, Andrew Dunstan <andrew@dunslane.net> wrote:
>> I'm just looking at this patch, and I agree, it should be testable. I'm
>> wondering if it wouldn't be a good idea to have a module or set of modules
>> for demonstrating and testing bits of the API that we expose. src/test/api
>> or something similar? I'm not sure how we'd automate a test for this case,
>> though. I guess we could use something like pg_logforward and have a UDP
>> receiver catch the messages and write them to a file. Something like that
>> should be possible to rig up in Perl. But all that seems a lot of work at
>> this stage of the game. So the question is do we want to commit this patch
>> without it?

> The latest version of this patch looks sound to me.  We haven't
> insisted on having even a sample application for every hook before,
> let alone a regression test, so I don't think this patch needs one
> either.

What we've generally asked for with hooks is a working sample usage of
the hook, just as a cross-check that something useful can be done with
it and you didn't overlook any obvious usability problems.  I agree that
a regression test is often not practical, especially not if you're not
prepared to create a whole contrib module to provide a sample usage.

In the case at hand, ISTM there are some usability questions around
where/when the hook is called: in particular, if I'm reading it right,
the hook could not override a log_min_messages-based decision that a
given message is not to be emitted.  Do we care?  Also, if the hook
is meant to be able to change the data that gets logged, as seems to be
the case, do we care that it would also affect what gets sent to the
client?

I'd like to see a spec for exactly which fields of ErrorData the hook is
allowed to change, and some rationale.
        regards, tom lane

Re: Patch review for logging hooks (CF 2012-01)

From

Andrew Dunstan

Date:

05 March 2012, 13:50:56


On 03/05/2012 12:08 PM, Tom Lane wrote:
> Robert Haas<robertmhaas@gmail.com>  writes:
>> On Sun, Mar 4, 2012 at 10:45 AM, Andrew Dunstan<andrew@dunslane.net>  wrote:
>>> I'm just looking at this patch, and I agree, it should be testable. I'm
>>> wondering if it wouldn't be a good idea to have a module or set of modules
>>> for demonstrating and testing bits of the API that we expose. src/test/api
>>> or something similar? I'm not sure how we'd automate a test for this case,
>>> though. I guess we could use something like pg_logforward and have a UDP
>>> receiver catch the messages and write them to a file. Something like that
>>> should be possible to rig up in Perl. But all that seems a lot of work at
>>> this stage of the game. So the question is do we want to commit this patch
>>> without it?
>> The latest version of this patch looks sound to me.  We haven't
>> insisted on having even a sample application for every hook before,
>> let alone a regression test, so I don't think this patch needs one
>> either.
> What we've generally asked for with hooks is a working sample usage of
> the hook, just as a cross-check that something useful can be done with
> it and you didn't overlook any obvious usability problems.  I agree that
> a regression test is often not practical, especially not if you're not
> prepared to create a whole contrib module to provide a sample usage.
>
> In the case at hand, ISTM there are some usability questions around
> where/when the hook is called: in particular, if I'm reading it right,
> the hook could not override a log_min_messages-based decision that a
> given message is not to be emitted.  Do we care?


That's what I understand too. We could relax that at some stage in the 
future if we had a requirement, I guess.


>    Also, if the hook
> is meant to be able to change the data that gets logged, as seems to be
> the case, do we care that it would also affect what gets sent to the
> client?
>
> I'd like to see a spec for exactly which fields of ErrorData the hook is
> allowed to change, and some rationale.
>
>             

Good question. I'd somewhat be inclined to say that it should only be 
able to change output_to_server and output_to_client, and possibly only 
to change them from true to false (i.e. I'm not sure the hook should be 
able to induce more verbose logging.) But maybe that's too restrictive. 
I doubt we can enforce good behaviour, though, only state that if you 
break things you get to keep all the pieces.

cheers

andrew

Re: Patch review for logging hooks (CF 2012-01)

From

Robert Haas

Date:

05 March 2012, 14:00:05

On Mon, Mar 5, 2012 at 12:50 PM, Andrew Dunstan <andrew@dunslane.net> wrote:
>>> The latest version of this patch looks sound to me.  We haven't
>>> insisted on having even a sample application for every hook before,
>>> let alone a regression test, so I don't think this patch needs one
>>> either.
>>
>> What we've generally asked for with hooks is a working sample usage of
>> the hook, just as a cross-check that something useful can be done with
>> it and you didn't overlook any obvious usability problems.  I agree that
>> a regression test is often not practical, especially not if you're not
>> prepared to create a whole contrib module to provide a sample usage.
>>
>> In the case at hand, ISTM there are some usability questions around
>> where/when the hook is called: in particular, if I'm reading it right,
>> the hook could not override a log_min_messages-based decision that a
>> given message is not to be emitted.  Do we care?
>
> That's what I understand too. We could relax that at some stage in the
> future if we had a requirement, I guess.
>
>>   Also, if the hook
>> is meant to be able to change the data that gets logged, as seems to be
>> the case, do we care that it would also affect what gets sent to the
>> client?
>>
>> I'd like to see a spec for exactly which fields of ErrorData the hook is
>> allowed to change, and some rationale.
>
>
> Good question. I'd somewhat be inclined to say that it should only be able
> to change output_to_server and output_to_client, and possibly only to change
> them from true to false (i.e. I'm not sure the hook should be able to induce
> more verbose logging.) But maybe that's too restrictive. I doubt we can
> enforce good behaviour, though, only state that if you break things you get
> to keep all the pieces.

It sort of looks like it's intended to apply only to server output, so
I'd find it odd to say that it should be able to mess with
output_to_client.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

Re: Patch review for logging hooks (CF 2012-01)

From

Tom Lane

Date:

05 March 2012, 14:19:29

Robert Haas <robertmhaas@gmail.com> writes:
> On Mon, Mar 5, 2012 at 12:50 PM, Andrew Dunstan <andrew@dunslane.net> wrote:
>>> I'd like to see a spec for exactly which fields of ErrorData the hook is
>>> allowed to change, and some rationale.

>> Good question. I'd somewhat be inclined to say that it should only be able
>> to change output_to_server and output_to_client, and possibly only to change
>> them from true to false (i.e. I'm not sure the hook should be able to induce
>> more verbose logging.) But maybe that's too restrictive. I doubt we can
>> enforce good behaviour, though, only state that if you break things you get
>> to keep all the pieces.

The reason it can't sensibly expect to change them from off to on is
that control will never get here in the first place if they're both off.
That is a feature not a bug: the ereport mechanisms are designed to skip
most of the message-construction work for messages that are so low
priority that they're not going to get printed anywhere.  If we wanted
the hook to be able to override that, it would have to be called from
errstart and it would then have much less information to work with.

> It sort of looks like it's intended to apply only to server output, so
> I'd find it odd to say that it should be able to mess with
> output_to_client.

OK, so let's document that the only supported change in ErrorData is to
turn output_to_server from on to off.  I can see how that constitutes
a potential feature: the hook might be intended as a filter that allows
logging or not logging according to some rule not supported by the core
system.  You'd have to ensure that log_min_messages is set low enough
for all desired messages to get generated in the first place, but then
the hook could control things beyond that.  I can also imagine a hook
that's meant as some sort of aggregator, so that it would accumulate a
summary form of messages that it then told the core not to print.
        regards, tom lane

Re: Patch review for logging hooks (CF 2012-01)

From

Tom Lane

Date:

06 March 2012, 16:38:01

Martin Pihlak <martin.pihlak@gmail.com> writes:
> Updated patch attached.

Applied with minor editorialization (mainly just improving the
comments).
        regards, tom lane

Re: Patch review for logging hooks (CF 2012-01)

From

Martin Pihlak

Date:

08 March 2012, 08:34:21

On Tue, Mar 6, 2012 at 10:37 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Applied with minor editorialization (mainly just improving the
> comments).

Thanks and kudos to all the reviewers!

regards,
Martin