Thread: log_duration is redundant, no?

log_duration is redundant, no?

From
Tom Lane
Date:
AFAICS, there is absolutely no difference anymore between turning
log_duration ON and setting log_min_duration_statement to zero.
ISTM that having the two redundant GUC settings is just confusing,
and we should remove log_duration to simplify things.
        regards, tom lane


Re: log_duration is redundant, no?

From
"Guillaume Smet"
Date:
Tom,

On 9/7/06, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> AFAICS, there is absolutely no difference anymore between turning
> log_duration ON and setting log_min_duration_statement to zero.
> ISTM that having the two redundant GUC settings is just confusing,
> and we should remove log_duration to simplify things.

log_duration can be useful if we restore a behaviour similar to what
it did in 7.4 ie when you can log duration only for every query. This
way you can have a global overview of your database activity.
I mean:
log_duration = on
log_min_duration_statement = 500
would log only duration for queries faster than 500 ms and duration +
query text for queries slower than 500ms (we can easily avoid
redundancy).

This allows you to have all query statistics without too much I/O. It
doesn't give you all the information as you don't have the query text
but it can give you useful statistics (to explain why the database
server is suddenly slower - we executed 2 millions queries last month
- now we execute 8 millions queries. We cannot easily have this sort
of information as a log_min_duration_statement = 500 won't give it and
a log_min_duration_statement = 0 generates far too I/O).

For another real life example, we are switching from a manual vacuum
to autovacuum for one of our customers and we want to know the
consequences. The query text is useless for a first analysis (and will
slow down the database too much) as we just want to detect if it's
slower or not.

Could we consider reintroduce the old behaviour rather than removing
this setting (we can rename it to a better name if needed)? I already
have a patch for that as we run a patched version of 8.1.4 here to
have this very behaviour.

--
Guillaume


Re: log_duration is redundant, no?

From
Tom Lane
Date:
"Guillaume Smet" <guillaume.smet@gmail.com> writes:
> I mean:
> log_duration = on
> log_min_duration_statement = 500
> would log only duration for queries faster than 500 ms and duration +
> query text for queries slower than 500ms (we can easily avoid
> redundancy).

I don't find this very persuasive --- it sounds awfully messy, and in
fact isn't that exactly the old behavior we got rid of because no one
could understand it?
        regards, tom lane


Re: log_duration is redundant, no?

From
David Fetter
Date:
On Thu, Sep 07, 2006 at 06:06:51PM -0400, Tom Lane wrote:
> "Guillaume Smet" <guillaume.smet@gmail.com> writes:
> > I mean:
> > log_duration = on
> > log_min_duration_statement = 500
> > would log only duration for queries faster than 500 ms and
> > duration + query text for queries slower than 500ms (we can easily
> > avoid redundancy).
> 
> I don't find this very persuasive --- it sounds awfully messy, and
> in fact isn't that exactly the old behavior we got rid of because no
> one could understand it?

Guillaume's the author of pgfouine, which understands it and helps
others to do same.

Cheers,
D
-- 
David Fetter <david@fetter.org> http://fetter.org/
phone: +1 415 235 3778        AIM: dfetter666                             Skype: davidfetter

Remember to vote!


Re: log_duration is redundant, no?

From
Josh Berkus
Date:
Tom,

> I don't find this very persuasive --- it sounds awfully messy, and in
> fact isn't that exactly the old behavior we got rid of because no one
> could understand it?

Well, we want analogous functionality.   We could stand to have it 
named/organized differently.   But maybe we should hold those chages for 
8.3, so that they can be tested properly?

I am finding that the log format prior to Bruce's change, which we were 
using for TPCE, makes it very hard to do log digest analysis if you use 
SPs or prepared queries at all.

-- 
--Josh

Josh Berkus
PostgreSQL @ Sun
San Francisco


Re: log_duration is redundant, no?

From
"Guillaume Smet"
Date:
On 9/8/06, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> I don't find this very persuasive --- it sounds awfully messy, and in
> fact isn't that exactly the old behavior we got rid of because no one
> could understand it?

I gave real use cases and we use it every day. It really helps us as a
PostgreSQL hosting company.

The fact is that no tool could really exploit this behaviour before. I
agree it's a totally useless information if you don't have a tool to
analyze the logs. This is no longer the case as pgFouine can extract
this information and make it useful by aggregating it.

Perhaps we could rename it to log_all_duration (my english is not that
good so I'm not sure it's a good name) and explain how it can be
useful in the documentation.

--
Guillaume


Re: log_duration is redundant, no?

From
Bruce Momjian
Date:
Guillaume Smet wrote:
> On 9/8/06, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> > I don't find this very persuasive --- it sounds awfully messy, and in
> > fact isn't that exactly the old behavior we got rid of because no one
> > could understand it?
> 
> I gave real use cases and we use it every day. It really helps us as a
> PostgreSQL hosting company.
> 
> The fact is that no tool could really exploit this behaviour before. I
> agree it's a totally useless information if you don't have a tool to
> analyze the logs. This is no longer the case as pgFouine can extract
> this information and make it useful by aggregating it.
> 
> Perhaps we could rename it to log_all_duration (my english is not that
> good so I'm not sure it's a good name) and explain how it can be
> useful in the documentation.

If you are using an external tool, can't you just restrict what you
display based on the logged duration?

--  Bruce Momjian   bruce@momjian.us EnterpriseDB    http://www.enterprisedb.com
 + If your life is a hard drive, Christ can be your backup. +


Re: log_duration is redundant, no?

From
Tom Lane
Date:
Bruce Momjian <bruce@momjian.us> writes:
> If you are using an external tool, can't you just restrict what you
> display based on the logged duration?

I think his basic complaint is that doing the full logging pushup for
even short-duration queries is too expensive, and that logging only the
duration and not the query text or parameters makes a significant speed
difference.  I'm not at all sure that I buy that, but if it's true then
subsequent filtering obviously doesn't help.
        regards, tom lane


Re: log_duration is redundant, no?

From
"Guillaume Smet"
Date:
On 9/8/06, Bruce Momjian <bruce@momjian.us> wrote:
> If you are using an external tool, can't you just restrict what you
> display based on the logged duration?

It's not a matter of having too much information in our reports (the
more information I have, the happier I am :)). It's a matter of
slowing down too much the server with too much I/O.
We can afford to log every duration and queries slower than 500ms
nearly without any overhead. We can't afford to log every query, it
generates too much I/O - note that we tried to do it and it was really
too slow.
With the former configuration we log 1.2 GB/day, with the latter I
suspect it will be far more than 60 GB/day (I don't have the exact
number as we can't do it for real but queries slower than 500 ms
represents 1/100 of the total amount of queries).

Query logging is really a nice way to monitor the activity of a
PostgreSQL server and the overhead is not that high if logging I/O are
not too intensive.

--
Guillaume


Re: log_duration is redundant, no?

From
Bruce Momjian
Date:
Tom Lane wrote:
> Bruce Momjian <bruce@momjian.us> writes:
> > If you are using an external tool, can't you just restrict what you
> > display based on the logged duration?
> 
> I think his basic complaint is that doing the full logging pushup for
> even short-duration queries is too expensive, and that logging only the
> duration and not the query text or parameters makes a significant speed
> difference.  I'm not at all sure that I buy that, but if it's true then
> subsequent filtering obviously doesn't help.

Well, except for bind, all the log output display is zero cost, just a
printf(), as I remember.  The only cost that is significant, I think, is
the timing of the query, and that is happening for all the setttings
discussed.

--  Bruce Momjian   bruce@momjian.us EnterpriseDB    http://www.enterprisedb.com
 + If your life is a hard drive, Christ can be your backup. +


Re: log_duration is redundant, no?

From
Tom Lane
Date:
Bruce Momjian <bruce@momjian.us> writes:
> Well, except for bind, all the log output display is zero cost, just a
> printf(), as I remember.  The only cost that is significant, I think, is
> the timing of the query, and that is happening for all the setttings
> discussed.

On a machine with slow gettimeofday(), measuring duration at all is
going to hurt, but apparently that is not Guillaume's situation ---
what's costing him is sheer log volume.  And remember that the
slow-gettimeofday problem exists mainly on cheap PCs, not server-grade
hardware.  Based on his experience I'm prepared to believe that there
is a use-case for logging just the duration for short queries.

It seems like we should either remove the separate log_duration boolean
or make it work as he suggests.  I'm leaning to the second answer now.
What's your vote?
        regards, tom lane


Re: log_duration is redundant, no?

From
"Guillaume Smet"
Date:
On 9/8/06, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> I think his basic complaint is that doing the full logging pushup for
> even short-duration queries is too expensive, and that logging only the
> duration and not the query text or parameters makes a significant speed
> difference.  I'm not at all sure that I buy that, but if it's true then
> subsequent filtering obviously doesn't help.

That's exactly my point. And on our highly loaded servers, the
log_duration behaviour makes the difference between:
- we have a clear overview of the server activity and
- we don't have any idea of what happens on this server (apart that
there are queries slower than X ms).

Regards,

--
Guillaume


Re: log_duration is redundant, no?

From
Bruce Momjian
Date:
Tom Lane wrote:
> Bruce Momjian <bruce@momjian.us> writes:
> > Well, except for bind, all the log output display is zero cost, just a
> > printf(), as I remember.  The only cost that is significant, I think, is
> > the timing of the query, and that is happening for all the setttings
> > discussed.
> 
> On a machine with slow gettimeofday(), measuring duration at all is
> going to hurt, but apparently that is not Guillaume's situation ---
> what's costing him is sheer log volume.  And remember that the
> slow-gettimeofday problem exists mainly on cheap PCs, not server-grade
> hardware.  Based on his experience I'm prepared to believe that there
> is a use-case for logging just the duration for short queries.
> 
> It seems like we should either remove the separate log_duration boolean
> or make it work as he suggests.  I'm leaning to the second answer now.
> What's your vote?

#2, I think, but I am confused if you don't know the query, how valuable
is the log_duration.

--  Bruce Momjian   bruce@momjian.us EnterpriseDB    http://www.enterprisedb.com
 + If your life is a hard drive, Christ can be your backup. +


Re: log_duration is redundant, no?

From
Alvaro Herrera
Date:
Bruce Momjian wrote:

> #2, I think, but I am confused if you don't know the query, how valuable
> is the log_duration.

Statistics?

-- 
Alvaro Herrera                                http://www.CommandPrompt.com/
PostgreSQL Replication, Consulting, Custom Development, 24x7 support


Re: log_duration is redundant, no?

From
Bruce Momjian
Date:
Alvaro Herrera wrote:
> Bruce Momjian wrote:
> 
> > #2, I think, but I am confused if you don't know the query, how valuable
> > is the log_duration.
> 
> Statistics?

Oh, interesting.

--  Bruce Momjian   bruce@momjian.us EnterpriseDB    http://www.enterprisedb.com
 + If your life is a hard drive, Christ can be your backup. +


Re: log_duration is redundant, no?

From
"Guillaume Smet"
Date:
On 9/8/06, Bruce Momjian <bruce@momjian.us> wrote:
> Alvaro Herrera wrote:
> > Statistics?
>
> Oh, interesting.

We build this type of report for our customers:
http://pgfouine.projects.postgresql.org/reports/sample_hourly.html
This one is a real one.

As you can see, we cannot tell the type of every query as we don't log
the text of all of them (we identified 138,788 of the 12,358,514
queries executed this day).

It helped to detect there was a weird behaviour at 11am which was not
a good idea as we have a lot of visits at 11am. We asked our customers
if there was any cronjob running at 11am we could move to another time
when the load was less intensive as it really slowed down the
database.

Sometimes we also detect that we suddenly have a big up in the number
of queries executed (and it can be small queries we cannot detect with
log_min_duration_statement = 500) without the same increase on the
HTTP side. We contact our customer to ask him what happens and if it's
normal. If it's not, they take a look at the changes they made to  the
code the previous day. Sometimes, we can even give them a hint because
it's at a specific time and we can suspect it's a cronjob running.
If they find the problem, it's directly corrected and we can check the
next day if it's okay. If they don't find the problem, then we enable
full query logging for a short time and we analyze them to find the
offending queries.

And sometimes, in the case we just host the database without
developing the application, we just simply need it for contractual
reasons to demonstrate why the server is suddenly becoming slow.

I hope it helps to clarify how we use it.

--
Guillaume


Re: log_duration is redundant, no?

From
"Guillaume Smet"
Date:
Tom,

On 9/8/06, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> It seems like we should either remove the separate log_duration boolean
> or make it work as he suggests.  I'm leaning to the second answer now.

Do you want me to propose a patch or do you prefer to work on it
yourself? If so, do we keep the log_duration name or do we change it
to log_all_duration or another more appropriate name?

I attached the little patch I use to apply on our packages. I can work
on it to make it apply to HEAD and update the documentation.

I suppose we should also change the FE/BE protocol logging accordingly
but ISTM you already planned to change it for other reasons.

--
Guillaume

Attachment

Re: log_duration is redundant, no?

From
Tom Lane
Date:
"Guillaume Smet" <guillaume.smet@gmail.com> writes:
> Do you want me to propose a patch or do you prefer to work on it
> yourself?

It's done already ...
        regards, tom lane


Re: log_duration is redundant, no?

From
Peter Eisentraut
Date:
Alvaro Herrera wrote:
> Bruce Momjian wrote:
> > #2, I think, but I am confused if you don't know the query, how
> > valuable is the log_duration.
>
> Statistics?

I doubt that there is a statistical merit to calculating aggregate 
values over the duration of an anonymous set of queries.

-- 
Peter Eisentraut
http://developer.postgresql.org/~petere/


Re: log_duration is redundant, no?

From
David Fetter
Date:
On Sat, Sep 09, 2006 at 06:33:10PM +0200, Peter Eisentraut wrote:
> Alvaro Herrera wrote:
> > Bruce Momjian wrote:
> > > #2, I think, but I am confused if you don't know the query, how
> > > valuable is the log_duration.
> >
> > Statistics?
> 
> I doubt that there is a statistical merit to calculating aggregate
> values over the duration of an anonymous set of queries.

"How heavily loaded is the server" is a perfectly legitimate metric to
have available, especially when more detailed, i.e. more invasive
probes could bring it down.

Cheers,
D
-- 
David Fetter <david@fetter.org> http://fetter.org/
phone: +1 415 235 3778        AIM: dfetter666                             Skype: davidfetter

Remember to vote!


Re: log_duration is redundant, no?

From
Martijn van Oosterhout
Date:
On Sat, Sep 09, 2006 at 06:33:10PM +0200, Peter Eisentraut wrote:
> Alvaro Herrera wrote:
> > Bruce Momjian wrote:
> > > #2, I think, but I am confused if you don't know the query, how
> > > valuable is the log_duration.
> >
> > Statistics?
>
> I doubt that there is a statistical merit to calculating aggregate
> values over the duration of an anonymous set of queries.

Eh? Sure there is, if you want totals per database, number of queries
and where they were spread over the day. A perfectly reasonable example
was given in this thread.

If you're a service provider you don't care about what queries the
users are doing, just how many resources they're taking.

Have a nice day,
--
Martijn van Oosterhout   <kleptog@svana.org>   http://svana.org/kleptog/
> From each according to his ability. To each according to his ability to litigate.

Re: log_duration is redundant, no?

From
"Guillaume Smet"
Date:
On 9/8/06, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> It's done already ...

(Working on implementing the last changes you made in formatting in pgFouine)

Is it normal that when I set log_duration to on and log_statement to
all, I have the following output when I prepare/bind/execute a
prepared statement using the protocol:
LOG:  duration: 0.250 ms
LOG:  duration: 0.057 ms
LOG:  execute my_query: SELECT * FROM shop WHERE $1 = $2
DETAIL:  parameters: $1 = 'Clothes Clothes Clothes', $2 = 'Joe''s Widgets'

I suppose the first line is the prepare and the second line is the
bind but I'm not sure it's the desired behaviour.

Any comment?

--
Guillaume


Re: log_duration is redundant, no?

From
Tom Lane
Date:
"Guillaume Smet" <guillaume.smet@gmail.com> writes:
> Is it normal that when I set log_duration to on and log_statement to
> all, I have the following output when I prepare/bind/execute a
> prepared statement using the protocol:
> LOG:  duration: 0.250 ms
> LOG:  duration: 0.057 ms
> LOG:  execute my_query: SELECT * FROM shop WHERE $1 = $2
> DETAIL:  parameters: $1 = 'Clothes Clothes Clothes', $2 = 'Joe''s Widgets'

> I suppose the first line is the prepare and the second line is the
> bind but I'm not sure it's the desired behaviour.

Well, considering that the parse and bind may take longer than the
execute, I hardly think we want to ignore them for log_duration
purposes.  And we agreed that if log_duration is on and
log_min_duration_statement is not triggered, log_duration should print
*only* duration.  So I'm not sure what else you expected.
        regards, tom lane


Re: log_duration is redundant, no?

From
"Guillaume Smet"
Date:
On 9/16/06, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Well, considering that the parse and bind may take longer than the
> execute, I hardly think we want to ignore them for log_duration
> purposes.  And we agreed that if log_duration is on and
> log_min_duration_statement is not triggered, log_duration should print
> *only* duration.  So I'm not sure what else you expected.

I don't know exactly what I expected. I'm just surprised to have only
the duration when log_statement is set to all.
If we consider that the prepare and the bind operations are important
(and I agree they can be), I wonder why do we remove the output we
have when log_min_duration_statement is set to 0 (I'm thinking of the
parse: and bind: lines)?

(sorry for the double post, I forgot to cc: the list)


Re: log_duration is redundant, no?

From
Tom Lane
Date:
"Guillaume Smet" <guillaume.smet@gmail.com> writes:
> If we consider that the prepare and the bind operations are important
> (and I agree they can be), I wonder why do we remove the output we
> have when log_min_duration_statement is set to 0 (I'm thinking of the
> parse: and bind: lines)?

Well, we remove it for the execute: too if you have only log_duration
on.  My view of this is that log_duration is meant to find out the total
amount of time spent doing stuff, and you set log_min_duration_statement
to whatever your threshold of pain is for the amount of time spent doing
a single thing.  If it's less than log_min_duration_statement then
you're saying you don't care about the details of that individual step,
only the aggregate runtime.

log_statement has another goal entirely, which is to record *what* is
being done in a semantic sense, and so for that I think it makes sense
to log only actual executions (and not parse/bind leading up to 'em).

The only asymmetry in the thing is that if log_statement fired then
we suppress duplicate printing of the query in the later duration log
message (if any) for that query.  But that seems like the right thing
if you're at all concerned about log volume.
        regards, tom lane


Re: log_duration is redundant, no?

From
Alvaro Herrera
Date:
Tom Lane wrote:
> "Guillaume Smet" <guillaume.smet@gmail.com> writes:
> > If we consider that the prepare and the bind operations are important
> > (and I agree they can be), I wonder why do we remove the output we
> > have when log_min_duration_statement is set to 0 (I'm thinking of the
> > parse: and bind: lines)?
> 
> Well, we remove it for the execute: too if you have only log_duration
> on.  My view of this is that log_duration is meant to find out the total
> amount of time spent doing stuff, and you set log_min_duration_statement
> to whatever your threshold of pain is for the amount of time spent doing
> a single thing.  If it's less than log_min_duration_statement then
> you're saying you don't care about the details of that individual step,
> only the aggregate runtime.

It might make sense to log _what_ is going on, without telling all the
little details, for example

LOG:  parse duration: 0.250 ms
LOG:  bind duration: 0.057 ms
LOG:  execute my_query: SELECT * FROM shop WHERE $1 = $2
DETAIL:  parameters: $1 = 'Clothes Clothes Clothes', $2 = 'Joe''s Widgets'

-- 
Alvaro Herrera                                http://www.CommandPrompt.com/
PostgreSQL Replication, Consulting, Custom Development, 24x7 support


Re: log_duration is redundant, no?

From
"Guillaume Smet"
Date:
On 9/16/06, Alvaro Herrera <alvherre@commandprompt.com> wrote:
> It might make sense to log _what_ is going on, without telling all the
> little details, for example
>
> LOG:  parse duration: 0.250 ms
> LOG:  bind duration: 0.057 ms
> LOG:  execute my_query: SELECT * FROM shop WHERE $1 = $2
> DETAIL:  parameters: $1 = 'Clothes Clothes Clothes', $2 = 'Joe''s Widgets'

It's not really the idea when you use log_statement and log_duration.
Lines are completely different semantically speaking.
So you should have:
LOG:  parse (log_statement)
LOG:  duration: 0.250 ms (log_duration)

--
Guillaume


Re: log_duration is redundant, no?

From
"Guillaume Smet"
Date:
On 9/16/06, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> The only asymmetry in the thing is that if log_statement fired then
> we suppress duplicate printing of the query in the later duration log
> message (if any) for that query.  But that seems like the right thing
> if you're at all concerned about log volume.

Perhaps I'm not representative of the users of these settings but when
I used log_statement='all', I didn't really care about the log volume.
I knew it really generates a lot of log lines and it slows down my
database.

My only concern was that we now have less information with
log_statement='all' than with log_min_duration_statement.

That said, I don't use it myself now: I use exclusively
log_min_duration_statement and log_duration. So if you think it's
better like that, it's ok for me.

Does anyone else have an opinion about this?

--
Guillaume


Re: log_duration is redundant, no?

From
Tom Lane
Date:
"Guillaume Smet" <guillaume.smet@gmail.com> writes:
> My only concern was that we now have less information with
> log_statement='all' than with log_min_duration_statement.

Well, you don't have the durations, but log_statement isn't supposed to
tell you that.  So I'm still quite confused about what it is that you
want to do differently.
        regards, tom lane