Re: Patch proposal for log_duration - Mailing list pgsql-patches
From | Guillaume Smet |
---|---|
Subject | Re: Patch proposal for log_duration |
Date | |
Msg-id | 1d4e0c10604020418ydf16063ybfaa736488dd56a0@mail.gmail.com Whole thread Raw |
In response to | Re: Patch proposal for log_duration (Tom Lane <tgl@sss.pgh.pa.us>) |
Responses |
Re: Patch proposal for log_duration
|
List | pgsql-patches |
Tom, On 3/30/06, Tom Lane <tgl@sss.pgh.pa.us> wrote: > I really find it pretty bizarre to want to log a duration without > logging the statement that caused it. Seems like the > log_min_duration_statement parameter ought to be enough. These logging > options are already messy and unintuitive, and adding still more odd > frammishes doesn't help that ... It's not so odd as it was the default behaviour for 7.4. I agree with you having just the duration is not the best solution but we cannot log every query for performances reason. The point is we are hosting several databases for our customers and for these databases, we don't know the application. I'm going to take a real example we have here to explain why the behaviour of log_duration in 7.4 was perfect for us. This database is a 2 GB database receiving 13 millions queries a day. We usually have the same old big queries which cannot be really optimized in the log due to log_min_duration_statement so this setting doesn't give us any clue on how our server performs because these queries are always slow. Here are the cases when a global overview with all duration was useful: - our customer tells us their site is slower than usual. We check the database log reports and we can see if the database is slower (average duration higher) or not. If this is the case, perhaps there is a bug or a new feature in their application performing a lot more queries than before: we can tell them too as we have all the queries logged by their duration. - we have a maintenance to plan which requires the database to be down (namely the upgrade from 7.4 to 8.1 we did a few weeks ago) and we need to choose a time when the database activity is not so high as we will move the db to a slower server. In our case, it's not directly correlated to the web statistics due to an advanced cache system. Having every query logged, we know when the database activity is lower. - we see on the graphs the database is significantly slower after let's say 5pm. This can be due to a the need of a vacuum on a specific table because of a cronjob running. That's true that log_duration won't be enough but we will use log_statement='mod' during this period the following day to see if our hypothesis is true. So our point is: - log_min_duration_statement is not enough as it does not give us a global overview of the database activity. - log_statement='all' generates far too I/O for us and we cannot enable it. - log_duration=on in 7.4 was a good compromise allowing us to detect the problems on the database server while generating a not so big log file (1.2 GB a day) and nearly no overhead. That's why we'd like to see this behaviour back again in PostgreSQL. > Changing code without changing the associated comments isn't a good way > to get your ideas accepted, either, eg here: [...] > The patch makes that comment a lie. This is not acceptable coding practice. Sorry for that. The attached patch is fixed. I'm not sure if we should remove the log_duration log if log_min_duration_statement logs the query and duration for the current query. PostgreSQL 7.4 logged a line for the duration due to log_duration=on and a line for statement + duration due to log_min_duration_statement. I have kept this behaviour to be consistent with 7.4 but it's perhaps better to not log the query twice (the tool we use count the query only once anyway). Any feedback on this is welcome. I'll update the patch if needed. Thanks for your attention and comments. Regards, -- Guillaume
Attachment
pgsql-patches by date: