Thread: suggestion: log_statement = sample
hi,

http://archives.postgresql.org/pgsql-general/2009-03/msg00581.php

This was my suggestion about introducing a setting to get a sample of SQL statements. Nobody has answered yet. Why not? I think my suggestion would help a lot. Or was it kind of stupid?

kind regards
Janning
In response to Janning Vygen <vygen@kicktipp.de>:

> http://archives.postgresql.org/pgsql-general/2009-03/msg00581.php
>
> This was my suggestion about introducing a setting to get a sample of SQL
> statements. Nobody has answered yet. Why not? I think my suggestion would
> help a lot. Or was it kind of stupid?

For my part, I don't think this would be useful.

Since most of your queries are run by software, you're going to see a fairly predictable pattern to the queries, which means your sampling isn't going to be anywhere near random, thus it will still be inaccurate and incomplete.

In my experience, I've found that enabling full logging for a short time (perhaps a few hours) gathers enough data to run through tools like pgFouine and find problem areas. Also, we have development servers that run automated tests, and since it's not critical that they be performant, we can run full query logging on them all the time. Additionally, we make sure our production systems have enough hardware behind them that we can add additional tasks without affecting production use.

All of these are (in my opinion) better approaches to the problem than yet another arbitrary query filtering technique. I mean, logging only the most time-consuming queries is already arbitrary enough (as you already stated).

-- 
Bill Moran
http://www.potentialtech.com
http://people.collaborativefusion.com/~wmoran/
hi,

thanks for your comments on this.

On Thursday 16 July 2009 15:05:58 you wrote:
> For my part, I don't think this would be useful.
>
> Since most of your queries are run by software, you're going to see a
> fairly predictable pattern to the queries, which means your sampling isn't
> going to be anywhere near random, thus it will still be inaccurate and
> incomplete.

I don't think so. In my use case I would get a good sample of queries, as I could keep log_sample running over a long period of time. The sampling is in any case much better than with log_min_duration, while logging all statements is not acceptable in production.

> In my experience, I've found that enabling full logging for a short time
> (perhaps a few hours) gathers enough data to run through tools like
> pgFouine and find problem areas.

It is not possible for us. Logging millions of statements takes too much time.

> Also, we have development servers that run automated tests, and since
> it's not critical that they be performant, we can run full query logging
> on them all the time.

But you don't run the real use cases with automated tests. There are so many factors involved in production (caching, concurrency, data, peak times, deadlocks, double clicks, robots, etc.) that you just can't reproduce it on a development system without a lot of effort.

> Additionally, we make sure our production systems have enough hardware
> behind them that we can add additional tasks without affecting
> production use.

That's nice, but not everybody can afford it. Of course I would love to log every statement. But do you really log every statement in production? I guess not.
> All of these are (in my opinion) better approaches to the problem than
> yet another arbitrary query filtering technique. I mean, logging only
> the most time-consuming queries is already arbitrary enough (as you
> already stated).

With log_min_duration I get only the most time-consuming queries. With log_sample I could detect a fast query that is called too often. This is impossible today.

Again: for my use case it makes sense to have a log_sample feature.

kind regards
Janning
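The proposal might look something like this as a postgresql.conf fragment. Note this is purely a sketch of the suggestion: log_sample is hypothetical and did not exist as a PostgreSQL parameter at the time of this thread, while log_min_duration_statement is real.

```
# postgresql.conf (sketch of the proposal)
log_min_duration_statement = 1000   # real parameter: log statements over 1s
log_sample = 1000                   # hypothetical: also log every 1000th
                                    # statement, regardless of duration
```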
>> In my experience, I've found that enabling full logging for a short time
>> (perhaps a few hours) gathers enough data to run through tools like
>> pgFouine and find problem areas.

> It is not possible for us. Logging millions of statements takes too much time.

Too much time to do what? Where is the bottleneck?

> But you don't run the real use cases with automated tests. There are so many
> factors involved in production (caching, concurrency, data, peak times,
> deadlocks, double clicks, robots, etc.) that you just can't reproduce it on a
> development system without a lot of effort.

Agreed.

>> Additionally, we make sure our production systems have enough hardware
>> behind them that we can add additional tasks without affecting
>> production use.

> That's nice, but not everybody can afford it. Of course I would love to log
> every statement. But do you really log every statement in production? I guess not.

We have clients that do, and we always recommend it for all of our clients. It can get very expensive (hint: ship the logs off with syslog over an internal network to a dedicated box), but being able to see exactly what happened in your database when something goes wrong after the fact is invaluable. Plus we can then run tools like pgsi (http://bucardo.org/pgsi/) to get nice statistics and find weak spots in the application code.

> Again: for my use case it makes sense to have a log_sample feature.

Perhaps, but I don't think you've quite overcome the 'log everything' counter-argument. I don't think log_sample is necessarily a bad idea, but I doubt it is going to happen soon, and certainly not in time to put on your production system in the next year or so. Perhaps you can attack this from another angle, such as using some sort of filter or script to grab every X lines from the logs and discard the rest while in full logging mode. That would have the advantage of being fully backwards-compatible and usable today.
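Greg's suggested workaround (run full logging and keep only every X-th line) can be sketched as a tiny filter. The log lines below are made up for illustration, and a real version would need to group multi-line statements rather than sample raw lines:

```python
def sample_lines(lines, every_n):
    """Yield every n-th line (the first, then every n-th after) from log lines."""
    for i, line in enumerate(lines):
        if i % every_n == 0:
            yield line

# Example: keep 1 in 3 statement lines from a (made-up) log excerpt that
# full logging (log_statement = 'all') might produce.
log = [
    "LOG:  statement: SELECT * FROM games WHERE id = 1",
    "LOG:  statement: UPDATE tipps SET points = 3 WHERE id = 7",
    "LOG:  statement: SELECT * FROM games WHERE id = 2",
    "LOG:  statement: INSERT INTO clicks VALUES (42)",
    "LOG:  statement: SELECT * FROM games WHERE id = 3",
    "LOG:  statement: SELECT * FROM games WHERE id = 4",
]
sampled = list(sample_lines(log, 3))
print(len(sampled))  # 2 of the 6 lines survive the 1-in-3 sample
```

Unlike a built-in log_sample, this still pays the full I/O cost of writing every statement before discarding; it only shrinks what the analysis tools must chew through.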
-- 
Greg Sabino Mullane greg@turnstep.com
End Point Corporation
PGP Key: 0x14964AC8 200907201256
http://biglumber.com/x/web?pk=2529DF6AB8F79407E94445B4BC9B906714964AC8
In response to "Greg Sabino Mullane" <greg@turnstep.com>:

> >> In my experience, I've found that enabling full logging for a short time
> >> (perhaps a few hours) gathers enough data to run through tools like
> >> pgFouine and find problem areas.
>
> > It is not possible for us. Logging millions of statements takes too much time.

This is a ridiculous statement. In actual practice, full query logging is 1/50 the amount of disk I/O of the actual database activity. If your systems are so stressed that they can't handle another 2% increase, then you've got bigger problems lurking. Have you benchmarked the load it creates under your workload?

> > But you don't run the real use cases with automated tests. There are so many
> > factors involved in production (caching, concurrency, data, peak times,
> > deadlocks, double clicks, robots, etc.) that you just can't reproduce it on a
> > development system without a lot of effort.

Logging a sample query every arbitrary number of queries isn't a real use case either, so your counter-argument makes no sense. I suggested the test system because it's a good compromise.

> >> Additionally, we make sure our production systems have enough hardware
> >> behind them that we can add additional tasks without affecting
> >> production use.
>
> > That's nice, but not everybody can afford it. Of course I would love to log
> > every statement. But do you really log every statement in production? I guess not.

Try reading my responses instead of guessing. We run full query logging on occasion to get a sample of real usage on our production systems. As a result, we know that our servers can handle the load. We're working on the infrastructure to manage the amount of data so we can do it all the time (we don't currently have enough disk space).

Overall, it seems like you've decided that you want this feature and nothing else will do. If that's the case, then just go ahead and write it.
-- 
Bill Moran
http://www.potentialtech.com
http://people.collaborativefusion.com/~wmoran/
On Mon, 2009-07-20 at 13:24 -0400, Bill Moran wrote:

> > > It is not possible for us. Logging millions of statements takes too much time.
>
> This is a ridiculous statement.

No it isn't.

> In actual practice, full query logging is 1/50 the amount of disk I/O of
> the actual database activity. If your systems are so stressed that they
> can't handle another 2% increase, then you've got bigger problems lurking.

It depends on the system. I have seen even big systems take a huge hit from full logging due to transactional velocity.

Joshua D. Drake

-- 
PostgreSQL - XMPP: jdrake@jabber.postgresql.org
Consulting, Development, Support, Training
503-667-4564 - http://www.commandprompt.com/
The PostgreSQL Company, serving since 1997
On Jul 20, 2009, at 10:24 AM, Bill Moran wrote:

> > > It is not possible for us. Logging millions of statements takes too much time.
>
> This is a ridiculous statement.

No, it isn't.

> In actual practice, full query logging is 1/50 the amount of disk I/O of
> the actual database activity. If your systems are so stressed that they
> can't handle another 2% increase, then you've got bigger problems lurking.

That's really not true. Ever, probably, but certainly in my situation. A lot of my inserts are large text fields (10 kB - 1 MB). Many of those are never read again. If I want to log all statements, or slow statements, then a lot of what's logged is those large inserts. There's currently no way to log the statement without logging all the data inserted, so the log traffic produced by each insert is huge.

Logging via syslog on Solaris, I've had reports of that slowing the machine down to unusability. (I'm fairly sure I know why, and I suspect you can guess too, but this is on customer machines.)

The hit due to logging can be huge, even on fairly overpowered systems. There are a lot of limitations in the current logging system when it comes to capturing data for performance analysis, and that's one of them. There's certainly significant room for improvement in the logging system - some of that can be added externally, but some of it really needs to be done within the core code.

Cheers,
Steve
In response to "Joshua D. Drake" <jd@commandprompt.com>:

> > This is a ridiculous statement.
>
> No it isn't.
>
> > In actual practice, full query logging is 1/50 the amount of disk I/O of
> > the actual database activity. If your systems are so stressed that they
> > can't handle another 2% increase, then you've got bigger problems lurking.
>
> It depends on the system. I have seen even big systems take a huge hit
> from full logging due to transactional velocity.

Perhaps I'm just in a foul mood today, but I feel like people are picking my emails apart to make me the bad guy. Note that you trimmed out a key part of what I said here:

"Have you benchmarked the load it creates under your workload?"

Perhaps I should have explicitly said, "There are likely some cases where this statement isn't true, so you should benchmark your specific load case, but you don't mention that you've done that."

Or, perhaps I should just abstain from posting to mailing lists when I'm in a foul mood ...

-- 
Bill Moran
http://www.potentialtech.com
http://people.collaborativefusion.com/~wmoran/
On Mon, 2009-07-20 at 13:46 -0400, Bill Moran wrote:

> In response to "Joshua D. Drake" <jd@commandprompt.com>:
> > It depends on the system. I have seen even big systems take a huge hit
> > from full logging due to transactional velocity.
>
> Perhaps I'm just in a foul mood today, but I feel like people are picking

Possibly. :)

> my emails apart to make me the bad guy. Note that you trimmed out a key
> part of what I said here:

Right. I didn't have an issue with the rest of your email. It's just the idea that what the person said was ridiculous.

> "Have you benchmarked the load it creates under your workload?"

Which is certainly a valid question.

> Perhaps I should have explicitly said, "There are likely some cases where
> this statement isn't true, so you should benchmark your specific load
> case, but you don't mention that you've done that."
>
> Or, perhaps I should just abstain from posting to mailing lists when I'm
> in a foul mood ...

It has certainly gotten me in trouble more than once. There is a reason they say, "Don't go JD".

Joshua D. Drake

-- 
PostgreSQL - XMPP: jdrake@jabber.postgresql.org
Consulting, Development, Support, Training
503-667-4564 - http://www.commandprompt.com/
The PostgreSQL Company, serving since 1997
On Monday 20 July 2009 18:58:21 Greg Sabino Mullane wrote:
> Perhaps, but I don't think you've quite overcome the 'log everything'
> counter-argument.

Not everybody can afford a system with lots of RAID arrays or dedicated logging boxes. Many people log to the same disk; I do it in some projects. So log_sample would give you almost as much to analyze as full logging, but without the I/O overhead.

Maybe most of the PostgreSQL guys have lots of hardware with RAID 10 SCSI disks, but reality sometimes differs. We run most of our boxes on IDE RAID 1 without additional disks for logging. In this scenario you just can't log everything. log_min_duration gives you a view only of long-running queries, and that is just one half of the whole picture.

kind regards
Janning
On Monday 20 July 2009 19:24:13 Bill Moran wrote:
> > > It is not possible for us. Logging millions of statements takes too much time.
>
> This is a ridiculous statement. In actual practice, full query logging
> is 1/50 the amount of disk I/O of the actual database activity. If your
> systems are so stressed that they can't handle another 2% increase, then
> you've got bigger problems lurking.
>
> Have you benchmarked the load it creates under your workload?

Yes, it takes up to 15% of our workload in an average use case. But we have peak times where we cannot afford to lose 15% to logging! And if we log to the same disk, it slows down reading and writing, so it does not really matter how much load it creates: if it slows down my response times, I don't want to log everything. (I know that logging onto the same disk is bad!)

> Overall, it seems like you've decided that you want this feature and
> nothing else will do. If that's the case, then just go ahead and write it.

It was really just a suggestion. I know that I can handle my problems in a different way, like doing full logging from time to time, better automated tests, more hardware, and so on. I just thought it couldn't be so hard to log every nth statement instead of all of them, and I think there are a lot of use cases out there for a directive like this. I really just wanted to help make PostgreSQL better. Even if it were implemented soon, it wouldn't help me, as I need to analyze my load now!

kind regards
Janning
Janning Vygen <vygen@kicktipp.de> writes:
> On Monday 20 July 2009 19:24:13 Bill Moran wrote:
>> Have you benchmarked the load it creates under your workload?

> Yes, it takes up to 15% of our workload in an average use case. But we have
> peak times where we cannot afford to lose 15% to logging!

Well, you could turn it off during the peak times. Or buy better hardware --- if you have less than 15% headroom then it's past time to be doing that anyway.

You really haven't given any convincing case for this behavior...

			regards, tom lane
On Tuesday 21 July 2009 15:49:36 Tom Lane wrote:
> > Yes, it takes up to 15% of our workload in an average use case. But we
> > have peak times where we cannot afford to lose 15% to logging!
>
> Well, you could turn it off during the peak times.

That would require a server restart, which is not a good idea. And I would never get any real-time analysis of what happens at peak time.

> Or buy better hardware --- if you have less than 15% headroom then it's
> past time to be doing that anyway.

Yes, you are right, but software solutions are so much nicer...

> You really haven't given any convincing case for this behavior...

That's fine with me. I still don't understand why it was not convincing. I guess you guys run your DBs with lots of hardware power, RAID, NAS, and so on. But there are many people running PostgreSQL on small boxes with one IDE hard disk. Switching full logging on with such hardware is like shutting the server down, as I/O is blocked!

I still think it has reasonable use cases and wouldn't take a lot of effort (just a guess, I haven't looked at the source code yet). But, hey: I just wanted to help improve PostgreSQL. If you think this is not an improvement, that's fine with me. Thanks to all for discussing it.

kind regards
Janning
Janning Vygen <vygen@kicktipp.de> writes:
> On Tuesday 21 July 2009 15:49:36 Tom Lane wrote:
>> Well, you could turn it off during the peak times.

> It requires a server restart, which is not a good idea.

Changing logging options does not require a server restart.

			regards, tom lane
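For reference, the logging settings under discussion can be changed by editing postgresql.conf and telling the server to reload its configuration, with no restart. A sketch (the data directory path is an illustrative assumption and varies by installation):

```
# postgresql.conf:
log_min_duration_statement = 1000   # log statements slower than 1s

# then reload the configuration without restarting, either from the shell:
#   pg_ctl reload -D /var/lib/pgsql/data
# or from a superuser SQL session:
#   SELECT pg_reload_conf();
```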
On Mon, Jul 20, 2009 at 1:28 PM, Joshua D. Drake <jd@commandprompt.com> wrote:
> It depends on the system. I have seen even big systems take a huge hit
> from full logging due to transactional velocity.

I've seen this too. On high-transaction-load systems, log_statement has big overhead. I haven't tested this lately, though, and I suspect this information is version-dependent (in the 'old' days, it used to be a really big deal).

merlin
On Tuesday 21 July 2009 18:09:57 you wrote:
> Janning Vygen <vygen@kicktipp.de> writes:
> > It requires a server restart, which is not a good idea.
>
> Changing logging options does not require a server restart.

Great, I didn't know that. Thank you!

Janning