Re: auto_explain WAS: RFC: Timing Events - Mailing list pgsql-hackers
From: Gavin Flower
Subject: Re: auto_explain WAS: RFC: Timing Events
Msg-id: 50AD64B6.4030607@archidevsys.co.nz
In response to: Re: auto_explain WAS: RFC: Timing Events (Greg Smith <greg@2ndQuadrant.com>)
Responses: Re: auto_explain WAS: RFC: Timing Events (Jim Nasby <jim@nasby.net>)
List: pgsql-hackers
On 22/11/12 12:15, Greg Smith wrote:
> On 11/8/12 2:16 PM, Josh Berkus wrote:
>
>> Also, logging only the long-running queries is less useful than people
>> on this list seem to think. When I'm doing real performance analysis, I
>> need to see *everything* which was run, not just the slow stuff. Often
>> the real problem is a query which used to take 1.1ms, now takes 1.8ms,
>> and gets run 400 times/second. Looking just at the slow queries won't
>> tell you that.
>
> No argument here. I've tried to be clear that some of these
> high-speed but lossy things I'm hacking on are not going to be useful
> for everyone. A solution that found most of these 1.8ms queries, but
> dropped some percentage under the highest peak load conditions, would
> still be very useful to me.
>
> An anecdote on this topic seems relevant. I have a troublesome
> production server that has moved log_min_duration_statement from 100ms
> to 500ms to 2000ms as the system grew. Even the original setting
> wasn't short enough to catch everything we would like to watch *now*,
> but seeing sub-second data is a dream at this point. The increases
> have been forced by logging contention becoming unmanagable when every
> client has to fight for the log to write to disk. I can see the
> buggers stack up as waiting for locks if I try to log shorter
> statements, stalling enough that it drags the whole server down under
> peak load.
>
> If I could just turn off logging just during those periods--basically,
> throwing them away only when some output rate throttled component hit
> its limit--I could still find them in the data collected the rest of
> the time. There are some types of problems that also only occur under
> peak load that this idea would miss, but you'd still be likely to get
> *some* of them, statistically.
>
> There's a really hard trade-off here:
>
> -Sometimes you must save data on every query to capture fine details
> -Making every query wait for disk I/O is impractical
>
> The sort of ideas you threw out for making things like auto-explain
> logging per-session can work. The main limitation there though is
> that it presumes you even know what area the problem is in the first
> place. I am not optimistic that covers a whole lot of ground either.
>
> Upthread I talked about a background process that persists shared
> memory queues as a future consumer of timing events--one that might
> consume slow query data too. That is effectively acting as the ideal
> component I described above, one that only loses things when it
> exceeds the system's write capacity for saving them. I wouldn't want
> to try and retrofit the existing PostgreSQL logging facility for such
> a thing though. Log parsing as the way to collect data is filled with
> headaches anyway.
>
> I don't see any other good way to resolve this trade-off. To help
> with the real-world situation you describe, ideally you dump all the
> query data somewhere, fast, and have the client move on. You can't
> make queries wait for storage, something else (with a buffer!) needs
> to take over that job.
>
> I can't find the URL right now, but at PG.EU someone was showing me a
> module that grabbed the new 9.2 logging hook and shipped the result to
> another server. Clients burn a little CPU and network time and they
> move on, and magically disk I/O goes out of their concerns. How much
> overhead persisting the data takes isn't the database server's job at
> all then. That's the sort of flexibility people need to have with
> logging eventually.
> Something so efficient that every client can
> afford to do it; it is capable of saving all events under ideal
> conditions; but under adverse ones, you have to keep going and accept
> the loss.

Would it be useful to be able to specify a fixed sampling rate, say
1 in 100? That way you could get a well-defined statistical sample
without causing excessive I/O.

Cheers,
Gavin
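For illustration, a minimal sketch of the kind of per-statement sampling
check this suggests. The setting name (log_sample_rate) and the helper are
invented for the example, not existing PostgreSQL GUCs or APIs; the point is
only that one cheap random draw per statement decides whether it gets logged
at all:

#include <stdbool.h>
#include <stdlib.h>

/*
 * Hypothetical setting, 0.0 - 1.0: fraction of statements to log.
 * At 0.01 roughly 1 in 100 statements would be written out, giving a
 * statistical sample instead of every client fighting for the log.
 */
static double log_sample_rate = 0.01;

/* Cheap per-statement test: should this statement be logged at all? */
static bool
statement_is_sampled(void)
{
    if (log_sample_rate >= 1.0)
        return true;        /* log everything, i.e. today's behaviour */
    if (log_sample_rate <= 0.0)
        return false;       /* sampling switched off */

    /* One random draw is trivial next to waiting on the log to reach disk. */
    return ((double) rand() / (double) RAND_MAX) < log_sample_rate;
}

The same test could equally sit inside a module using the logging hook that
9.2 added (presumably the emit_log_hook Greg is describing), so that only the
sampled statements are ever shipped off the server.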