On Mon, Oct 21, 2013 at 4:01 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Gavin Flower <GavinFlower@archidevsys.co.nz> writes:
>>>> If we're going to extend pg_stat_statements, even more than min and max
>>>> I'd like to see the standard deviation in execution time.
>
>> How about the 'median', often a lot more useful than the 'arithmetic
>> mean' (which most people call the 'average').
>
> AFAIK, median is impossible to calculate cheaply (in particular, with
> a fixed amount of workspace).  So this apparently innocent request
> is actually moving the goalposts a long way, because the space per
> query table entry is a big concern for pg_stat_statements.
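For what it's worth, mean and standard deviation really can be maintained in constant space per entry (Welford's online algorithm), which is what makes them feasible where a true median isn't. A minimal Python sketch of the arithmetic, not actual pg_stat_statements code:

```python
import math

class RunningStats:
    """Constant-space accumulator for mean and standard deviation
    (Welford's online algorithm) -- the kind of fixed per-entry state
    pg_stat_statements could afford, unlike an exact median."""

    def __init__(self):
        self.n = 0
        self.mean = 0.0
        self.m2 = 0.0  # running sum of squared deviations from the mean

    def add(self, x):
        self.n += 1
        delta = x - self.mean
        self.mean += delta / self.n
        self.m2 += delta * (x - self.mean)

    def stddev(self):
        # population stddev; divide by n-1 instead for the sample estimate
        return math.sqrt(self.m2 / self.n) if self.n > 0 else 0.0


# Feed it some execution times in milliseconds:
stats = RunningStats()
for ms in [1.2, 0.9, 1.1, 40.0, 1.0]:
    stats.add(ms)
print(stats.mean, stats.stddev())
```

Only three doubles per entry, updated in O(1) per execution, which is why min/max/stddev are cheap additions while median is not.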
Yeah, and I worry about min and max not being very usable - once they get pushed out to extreme values, there's nothing to drag them back toward normality except resetting the stats, and that's not something we want to encourage people to do frequently. Of course, averages over very long sampling intervals may not be too useful anyway, dunno.
I think pg_stat_statements_reset() should be run every time you make a change which you think (or hope) will push the system into a new regime; that goes for either min/max or for average/stddev.
A histogram would be cool, but it doesn't seem very practical to implement. If I really needed that I'd probably set log_min_duration_statement = 0 and mine the log files. But that means I'd have to wait to accumulate enough logs once I made that change, then remember to turn it off.
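The log-mining step above is easy enough to sketch. Assuming the stock message emitted when log_min_duration_statement fires (a line containing `duration: 12.345 ms` -- the prefix before it varies with log_line_prefix, so this anchors only on the duration), something like:

```python
import re
from collections import Counter

# Matches the duration reported when log_min_duration_statement triggers.
# We deliberately search rather than match, since log_line_prefix can put
# arbitrary text in front of "duration:".
DURATION_RE = re.compile(r"duration: (\d+\.\d+) ms")

def latency_histogram(lines, buckets=(1, 10, 100, 1000)):
    """Bucket logged statement durations (in ms) into a crude histogram.
    `buckets` are upper bounds; anything slower falls into the last bin."""
    counts = Counter()
    for line in lines:
        m = DURATION_RE.search(line)
        if not m:
            continue
        ms = float(m.group(1))
        label = next((f"< {b} ms" for b in buckets if ms < b),
                     f">= {buckets[-1]} ms")
        counts[label] += 1
    return counts

# Hypothetical log excerpt for illustration:
log = [
    "LOG:  duration: 0.412 ms  statement: SELECT 1",
    "LOG:  duration: 52.310 ms  statement: SELECT ...",
    "LOG:  duration: 2412.007 ms  statement: UPDATE ...",
]
print(latency_histogram(log))
```

Crude, but it shows why the histogram doesn't need to live in shared memory at all if you're willing to pay the logging overhead and the wait.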
What I'd like most in pg_stat_statements now is the ability to distinguish which queries have a user grinding their teeth, versus which ones have a cron job patiently doing a wait4. I don't know the best way to figure that out, other than stratify on application_name. Or maybe a way to selectively undo the query text normalization, so I could see which parameters were causing the problem.