Re: Add min and max execute statement time in pg_stat_statement - Mailing list pgsql-hackers
From: David Fetter
Subject: Re: Add min and max execute statement time in pg_stat_statement
Date:
Msg-id: 20150219181054.GB8831@fetter.org
In response to: Re: Add min and max execute statement time in pg_stat_statement ("David G. Johnston" <david.g.johnston@gmail.com>)
Responses: Re: Add min and max execute statement time in pg_stat_statement
List: pgsql-hackers
On Wed, Feb 18, 2015 at 08:31:09PM -0700, David G. Johnston wrote:
> On Wed, Feb 18, 2015 at 6:50 PM, Andrew Dunstan <andrew@dunslane.net> wrote:
>> On 02/18/2015 08:34 PM, David Fetter wrote:
>>> On Tue, Feb 17, 2015 at 08:21:32PM -0500, Peter Eisentraut wrote:
>>>> On 1/20/15 6:32 PM, David G Johnston wrote:
>>>>> In fact, as far as the database knows, the values provided to this
>>>>> function do represent an entire population and such a correction
>>>>> would be unnecessary.  I guess it boils down to whether "future"
>>>>> queries are considered part of the population or whether the
>>>>> population changes upon each query being run and thus we are
>>>>> calculating the ever-changing population variance.
>>>>
>>>> I think we should be calculating the population variance.
>>>
>>> Why population variance and not sample variance?  In distributions
>>> where the second moment about the mean exists, it's an unbiased
>>> estimator of the variance.  In this, it's different from the
>>> population variance.
>>
>> Because we're actually measuring the whole population, and not a sample?

We're not.  We're taking a sample, which is to say past measurements,
and using it to make inferences about the population, which includes
all queries in the future.

> The key incorrect word in David Fetter's statement is "estimator".  We are
> not estimating anything but rather providing descriptive statistics for a
> defined population.

See above.

> Users extrapolate that the next member added to the population would be
> expected to conform to this statistical description without bias (though
> see below).  We can also then define the new population by including this
> new member and generating new descriptive statistics (which allows for
> evolution to be captured in the statistics).
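[Editor's note: the population-vs-sample distinction being argued here comes down to the divisor in the variance formula, n versus n-1 (Bessel's correction). A minimal numeric sketch, with made-up execution times and not taken from the patch:]

```python
# Population vs. sample variance over the same observed execution times.
# Hypothetical data, illustrative only.
times = [1.2, 3.4, 2.2, 5.1, 0.9]  # past execution times, ms

n = len(times)
mean = sum(times) / n
ss = sum((t - mean) ** 2 for t in times)  # sum of squared deviations

pop_var = ss / n           # descriptive: the data *is* the whole population
sample_var = ss / (n - 1)  # inferential: unbiased estimator for future queries

# sample_var is always a bit larger, correcting for the fact that the
# mean was itself estimated from the same data.
print(pop_var, sample_var)
```

The gap between the two shrinks as call counts grow, which is one reason the choice matters little for hot queries and more for rarely-executed ones.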
> Currently (I think) we allow the end user to kill off the entire population
> and build up from scratch so that while, in the short term, the ability to
> predict the attributes of future members is limited, once the population has
> reached a statistically significant level new predictions will no longer
> be skewed by population members whose attributes were defined in an older and
> possibly significantly different environment.  In theory it would be nice
> to be able to give the user the ability to specify - by time or percentage
> - a subset of the population to leave alive.

I agree that stale numbers can fuzz things in a way we don't like, but
let's not creep too much feature in here.

> Actual time-weighted sampling would be an alternative but likely one
> significantly more difficult to accomplish.

Right.

Cheers,
David.
-- 
David Fetter <david@fetter.org> http://fetter.org/
Phone: +1 415 235 3778  AIM: dfetter666  Yahoo!: dfetter
Skype: davidfetter      XMPP: david.fetter@gmail.com

Remember to vote!
Consider donating to Postgres: http://www.postgresql.org/about/donate
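[Editor's note: one way to picture the time-weighted alternative mentioned above is an exponentially weighted running mean and variance, which discounts old measurements instead of resetting the stats wholesale. This is purely an illustration of the idea, not anything proposed in the patch; `alpha` is an assumed decay parameter.]

```python
def ewma_stats(times, alpha=0.1):
    """Exponentially weighted running mean/variance over execution times.

    Each new observation moves the mean by a fraction `alpha` of its
    deviation; the variance uses the standard exponentially weighted
    recurrence, so old environments decay smoothly out of the numbers.
    """
    mean = var = 0.0
    for i, t in enumerate(times):
        if i == 0:
            mean = t  # seed with the first observation
        else:
            delta = t - mean
            mean += alpha * delta
            var = (1 - alpha) * (var + alpha * delta * delta)
    return mean, var

# A single slow outlier nudges the stats but does not dominate them,
# and its influence fades as new measurements arrive.
mean, var = ewma_stats([1.0, 1.0, 10.0, 1.0])
```

As the thread notes, getting something like this right inside pg_stat_statements (choosing the decay, keeping it cheap per-execution) would be considerably harder than the simple running aggregates.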