Re: Add min and max execute statement time in pg_stat_statement - Mailing list pgsql-hackers

From David Fetter
Subject Re: Add min and max execute statement time in pg_stat_statement
Date
Msg-id 20150219181054.GB8831@fetter.org
In response to Re: Add min and max execute statement time in pg_stat_statement  ("David G. Johnston" <david.g.johnston@gmail.com>)
Responses Re: Add min and max execute statement time in pg_stat_statement
List pgsql-hackers
On Wed, Feb 18, 2015 at 08:31:09PM -0700, David G. Johnston wrote:
> On Wed, Feb 18, 2015 at 6:50 PM, Andrew Dunstan <andrew@dunslane.net> wrote:
> > On 02/18/2015 08:34 PM, David Fetter wrote:
> >
> >> On Tue, Feb 17, 2015 at 08:21:32PM -0500, Peter Eisentraut wrote:
> >>
> >>> On 1/20/15 6:32 PM, David G Johnston wrote:
> >>>
> >>>> In fact, as far as the database knows, the values provided to this
> >>>> function do represent an entire population and such a correction
> >>>> would be unnecessary.  I guess it boils down to whether "future"
> >>>> queries are considered part of the population or whether the
> >>>> population changes upon each query being run and thus we are
> >>>> calculating the ever-changing population variance.
> 
> > I think we should be calculating the population variance.
> 
> >> Why population variance and not sample variance?  In distributions
> >> where the second moment about the mean exists, it's an unbiased
> >> estimator of the variance.  In this, it's different from the
> >> population variance.
> 
> > Because we're actually measuring the whole population, and not a sample?

We're not.  We're taking a sample, which is to say past measurements,
and using it to make inferences about the population, which includes
all queries in the future.
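The distinction at issue can be made concrete. As a sketch (plain Python, not anything from the patch): population variance divides the sum of squared deviations by n, while sample variance divides by n - 1 (Bessel's correction), which is what makes it an unbiased estimator of the variance of the larger population the sample was drawn from.

```python
def population_variance(xs):
    # Descriptive statistic: treats xs as the entire population.
    n = len(xs)
    mean = sum(xs) / n
    return sum((x - mean) ** 2 for x in xs) / n

def sample_variance(xs):
    # Unbiased estimator: treats xs as a sample from a larger
    # population, hence the n - 1 divisor (Bessel's correction).
    n = len(xs)
    mean = sum(xs) / n
    return sum((x - mean) ** 2 for x in xs) / (n - 1)

# Hypothetical execution times in milliseconds.
times_ms = [1.2, 0.9, 1.5, 2.1, 1.0]
print(population_variance(times_ms))  # divides by 5
print(sample_variance(times_ms))      # divides by 4, so slightly larger
```

For large n the two converge, which is part of why the practical stakes here are low once a query has been executed many times.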

> The key incorrect word in David Fetter's statement is "estimator".  We are
> not estimating anything but rather providing descriptive statistics for a
> defined population.

See above.

> Users extrapolate that the next member added to the population would be
> expected to conform to this statistical description without bias (though
> see below).  We can also then define the new population by including this
> new member and generating new descriptive statistics (which allows for
> evolution to be captured in the statistics).
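The "ever-changing population" reading above maps naturally onto an incremental update: each completed query updates the running statistics in O(1) without storing past measurements. A sketch of that idea using Welford's online algorithm (illustrative Python, not the actual patch code):

```python
class RunningStats:
    """Incrementally maintained descriptive statistics via Welford's
    online algorithm: each new value updates the count, mean, and sum
    of squared deviations, so the statistics always describe the
    population of all executions seen so far."""

    def __init__(self):
        self.n = 0
        self.mean = 0.0
        self.m2 = 0.0  # running sum of squared deviations from the mean

    def add(self, x):
        self.n += 1
        delta = x - self.mean
        self.mean += delta / self.n
        self.m2 += delta * (x - self.mean)

    def population_variance(self):
        return self.m2 / self.n if self.n > 0 else 0.0

stats = RunningStats()
for t in [1.2, 0.9, 1.5, 2.1, 1.0]:  # hypothetical times in ms
    stats.add(t)
print(stats.mean, stats.population_variance())
```

Resetting the counters is the "kill off the entire population" operation discussed below; the new population then accumulates from scratch.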
> 
> Currently (I think) we allow the end user to kill off the entire population
> and build up from scratch so that, while in the short term the ability to
> predict the attributes of future members is limited, once the population has
> reached a statistically significant size new predictions will no longer
> be skewed by population members whose attributes were defined in an older and
> possibly significantly different environment.  In theory it would be nice
> to be able to give the user the ability to specify - by time or percentage
> - a subset of the population to leave alive.

I agree that stale numbers can fuzz things in a way we don't like, but
let's not let too much feature creep in here.

> Actual time-weighted sampling would be an alternative but likely one
> significantly more difficult to accomplish.

Right.
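For what it's worth, one cheap approximation of the time weighting
mentioned above (a sketch only, not a proposal for the patch) is an
exponentially decayed mean and variance, where a decay factor
discounts older members of the population instead of culling them:

```python
class DecayedStats:
    """Exponentially weighted mean and variance: alpha in (0, 1]
    controls how fast old measurements fade.  alpha = 1 keeps only
    the latest value; small alpha approaches all-history statistics."""

    def __init__(self, alpha=0.1):
        self.alpha = alpha
        self.mean = None
        self.var = 0.0

    def add(self, x):
        if self.mean is None:
            self.mean = x  # first observation seeds the mean
            return
        delta = x - self.mean
        incr = self.alpha * delta
        self.mean += incr
        # Exponentially weighted variance update (West-style)
        self.var = (1 - self.alpha) * (self.var + delta * incr)

stats = DecayedStats(alpha=0.2)
for t in [1.2, 0.9, 1.5, 2.1, 1.0]:  # hypothetical times in ms
    stats.add(t)
print(stats.mean, stats.var)
```

This costs the same O(1) per query as a plain running update, but
picking a defensible alpha is exactly the hard policy question that
makes real time-weighted sampling more difficult.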

Cheers,
David.
-- 
David Fetter <david@fetter.org> http://fetter.org/
Phone: +1 415 235 3778  AIM: dfetter666  Yahoo!: dfetter
Skype: davidfetter      XMPP: david.fetter@gmail.com

Remember to vote!
Consider donating to Postgres: http://www.postgresql.org/about/donate


