Re: Add min and max execute statement time in pg_stat_statement - Mailing list pgsql-hackers

From David G. Johnston
Subject Re: Add min and max execute statement time in pg_stat_statement
Date
Msg-id CAKFQuwZL4DTEuimUeVYSE=f_itYoaVaPBMTUiHkj7EMTDZQr0g@mail.gmail.com
In response to Re: Add min and max execute statement time in pg_stat_statement  (Andrew Dunstan <andrew@dunslane.net>)
Responses Re: Add min and max execute statement time in pg_stat_statement  (David Fetter <david@fetter.org>)
List pgsql-hackers
On Wed, Feb 18, 2015 at 6:50 PM, Andrew Dunstan <andrew@dunslane.net> wrote:

On 02/18/2015 08:34 PM, David Fetter wrote:
On Tue, Feb 17, 2015 at 08:21:32PM -0500, Peter Eisentraut wrote:
On 1/20/15 6:32 PM, David G Johnston wrote:
In fact, as far as the database knows, the values provided to this
function do represent an entire population and such a correction
would be unnecessary.  I guess it boils down to whether "future"
queries are considered part of the population or whether the
population changes upon each query being run and thus we are
calculating the ever-changing population variance.
 
I think we should be calculating the population variance.
 
Why population variance and not sample variance?  In distributions
where the second moment about the mean exists, it's an unbiased
estimator of the variance.  In this, it's different from the
population variance.
 
Because we're actually measuring the whole population, and not a sample?

This.

The key incorrect word in David Fetter's statement is "estimator".  We are not estimating anything but rather providing descriptive statistics for a defined population.
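To make the distinction concrete: the only difference between the two quantities is the divisor - population variance divides the sum of squared deviations by n, sample variance by n - 1 (Bessel's correction). A minimal sketch, in Python rather than the extension's C, of the kind of one-pass (Welford-style) accumulator such a feature could use - the class and method names here are hypothetical, not taken from the patch:

```python
class RunningStats:
    """Welford-style online accumulator: one pass, no stored samples."""

    def __init__(self):
        self.n = 0
        self.mean = 0.0
        self.m2 = 0.0  # running sum of squared deviations from the mean

    def add(self, x):
        self.n += 1
        delta = x - self.mean
        self.mean += delta / self.n
        self.m2 += delta * (x - self.mean)

    def population_variance(self):
        # divide by n: describes exactly the queries we measured
        return self.m2 / self.n

    def sample_variance(self):
        # divide by n - 1: unbiased estimator for a larger population
        return self.m2 / (self.n - 1)
```

For the execution times [2, 4, 4, 4, 5, 5, 7, 9] this yields a mean of 5.0, a population variance of 4.0, and a sample variance of 32/7. The online form matters here because the stats must be updated per query execution without retaining individual timings.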

Users extrapolate that the next member added to the population can be expected to conform to this statistical description without bias (though see below). We can then redefine the population to include this new member and generate new descriptive statistics, which allows evolution to be captured in the statistics.

Currently (I think) we allow the end user to kill off the entire population and build up from scratch. While this limits the ability to predict the attributes of future members in the short term, once the population reaches a statistically significant size, new predictions will no longer be skewed by population members whose attributes were defined in an older and possibly significantly different environment. In theory it would be nice to give the user the ability to specify - by time or percentage - a subset of the population to leave alive.

Actual time-weighted sampling would be an alternative, but likely one significantly more difficult to accomplish. I haven't really dug too deeply into the mechanics of the current code, but I don't see any harm in sharing the thought.
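For what it's worth, one way such time-weighting could be sketched - purely hypothetical, not something the patch does - is an exponentially weighted mean and variance, where old population members decay smoothly instead of being reset all at once. The `alpha` decay parameter below is an invented knob for illustration:

```python
class EwmaStats:
    """Exponentially weighted mean/variance: recent queries count more.

    alpha is a hypothetical decay parameter (0 < alpha <= 1); larger
    values forget the old environment faster.
    """

    def __init__(self, alpha=0.05):
        self.alpha = alpha
        self.mean = None
        self.var = 0.0

    def add(self, x):
        if self.mean is None:
            self.mean = float(x)  # first observation seeds the mean
            return
        delta = x - self.mean
        incr = self.alpha * delta
        self.mean += incr
        # standard EWMA variance recurrence
        self.var = (1.0 - self.alpha) * (self.var + delta * incr)
```

This keeps the same O(1) per-query update cost as a plain running variance, at the price of replacing a hard population reset with a tunable forgetting rate.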

David J.
