On 10 April 2015 at 15:26, Peter Eisentraut <peter_e@gmx.net> wrote:
> What is your intended use case for this feature?
Likely use cases are:
* Limits on the number of rows in the sample. Some research colleagues
have published a new mathematical analysis that will allow a lower
limit than previously considered.
* Time limits on sampling. This allows data visualisation tools to get
approximate answers in real time. (A rough SQL sketch of row- and
time-limited sampling follows this list.)
* Stratified sampling, or anything involving some kind of filtering,
lifting or bias; this allows filtering out known-incomplete data.
* Limits on sample error
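
To make the first two concrete, here is a rough sketch of what row- and
time-limited sampling might look like at the SQL level, using the
tsm_system_rows and tsm_system_time sampling methods as examples; the
table and column names are purely illustrative:

    -- limit the sample to roughly 1000 rows
    -- ("measurements"/"response_ms" are illustrative names)
    SELECT avg(response_ms)
    FROM measurements TABLESAMPLE SYSTEM_ROWS (1000);

    -- stop sampling after roughly 100 milliseconds
    SELECT avg(response_ms)
    FROM measurements TABLESAMPLE SYSTEM_TIME (100);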
Later use cases would allow custom aggregates to work together with
custom sampling methods, so we might work our way towards i) a SUM()
function that provides the right answer even when used with a sample
scan, ii) custom aggregates that report the sample error, allowing you
to get both AVG() and AVG_STDERR(). That would be technically possible
with what we have here, but I think a lot more thought is required yet.
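
As a minimal sketch of the SUM() idea: with, say, a 10% Bernoulli
sample, the manual version is simply to scale the aggregate by the
inverse sampling fraction; a sample-aware custom aggregate would do
this (and handle the error bounds) automatically. Table and column
names are again illustrative:

    -- scale SUM() by the inverse of the 10% sampling fraction
    -- ("orders"/"amount" are illustrative names)
    SELECT sum(amount) * 10 AS estimated_total
    FROM orders TABLESAMPLE BERNOULLI (10);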
These have all come out of detailed discussions with two different
groups of data mining researchers.
-- 
Simon Riggs                http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, RemoteDBA, Training & Services