On 10 April 2015 at 15:26, Peter Eisentraut <peter_e@gmx.net> wrote:
> What is your intended use case for this feature?
Likely use cases are:
* Limits on the number of rows in the sample. Some research colleagues
have published a new mathematical analysis that will allow a lower
limit than previously considered.
* Time limits on sampling. This allows data visualisation tools to get
approximate answers in real time. (A rough SQL sketch of row- and
time-limited sampling follows this list.)
* Stratified sampling, or anything involving some kind of filtering,
lifting or bias; this allows filtering out known-incomplete data.
* Limits on sample error
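
To make the first two concrete, here is a rough sketch of what row- and
time-limited sampling might look like at the SQL level, using the
tsm_system_rows and tsm_system_time sampling methods as examples; the
table and column names are purely illustrative:

    -- limit the sample to roughly 1000 rows
    -- ("measurements"/"response_ms" are illustrative names)
    SELECT avg(response_ms)
    FROM measurements TABLESAMPLE SYSTEM_ROWS (1000);

    -- stop sampling after roughly 100 milliseconds
    SELECT avg(response_ms)
    FROM measurements TABLESAMPLE SYSTEM_TIME (100);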
Later use cases would allow custom aggregates to work together with
custom sampling methods, so we might work our way towards i) a SUM()
function that provides the right answer even when used with a sample
scan, ii) custom aggregates that report the sample error, allowing you
to get both AVG() and AVG_STDERR(). That would be technically possible
with what we have here, but I think a lot more thought is required yet.
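
As a minimal sketch of the SUM() idea: with, say, a 10% Bernoulli
sample, the manual version is simply to scale the aggregate by the
inverse sampling fraction; a sample-aware custom aggregate would do
this (and handle the error bounds) automatically. Table and column
names are again illustrative:

    -- scale SUM() by the inverse of the 10% sampling fraction
    -- ("orders"/"amount" are illustrative names)
    SELECT sum(amount) * 10 AS estimated_total
    FROM orders TABLESAMPLE BERNOULLI (10);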
These have all come out of detailed discussions with two different
groups of data mining researchers.
-- 
Simon Riggs                http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, RemoteDBA, Training & Services