Re: Bug? Small samples in TABLESAMPLE SYSTEM returns zero rows - Mailing list pgsql-hackers

From Petr Jelinek
Subject Re: Bug? Small samples in TABLESAMPLE SYSTEM returns zero rows
Date
Msg-id 55C3C75F.9090101@2ndquadrant.com
Whole thread Raw
In response to Re: Bug? Small samples in TABLESAMPLE SYSTEM returns zero rows  (Josh Berkus <josh@agliodbs.com>)
Responses Re: Bug? Small samples in TABLESAMPLE SYSTEM returns zero rows  (Alvaro Herrera <alvherre@2ndquadrant.com>)
List pgsql-hackers
On 2015-08-06 22:25, Josh Berkus wrote:
> On 08/06/2015 01:19 PM, Simon Riggs wrote:
>> For me, the docs seem exactly correct. The mathematical implications of
>> that just aren't recorded explicitly.
>
> Well, for the SELECT page, all we need is the following (one changed
> sentence):
>
> The SYSTEM method is significantly faster than the BERNOULLI method when
> small sampling percentages are specified, but it may return a
> less-random sample of the table as a result of clustering effects, and
> may return a highly variable number of results for very small sample sizes.
>

BTW this was one of the motivations for making tsm_system_rows contrib 
module, that one will give you exact number of tuples while still doing 
page level sampling. But since it does linear probing it's only useful 
if you want those really small amounts of tuples because it will always 
do random I/O even if you are scanning large part of the table.

>>
>> I will try to reword or add something to make it clear that this can
>> return a variable number of blocks and thus produces a result with
>> greater variability in the number of rows returned.
>>
>> It's documented on the SELECT page only; plus there is a whole new
>> section on writing tablesample functions.
>
> Seems like it would be nice to have more detailed user docs somewhere
> which explain the sampling algos we have, especially if we get more in
> the future.  Not sure where would be appropriate for that, though.
>
> If there is no appropriate place, I'll just write a blog.
>

There is a blog post on 2ndQ blog page which tries to describe the 
sampling methods visually, not sure if it's more obvious from that or 
not. It's somewhat broken on planet though (only title there).

--  Petr Jelinek                  http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training &
Services



pgsql-hackers by date:

Previous
From: Peter Geoghegan
Date:
Subject: Re: 9.5 release notes
Next
From: Alvaro Herrera
Date:
Subject: Re: Bug? Small samples in TABLESAMPLE SYSTEM returns zero rows