Re: [PATCH]Tablesample Submission - Mailing list pgsql-hackers

From	Simon Riggs
Subject	Re: [PATCH]Tablesample Submission
Date	June 6, 2013 19:09:11
Msg-id	CA+U5nMLRtfbA-Zv0q1UvZ9+aMEyZLMAS3=4uDSSZyBA4uo8rGw@mail.gmail.com
In response to	Re: [PATCH]Tablesample Submission (Hitoshi Harada <umi.tanuki@gmail.com>)
List	pgsql-hackers

On 18 September 2012 10:32, Hitoshi Harada <umi.tanuki@gmail.com> wrote:

> As wiki says, BERNOULLI relies on the statistics of the table, which
> doesn't sound good to me.  Of course we could say this is our
> restriction and say good-bye to users who hadn't run ANALYZE first,
> but it is too hard for a normal users to use it.  We may need
> quick-and-rough count(*) for this.

For Bernoulli sampling, SQL Standard says "Further, whether a given
row of RT is included in result of TF is independent of whether other
rows of RT are included in result of TF."

Which means BERNOULLI sampling looks essentially identical to using
 WHERE random() <= ($percent/100)

So my proposed implementation route for BERNOULLI sampling is to
literally add an AND-ed qual that does a random() test (and also
handles repeatability). That looks fairly simple, and it is still
accurate, because it doesn't matter whether we do the independent test
to include the tuple before or after any other quals. I realise that
isn't a cool and hip approach, but it works and is exactly accurate.
It would change the patch quite a bit, though.
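
To make the idea concrete, here is a minimal sketch in Python rather
than in the executor itself. It is not the patch's code: the function
name bernoulli_sample and its seed parameter are hypothetical, with the
seed standing in for whatever repeatability mechanism is used. It only
shows that each row gets its own independent draw, so no row count or
statistics are consulted.

```python
import random


def bernoulli_sample(rows, percent, seed=None):
    """Yield each row independently with probability percent/100.

    Hypothetical illustration only; 'seed' stands in for repeatability.
    """
    if not 0 <= percent <= 100:
        raise ValueError("percent must be between 0 and 100")
    # A private generator, so a given seed always replays the same draws.
    rng = random.Random(seed)
    threshold = percent / 100.0
    for row in rows:
        # One independent draw per row. A strict '<' is used here so that
        # percent=0 returns nothing, since random() lies in [0, 1).
        if rng.random() < threshold:
            yield row


if __name__ == "__main__":
    table = list(range(1000))
    first = list(bernoulli_sample(table, 10, seed=42))
    second = list(bernoulli_sample(table, 10, seed=42))
    print(len(first), first == second)
```

In this sketch, filtering rows before or after the random test changes
which particular rows come back for a given seed, but each surviving row
is still included with the same probability, independently of the
others.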

Taking the random() approach would mean we don't rely on statistics either.

Thoughts?

SYSTEM sampling uses a completely different approach and is the really
interesting part of this feature.

--
 Simon Riggs                   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services
