Re: benchmarking the query planner - Mailing list pgsql-hackers

From Robert Haas
Subject Re: benchmarking the query planner
Msg-id 603c8f070812120344m67ef2c1fs41806cfb4ff9e396@mail.gmail.com
In response to Re: benchmarking the query planner  (Simon Riggs <simon@2ndQuadrant.com>)
Responses Re: benchmarking the query planner  (Tom Lane <tgl@sss.pgh.pa.us>)
Re: benchmarking the query planner  (Simon Riggs <simon@2ndQuadrant.com>)
List pgsql-hackers
On Fri, Dec 12, 2008 at 4:04 AM, Simon Riggs <simon@2ndquadrant.com> wrote:
>> The existing sampling mechanism is tied to solid statistics. It
>> provides the correct sample size to get a consistent confidence range
>> for range queries. This is the same mathematics which governs election
>> polling and other surveys. The sample size you need to get +/- 5% 19
>> times out of 20 increases as the population increases, but not by very
>> much.
>
> Sounds great, but it's not true. The sample size is not linked to data
> volume, so how can it possibly give a consistent confidence range?

I'm not 100% sure how relevant it is to this case, but I think what
Greg is referring to is:

http://en.wikipedia.org/wiki/Margin_of_error#Effect_of_population_size

It is a pretty well-known statistical result that for something like an
opinion poll your margin of error depends essentially only on the size
of your sample, not on the size of the population, provided the
population is much larger than the sample.
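
To make that concrete, here is a rough sketch of the textbook formulas
for a simple random sample estimating a proportion. The function names,
the 1.96 normal quantile for "19 times out of 20", and the worst-case
p = 0.5 are my own assumptions for illustration. This is not what
ANALYZE does, and it says nothing about whether the same reasoning
carries over to the planner's statistics.

```python
import math


def margin_of_error(n, population=None, p=0.5, z=1.96):
    """Half-width of the confidence interval for a sampled proportion.

    n          -- sample size
    population -- population size, or None to treat it as infinite
    p          -- assumed true proportion (0.5 is the worst case)
    z          -- normal quantile (1.96 is roughly 95%, "19 times out of 20")
    """
    moe = z * math.sqrt(p * (1 - p) / n)
    if population is not None:
        # Finite population correction: tends to 1 as the population grows,
        # which is why the population size barely matters once it is large.
        moe *= math.sqrt((population - n) / (population - 1))
    return moe


def required_sample_size(error, population=None, p=0.5, z=1.96):
    """Smallest sample size giving a margin of error no larger than `error`."""
    n0 = z * z * p * (1 - p) / (error * error)
    if population is not None:
        # Standard adjustment of the infinite-population size for finite N.
        n0 = n0 / (1 + (n0 - 1) / population)
    return math.ceil(n0)


if __name__ == "__main__":
    # Same sample of 1000 rows drawn from ever larger populations.
    for population in (10_000, 1_000_000, 100_000_000):
        print(population, margin_of_error(1000, population))

    # Sample needed for +/- 5% at roughly 95% confidence.
    for population in (10_000, 1_000_000, 100_000_000):
        print(population, required_sample_size(0.05, population))
```

With a fixed sample of 1000, the margin of error stays close to 3%
whether the population is ten thousand or a hundred million, and the
sample needed for +/- 5% levels off just under 400. That matches the
quoted statement that the required sample grows with the population,
but not by very much.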

...Robert

