Re: Query optimizer 8.0.1 (and 8.0) - Mailing list pgsql-hackers

From Bruno Wolff III
Subject Re: Query optimizer 8.0.1 (and 8.0)
Date
Msg-id 20050207203453.GA7517@wolff.to
Whole thread Raw
In response to Re: Query optimizer 8.0.1 (and 8.0)  (pgsql@mohawksoft.com)
Responses Re: Query optimizer 8.0.1 (and 8.0)
List pgsql-hackers
On Mon, Feb 07, 2005 at 13:28:04 -0500,
> >
> > For large populations the accuracy of estimates of statistics based on
> > random
> > samples from that population are not very sensitve to population size and
> > depends primarily on the sample size. So that you would not expect to need
> > to use larger sample sizes on larger data sets for data sets over some
> > minimum size.
> 
> That assumes a fairly low standard deviation. If the standard deviation is
> low, then a minimal sample size works fine. If there was zero deviation in
> the  data, then a sample of one works fine.

This doesn't assume a low standard deviation. That wouldn't make sense
anyway since the standard deviation depends on the units used for
the measurements.

> In the current implementation of analyze.c, the default is 100 samples. On
> a table of 10,000 rows, that is probably a good number characterize the
> data enough for the query optimizer (1% sample). For a table with 4.6
> million rows, that's less than 0.002%

The fraction of rows isn't relevant unless it is a large fraction of the
total, in which case you can use a smaller sample than you might otherwise.

> Think about an iregularly occuring event, unevenly distributed throughout
> the data set. A randomized sample strategy normalized across the whole
> data set with too few samples will mischaracterize the event or even miss
> it altogether.

What you are saying here is that if you want more accurate statistics, you
need to sample more rows. That is true. However, the size of the sample
is essentially only dependent on the accuracy you need and not the size
of the population, for large populations.


pgsql-hackers by date:

Previous
From: pgsql@mohawksoft.com
Date:
Subject: Re: Query optimizer 8.0.1 (and 8.0)
Next
From: Mark Kirkwood
Date:
Subject: Re: Query optimizer 8.0.1 (and 8.0)