Re: ANALYZE sampling is too good - Mailing list pgsql-hackers

From	Simon Riggs
Subject	Re: ANALYZE sampling is too good
Date	December 11, 2013 00:58:19
Msg-id	CA+U5nM+3M7PfwrZs3ivt_oCuF-yTRaaA1Wq=u4GDTjKJxr2Kpg@mail.gmail.com
In response to	Re: ANALYZE sampling is too good (Greg Stark <stark@mit.edu>)
Responses	Re: ANALYZE sampling is too good
List	pgsql-hackers

On 11 December 2013 00:44, Greg Stark <stark@mit.edu> wrote:
> On Wed, Dec 11, 2013 at 12:40 AM, Simon Riggs <simon@2ndquadrant.com> wrote:
>> When we select a block we should read all rows on that block, to help
>> identify the extent of clustering within the data.
>
> So how do you interpret the results of the sample read that way that
> doesn't introduce bias?

Yes, it is not a perfect statistical sample. All sampling is subject
to an error that is data dependent.
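
A minimal sketch of why the error is data dependent, assuming a toy table model that is not part of PostgreSQL: a table is a list of blocks, and each block is a list of values. The helper names (make_table, whole_block_sample, row_sample) are hypothetical. Reading every row on a chosen block gives the same number of rows as a row-level sample, but on physically clustered data it sees far fewer distinct values.

```python
import random


def make_table(n_blocks, rows_per_block, clustered, seed=0):
    """Build a toy table: a list of blocks, each a list of integer values.

    Every value occurs rows_per_block times. When clustered is True, all
    copies of a value sit on the same block; otherwise they are shuffled
    across the whole table.
    """
    values = [i // rows_per_block for i in range(n_blocks * rows_per_block)]
    if not clustered:
        random.Random(seed).shuffle(values)
    return [values[i:i + rows_per_block]
            for i in range(0, len(values), rows_per_block)]


def whole_block_sample(table, n_blocks, rng):
    """Pick n_blocks blocks at random and keep every row on each of them."""
    rows = []
    for block in rng.sample(table, n_blocks):
        rows.extend(block)
    return rows


def row_sample(table, n_rows, rng):
    """Pick n_rows individual rows uniformly from the whole table."""
    all_rows = [value for block in table for value in block]
    return rng.sample(all_rows, n_rows)


if __name__ == "__main__":
    rng = random.Random(1)
    for clustered in (True, False):
        table = make_table(1000, 100, clustered)
        by_block = whole_block_sample(table, 10, rng)   # 10 blocks = 1000 rows
        by_row = row_sample(table, 1000, rng)           # 1000 single rows
        print("clustered" if clustered else "shuffled",
              "| distinct via whole blocks:", len(set(by_block)),
              "| distinct via rows:", len(set(by_row)))
```

In the clustered table each block holds a single value, so the 10-block sample sees exactly 10 distinct values, while the row-level sample of the same size sees many more. The gap between the two is itself a rough signal of how clustered the data is, which is the information that reading whole blocks would expose.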

I'm happy for this to be an option that can be selected or not, with a
default that maintains current behaviour, since otherwise we might
expect some plan instability.

I would like to be able to:

* allow ANALYZE to run faster in some cases
* increase/decrease sample size when it matters
* have the default sample size vary according to the size of the
  table, i.e. a proportional sample
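
The last point can be sketched as follows. This is an illustration under stated assumptions, not a proposed patch: the function names and the sample_percent parameter are hypothetical, and the fixed baseline reflects the existing rule that ANALYZE targets 300 rows per unit of statistics target (30000 rows at the default target of 100), independent of table size.

```python
def fixed_sample_rows(stats_target=100):
    """Row target under the existing fixed rule: 300 * statistics target."""
    return 300 * stats_target


def proportional_sample_rows(table_rows, sample_percent=1, stats_target=100):
    """Hypothetical proportional rule.

    Sample sample_percent of the table, but never fewer rows than the
    fixed rule would take and never more rows than the table holds.
    Integer arithmetic avoids floating-point rounding surprises.
    """
    proportional = table_rows * sample_percent // 100
    floor = fixed_sample_rows(stats_target)
    return min(table_rows, max(floor, proportional))


if __name__ == "__main__":
    for rows in (5_000, 1_000_000, 100_000_000):
        print(rows, "rows ->", proportional_sample_rows(rows), "sampled")
```

With these assumed defaults, small and medium tables keep the existing sample size, and only tables large enough that 1% exceeds 30000 rows get a bigger sample.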

-- 
 Simon Riggs                   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services
