
Re: Improving N-Distinct estimation by ANALYZE - Mailing list pgsql-hackers

From	Greg Stark
Subject	Re: Improving N-Distinct estimation by ANALYZE
Date	January 5, 2006 11:02:18
Msg-id	87irsy4t1o.fsf@stark.xeocode.com
In response to	Re: Improving N-Distinct estimation by ANALYZE (Josh Berkus <josh@agliodbs.com>)
Responses	Re: Improving N-Distinct estimation by ANALYZE
List	pgsql-hackers

Josh Berkus <josh@agliodbs.com> writes:

> > Only if your sample is random and independent. The existing mechanism tries
> > fairly hard to ensure that every record has an equal chance of being selected.
> > If you read the entire block rather than appropriate samples then you'll introduce
> > systematic sampling errors. For example, if you read an entire block you'll be
> > biasing towards smaller records.
> 
> Did you read any of the papers on block-based sampling?   These sorts of issues
> are specifically addressed in the algorithms.

We *currently* use a block-based sampling algorithm that addresses this issue
by taking care to select rows within the selected blocks in an unbiased way.
You were proposing reading *all* the records from the selected blocks, which
throws away that feature.
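To make the distinction concrete, here is a minimal Python sketch under loudly stated assumptions: the table is modelled as a hypothetical list of blocks, each a list of rows; blocks are chosen with Knuth's selection sampling (Algorithm S) and rows with the basic reservoir Algorithm R. It illustrates two-stage sampling in general and does not reproduce PostgreSQL's analyze.c; all function names are made up for illustration. `two_stage_sample` picks blocks and then samples rows uniformly from within them, while `whole_block_sample` keeps every row of each chosen block, as in the proposal being discussed.

```python
import random
from typing import Iterable, List, Sequence, TypeVar

T = TypeVar("T")


def select_blocks(num_blocks: int, k: int, rng: random.Random) -> List[int]:
    """Pick min(k, num_blocks) distinct block numbers uniformly, in ascending order.

    Knuth's selection sampling (Algorithm S): scan the blocks once and keep
    block b with probability (blocks still needed) / (blocks still remaining).
    """
    chosen: List[int] = []
    for b in range(num_blocks):
        needed = k - len(chosen)
        remaining = num_blocks - b
        if needed > 0 and rng.random() * remaining < needed:
            chosen.append(b)
    return chosen


def reservoir_sample(rows: Iterable[T], n: int, rng: random.Random) -> List[T]:
    """Algorithm R: a uniform sample of up to n rows from a stream of unknown length."""
    reservoir: List[T] = []
    for seen, row in enumerate(rows):
        if seen < n:
            reservoir.append(row)
        else:
            # Keep the new row with probability n / (seen + 1), evicting a random slot.
            j = rng.randrange(seen + 1)
            if j < n:
                reservoir[j] = row
    return reservoir


def two_stage_sample(table: Sequence[Sequence[T]], target_rows: int,
                     rng: random.Random) -> List[T]:
    """Choose target_rows blocks, then sample target_rows rows from within them."""
    blocks = select_blocks(len(table), target_rows, rng)
    rows_in_blocks = (row for b in blocks for row in table[b])
    return reservoir_sample(rows_in_blocks, target_rows, rng)


def whole_block_sample(table: Sequence[Sequence[T]], num_blocks: int,
                       rng: random.Random) -> List[T]:
    """The alternative under discussion: keep every row of each chosen block."""
    blocks = select_blocks(len(table), num_blocks, rng)
    return [row for b in blocks for row in table[b]]


# Hypothetical table: even-numbered blocks hold 100 small rows,
# odd-numbered blocks hold 10 large rows.
table = [
    [("small" if b % 2 == 0 else "large", b, slot)
     for slot in range(100 if b % 2 == 0 else 10)]
    for b in range(1000)
]
rng = random.Random(42)

sample = two_stage_sample(table, 300, rng)
print(len(sample))  # 300

clustered = whole_block_sample(table, 30, rng)
print(len(clustered))  # between 300 and 3000, depending on which blocks were hit
```

In the two-stage version the sample size is fixed at target_rows and, given the chosen blocks, every row in them is equally likely to be kept. In the whole-block version the number of rows returned depends on how densely packed the chosen blocks happen to be, and rows arrive in whole-block clusters rather than individually.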

-- 
greg