Re: Improving N-Distinct estimation by ANALYZE - Mailing list pgsql-hackers

From Josh Berkus
Subject Re: Improving N-Distinct estimation by ANALYZE
Msg-id 43BCBC87.3050108@agliodbs.com
In response to Re: Improving N-Distinct estimation by ANALYZE  (Greg Stark <gsstark@mit.edu>)
Responses Re: Improving N-Distinct estimation by ANALYZE  (Greg Stark <gsstark@mit.edu>)
List pgsql-hackers
Greg,

> Only if your sample is random and independent. The existing mechanism tries
> fairly hard to ensure that every record has an equal chance of being selected.
> If you read the entire block and not appropriate samples then you'll introduce
> systematic sampling errors. For example, if you read an entire block you'll be
> biasing towards smaller records.

Did you read any of the papers on block-based sampling? These sorts of
issues are specifically addressed in the algorithms.
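
To make the disagreement concrete, here is a toy calculation, not taken from
ANALYZE or from any of the papers. It assumes a hypothetical table in which
every block holds either 100 narrow rows or 10 wide rows, and computes the
expected share of narrow rows under three selection rules. The table layout,
the function names, and the three scheme names are all illustrative
assumptions.

```python
from fractions import Fraction


def make_table(n_blocks_each=50, narrow_per_block=100, wide_per_block=10):
    """Hypothetical table: each block holds only narrow or only wide rows."""
    blocks = []
    for _ in range(n_blocks_each):
        blocks.append(["narrow"] * narrow_per_block)
        blocks.append(["wide"] * wide_per_block)
    return blocks


def expected_narrow_share(blocks, scheme):
    """Ratio of expected narrow rows to expected total rows in the sample.

    Each scheme is modelled only by the relative weight with which one
    particular row ends up in the sample.
    """
    total_rows = sum(len(block) for block in blocks)
    weight = {"narrow": Fraction(0), "wide": Fraction(0)}
    for block in blocks:
        for row in block:
            if scheme == "uniform_rows":
                # Every row in the table is equally likely.
                w = Fraction(1, total_rows)
            elif scheme == "one_row_per_block":
                # Pick a block uniformly, then one row uniformly inside it.
                w = Fraction(1, len(blocks)) * Fraction(1, len(block))
            elif scheme == "whole_blocks":
                # Pick a block uniformly and keep every row in it.
                w = Fraction(1, len(blocks))
            else:
                raise ValueError(f"unknown scheme: {scheme}")
            weight[row] += w
    return weight["narrow"] / (weight["narrow"] + weight["wide"])


if __name__ == "__main__":
    table = make_table()
    for scheme in ("uniform_rows", "one_row_per_block", "whole_blocks"):
        print(scheme, expected_narrow_share(table, scheme))
```

In this model the true narrow share is 10/11. Taking one row per chosen block
gives 1/2, because rows in sparsely filled blocks are picked more often.
Keeping whole blocks gives every row the same inclusion probability, so the
share is again 10/11, but rows from the same block always arrive together, so
the sample is not independent. Which of these effects matters for n-distinct
estimation is exactly what the thread is arguing about; the sketch does not
settle it.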

--Josh

