On Wed, 2010-10-20 at 15:15 -0700, Josh Berkus wrote:
> >> Maybe what should be done about this is to have separate sizes for the
> >> MCV list and the histogram, where the MCV list is automatically sized
> >> during ANALYZE.
>
> It's been suggested multiple times that we should base our sample size
> on a % of the table, or at least offer that as an option. I've pointed
> out (with math, which Simon wrote a prototype for) that doing
> block-based sampling instead of random-row sampling would allow us to
> collect, say, 2% of a very large table without more I/O than we're doing
> now.
>
> Nathan Boley has also shown that we could get tremendously better
> estimates without additional sampling if our statistics collector
> recognized common patterns such as normal, linear and geometric
> distributions. Right now our whole stats system assumes a completely
> random distribution.
>
> So, I think we could easily be quite a bit smarter than just increasing
> the size of the MCV list. Although that might be a nice start.
References would be nice.
JD
>
> --
> -- Josh Berkus
> PostgreSQL Experts Inc.
> http://www.pgexperts.com
>
--
PostgreSQL.org Major Contributor
Command Prompt, Inc: http://www.commandprompt.com/ - 509.416.6579
Consulting, Training, Support, Custom Development, Engineering
http://twitter.com/cmdpromptinc | http://identi.ca/commandprompt