Re: [HACKERS] Bad n_distinct estimation; hacks suggested? - Mailing list pgsql-performance

From	Andrew Dunstan
Subject	Re: [HACKERS] Bad n_distinct estimation; hacks suggested?
Date	April 29, 2005 03:03:34
Msg-id	426F9703.4010108@dunslane.net
In response to	Re: [HACKERS] Bad n_distinct estimation; hacks suggested? (Mischa Sandberg <mischa.sandberg@telus.net>)
Responses	Re: Distinct-Sampling (Gibbons paper) for Postgres
List	pgsql-performance

Mischa Sandberg wrote:

>
>Perhaps I can save you some time (yes, I have a degree in Math). If I
>understand correctly, you're trying to extrapolate from the correlation
>between a tiny sample and a larger sample. Introducing the tiny sample
>into any decision can only produce a less accurate result than just
>taking the larger sample on its own; GIGO. Whether they are consistent
>with one another has no relationship to whether the larger sample
>correlates with the whole population. You can think of the tiny sample
>like "anecdotal" evidence for wonderdrugs.
>

Ok, good point.

I'm with Tom though in being very wary of solutions that require even
one-off whole table scans. Maybe we need an additional per-table
statistics setting which could specify the sample size, either as an
absolute number or as a percentage of the table. It certainly seems that
where D/N ~ 0.3 (distinct values over total rows), the estimates are
way off, at least on very large tables.

Or maybe we need to support more than one estimation method.

Or both ;-)
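
To make the sample-size point concrete, here is a rough sketch of how an
n_distinct estimate changes as the sample grows. It assumes the Haas-Stokes
"Duj1" estimator, n*d / (n - f1 + f1*n/N), which is my understanding of what
ANALYZE uses; that formula is not stated in this message. The table is
synthetic with uniformly distributed values, and the names `duj1_estimate`
and `simulate` are made up for illustration only.

```python
import random


def duj1_estimate(sample, total_rows):
    """Estimate the number of distinct values in a table from a sample.

    Assumed estimator (Haas-Stokes Duj1): n*d / (n - f1 + f1*n/N)
      n  = sample size
      d  = distinct values seen in the sample
      f1 = values seen exactly once in the sample
      N  = total rows in the table
    """
    n = len(sample)
    if n == 0:
        raise ValueError("sample must not be empty")
    counts = {}
    for value in sample:
        counts[value] = counts.get(value, 0) + 1
    d = len(counts)
    f1 = sum(1 for c in counts.values() if c == 1)
    denom = n - f1 + f1 * n / total_rows
    return n * d / denom


def simulate(total_rows, domain_size, sample_size, seed=0):
    """Build a synthetic uniform column and estimate D from a random sample.

    Returns (true_distinct, estimated_distinct). Hypothetical helper; real
    columns are rarely uniform, so treat the numbers as illustrative only.
    """
    rng = random.Random(seed)
    table = [rng.randrange(domain_size) for _ in range(total_rows)]
    sample = rng.sample(table, sample_size)
    return len(set(table)), duj1_estimate(sample, total_rows)


if __name__ == "__main__":
    N = 200_000
    # A domain of about 0.36*N uniform values leaves roughly D/N ~ 0.3.
    for size in (300, 3_000, 30_000):
        true_d, est = simulate(N, int(N * 0.36), size)
        print(f"sample={size:>6}  true D={true_d}  estimate={est:.0f}")
```

Running it with a few sample sizes shows how far the estimate sits from the
true count for each one, which is the kind of knob a per-table sample-size
setting (absolute or percentage) would expose.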

cheers

andrew
