Home > mailing lists

Re: [PERFORM] Bad n_distinct estimation; hacks suggested? - Mailing list pgsql-hackers

From	Tom Lane
Subject	Re: [PERFORM] Bad n_distinct estimation; hacks suggested?
Date	April 24, 2005 01:49:20
Msg-id	25382.1114318139@sss.pgh.pa.us Whole thread Raw
In response to	Re: [PERFORM] Bad n_distinct estimation; hacks suggested? (Josh Berkus <josh@agliodbs.com>)
Responses	Re: [PERFORM] Bad n_distinct estimation; hacks suggested? Re: [PERFORM] Bad n_distinct estimation; hacks suggested? Re: [PERFORM] Bad n_distinct estimation; hacks suggested?
List	pgsql-hackers

Tree view

Josh Berkus <josh@agliodbs.com> writes:
> Overall, our formula is inherently conservative of n_distinct.   That is, I
> believe that it is actually computing the *smallest* number of distinct
> values which would reasonably produce the given sample, rather than the
> *median* one.  This is contrary to the notes in analyze.c, which seem to
> think that we're *overestimating* n_distinct.

Well, the notes are there because the early tests I ran on that formula
did show it overestimating n_distinct more often than not.  Greg is
correct that this is inherently a hard problem :-(

I have nothing against adopting a different formula, if you can find
something with a comparable amount of math behind it ... but I fear
it'd only shift the failure cases around.

            regards, tom lane

pgsql-hackers by date:

From: Paul Tillotson
Date: 23 April 2005, 22:56:10
Subject: Re: possible TODO: read-only tables, select from indexes

From: Tom Lane
Date: 24 April 2005, 02:52:24
Subject: Re: Wierd performance issue with 8.1cvs

Re: [PERFORM] Bad n_distinct estimation; hacks suggested? - Mailing list pgsql-hackers

Previous

Next