Re: Problem with n_distinct being consistently inaccurate. - Mailing list pgsql-admin

From Tom Lane
Subject Re: Problem with n_distinct being consistently inaccurate.
Date
Msg-id 20330.1064338636@sss.pgh.pa.us
In response to Problem with n_distinct being consistently inaccurate.  ("Nick Fankhauser" <nickf@ontko.com>)
Responses Re: Problem with n_distinct being consistently inaccurate.  ("Nick Fankhauser" <nickf@ontko.com>)
List pgsql-admin
"Nick Fankhauser" <nickf@ontko.com> writes:
> So the question is- how can I get a better estimate of n_distinct from
> analyze?
> If I alter the stats target as high as it will go, I get closer, but it
> still shows the index to be about 1/2 as selective as it actually is:

AFAIK, estimating the number of distinct values from a small sample is
inherently an ill-conditioned problem.  You should probably be happy
that it can get within a factor of 2 ;-).
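
To make the ill-conditioning concrete, here is a minimal Python sketch. It
assumes an estimator of the Haas-Stokes "Duj1" form, n*d / (n - f1 + f1*n/N),
which is the kind of formula ANALYZE is usually described as using. Treat that
choice, the function names, and the table and sample sizes as illustrative
assumptions, not as a statement of what the server actually does.

```python
from collections import Counter


def duj1_estimate(n, total_rows, d, f1):
    """Estimate distinct values in a table from sample summary counts.

    n          -- rows in the sample
    total_rows -- rows in the whole table (N)
    d          -- distinct values seen in the sample
    f1         -- values seen exactly once in the sample
    """
    return n * d / (n - f1 + f1 * n / total_rows)


def summarize(sample):
    """Return (n, d, f1) for a list of sampled values."""
    counts = Counter(sample)
    f1 = sum(1 for c in counts.values() if c == 1)
    return len(sample), len(counts), f1


# Hypothetical sizes: a 3000-row sample from a 1,000,000-row table.
N = 1_000_000
all_unique = list(range(3000))          # every sampled value differs
one_repeat = list(range(2999)) + [0]    # a single value shows up twice

for label, sample in (("all unique", all_unique), ("one repeat", one_repeat)):
    n, d, f1 = summarize(sample)
    print(label, round(duj1_estimate(n, N, d, f1)))
```

With these made-up sizes, a single repeated value in the sample lowers the
estimate from the full table size to roughly 82% of it, so the answer depends
heavily on a handful of sampled rows.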

You could try sticking the correct n_distinct into pg_statistic by hand
just to see if it really does change the plan, but I'd like to think
that getting within a factor of 2 is good enough.  If it's not, then we
probably ought to look for ways to avoid using number-of-distinct-values
statistics altogether, because we'll seldom have a hard value for it.
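
As a rough way to judge whether a factor of 2 could matter before editing
pg_statistic, the sketch below shows how an n_distinct error carries over into
the row estimate for an equality condition. It is a simplified model, assuming
the rows not covered by nulls or most-common values are spread evenly over the
remaining distinct values. The function name and all the numbers are
hypothetical.

```python
def estimated_rows(reltuples, n_distinct, null_frac=0.0,
                   mcv_freq_sum=0.0, n_mcv=0):
    """Rows expected for `col = constant` when the constant is not a
    most-common value (simplified model, see the assumptions above)."""
    remaining_fraction = 1.0 - null_frac - mcv_freq_sum
    remaining_distinct = n_distinct - n_mcv
    if remaining_distinct <= 0:
        raise ValueError("no distinct values left outside the MCV list")
    return reltuples * remaining_fraction / remaining_distinct


# Hypothetical table: 1,000,000 rows, 50,000 truly distinct values,
# but the sampled statistics report only half as many.
table_rows = 1_000_000
true_rows = estimated_rows(table_rows, n_distinct=50_000)
planner_rows = estimated_rows(table_rows, n_distinct=25_000)

print(true_rows, planner_rows, planner_rows / true_rows)
```

Under this model, halving n_distinct doubles the expected row count for each
lookup. Whether that flips the plan depends on how close the competing plans'
costs already are, which is what the by-hand experiment would reveal.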

            regards, tom lane
