Re: Bogus ANALYZE results for an otherwise-unique column with many nulls - Mailing list pgsql-hackers

From Dean Rasheed
Subject Re: Bogus ANALYZE results for an otherwise-unique column with many nulls
Date
Msg-id CAEZATCVQ9AGw1thJiViYWHXXZ46_p6FfDPBeyTC9BSNDz+6L6g@mail.gmail.com
Whole thread Raw
In response to Re: Bogus ANALYZE results for an otherwise-unique column with many nulls  (Tom Lane <tgl@sss.pgh.pa.us>)
Responses Re: Bogus ANALYZE results for an otherwise-unique column with many nulls
Re: Bogus ANALYZE results for an otherwise-unique column with many nulls
List pgsql-hackers
On 5 August 2016 at 21:48, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> OK, thanks.  What shall we do about Andreas' request to back-patch this?
> I'm personally willing to do it, but there is the old bugaboo of "maybe
> it will destabilize a plan that someone is happy with".
>

My inclination would be to back-patch it because arguably it's a
bug-fix -- at the very least the old behaviour didn't match the docs
for stadistinct:
     The number of distinct nonnull data values in the column.     A value greater than zero is the actual number of
distinctvalues.     A value less than zero is the negative of a multiplier for the number     of rows in the table; for
example,a column in which values appear about     twice on the average could be represented by
<structfield>stadistinct</>= -0.5.
 

Additionally, I think that example is misleading because it's only
really true if there are no null values in the column. Perhaps it
would help to have a more explicit example to illustrate how nulls
affect stadistinct, for example:
     ... for example, a column in which about 80% of the values are nonnull     and each nonnull value appears about
twiceon average could be     represented by <structfield>stadistinct</> = -0.4.
 

Regards,
Dean



pgsql-hackers by date:

Previous
From: Thomas Munro
Date:
Subject: Consolidate 'unique array values' logic into a reusable function?
Next
From: Andreas Joseph Krogh
Date:
Subject: Re: Bogus ANALYZE results for an otherwise-unique column with many nulls