Thread: "Bug" in statistics for v7.2?

"Bug" in statistics for v7.2?

From
"Marc G. Fournier"
Date:
Okay, if I'm understanding pg_stats at all, which I may not be, n_distinct
should represent # of distinct values in that row, no?

But, I have one field that has 5 distinct values:

iwantu=# select distinct(profiles_faith) from iwantu_profiles;profiles_faith
----------------             0             1             2             7             8
(5 rows)

But pg_stats is reporting 1:
   tablename    |        attname         | avg_width | n_distinct
-----------------+------------------------+-----------+------------iwantu_profiles | profiles_faith         |         2
|         1
 

So am I reading n_distinct wrong?



Re: "Bug" in statistics for v7.2?

From
Tom Lane
Date:
"Marc G. Fournier" <scrappy@hub.org> writes:
> Okay, if I'm understanding pg_stats at all, which I may not be, n_distinct
> should represent # of distinct values in that row, no?
> But, I have one field that has 5 distinct values:
> But pg_stats is reporting 1:

The pg_stats values are only, um, statistical.  If 99.9% of the table is
the same value and the other four values appear only once or twice, it's
certainly possible for ANALYZE's sample to include only the common value
and miss the rare ones.  AFAIK that will not break anything; if you have
an example where the planner seems to be fooled because of this, let's
see it.
        regards, tom lane


Re: "Bug" in statistics for v7.2?

From
"Marc G. Fournier"
Date:
That explains it ...
profiles_faith | count
----------------+--------             0 | 485938             1 |      2             2 |      6             7 |      2
         8 |     21
 
(5 rows)

Cool, another waste of space *sigh*

thanks ...


On Wed, 13 Feb 2002, Tom Lane wrote:

> "Marc G. Fournier" <scrappy@hub.org> writes:
> > Okay, if I'm understanding pg_stats at all, which I may not be, n_distinct
> > should represent # of distinct values in that row, no?
> > But, I have one field that has 5 distinct values:
> > But pg_stats is reporting 1:
>
> The pg_stats values are only, um, statistical.  If 99.9% of the table is
> the same value and the other four values appear only once or twice, it's
> certainly possible for ANALYZE's sample to include only the common value
> and miss the rare ones.  AFAIK that will not break anything; if you have
> an example where the planner seems to be fooled because of this, let's
> see it.
>
>             regards, tom lane
>



Re: "Bug" in statistics for v7.2?

From
"Zeugswetter Andreas SB SD"
Date:
> That explains it ...
> 
>  profiles_faith | count
> ----------------+--------
>               0 | 485938
>               1 |      2
>               2 |      6
>               7 |      2
>               8 |     21
> (5 rows)
> 
> Cool, another waste of space *sigh*
> 
> thanks ...
> 
> 
> On Wed, 13 Feb 2002, Tom Lane wrote:
> 
> > "Marc G. Fournier" <scrappy@hub.org> writes:
> > > Okay, if I'm understanding pg_stats at all, which I may not be, n_distinct
> > > should represent # of distinct values in that row, no?
> > > But, I have one field that has 5 distinct values:
> > > But pg_stats is reporting 1:
> >
> > The pg_stats values are only, um, statistical.  If 99.9% of the table is
> > the same value and the other four values appear only once or twice, it's
> > certainly possible for ANALYZE's sample to include only the common value
> > and miss the rare ones.  AFAIK that will not break anything; if you have
> > an example where the planner seems to be fooled because of this, let's
> > see it.

Hmm ? How about select * from xxx where profiles_faith = 7
would estimate all rows, no ? Instead of 2.
That is why I think a bin for "very uncommon" values could also be 
useful sometimes.

Andreas


Re: "Bug" in statistics for v7.2?

From
Tom Lane
Date:
"Zeugswetter Andreas SB SD" <ZeugswetterA@spardat.at> writes:
> The pg_stats values are only, um, statistical.  If 99.9% of the table is
> the same value and the other four values appear only once or twice, it's
> certainly possible for ANALYZE's sample to include only the common value
> and miss the rare ones.  AFAIK that will not break anything; if you have
> an example where the planner seems to be fooled because of this, let's
> see it.

> Hmm ? How about select * from xxx where profiles_faith = 7
> would estimate all rows, no ? Instead of 2.

Not in 7.2 ... nor in previous versions AFAIR.

> That is why I think a bin for "very uncommon" values could also be 
> useful sometimes.

Perhaps you should experiment or read the code before opining...
        regards, tom lane