> Tom Lane wrote:
> > The thing that jumps out at me from this example is the much larger
> > estimate of returned rows in the second case. The planner is clearly
>
> Good catch! There were 296 possible issues the table. One had 86,544
> articles associated with it. The next highest was 5,949. Then the
> numbers drop to 630, 506, 412, 184 and then the rest are all under 62.
> Out of curiosity, how does vacuum decide on the large estimate?
>
> The maximum is 86,544.
> The average row return for ixissue = x is 3412.
> The median is 25.
> The mode is 25.
>
> ixissue is the result of a sequence.
>
> Thanks for the heads up on this...
Here is the relevent comment from vacuum.c. It is not perfect, but was
the best thing I could think of.
---------------------------------------------------------------------------
/** vc_attrstats() -- compute column statistics used by the optimzer** We compute the column min, max, null and
non-nullcounts.* Plus we attempt to find the count of the value that occurs most* frequently in each column. These
figuresare used to compute * the selectivity of the column.** We use a three-bucked cache to get the most frequent
item.* The 'guess' buckets count hits. A cache miss causes guess1* to get the most hit 'guess' item in the most
recentcycle, and* the new item goes into guess2. Whenever the total count of hits* of a 'guess' entry is larger than
'best','guess' becomes 'best'.** This method works perfectly for columns with unique values, and columns* with only
twounique values, plus nulls.** It becomes less perfect as the number of unique values increases and* their
distributionin the table becomes more random.**/
-- Bruce Momjian | http://www.op.net/~candle maillist@candle.pha.pa.us | (610)
853-3000+ If your life is a hard drive, | 830 Blythe Avenue + Christ can be your backup. | Drexel Hill,
Pennsylvania19026