
Re: On Distributions In 7.2.1 - Mailing list pgsql-general

From	Tom Lane
Subject	Re: On Distributions In 7.2.1
Date	May 2, 2002 10:37:37
Msg-id	7233.1020348710@sss.pgh.pa.us
In response to	Re: On Distributions In 7.2.1 (Mark kirkwood <markir@slingshot.co.nz>)
Responses	Tracking down Database growth; Re: On Distributions In 7.2.1
List	pgsql-general


Mark kirkwood <markir@slingshot.co.nz> writes:
> However Tom's observation is still valid (in spite of my math) - all the
> frequencies are overestimated, rather than the expected "some bigger,
> some smaller" sort of thing.

No, that makes sense.  The values that get into the most-common-values
list are only going to be ones that are significantly more common (in
the sample) than the estimated average frequency.  So if ANALYZE makes
a good estimate of the average frequency, you'll only see upside
outliers in the MCV list.  The relevant logic is in analyze.c:

        /*
         * Decide how many values are worth storing as most-common values.
         * If we are able to generate a complete MCV list (all the values
         * in the sample will fit, and we think these are all the ones in
         * the table), then do so.  Otherwise, store only those values
         * that are significantly more common than the (estimated)
         * average. We set the threshold rather arbitrarily at 25% more
         * than average, with at least 2 instances in the sample.  Also,
         * we won't suppress values that have a frequency of at least 1/K
         * where K is the intended number of histogram bins; such values
         * might otherwise cause us to emit duplicate histogram bin
         * boundaries.
         */
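To make the cutoff concrete, here is a minimal C sketch of the rule that comment describes.  It is not the code from analyze.c: the function name should_store_as_mcv, its parameters, and the order in which the 2-instance floor and the 1/K override are applied are assumptions made for illustration.

```c
#include <assert.h>
#include <stdbool.h>

/*
 * Hypothetical sketch of the MCV cutoff described in the analyze.c
 * comment; not the actual PostgreSQL implementation.
 *
 *   count        instances of this value in the sample
 *   sample_rows  total rows in the sample
 *   ndistinct    estimated number of distinct values
 *   num_bins     intended number of histogram bins (K)
 *   complete     true if a complete MCV list can be generated
 */
static bool should_store_as_mcv(int count, int sample_rows, double ndistinct,
                                int num_bins, bool complete)
{
    assert(sample_rows > 0 && ndistinct > 0.0 && num_bins > 0);

    /* Complete list: every value in the sample is stored. */
    if (complete)
        return true;

    /* Expected instances per value if all values were equally common. */
    double avg_count = (double) sample_rows / ndistinct;

    /* Threshold: 25% more than average, but at least 2 instances. */
    double min_count = avg_count * 1.25;
    if (min_count < 2.0)
        min_count = 2.0;

    /*
     * Never suppress a value whose sample frequency is at least 1/K,
     * i.e. whose count is at least sample_rows / K.
     */
    double max_min_count = (double) sample_rows / num_bins;
    if (min_count > max_min_count)
        min_count = max_min_count;

    return count >= min_count;
}
```

With a 3000-row sample and an estimated 100 distinct values, the average is 30 instances per value, so the cutoff is 37.5: a value seen 40 times is stored, while one seen 35 or 30 times is not.  Every stored value therefore has a sample frequency at least 25% above the average, which is why the MCV list shows only upside outliers.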

            regards, tom lane
