Re: Expected accuracy of planner statistics - Mailing list pgsql-general

From	Tom Lane
Subject	Re: Expected accuracy of planner statistics
Date	September 29, 2006 12:53:44
Msg-id	16723.1159545168@sss.pgh.pa.us
In response to	Re: Expected accuracy of planner statistics ("John D. Burger" <john@mitre.org>)
Responses	Re: Expected accuracy of planner statistics
List	pgsql-general

"John D. Burger" <john@mitre.org> writes:
> Tom Lane wrote:
>> IIRC I picked an equation out of the literature partially on the basis
>> of it being simple and fairly cheap to compute...

> I'm very curious about this - can you recall where you got this, or
> at least point me to where in the code this happens?

src/backend/commands/analyze.c, around line 1930 as of CVS HEAD:

            /*----------
             * Estimate the number of distinct values using the estimator
             * proposed by Haas and Stokes in IBM Research Report RJ 10025:
             *        n*d / (n - f1 + f1*n/N)
             * where f1 is the number of distinct values that occurred
             * exactly once in our sample of n rows (from a total of N),
             * and d is the total number of distinct values in the sample.
             * This is their Duj1 estimator; the other estimators they
             * recommend are considerably more complex, and are numerically
             * very unstable when n is much smaller than N.
             *
             * Overwidth values are assumed to have been distinct.
             *----------
             */

            regards, tom lane
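
For readers who want to try the formula in the quoted comment, here is a minimal sketch in C. It is not the code from analyze.c: the function name duj1_estimate, its signature, the assert-based input checks, and the final clamp to the range [d, N] are all assumptions made for illustration. Only the expression n*d / (n - f1 + f1*n/N) comes from the comment itself.

```c
#include <assert.h>

/*
 * Hypothetical helper (not from analyze.c): Haas-Stokes Duj1 estimate of
 * the number of distinct values in a table.
 *
 *   n  = number of rows in the sample
 *   N  = total number of rows in the table
 *   d  = number of distinct values seen in the sample
 *   f1 = number of distinct values seen exactly once in the sample
 */
static double
duj1_estimate(double n, double N, double d, double f1)
{
    double denom;
    double estimate;

    /* Input checks are an assumption of this sketch. */
    assert(n > 0 && N >= n);
    assert(d >= 0 && d <= n);
    assert(f1 >= 0 && f1 <= d);

    /* The estimator from the comment: n*d / (n - f1 + f1*n/N). */
    denom = n - f1 + f1 * n / N;
    estimate = n * d / denom;

    /* Assumed safety clamp: at least d distinct values, at most N. */
    if (estimate < d)
        estimate = d;
    if (estimate > N)
        estimate = N;

    return estimate;
}
```

Two boundary cases show how the formula behaves. If no value occurs exactly once (f1 = 0), the denominator is n and the estimate is simply d. If every sampled row is distinct (f1 = d = n), the denominator shrinks to n*n/N and the estimate becomes N, that is, the column is taken to be unique.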
