Re: [GENERAL] how to get accurate values in pg_statistic - Mailing list pgsql-performance

From	scott.marlowe
Subject	Re: [GENERAL] how to get accurate values in pg_statistic
Date	September 11, 2003 13:00:37
Msg-id	Pine.LNX.4.33.0309110953090.18742-100000@css120.ihs.com
In response to	Re: [GENERAL] how to get accurate values in pg_statistic (continued) (Tom Lane <tgl@sss.pgh.pa.us>)
List	pgsql-performance

On Thu, 11 Sep 2003, Tom Lane wrote:

> Christopher Browne <cbbrowne@libertyrms.info> writes:
> > The "right answer" for most use seems likely to involve:
> >  a) Getting an appropriate number of bins (I suspect 10 is a bit
> >     small, but I can't justify that mathematically), and
>
> I suspect that also, but I don't have real evidence for it either.
> We've heard complaints from a number of people for whom it was indeed
> too small ... but that doesn't prove it's not appropriate in the
> majority of cases ...
>
> > Does the sample size change if you increase the number of bins?
>
> Yes, read the comments in backend/commands/analyze.c.
>
> > Do we also need a parameter to control sample size?
>
> Not if the paper I read before writing that code is correct.

I was just talking to a friend of mine who does statistical analysis, and
he suggested a different way of looking at this.  I know little about
analyze.c, but I'll be reading some of it today.

His theory was that we can figure out the number of target bins by
basically running analyze twice with two different random seeds, and
initially setting the bins to 10.

Then, compare the variance of the two runs.  If the variance is great,
increase the target by X, and run the two again.  Repeat, wash, rinse, until
the variance drops below some threshold.
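
To make the loop concrete, here is a rough Python sketch of it.  It is
not how analyze.c works, just an illustration of the idea under some
assumptions: the column fits in a list, "variance of the two runs" is
taken to mean the mean absolute difference between the two sets of
histogram boundaries (scaled by the data range), and the sample is
300 rows per bin.  The names pick_target, step, threshold, and
rows_per_bin are all made up for the example.

```python
import random


def histogram_bounds(sample, bins):
    """Equi-depth histogram: bins + 1 boundary values taken from the sorted sample."""
    ordered = sorted(sample)
    last = len(ordered) - 1
    return [ordered[(i * last) // bins] for i in range(bins + 1)]


def disagreement(a, b, spread):
    """Mean absolute difference between two boundary lists, scaled by the data range."""
    diffs = [abs(x - y) for x, y in zip(a, b)]
    return (sum(diffs) / len(diffs)) / spread


def pick_target(column, start=10, step=10, threshold=0.01,
                max_target=1000, rows_per_bin=300):
    """Grow the bin count until two independently seeded samples agree.

    All parameters are hypothetical knobs for this sketch.
    """
    spread = (max(column) - min(column)) or 1.0
    target = start
    while True:
        # Assumption: the sample size grows with the number of bins.
        n = min(len(column), rows_per_bin * target)
        run_a = histogram_bounds(random.Random(1).sample(column, n), target)
        run_b = histogram_bounds(random.Random(2).sample(column, n), target)
        if disagreement(run_a, run_b, spread) < threshold or target >= max_target:
            return target
        target += step  # the "increase the target by X" step
```

Note that once the sample covers the whole table, the two runs are
identical and the loop stops, so the result is bounded by the table
size as well as by max_target.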

I like the idea, but I'm not at all sure if it's practical for PostgreSQL
to implement it.
