Re: On Distributions In 7.2.1 - Mailing list pgsql-general

From Mark kirkwood
Subject Re: On Distributions In 7.2.1
Date
Msg-id 1020332199.3822.16.camel@spikey.slithery.org
Whole thread Raw
In response to Re: On Distributions In 7.2.1  (Tom Lane <tgl@sss.pgh.pa.us>)
Responses Re: On Distributions In 7.2.1
List pgsql-general
err - some of my puzzlement is explained by this (silly) error :

there are 3000 distinct values, each with 1000 occurrences (*not* 3000)
so the frequency I am "chasing" is 0.00033... ( not 0.001)

so for 10 (quantiles) we see ~ 0.0016 (too big)
for 30 seeing ~ 0.001 (too big but closer)
for 100 seeing ~ 0.00066 (still too big but closer)

so actually its "creeping" closer (rather than oscillating), which seems
a much healthier situation.

However Tom's observation is still valid (in spite of my math) - all the
frequencies are overestimated, rather than the expected "some bigger,
some smaller" sort of thing.

I will do some more ANALYZE runs and see what happens...

(then the log distribution could be fun)

regards

Mark


>On Thu, 2002-05-02 at 17:00, Tom Lane wrote:
> Mark kirkwood <markir@slingshot.co.nz> writes:
> > There is slightly odd behaviour with the frequencies decreasing with
> > increasing number of quantiles (same as 7.2 .. same code here ?).
>
> That does seem curious.  With the inevitable sampling error, you'd
> expect that some values would be sampled at a bit more than their
> true frequency, and others at a bit less.  The oversampled ones would
> be the ones to get into the MCV list.  But what you've got here is
> that even the most-commonly-sampled value showed up at a bit less
> than its true frequency.  Is this repeatable if you do ANALYZE over
> and over?  Maybe it was just a statistical fluke.
>
> > I am wondering if this is caused by my example not having any "real" most
> > common values (they are all as common as each other).
> > I am going to fiddle with my data generation script, skew the
> > distribution and see what effect that has.
>
> Someone else reported some results that made it look like a logarithmic
> frequency distribution was a difficult case for the stats gatherer:
>     http://archives.postgresql.org/pgsql-general/2002-03/msg01300.php
> So please be sure to try that.
>
>             regards, tom lane
>



pgsql-general by date:

Previous
From: Vince Vielhaber
Date:
Subject: Re: Fwd: Postfix Relay Hub SMTP server: errors from
Next
From: Jean-Michel POURE
Date:
Subject: Re: HOWTO - Random character generation for primary key