Re: On Distributions In 7.2.1 - Mailing list pgsql-general

From	Tom Lane
Subject	Re: On Distributions In 7.2.1
Date	May 2, 2002 04:01:25
Msg-id	4886.1020315651@sss.pgh.pa.us
In response to	On Distributions In 7.2.1 (Mark kirkwood <markir@slingshot.co.nz>)
Responses	Re: On Distributions In 7.2.1
List	pgsql-general

Mark kirkwood <markir@slingshot.co.nz> writes:
> There is slightly odd behaviour with the frequencies decreasing with
> increasing number of quantiles (same as 7.2 .. same code here ?).

That does seem curious.  With the inevitable sampling error, you'd
expect that some values would be sampled at a bit more than their
true frequency, and others at a bit less.  The oversampled ones would
be the ones to get into the MCV list.  But what you've got here is
that even the most-commonly-sampled value showed up at a bit less
than its true frequency.  Is this repeatable if you do ANALYZE over
and over?  Maybe it was just a statistical fluke.
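The sampling-error argument above can be checked with a small simulation. This is only a sketch under stated assumptions: it draws a uniform random sample with replacement and reports raw sample frequencies, and it does not reproduce what ANALYZE actually does. The function name `sampled_mcv` and all of its parameters are hypothetical, chosen for illustration.

```python
import random
from collections import Counter


def sampled_mcv(num_distinct, sample_size, mcv_slots, seed):
    """Return the mcv_slots most frequently sampled values with their
    sample frequencies, for a column whose values are all equally common.

    Assumption: each sampled row is an independent uniform draw, which
    approximates sampling from a table much larger than the sample.
    This is not the algorithm ANALYZE uses.
    """
    rng = random.Random(seed)
    sample = [rng.randrange(num_distinct) for _ in range(sample_size)]
    counts = Counter(sample)
    # most_common() orders by descending count, like an MCV list.
    return [(value, count / sample_size)
            for value, count in counts.most_common(mcv_slots)]


if __name__ == "__main__":
    num_distinct, sample_size, mcv_slots = 100, 3000, 10
    true_freq = 1.0 / num_distinct
    # Repeat with different seeds, analogous to running ANALYZE repeatedly.
    for seed in range(5):
        mcv = sampled_mcv(num_distinct, sample_size, mcv_slots, seed)
        top, bottom = mcv[0][1], mcv[-1][1]
        print(f"seed={seed} true={true_freq:.4f} "
              f"top={top:.4f} lowest_listed={bottom:.4f}")
```

In this simplified model the top sampled frequency can never fall below the true frequency: the counts sum to the sample size, so the largest count is at least the average count. That is why a top MCV entry below its true frequency looks curious, and why repeating the run is a reasonable first check.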

> I am wondering if this is caused by my example not having any "real" most
> common values (they are all as common as each other).
> I am going to fiddle with my data generation script, skew the
> distribution and see what effect that has.

Someone else reported some results that made it look like a logarithmic
frequency distribution was a difficult case for the stats gatherer:
    http://archives.postgresql.org/pgsql-general/2002-03/msg01300.php
So please be sure to try that.

            regards, tom lane
