Re: Odd statistics behaviour in 7.2 - Mailing list pgsql-hackers

From Gordon A. Runkle
Subject Re: Odd statistics behaviour in 7.2
Date
Msg-id 1013620076.21157.92.camel@spiff.runkleinc.com
Whole thread Raw
In response to Re: Odd statistics behaviour in 7.2  (Tom Lane <tgl@sss.pgh.pa.us>)
List pgsql-hackers
On Wed, 2002-02-13 at 11:21, Tom Lane wrote:
> Gordon Runkle <gar@integrated-dynamics.com> writes:
> > I have a table with 1,066,673 rows.  The column I'm interested in has
> > this distribution of values:
> 
> >  tdnr_ct |   ct   
> > ---------+--------
> >       16 |      1
> >        4 |      1
> >        3 |     58
> >        2 |  68904
> >        1 | 928171
> 
> > This means that 'ct' records have 'tdnr_ct' duplicate values.
> 
> I'm confused.  You mean that there is one value that appears 16 times,
> one that appears 4 times, etc etc, and 928171 values that appear only
> once?

Yes, exactly.  I could have stated that more clearly, but probably not
at 0-dark-thirty...  ;-)


> > Under v7.2, it only sometimes does.  I've looked at the statistics,
> > thanks to what I learned from Tom and Marc's discussion, and I see that
> > sometimes when I VACUUM ANALYZE the table, 'n_distinct' for this column
> > gets a value of '-1' (desireable), and other times a value such as 59483
> > or something.
> 
> This seems quite bizarre; given those stats it's hard to see how you
> could get anything but -1 or close to it, even with a very unlucky
> statistical sampling.  Don't suppose you'd want to trace through the
> ANALYZE code and find out why it's computing a bad value?

I can do that.  I need to build a version of PostgreSQL with debug
enabled, right?


> Alternatively, if you could send me a dump of just the ct column,
> I could try to reproduce the behavior here.  (CREATE TABLE foo AS
> SELECT ct FROM yourtab and then pg_dump -t foo should do it.)

I can do that too.  It's pretty large, I'll email you separately
with a URL from which you can retrieve it.  Thanks!

Thomas suggested in his reply that perhaps the table isn't randomly
populated, but if it is storing the data in the order in which it was
COPYed in, it's pretty random.  The column's values generally trend
upward, but pretty randomly.  We load 6000-10000 new records per week.

Thanks again,

Gordon.
-- 
"Far and away the best prize that life has to offer is the chance to work hard at work worth doing."      -- Theodore
Roosevelt




pgsql-hackers by date:

Previous
From: Tom Lane
Date:
Subject: Re: FATAL 2: XLogFlush: request is not satisfied
Next
From: Kovacs Zoltan
Date:
Subject: Re: alter table drop column status