More On 7.2 Distributions - Estimates For Number Distinct < 0 - Mailing list pgsql-general

From Mark kirkwood
Subject More On 7.2 Distributions - Estimates For Number Distinct < 0
Date
Msg-id 01102922183801.04563@spikey.slithery.org
Whole thread Raw
In response to Re: On Distributions In 7.2 (Longish)  (Tom Lane <tgl@sss.pgh.pa.us>)
Responses Re: More On 7.2 Distributions - Estimates For Number Distinct < 0
List pgsql-general
In the process of attempting to understand the data in pg_stats, I created a
(very) simple example :

CREATE TABLE test(id integer);

INSERT INTO test VALUES(1);
INSERT INTO test VALUES(1);
INSERT INTO test VALUES(1);
INSERT INTO test VALUES(1);
INSERT INTO test VALUES(2);
INSERT INTO test VALUES(2);
INSERT INTO test VALUES(2);
INSERT INTO test VALUES(3);
INSERT INTO test VALUES(4);
INSERT INTO test VALUES(5);

ANALYZE test;

SELECT * FROM pg_stats WHERE tablename='test';

tablename        test
attname          id
null_frac        0
avg_width        4
n_distinct       -0.5
most_common_vals {"1","2"}
most_common_vals {"0.4","0.3"}
histogram_bounds {"3","4","5"}
correlation      1

everything looks good except for n_distinct ( its negative - should be 5)
(I wasn't too worried about avg_width )

Using fairly crude tracing (adding elog calls) in
src/backend/commands/analyze.c :

DEBUG:  Analyzing test
DEBUG:    Analyze : beginning a column
DEBUG:      Have 10 total values in relation (totalrows)
DEBUG:      Have 10 values in relation       (numrows)
DEBUG:      Have 10 values in sample         (values_cnt)
DEBUG:      Have 5 distinct values in sample (ndistinct)
DEBUG:      Have 2 multiple values in sample (nmultiple)
DEBUG:      calc 5.000000 distinct via Chaudhuri rule
DEBUG:      calc -0.500000 distinct via >10 percent rowcount rule

So we had the correct answer before applying the 10 percent rowcount code.

This 10 percent rowcount code being line 1340 or thereabouts :

if (stats->stadistinct > 0.1 * totalrows)
{
stats->stadistinct = -(stats->stadistinct / totalrows);
}

My example is pretty contrived, but I wonder if I have "stumbled" on a bug
here.

regards

Mark


pgsql-general by date:

Previous
From: Mark kirkwood
Date:
Subject: Re: On Distributions In 7.2
Next
From: Jean-Michel POURE
Date:
Subject: Re: resend: Chinese sort order problem