Bad n_distinct estimation; hacks suggested? - Mailing list pgsql-performance

From	Josh Berkus
Subject	Bad n_distinct estimation; hacks suggested?
Date	April 19, 2005 16:04:13
Msg-id	200504191209.05181.josh@agliodbs.com
Responses	Re: Bad n_distinct estimation; hacks suggested?
List	pgsql-performance

Folks,

Params:  PostgreSQL 8.0.1 on Solaris 10
Statistics = 500
(tablenames have been changed to protect NDA)

e1=# select tablename, null_frac, correlation, n_distinct from pg_stats where
tablename = 'clickstream1' and attname = 'session_id';
      tablename       | null_frac | correlation | n_distinct
----------------------+-----------+-------------+------------
 clickstream1         |         0 |    0.412034 |     378174
(2 rows)

e1=# select count(distinct session_id) from clickstream1;
  count
---------
 3174813
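
The two numbers above can be put side by side in one query. This is only a minimal sketch, assuming the same table and column names as in the post. Note that a negative n_distinct in pg_stats means a fraction of the row count rather than an absolute count, which this query does not handle, and that the count(distinct ...) scans the whole table.

```sql
-- Compare the planner's estimate with the true distinct count.
-- Only meaningful when n_distinct is positive (an absolute count).
select s.n_distinct as estimated,
       c.actual,
       c.actual / s.n_distinct as underestimate_factor
from pg_stats s,
     (select count(distinct session_id) as actual from clickstream1) c
where s.tablename = 'clickstream1'
  and s.attname = 'session_id';
```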

As you can see, the n_distinct estimate is too low by nearly a factor of 10
(378174 estimated vs. 3174813 actual), and it's causing query planning
problems.   Any suggested hacks to improve the histogram on this?

(BTW, increasing the stats to 1000 only doubles n_distinct, and doesn't solve
the problem)
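
For anyone reproducing this, here is a sketch of raising the target for just the one column instead of globally. It assumes "Statistics = 500" above refers to default_statistics_target, which the post does not state explicitly. As noted, a higher target alone did not fix the estimate here.

```sql
-- Raise the per-column statistics target, then re-gather statistics
-- so that pg_stats reflects the larger sample.
alter table clickstream1 alter column session_id set statistics 1000;
analyze clickstream1;

-- Re-check the estimate after the new ANALYZE.
select n_distinct from pg_stats
where tablename = 'clickstream1' and attname = 'session_id';
```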

--
--Josh

Josh Berkus
Aglio Database Solutions
San Francisco
