Re: select distinct uses index scan vs full table scan - Mailing list pgsql-performance

From	Tom Lane
Subject	Re: select distinct uses index scan vs full table scan
Date	December 13, 2011 15:58:16
Msg-id	9562.1323806277@sss.pgh.pa.us
In response to	select distinct uses index scan vs full table scan (Jon Nelson <jnelson+pgsql@jamponi.net>)
Responses	Re: select distinct uses index scan vs full table scan
List	pgsql-performance

Jon Nelson <jnelson+pgsql@jamponi.net> writes:
> I've got a 5GB table with about 12 million rows.
> Recently, I had to select the distinct values from just one column.
> The planner chose an index scan. The query took almost an hour.
> When I forced index scan off, the query took 90 seconds (full table scan).

Usually, we hear complaints about the opposite.  Are you using
nondefault cost settings?

> The planner estimated 70,000 unique values when, in fact, there are 12
> million (the value for this row is *almost* but not quite unique).
> What's more, despite bumping the statistics on that column up to 1000
> and re-analyzing, the planner now thinks that there are 300,000 unique
> values.

Accurate ndistinct estimates are hard, but that wouldn't have much of
anything to do with this particular choice, AFAICS.

> How can I tell the planner that a given column is much more unique
> than, apparently, it thinks it is?

9.0 and up have ALTER TABLE ... ALTER COLUMN ... SET n_distinct.

            regards, tom lane
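
For readers who want to try the n_distinct override mentioned above, here is a minimal sketch. The table and column names (big_table, almost_unique_col) are hypothetical placeholders, not taken from the thread, and the value -1 is only an illustration. Note that the full syntax wraps the option in parentheses, and that the override only takes effect at the next ANALYZE.

```sql
-- Hypothetical names: big_table and almost_unique_col stand in for the
-- poster's real table and column, which the thread does not give.

-- See what the planner currently believes about the column.
SELECT n_distinct
FROM pg_stats
WHERE tablename = 'big_table'
  AND attname = 'almost_unique_col';

-- Override the estimate (PostgreSQL 9.0 and later).
-- A positive value is an absolute count of distinct values.
-- A negative value is a fraction of the row count: -1 means
-- "every row is distinct", -0.5 means "half as many distinct
-- values as rows".
ALTER TABLE big_table
    ALTER COLUMN almost_unique_col SET (n_distinct = -1);

-- The override is only applied when statistics are next gathered.
ANALYZE big_table;

-- To go back to the estimate ANALYZE computes on its own:
ALTER TABLE big_table
    ALTER COLUMN almost_unique_col RESET (n_distinct);
```

As the reply above notes, a better ndistinct estimate may not change the index scan versus sequential scan choice by itself, so it is worth comparing EXPLAIN output before and after.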

