kp <pgsql@pobox.gr> writes:
> The other difference between these two columns (or rather the kind of
> data they contain) is that the column for which postgres *uses* the
> index on contains around 32000 distinct values while the other only
> contains 14 distinct values.
I think it's doing what it's supposed to, then. Using an indexscan to
select 1/14th of a table is a loser --- the seqscan will be faster.
(If you disbelieve this, try timing it both ways. You can set
enable_seqscan to off to force the planner to do it the other way.)
> I have a suspicion that postgres (i'm using 7.1.3) incorrectly assumes
> that a sequence scan is cheaper for the column with the 14 distinct
> values in it and ends up scanning 1.6 million rows.
The only way it's wrong is if the particular value being searched for is
much less than 1/14th of the table. IIRC Postgres 7.1 does not have
statistics about anything beyond the most common value of the column,
and so it's easily misled by nonuniform data distributions. If that's
your situation then an update to 7.2.* or 7.3.* seems called for.
regards, tom lane