Andrei Popescu-Belis <Andrei.Popescu-Belis@issco.unige.ch> writes:
> Using the first column as an index seems quite counter-
> productive, as some values correspond to only one entry, and
> others to tens of thousands.
Mmm, that's the source of the problem. Currently the planner's estimate
of the selectivity of "C1 = 17000" is driven off the most common value
in the column. If you were asking for one of the values with tens of
thousands of hits, then indeed a sequential scan would be the way to go.
The planner has no idea that the value you want has only a few hits.
(Notice that the estimated result row count has nothing to do with
reality :-()
The long-term answer for this is to maintain better statistics, so that
we can know something more about the distribution of values in the
column. I've heard of many examples where there are a small number
of very frequent entries, with everything else much less frequent.
If we stored the top ten or so entries, not just one, we'd be able to
realize that a value that's none of the top ten must have a low
frequency.
> Is it possible to *force* the planner to always choose
> the Index Scan ?
You could try experimenting with SET enable_seqscan TO OFF.
Be wary that you don't shoot yourself in the foot for other
queries, however.
regards, tom lane