(Note to hackers: Ole sent me a 1000-row test case off list.)
> oletest=> explain select * from av_parts where partnumber = '123456';
> NOTICE: QUERY PLAN:
>
> Index Scan using av_parts_partnumber_index on av_parts (cost=2.04 rows=1
> width=124)
>
> EXPLAIN
> oletest=> explain select * from av_parts where nsn = '123456';
> NOTICE: QUERY PLAN:
>
> Seq Scan on av_parts (cost=48.00 rows=995 width=124)
OK, I confirm seeing this behavior. I don't have time to dig into
the code right now, but will do so when I get a chance.
It looks like the highly skewed distribution of nsn values (the data
you sent had 997 empty-string ('') entries and only 3 non-empty
values) is confusing the selectivity estimation code somehow, such
that the system thinks the query is going to match most of the
rows. Notice it is estimating
995 returned rows for the nsn select! Under these circumstances it will
prefer a sequential scan, since the more-expensive-per-tuple index scan
doesn't look like it will be able to avoid reading most of the table.
That logic is OK; it's the 0.995 selectivity estimate (995 of the
1000 rows) that's wrong...
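For anyone who wants to poke at this without Ole's data, something
like the following should reproduce the misestimate (I'm guessing at
the column types and the nsn index name; only the skew matters):

CREATE TABLE av_parts (partnumber varchar(32), nsn varchar(32));

-- seed one empty-nsn row, then double it ten times (1 -> 1024 rows)
INSERT INTO av_parts VALUES ('123456', '');
INSERT INTO av_parts SELECT * FROM av_parts;  -- 2 rows
INSERT INTO av_parts SELECT * FROM av_parts;  -- 4
INSERT INTO av_parts SELECT * FROM av_parts;  -- 8
INSERT INTO av_parts SELECT * FROM av_parts;  -- 16
INSERT INTO av_parts SELECT * FROM av_parts;  -- 32
INSERT INTO av_parts SELECT * FROM av_parts;  -- 64
INSERT INTO av_parts SELECT * FROM av_parts;  -- 128
INSERT INTO av_parts SELECT * FROM av_parts;  -- 256
INSERT INTO av_parts SELECT * FROM av_parts;  -- 512
INSERT INTO av_parts SELECT * FROM av_parts;  -- 1024

-- add a handful of distinct non-empty nsn values
INSERT INTO av_parts VALUES ('999001', '111111');
INSERT INTO av_parts VALUES ('999002', '222222');
INSERT INTO av_parts VALUES ('999003', '333333');

CREATE INDEX av_parts_nsn_index ON av_parts (nsn);
VACUUM ANALYZE av_parts;

EXPLAIN SELECT * FROM av_parts WHERE nsn = '111111';

If the estimator is fooled the same way, the plan comes back as a
seq scan with a rows estimate near the table size, even though only
one row actually matches.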
Exactly why the selectivity estimate is so ludicrous remains to
be seen, but I know that there are some bogosities in that code
(search the pghackers archives for "selectivity" for more info).
I am hoping to do some extensive revisions of the selectivity code
for 6.6 or 6.7. This particular problem might be easily fixable,
or it might have to wait for the rewrite.
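In the meantime, the skew itself is easy to see directly (same
caveat about guessed names):

SELECT nsn, count(*)
FROM av_parts
GROUP BY nsn
ORDER BY 2 DESC;

You should see '' at 997 rows and the three real nsn values at one
row apiece. Note that 0.995 is suspiciously close to the frequency
of the '' value (997/1000), as if the estimator were keying on the
most common value regardless of the constant being compared.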
Thanks for the test case!
regards, tom lane