John Meinel <john@johnmeinel.com> writes:
> I was looking into another problem, and I found something that surprised
> me. If I'm doing "SELECT * FROM mytable WHERE col = 'myval' LIMIT 1.".
> Now "col" is indexed, by mytable has 500,000 rows, and 'myval' occurs
> maybe 100,000 times. Without the LIMIT, this query should definitely do
> a sequential scan.
> But with the LIMIT, doesn't it know that it will return at max 1 value,
> and thus be able to use the index?
But the LIMIT will cut the cost of the seqscan case too. Given the
numbers you posit above, about one row in five will have 'myval', so a
seqscan can reasonably expect to hit the first matching row in the first
page of the table. This is still cheaper than doing an index scan
(which must require reading at least one index page plus at least one
table page).
The test case you are showing is probably suffering from nonrandom
placement of this particular data value; which is something that the
statistics we keep are too crude to detect.
regards, tom lane