Re: Sequential Scan with LIMIT - Mailing list pgsql-performance

From Tom Lane
Subject Re: Sequential Scan with LIMIT
Date
Msg-id 17689.1098648713@sss.pgh.pa.us
Whole thread Raw
In response to Sequential Scan with LIMIT  (John Meinel <john@johnmeinel.com>)
Responses Re: Sequential Scan with LIMIT  (John Meinel <john@johnmeinel.com>)
Re: Sequential Scan with LIMIT  ("Jim C. Nasby" <decibel@decibel.org>)
List pgsql-performance
John Meinel <john@johnmeinel.com> writes:
> I was looking into another problem, and I found something that surprised
> me. If I'm doing "SELECT * FROM mytable WHERE col = 'myval' LIMIT 1.".
> Now "col" is indexed, by mytable has 500,000 rows, and 'myval' occurs
> maybe 100,000 times. Without the LIMIT, this query should definitely do
> a sequential scan.

> But with the LIMIT, doesn't it know that it will return at max 1 value,
> and thus be able to use the index?

But the LIMIT will cut the cost of the seqscan case too.  Given the
numbers you posit above, about one row in five will have 'myval', so a
seqscan can reasonably expect to hit the first matching row in the first
page of the table.  This is still cheaper than doing an index scan
(which must require reading at least one index page plus at least one
table page).

The test case you are showing is probably suffering from nonrandom
placement of this particular data value; which is something that the
statistics we keep are too crude to detect.

            regards, tom lane

pgsql-performance by date:

Previous
From: John Meinel
Date:
Subject: Sequential Scan with LIMIT
Next
From: John Meinel
Date:
Subject: Re: Sequential Scan with LIMIT