Re: Sequential Scan with LIMIT - Mailing list pgsql-performance

From John Meinel
Subject Re: Sequential Scan with LIMIT
Date
Msg-id 417C23DC.1090407@johnmeinel.com
Whole thread Raw
In response to Re: Sequential Scan with LIMIT  (Tom Lane <tgl@sss.pgh.pa.us>)
List pgsql-performance
Tom Lane wrote:
> John Meinel <john@johnmeinel.com> writes:
>
>>I was looking into another problem, and I found something that surprised
>>me. If I'm doing "SELECT * FROM mytable WHERE col = 'myval' LIMIT 1.".
>>Now "col" is indexed, by mytable has 500,000 rows, and 'myval' occurs
>>maybe 100,000 times. Without the LIMIT, this query should definitely do
>>a sequential scan.
>
>
>>But with the LIMIT, doesn't it know that it will return at max 1 value,
>>and thus be able to use the index?
>
>
> But the LIMIT will cut the cost of the seqscan case too.  Given the
> numbers you posit above, about one row in five will have 'myval', so a
> seqscan can reasonably expect to hit the first matching row in the first
> page of the table.  This is still cheaper than doing an index scan
> (which must require reading at least one index page plus at least one
> table page).
>
> The test case you are showing is probably suffering from nonrandom
> placement of this particular data value; which is something that the
> statistics we keep are too crude to detect.
>
>             regards, tom lane

You are correct about non-random placement. I'm a little surprised it
doesn't change with values, then. For instance,

# select count(*) from finst_t where store_id = 52;
13967

Still does a sequential scan for the "select id from..." query.

The only value it does an index query for is 9605 which only has 1 row.

It estimates ~18,000 rows, but that is still < 3% of the total data.

This row corresponds to disk location where files can be found. So when
a storage location fills up, generally a new one is created. This means
that *generally* the numbers will be increasing as you go further in the
table (not guaranteed, as there are multiple locations open at any one
time).

Am I better off in this case just wrapping my query with:

set enable_seqscan to off;
query
set enable_seqscan to on;


There is still the possibility that there is a better way to determine
existence of a value in a column. I was wondering about something like:

SELECT 1 WHERE EXISTS
    (SELECT id FROM finst_t WHERE store_id=52 LIMIT 1);

Though the second part is the same, so it still does the sequential scan.

This isn't critical, I was just trying to understand what's going on.
Thanks for your help.

John
=:->

Attachment

pgsql-performance by date:

Previous
From: Tom Lane
Date:
Subject: Re: Sequential Scan with LIMIT
Next
From: "Rod Dutton"
Date:
Subject: Reindexdb and REINDEX