Sequential Scan with LIMIT - Mailing list pgsql-performance

From John Meinel
Subject Sequential Scan with LIMIT
Date
Msg-id 417C023F.1080502@johnmeinel.com
Whole thread Raw
Responses Re: Sequential Scan with LIMIT
Re: Sequential Scan with LIMIT
List pgsql-performance
I was looking into another problem, and I found something that surprised
me. If I'm doing "SELECT * FROM mytable WHERE col = 'myval' LIMIT 1.".
Now "col" is indexed, by mytable has 500,000 rows, and 'myval' occurs
maybe 100,000 times. Without the LIMIT, this query should definitely do
a sequential scan.

But with the LIMIT, doesn't it know that it will return at max 1 value,
and thus be able to use the index?

It seems to be doing the LIMIT too late.

The real purpose of this query is to check to see if a value exists in
the column, so there might be a better way of doing it. Here is the demo
info:

# select count(*) from finst_t;
542315

# select count(*) from finst_t where store_id = 539960;
85076

# explain analyze select id from finst_t where store_id = 539960 limit 1;
                                                      QUERY PLAN

-------------------------------------------------------------------------------------------------------------------
   Limit  (cost=0.00..0.13 rows=1 width=4) (actual time=860.000..860.000
rows=1 loops=1)
     ->  Seq Scan on finst_t  (cost=0.00..11884.94 rows=88217 width=4)
(actual time=860.000..860.000 rows=1 loops=1)
           Filter: (store_id = 539960)
   Total runtime: 860.000 ms

Notice that the "actual rows=1", meaning it is aware of the limit as it
is going through the table. But for some reason the planner thinks it is
going to return 88,217 rows. (This is close to the reality of 85076 if
it actually had to find all of the rows).

Now, if I do a select on a value that *does* only have 1 value, it works
fine:

# explain analyze select id from finst_t where store_id = 9605 limit 1;
                                                               QUERY PLAN



------------------------------------------------------------------------------------------------------------------------
   Limit  (cost=0.00..3.96 rows=1 width=4) (actual time=0.000..0.000
rows=1 loops=1)
     ->  Index Scan using finst_t_store_id_idx on finst_t
(cost=0.00..3.96 rows=1 width=4) (actual time=0.000..0.000 rows=1 loops=1)
           Index Cond: (store_id = 9605)
   Total runtime: 0.000 ms

And 1 further thing, I *can* force it to do a fast index scan if I
disable sequential scanning.

# set enable_seqscan to off;
# explain analyze select id from finst_t where store_id = 539960 limit 1;
                                                                   QUERY
PLAN



------------------------------------------------------------------------------------------------------------------------
   Limit  (cost=0.00..1.59 rows=1 width=4) (actual time=0.000..0.000
rows=1 loops=1)
     ->  Index Scan using finst_t_store_id_idx on finst_t
(cost=0.00..140417.22 rows=88217 width=4) (actual time=0.000..0.000
rows=1 loops=1)
           Index Cond: (store_id = 539960)
   Total runtime: 0.000 ms

Could being aware of LIMIT be added to the planner? Is there a better
way to check for existence?

John
=:->

PS> I'm using postgres 8.0-beta3 on win32 (the latest installer).


Attachment

pgsql-performance by date:

Previous
From: John Meinel
Date:
Subject: Re: Queries slow using stored procedures
Next
From: Tom Lane
Date:
Subject: Re: Sequential Scan with LIMIT