Home > mailing lists

Sequential Scan with LIMIT - Mailing list pgsql-performance

From	John Meinel
Subject	Sequential Scan with LIMIT
Date	October 24, 2004 20:28:00
Msg-id	417C023F.1080502@johnmeinel.com Whole thread Raw
Responses	Re: Sequential Scan with LIMIT Re: Sequential Scan with LIMIT
List	pgsql-performance

Tree view

I was looking into another problem, and I found something that surprised
me. If I'm doing "SELECT * FROM mytable WHERE col = 'myval' LIMIT 1.".
Now "col" is indexed, by mytable has 500,000 rows, and 'myval' occurs
maybe 100,000 times. Without the LIMIT, this query should definitely do
a sequential scan.

But with the LIMIT, doesn't it know that it will return at max 1 value,
and thus be able to use the index?

It seems to be doing the LIMIT too late.

The real purpose of this query is to check to see if a value exists in
the column, so there might be a better way of doing it. Here is the demo
info:

# select count(*) from finst_t;
542315

# select count(*) from finst_t where store_id = 539960;
85076

# explain analyze select id from finst_t where store_id = 539960 limit 1;
                                                      QUERY PLAN

-------------------------------------------------------------------------------------------------------------------
   Limit  (cost=0.00..0.13 rows=1 width=4) (actual time=860.000..860.000
rows=1 loops=1)
     ->  Seq Scan on finst_t  (cost=0.00..11884.94 rows=88217 width=4)
(actual time=860.000..860.000 rows=1 loops=1)
           Filter: (store_id = 539960)
   Total runtime: 860.000 ms

Notice that the "actual rows=1", meaning it is aware of the limit as it
is going through the table. But for some reason the planner thinks it is
going to return 88,217 rows. (This is close to the reality of 85076 if
it actually had to find all of the rows).

Now, if I do a select on a value that *does* only have 1 value, it works
fine:

# explain analyze select id from finst_t where store_id = 9605 limit 1;
                                                               QUERY PLAN



------------------------------------------------------------------------------------------------------------------------
   Limit  (cost=0.00..3.96 rows=1 width=4) (actual time=0.000..0.000
rows=1 loops=1)
     ->  Index Scan using finst_t_store_id_idx on finst_t
(cost=0.00..3.96 rows=1 width=4) (actual time=0.000..0.000 rows=1 loops=1)
           Index Cond: (store_id = 9605)
   Total runtime: 0.000 ms

And 1 further thing, I *can* force it to do a fast index scan if I
disable sequential scanning.

# set enable_seqscan to off;
# explain analyze select id from finst_t where store_id = 539960 limit 1;
                                                                   QUERY
PLAN



------------------------------------------------------------------------------------------------------------------------
   Limit  (cost=0.00..1.59 rows=1 width=4) (actual time=0.000..0.000
rows=1 loops=1)
     ->  Index Scan using finst_t_store_id_idx on finst_t
(cost=0.00..140417.22 rows=88217 width=4) (actual time=0.000..0.000
rows=1 loops=1)
           Index Cond: (store_id = 539960)
   Total runtime: 0.000 ms

Could being aware of LIMIT be added to the planner? Is there a better
way to check for existence?

John
=:->

PS> I'm using postgres 8.0-beta3 on win32 (the latest installer).

Attachment

signature.asc

pgsql-performance by date:

From: John Meinel
Date: 24 October 2004, 20:11:53
Subject: Re: Queries slow using stored procedures

From: Tom Lane
Date: 24 October 2004, 21:11:58
Subject: Re: Sequential Scan with LIMIT

Sequential Scan with LIMIT - Mailing list pgsql-performance

Attachment

Previous

Next