Sequential Scan with LIMIT - Mailing list pgsql-performance
From | John Meinel |
---|---|
Subject | Sequential Scan with LIMIT |
Date | |
Msg-id | 417C023F.1080502@johnmeinel.com Whole thread Raw |
Responses |
Re: Sequential Scan with LIMIT
Re: Sequential Scan with LIMIT |
List | pgsql-performance |
I was looking into another problem, and I found something that surprised me. If I'm doing "SELECT * FROM mytable WHERE col = 'myval' LIMIT 1.". Now "col" is indexed, by mytable has 500,000 rows, and 'myval' occurs maybe 100,000 times. Without the LIMIT, this query should definitely do a sequential scan. But with the LIMIT, doesn't it know that it will return at max 1 value, and thus be able to use the index? It seems to be doing the LIMIT too late. The real purpose of this query is to check to see if a value exists in the column, so there might be a better way of doing it. Here is the demo info: # select count(*) from finst_t; 542315 # select count(*) from finst_t where store_id = 539960; 85076 # explain analyze select id from finst_t where store_id = 539960 limit 1; QUERY PLAN ------------------------------------------------------------------------------------------------------------------- Limit (cost=0.00..0.13 rows=1 width=4) (actual time=860.000..860.000 rows=1 loops=1) -> Seq Scan on finst_t (cost=0.00..11884.94 rows=88217 width=4) (actual time=860.000..860.000 rows=1 loops=1) Filter: (store_id = 539960) Total runtime: 860.000 ms Notice that the "actual rows=1", meaning it is aware of the limit as it is going through the table. But for some reason the planner thinks it is going to return 88,217 rows. (This is close to the reality of 85076 if it actually had to find all of the rows). Now, if I do a select on a value that *does* only have 1 value, it works fine: # explain analyze select id from finst_t where store_id = 9605 limit 1; QUERY PLAN ------------------------------------------------------------------------------------------------------------------------ Limit (cost=0.00..3.96 rows=1 width=4) (actual time=0.000..0.000 rows=1 loops=1) -> Index Scan using finst_t_store_id_idx on finst_t (cost=0.00..3.96 rows=1 width=4) (actual time=0.000..0.000 rows=1 loops=1) Index Cond: (store_id = 9605) Total runtime: 0.000 ms And 1 further thing, I *can* force it to do a fast index scan if I disable sequential scanning. # set enable_seqscan to off; # explain analyze select id from finst_t where store_id = 539960 limit 1; QUERY PLAN ------------------------------------------------------------------------------------------------------------------------ Limit (cost=0.00..1.59 rows=1 width=4) (actual time=0.000..0.000 rows=1 loops=1) -> Index Scan using finst_t_store_id_idx on finst_t (cost=0.00..140417.22 rows=88217 width=4) (actual time=0.000..0.000 rows=1 loops=1) Index Cond: (store_id = 539960) Total runtime: 0.000 ms Could being aware of LIMIT be added to the planner? Is there a better way to check for existence? John =:-> PS> I'm using postgres 8.0-beta3 on win32 (the latest installer).
Attachment
pgsql-performance by date: