Here's a "corner case" that might interest someone. It tripped up one of our programmers.
We have a table with > 10 million rows. The ID column is indexed, and the table has been vacuumed and analyzed. Compare these
two queries:
select * from tbl where id >= 10000000 limit 1;
select * from tbl where id >= 10000000 order by id limit 1;
The first takes 4 seconds, and uses a full table scan. The second takes 32 msec and uses the index.
Details are below.
I understand why the planner makes the choices it does -- the "id >= 10000000" condition isn't very selective, and under normal
circumstances a full table scan is probably the right choice. But the "limit 1" apparently doesn't alter the planner's
strategy at all. We were surprised by this.
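As a sanity check on the plans below: as far as I understand it, the planner prices a LIMIT node by linear interpolation into its child's cost, roughly startup + (total - startup) * (rows_needed / rows_expected). Plugging in the estimates from the two plans (the exact formula is an assumption on my part, but the numbers line up):

```python
# Rough sketch of how the planner appears to cost the Limit node,
# using the estimates shown in the two EXPLAIN plans below.
def limit_cost(startup, total, rows_expected, rows_needed=1):
    # Linear interpolation into the child node's cost range.
    return startup + (total - startup) * (rows_needed / rows_expected)

# Seq Scan child: cost=0.00..391254.35, rows=606176
seq = limit_cost(0.00, 391254.35, 606176)
# Index Scan child: cost=0.00..1440200.41, rows=606176
idx = limit_cost(0.00, 1440200.41, 606176)

print(round(seq, 2))  # ~0.65, matching the first plan's Limit node
print(round(idx, 2))  # ~2.38, matching the second plan's Limit node
```

So on paper the seq-scan-plus-limit looks cheaper (0.65 vs 2.38), which is presumably why the planner picks it; the estimate just assumes matching rows are spread evenly through the table, which isn't true here.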
Adding the "order by" was a simple solution.
Craig
pg=> explain analyze select url, url_digest from url_queue where priority >= 10000000 limit 1;
QUERY PLAN
------------------------------------------------------------------------------------------
Limit (cost=0.00..0.65 rows=1 width=108) (actual time=4036.113..4036.117 rows=1 loops=1)
   ->  Seq Scan on url_queue  (cost=0.00..391254.35 rows=606176 width=108) (actual time=4036.101..4036.101 rows=1 loops=1)
Filter: (priority >= 10000000)
Total runtime: 4036.200 ms
(4 rows)
pg=> explain analyze select url, url_digest from url_queue where priority >= 10000000 order by priority limit 1;
QUERY PLAN
--------------------------------------------------------------------------------------
Limit (cost=0.00..2.38 rows=1 width=112) (actual time=32.445..32.448 rows=1 loops=1)
   ->  Index Scan using url_queue_priority on url_queue  (cost=0.00..1440200.41 rows=606176 width=112) (actual time=32.434..32.434 rows=1 loops=1)
Index Cond: (priority >= 10000000)
Total runtime: 32.566 ms