Optimizer: limit not taken into account - Mailing list pgsql-performance

From Craig A. James
Subject Optimizer: limit not taken into account
Date
Msg-id 446B474C.2050803@modgraph-usa.com
Whole thread Raw
In response to Re: Pgsql (and mysql) benchmark on T2000/Solaris and some  (Arjen van der Meijden <acmmailing@tweakers.net>)
Responses Re: Optimizer: limit not taken into account
Re: Optimizer: limit not taken into account
Re: Optimizer: limit not taken into account
List pgsql-performance
Here's a "corner case" that might interest someone.  It tripped up one of our programmers.

We have a table with > 10 million rows.  The ID column is indexed, the table has been vacuum/analyzed.  Compare these
twoqueries: 

   select * from tbl where id >= 10000000 limit 1;
   select * from tbl where id >= 10000000 order by id limit 1;

The first takes 4 seconds, and uses a full table scan.  The second takes 32 msec and uses the index.
Details are below.

I understand why the planner makes the choices it does -- the "id > 10000000" isn't very selective and under normal
circumstancesa full table scan is probably the right choice.  But the "limit 1" apparently doesn't alter the planner's
strategyat all.  We were surprised by this. 

Adding the "order by" was a simple solution.

Craig



pg=> explain analyze select url, url_digest from url_queue where priority >= 10000000 limit 1;
                       QUERY PLAN
------------------------------------------------------------------------------------------
 Limit  (cost=0.00..0.65 rows=1 width=108) (actual time=4036.113..4036.117 rows=1 loops=1)
   ->  Seq Scan on url_queue  (cost=0.00..391254.35 rows=606176 width=108) (actual time=4036.101..4036.101 rows=1
loops=1)
         Filter: (priority >= 10000000)
 Total runtime: 4036.200 ms
(4 rows)

pg=> explain analyze select url, url_digest from url_queue where priority >= 10000000 order by priority limit 1;
                       QUERY PLAN
--------------------------------------------------------------------------------------
 Limit  (cost=0.00..2.38 rows=1 width=112) (actual time=32.445..32.448 rows=1 loops=1)
   ->  Index Scan using url_queue_priority on url_queue  (cost=0.00..1440200.41 rows=606176 width=112) (actual
time=32.434..32.434rows=1 loops=1) 
         Index Cond: (priority >= 10000000)
 Total runtime: 32.566 ms

pgsql-performance by date:

Previous
From: "Jim C. Nasby"
Date:
Subject: Re: IMMUTABLE?
Next
From: john_oshea@wordbank.com
Date:
Subject: Re: SQL CPU time usage