Using LIMIT changes index used by planner - Mailing list pgsql-performance
From | Sven Willenberger |
---|---|
Subject | Using LIMIT changes index used by planner |
Date | |
Msg-id | 41BD3317.9090507@dmv.com Whole thread Raw |
Responses |
Re: Using LIMIT changes index used by planner
|
List | pgsql-performance |
I have a question regarding a serious performance hit taken when using a LIMIT clause. I am using version 7.4.6 on FreeBSD 4.10-STABLE with 2GB of memory. The table in question contains some 25 million rows with a bigserial primary key, orderdate index and a referrer index. The 2 select statements are as follow: A) select storelocation,order_number from custacct where referrer = 1365 and orderdate between '2004-12-07' and '2004-12-07 12:00:00' order by custacctid; B) select storelocation,order_number from custacct where referrer = 1365 and orderdate between '2004-12-07' and '2004-12-07 12:00:00' order by custacctid limit 10; So the only difference is the use of the Limit, which, in theory, should be quicker after custacctid is ordered. Now the analyze results: A) explain select storelocation,order_number from custacct where referrer = 1365 and orderdate between '2004-12-07' and '2004-12-07 12:00:00' order by custacctid; QUERY PLAN ----------------------------------------------------------------------------------------------------------------------------------------------------------------------- Sort (cost=904420.55..904468.11 rows=19025 width=44) Sort Key: custacctid -> Index Scan using orderdate_idx on custacct (cost=0.00..903068.29 rows=19025 width=44) Index Cond: ((orderdate >= '2004-12-07 00:00:00'::timestamp without time zone) AND (orderdate <= '2004-12-07 12:00:00'::timestamp without time zone)) Filter: (referrer = 1365) (5 rows) ************************ B) explain select storelocation,order_number from custacct where referrer = 1365 and orderdate between '2004-12-07' and '2004-12-07 12:00:00' order by custacctid limit 10; QUERY PLAN ------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- Limit (cost=0.00..33796.50 rows=10 width=44) -> Index Scan using custacct2_pkey on custacct (cost=0.00..64297840.86 rows=19025 width=44) Filter: ((referrer = 1365) AND (orderdate >= '2004-12-07 00:00:00'::timestamp without time zone) AND (orderdate <= '2004-12-07 12:00:00'::timestamp without time zone)) (3 rows) ******************* Notice the huge cost difference in the two plans: 904468 in the one without LIMIT versus 64297840.86 for the index scan on custacct index. Why would the planner switch from using the orderdate index to the custacct index (which is a BIGSERIAL, btw)? I can change that behavior (and speed up the resultant query) by using the following subquery: explain select foo.storelocation, foo.order_number from (select storelocation,order_number from custacct where referrer = 1365 and orderdate between '2004-12-07' and '2004-12-07 12:00:00' order by custacctid) as foo limit 10; QUERY PLAN ----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- Limit (cost=904420.55..904420.67 rows=10 width=100) -> Subquery Scan foo (cost=904420.55..904658.36 rows=19025 width=100) -> Sort (cost=904420.55..904468.11 rows=19025 width=44) Sort Key: custacctid -> Index Scan using orderdate_idx on custacct (cost=0.00..903068.29 rows=19025 width=44) Index Cond: ((orderdate >= '2004-12-07 00:00:00'::timestamp without time zone) AND (orderdate <= '2004-12-07 12:00:00'::timestamp without time zone)) Filter: (referrer = 1365) (7 rows) As a side note, when running query A, the query takes 1772.523 ms, when running the subselect version to get the limit, it takes 1415.615 ms. Running option B (with the other index being scanned) takes several minutes (close to 10 minutes!). What am I missing about how the planner views the LIMIT statement? Sven
pgsql-performance by date: