Using LIMIT changes index used by planner - Mailing list pgsql-performance

From Sven Willenberger
Subject Using LIMIT changes index used by planner
Date
Msg-id 41BD3317.9090507@dmv.com
Whole thread Raw
Responses Re: Using LIMIT changes index used by planner
List pgsql-performance
I have a question regarding a serious performance hit taken when using a
LIMIT clause. I am using version 7.4.6 on FreeBSD 4.10-STABLE with 2GB
of memory. The table in question contains some 25 million rows with a
bigserial primary key, orderdate index and a referrer index. The 2
select statements are as follow:

A) select storelocation,order_number from custacct where referrer = 1365
  and orderdate between '2004-12-07' and '2004-12-07 12:00:00' order by
custacctid;

B) select storelocation,order_number from custacct where referrer = 1365
  and orderdate between '2004-12-07' and '2004-12-07 12:00:00' order by
custacctid limit 10;

So the only difference is the use of the Limit, which, in theory, should
be quicker after custacctid is ordered.

Now the analyze results:

A) explain select storelocation,order_number from custacct where
referrer = 1365  and orderdate between '2004-12-07' and '2004-12-07
12:00:00' order by custacctid;

        QUERY PLAN


-----------------------------------------------------------------------------------------------------------------------------------------------------------------------
  Sort  (cost=904420.55..904468.11 rows=19025 width=44)
    Sort Key: custacctid
    ->  Index Scan using orderdate_idx on custacct
(cost=0.00..903068.29 rows=19025 width=44)
          Index Cond: ((orderdate >= '2004-12-07 00:00:00'::timestamp
without time zone) AND (orderdate <= '2004-12-07 12:00:00'::timestamp
without time zone))
          Filter: (referrer = 1365)
(5 rows)

************************

B) explain select storelocation,order_number from custacct where
referrer = 1365  and orderdate between '2004-12-07' and '2004-12-07
12:00:00' order by custacctid limit 10;

                  QUERY PLAN


-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
  Limit  (cost=0.00..33796.50 rows=10 width=44)
    ->  Index Scan using custacct2_pkey on custacct
(cost=0.00..64297840.86 rows=19025 width=44)
          Filter: ((referrer = 1365) AND (orderdate >= '2004-12-07
00:00:00'::timestamp without time zone) AND (orderdate <= '2004-12-07
12:00:00'::timestamp without time zone))
(3 rows)

*******************

Notice the huge cost difference in the two plans: 904468 in the one
without LIMIT versus 64297840.86 for the index scan on custacct index.
Why would the planner switch from using the orderdate index to the
custacct index (which is a BIGSERIAL, btw)?

I can change that behavior (and speed up the resultant query) by using
the following subquery:

explain select foo.storelocation, foo.order_number from (select
storelocation,order_number from custacct where referrer = 1365  and
orderdate between '2004-12-07' and '2004-12-07 12:00:00' order by
custacctid) as foo  limit 10;

              QUERY PLAN


-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
  Limit  (cost=904420.55..904420.67 rows=10 width=100)
    ->  Subquery Scan foo  (cost=904420.55..904658.36 rows=19025 width=100)
          ->  Sort  (cost=904420.55..904468.11 rows=19025 width=44)
                Sort Key: custacctid
                ->  Index Scan using orderdate_idx on custacct
(cost=0.00..903068.29 rows=19025 width=44)
                      Index Cond: ((orderdate >= '2004-12-07
00:00:00'::timestamp without time zone) AND (orderdate <= '2004-12-07
12:00:00'::timestamp without time zone))
                      Filter: (referrer = 1365)
(7 rows)

As a side note, when running query A, the query takes 1772.523 ms, when
running the subselect version to get the limit, it takes 1415.615 ms.
Running option B (with the other index being scanned) takes several
minutes (close to 10 minutes!). What am I missing about how the planner
views the LIMIT statement?

Sven

pgsql-performance by date:

Previous
From: Josh Berkus
Date:
Subject: Re: Query is not using index when it should
Next
From: Andrew McMillan
Date:
Subject: Re: Using LIMIT changes index used by planner