Re: Using LIMIT changes index used by planner - Mailing list pgsql-performance

From Sven Willenberger
Subject Re: Using LIMIT changes index used by planner
Date
Msg-id 1103048932.5534.22.camel@lanshark.dmv.com
Whole thread Raw
In response to Re: Using LIMIT changes index used by planner  (Tom Lane <tgl@sss.pgh.pa.us>)
Responses Re: Using LIMIT changes index used by planner
List pgsql-performance
On Mon, 2004-12-13 at 17:43 -0500, Tom Lane wrote:
> Sven Willenberger <sven@dmv.com> writes:
> > explain analyze select storelocation,order_number from custacct where
> > referrer = 1365  and orderdate between '2004-12-07' and '2004-12-07
> > 12:00:00' order by custacctid limit 10;
>
> >                   QUERY PLAN
>
> >
-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
> >   Limit  (cost=0.00..43065.76 rows=10 width=43) (actual
> > time=1306957.216..1307072.111 rows=10 loops=1)
> >     ->  Index Scan using custacct2_pkey on custacct
> > (cost=0.00..92083209.38 rows=21382 width=43) (actual
> > time=1306957.205..1307072.017 rows=10 loops=1)
> >           Filter: ((referrer = 1365) AND (orderdate >= '2004-12-07
> > 00:00:00'::timestamp without time zone) AND (orderdate <= '2004-12-07
> > 12:00:00'::timestamp without time zone))
> >   Total runtime: 1307072.231 ms
> > (4 rows)
>
> I think this is the well-known issue of lack of cross-column correlation
> statistics.  The planner is well aware that this indexscan will be
> horridly expensive if run to completion ---
> <snip>
> There isn't any near-term fix in the wind for this, since storing
> cross-column statistics is an expensive proposition that we haven't
> decided how to handle.  Your workaround with separating the ORDER BY
> from the LIMIT is a good one.
>

You are correct in that there is a high degree of correlation between
the custacctid (which is a serial key) and the orderdate as the orders
generally get entered in the order that they arrive. I will go with the
workaround subselect query plan then.

On a related note, is there a way (other than set enable_seqscan=off) to
give a hint to the planner that it is cheaper to use and index scan
versus seq scan? Using the "workaround" query on any time period greater
than 12 hours results in the planner using a seq scan. Disabling the seq
scan and running the query on a full day period for example shows:

explain analyze select foo.storelocaion, foo.order_number from (select
storelocation,order_number from custacct where referrer = 1365  and
ordertdate between '2004-12-09' and '2004-12-10' order by custacctid) as
foo  limit 10 offset 100;

QUERY PLAN

-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
 Limit  (cost=2661326.22..2661326.35 rows=10 width=100) (actual
time=28446.605..28446.796 rows=10 loops=1)
   ->  Subquery Scan foo  (cost=2661324.97..2661866.19 rows=43297
width=100) (actual time=28444.916..28446.298 rows=110 loops=1)
         ->  Sort  (cost=2661324.97..2661433.22 rows=43297 width=41)
(actual time=28444.895..28445.334 rows=110 loops=1)
               Sort Key: custacctid
               ->  Index Scan using orderdate_idx on custacct
(cost=0.00..2657990.68 rows=43297 width=41) (actual
time=4.432..28145.212 rows=44333 loops=1)
                     Index Cond: ((orderdate >= '2004-12-09
00:00:00'::timestamp without time zone) AND (orderdate <= '2004-12-10
00:00:00'::timestamp without time zone))
                     Filter: (referrer = 1365)
 Total runtime: 28456.893 ms
(8 rows)


If I interpret the above correctly, the planner guestimates a cost of
2661326 but the actual cost is much less (assuming time is equivalent to
cost). Would the set statistics command be of any benefit here in
"training" the planner?

Sven




pgsql-performance by date:

Previous
From: Markus Schaber
Date:
Subject: Re: Anything to be gained from a 'Postgres Filesystem'?
Next
From: Tom Lane
Date:
Subject: Re: Using LIMIT changes index used by planner