Home > mailing lists

Re: Why don't use index on x when ORDER BY x, y? - Mailing list pgsql-performance

From	Robert Klemme
Subject	Re: Why don't use index on x when ORDER BY x, y?
Date	November 24, 2014 13:39:25
Msg-id	CAM9pMnPN1+pLNyEGMDuE-3CeA8DRwAhKNZ4cmmte0q11w4BKyg@mail.gmail.com Whole thread
In response to	Why don't use index on x when ORDER BY x, y? (Vlad Arkhipov <arhipov@dc.baikal.ru>)
List	pgsql-performance

Tree view

On Mon, Nov 24, 2014 at 12:02 PM, Vlad Arkhipov wrote:
> Hello,
>
> I wonder why Postgres does not use index in the query below? It is a quite
> common use-case when you  want to sort records by an arbitrary set of
> columns but do not want to create a lot of compound indexes for all possible
> combinations of them. It seems that if, for instance, your query's ORDER BY
> is x, y, z then any of these indexes could be used to improve the
> performance: (x); (x, y); (x, y, z).
>
> create temp table t as
> select s as x, s % 10 as y, s % 100 as z
> from generate_series(1, 1000000) s;
>
> analyze t;
> create index idx1 on t (y);
>
> select *
> from t
> order by y desc, x
> limit 10;

For the record, here's the plan:

rklemme=> explain analyze
rklemme-> select *
rklemme-> from t
rklemme-> order by y desc, x
rklemme-> limit 10;
                                                        QUERY PLAN

---------------------------------------------------------------------------------------------------------------------------
 Limit  (cost=36511.64..36511.67 rows=10 width=12) (actual
time=4058.863..4058.917 rows=10 loops=1)
   ->  Sort  (cost=36511.64..39011.64 rows=1000000 width=12) (actual
time=4058.849..4058.868 rows=10 loops=1)
         Sort Key: y, x
         Sort Method: top-N heapsort  Memory: 17kB
         ->  Seq Scan on t  (cost=0.00..14902.00 rows=1000000
width=12) (actual time=0.031..2025.639 rows=1000000 loops=1)
 Total runtime: 4058.992 ms
(6 rows)

Your index does not cover y AND x. If you remove "order by x" the
index will be used:

rklemme=> explain analyze
select *
from t
order by y desc
limit 10;
                                                            QUERY PLAN

-----------------------------------------------------------------------------------------------------------------------------------
 Limit  (cost=0.42..0.86 rows=10 width=12) (actual time=0.081..0.284
rows=10 loops=1)
   ->  Index Scan Backward using idx1 on t  (cost=0.42..43268.33
rows=1000000 width=12) (actual time=0.066..0.125 rows=10 loops=1)
 Total runtime: 0.403 ms
(3 rows)


Now, with a different index:

rklemme=> create index idx2 on t (y desc, x);
CREATE INDEX
rklemme=> explain analyze
select *
from t
order by y desc, x
limit 10;
                                                        QUERY PLAN

--------------------------------------------------------------------------------------------------------------------------
 Limit  (cost=0.42..0.88 rows=10 width=12) (actual time=0.127..0.290
rows=10 loops=1)
   ->  Index Scan using idx2 on t  (cost=0.42..45514.12 rows=1000000
width=12) (actual time=0.112..0.165 rows=10 loops=1)
 Total runtime: 0.404 ms
(3 rows)


rklemme=> select version();
                                            version
-----------------------------------------------------------------------------------------------
 PostgreSQL 9.3.5 on i686-pc-linux-gnu, compiled by gcc (Ubuntu
4.8.2-19ubuntu1) 4.8.2, 32-bit
(1 row)

Kind regards


--
[guy, jim].each {|him| remember.him do |as, often| as.you_can - without end}
http://blog.rubybestpractices.com/

pgsql-performance by date:

From: Vlad Arkhipov
Date: 24 November 2014, 11:02:21
Subject: Why don't use index on x when ORDER BY x, y?

From: Tom Lane
Date: 24 November 2014, 15:39:36
Subject: Re: Why don't use index on x when ORDER BY x, y?

Re: Why don't use index on x when ORDER BY x, y? - Mailing list pgsql-performance

Previous

Next