Re: Query with order by and limit is very slow - wrong index used - Mailing list pgsql-performance

From Cédric Villemain
Subject Re: Query with order by and limit is very slow - wrong index used
Date
Msg-id CAF6yO=1_ofBKrQ-eLW+1CQM62cpFFxmduRaojM5fZWkCXiX_zQ@mail.gmail.com
Whole thread Raw
In response to Re: Query with order by and limit is very slow - wrong index used  (Nowak Michał <michal.nowak@me.com>)
List pgsql-performance
> a9-dev=> explain analyze select * from records where source_id
='http://www.wbc.poznan.pl/dlibra/oai-pmh-repository.xml'order by id limit 200; 
>                                                                    QUERY PLAN
>
----------------------------------------------------------------------------------------------------------------------------------------------------
>  Limit  (cost=0.00..489.57 rows=200 width=1126) (actual time=99701.773..99703.858 rows=200 loops=1)
>  ->  Index Scan using records_pkey on records  (cost=0.00..2441698.81 rows=997489 width=1126) (actual
time=99684.878..99686.936rows=200 loops=1) 
>        Filter: ((source_id)::text = 'http://www.wbc.poznan.pl/dlibra/oai-pmh-repository.xml'::text)
>  Total runtime: 99705.916 ms
> (4 rows)
>
> a9-dev=> explain analyze select * from records2 where source_id
='http://www.wbc.poznan.pl/dlibra/oai-pmh-repository.xml'order by id2 limit 200; 
>                                                                    QUERY PLAN
>
----------------------------------------------------------------------------------------------------------------------------------------------------
>  Limit  (cost=0.00..770.01 rows=200 width=1124) (actual time=0.076..0.205 rows=200 loops=1)
>  ->  Index Scan using source_id2_id2_id2x on records2  (cost=0.00..3735751.15 rows=970308 width=1124) (actual
time=0.074..0.180rows=200 loops=1) 
>        Index Cond: ((source_id)::text = 'http://www.wbc.poznan.pl/dlibra/oai-pmh-repository.xml'::text)
>  Total runtime: 0.235 ms
> (4 rows)
>
>
> First one uses records_pkey, and with estimated cost 2441698.81 runs in over 1,5 minute.
> Second uses index on (source_id, id) and with estimated cost 3735751.15 runs in 235 miliseconds.
>
> IMHO, problem lies not in records distribution nor normalization, but in planner's wrong cost estimation. I don't
knowto tell/force him to use proper index. 

Getting information on your current configuration should help.
Please see http://wiki.postgresql.org/wiki/Slow_Query_Questions

You should take care of the cache effect of your queries between your
tests, here it is not a problem, but this explain  was way longer for
this similar query.

a9-dev=> explain analyze select * from records where source_id
='http://ebuw.uw.edu.pl/dlibra/oai-pmh-repository.xml' order by id
limit 200;
                                                                  QUERY PLAN

---------------------------------------------------------------------------------------------------------------------------------------------------
Limit  (cost=0.00..757.51 rows=200 width=1126) (actual
time=43.648..564.798 rows=200 loops=1)
 ->  Index Scan using source_id_id_idx on records
(cost=0.00..1590267.66 rows=419868 width=1126) (actual
time=43.631..564.700 rows=200 loops=1)
       Index Cond: ((source_id)::text =
'http://ebuw.uw.edu.pl/dlibra/oai-pmh-repository.xml'::text)
Total runtime: 564.895 ms



--
Cédric Villemain +33 (0)6 20 30 22 52
http://2ndQuadrant.fr/
PostgreSQL: Support 24x7 - Développement, Expertise et Formation

pgsql-performance by date:

Previous
From: Ben Chobot
Date:
Subject: Re: How can i get record by data block not by sql?
Next
From: Tom Lane
Date:
Subject: Re: Query with order by and limit is very slow - wrong index used