> a9-dev=> explain analyze select * from records where source_id
='http://www.wbc.poznan.pl/dlibra/oai-pmh-repository.xml'order by id limit 200;
> QUERY PLAN
>
----------------------------------------------------------------------------------------------------------------------------------------------------
> Limit (cost=0.00..489.57 rows=200 width=1126) (actual time=99701.773..99703.858 rows=200 loops=1)
> -> Index Scan using records_pkey on records (cost=0.00..2441698.81 rows=997489 width=1126) (actual
time=99684.878..99686.936rows=200 loops=1)
> Filter: ((source_id)::text = 'http://www.wbc.poznan.pl/dlibra/oai-pmh-repository.xml'::text)
> Total runtime: 99705.916 ms
> (4 rows)
>
> a9-dev=> explain analyze select * from records2 where source_id
='http://www.wbc.poznan.pl/dlibra/oai-pmh-repository.xml'order by id2 limit 200;
> QUERY PLAN
>
----------------------------------------------------------------------------------------------------------------------------------------------------
> Limit (cost=0.00..770.01 rows=200 width=1124) (actual time=0.076..0.205 rows=200 loops=1)
> -> Index Scan using source_id2_id2_id2x on records2 (cost=0.00..3735751.15 rows=970308 width=1124) (actual
time=0.074..0.180rows=200 loops=1)
> Index Cond: ((source_id)::text = 'http://www.wbc.poznan.pl/dlibra/oai-pmh-repository.xml'::text)
> Total runtime: 0.235 ms
> (4 rows)
>
>
> First one uses records_pkey, and with estimated cost 2441698.81 runs in over 1,5 minute.
> Second uses index on (source_id, id) and with estimated cost 3735751.15 runs in 235 miliseconds.
>
> IMHO, problem lies not in records distribution nor normalization, but in planner's wrong cost estimation. I don't
knowto tell/force him to use proper index.
Getting information on your current configuration should help.
Please see http://wiki.postgresql.org/wiki/Slow_Query_Questions
You should take care of the cache effect of your queries between your
tests, here it is not a problem, but this explain was way longer for
this similar query.
a9-dev=> explain analyze select * from records where source_id
='http://ebuw.uw.edu.pl/dlibra/oai-pmh-repository.xml' order by id
limit 200;
QUERY PLAN
---------------------------------------------------------------------------------------------------------------------------------------------------
Limit (cost=0.00..757.51 rows=200 width=1126) (actual
time=43.648..564.798 rows=200 loops=1)
-> Index Scan using source_id_id_idx on records
(cost=0.00..1590267.66 rows=419868 width=1126) (actual
time=43.631..564.700 rows=200 loops=1)
Index Cond: ((source_id)::text =
'http://ebuw.uw.edu.pl/dlibra/oai-pmh-repository.xml'::text)
Total runtime: 564.895 ms
--
Cédric Villemain +33 (0)6 20 30 22 52
http://2ndQuadrant.fr/
PostgreSQL: Support 24x7 - Développement, Expertise et Formation