Costly "Sort Key" on indexed timestamp column - Mailing list pgsql-performance
| From | |
|---|---|
| Subject | Costly "Sort Key" on indexed timestamp column |
| Date | |
| Msg-id | 20040909225137.77822.qmail@web12701.mail.yahoo.com |
| Responses | Re: Costly "Sort Key" on indexed timestamp column |
| List | pgsql-performance |
Hello,

I'm tuning my PostgreSQL DB (7.3.4) and have come across a query that doesn't use an index I created specially for it, and consequently takes about 2 seconds to run. :(

The ugly query looks like this (the important part is really the ORDER BY clause at the very end):

```sql
select userinfo1_.id as id0_, servicepla3_.id as id1_, account2_.id as id2_,
       passwordhi4_.id as id3_, userdemogr5_.id as id4_,
       userinfo1_.first_name as first_name0_, userinfo1_.last_name as last_name0_,
       userinfo1_.email as email0_, userinfo1_.href as href0_,
       userinfo1_.last_login_date as last_log6_0_,
       userinfo1_.login_count as login_co7_0_,
       userinfo1_.password_hint_answer as password8_0_,
       userinfo1_.create_date as create_d9_0_, userinfo1_.exp_date as exp_date0_,
       userinfo1_.type as type0_, userinfo1_.account_id as account_id0_,
       userinfo1_.plan_id as plan_id0_, userinfo1_.password_hint_id as passwor14_0_,
       userinfo1_.user_demographic_id as user_de15_0_,
       servicepla3_.name as name1_, servicepla3_.max_links as max_links1_,
       account2_.username as username2_, account2_.password as password2_,
       account2_.status as status2_,
       passwordhi4_.question as question3_,
       userdemogr5_.city as city4_, userdemogr5_.postal_code as postal_c3_4_,
       userdemogr5_.country_id as country_id4_, userdemogr5_.state_id as state_id4_,
       userdemogr5_.gender_id as gender_id4_
from user_preference userprefer0_
     inner join user_info userinfo1_ on userprefer0_.user_id = userinfo1_.id
     inner join account account2_ on userinfo1_.account_id = account2_.id
     inner join service_plan servicepla3_ on userinfo1_.plan_id = servicepla3_.id
     left outer join password_hint passwordhi4_
         on userinfo1_.password_hint_id = passwordhi4_.id
     inner join user_demographic userdemogr5_
         on userinfo1_.user_demographic_id = userdemogr5_.id,
     preference preference6_,
     preference_value preference7_
where (preference6_.name = 'allow_subscribe'
       and userprefer0_.preference_id = preference6_.id)
  and (preference7_.value = 1
       and userprefer0_.preference_value_id = preference7_.id)
order by userinfo1_.create_date desc
limit 10;
```

The output of EXPLAIN ANALYZE appears further down. Note how 99% of the total cost comes from the step with "Sort Key: userinfo1_.create_date". When I saw this, I created an index for that column:

```sql
CREATE INDEX ix_user_info_create_date ON user_info(create_date);
```

But that didn't seem to make much of a difference. The total cost did go down from about 1250 to 1099, but that's still too high.
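A quick way to sanity-check the new index (a minimal diagnostic sketch, assuming nothing beyond the tables named above; `enable_sort` is a standard planner test switch, meant for experiments only, never for production):

```sql
-- Refresh planner statistics so row estimates reflect the real data.
ANALYZE user_info;
ANALYZE user_preference;

-- Discourage explicit sorts for this session only, then see whether a
-- stripped-down ORDER BY ... LIMIT will walk ix_user_info_create_date
-- backwards instead of sorting.
SET enable_sort TO off;
EXPLAIN ANALYZE
SELECT id, create_date
FROM user_info
ORDER BY create_date DESC
LIMIT 10;
RESET enable_sort;
```

If even this stripped-down query still sorts, the index itself is the problem; if it uses the index here but not in the full query, the join estimates are what is steering the planner away from it.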
Here is the EXPLAIN ANALYZE output:

```
---------------------------------------------------------
 Limit  (cost=1099.35..1099.38 rows=10 width=222) (actual time=1914.13..1914.17 rows=10 loops=1)
   ->  Sort  (cost=1099.35..1099.43 rows=31 width=222) (actual time=1914.12..1914.14 rows=11 loops=1)
         Sort Key: userinfo1_.create_date
         ->  Hash Join  (cost=90.71..1098.60 rows=31 width=222) (actual time=20.34..1908.41 rows=767 loops=1)
               Hash Cond: ("outer".preference_value_id = "inner".id)
               ->  Hash Join  (cost=89.28..1092.58 rows=561 width=218) (actual time=19.92..1886.59 rows=768 loops=1)
                     Hash Cond: ("outer".preference_id = "inner".id)
                     ->  Hash Join  (cost=88.10..1045.14 rows=7850 width=214) (actual time=19.44..1783.47 rows=9984 loops=1)
                           Hash Cond: ("outer".user_demographic_id = "inner".id)
                           ->  Hash Join  (cost=72.59..864.51 rows=8933 width=190) (actual time=14.83..1338.15 rows=9984 loops=1)
                                 Hash Cond: ("outer".password_hint_id = "inner".id)
                                 ->  Hash Join  (cost=71.50..726.87 rows=8933 width=161) (actual time=14.53..1039.69 rows=9984 loops=1)
                                       Hash Cond: ("outer".plan_id = "inner".id)
                                       ->  Hash Join  (cost=70.42..569.46 rows=8933 width=144) (actual time=14.26..700.80 rows=9984 loops=1)
                                             Hash Cond: ("outer".account_id = "inner".id)
                                             ->  Hash Join  (cost=53.83..390.83 rows=10073 width=116) (actual time=9.67..373.71 rows=9984 loops=1)
                                                   Hash Cond: ("outer".user_id = "inner".id)
                                                   ->  Seq Scan on user_preference userprefer0_  (cost=0.00..160.73 rows=10073 width=12) (actual time=0.09..127.64 rows=9984 loops=1)
                                                   ->  Hash  (cost=51.66..51.66 rows=866 width=104) (actual time=9.40..9.40 rows=0 loops=1)
                                                         ->  Seq Scan on user_info userinfo1_  (cost=0.00..51.66 rows=866 width=104) (actual time=0.12..7.15 rows=768 loops=1)
                                             ->  Hash  (cost=14.68..14.68 rows=768 width=28) (actual time=4.45..4.45 rows=0 loops=1)
                                                   ->  Seq Scan on account account2_  (cost=0.00..14.68 rows=768 width=28) (actual time=0.10..2.56 rows=768 loops=1)
                                       ->  Hash  (cost=1.06..1.06 rows=6 width=17) (actual time=0.13..0.13 rows=0 loops=1)
                                             ->  Seq Scan on service_plan servicepla3_  (cost=0.00..1.06 rows=6 width=17) (actual time=0.10..0.11 rows=6 loops=1)
                                 ->  Hash  (cost=1.07..1.07 rows=7 width=29) (actual time=0.15..0.15 rows=0 loops=1)
                                       ->  Seq Scan on password_hint passwordhi4_  (cost=0.00..1.07 rows=7 width=29) (actual time=0.11..0.13 rows=7 loops=1)
                           ->  Hash  (cost=13.61..13.61 rows=761 width=24) (actual time=4.46..4.46 rows=0 loops=1)
                                 ->  Seq Scan on user_demographic userdemogr5_  (cost=0.00..13.61 rows=761 width=24) (actual time=0.10..2.73 rows=769 loops=1)
                     ->  Hash  (cost=1.18..1.18 rows=1 width=4) (actual time=0.16..0.16 rows=0 loops=1)
                           ->  Seq Scan on preference preference6_  (cost=0.00..1.18 rows=1 width=4) (actual time=0.14..0.15 rows=1 loops=1)
                                 Filter: (name = 'allow_subscribe'::character varying)
               ->  Hash  (cost=1.43..1.43 rows=2 width=4) (actual time=0.23..0.23 rows=0 loops=1)
                     ->  Seq Scan on preference_value preference7_  (cost=0.00..1.43 rows=2 width=4) (actual time=0.17..0.21 rows=3 loops=1)
                           Filter: ((value)::text = '1'::text)
 Total runtime: 1914.91 msec
(35 rows)
```

There are a few Seq Scans, but they are benign, as their low cost shows: they cover very small, 'static' tables (e.g. the country list, state list, and preference names list).

Does anyone have any ideas how I could speed up this query?

Thanks,
Otis
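One direction worth exploring, sketched below as an untested rewrite rather than a known fix: the actual times in the plan show the chain of hash joins feeding the sort handling roughly 10,000 rows each, so pushing the preference filter and the ORDER BY ... LIMIT into a small derived table first would let the five lookup-table joins see only ten rows. The alias `top10` is invented for illustration, and the sketch assumes each user has at most one matching `user_preference` row (otherwise the inner query needs a DISTINCT).

```sql
-- Untested sketch: resolve the filter and LIMIT in a derived table, then
-- fetch the wide rows for just those ten users. "top10" is a hypothetical
-- alias; all table and column names are taken from the original query.
SELECT userinfo1_.id AS id0_,
       -- ... same select list as the original query ...
       userdemogr5_.gender_id AS gender_id4_
FROM (SELECT ui.id, ui.create_date
      FROM user_preference up
           INNER JOIN user_info ui ON up.user_id = ui.id
           INNER JOIN preference p ON up.preference_id = p.id
           INNER JOIN preference_value pv ON up.preference_value_id = pv.id
      WHERE p.name = 'allow_subscribe'
        AND pv.value = 1
      ORDER BY ui.create_date DESC
      LIMIT 10) AS top10
     INNER JOIN user_info userinfo1_ ON userinfo1_.id = top10.id
     INNER JOIN account account2_ ON userinfo1_.account_id = account2_.id
     INNER JOIN service_plan servicepla3_ ON userinfo1_.plan_id = servicepla3_.id
     LEFT OUTER JOIN password_hint passwordhi4_
         ON userinfo1_.password_hint_id = passwordhi4_.id
     INNER JOIN user_demographic userdemogr5_
         ON userinfo1_.user_demographic_id = userdemogr5_.id
ORDER BY top10.create_date DESC;
```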