Re: Is a better way to have the same result of this query? - Mailing list pgsql-performance
From | Vernon Wu |
---|---|
Subject | Re: Is a better way to have the same result of this query? |
Date | |
Msg-id | PKIGWVKHYUZT63MHQM42A01BTSWUED.3def9e92@kimiko Whole thread Raw |
In response to | Re: Is a better way to have the same result of this query? (Jochem van Dieten <jochemd@oli.tudelft.nl>) |
Responses |
Re: Is a better way to have the same result of this query?
|
List | pgsql-performance |
Jochem, Thanks for your suggestion/information. The followings are the analyise outcomes after I did some modifications with the query. My finding is that the new query does improve the performance according to the plan. The actual time is reversed might due to the fact the test data is very small (the machine is a very old one by the way). The userid is the key for all tables and the gender is indexed. DoI also index the country and province to improve the preformance? The modified query with the suggested flatting query. Nested Loop (cost=0.00..91.97 rows=995 width=445) (actual time=1.00..3.00 rows=2 loops=1) -> Nested Loop (cost=0.00..62.02 rows=1 width=445) (actual time=1.00..3.00 rows=2 loops=1) -> Nested Loop (cost=0.00..34.68 rows=1 width=378) (actual time=1.00..3.00 rows=3 loops=1) -> Nested Loop (cost=0.00..29.84 rows=1 width=366) (actual time=1.00..3.00 rows=3 loops=1) -> Seq Scan on account a (cost=0.00..25.00 rows=1 width=289) (actual time=0.00..0.00 rows=3 loops=1) -> Index Scan using pk_preference on preference f (cost=0.00..4.82 rows=1 width=77) (actual time= 0.67..1.00 rows=1 loops=3) -> Index Scan using pk_profile on profile p1 (cost=0.00..4.82 rows=1 width=12) (actual time=0.00..0.00 rows= 1 loops=3) -> Index Scan using pk_profile on profile p (cost=0.00..27.33 rows=1 width=67) (actual time=0.00..0.00 rows=1 loops=3) SubPlan -> Materialize (cost=22.50..22.50 rows=5 width=55) (actual time=0.00..0.00 rows=0 loops=2) -> Seq Scan on block b (cost=0.00..22.50 rows=5 width=55) (actual time=0.00..0.00 rows=0 loops=1) -> Seq Scan on block (cost=0.00..20.00 rows=995 width=0) (actual time=0.00..0.00 rows=1 loops=2) Total runtime: 5.00 msec After replacing "p.userid NOT IN" with "NOT EXISTS": Result (cost=0.00..61.56 rows=995 width=445) (actual time=3.00..4.00 rows=2 loops=1) InitPlan -> Seq Scan on block b (cost=0.00..22.50 rows=5 width=55) (actual time=0.00..0.00 rows=0 loops=1) -> Nested Loop (cost=0.00..61.56 rows=995 width=445) (actual time=3.00..4.00 rows=2 loops=1) -> Nested Loop (cost=0.00..31.61 rows=1 width=445) (actual time=3.00..4.00 rows=2 loops=1) -> Nested Loop (cost=0.00..26.77 rows=1 width=433) (actual time=2.00..3.00 rows=2 loops=1) -> Nested Loop (cost=0.00..21.93 rows=1 width=356) (actual time=2.00..2.00 rows=2 loops=1) -> Index Scan using profile_sex_idx on profile p (cost=0.00..17.09 rows=1 width=67) (actual time= 1.00..1.00 rows=2 loops=1) -> Index Scan using pk_account on account a (cost=0.00..4.83 rows=1 width=289) (actual time= 0.50..0.50 rows=1 loops=2) -> Index Scan using pk_preference on preference f (cost=0.00..4.82 rows=1 width=77) (actual time= 0.50..0.50 rows=1 loops=2) -> Index Scan using pk_profile on profile p1 (cost=0.00..4.82 rows=1 width=12) (actual time=0.50..0.50 rows= 1 loops=2) -> Seq Scan on block (cost=0.00..20.00 rows=995 width=0) (actual time=0.00..0.00 rows=1 loops=2) Total runtime: 7.00 msec After vacuum analyze: Result (cost=3.19..5.29 rows=1 width=91) (actual time=3.00..4.00 rows=2 loops=1) InitPlan -> Seq Scan on block b (cost=0.00..1.01 rows=1 width=7) (actual time=0.00..0.00 rows=0 loops=1) -> Nested Loop (cost=3.19..5.29 rows=1 width=91) (actual time=3.00..4.00 rows=2 loops=1) -> Hash Join (cost=3.19..4.27 rows=1 width=91) (actual time=3.00..3.00 rows=2 loops=1) -> Hash Join (cost=2.13..3.20 rows=1 width=72) (actual time=2.00..2.00 rows=3 loops=1) -> Seq Scan on account a (cost=0.00..1.04 rows=3 width=31) (actual time=1.00..1.00 rows=3 loops=1) -> Hash (cost=2.13..2.13 rows=1 width=41) (actual time=0.00..0.00 rows=0 loops=1) -> Nested Loop (cost=0.00..2.13 rows=1 width=41) (actual time=0.00..0.00 rows=3 loops=1) -> Seq Scan on profile p1 (cost=0.00..1.04 rows=1 width=12) (actual time=0.00..0.00 rows=1 loops=1) -> Seq Scan on preference f (cost=0.00..1.03 rows=3 width=29) (actual time=0.00..0.00 rows=3 loops=1) -> Hash (cost=1.05..1.05 rows=2 width=19) (actual time=1.00..1.00 rows=0 loops=1) -> Seq Scan on profile p (cost=0.00..1.05 rows=2 width=19) (actual time=1.00..1.00 rows=2 loops=1) -> Seq Scan on block (cost=0.00..1.01 rows=1 width=0) (actual time=0.00..0.00 rows=1 loops=2) Total runtime: 7.00 msec The original query Nested Loop (cost=0.00..127.12 rows=995 width=894) (actual time=1.00..2.00 rows=2 loops=1) -> Nested Loop (cost=0.00..97.17 rows=1 width=894) (actual time=1.00..1.00 rows=2 loops=1) -> Seq Scan on account a (cost=0.00..25.00 rows=1 width=289) (actual time=0.00..0.00 rows=3 loops=1) -> Index Scan using pk_profile on profile p (cost=0.00..72.16 rows=1 width=605) (actual time=0.33..0.33 rows=1 loops=3) SubPlan -> Materialize (cost=22.50..22.50 rows=5 width=55) (actual time=0.00..0.00 rows=0 loops=2) -> Seq Scan on block b (cost=0.00..22.50 rows=5 width=55) (actual time=0.00..0.00 rows=0 loops=1) -> Materialize (cost=44.82..44.82 rows=111 width=89) (actual time=0.50..0.50 rows=1 loops=2) -> Nested Loop (cost=0.00..44.82 rows=111 width=89) (actual time=0.00..0.00 rows=3 loops=1) -> Index Scan using pk_profile on profile p1 (cost=0.00..4.82 rows=1 width=12) (actual time= 0.00..0.00 rows=1 loops=1) -> Seq Scan on preference f (cost=0.00..20.00 rows=1000 width=77) (actual time=0.00..0.00 rows= 3 loops=1) -> Seq Scan on block (cost=0.00..20.00 rows=995 width=0) (actual time=0.00..0.00 rows=1 loops=2) Total runtime: 4.00 msec After replacing "p.userid NOT IN" with "NOT EXISTS": Result (cost=0.00..104.62 rows=995 width=894) (actual time=1.00..2.00 rows=2 loops=1) InitPlan -> Seq Scan on block b (cost=0.00..22.50 rows=5 width=55) (actual time=0.00..0.00 rows=0 loops=1) -> Nested Loop (cost=0.00..104.62 rows=995 width=894) (actual time=1.00..1.00 rows=2 loops=1) -> Nested Loop (cost=0.00..74.67 rows=1 width=894) (actual time=1.00..1.00 rows=2 loops=1) -> Seq Scan on account a (cost=0.00..25.00 rows=1 width=289) (actual time=0.00..0.00 rows=3 loops=1) -> Index Scan using pk_profile on profile p (cost=0.00..49.66 rows=1 width=605) (actual time=0.33..0.33 rows=1 loops=3) SubPlan -> Materialize (cost=44.82..44.82 rows=111 width=89) (actual time=0.50..0.50 rows=1 loops=2) -> Nested Loop (cost=0.00..44.82 rows=111 width=89) (actual time=0.00..1.00 rows=3 loops=1) -> Index Scan using pk_profile on profile p1 (cost=0.00..4.82 rows=1 width=12) (actualtime= 0.00..0.00 rows=1 loops=1) -> Seq Scan on preference f (cost=0.00..20.00 rows=1000 width=77) (actual time=0.00..1.00 rows=3 loops=1) -> Seq Scan on block (cost=0.00..20.00 rows=995 width=0) (actual time=0.00..0.00 rows=1 loops=2) Total runtime: 4.00 msec After vacuum analyze: Result (cost=7.30..9.39 rows=1 width=63) (actual time=3.00..3.00 rows=2 loops=1) InitPlan -> Seq Scan on block b (cost=0.00..1.01 rows=1 width=7) (actual time=0.00..0.00 rows=0 loops=1) -> Nested Loop (cost=7.30..9.39 rows=1 width=63) (actual time=3.00..3.00 rows=2 loops=1) -> Seq Scan on block (cost=0.00..1.01 rows=1 width=0) (actual time=0.00..0.00 rows=1 loops=1) -> Materialize (cost=8.37..8.37 rows=1 width=63) (actual time=3.00..3.00 rows=2 loops=1) -> Hash Join (cost=7.30..8.37 rows=1 width=63) (actual time=2.00..3.00 rows=2 loops=1) -> Seq Scan on account a (cost=0.00..1.04 rows=3 width=31) (actual time=0.00..0.00 rows=3 loops=1) -> Hash (cost=7.30..7.30 rows=1 width=32) (actual time=2.00..2.00 rows=0 loops=1) -> Index Scan using profile_sex_idx on profile p (cost=0.00..7.30 rows=1 width=32) (actual time= 2.00..2.00 rows=2 loops=1) SubPlan -> Materialize (cost=2.13..2.13 rows=1 width=41) (actual time=0.50..0.50 rows=1 loops=2) -> Nested Loop (cost=0.00..2.13 rows=1 width=41) (actual time=1.00..1.00 rows=3loops=1) -> Seq Scan on profile p1 (cost=0.00..1.04 rows=1 width=12) (actual time=0.00..0.00 rows=1 loops=1) -> Seq Scan on preference f (cost=0.00..1.03 rows=3 width=29) (actual time=0.00..0.00 rows=3 loops=1) Total runtime: 4.00 msec 12/5/2002 2:01:15 AM, Jochem van Dieten <jochemd@oli.tudelft.nl> wrote: >Vernon Wu wrote: >> >> SELECT p.userid, p.year, a.country, a.province, a.city >> FROM profile p, account a >> WHERE p.userid=a.userid AND >> (p.year BETWEEN 1961 AND 1976) AND >> a.country='CA' AND >> a.province='BC' AND >> p.gender='f' AND >> p.userid NOT IN (SELECT b.userid FROM block b WHERE b.personid='Joe') AND >> block.userid IS NOT NULL AND >> p.userid IN >> (SELECT f.userid FROM preference f, profile p1 WHERE p1.userid='Joe' AND 2002-p1.year BETWEEN >> f.minage AND f.maxage) > >You might want to flatten this into more joins and less subqueries, >especially since you are using IN which is not very optimized: > >SELECT p.userid, p.year, a.country, a.province, a.city >FROM profile p, account a, preference f, profile p1 >WHERE > f.userid = p.userid AND > p.userid=a.userid AND > (p.year BETWEEN 1961 AND 1976) AND > a.country='CA' AND > a.province='BC' AND > p.gender='f' AND > p.userid NOT IN (SELECT b.userid FROM block b WHERE b.personid='Joe') AND > block.userid IS NOT NULL AND > p1.userid='Joe' AND > 2002-p1.year BETWEEN f.minage AND f.maxage > >Also, I am not sure about the NOT IN. If you can rewrite it using EXISTS >try that, it might be faster. > > >> Nested Loop (cost=0.00..127.12 rows=995 width=894) >> -> Nested Loop (cost=0.00..97.17 rows=1 width=894) >> -> Seq Scan on account a (cost=0.00..25.00 rows=1 width=289) >> -> Index Scan using pk_profile on profile p (cost=0.00..72.16 rows=1 width=605) >> SubPlan >> -> Materialize (cost=22.50..22.50 rows=5 width=55) >> -> Seq Scan on block b (cost=0.00..22.50 rows=5 width=55 >> ) >> -> Materialize (cost=44.82..44.82 rows=111 width=89) >> -> Nested Loop (cost=0.00..44.82 rows=111 width=89) >> -> Index Scan using pk_profile on profile p1 (cost=0.00..4.82 rows=1 width=12) >> -> Seq Scan on preference f (cost=0.00..20.00 rows=1000 width=77) > >rows=1000 usually indicates you didn't vacuum analyze. Did you? > >> -> Seq Scan on block (cost=0.00..20.00 rows=995 width=0) > >And to add to Vernons questions: if you are using PostgreSQL 7.2 or >later, please send us the EXPLAIN ANALYZE output. > >Jochem > >
pgsql-performance by date: