Re: Is a better way to have the same result of this query? - Mailing list pgsql-performance

From Vernon Wu
Subject Re: Is a better way to have the same result of this query?
Date
Msg-id PKIGWVKHYUZT63MHQM42A01BTSWUED.3def9e92@kimiko
Whole thread Raw
In response to Re: Is a better way to have the same result of this query?  (Jochem van Dieten <jochemd@oli.tudelft.nl>)
Responses Re: Is a better way to have the same result of this query?
List pgsql-performance
Jochem,

Thanks for your suggestion/information.

The followings are the analyise outcomes after I did some modifications with the query. My finding is that the new
query 
does improve the performance according to the plan. The actual time is reversed might due to the fact the test data is
very small (the machine is a very old one by the way). The userid is the key for all tables and the gender is indexed.
DoI  
also index the country and province to improve the preformance?

The modified query with the suggested flatting query.

Nested Loop  (cost=0.00..91.97 rows=995 width=445) (actual time=1.00..3.00 rows=2 loops=1)
  ->  Nested Loop  (cost=0.00..62.02 rows=1 width=445) (actual time=1.00..3.00 rows=2 loops=1)
        ->  Nested Loop  (cost=0.00..34.68 rows=1 width=378) (actual time=1.00..3.00 rows=3 loops=1)
              ->  Nested Loop  (cost=0.00..29.84 rows=1 width=366) (actual time=1.00..3.00 rows=3 loops=1)
                    ->  Seq Scan on account a  (cost=0.00..25.00 rows=1 width=289) (actual time=0.00..0.00 rows=3
loops=1)
                    ->  Index Scan using pk_preference on preference f  (cost=0.00..4.82 rows=1 width=77) (actual time=
0.67..1.00 rows=1 loops=3)
              ->  Index Scan using pk_profile on profile p1  (cost=0.00..4.82 rows=1 width=12) (actual time=0.00..0.00
rows=
1 loops=3)
        ->  Index Scan using pk_profile on profile p  (cost=0.00..27.33 rows=1 width=67) (actual time=0.00..0.00 rows=1

loops=3)
              SubPlan
                ->  Materialize  (cost=22.50..22.50 rows=5 width=55) (actual time=0.00..0.00 rows=0 loops=2)
                      ->  Seq Scan on block b  (cost=0.00..22.50 rows=5 width=55) (actual time=0.00..0.00 rows=0
loops=1)
  ->  Seq Scan on block  (cost=0.00..20.00 rows=995 width=0) (actual time=0.00..0.00 rows=1 loops=2)
Total runtime: 5.00 msec

After replacing "p.userid NOT IN" with "NOT EXISTS":

Result  (cost=0.00..61.56 rows=995 width=445) (actual time=3.00..4.00 rows=2 loops=1)
  InitPlan
    ->  Seq Scan on block b  (cost=0.00..22.50 rows=5 width=55) (actual time=0.00..0.00 rows=0 loops=1)
  ->  Nested Loop  (cost=0.00..61.56 rows=995 width=445) (actual time=3.00..4.00 rows=2 loops=1)
        ->  Nested Loop  (cost=0.00..31.61 rows=1 width=445) (actual time=3.00..4.00 rows=2 loops=1)
              ->  Nested Loop  (cost=0.00..26.77 rows=1 width=433) (actual time=2.00..3.00 rows=2 loops=1)
                    ->  Nested Loop  (cost=0.00..21.93 rows=1 width=356) (actual time=2.00..2.00 rows=2 loops=1)
                          ->  Index Scan using profile_sex_idx on profile p  (cost=0.00..17.09 rows=1 width=67) (actual
time=
1.00..1.00 rows=2 loops=1)
                          ->  Index Scan using pk_account on account a  (cost=0.00..4.83 rows=1 width=289) (actual
time=
0.50..0.50 rows=1 loops=2)
                    ->  Index Scan using pk_preference on preference f  (cost=0.00..4.82 rows=1 width=77) (actual time=
0.50..0.50 rows=1 loops=2)
              ->  Index Scan using pk_profile on profile p1  (cost=0.00..4.82 rows=1 width=12) (actual time=0.50..0.50
rows=
1 loops=2)
        ->  Seq Scan on block  (cost=0.00..20.00 rows=995 width=0) (actual time=0.00..0.00 rows=1 loops=2)
Total runtime: 7.00 msec

After vacuum analyze:

Result  (cost=3.19..5.29 rows=1 width=91) (actual time=3.00..4.00 rows=2 loops=1)
  InitPlan
    ->  Seq Scan on block b  (cost=0.00..1.01 rows=1 width=7) (actual time=0.00..0.00 rows=0 loops=1)
  ->  Nested Loop  (cost=3.19..5.29 rows=1 width=91) (actual time=3.00..4.00 rows=2 loops=1)
        ->  Hash Join  (cost=3.19..4.27 rows=1 width=91) (actual time=3.00..3.00 rows=2 loops=1)
              ->  Hash Join  (cost=2.13..3.20 rows=1 width=72) (actual time=2.00..2.00 rows=3 loops=1)
                    ->  Seq Scan on account a  (cost=0.00..1.04 rows=3 width=31) (actual time=1.00..1.00 rows=3
loops=1)
                    ->  Hash  (cost=2.13..2.13 rows=1 width=41) (actual time=0.00..0.00 rows=0 loops=1)
                          ->  Nested Loop  (cost=0.00..2.13 rows=1 width=41) (actual time=0.00..0.00 rows=3 loops=1)
                                ->  Seq Scan on profile p1  (cost=0.00..1.04 rows=1 width=12) (actual time=0.00..0.00
rows=1 
loops=1)
                                ->  Seq Scan on preference f  (cost=0.00..1.03 rows=3 width=29) (actual time=0.00..0.00
rows=3 
loops=1)
              ->  Hash  (cost=1.05..1.05 rows=2 width=19) (actual time=1.00..1.00 rows=0 loops=1)
                    ->  Seq Scan on profile p  (cost=0.00..1.05 rows=2 width=19) (actual time=1.00..1.00 rows=2
loops=1)
        ->  Seq Scan on block  (cost=0.00..1.01 rows=1 width=0) (actual time=0.00..0.00 rows=1 loops=2)
Total runtime: 7.00 msec

The original query

Nested Loop  (cost=0.00..127.12 rows=995 width=894) (actual time=1.00..2.00 rows=2 loops=1)
  ->  Nested Loop  (cost=0.00..97.17 rows=1 width=894) (actual time=1.00..1.00 rows=2 loops=1)
        ->  Seq Scan on account a  (cost=0.00..25.00 rows=1 width=289) (actual time=0.00..0.00 rows=3 loops=1)
        ->  Index Scan using pk_profile on profile p  (cost=0.00..72.16 rows=1 width=605) (actual time=0.33..0.33
rows=1 
loops=3)
              SubPlan
                ->  Materialize  (cost=22.50..22.50 rows=5 width=55) (actual time=0.00..0.00 rows=0 loops=2)
                      ->  Seq Scan on block b  (cost=0.00..22.50 rows=5 width=55) (actual time=0.00..0.00 rows=0
loops=1)
                ->  Materialize  (cost=44.82..44.82 rows=111 width=89) (actual time=0.50..0.50 rows=1 loops=2)
                      ->  Nested Loop  (cost=0.00..44.82 rows=111 width=89) (actual time=0.00..0.00 rows=3 loops=1)
                            ->  Index Scan using pk_profile on profile p1  (cost=0.00..4.82 rows=1 width=12) (actual
time=
0.00..0.00 rows=1 loops=1)
                            ->  Seq Scan on preference f  (cost=0.00..20.00 rows=1000 width=77) (actual time=0.00..0.00
rows=
3 loops=1)
  ->  Seq Scan on block  (cost=0.00..20.00 rows=995 width=0) (actual time=0.00..0.00 rows=1 loops=2)
Total runtime: 4.00 msec

After replacing "p.userid NOT IN" with "NOT EXISTS":

Result  (cost=0.00..104.62 rows=995 width=894) (actual time=1.00..2.00 rows=2 loops=1)
  InitPlan
    ->  Seq Scan on block b  (cost=0.00..22.50 rows=5 width=55) (actual time=0.00..0.00 rows=0 loops=1)
  ->  Nested Loop  (cost=0.00..104.62 rows=995 width=894) (actual time=1.00..1.00 rows=2 loops=1)
        ->  Nested Loop  (cost=0.00..74.67 rows=1 width=894) (actual time=1.00..1.00 rows=2 loops=1)
              ->  Seq Scan on account a  (cost=0.00..25.00 rows=1 width=289) (actual time=0.00..0.00 rows=3 loops=1)
              ->  Index Scan using pk_profile on profile p  (cost=0.00..49.66 rows=1 width=605) (actual time=0.33..0.33

rows=1 loops=3)
                    SubPlan
                      ->  Materialize  (cost=44.82..44.82 rows=111 width=89) (actual time=0.50..0.50 rows=1 loops=2)
                            ->  Nested Loop  (cost=0.00..44.82 rows=111 width=89) (actual time=0.00..1.00 rows=3
loops=1)
                                  ->  Index Scan using pk_profile on profile p1 (cost=0.00..4.82 rows=1 width=12)
(actualtime= 
0.00..0.00 rows=1 loops=1)
                                  ->  Seq Scan on preference f  (cost=0.00..20.00 rows=1000 width=77) (actual
time=0.00..1.00 
rows=3 loops=1)
        ->  Seq Scan on block  (cost=0.00..20.00 rows=995 width=0) (actual time=0.00..0.00 rows=1 loops=2)
Total runtime: 4.00 msec

After vacuum analyze:

Result  (cost=7.30..9.39 rows=1 width=63) (actual time=3.00..3.00 rows=2 loops=1)
  InitPlan
    ->  Seq Scan on block b  (cost=0.00..1.01 rows=1 width=7) (actual time=0.00..0.00 rows=0 loops=1)
  ->  Nested Loop  (cost=7.30..9.39 rows=1 width=63) (actual time=3.00..3.00 rows=2 loops=1)
        ->  Seq Scan on block  (cost=0.00..1.01 rows=1 width=0) (actual time=0.00..0.00 rows=1 loops=1)
        ->  Materialize  (cost=8.37..8.37 rows=1 width=63) (actual time=3.00..3.00 rows=2 loops=1)
              ->  Hash Join  (cost=7.30..8.37 rows=1 width=63) (actual time=2.00..3.00 rows=2 loops=1)
                    ->  Seq Scan on account a  (cost=0.00..1.04 rows=3 width=31) (actual time=0.00..0.00 rows=3
loops=1)
                    ->  Hash  (cost=7.30..7.30 rows=1 width=32) (actual time=2.00..2.00 rows=0 loops=1)
                          ->  Index Scan using profile_sex_idx on profile p  (cost=0.00..7.30 rows=1 width=32) (actual
time=
2.00..2.00 rows=2 loops=1)
                                SubPlan
                                  ->  Materialize  (cost=2.13..2.13 rows=1 width=41) (actual time=0.50..0.50 rows=1
loops=2)
                                        ->  Nested Loop  (cost=0.00..2.13 rows=1 width=41) (actual time=1.00..1.00
rows=3loops=1) 
                                              ->  Seq Scan on profile p1  (cost=0.00..1.04 rows=1 width=12) (actual
time=0.00..0.00 
rows=1 loops=1)
                                              ->  Seq Scan on preference f  (cost=0.00..1.03 rows=3 width=29) (actual
time=0.00..0.00 
rows=3 loops=1)
Total runtime: 4.00 msec


12/5/2002 2:01:15 AM, Jochem van Dieten <jochemd@oli.tudelft.nl> wrote:

>Vernon Wu wrote:
>>
>> SELECT p.userid, p.year, a.country, a.province, a.city
>> FROM profile p, account a
>> WHERE p.userid=a.userid AND
>>     (p.year BETWEEN 1961 AND 1976) AND
>>     a.country='CA' AND
>>     a.province='BC' AND
>>     p.gender='f' AND
>>     p.userid NOT IN (SELECT b.userid FROM block b WHERE b.personid='Joe') AND
>>     block.userid IS NOT NULL AND
>>     p.userid IN
>>     (SELECT f.userid FROM preference f, profile p1 WHERE p1.userid='Joe' AND 2002-p1.year BETWEEN
>>     f.minage AND f.maxage)
>
>You might want to flatten this into more joins and less subqueries,
>especially since you are using IN which is not very optimized:
>
>SELECT p.userid, p.year, a.country, a.province, a.city
>FROM profile p, account a, preference f, profile p1
>WHERE
>    f.userid = p.userid AND
>    p.userid=a.userid AND
>    (p.year BETWEEN 1961 AND 1976) AND
>    a.country='CA' AND
>    a.province='BC' AND
>    p.gender='f' AND
>    p.userid NOT IN (SELECT b.userid FROM block b WHERE b.personid='Joe') AND
>    block.userid IS NOT NULL AND
>    p1.userid='Joe' AND
>    2002-p1.year BETWEEN f.minage AND f.maxage
>
>Also, I am not sure about the NOT IN. If you can rewrite it using EXISTS
>try that, it might be faster.
>
>
>> Nested Loop  (cost=0.00..127.12 rows=995 width=894)
>>   ->  Nested Loop  (cost=0.00..97.17 rows=1 width=894)
>>         ->  Seq Scan on account a  (cost=0.00..25.00 rows=1 width=289)
>>         ->  Index Scan using pk_profile on profile p  (cost=0.00..72.16 rows=1 width=605)
>>               SubPlan
>>                 ->  Materialize  (cost=22.50..22.50 rows=5 width=55)
>>                       ->  Seq Scan on block b  (cost=0.00..22.50 rows=5 width=55
>> )
>>                 ->  Materialize  (cost=44.82..44.82 rows=111 width=89)
>>                       ->  Nested Loop  (cost=0.00..44.82 rows=111 width=89)
>>                             ->  Index Scan using pk_profile on profile p1  (cost=0.00..4.82 rows=1 width=12)
>>                             ->  Seq Scan on preference f  (cost=0.00..20.00 rows=1000 width=77)
>
>rows=1000 usually indicates you didn't vacuum analyze. Did you?
>
>>   ->  Seq Scan on block  (cost=0.00..20.00 rows=995 width=0)
>
>And to add to Vernons questions: if you are using PostgreSQL 7.2 or
>later, please send us the EXPLAIN ANALYZE output.
>
>Jochem
>
>




pgsql-performance by date:

Previous
From: Stephan Szabo
Date:
Subject: Re: Index question with LIKE keyword
Next
From: Vernon Wu
Date:
Subject: Re: Is a better way to have the same result of this