Re: Is a better way to have the same result of this query? - Mailing list pgsql-performance

From Jochem van Dieten
Subject Re: Is a better way to have the same result of this query?
Date
Msg-id 3DEF23EB.1060603@oli.tudelft.nl
Whole thread Raw
In response to Is a better way to have the same result of this query?  (Vernon Wu <vernonw@gatewaytech.com>)
Responses Re: Is a better way to have the same result of this query?  (Vernon Wu <vernonw@gatewaytech.com>)
List pgsql-performance
Vernon Wu wrote:
>
> SELECT p.userid, p.year, a.country, a.province, a.city
> FROM profile p, account a
> WHERE p.userid=a.userid AND
>     (p.year BETWEEN 1961 AND 1976) AND
>     a.country='CA' AND
>     a.province='BC' AND
>     p.gender='f' AND
>     p.userid NOT IN (SELECT b.userid FROM block b WHERE b.personid='Joe') AND
>     block.userid IS NOT NULL AND
>     p.userid IN
>     (SELECT f.userid FROM preference f, profile p1 WHERE p1.userid='Joe' AND 2002-p1.year BETWEEN
>     f.minage AND f.maxage)

You might want to flatten this into more joins and less subqueries,
especially since you are using IN which is not very optimized:

SELECT p.userid, p.year, a.country, a.province, a.city
FROM profile p, account a, preference f, profile p1
WHERE
    f.userid = p.userid AND
    p.userid=a.userid AND
    (p.year BETWEEN 1961 AND 1976) AND
    a.country='CA' AND
    a.province='BC' AND
    p.gender='f' AND
    p.userid NOT IN (SELECT b.userid FROM block b WHERE b.personid='Joe') AND
    block.userid IS NOT NULL AND
    p1.userid='Joe' AND
    2002-p1.year BETWEEN f.minage AND f.maxage

Also, I am not sure about the NOT IN. If you can rewrite it using EXISTS
try that, it might be faster.


> Nested Loop  (cost=0.00..127.12 rows=995 width=894)
>   ->  Nested Loop  (cost=0.00..97.17 rows=1 width=894)
>         ->  Seq Scan on account a  (cost=0.00..25.00 rows=1 width=289)
>         ->  Index Scan using pk_profile on profile p  (cost=0.00..72.16 rows=1 width=605)
>               SubPlan
>                 ->  Materialize  (cost=22.50..22.50 rows=5 width=55)
>                       ->  Seq Scan on block b  (cost=0.00..22.50 rows=5 width=55
> )
>                 ->  Materialize  (cost=44.82..44.82 rows=111 width=89)
>                       ->  Nested Loop  (cost=0.00..44.82 rows=111 width=89)
>                             ->  Index Scan using pk_profile on profile p1  (cost=0.00..4.82 rows=1 width=12)
>                             ->  Seq Scan on preference f  (cost=0.00..20.00 rows=1000 width=77)

rows=1000 usually indicates you didn't vacuum analyze. Did you?

>   ->  Seq Scan on block  (cost=0.00..20.00 rows=995 width=0)

And to add to Vernons questions: if you are using PostgreSQL 7.2 or
later, please send us the EXPLAIN ANALYZE output.

Jochem


pgsql-performance by date:

Previous
From: "john cartmell"
Date:
Subject: ORDER BY ... LIMIT.. performance
Next
From: "Eric Theis"
Date:
Subject: Index question with LIKE keyword