Tuning complicated query - Mailing list pgsql-sql

From Ludwig Lim
Subject Tuning complicated query
Date
Msg-id 20020926095729.45949.qmail@web80310.mail.yahoo.com
Whole thread Raw
List pgsql-sql
Hi:


   Attached to the e-mail is the body of the query and
the result of the EXPLAIN  (Sorry for not placing the
query and EXPLAIN in the e-mail body . The query is
rather complicated and the EXPLAIN result is rather
long ).
  The file demo.out.3 is the result of the EXPLAIN
  The file demo.sql is the sql statement.

    I would like your opinion on how to tune the query
as posted in the attachment

  Note that I have indexes on the all the column
customer_id on both sc_customer_attr and
sc_add_points.

  I am wondering why sequential scan was used the on
the clause   a.customer_id = b.customer_id since the
previous join condition has an "exist" subquery with
LIMIT with filters out unneccesary customer_id before
performing the join (a_customer_id = b.customer_id).

    Also I was wondering why the number of rows in the
last sequential scan is still 7 million plus (most of
the should already have been elimated by the
subquery).

    Note that before the executing the query, the
database has been VACUUMed and ANALYZEd. The result of
EXPLAIN ANALYZE is almost similar to one produce by
issuing the EXPLAIN.

    Any hints on tuning the query?

thank you

ludwig


__________________________________________________
Do you Yahoo!?
New DSL Internet Access from SBC & Yahoo!
http://sbc.yahoo.compsql:demo.sql5:31: NOTICE:  QUERY PLAN:

Limit  (cost=602630531.21..602630531.21 rows=10 width=69)
  ->  Sort  (cost=602630531.21..602630531.21 rows=218145 width=69)
        ->  Aggregate  (cost=602583597.46..602609774.87 rows=218145 width=69)
              ->  Group  (cost=602583597.46..602596686.16 rows=2181451 width=69)
                    ->  Sort  (cost=602583597.46..602583597.46 rows=2181451 width=69)
                          ->  Merge Join  (cost=602162862.44..602184219.45 rows=2181451 width=69)
                                ->  Sort  (cost=600998172.01..600998172.01 rows=52125 width=49)
                                      ->  Merge Join  (cost=600994410.42..600994904.87 rows=52125 width=49)
                                            ->  Sort  (cost=13.25..13.25 rows=302 width=25)
                                                  ->  Seq Scan on sc_attr c  (cost=0.00..3.30 rows=302 width=25)
                                            ->  Sort  (cost=600994397.17..600994397.17 rows=64485 width=24)
                                                  ->  Seq Scan on sc_customer_attr a  (cost=0.00..600990276.11
rows=64485width=24) 
                                                        SubPlan
                                                          ->  Limit  (cost=736.25..736.25 rows=1 width=20)
                                                                ->  Subquery Scan z  (cost=736.25..736.25 rows=1
width=20)
                                                                      ->  Limit  (cost=736.25..736.25 rows=1 width=20)
                                                                            ->  Sort  (cost=736.25..736.25 rows=1
width=20)
                                                                                  ->  Aggregate  (cost=0.00..736.24
rows=1width=20) 
                                                                                        ->  Group  (cost=0.00..736.24
rows=1width=20) 
                                                                                              ->  Index Scan using
xie2sc_add_pointson sc_add_points d  (cost=0.00..736.24 rows=1 width=20) 
                                ->  Sort  (cost=1164690.44..1164690.44 rows=7354200 width=20)
                                      ->  Seq Scan on sc_add_points b  (cost=0.00..138679.20 rows=7354200 width=20)

explain select  count(distinct(b.customer_id)) as members,
                sum(b.total_loyalty) as sales,
                count(b.customer_id) as visits,
                c.attr_cd,
                c.attr_type_cd,
                c.description as description
from            sc_customer_attr a,
                sc_add_points b,
                sc_attr c
where            exists (select z.customer_id
                        from (select d.customer_id,
                                     sum(d.total_loyalty) as points
                               from  sc_add_points d
                               where  d.transdate >= 19980100.000000  and
                                      d.transdate <= 20020931.000000  and
                                      d.company_cd = 1 and d.branch_cd = 13  and
a.customer_id= d.customer_id        
                               group by d.customer_id
                               order by points desc
                               limit 100 ) as z
                       )  and
                 a.attr_cd = c.attr_cd and
                 a.attr_type_cd = c.attr_type_cd and
                 a.attr_type_cd = 2           and
                 a.company_cd = c.company_cd and
                 b.customer_id = a.customer_id
group by         c.attr_type_cd,
                 c.attr_cd,
                 c.description
order by         c.description asc
LIMIT 10 OFFSET 0;

pgsql-sql by date:

Previous
From: Philip Hallstrom
Date:
Subject: Re: SQL formatter?
Next
From: John "Sebastian N. Mayordomo"
Date:
Subject: start and end of the week