avoiding seqscan? - Mailing list pgsql-performance

From Palle Girgensohn
Subject avoiding seqscan?
Date
Msg-id 95900000.1064782481@palle.girgensohn.se
Whole thread Raw
Responses Re: avoiding seqscan?  (Josh Berkus <josh@agliodbs.com>)
Re: avoiding seqscan?  (Gaetano Mendola <mendola@bigfoot.com>)
List pgsql-performance
Hi!

I have a SQL statement that I cannot get to use the index. postgresql
insists on using a seqscan and performance is very poor. set enable_seqscan
= true boost performance drastically, as you can see below. Since seqscan
is not always bad, I'd rather not turn it off completely, but rather get
the planner to do the right thing here. Is there another way to do this,
apart from setting enable_seqscan=false?

Thanks,
Palle


the tables are:
person  with about 30000 tuples
dyn_field_content_person, 331156 tuples
dyn_field_person, just 15 tuples
course about 700 tuples
partitcipant with ~ 7800 tuples, where ~ 60 have course_id=707...

uu=# explain analyze
uu-#  select lower(substr(p.last_name,1,1)) as letter, count(*)
uu-#  FROM course c join group_data gd on (c.active_group_id =
gd.this_group_id)
uu-#       join person p on (gd.item_text = p.userid)
uu-#       join dyn_field_person dfp on (dfp.extern_item_id = 10 and
dfp.giver=c.giver)
uu-#       join dyn_field_content_person dfcp on (dfp.id =
dfcp.dyn_field_id and dfcp.userid=p.userid)
uu-#       left outer join participant pt on (pt.userid = p.userid and
pt.course_id = 707)
uu-#  WHERE c.id = 707
uu-#  group by 1
uu-# ;

QUERY PLAN


---------------------------------------------------------------------------
---------------------------------------------------------------------------
---------------------------------------
-------------------
 Aggregate  (cost=10496.30..10498.35 rows=27 width=106) (actual
time=4166.01..4167.23 rows=19 loops=1)
   ->  Group  (cost=10496.30..10497.67 rows=273 width=106) (actual
time=4165.92..4166.80 rows=60 loops=1)
         ->  Sort  (cost=10496.30..10496.98 rows=273 width=106) (actual
time=4165.91..4166.10 rows=60 loops=1)
               Sort Key: lower(substr(p.last_name, 1, 1))
               ->  Merge Join  (cost=10443.75..10485.23 rows=273 width=106)
(actual time=4094.42..4165.20 rows=60 loops=1)
                     Merge Cond: ("outer".userid = "inner".userid)
                     Join Filter: ("inner".course_id = 707)
                     ->  Sort  (cost=9803.86..9804.54 rows=273 width=88)
(actual time=3823.78..3823.97 rows=60 loops=1)
                           Sort Key: dfcp.userid
                           ->  Hash Join  (cost=2444.22..9792.79 rows=273
width=88) (actual time=1140.50..3822.60 rows=60 loops=1)
                                 Hash Cond: ("outer".userid =
"inner".item_text)
                                 Join Filter: ("inner".id =
"outer".dyn_field_id)
                                 ->  Seq Scan on dyn_field_content_person
dfcp  (cost=0.00..5643.56 rows=331156 width=16) (actual time=0.01..2028.31
rows=331156 loops=1)
                                 ->  Hash  (cost=2443.54..2443.54 rows=272
width=72) (actual time=340.24..340.24 rows=0 loops=1)
                                       ->  Nested Loop
(cost=1401.84..2443.54 rows=272 width=72) (actual time=338.76..339.91
rows=60 loops=1)
                                             Join Filter: ("outer".giver =
"inner".giver)
                                             ->  Seq Scan on
dyn_field_person dfp  (cost=0.00..1.19 rows=1 width=16) (actual
time=0.06..0.09 rows=1 loops=1)
                                                   Filter: (extern_item_id
= 10)
                                             ->  Materialize
(cost=2437.67..2437.67 rows=374 width=56) (actual time=338.64..338.82
rows=60 loops=1)
                                                   ->  Hash Join
(cost=1401.84..2437.67 rows=374 width=56) (actual time=7.74..338.36 rows=60
loops=1)
                                                         Hash Cond:
("outer".userid = "inner".item_text)
                                                         ->  Seq Scan on
person p  (cost=0.00..806.09 rows=30009 width=23) (actual time=0.01..203.67
rows=30009 loops=1)
                                                         ->  Hash
(cost=1400.89..1400.89 rows=378 width=33) (actual time=1.60..1.60 rows=0
loops=1)
                                                               ->  Nested
Loop  (cost=0.00..1400.89 rows=378 width=33) (actual time=0.12..1.28
rows=60 loops=1)
                                                                     ->
Index Scan using course_pkey on course c  (cost=0.00..5.08 rows=1 width=16)
(actual time=0.06..0.06 rows=1 loops=1)

Index Cond: (id = 707)
                                                                     ->
Index Scan using group_data_this_idx on group_data gd  (cost=0.00..1390.80
rows=402 width=17) (actual time=0.04..0.6
6 rows=60 loops=1)

Index Cond: ("outer".active_group_id = gd.this_group_id)
                     ->  Sort  (cost=639.90..659.42 rows=7808 width=18)
(actual time=266.55..290.81 rows=7722 loops=1)
                           Sort Key: pt.userid
                           ->  Seq Scan on participant pt
(cost=0.00..135.08 rows=7808 width=18) (actual time=0.02..50.24 rows=7808
loops=1)
 Total runtime: 4170.16 msec
(32 rader)

Tid: 4184,68 ms
uu=# set enable_seqscan = false;
SET
Tid: 1,20 ms
uu=# explain analyze
uu-#  select lower(substr(p.last_name,1,1)) as letter, count(*)
uu-#  FROM course c join group_data gd on (c.active_group_id =
gd.this_group_id)
uu-#       join person p on (gd.item_text = p.userid)
uu-#       join dyn_field_person dfp on (dfp.extern_item_id = 10 and
dfp.giver=c.giver)
uu-#       join dyn_field_content_person dfcp on (dfp.id =
dfcp.dyn_field_id and dfcp.userid=p.userid)
uu-#       left outer join participant pt on (pt.userid = p.userid and
pt.course_id = 707)
uu-#  WHERE c.id = 707
uu-#  group by 1
uu-# ;

QUERY PLAN


---------------------------------------------------------------------------
---------------------------------------------------------------------------
---------------------------------------
---------
 Aggregate  (cost=17928.32..17930.37 rows=27 width=106) (actual
time=171.37..172.58 rows=19 loops=1)
   ->  Group  (cost=17928.32..17929.68 rows=273 width=106) (actual
time=171.27..172.14 rows=60 loops=1)
         ->  Sort  (cost=17928.32..17929.00 rows=273 width=106) (actual
time=171.26..171.45 rows=60 loops=1)
               Sort Key: lower(substr(p.last_name, 1, 1))
               ->  Merge Join  (cost=17545.53..17917.25 rows=273 width=106)
(actual time=36.64..170.53 rows=60 loops=1)
                     Merge Cond: ("outer".userid = "inner".userid)
                     Join Filter: ("inner".course_id = 707)
                     ->  Sort  (cost=17545.53..17546.22 rows=273 width=88)
(actual time=28.62..28.84 rows=60 loops=1)
                           Sort Key: dfcp.userid
                           ->  Nested Loop  (cost=0.00..17534.46 rows=273
width=88) (actual time=7.99..27.49 rows=60 loops=1)
                                 Join Filter: ("outer".id =
"inner".dyn_field_id)
                                 ->  Nested Loop  (cost=0.00..3685.31
rows=272 width=72) (actual time=7.67..8.95 rows=60 loops=1)
                                       Join Filter: ("outer".giver =
"inner".giver)
                                       ->  Index Scan using
df_person_giver_id_idx on dyn_field_person dfp  (cost=0.00..6.20 rows=1
width=16) (actual time=0.14..0.17 rows=1 loops=1)
                                             Filter: (extern_item_id = 10)
                                       ->  Materialize
(cost=3674.43..3674.43 rows=374 width=56) (actual time=7.49..7.69 rows=60
loops=1)
                                             ->  Nested Loop
(cost=0.00..3674.43 rows=374 width=56) (actual time=0.24..7.22 rows=60
loops=1)
                                                   ->  Nested Loop
(cost=0.00..1400.89 rows=378 width=33) (actual time=0.10..1.34 rows=60
loops=1)
                                                         ->  Index Scan
using course_pkey on course c  (cost=0.00..5.08 rows=1 width=16) (actual
time=0.04..0.05 rows=1 loops=1)
                                                               Index Cond:
(id = 707)
                                                         ->  Index Scan
using group_data_this_idx on group_data gd  (cost=0.00..1390.80 rows=402
width=17) (actual time=0.04..0.70 rows=60 lo
ops=1)
                                                               Index Cond:
("outer".active_group_id = gd.this_group_id)
                                                   ->  Index Scan using
person_pkey on person p  (cost=0.00..6.01 rows=1 width=23) (actual
time=0.07..0.08 rows=1 loops=60)
                                                         Index Cond:
("outer".item_text = p.userid)
                                 ->  Index Scan using
df_content_person_userid_id_idx on dyn_field_content_person dfcp
(cost=0.00..50.75 rows=12 width=16) (actual time=0.08..0.23 rows=11 l
oops=60)
                                       Index Cond: (dfcp.userid =
"outer".item_text)
                     ->  Index Scan using participant_uid_cid_idx on
participant pt  (cost=0.00..349.76 rows=7808 width=18) (actual
time=0.07..84.34 rows=7722 loops=1)
 Total runtime: 173.37 msec
(28 rader)

Tid: 183,37 ms


pgsql-performance by date:

Previous
From: "Matt Clark"
Date:
Subject: Re: advice on raid controller
Next
From: Josh Berkus
Date:
Subject: Re: avoiding seqscan?