Thread: avoiding seqscan?
Hi!

I have a SQL statement that I cannot get to use the index. PostgreSQL
insists on using a seqscan and performance is very poor. Setting
enable_seqscan = false boosts performance drastically, as you can see
below. Since a seqscan is not always bad, I'd rather not turn it off
completely, but rather get the planner to do the right thing here. Is
there another way to do this, apart from setting enable_seqscan = false?

Thanks,
Palle

The tables are:

  person                    about 30000 tuples
  dyn_field_content_person  331156 tuples
  dyn_field_person          just 15 tuples
  course                    about 700 tuples
  participant               ~7800 tuples, of which ~60 have course_id = 707

uu=# explain analyze
uu-# select lower(substr(p.last_name,1,1)) as letter, count(*)
uu-# FROM course c join group_data gd on (c.active_group_id = gd.this_group_id)
uu-#      join person p on (gd.item_text = p.userid)
uu-#      join dyn_field_person dfp on (dfp.extern_item_id = 10 and dfp.giver=c.giver)
uu-#      join dyn_field_content_person dfcp on (dfp.id = dfcp.dyn_field_id and dfcp.userid=p.userid)
uu-#      left outer join participant pt on (pt.userid = p.userid and pt.course_id = 707)
uu-# WHERE c.id = 707
uu-# group by 1
uu-# ;
                                    QUERY PLAN
--------------------------------------------------------------------------------------
 Aggregate  (cost=10496.30..10498.35 rows=27 width=106) (actual time=4166.01..4167.23 rows=19 loops=1)
   ->  Group  (cost=10496.30..10497.67 rows=273 width=106) (actual time=4165.92..4166.80 rows=60 loops=1)
         ->  Sort  (cost=10496.30..10496.98 rows=273 width=106) (actual time=4165.91..4166.10 rows=60 loops=1)
               Sort Key: lower(substr(p.last_name, 1, 1))
               ->  Merge Join  (cost=10443.75..10485.23 rows=273 width=106) (actual time=4094.42..4165.20 rows=60 loops=1)
                     Merge Cond: ("outer".userid = "inner".userid)
                     Join Filter: ("inner".course_id = 707)
                     ->  Sort  (cost=9803.86..9804.54 rows=273 width=88) (actual time=3823.78..3823.97 rows=60 loops=1)
                           Sort Key: dfcp.userid
                           ->  Hash Join  (cost=2444.22..9792.79 rows=273 width=88) (actual time=1140.50..3822.60 rows=60 loops=1)
                                 Hash Cond: ("outer".userid = "inner".item_text)
                                 Join Filter: ("inner".id = "outer".dyn_field_id)
                                 ->  Seq Scan on dyn_field_content_person dfcp  (cost=0.00..5643.56 rows=331156 width=16) (actual time=0.01..2028.31 rows=331156 loops=1)
                                 ->  Hash  (cost=2443.54..2443.54 rows=272 width=72) (actual time=340.24..340.24 rows=0 loops=1)
                                       ->  Nested Loop  (cost=1401.84..2443.54 rows=272 width=72) (actual time=338.76..339.91 rows=60 loops=1)
                                             Join Filter: ("outer".giver = "inner".giver)
                                             ->  Seq Scan on dyn_field_person dfp  (cost=0.00..1.19 rows=1 width=16) (actual time=0.06..0.09 rows=1 loops=1)
                                                   Filter: (extern_item_id = 10)
                                             ->  Materialize  (cost=2437.67..2437.67 rows=374 width=56) (actual time=338.64..338.82 rows=60 loops=1)
                                                   ->  Hash Join  (cost=1401.84..2437.67 rows=374 width=56) (actual time=7.74..338.36 rows=60 loops=1)
                                                         Hash Cond: ("outer".userid = "inner".item_text)
                                                         ->  Seq Scan on person p  (cost=0.00..806.09 rows=30009 width=23) (actual time=0.01..203.67 rows=30009 loops=1)
                                                         ->  Hash  (cost=1400.89..1400.89 rows=378 width=33) (actual time=1.60..1.60 rows=0 loops=1)
                                                               ->  Nested Loop  (cost=0.00..1400.89 rows=378 width=33) (actual time=0.12..1.28 rows=60 loops=1)
                                                                     ->  Index Scan using course_pkey on course c  (cost=0.00..5.08 rows=1 width=16) (actual time=0.06..0.06 rows=1 loops=1)
                                                                           Index Cond: (id = 707)
                                                                     ->  Index Scan using group_data_this_idx on group_data gd  (cost=0.00..1390.80 rows=402 width=17) (actual time=0.04..0.66 rows=60 loops=1)
                                                                           Index Cond: ("outer".active_group_id = gd.this_group_id)
                     ->  Sort  (cost=639.90..659.42 rows=7808 width=18) (actual time=266.55..290.81 rows=7722 loops=1)
                           Sort Key: pt.userid
                           ->  Seq Scan on participant pt  (cost=0.00..135.08 rows=7808 width=18) (actual time=0.02..50.24 rows=7808 loops=1)
 Total runtime: 4170.16 msec
(32 rows)

Time: 4184.68 ms

uu=# set enable_seqscan = false;
SET
Time: 1.20 ms

uu=# explain analyze
uu-# select lower(substr(p.last_name,1,1)) as letter, count(*)
uu-# FROM course c join group_data gd on (c.active_group_id = gd.this_group_id)
uu-#      join person p on (gd.item_text = p.userid)
uu-#      join dyn_field_person dfp on (dfp.extern_item_id = 10 and dfp.giver=c.giver)
uu-#      join dyn_field_content_person dfcp on (dfp.id = dfcp.dyn_field_id and dfcp.userid=p.userid)
uu-#      left outer join participant pt on (pt.userid = p.userid and pt.course_id = 707)
uu-# WHERE c.id = 707
uu-# group by 1
uu-# ;
                                    QUERY PLAN
--------------------------------------------------------------------------------------
 Aggregate  (cost=17928.32..17930.37 rows=27 width=106) (actual time=171.37..172.58 rows=19 loops=1)
   ->  Group  (cost=17928.32..17929.68 rows=273 width=106) (actual time=171.27..172.14 rows=60 loops=1)
         ->  Sort  (cost=17928.32..17929.00 rows=273 width=106) (actual time=171.26..171.45 rows=60 loops=1)
               Sort Key: lower(substr(p.last_name, 1, 1))
               ->  Merge Join  (cost=17545.53..17917.25 rows=273 width=106) (actual time=36.64..170.53 rows=60 loops=1)
                     Merge Cond: ("outer".userid = "inner".userid)
                     Join Filter: ("inner".course_id = 707)
                     ->  Sort  (cost=17545.53..17546.22 rows=273 width=88) (actual time=28.62..28.84 rows=60 loops=1)
                           Sort Key: dfcp.userid
                           ->  Nested Loop  (cost=0.00..17534.46 rows=273 width=88) (actual time=7.99..27.49 rows=60 loops=1)
                                 Join Filter: ("outer".id = "inner".dyn_field_id)
                                 ->  Nested Loop  (cost=0.00..3685.31 rows=272 width=72) (actual time=7.67..8.95 rows=60 loops=1)
                                       Join Filter: ("outer".giver = "inner".giver)
                                       ->  Index Scan using df_person_giver_id_idx on dyn_field_person dfp  (cost=0.00..6.20 rows=1 width=16) (actual time=0.14..0.17 rows=1 loops=1)
                                             Filter: (extern_item_id = 10)
                                       ->  Materialize  (cost=3674.43..3674.43 rows=374 width=56) (actual time=7.49..7.69 rows=60 loops=1)
                                             ->  Nested Loop  (cost=0.00..3674.43 rows=374 width=56) (actual time=0.24..7.22 rows=60 loops=1)
                                                   ->  Nested Loop  (cost=0.00..1400.89 rows=378 width=33) (actual time=0.10..1.34 rows=60 loops=1)
                                                         ->  Index Scan using course_pkey on course c  (cost=0.00..5.08 rows=1 width=16) (actual time=0.04..0.05 rows=1 loops=1)
                                                               Index Cond: (id = 707)
                                                         ->  Index Scan using group_data_this_idx on group_data gd  (cost=0.00..1390.80 rows=402 width=17) (actual time=0.04..0.70 rows=60 loops=1)
                                                               Index Cond: ("outer".active_group_id = gd.this_group_id)
                                                   ->  Index Scan using person_pkey on person p  (cost=0.00..6.01 rows=1 width=23) (actual time=0.07..0.08 rows=1 loops=60)
                                                         Index Cond: ("outer".item_text = p.userid)
                                 ->  Index Scan using df_content_person_userid_id_idx on dyn_field_content_person dfcp  (cost=0.00..50.75 rows=12 width=16) (actual time=0.08..0.23 rows=11 loops=60)
                                       Index Cond: (dfcp.userid = "outer".item_text)
                     ->  Index Scan using participant_uid_cid_idx on participant pt  (cost=0.00..349.76 rows=7808 width=18) (actual time=0.07..84.34 rows=7722 loops=1)
 Total runtime: 173.37 msec
(28 rows)

Time: 183.37 ms
Palle,

> I have a SQL statement that I cannot get to use the index. PostgreSQL
> insists on using a seqscan and performance is very poor. Setting
> enable_seqscan = false boosts performance drastically, as you can see
> below. Since a seqscan is not always bad, I'd rather not turn it off
> completely, but rather get the planner to do the right thing here. Is
> there another way to do this, apart from setting enable_seqscan=false?

In your postgresql.conf, try setting effective_cache_size to something
like 50% of your system's RAM, and lowering random_page_cost to 2.0 or
even 1.5. Then restart PostgreSQL and try your query again.

What version, btw?

--
Josh Berkus
Aglio Database Solutions
San Francisco
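[Editor's note: to make Josh's advice concrete, a postgresql.conf fragment
along these lines is a plausible sketch. The numbers are illustrative
assumptions for a machine with 512 MB RAM, not values from this thread; in
7.x, effective_cache_size is measured in 8 kB disk pages.]

```conf
# postgresql.conf -- illustrative sketch, tune to your hardware
effective_cache_size = 32000   # ~256 MB in 8 kB pages, ~50% of assumed 512 MB RAM
random_page_cost = 2.0         # default is 4.0; lower values favor index scans
```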
Hi,

Indeed, setting random_page_cost does the trick. Thanks!

It seems to make sense to set random_page_cost to this value. Are there
any drawbacks?

postgresql-7.3.4

postgresql.conf:

  tcpip_socket = true
  max_connections = 100
  superuser_reserved_connections = 2

  # Performance
  #
  shared_buffers = 12000
  sort_mem = 8192
  vacuum_mem = 32768
  effective_cache_size = 64000
  random_page_cost = 2
  ...

--On Sunday, September 28, 2003 14.34.25 -0700 Josh Berkus
<josh@agliodbs.com> wrote:

> Palle,
>
>> I have a SQL statement that I cannot get to use the index. PostgreSQL
>> insists on using a seqscan and performance is very poor. Setting
>> enable_seqscan = false boosts performance drastically, as you can see
>> below. Since a seqscan is not always bad, I'd rather not turn it off
>> completely, but rather get the planner to do the right thing here. Is
>> there another way to do this, apart from setting enable_seqscan=false?
>
> In your postgresql.conf, try setting effective_cache_size to something
> like 50% of your system's RAM, and lowering random_page_cost to 2.0 or
> even 1.5. Then restart PostgreSQL and try your query again.
>
> What version, btw?
>
> --
> Josh Berkus
> Aglio Database Solutions
> San Francisco
Palle,

> Indeed, setting random_page_cost does the trick. Thanks!
>
> It seems to make sense to set random_page_cost to this value. Are there
> any drawbacks?

Only if your server was heavily multi-tasking, and as a result had little
RAM+CPU available. Then you'd want to raise the value again.

--
Josh Berkus
Aglio Database Solutions
San Francisco
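[Editor's note: if the lowered value ever hurts other workloads on the
same server, one hedge is that random_page_cost can also be set per
session rather than globally in postgresql.conf. A sketch, using the 2.0
value from this thread:]

```sql
-- Lower the planner's random-page cost estimate for this session only;
-- other sessions keep the postgresql.conf default (4.0).
SET random_page_cost = 2.0;

-- ... run the index-sensitive query here ...

-- Restore the configured default for the remainder of the session.
RESET random_page_cost;
```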
Will that make a difference? From what I've seen, it does not make much
difference, but I have seen queries speed up when rewritten with explicit
joins. I guess it depends on other things, but is it really so that
explicit joins are somehow bad? Do you have any pointers to documentation
about it, if so?

Thanks,
Palle

--On Monday, September 29, 2003 00.54.43 +0200 Gaetano Mendola
<mendola@bigfoot.com> wrote:

> Palle Girgensohn wrote:
>> uu=# explain analyze
>> uu-# select lower(substr(p.last_name,1,1)) as letter, count(*)
>> uu-# FROM course c join group_data gd on (c.active_group_id =
>> gd.this_group_id)
>> uu-#      join person p on (gd.item_text = p.userid)
>> uu-#      join dyn_field_person dfp on (dfp.extern_item_id = 10 and
>> dfp.giver=c.giver)
>> uu-#      join dyn_field_content_person dfcp on (dfp.id =
>> dfcp.dyn_field_id and dfcp.userid=p.userid)
>> uu-#      left outer join participant pt on (pt.userid = p.userid and
>> pt.course_id = 707)
>> uu-# WHERE c.id = 707
>> uu-# group by 1
>> uu-# ;
>
> Why are you using this form of join? When it is not necessary, use the
> implicit form.
>
> Regards
> Gaetano Mendola
--On Monday, September 29, 2003 15.32.31 +0200 Gaetano Mendola
<mendola@bigfoot.com> wrote:

> They are not absolutely bad, but sometimes the path you choose is not
> the optimal one; in PostgreSQL 7.4, explicit joins will be less
> limiting for the planner.
>
> Regards
> Gaetano Mendola

Ah, OK. True!

In this case, though, the SQL queries are crafted with great care, since
we have a lot of data in a few of the tables while others are almost
empty, so we try to limit the amount of data as early as possible. Our
experience says that we often do a better job than the planner, since we
know which tables are "fat". Hence, we have actually moved to explicit
joins in queries and sometimes gained speed. But in the general case,
implicit joins might be better, I guess.

Regards,
Palle
Palle Girgensohn wrote:
> Will that make a difference? From what I've seen, it does not make much
> difference, but I have seen queries speed up when rewritten with
> explicit joins. I guess it depends on other things, but is it really so
> that explicit joins are somehow bad? Do you have any pointers to
> documentation about it, if so?
>
> Thanks,
> Palle

They are not absolutely bad, but sometimes the path you choose is not the
optimal one; in PostgreSQL 7.4, explicit joins will be less limiting for
the planner.

Regards
Gaetano Mendola
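[Editor's note: the 7.4 change Gaetano alludes to is, to the best of my
knowledge, the join_collapse_limit setting introduced in that release,
which lets the planner reorder explicit JOINs; it does not apply to the
7.3.4 server in this thread. A sketch:]

```sql
-- PostgreSQL 7.4+ only (assumption: not available on 7.3.x).
SET join_collapse_limit = 8;  -- default: planner may reorder explicit JOINs
SET join_collapse_limit = 1;  -- join strictly in the written order, as 7.3 did
```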
girgen@pingpong.net (Palle Girgensohn) writes:
> Will that make a difference? From what I've seen, it does not make
> much difference, but I have seen queries speed up when rewritten with
> explicit joins. I guess it depends on other things, but is it really
> so that explicit joins are somehow bad? Do you have any pointers to
> documentation about it, if so?

The problem is that if you expressly specify the joins, the query
optimizer can't choose its own paths. And while that may not be better at
the moment, it is quite possible that when you upgrade to a newer
version, those queries, if "not join-specified," could immediately get
faster.

I would expect that the query that uses implicit joins will be clearer to
read, which adds a little further merit to that direction.

That goes along with the usual way that it is preferable to optimize
things, namely that you should start by solving the problem as simply as
you can, and only proceed to further optimization if that actually proves
necessary. Optimization efforts commonly add complexity and make code
more difficult to maintain; that's not the place to start if you don't
even know the effort is necessary.

--
(format nil "~S@~S" "cbbrowne" "libertyrms.info")
<http://dev6.int.libertyrms.com/>
Christopher Browne
(416) 646 3304 x124 (land)
Palle Girgensohn wrote:
> uu=# explain analyze
> uu-# select lower(substr(p.last_name,1,1)) as letter, count(*)
> uu-# FROM course c join group_data gd on (c.active_group_id =
> gd.this_group_id)
> uu-#      join person p on (gd.item_text = p.userid)
> uu-#      join dyn_field_person dfp on (dfp.extern_item_id = 10 and
> dfp.giver=c.giver)
> uu-#      join dyn_field_content_person dfcp on (dfp.id =
> dfcp.dyn_field_id and dfcp.userid=p.userid)
> uu-#      left outer join participant pt on (pt.userid = p.userid and
> pt.course_id = 707)
> uu-# WHERE c.id = 707
> uu-# group by 1
> uu-# ;

Why are you using this form of join? When it is not necessary, use the
implicit form.

Regards
Gaetano Mendola
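[Editor's note: for comparison, the implicit form Gaetano means would look
roughly like the sketch below — untested against the thread's schema. One
caveat: the LEFT OUTER JOIN has no comma-list equivalent, so it must stay
explicit, and its ON clause may then only reference the table it is
attached to; here it joins on dfcp.userid, which equals p.userid anyway
per the WHERE clause.]

```sql
SELECT lower(substr(p.last_name, 1, 1)) AS letter, count(*)
FROM course c, group_data gd, person p, dyn_field_person dfp,
     dyn_field_content_person dfcp
     -- outer joins cannot be written implicitly, so this one remains
     LEFT OUTER JOIN participant pt
       ON (pt.userid = dfcp.userid AND pt.course_id = 707)
WHERE c.id = 707
  AND c.active_group_id = gd.this_group_id
  AND gd.item_text = p.userid
  AND dfp.extern_item_id = 10
  AND dfp.giver = c.giver
  AND dfp.id = dfcp.dyn_field_id
  AND dfcp.userid = p.userid
GROUP BY 1;
```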
--On Monday, September 29, 2003 11.12.55 -0400 Christopher Browne
<cbbrowne@libertyrms.info> wrote:

> girgen@pingpong.net (Palle Girgensohn) writes:
>> Will that make a difference? From what I've seen, it does not make
>> much difference, but I have seen queries speed up when rewritten with
>> explicit joins. I guess it depends on other things, but is it really
>> so that explicit joins are somehow bad? Do you have any pointers to
>> documentation about it, if so?
>
> The problem is that if you expressly specify the joins, the query
> optimizer can't choose its own paths. And while that may not be
> better at the moment, it is quite possible that when you upgrade to a
> newer version, those queries, if "not join-specified," could
> immediately get faster.

You've got a point here. Still, with some queries, since the data is
pretty static and we know much about its distribution over the tables, we
had to explicitly tell PostgreSQL how to optimize the queries to get them
fast enough. We cannot afford any query to take more than fractions of a
second, really.

> I would expect that the query that uses implicit joins will be clearer
> to read, which adds a little further merit to that direction.

That depends; I actually don't agree on this, but I guess it comes down
to which syntax you're used to.

> That goes along with the usual way that it is preferable to optimize
> things, namely that you should start by solving the problem as simply
> as you can, and only proceed to further optimization if that actually
> proves necessary. Optimization efforts commonly add complexity and
> make code more difficult to maintain; that's not the place to start if
> you don't even know the effort is necessary.

Oh, but of course. For the queries I refer to, optimization actually
proved necessary, believe me :-)

Cheers,
Palle