Thread: [Fwd: query efficiency - Can I speed it up?]
-- Ann "In a world without walls and fences - who needs Windows and Gates ?" (unknown) Dear All Can anyone tell me if/how I can speed up this query? You could just point me at a good 'guide to building efficient queries' if I'm missing something obvious. I've tried a few online sources and most of my queries are much faster now but this one ??? Thanks nemdb3=# explain select pept_id,description,confidence from gotcha inner join go on (gotcha.go_term=go.go_term) where go.description~'oxygen' and ( spid='ALP' or spid='ASP' or spid='DIP' or spid='GPP' ) order by gotcha.confidence; QUERY PLAN ------------------------------------------------------------------------------------------------------------------------------------------------------ Sort (cost=187523.03..187523.54 rows=203 width=54) Sort Key: gotcha.confidence -> Nested Loop (cost=0.00..187515.25 rows=203 width=54) -> Seq Scan on go (cost=0.00..266.50 rows=4 width=50) Filter: (description ~ 'oxygen'::text) -> Index Scan using gotcha_go_term_sp_id on gotcha (cost=0.00..46809.29 rows=232 width=32) Index Cond: ((gotcha.go_term)::text = ("outer".go_term)::text) Filter: (((spid)::text = 'ALP'::text) OR ((spid)::text = 'ASP'::text) OR ((spid)::text = 'DIP'::text) OR ((spid)::text = 'GPP'::text)) (8 rows) nemdb3=# \d go Table "public.go" Column | Type | Modifiers -----------------+-----------------------+----------- go_term | character varying(14) | go_asp | character varying(1) | description | text | hierarchy_level | integer | Indexes: "go_go_term" btree (go_term) nemdb3=# \d gotcha Table "public.gotcha" Column | Type | Modifiers ------------+-----------------------+----------- run_no | character varying(20) | pept_id | character varying(15) | contig | integer | confidence | integer | go_term | character varying(14) | hits | integer | spid | character varying(5) | Indexes: "gotcha_go_term_sp_id" btree (go_term, spid) "gotcha_pept_id" btree (pept_id) "gotcha_spid" btree (spid) nemdb3=# select relname, relkind, reltuples, relpages from pg_class where relname like 'go%'; relname | relkind | reltuples | relpages ------------------------+---------+-------------+---------- gotcha | r | 2.70698e+07 | 695188 go | r | 10600 | 134 gotcha_pept_id | i | 2.70698e+07 | 104266 go_go_term | i | 10600 | 43 gotcha_spid | i | 2.70698e+07 | 74225 gotcha_go_term_sp_id | i | 2.70698e+07 | 134201 (11 rows) nemdb3=# -- Ann "In a world without walls and fences - who needs Windows and Gates ?" (unknown)
ann hedley <ann.hedley@ed.ac.uk> writes: > Can anyone tell me if/how I can speed up this query? Try a newer version of Postgres --- I think 8.1 is the first one that can make decent use of that two-column index in this type of query. Notice that in the plan, only the "go_term" column is being checked in the index condition: > -> Index Scan using gotcha_go_term_sp_id on gotcha > (cost=0.00..46809.29 rows=232 width=32) > Index Cond: ((gotcha.go_term)::text = > ("outer".go_term)::text) > Filter: (((spid)::text = 'ALP'::text) OR ((spid)::text = > 'ASP'::text) OR ((spid)::text = 'DIP'::text) OR ((spid)::text = > 'GPP'::text)) I take it from the enormous cost that there are going to be lots of rows with the same go_term, and it's the spid filter that is cutting it down to a reasonable number of rows ... but this plan is going to visit the heap for every row matching go_term, because the planner isn't smart enough to fold the OR'd restriction clause together with the join clause to make an index condition. It applies it as a "filter" instead which is way way slower in this situation. I can't tell exactly which PG release you're using, but it's definitely older than 8.1. regards, tom lane
psql (PostgreSQL) 8.0.3 Thanks, I'll try an upgrade. Tom Lane wrote: > ann hedley <ann.hedley@ed.ac.uk> writes: > >> Can anyone tell me if/how I can speed up this query? >> > > Try a newer version of Postgres --- I think 8.1 is the first one that > can make decent use of that two-column index in this type of query. > Notice that in the plan, only the "go_term" column is being checked > in the index condition: > > >> -> Index Scan using gotcha_go_term_sp_id on gotcha >> (cost=0.00..46809.29 rows=232 width=32) >> Index Cond: ((gotcha.go_term)::text = >> ("outer".go_term)::text) >> Filter: (((spid)::text = 'ALP'::text) OR ((spid)::text = >> 'ASP'::text) OR ((spid)::text = 'DIP'::text) OR ((spid)::text = >> 'GPP'::text)) >> > > I take it from the enormous cost that there are going to be lots of rows > with the same go_term, and it's the spid filter that is cutting it down > to a reasonable number of rows ... but this plan is going to visit the > heap for every row matching go_term, because the planner isn't smart > enough to fold the OR'd restriction clause together with the join > clause to make an index condition. It applies it as a "filter" instead > which is way way slower in this situation. I can't tell exactly which > PG release you're using, but it's definitely older than 8.1. > > regards, tom lane > > -- Ann "In a world without walls and fences - who needs Windows and Gates ?" (unknown)