Thread: [repost] partial index / funxtional idx or bad sql?
my apologies - a strange key combination sent the message early. ---- greetings. i have a query that is taking a rather long time to execute and have been looking into setting up a partial index to help, although i'm not sure if this is what i want. here is the (simplified) table "posts": id serial type_id int areacode smallint content text and the other table (areacodes) referenced: site_id smallint areacode smallint the query is: SELECT p.id, p.areacode, p.content FROM posts p WHERE p.type_id = ? AND p.areacode in ( select areacode from areacodes where site_id = ? ) the "posts" table has 100,000 rows of varying data, across areacodes and types. given the type_id and site_id, the query is currently taking ~4 seconds to return 8500 rows (on a dual proc/ gig ram linux box). indexes on table "posts" are: primary key (id) and another on both (type_id, areacode) index on the table "areacodes" is (site_id, areacode). would a parital index help in speeding up this query? are my current indexes counter productive? or is it just my sql that need help? thanks much for any help or pointers to information. - seth __________________________________ Do you Yahoo!? The New Yahoo! Search - Faster. Easier. Bingo. http://search.yahoo.com
Seth, > SELECT p.id, p.areacode, p.content > FROM posts p > WHERE p.type_id = ? > AND p.areacode in ( > select areacode from areacodes > where site_id = ? > ) Unless you're using 7.4 from CVS, you want to get rid of that IN: SELECT p.id, p.areacode, p.content FROM posts p WHERE p.type_id = ? AND EXISTS ( select areacode from areacodes where site_id = ? and p.areacode = areacodes.areacode ); See how that works, and if it's still slow, post the EXPLAIN ANALYZE. -- -Josh Berkus Aglio Database Solutions San Francisco
hi josh. i'm using 7.3.2. i tried using EXISTS instead of the IN, but the same query now returns in seven sceonds as opposed to four with the IN. cmdb=# EXPLAIN ANALYZE cmdb-# select c.class_id, c.areacode, c.title from classifieds c cmdb-# where c.class_cat_id = '1' cmdb-# and c.areacode IN ( cmdb(# select areacode from cm_areacode where site_id = '10') cmdb-# ; QUERY PLAN -------------------------------------------------------------------------------------------------------------------------------------------------- Index Scan using classifieds_dual_idx on classifieds c (cost=0.00..26622.14 rows=1837 width=39) (actual time=345.48..2305.04 rows=8460 loops=1) Index Cond: (class_cat_id = 1) Filter: (subplan) SubPlan -> Materialize (cost=3.46..3.46 rows=4 width=2) (actual time=0.00..0.01 rows=5 loops=61966) -> Index Scan using site_cm_areacode_idx on cm_areacode (cost=0.00..3.46 rows=4 width=2) (actual time=0.14..0.22 rows=5 loops=1) Index Cond: (site_id = 10) Total runtime: 2314.14 msec (8 rows) ---------------------------------- classifieds_dual_idx is the btree index on (class_type_id, areacode) and site_cm_areacode_idx is the btree index on (site_id) only. there is an index in the areacode table that has both (site_id, areacode) but it's apparently not being used. would it help the query to use that index instead? thanks for your help. --- Josh Berkus <josh@agliodbs.com> wrote: > Seth, > > > SELECT p.id, p.areacode, p.content > > FROM posts p > > WHERE p.type_id = ? > > AND p.areacode in ( > > select areacode from areacodes > > where site_id = ? > > ) > > Unless you're using 7.4 from CVS, you want to get rid of that IN: > > SELECT p.id, p.areacode, p.content > FROM posts p > WHERE p.type_id = ? > AND EXISTS ( > select areacode from areacodes > where site_id = ? > and p.areacode = areacodes.areacode > ); > > See how that works, and if it's still slow, post the EXPLAIN ANALYZE. __________________________________ Do you Yahoo!? The New Yahoo! Search - Faster. Easier. Bingo. http://search.yahoo.com
Csajl, > i'm using 7.3.2. i tried using EXISTS instead of the IN, but the same > query now returns in seven sceonds as opposed to four with the IN. <snip> > classifieds_dual_idx is the btree index on (class_type_id, areacode) > and site_cm_areacode_idx is the btree index on (site_id) only. > there is an index in the areacode table that has both (site_id, areacode) > but it's apparently not being used. would it help the query to use that > index instead? No. From the look of things, it's not the index scan that's taking time ... it's the subplan, which is doing 61,000 loops. Which is normal for IN, but not for EXISTS. You run VACUUM ANALYZE? -- Josh Berkus Aglio Database Solutions San Francisco
On Mon, 12 May 2003, csajl wrote: > i'm using 7.3.2. i tried using EXISTS instead of the IN, but the same query > now returns in seven sceonds as opposed to four with the IN. > > > cmdb=# EXPLAIN ANALYZE > cmdb-# select c.class_id, c.areacode, c.title from classifieds c > cmdb-# where c.class_cat_id = '1' > cmdb-# and c.areacode IN ( > cmdb(# select areacode from cm_areacode where site_id = '10') > cmdb-# ; How about something like: select c.class_id, c.areacode, c.title from classifieds c, (select distinct areacode from cm_areacode where site_id='10') a where c.class_cat_id='1' and c.areacode=a.areacode;
hi josh. thanks for your help and time with this. ran vacuum analyze, still timed in around 3seconds. i dropped the site_id only index on the areacodes table in favor of the dual site_id and areacode index and seemingly gained 1/2 second. by using the IN, i gain another .3 of a second. (i thought EXISTS was supposed to be more efficient?) the loop on the subplan (~62k) is killing me. any alternatives to what i thought would be a seemingly innocuous lookup? the cm_Areacode table is nothing more than two columns, associating each areacode into a site_id. (292 rows if i remember correctly) cmdb=# EXPLAIN ANALYZE cmdb-# select c.class_id, c.areacode, c.title from classifieds c cmdb-# where c.class_cat_id = '1' cmdb-# and EXISTS ( cmdb(# select areacode from cm_areacode cm where site_id = '10' and c.areacode = cm.areacode) cmdb-# ; QUERY PLAN -------------------------------------------------------------------------------------------------------------------------------------------------------- Index Scan using classifieds_dual_idx on classifieds c (cost=0.00..493277.77 rows=28413 width=39) (actual time=360.23..2523.08 rows=8460 loops=1) Index Cond: (class_cat_id = 1) Filter: (subplan) SubPlan -> Index Scan using areacode_site_dual_cmareacode on cm_areacode cm (cost=0.00..4.96 rows=1 width=2) (actual time=0.01..0.01 rows=0 loops=61966) Index Cond: ((site_id = 10) AND ($0 = areacode)) Total runtime: 2533.93 msec (7 rows) cmdb=# ------------------------------------ cmdb=# EXPLAIN ANALYZE cmdb-# select c.class_id, c.areacode, c.title from classifieds c cmdb-# where c.class_cat_id = '1' cmdb-# and c.areacode IN ( cmdb(# select areacode from cm_areacode where site_id = '10') cmdb-# ; QUERY PLAN ---------------------------------------------------------------------------------------------------------------------------------------------------- Index Scan using classifieds_dual_idx on classifieds c (cost=0.00..632183.80 rows=28413 width=39) (actual time=344.70..2287.93 rows=8460 loops=1) Index Cond: (class_cat_id = 1) Filter: (subplan) SubPlan -> Materialize (cost=7.40..7.40 rows=4 width=2) (actual time=0.00..0.00 rows=5 loops=61966) -> Seq Scan on cm_areacode (cost=0.00..7.40 rows=4 width=2) (actual time=0.20..0.73 rows=5 loops=1) Filter: (site_id = 10) Total runtime: 2296.83 msec (8 rows) --- Josh Berkus <josh@agliodbs.com> wrote: > Csajl, > > > i'm using 7.3.2. i tried using EXISTS instead of the IN, but the same > > query now returns in seven sceonds as opposed to four with the IN. > <snip> > > classifieds_dual_idx is the btree index on (class_type_id, areacode) > > and site_cm_areacode_idx is the btree index on (site_id) only. > > there is an index in the areacode table that has both (site_id, areacode) > > but it's apparently not being used. would it help the query to use that > > index instead? > > No. > From the look of things, it's not the index scan that's taking time ... it's > the subplan, which is doing 61,000 loops. Which is normal for IN, but not > for EXISTS. You run VACUUM ANALYZE? __________________________________ Do you Yahoo!? The New Yahoo! Search - Faster. Easier. Bingo. http://search.yahoo.com
wow. that did it. so much for my knowing SQL... unbelievable - thanks much. cmdb=# EXPLAIN ANALYZE cmdb-# select c.class_id, c.areacode, c.title from classifieds c cmdb-# , (select distinct areacode from cm_areacode where site_id='10') a cmdb-# where c.class_cat_id='1' and c.areacode=a.areacode; QUERY PLAN -------------------------------------------------------------------------------------------------------------------------------------------------- Nested Loop (cost=7.44..1107.53 rows=279 width=41) (actual time=1.13..258.11 rows=8460 loops=1) -> Subquery Scan a (cost=7.44..7.46 rows=1 width=2) (actual time=0.86..0.92 rows=5 loops=1) -> Unique (cost=7.44..7.46 rows=1 width=2) (actual time=0.85..0.88 rows=5 loops=1) -> Sort (cost=7.44..7.45 rows=4 width=2) (actual time=0.85..0.86 rows=5 loops=1) Sort Key: areacode -> Seq Scan on cm_areacode (cost=0.00..7.40 rows=4 width=2) (actual time=0.20..0.73 rows=5 loops=1) Filter: (site_id = 10) -> Index Scan using classifieds_dual_idx on classifieds c (cost=0.00..1096.59 rows=279 width=39) (actual time=0.22..44.28 rows=1692 loops=5) Index Cond: ((c.class_cat_id = 1) AND (c.areacode = "outer".areacode)) Total runtime: 267.71 msec (10 rows) --- Stephan Szabo <sszabo@megazone23.bigpanda.com> wrote: > > On Mon, 12 May 2003, csajl wrote: > > > i'm using 7.3.2. i tried using EXISTS instead of the IN, but the same > query > > now returns in seven sceonds as opposed to four with the IN. > > > > > > cmdb=# EXPLAIN ANALYZE > > cmdb-# select c.class_id, c.areacode, c.title from classifieds c > > cmdb-# where c.class_cat_id = '1' > > cmdb-# and c.areacode IN ( > > cmdb(# select areacode from cm_areacode where site_id = '10') > > cmdb-# ; > > How about something like: > > select c.class_id, c.areacode, c.title from > classifieds c, > (select distinct areacode from cm_areacode where site_id='10') a > where c.class_cat_id='1' and c.areacode=a.areacode; > __________________________________ Do you Yahoo!? The New Yahoo! Search - Faster. Easier. Bingo. http://search.yahoo.com
On Mon, May 12, 2003 at 09:03:38PM -0700, csajl wrote: > > wow. > > that did it. so much for my knowing SQL... > > > > How about something like: > > > > select c.class_id, c.areacode, c.title from > > classifieds c, > > (select distinct areacode from cm_areacode where site_id='10') a > > where c.class_cat_id='1' and c.areacode=a.areacode; > > Wow, I'll have to keep that in mind. Shouldn't the optimizer be able to handle that? Could this get added to the TODO? -- Jim C. Nasby (aka Decibel!) jim@nasby.net Member: Triangle Fraternity, Sports Car Club of America Give your computer some brain candy! www.distributed.net Team #1828 Windows: "Where do you want to go today?" Linux: "Where do you want to go tomorrow?" FreeBSD: "Are you guys coming, or what?"
"Jim C. Nasby" <jim@nasby.net> writes: > Wow, I'll have to keep that in mind. Shouldn't the optimizer be able to > handle that? Could this get added to the TODO? No, 'cause it's done (in CVS tip). I'm actually a bit hesitant now to recommend that people do such things, because the 7.4 optimizer is likely to produce a better plan from the unmodified IN query than it will from any explicitly "improved" version. The 7.4 code knows several ways to do IN efficiently, but when you hand-transform the query you are forcing the choice; perhaps wrongly. An example from CVS tip and the regression database in which hand transformation forces a less efficient plan choice: regression=# explain analyze select * from tenk1 a where unique1 in (select ten from tenk1); QUERY PLAN ------------------------------------------------------------------------------------------------------------------------------------- Merge Join (cost=483.17..484.91 rows=10 width=248) (actual time=407.14..409.16 rows=10 loops=1) Merge Cond: ("outer".unique1 = "inner".ten) -> Index Scan using tenk1_unique1 on tenk1 a (cost=0.00..1571.97 rows=10000 width=244) (actual time=0.41..1.60 rows=11loops=1) -> Sort (cost=483.17..483.19 rows=10 width=4) (actual time=406.57..406.65 rows=10 loops=1) Sort Key: tenk1.ten -> HashAggregate (cost=483.00..483.00 rows=10 width=4) (actual time=406.08..406.26 rows=10 loops=1) -> Seq Scan on tenk1 (cost=0.00..458.00 rows=10000 width=4) (actual time=0.19..261.84 rows=10000 loops=1) Total runtime: 410.74 msec (8 rows) regression=# explain analyze select * from tenk1 a, (select distinct ten from tenk1) b regression-# where a.unique1 = b.ten; QUERY PLAN --------------------------------------------------------------------------------------------------------------------------------- Nested Loop (cost=1122.39..1232.59 rows=10 width=248) (actual time=476.67..666.02 rows=10 loops=1) -> Subquery Scan b (cost=1122.39..1172.39 rows=10 width=4) (actual time=475.94..662.00 rows=10 loops=1) -> Unique (cost=1122.39..1172.39 rows=10 width=4) (actual time=475.89..661.65 rows=10 loops=1) -> Sort (cost=1122.39..1147.39 rows=10000 width=4) (actual time=475.85..559.27 rows=10000 loops=1) Sort Key: ten -> Seq Scan on tenk1 (cost=0.00..458.00 rows=10000 width=4) (actual time=0.37..274.87 rows=10000 loops=1) -> Index Scan using tenk1_unique1 on tenk1 a (cost=0.00..6.01 rows=1 width=244) (actual time=0.27..0.31 rows=1 loops=10) Index Cond: (a.unique1 = "outer".ten) Total runtime: 687.53 msec (9 rows) So, for now, make the transformation ... but keep a note about the IN version to try whenever you update to 7.4. regards, tom lane