severe performance issue with planner - Mailing list pgsql-performance
From | Eric Brown |
---|---|
Subject | severe performance issue with planner |
Date | |
Msg-id | BAY13-F54DwKAYzTkuA00017c9b@hotmail.com Whole thread Raw |
Responses |
Re: severe performance issue with planner
Re: severe performance issue with planner Re: severe performance issue with planner |
List | pgsql-performance |
First let me explain the situation: I came into the #postgresql irc channel talking about this problem, and someone advised me to use this mailing list (so i'm not just wasting your time i hope). I'm not sure how to describe the problem fully, so I'll start by explaining what my database does, a little about the table structure, then an example of a problematic query, and some other information that might be relevant. My database is a Chinese-English/English-Chinese dictionary. It lets users search Chinese words by any character in the word, or any sequence of characters (starting anywhere the user wants). Characters are often searched by their pinyin values (a romanization of the sounds). The dictionary has an average word length of 2 (but it sucks), but there are also many words of length 4 and 6 characters. So it wouldn't be uncommon to search for something like "a1 la1 bo yu" (arabic). There are also some very long words with 12 or more characters (where my problem becomes more pronounced). That being said, the most important table here is the words table: Table "public.words" Column | Type | Modifiers --------------+----------------------+----------- wid | integer | not null sequence | smallint | not null variant | smallint | not null char_count | smallint | not null unicode | character varying(5) | not null pinyin | character varying(8) | not null simpvar | character varying(5) | zvar | character varying(5) | compatvar | character varying(5) | def_exists | boolean | not null num_variants | smallint | page_order | integer | pinyins | character varying | unicodes | character varying | Indexes: "words2_pkey" primary key, btree (wid, variant, "sequence") "page_index" btree (page_order) "pinyin_index" btree (pinyin) "unicode_index" btree (unicode) The best example of the problem I have when using this table is this query: SELECT w8.wid, w8.variant, w8.num_variants, sum_text(w8.unicode) as unicodes, sum_text(w8.pinyin) as pinyins FROM words as w0, words as w1, words as w2, words as w3, words as w4, words as w5, words as w6, words as w7, words as w8 WHERE w0.wid > 0 AND w0.pinyin = 'zheng4' AND w0.def_exists = 't' AND w0.sequence = 0 AND w1.wid = w0.wid AND w1.pinyin LIKE 'fu_' AND w1.variant = w0.variant AND w1.sequence = (w0.sequence + 1) AND w2.wid = w1.wid AND w2.pinyin LIKE 'ji_' AND w2.variant = w1.variant AND w2.sequence = (w1.sequence + 1) AND w3.wid = w2.wid AND w3.pinyin LIKE 'guan_' AND w3.variant = w2.variant AND w3.sequence = (w2.sequence + 1) AND w4.wid = w3.wid AND w4.pinyin LIKE 'kai_' AND w4.variant = w3.variant AND w4.sequence = (w3.sequence + 1) AND w5.wid = w4.wid AND w5.pinyin LIKE 'fang_' AND w5.variant = w4.variant AND w5.sequence = (w4.sequence + 1) AND w6.wid = w5.wid AND w6.pinyin LIKE 'xi_' AND w6.variant = w5.variant AND w6.sequence = (w5.sequence + 1) AND w7.wid = w6.wid AND w7.pinyin LIKE 'tong_' AND w7.variant = w6.variant AND w7.sequence = (w6.sequence + 1) AND w8.wid = w7.wid AND w8.variant = w7.variant GROUP BY w8.wid, w8.variant, w8.num_variants, w8.page_order , w0.sequence , w1.sequence , w2.sequence , w3.sequence , w4.sequence , w5.sequence , w6.sequence , w7.sequence ORDER BY w8.page_order; (phew!) with the default geqo_threshold of 11, this query takes 3155ms on my machine (a 1ghz athlon with 384 megs of pc133 ram). This is very very long. if i first do prepare blah as SELECT ....., then run execute blah, the time goes down to about 275ms (i had been running this query a lot, and did a vacuum update before all this). the ouput from EXPLAIN ANALYZE : QUERY PLAN ---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- Sort (cost=54.13..54.14 rows=1 width=43) (actual time=315.357..315.357 rows=1 loops=1) Sort Key: w8.page_order -> HashAggregate (cost=54.12..54.12 rows=1 width=43) (actual time=315.328..315.330 rows=1 loops=1) -> Nested Loop (cost=0.00..54.08 rows=1 width=43) (actual time=6.229..314.566 rows=12 loops=1) Join Filter: (("outer".wid = "inner".wid) AND ("outer".variant = "inner".variant) AND ("outer"."sequence" = ("inner"."sequence" + 1)) AND ("inner"."sequence" = ("outer"."sequence" + 1))) -> Nested Loop (cost=0.00..48.07 rows=1 width=83) (actual time=6.088..279.745 rows=12 loops=1) Join Filter: (("inner"."sequence" = ("outer"."sequence" + 1)) AND ("outer"."sequence" = ("inner"."sequence" + 1))) -> Nested Loop (cost=0.00..42.05 rows=1 width=75) (actual time=5.980..278.602 rows=12 loops=1) -> Nested Loop (cost=0.00..36.04 rows=1 width=48) (actual time=5.910..278.280 rows=1 loops=1) Join Filter: (("inner".variant = "outer".variant) AND ("inner".wid = "outer".wid)) -> Nested Loop (cost=0.00..30.04 rows=1 width=40) (actual time=3.465..275.137 rows=1 loops=1) Join Filter: ("inner"."sequence" = ("outer"."sequence" + 1)) -> Nested Loop (cost=0.00..24.03 rows=1 width=32) (actual time=3.408..275.045 rows=1 loops=1) Join Filter: ("outer"."sequence" = ("inner"."sequence" + 1)) -> Nested Loop (cost=0.00..18.00 rows=1 width=24) (actual time=3.350..274.948 rows=1 loops=1) -> Nested Loop (cost=0.00..11.99 rows=1 width=16) (actual time=3.295..274.678 rows=6 loops=1) Join Filter: (("inner".wid = "outer".wid) AND ("inner".variant = "outer".variant) AND ("inner"."sequence" = ("outer"."sequence" + 1))) -> Index Scan using pinyin_index on words w4 (cost=0.00..5.98 rows=1 width=8) (actual time=0.090..1.222 rows=165 loops=1) Index Cond: (((pinyin)::text >= 'kai'::character varying) AND ((pinyin)::text < 'kaj'::character varying)) Filter: ((pinyin)::text ~~ 'kai_'::text) -> Index Scan using pinyin_index on words w5 (cost=0.00..5.98 rows=1 width=8) (actual time=0.017..1.380 rows=259 loops=165) Index Cond: (((pinyin)::text >= 'fang'::character varying) AND ((pinyin)::text < 'fanh'::character varying)) Filter: ((pinyin)::text ~~ 'fang_'::text) -> Index Scan using words2_pkey on words w1 (cost=0.00..6.00 rows=1 width=8) (actual time=0.032..0.037 rows=0 loops=6) Index Cond: (("outer".wid = w1.wid) AND ("outer".variant = w1.variant)) Filter: ((pinyin)::text ~~ 'fu_'::text) -> Index Scan using words2_pkey on words w0 (cost=0.00..6.01 rows=1 width=8) (actual time=0.033..0.068 rows=1 loops=1) Index Cond: (("outer".wid = w0.wid) AND (w0.wid > 0) AND ("outer".variant = w0.variant)) Filter: (((pinyin)::text = 'zheng4'::text) AND (def_exists = true) AND ("sequence" = 0)) -> Index Scan using words2_pkey on words w2 (cost=0.00..6.00 rows=1 width=8) (actual time=0.029..0.060 rows=1 loops=1) Index Cond: ((w2.wid = "outer".wid) AND (w2.variant = "outer".variant)) Filter: ((pinyin)::text ~~ 'ji_'::text) -> Index Scan using pinyin_index on words w7 (cost=0.00..5.98 rows=1 width=8) (actual time=0.030..2.573 rows=338 loops=1) Index Cond: (((pinyin)::text >= 'tong'::character varying) AND ((pinyin)::text < 'tonh'::character varying)) Filter: ((pinyin)::text ~~ 'tong_'::text) -> Index Scan using words2_pkey on words w8 (cost=0.00..5.99 rows=1 width=27) (actual time=0.029..0.130 rows=12 loops=1) Index Cond: ((w8.wid = "outer".wid) AND (w8.variant = "outer".variant)) -> Index Scan using words2_pkey on words w6 (cost=0.00..6.00 rows=1 width=8) (actual time=0.040..0.060 rows=1 loops=12) Index Cond: ((w6.wid = "outer".wid) AND (w6.variant = "outer".variant)) Filter: ((pinyin)::text ~~ 'xi_'::text) -> Index Scan using pinyin_index on words w3 (cost=0.00..5.98 rows=1 width=8) (actual time=0.023..2.312 rows=304 loops=12) Index Cond: (((pinyin)::text >= 'guan'::character varying) AND ((pinyin)::text < 'guao'::character varying)) Filter: ((pinyin)::text ~~ 'guan_'::text) Total runtime: 316.493 ms (44 rows) Time: 3167.853 ms As you can see, the two run times there are quite different... The person I spoke to in the irc channel said this all indicated a poor planning time, and I think I agree. Yesterday I tried setting geqo_threshold to 7 instead of the default of 11, and it seemed to help a little, but the running times were still extremely high. I guess I do have a question in addition to just wanting to notify the right people of this problem: Since a lot of my queries are similar to this one (but not similar enough to allow me to use one or two of them over and over with different parameters), is there any way for me to reorganize or rewrite the queries so that the planner doesn't take so long? (I would hate to have to take all of this out of the db's hands and iterate in code myself...) If you guys are optimistic about someone being able to fix this problem in pgsql, I will just wait for the bug fix. Thanks for listening :) let me know if you need any more information (oh yea, this is on Linux, version 7.4.1) _________________________________________________________________ Create a Job Alert on MSN Careers and enter for a chance to win $1000! http://msn.careerbuilder.com/promo/kaday.htm?siteid=CBMSN_1K&sc_extcmp=JS_JASweep_MSNHotm2
pgsql-performance by date: