severe performance issue with planner - Mailing list pgsql-performance

From Eric Brown
Subject severe performance issue with planner
Date
Msg-id BAY13-F54DwKAYzTkuA00017c9b@hotmail.com
Whole thread Raw
Responses Re: severe performance issue with planner
Re: severe performance issue with planner
Re: severe performance issue with planner
List pgsql-performance
First let me explain the situation:

I came into the #postgresql irc channel talking about this problem, and
someone advised me to use this mailing list (so i'm not just wasting your
time i hope).  I'm not sure how to describe the problem fully, so I'll start
by explaining what my database does, a little about the table structure,
then an example of a problematic query, and some other information that
might be relevant.

My database is a Chinese-English/English-Chinese dictionary.  It lets users
search Chinese words by any character in the word, or any sequence of
characters (starting anywhere the user wants).  Characters are often
searched by their pinyin values (a romanization of the sounds).  The
dictionary has an average word length of 2 (but it sucks), but there are
also many words of length 4 and 6 characters.  So it wouldn't be uncommon to
search for something like "a1 la1 bo yu" (arabic).  There are also some very
long words with 12 or more characters (where my problem becomes more
pronounced).

That being said, the most important table here is the words table:
              Table "public.words"
    Column    |         Type         | Modifiers
--------------+----------------------+-----------
wid          | integer              | not null
sequence     | smallint             | not null
variant      | smallint             | not null
char_count   | smallint             | not null
unicode      | character varying(5) | not null
pinyin       | character varying(8) | not null
simpvar      | character varying(5) |
zvar         | character varying(5) |
compatvar    | character varying(5) |
def_exists   | boolean              | not null
num_variants | smallint             |
page_order   | integer              |
pinyins      | character varying    |
unicodes     | character varying    |
Indexes:
    "words2_pkey" primary key, btree (wid, variant, "sequence")
    "page_index" btree (page_order)
    "pinyin_index" btree (pinyin)
    "unicode_index" btree (unicode)


The best example of the problem I have when using this table is this query:

SELECT
    w8.wid,
    w8.variant,
    w8.num_variants,
    sum_text(w8.unicode) as unicodes,
    sum_text(w8.pinyin) as pinyins
FROM
    words as w0,     words as w1,
    words as w2,     words as w3,
    words as w4,     words as w5,
    words as w6,     words as w7,
    words as w8
WHERE
    w0.wid > 0 AND
    w0.pinyin = 'zheng4' AND
    w0.def_exists = 't' AND
    w0.sequence = 0 AND
    w1.wid = w0.wid AND
    w1.pinyin LIKE 'fu_' AND
    w1.variant = w0.variant AND
    w1.sequence = (w0.sequence + 1) AND
    w2.wid = w1.wid AND
    w2.pinyin LIKE 'ji_' AND
    w2.variant = w1.variant AND
    w2.sequence = (w1.sequence + 1) AND
    w3.wid = w2.wid AND
    w3.pinyin LIKE 'guan_' AND
    w3.variant = w2.variant AND
    w3.sequence = (w2.sequence + 1) AND
    w4.wid = w3.wid AND
    w4.pinyin LIKE 'kai_' AND
    w4.variant = w3.variant AND
    w4.sequence = (w3.sequence + 1) AND
    w5.wid = w4.wid AND
    w5.pinyin LIKE 'fang_' AND
    w5.variant = w4.variant AND
    w5.sequence = (w4.sequence + 1) AND
    w6.wid = w5.wid AND
    w6.pinyin LIKE 'xi_' AND
    w6.variant = w5.variant AND
    w6.sequence = (w5.sequence + 1) AND
    w7.wid = w6.wid AND
    w7.pinyin LIKE 'tong_' AND
    w7.variant = w6.variant AND
    w7.sequence = (w6.sequence + 1) AND
    w8.wid = w7.wid AND
    w8.variant = w7.variant
GROUP BY
    w8.wid,
    w8.variant,
    w8.num_variants,
    w8.page_order ,
    w0.sequence ,
    w1.sequence ,
    w2.sequence ,
    w3.sequence ,
    w4.sequence ,
    w5.sequence ,
    w6.sequence ,
    w7.sequence
ORDER BY
    w8.page_order;

(phew!)

with the default geqo_threshold of 11, this query takes 3155ms on my machine
(a 1ghz athlon with 384 megs of pc133 ram).  This is very very long.

if i first do prepare blah as SELECT ....., then run execute blah, the time
goes down to about 275ms  (i had been running this query a lot, and did a
vacuum update before all this).

the ouput from EXPLAIN ANALYZE :


                    QUERY PLAN

----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
Sort  (cost=54.13..54.14 rows=1 width=43) (actual time=315.357..315.357
rows=1 loops=1)
   Sort Key: w8.page_order
   ->  HashAggregate  (cost=54.12..54.12 rows=1 width=43) (actual
time=315.328..315.330 rows=1 loops=1)
         ->  Nested Loop  (cost=0.00..54.08 rows=1 width=43) (actual
time=6.229..314.566 rows=12 loops=1)
               Join Filter: (("outer".wid = "inner".wid) AND
("outer".variant = "inner".variant) AND ("outer"."sequence" =
("inner"."sequence" + 1)) AND ("inner"."sequence" = ("outer"."sequence" +
1)))
               ->  Nested Loop  (cost=0.00..48.07 rows=1 width=83) (actual
time=6.088..279.745 rows=12 loops=1)
                     Join Filter: (("inner"."sequence" = ("outer"."sequence"
+ 1)) AND ("outer"."sequence" = ("inner"."sequence" + 1)))
                     ->  Nested Loop  (cost=0.00..42.05 rows=1 width=75)
(actual time=5.980..278.602 rows=12 loops=1)
                           ->  Nested Loop  (cost=0.00..36.04 rows=1
width=48) (actual time=5.910..278.280 rows=1 loops=1)
                                 Join Filter: (("inner".variant =
"outer".variant) AND ("inner".wid = "outer".wid))
                                 ->  Nested Loop  (cost=0.00..30.04 rows=1
width=40) (actual time=3.465..275.137 rows=1 loops=1)
                                       Join Filter: ("inner"."sequence" =
("outer"."sequence" + 1))
                                       ->  Nested Loop  (cost=0.00..24.03
rows=1 width=32) (actual time=3.408..275.045 rows=1 loops=1)
                                             Join Filter:
("outer"."sequence" = ("inner"."sequence" + 1))
                                             ->  Nested Loop
(cost=0.00..18.00 rows=1 width=24) (actual time=3.350..274.948 rows=1
loops=1)
                                                   ->  Nested Loop
(cost=0.00..11.99 rows=1 width=16) (actual time=3.295..274.678 rows=6
loops=1)
                                                         Join Filter:
(("inner".wid = "outer".wid) AND ("inner".variant = "outer".variant) AND
("inner"."sequence" = ("outer"."sequence" + 1)))
                                                         ->  Index Scan
using pinyin_index on words w4  (cost=0.00..5.98 rows=1 width=8) (actual
time=0.090..1.222 rows=165 loops=1)
                                                               Index Cond:
(((pinyin)::text >= 'kai'::character varying) AND ((pinyin)::text <
'kaj'::character varying))
                                                               Filter:
((pinyin)::text ~~ 'kai_'::text)
                                                         ->  Index Scan
using pinyin_index on words w5  (cost=0.00..5.98 rows=1 width=8) (actual
time=0.017..1.380 rows=259 loops=165)
                                                               Index Cond:
(((pinyin)::text >= 'fang'::character varying) AND ((pinyin)::text <
'fanh'::character varying))
                                                               Filter:
((pinyin)::text ~~ 'fang_'::text)
                                                   ->  Index Scan using
words2_pkey on words w1  (cost=0.00..6.00 rows=1 width=8) (actual
time=0.032..0.037 rows=0 loops=6)
                                                         Index Cond:
(("outer".wid = w1.wid) AND ("outer".variant = w1.variant))
                                                         Filter:
((pinyin)::text ~~ 'fu_'::text)
                                             ->  Index Scan using
words2_pkey on words w0  (cost=0.00..6.01 rows=1 width=8) (actual
time=0.033..0.068 rows=1 loops=1)
                                                   Index Cond: (("outer".wid
= w0.wid) AND (w0.wid > 0) AND ("outer".variant = w0.variant))
                                                   Filter: (((pinyin)::text
= 'zheng4'::text) AND (def_exists = true) AND ("sequence" = 0))
                                       ->  Index Scan using words2_pkey on
words w2  (cost=0.00..6.00 rows=1 width=8) (actual time=0.029..0.060 rows=1
loops=1)
                                             Index Cond: ((w2.wid =
"outer".wid) AND (w2.variant = "outer".variant))
                                             Filter: ((pinyin)::text ~~
'ji_'::text)
                                 ->  Index Scan using pinyin_index on words
w7  (cost=0.00..5.98 rows=1 width=8) (actual time=0.030..2.573 rows=338
loops=1)
                                       Index Cond: (((pinyin)::text >=
'tong'::character varying) AND ((pinyin)::text < 'tonh'::character varying))
                                       Filter: ((pinyin)::text ~~
'tong_'::text)
                           ->  Index Scan using words2_pkey on words w8
(cost=0.00..5.99 rows=1 width=27) (actual time=0.029..0.130 rows=12 loops=1)
                                 Index Cond: ((w8.wid = "outer".wid) AND
(w8.variant = "outer".variant))
                     ->  Index Scan using words2_pkey on words w6
(cost=0.00..6.00 rows=1 width=8) (actual time=0.040..0.060 rows=1 loops=12)
                           Index Cond: ((w6.wid = "outer".wid) AND
(w6.variant = "outer".variant))
                           Filter: ((pinyin)::text ~~ 'xi_'::text)
               ->  Index Scan using pinyin_index on words w3
(cost=0.00..5.98 rows=1 width=8) (actual time=0.023..2.312 rows=304
loops=12)
                     Index Cond: (((pinyin)::text >= 'guan'::character
varying) AND ((pinyin)::text < 'guao'::character varying))
                     Filter: ((pinyin)::text ~~ 'guan_'::text)
Total runtime: 316.493 ms
(44 rows)

Time: 3167.853 ms



As you can see, the two run times there are quite different... The person I
spoke to in the irc channel said this all indicated a poor planning time,
and I think I agree.  Yesterday I tried setting geqo_threshold to 7 instead
of the default of 11, and it seemed to help a little, but the running times
were still extremely high.

I guess I do have a question in addition to just wanting to notify the right
people of this problem:  Since a lot of my queries are similar to this one
(but not similar enough to allow me to use one or two of them over and over
with different parameters), is there any way for me to reorganize or rewrite
the queries so that the planner doesn't take so long?  (I would hate to have
to take all of this out of the db's hands and iterate in code myself...)

If you guys are optimistic about someone being able to fix this problem in
pgsql, I will just wait for the bug fix.


Thanks for listening :)  let me know if you need any more information (oh
yea, this is on Linux, version 7.4.1)

_________________________________________________________________
Create a Job Alert on MSN Careers and enter for a chance to win $1000!
http://msn.careerbuilder.com/promo/kaday.htm?siteid=CBMSN_1K&sc_extcmp=JS_JASweep_MSNHotm2


pgsql-performance by date:

Previous
From: Tom Lane
Date:
Subject: Re: Impact of varchar/text in use of indexes
Next
From: Christopher Kings-Lynne
Date:
Subject: Re: severe performance issue with planner