Re: severe performance issue with planner - Mailing list pgsql-performance

From Eric Brown
Subject Re: severe performance issue with planner
Date
Msg-id BAY13-F143A1R1q0iLo000112cf@hotmail.com
Whole thread Raw
In response to severe performance issue with planner  ("Eric Brown" <bigwhitecow@hotmail.com>)
Responses Re: severe performance issue with planner
List pgsql-performance
ok, at the advice of jurka, I will post new results:

Here's the query as I have changed it now:
SELECT
w8.wid,
w8.variant,
w8.num_variants,
sum_text(w8.unicode) as unicodes,
sum_text(w8.pinyin) as pinyins
FROM
(words as w8 JOIN
(words as w7 JOIN
(words as w6 JOIN
(words as w5 JOIN
(words as w4 JOIN
(words as w3 JOIN
(words as w2 JOIN
(words as w0 JOIN words as w1

ON(w1.wid = w0.wid AND w1.variant = w0.variant AND w1.sequence = w0.sequence
+ 1 AND w1.pinyin LIKE 'fu_'))
ON(w2.wid = w1.wid AND w2.variant = w1.variant AND w2.sequence = w1.sequence
+ 1 AND w2.pinyin LIKE 'ji_'))
ON(w3.wid = w2.wid AND w3.variant = w2.variant AND w3.sequence = w2.sequence
+ 1 AND w3.pinyin LIKE 'guan_'))
ON(w4.wid = w3.wid AND w4.variant = w3.variant AND w4.sequence = w3.sequence
+ 1 AND w4.pinyin LIKE 'kai_'))
ON(w5.wid = w4.wid AND w5.variant = w4.variant AND w5.sequence = w4.sequence
+ 1 AND w5.pinyin LIKE 'fang_'))
ON(w6.wid = w5.wid AND w6.variant = w5.variant AND w6.sequence = w5.sequence
+ 1 AND w6.pinyin LIKE 'xi_'))
ON(w7.wid = w6.wid AND w7.variant = w6.variant AND w7.sequence = w6.sequence
+ 1 AND w7.pinyin LIKE 'tong_'))
ON(w8.wid = w7.wid AND w8.variant = w7.variant))


WHERE
w0.wid > 0 AND
w0.pinyin = 'zheng4' AND
w0.def_exists = 't' AND
w0.sequence = 0
GROUP BY
w8.wid,
w8.variant,
w8.num_variants,
w8.page_order,
w0.sequence ,
w1.sequence ,
w2.sequence ,
w3.sequence ,
w4.sequence ,
w5.sequence ,
w6.sequence ,
w7.sequence
ORDER BY
w8.page_order;


And here's the output of explain analyze:



                          QUERY PLAN

----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
Sort  (cost=54.26..54.26 rows=1 width=43) (actual time=14.916..14.917 rows=1
loops=1)
   Sort Key: w8.page_order
   ->  HashAggregate  (cost=54.24..54.25 rows=1 width=43) (actual
time=14.891..14.892 rows=1 loops=1)
         ->  Nested Loop  (cost=0.00..54.10 rows=4 width=43) (actual
time=3.676..14.446 rows=12 loops=1)
               ->  Nested Loop  (cost=0.00..48.09 rows=1 width=64) (actual
time=3.638..14.269 rows=1 loops=1)
                     Join Filter: ("outer"."sequence" = ("inner"."sequence"
+ 1))
                     ->  Nested Loop  (cost=0.00..42.06 rows=1 width=56)
(actual time=3.581..14.181 rows=1 loops=1)
                           Join Filter: (("inner".wid = "outer".wid) AND
("inner".variant = "outer".variant) AND ("inner"."sequence" =
("outer"."sequence" + 1)) AND ("outer"."sequence" = ("inner"."sequence" +
1)))
                           ->  Nested Loop  (cost=0.00..36.05 rows=1
width=48) (actual time=2.152..12.443 rows=1 loops=1)
                                 Join Filter: (("outer"."sequence" =
("inner"."sequence" + 1)) AND ("inner"."sequence" = ("outer"."sequence" +
1)))
                                 ->  Nested Loop  (cost=0.00..30.03 rows=1
width=40) (actual time=2.104..12.368 rows=1 loops=1)
                                       Join Filter: (("outer".variant =
"inner".variant) AND ("outer".wid = "inner".wid) AND ("inner"."sequence" =
("outer"."sequence" + 1)))
                                       ->  Nested Loop  (cost=0.00..24.02
rows=1 width=32) (actual time=2.040..11.226 rows=1 loops=1)
                                             Join Filter:
("outer"."sequence" = ("inner"."sequence" + 1))
                                             ->  Nested Loop
(cost=0.00..18.00 rows=1 width=24) (actual time=1.979..11.147 rows=1
loops=1)
                                                   Join Filter:
(("inner".variant = "outer".variant) AND ("inner".wid = "outer".wid))
                                                   ->  Nested Loop
(cost=0.00..12.00 rows=1 width=16) (actual time=0.258..8.765 rows=1 loops=1)
                                                         ->  Index Scan
using pinyin_index on words w3  (cost=0.00..5.99 rows=1 width=8) (actual
time=0.084..2.399 rows=304 loops=1)
                                                               Index Cond:
(((pinyin)::text >= 'guan'::character varying) AND ((pinyin)::text <
'guao'::character varying))
                                                               Filter:
((pinyin)::text ~~ 'guan_'::text)
                                                         ->  Index Scan
using words2_pkey on words w1  (cost=0.00..6.00 rows=1 width=8) (actual
time=0.018..0.018 rows=0 loops=304)
                                                               Index Cond:
(("outer".wid = w1.wid) AND ("outer".variant = w1.variant))
                                                               Filter:
((pinyin)::text ~~ 'fu_'::text)
                                                   ->  Index Scan using
pinyin_index on words w7  (cost=0.00..5.99 rows=1 width=8) (actual
time=0.025..1.863 rows=338 loops=1)
                                                         Index Cond:
(((pinyin)::text >= 'tong'::character varying) AND ((pinyin)::text <
'tonh'::character varying))
                                                         Filter:
((pinyin)::text ~~ 'tong_'::text)
                                             ->  Index Scan using
words2_pkey on words w6  (cost=0.00..6.00 rows=1 width=8) (actual
time=0.037..0.052 rows=1 loops=1)
                                                   Index Cond: ((w6.wid =
"outer".wid) AND (w6.variant = "outer".variant))
                                                   Filter: ((pinyin)::text
~~ 'xi_'::text)
                                       ->  Index Scan using pinyin_index on
words w4  (cost=0.00..5.99 rows=1 width=8) (actual time=0.028..0.874
rows=165 loops=1)
                                             Index Cond: (((pinyin)::text >=
'kai'::character varying) AND ((pinyin)::text < 'kaj'::character varying))
                                             Filter: ((pinyin)::text ~~
'kai_'::text)
                                 ->  Index Scan using words2_pkey on words
w2  (cost=0.00..6.00 rows=1 width=8) (actual time=0.023..0.047 rows=1
loops=1)
                                       Index Cond: (("outer".wid = w2.wid)
AND ("outer".variant = w2.variant))
                                       Filter: ((pinyin)::text ~~
'ji_'::text)
                           ->  Index Scan using pinyin_index on words w5
(cost=0.00..5.99 rows=1 width=8) (actual time=0.025..1.436 rows=259 loops=1)
                                 Index Cond: (((pinyin)::text >=
'fang'::character varying) AND ((pinyin)::text < 'fanh'::character varying))
                                 Filter: ((pinyin)::text ~~ 'fang_'::text)
                     ->  Index Scan using words2_pkey on words w0
(cost=0.00..6.01 rows=1 width=8) (actual time=0.030..0.058 rows=1 loops=1)
                           Index Cond: (("outer".wid = w0.wid) AND (w0.wid >
0) AND ("outer".variant = w0.variant))
                           Filter: (((pinyin)::text = 'zheng4'::text) AND
(def_exists = true) AND ("sequence" = 0))
               ->  Index Scan using words2_pkey on words w8
(cost=0.00..6.00 rows=1 width=27) (actual time=0.019..0.103 rows=12 loops=1)
                     Index Cond: ((w8.wid = "outer".wid) AND (w8.variant =
"outer".variant))
Total runtime: 15.987 ms
(44 rows)

Time: 838.446 ms


As you can see, there still appears to be some issue with planning taking
~820ms, while the actual query should only take 16ms (and as you can see i
used the "folding" technique which takes some load off of the planner.


Here is the old query/analysis as run right now:
SELECT
   w8.wid,
   w8.variant,
   w8.num_variants,
   sum_text(w8.unicode) as unicodes,
   sum_text(w8.pinyin) as pinyins
FROM
   words as w0,     words as w1,
   words as w2,     words as w3,
   words as w4,     words as w5,
   words as w6,     words as w7,
   words as w8
WHERE
   w0.wid > 0 AND
   w0.pinyin = 'zheng4' AND
   w0.def_exists = 't' AND
   w0.sequence = 0 AND
   w1.wid = w0.wid AND
   w1.pinyin LIKE 'fu_' AND
   w1.variant = w0.variant AND
   w1.sequence = (w0.sequence + 1) AND
   w2.wid = w1.wid AND
   w2.pinyin LIKE 'ji_' AND
   w2.variant = w1.variant AND
   w2.sequence = (w1.sequence + 1) AND
   w3.wid = w2.wid AND
   w3.pinyin LIKE 'guan_' AND
   w3.variant = w2.variant AND
   w3.sequence = (w2.sequence + 1) AND
   w4.wid = w3.wid AND
   w4.pinyin LIKE 'kai_' AND
   w4.variant = w3.variant AND
   w4.sequence = (w3.sequence + 1) AND
   w5.wid = w4.wid AND
   w5.pinyin LIKE 'fang_' AND
   w5.variant = w4.variant AND
   w5.sequence = (w4.sequence + 1) AND
   w6.wid = w5.wid AND
   w6.pinyin LIKE 'xi_' AND
   w6.variant = w5.variant AND
   w6.sequence = (w5.sequence + 1) AND
   w7.wid = w6.wid AND
   w7.pinyin LIKE 'tong_' AND
   w7.variant = w6.variant AND
   w7.sequence = (w6.sequence + 1) AND
   w8.wid = w7.wid AND
   w8.variant = w7.variant
GROUP BY
   w8.wid,
   w8.variant,
   w8.num_variants,
   w8.page_order ,
   w0.sequence ,
   w1.sequence ,
   w2.sequence ,
   w3.sequence ,
   w4.sequence ,
   w5.sequence ,
   w6.sequence ,
   w7.sequence
ORDER BY
   w8.page_order;

wepy=> EXPLAIN ANALYZE wepy=> explain ANALYZE
                                                                     QUERY
PLAN
wepy-> wepy->

----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
wepy-> wepy->  Sort  (cost=54.15..54.16 rows=1 width=43) (actual
time=1043.148..1043.149 rows=1 loops=1)
wepy-> wepy->    Sort Key: w8.page_order
wepy-> wepy->    ->  HashAggregate  (cost=54.14..54.14 rows=1 width=43)
(actual time=1043.121..1043.122 rows=1 loops=1)
wepy-> wepy->          ->  Nested Loop  (cost=0.00..54.10 rows=1 width=43)
(actual time=9.627..1042.565 rows=12 loops=1)
wepy-> wepy->                Join Filter: (("inner"."sequence" =
("outer"."sequence" + 1)) AND ("outer"."sequence" = ("inner"."sequence" +
1)))
wepy-> wepy->                ->  Nested Loop  (cost=0.00..48.08 rows=1
width=83) (actual time=9.557..1041.784 rows=12 loops=1)
wepy-> wepy->                      Join Filter: (("outer".wid = "inner".wid)
AND ("outer".variant = "inner".variant) AND ("outer"."sequence" =
("inner"."sequence" + 1)) AND ("inner"."sequence" = ("outer"."sequence" +
1)))
wepy-> wepy->                      ->  Nested Loop  (cost=0.00..11.99 rows=1
width=16) (actual time=3.152..290.176 rows=6 loops=1)
wepy-> wepy->                            Join Filter: (("inner".wid =
"outer".wid) AND ("inner".variant = "outer".variant) AND ("inner"."sequence"
= ("outer"."sequence" + 1)))
wepy-> wepy->                            ->  Index Scan using pinyin_index
on words w4  (cost=0.00..5.99 rows=1 width=8) (actual time=0.084..1.233
rows=165 loops=1)
wepy-> wepy->                                  Index Cond: (((pinyin)::text
 >= 'kai'::character varying) AND ((pinyin)::text < 'kaj'::character
varying))
wepy-> wepy->                                  Filter: ((pinyin)::text ~~
'kai_'::text)
wepy-> wepy->                            ->  Index Scan using pinyin_index
on words w5  (cost=0.00..5.99 rows=1 width=8) (actual time=0.018..1.411
rows=259 loops=165)
wepy-> wepy->                                  Index Cond: (((pinyin)::text
 >= 'fang'::character varying) AND ((pinyin)::text < 'fanh'::character
varying))
wepy-> wepy->                                  Filter: ((pinyin)::text ~~
'fang_'::text)
wepy-> wepy->                      ->  Nested Loop  (cost=0.00..36.06 rows=1
width=67) (actual time=4.500..125.184 rows=12 loops=6)
wepy-> wepy->                            ->  Nested Loop  (cost=0.00..30.05
rows=1 width=40) (actual time=4.446..125.003 rows=1 loops=6)
wepy-> wepy->                                  Join Filter:
("inner"."sequence" = ("outer"."sequence" + 1))
wepy-> wepy->                                  ->  Nested Loop
(cost=0.00..24.03 rows=1 width=32) (actual time=4.391..124.920 rows=1
loops=6)
wepy-> wepy->                                        ->  Nested Loop
(cost=0.00..18.01 rows=1 width=24) (actual time=4.339..124.781 rows=3
loops=6)
wepy-> wepy->                                              Join Filter:
(("inner".variant = "outer".variant) AND ("inner".wid = "outer".wid) AND
("inner"."sequence" = ("outer"."sequence" + 1)))
wepy-> wepy->                                              ->  Nested Loop
(cost=0.00..12.00 rows=1 width=16) (actual time=0.154..10.898 rows=18
loops=6)
wepy-> wepy->                                                    ->  Index
Scan using pinyin_index on words w3  (cost=0.00..5.99 rows=1 width=8)
(actual time=0.027..2.358 rows=304 loops=6)
wepy-> wepy->                                                          Index
Cond: (((pinyin)::text >= 'guan'::character varying) AND ((pinyin)::text <
'guao'::character varying))
wepy-> wepy->
Filter: ((pinyin)::text ~~ 'guan_'::text)
wepy-> wepy->                                                    ->  Index
Scan using words2_pkey on words w6  (cost=0.00..6.00 rows=1 width=8) (actual
time=0.025..0.025 rows=0 loops=1824)
wepy-> wepy->                                                          Index
Cond: ((w6.wid = "outer".wid) AND (w6.variant = "outer".variant))
wepy-> wepy->
Filter: ((pinyin)::text ~~ 'xi_'::text)
wepy-> wepy->                                              ->  Index Scan
using pinyin_index on words w7  (cost=0.00..5.99 rows=1 width=8) (actual
time=0.017..5.788 rows=338 loops=108)
wepy-> wepy->                                                    Index Cond:
(((pinyin)::text >= 'tong'::character varying) AND ((pinyin)::text <
'tonh'::character varying))
wepy-> wepy->                                                    Filter:
((pinyin)::text ~~ 'tong_'::text)
wepy-> wepy->                                        ->  Index Scan using
words2_pkey on words w0  (cost=0.00..6.01 rows=1 width=8) (actual
time=0.026..0.035 rows=0 loops=18)
wepy-> wepy->                                              Index Cond:
(("outer".wid = w0.wid) AND (w0.wid > 0) AND ("outer".variant = w0.variant))
wepy-> wepy->                                              Filter:
(((pinyin)::text = 'zheng4'::text) AND (def_exists = true) AND ("sequence" =
0))
wepy-> wepy->                                  ->  Index Scan using
words2_pkey on words w1  (cost=0.00..6.00 rows=1 width=8) (actual
time=0.021..0.047 rows=1 loops=6)
wepy-> wepy->                                        Index Cond:
(("outer".wid = w1.wid) AND ("outer".variant = w1.variant))
wepy-> wepy->                                        Filter: ((pinyin)::text
~~ 'fu_'::text)
wepy-> wepy->                            ->  Index Scan using words2_pkey on
words w8  (cost=0.00..6.00 rows=1 width=27) (actual time=0.021..0.085
rows=12 loops=6)
wepy-> wepy->                                  Index Cond: ((w8.wid =
"outer".wid) AND (w8.variant = "outer".variant))
wepy-> wepy->                ->  Index Scan using words2_pkey on words w2
(cost=0.00..6.00 rows=1 width=8) (actual time=0.021..0.046 rows=1 loops=12)
wepy-> wepy->                      Index Cond: ((w2.wid = "outer".wid) AND
(w2.variant = "outer".variant))
wepy-> wepy->                      Filter: ((pinyin)::text ~~ 'ji_'::text)
wepy-> wepy->  Total runtime: 1044.239 ms
wepy-> wepy-> (43 rows)
wepy-> wepy->
wepy-> wepy-> Time: 3939.938 ms



Even the total runtime has improved dramatically with the use of explicit
joins and folding.  However, I still need to cut the query time down much
more... My last resort is to dig through the source code and see if I can
understand how the planner works, but I suspect it's thousands of lines of
code :\

BTW, I have since upgraded to 7.4.2 (from 7.4.1), and you can verify these
planner times even with an empty table.  The table spec is below:

             Table "public.words"
   Column    |         Type         | Modifiers
--------------+----------------------+-----------
wid          | integer              | not null
sequence     | smallint             | not null
variant      | smallint             | not null
char_count   | smallint             | not null
unicode      | character varying(5) | not null
pinyin       | character varying(8) | not null
simpvar      | character varying(5) |
zvar         | character varying(5) |
compatvar    | character varying(5) |
def_exists   | boolean              | not null
num_variants | smallint             |
page_order   | integer              |
pinyins      | character varying    |
unicodes     | character varying    |
Indexes:
   "words2_pkey" primary key, btree (wid, variant, "sequence")
   "page_index" btree (page_order)
   "pinyin_index" btree (pinyin)
   "unicode_index" btree (unicode)

_________________________________________________________________
Get rid of annoying pop-up ads with the new MSN Toolbar � FREE!
http://clk.atdmt.com/AVE/go/onm00200414ave/direct/01/


pgsql-performance by date:

Previous
From: "Subbiah, Stalin"
Date:
Subject: Benchmarking postgres on Solaris/Linux
Next
From: Josh Berkus
Date:
Subject: Re: Databases Vs. Schemas