Re: Major performance problem after upgrade from 8.3 to 8.4 - Mailing list pgsql-performance
From | Gerhard Wiesinger |
---|---|
Subject | Re: Major performance problem after upgrade from 8.3 to 8.4 |
Date | |
Msg-id | alpine.LFD.2.01.1009040850110.26391@bbs.intern Whole thread Raw |
In response to | Re: Major performance problem after upgrade from 8.3 to 8.4 (Tom Lane <tgl@sss.pgh.pa.us>) |
Responses |
Re: Major performance problem after upgrade from 8.3 to
8.4
(Gerhard Wiesinger <lists@wiesinger.com>)
|
List | pgsql-performance |
On Fri, 3 Sep 2010, Tom Lane wrote: > Gerhard Wiesinger <lists@wiesinger.com> writes: >> 8.3 query plans: http://www.wiesinger.com/tmp/pg_perf_83_new.txt >> 8.4 quey plans: http://www.wiesinger.com/tmp/pg_perf_84.txt > > Hmm. The 8.3 plan is indeed assuming that the number of rows will stay > constant as we bubble up through the join levels, but AFAICS this is > simply wrong: > > -> Nested Loop Left Join (cost=0.00..38028.89 rows=67 width=8) > -> Nested Loop Left Join (cost=0.00..25399.46 rows=67 width=8) > -> Nested Loop Left Join (cost=0.00..12770.04 rows=67 width=8) > -> Index Scan using i_log_unique on log l (cost=0.00..140.61 rows=67 width=8) > Index Cond: (datetime >= (now() - '00:01:00'::interval)) > -> Index Scan using unique_key_and_id on log_details d7 (cost=0.00..187.39 rows=89 width=8) > Index Cond: ((l.id = d7.fk_id) AND (d7.fk_keyid = $6)) > -> Index Scan using unique_key_and_id on log_details d6 (cost=0.00..187.39 rows=89 width=8) > Index Cond: ((l.id = d6.fk_id) AND (d6.fk_keyid = $5)) > -> Index Scan using unique_key_and_id on log_details d5 (cost=0.00..187.39 rows=89 width=8) > Index Cond: ((l.id = d5.fk_id) AND (d5.fk_keyid = $4)) > > If the log_details indexscans are expected to produce 89 rows per > execution, then surely the join size should go up 89x at each level, > because the join steps themselves don't eliminate anything. > > In 8.4 the arithmetic is at least self-consistent: > > -> Nested Loop Left Join (cost=0.00..505256.95 rows=57630 width=8) > -> Nested Loop Left Join (cost=0.00..294671.96 rows=6059 width=8) > -> Nested Loop Left Join (cost=0.00..272532.55 rows=637 width=8) > -> Index Scan using log_pkey on log l (cost=0.00..270203.92 rows=67 width=8) > Filter: (datetime >= (now() - '00:01:00'::interval)) > -> Index Scan using unique_key_and_id on log_details d7 (cost=0.00..34.63 rows=10 width=8) > Index Cond: ((l.id = d7.fk_id) AND (d7.fk_keyid = $6)) > -> Index Scan using unique_key_and_id on log_details d6 (cost=0.00..34.63 rows=10 width=8) > Index Cond: ((l.id = d6.fk_id) AND (d6.fk_keyid = $5)) > -> Index Scan using unique_key_and_id on log_details d5 (cost=0.00..34.63 rows=10 width=8) > Index Cond: ((l.id = d5.fk_id) AND (d5.fk_keyid = $4)) > > The rowcount estimates are apparently a shade less than 10, but they get > rounded off in the display. > > I believe the reason for this change is that 8.4's join estimation code > was rewritten so that it wasn't completely bogus for outer joins. 8.3 > might have been getting the right answer, but it was for the wrong > reasons. > > So the real question to be answered here is why doesn't it think that > each of the unique_key_and_id indexscans produce just a single row, as > you indicated was the case. The 8.4 estimate is already a factor of > almost 10 closer to reality than 8.3's, but you need another factor of > 10. You might find that increasing the statistics target for the > log_details table helps. Ok, Tom, tried different things (more details are below): 1.) Setting statistic target to 1000 and 10000 (without success), still merge join 2.) Tried to added a Index on description to help the planner for uniqueness (without success) 3.) Forced the planner to use nested loop joins (SUCCESS): SET enable_hashjoin=false;SET enable_mergejoin=false; (BTW: How do use such settings in Java and PHP and Perl, is there a command available?) Open questions: Why does the planner not choose nested loop joins, that should be the optimal one for that situation? Does the planner value: a.) UNIQUENESS b.) UNIQUENESS and NOT NULLs? Any ideas for improvement of the planner? Details: -- CREATE UNIQUE INDEX unique_key_and_id ON log_details USING btree (fk_id, fk_keyid) -- 1000 and 10000 didn't help for better query plan for Nested Loop Left Join, still Merge Left Join -- Sample with: -- ALTER TABLE log_details ALTER COLUMN fk_id SET STATISTICS 10000; -- ALTER TABLE log_details ALTER COLUMN fk_keyid SET STATISTICS 10000; -- ANALYZE VERBOSE log_details; -- Still Merge Join: -- -> Merge Left Join (cost=9102353.88..83786934.25 rows=2726186787 width=16) -- Merge Cond: (l.id = d2000902.fk_id) -- -> Merge Left Join (cost=8926835.18..40288402.09 rows=972687282 width=24) -- Merge Cond: (l.id = d2000904.fk_id) -- Default values again ALTER TABLE log_details ALTER COLUMN fk_id SET STATISTICS 100; ALTER TABLE log_details ALTER COLUMN fk_keyid SET STATISTICS 100; ANALYZE VERBOSE log_details; -- Tried to add WITHOUT SUCCESS (that planner could know that description is NOT NULL and UNIQE) DROP INDEX IF EXISTS i_key_description_desc; CREATE UNIQUE INDEX i_key_description_desc ON key_description (description); -- Therefore planner should know: keyid is NOT NULL and UNIQUE and only one result: (SELECT keyid FROM key_description WHEREdescription = 'Raumsolltemperatur') -- Therefore from constraint planner should know that fk_id is NOT NULL and UNIQUE: CONSTRAINT unique_key_and_id UNIQUE(fk_id,fk_keyid): -- LEFT JOIN log_details d1 ON l.id = d1.fk_id AND -- d1.fk_keyid = (SELECT keyid FROM key_description WHERE description = 'Raumsolltemperatur') -- Does the planner value alls those UNIQUEnesses and NOT NULLs? -- Again back to 8.3 query plan which is fast (319ms): SET enable_hashjoin=false; SET enable_mergejoin=false; -- -> Nested Loop Left Join (cost=0.00..22820970510.45 rows=2727492136 width=16) -- -> Nested Loop Left Join (cost=0.00..12810087616.29 rows=973121653 width=24) -- -> Nested Loop Left Join (cost=0.00..9238379092.22 rows=347192844 width=24) Thnx. Ciao, Gerhard -- http://www.wiesinger.com/
pgsql-performance by date: