Home > mailing lists

Re: Major performance problem after upgrade from 8.3 to 8.4 - Mailing list pgsql-performance

From	Gerhard Wiesinger
Subject	Re: Major performance problem after upgrade from 8.3 to 8.4
Date	September 4, 2010 03:59:24
Msg-id	alpine.LFD.2.01.1009040850110.26391@bbs.intern Whole thread Raw
In response to	Re: Major performance problem after upgrade from 8.3 to 8.4 (Tom Lane <tgl@sss.pgh.pa.us>)
Responses	Re: Major performance problem after upgrade from 8.3 to 8.4
List	pgsql-performance

Tree view

On Fri, 3 Sep 2010, Tom Lane wrote:

> Gerhard Wiesinger <lists@wiesinger.com> writes:
>> 8.3 query plans: http://www.wiesinger.com/tmp/pg_perf_83_new.txt
>> 8.4 quey plans: http://www.wiesinger.com/tmp/pg_perf_84.txt
>
> Hmm.  The 8.3 plan is indeed assuming that the number of rows will stay
> constant as we bubble up through the join levels, but AFAICS this is
> simply wrong:
>
>          ->  Nested Loop Left Join  (cost=0.00..38028.89 rows=67 width=8)
>            ->  Nested Loop Left Join  (cost=0.00..25399.46 rows=67 width=8)
>              ->  Nested Loop Left Join  (cost=0.00..12770.04 rows=67 width=8)
>                ->  Index Scan using i_log_unique on log l  (cost=0.00..140.61 rows=67 width=8)
>                    Index Cond: (datetime >= (now() - '00:01:00'::interval))
>                ->  Index Scan using unique_key_and_id on log_details d7  (cost=0.00..187.39 rows=89 width=8)
>                    Index Cond: ((l.id = d7.fk_id) AND (d7.fk_keyid = $6))
>              ->  Index Scan using unique_key_and_id on log_details d6  (cost=0.00..187.39 rows=89 width=8)
>                  Index Cond: ((l.id = d6.fk_id) AND (d6.fk_keyid = $5))
>            ->  Index Scan using unique_key_and_id on log_details d5  (cost=0.00..187.39 rows=89 width=8)
>                Index Cond: ((l.id = d5.fk_id) AND (d5.fk_keyid = $4))
>
> If the log_details indexscans are expected to produce 89 rows per
> execution, then surely the join size should go up 89x at each level,
> because the join steps themselves don't eliminate anything.
>
> In 8.4 the arithmetic is at least self-consistent:
>
>          ->  Nested Loop Left Join  (cost=0.00..505256.95 rows=57630 width=8)
>            ->  Nested Loop Left Join  (cost=0.00..294671.96 rows=6059 width=8)
>              ->  Nested Loop Left Join  (cost=0.00..272532.55 rows=637 width=8)
>                ->  Index Scan using log_pkey on log l  (cost=0.00..270203.92 rows=67 width=8)
>                    Filter: (datetime >= (now() - '00:01:00'::interval))
>                ->  Index Scan using unique_key_and_id on log_details d7  (cost=0.00..34.63 rows=10 width=8)
>                    Index Cond: ((l.id = d7.fk_id) AND (d7.fk_keyid = $6))
>              ->  Index Scan using unique_key_and_id on log_details d6  (cost=0.00..34.63 rows=10 width=8)
>                  Index Cond: ((l.id = d6.fk_id) AND (d6.fk_keyid = $5))
>            ->  Index Scan using unique_key_and_id on log_details d5  (cost=0.00..34.63 rows=10 width=8)
>                Index Cond: ((l.id = d5.fk_id) AND (d5.fk_keyid = $4))
>
> The rowcount estimates are apparently a shade less than 10, but they get
> rounded off in the display.
>
> I believe the reason for this change is that 8.4's join estimation code
> was rewritten so that it wasn't completely bogus for outer joins.  8.3
> might have been getting the right answer, but it was for the wrong
> reasons.
>
> So the real question to be answered here is why doesn't it think that
> each of the unique_key_and_id indexscans produce just a single row, as
> you indicated was the case.  The 8.4 estimate is already a factor of
> almost 10 closer to reality than 8.3's, but you need another factor of
> 10.  You might find that increasing the statistics target for the
> log_details table helps.


Ok, Tom, tried different things (more details are below):
1.) Setting statistic target to 1000 and 10000 (without success), still
merge join
2.) Tried to added a Index on description to help the planner for
uniqueness (without success)
3.) Forced the planner to use nested loop joins (SUCCESS):
SET enable_hashjoin=false;SET enable_mergejoin=false;
(BTW: How do use such settings in Java and PHP and Perl, is there a
command available?)

Open questions:
Why does the planner not choose nested loop joins, that should be the
optimal one for that situation?
Does the planner value: a.) UNIQUENESS b.) UNIQUENESS and NOT NULLs?
Any ideas for improvement of the planner?

Details:
-- CREATE UNIQUE INDEX unique_key_and_id ON log_details USING btree (fk_id, fk_keyid)
-- 1000 and 10000 didn't help for better query plan for Nested Loop Left Join, still Merge Left Join
-- Sample with:
-- ALTER TABLE log_details ALTER COLUMN fk_id SET STATISTICS 10000;
-- ALTER TABLE log_details ALTER COLUMN fk_keyid SET STATISTICS 10000;
-- ANALYZE VERBOSE log_details;
-- Still Merge Join:
--   ->  Merge Left Join  (cost=9102353.88..83786934.25 rows=2726186787 width=16)
--         Merge Cond: (l.id = d2000902.fk_id)
--         ->  Merge Left Join  (cost=8926835.18..40288402.09 rows=972687282 width=24)
--               Merge Cond: (l.id = d2000904.fk_id)
-- Default values again
ALTER TABLE log_details ALTER COLUMN fk_id SET STATISTICS 100;
ALTER TABLE log_details ALTER COLUMN fk_keyid SET STATISTICS 100;
ANALYZE VERBOSE log_details;

-- Tried to add WITHOUT SUCCESS (that planner could know that description is NOT NULL and UNIQE)
DROP INDEX IF EXISTS i_key_description_desc;
CREATE UNIQUE INDEX i_key_description_desc ON key_description (description);
-- Therefore planner should know: keyid is NOT NULL and UNIQUE and only one result: (SELECT keyid FROM key_description
WHEREdescription = 'Raumsolltemperatur') 
-- Therefore from constraint planner should know that fk_id is NOT NULL and UNIQUE: CONSTRAINT unique_key_and_id
UNIQUE(fk_id,fk_keyid): 
-- LEFT JOIN log_details d1 ON l.id = d1.fk_id AND
-- d1.fk_keyid = (SELECT keyid FROM key_description WHERE description = 'Raumsolltemperatur')
-- Does the planner value alls those UNIQUEnesses and NOT NULLs?

-- Again back to 8.3 query plan which is fast (319ms):
SET enable_hashjoin=false;
SET enable_mergejoin=false;
--   ->  Nested Loop Left Join  (cost=0.00..22820970510.45 rows=2727492136 width=16)
--         ->  Nested Loop Left Join  (cost=0.00..12810087616.29 rows=973121653 width=24)
--               ->  Nested Loop Left Join  (cost=0.00..9238379092.22 rows=347192844 width=24)

Thnx.

Ciao,
Gerhard

--
http://www.wiesinger.com/

pgsql-performance by date:

From: Tom Lane
Date: 03 September 2010, 18:11:00
Subject: Re: Major performance problem after upgrade from 8.3 to 8.4

From: Kaloyan Iliev Iliev
Date: 07 September 2010, 04:19:29
Subject: Re: Question about LEFT JOIN and query plan

Re: Major performance problem after upgrade from 8.3 to 8.4 - Mailing list pgsql-performance

Previous

Next