Re: Bad plan after vacuum analyze - Mailing list pgsql-performance

From Josh Berkus
Subject Re: Bad plan after vacuum analyze
Date
Msg-id 200505111110.15983.josh@agliodbs.com
Whole thread Raw
In response to Bad plan after vacuum analyze  (Guillaume Smet <guillaume_ml@smet.org>)
Responses Re: Bad plan after vacuum analyze  (Tom Lane <tgl@sss.pgh.pa.us>)
List pgsql-performance
Guillaume,

> We reproduced the problem on a 7.4.5 and on a 7.4.7 server.
> * we load the dump in a new database
> * query: it's fast (< 1ms)
> * VACUUM FULL ANALYZE;
> * query: it's really slow (130ms) and it's another plan
> * set enable_seqscan=off;
> * query: it's fast (< 1ms) : it uses the best plan

Looking at this, the planner seems convinced that the merge join is the
easiest way to do the OUTER JOINS, but it appears to be wrong; a nested loop
is faster.

This isn't the only place I've encountered our optimizer doing this --
underestimating the cost of a merge join.  This seems to be becuase the
merge_join vs. nested_loop decision seems to be being made in the planner
without taking the double-sort and index access costs into account.   This
query is an excellent example:

"good" plan:
 Nested Loop Left Join  (cost=2.44..17.36 rows=1 width=5532) (actual
time=0.441..0.466 rows=1 loops=1)
   Join Filter: ("outer".parent_application_id = "inner".application_id)
   ->  Nested Loop Left Join  (cost=2.44..15.73 rows=1 width=5214) (actual
time=0.378..0.402 rows=1 loops=1)

See, here the planner thinks that the 2 nested loops will cost "35".

"bad" plan:
 Merge Right Join  (cost=9.27..9.48 rows=1 width=545) (actual
time=129.364..129.365 rows=1 loops=1)
   Merge Cond: ("outer".application_id = "inner".parent_application_id)
   ->  Index Scan using applicati_applicati_id_p_ogstm on applications t116
(cost=0.00..5.51 rows=28 width=20) (actual time=0.030..0.073 rows=28 loops=1)
   ->  Sort  (cost=9.27..9.27 rows=1 width=529) (actual time=129.202..129.203
rows=1 loops=1)
         Sort Key: t22.parent_application_id
         ->  Merge Right Join  (cost=8.92..9.26 rows=1 width=529) (actual
time=129.100..129.103 rows=1 loops=1)
               Merge Cond: ("outer".object_id = "inner".parent_application_id)
               ->  Index Scan using acs_objects_object_id_p_hhkb1 on
acs_objects t98  (cost=0.00..2554.07 rows=33510 width=81) (actual
time=0.043..56.392 rows=33510 loops=1)
               ->  Sort  (cost=8.92..8.93 rows=1 width=452) (actual
time=0.309..0.310 rows=1 loops=1)
                     Sort Key: t22.parent_application_id

Here the planner chooses a merge right join.  This decision seems to have been
made entirely on the basis of the cost of the join itself (total of 17)
without taking the cost of the sort and index access (total of 2600+) into
account.

Tom, is this a possible error in planner logic?


--
--Josh

Josh Berkus
Aglio Database Solutions
San Francisco

pgsql-performance by date:

Previous
From: Guillaume Smet
Date:
Subject: Bad plan after vacuum analyze
Next
From: Tom Lane
Date:
Subject: Re: Bad plan after vacuum analyze