Re: performance penalty between Postgresql 8.3.8 and 8.4.1 - Mailing list pgsql-performance

From Tom Lane
Subject Re: performance penalty between Postgresql 8.3.8 and 8.4.1
Date
Msg-id 12437.1260288206@sss.pgh.pa.us
Whole thread Raw
In response to Re: performance penalty between Postgresql 8.3.8 and 8.4.1  (Robert Haas <robertmhaas@gmail.com>)
List pgsql-performance
Robert Haas <robertmhaas@gmail.com> writes:
> I can think of a couple of possible explanations for the behavior you're
> seeing:

The reason it's switching from a nestloop to something else is pretty
obvious: the estimate of the number of rows coming out of the lower
join has gone from 81 to 60772.  Neither of which is real accurate :-(,
but the larger value pretty strongly discourages using a nestloop.

The estimates for the individual scans mostly seem to be better than
before, in the case of xdf_road_name far better: 97 vs 1, against a true
value of 100.  So that's good; I suspect though that it just comes from
the increase in default stats target and doesn't reflect any logic
change.  The bottom line though is that it's gone from a considerable
underestimate of the join size to a considerable overestimate, and that
pushes it to use a different plan that turns out to be inferior.

I don't see any fixable bug here.  This is just a corner case where
the inherent inaccuracies in join size estimation went wrong for us;
but for every one of those there's another one where we'd get the
right answer for the wrong reason.

One thing that might be worth considering is to try to improve the
accuracy of this rowcount estimate:

                  ->  Bitmap Heap Scan on xdf_road_link rl  (cost=6.82..743.34 rows=222 width=34) (actual
time=0.025..0.115rows=7 loops=100) 
                          Recheck Cond: (rl.road_name_id = rn.road_name_id)
                          Filter: ((rl.is_exit_name = 'N'::bpchar) AND (rl.is_junction_name = 'N'::bpchar))
                          ->  Bitmap Index Scan on nx_xdfroadlink_roadnameid  (cost=0.00..6.76 rows=222 width=0)
(actualtime=0.008..0.008 rows=7 loops=100) 
                                Index Cond: (rl.road_name_id = rn.road_name_id)

I think a large part of the inaccuracy here has to do with not having
good stats for the joint effect of the is_exit_name and is_junction_name
conditions.  But to be frank that looks like bad schema design.
Consider merging those and any related flags into one "entry type"
column.

            regards, tom lane

pgsql-performance by date:

Previous
From: Robert Haas
Date:
Subject: Re: error occured in dbt2 against with postgresql
Next
From: "Schmitz, David"
Date:
Subject: Re: performance penalty between Postgresql 8.3.8 and 8.4.1