Re: performance regression in 9.2/9.3 - Mailing list pgsql-hackers

From Tom Lane
Subject Re: performance regression in 9.2/9.3
Date
Msg-id 10598.1401989997@sss.pgh.pa.us
Whole thread Raw
In response to Re: performance regression in 9.2/9.3  (Merlin Moncure <mmoncure@gmail.com>)
Responses Re: performance regression in 9.2/9.3  (Linos <info@linos.es>)
List pgsql-hackers
Merlin Moncure <mmoncure@gmail.com> writes:
> On Thu, Jun 5, 2014 at 9:54 AM, Linos <info@linos.es> wrote:
>> What I don't understand is why the statistics have this bad information, all my tests are done on a database just
restoredand analyzed. Can I do something to improve the quality of my database statistics and let the planner do better
choices?Maybe increase the statistics target of the columns involved?
 

> By that I meant row count estimates coming out of the joins are way
> off.  This is pushing the planner into making bad choices.  The most
> pervasive problem I see is that the row count estimate boils down to
> '1' at some juncture causing the server to favor nestloop/index scan
> when something like a hash join would likely be more appropriate.

There's some fairly wacko stuff going on in this example, like why
is the inner HashAggregate costed so much higher by 9.3 than 8.4,
when the inputs are basically the same?  And why does 9.3 fail to
suppress the SubqueryScan on "ven", when 8.4 does get rid of it?
And why is the final output rows estimate so much higher in 9.3?
That one is actually higher than the product of the two nestloop
inputs, which looks like possibly a bug.

I think what's happening is that 9.3 is picking what it knows to be a less
than optimal join method so that it can sort the output by means of the
ordered scan "Index Scan using referencia_key on modelo mo", and thereby
avoid an explicit sort of what it thinks would be 42512461 rows.  With a
closer-to-reality estimate there, it would have gone for a plan more
similar to 8.4's, ie, hash joins and then an explicit sort.

There is a lot going on in this plan that we haven't been told about; for
instance at least one of the query's tables seems to actually be a view,
and some other ones appear to be inheritance trees with partitioning
constraints, and I'm suspicious that some of the aggregates might be
user-defined functions with higher than normal costs.

I'd like to see a self-contained test case, by which I mean full details
about the table/view schemas; it's not clear whether the actual data
is very important here.
        regards, tom lane



pgsql-hackers by date:

Previous
From: Alexey Klyukin
Date:
Subject: Re: Could not open file pg_multixact/offsets/ ERROR on 9.3.4
Next
From: Tom Lane
Date:
Subject: Re: slotname vs slot_name