Revisiting default_statistics_target - Mailing list pgsql-hackers

From Greg Smith
Subject Revisiting default_statistics_target
Date
Msg-id alpine.GSO.2.01.0905221149310.6633@westnet.com
Whole thread Raw
Responses Re: Revisiting default_statistics_target
Re: Revisiting default_statistics_target
List pgsql-hackers
Yesterday Jignesh Shah presented his extensive benchmark results comparing 
8.4-beta1 with 8.3.7 at PGCon: 
http://blogs.sun.com/jkshah/entry/pgcon_2009_performance_comparison_of

While most cases were dead even or a modest improvement, his dbt-2 results 
suggest a 15-20% regression in 8.4.  Changing the default_statistics_taget 
to 100 was responsible for about 80% of that regression.  The remainder 
was from the constraint_exclusion change.  That 80/20 proportion was 
mentioned in the talk but not in the slides.  Putting both those back to 
the 8.3 defaults swapped things where 8.4b1 was ahead by 5% instead. 
(Note that all of the later benchmarks in his slides continued to use the 
default parameters, that change was only tested with that specific 
workload)

The situation where the stats target being so low hurts things the most 
are the data warehouse use cases.  Josh Berkus tells me that his latest DW 
testing suggests that the 10->100 increase turns out to be insufficient 
anyway; 400+ is the range you really need that to be in.  I did a quick 
survey of some other community members who work in this space and that 
experience is not unique.  Josh has some early tools that tackle this 
problem by adjusting the stats target only when it's critical--on indexed 
columns for example.  I'm going to work with him to help get those 
polished, and to see if we can replicate some of those cases via a public 
benchmark.

The bump from 10 to 100 was supported by microbenchmarks that suggested it 
would be tolerable.  That doesn't seem to be reality here though, and it's 
questionable whether this change really helps the people who need to fool 
with the value the most.  This sort of feedback is exactly why it made 
sense to try this out during the beta cycle.  But unless someone has some 
compelling evidence to the contrary, it looks like the stats target needs 
to go back to a lower value.  I think the best we can do here is to 
improve the documentation about this parameter and continue to work on 
tuning guides and tools to help people set it correctly.

As for the change to constraint_exclusion, the regression impact there is 
much less severe and the downside of getting it wrong is pretty bad. 
Rather than reverting it, the ideal response to that might be to see if 
it's possible to improve the "partition" code path.  But as I'm not going 
to volunteer to actually do that, I really don't get a vote here anyway.

--
* Greg Smith gsmith@gregsmith.com http://www.gregsmith.com Baltimore, MD


pgsql-hackers by date:

Previous
From: Greg Stark
Date:
Subject: Re: RFD: Discarded tuple count for SeqScan nodes in EXPLAIN ANALYZE
Next
From: "Greg Sabino Mullane"
Date:
Subject: Re: Revisiting default_statistics_target