Re: Potential autovacuum optimization: new tables - Mailing list pgsql-hackers

From: Joshua Berkus
Subject: Re: Potential autovacuum optimization: new tables
Date:
Msg-id: 1894453570.55658.1350157791392.JavaMail.root@agliodbs.com
In response to: Re: Potential autovacuum optimization: new tables (Stephen Frost <sfrost@snowman.net>)
Responses:
  Re: Potential autovacuum optimization: new tables (Jeff Janes <jeff.janes@gmail.com>)
  Re: Potential autovacuum optimization: new tables (Tom Lane <tgl@sss.pgh.pa.us>)
  Re: Potential autovacuum optimization: new tables (Robert Haas <robertmhaas@gmail.com>)
  Re: Potential autovacuum optimization: new tables (Christopher Browne <cbbrowne@gmail.com>)
List: pgsql-hackers
> For my part, while that's certainly an interesting idea, it's far more
> complicated than even providing GUCs and the idea is to make PG just
> "do it right", not to offer the user more ways to get it wrong...

Yes, please let's not replace the existing too-simplistic knobs with giant complicated gadgets nobody, including us,
understands.

For my part, over the last 3 years of consulting and dealing with postgresql.conf settings for more than 140 clients:

* only 10% of them ever touched the autoanalyze settings at all
* of the ~14 who did:
  * 1 improved the tuning of their database
  * 3 of them messed up autoanalyze, causing stats and vacuum issues
  * ~10 had no measurable effect

... so you'll understand when I say that I don't think ease of knob-twiddling is a priority for autoanalyze design.  In fact, I'd say that removing the knobs entirely is a design goal.
 

I've been going over the notes and email archives from the period where Matt O'Connor and I arrived at the current settings.  All of our testing was devoted to autovacuum, not autoanalyze.  The threshold+scale_factor design works pretty well for autovacuum; it prevents us from constantly vacuuming small tables, or large tables with less than 20% dead rows.  And I did extensive testing using DBT2 on OSDL to set the current defaults.
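
For reference, the condition that design boils down to is roughly "dead tuples > threshold + scale_factor * reltuples".  A quick sketch with the stock defaults (threshold = 50, scale_factor = 0.2), not the actual backend code, shows why small tables get left alone and big tables wait for roughly 20% churn:

    def vacuum_needed(reltuples, dead_tuples, threshold=50, scale_factor=0.2):
        # a table is picked up by autovacuum once its dead rows pass the cutoff
        return dead_tuples > threshold + scale_factor * reltuples

    print(vacuum_needed(100, 30))              # False: tiny table, never worth it
    print(vacuum_needed(1_000_000, 250_000))   # True: past the ~20% mark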
 

Our mistake was assuming that the same formula which worked well for vacuum would work well for analyze.  And since the DBT2 database has entirely medium-sized tables full of random data, no shortcomings in this thinking showed up in the tests.  Since the only counterproposal at the time was to have a flat percentage without a threshold, we got the current defaults.

So, problem #1 is coming up with a mathematical formula.  My initial target values are in terms of # of rows in the table vs. # of writes before analyze is triggered:
 

1 : 3
10 : 5
100 : 10
1000 : 100
100000 : 2000
1000000 : 5000
10000000 : 25000
100000000 : 100000

... etc.  So problem #1 is a mathematical formula which gives this kind of curve.  I've tried some solution-seeking software, but I don't know how to use it well enough to get something useful.
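
One crude way to get a starting point (purely illustrative; the numbers that fall out are not proposed defaults): fit a power law, writes = a * rows^b, to the target points above with a least-squares line in log-log space, e.g.:

    import numpy as np

    rows   = np.array([1, 10, 100, 1000, 100000, 1000000,
                       10000000, 100000000], dtype=float)
    writes = np.array([3, 5, 10, 100, 2000, 5000, 25000, 100000], dtype=float)

    # straight-line fit of log(writes) against log(rows)
    b, log_a = np.polyfit(np.log10(rows), np.log10(writes), 1)
    a = 10 ** log_a
    print("writes ~= %.2f * rows^%.2f" % (a, b))   # roughly 1.5 * rows^0.6

    for n, target in zip(rows, writes):
        print("%10d  target %6d  fitted %7.0f" % (n, target, a * n ** b))

That fit misses several of the targets by a factor of two or so, so a single power law probably isn't the final shape; it's just a way to see how close a simple formula can get.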
 

Second problem is actually testing the result.  At this point, we don't have any performance tests which create anything other than fairly randomly distributed data, which doesn't tend to show up any issues in analyze.  We really need a performance test where new data is skewed and unbalanced, including tables of radically different sizes, and where we're set up to measure the level of inaccuracy in query statistics.
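
Just to illustrate the sort of generator I mean (hypothetical table and column names, with numpy's Zipf sampler standing in for "skewed"): draw keys from a heavy-tailed distribution so a handful of values dominate, dump them to a file, and COPY them into a test table whose estimates we then watch drift:

    import numpy as np

    rng = np.random.default_rng(42)
    keys = rng.zipf(a=1.3, size=1_000_000)   # a few keys get most of the rows

    with open("skewed.copy", "w") as f:
        for i, k in enumerate(keys):
            f.write("%d\t%d\n" % (i, k))

    # then, in psql:  COPY skewed_test (id, grp) FROM '/path/to/skewed.copy';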
 

Hmmm.  Actually, for measuring the inaccuracy, I have some tools thanks to David Wheeler.  But not to generate the test in the first place.
 

--Josh Berkus


