Re: Potential autovacuum optimization: new tables - Mailing list pgsql-hackers
From | Robert Haas |
---|---|
Subject | Re: Potential autovacuum optimization: new tables |
Date | |
Msg-id | CA+TgmoZw-21aZBd4gLqvL9gkdgJVfOJS+Q_rcL3GvK0EQXkPYA@mail.gmail.com |
In response to | Re: Potential autovacuum optimization: new tables (Joshua Berkus <josh@agliodbs.com>) |
Responses | Re: Potential autovacuum optimization: new tables |
List | pgsql-hackers |
On Sat, Oct 13, 2012 at 3:49 PM, Joshua Berkus <josh@agliodbs.com> wrote:
> For my part, over the last 3 years of consulting and dealing with postgresql.conf settings for more than 140 clients:
>
> * only 10% of them ever touched the autoanalyze settings at all
> * of the ~~ 14 who did:
>   * 1 improved the tuning of their database
>   * 3 of them messed up autoanalyze, causing stats and vacuum issues
>   * ~~ 10 had no measurable effect
>
> ... so you'll understand when I say that I don't think ease of knob-twiddling is a priority for autoanalyze design. In fact, I'd say that removing the knobs entirely is a design goal.

Yeah.  My experience is shorter in time frame, but similar in composition.

> I've been going over the notes and email archives from the period where Matt O'Connor and I arrived at the current settings.  All of our testing was devoted to autovacuum, not autoanalyze.  The threshold+scale_factor design works pretty well for autovacuum; it prevents us from constantly vacuuming small tables, or large tables with less than 20% dead rows.  And I did extensive testing using DBT2 on OSDL to set the current defaults.

However, I disagree with this.  I think that things have changed a lot in 8.4+, because of the visibility map.  Extra vacuuming is not nearly so expensive as it used to be, and essentially 100% of the vacuum problems I see are caused by not vacuuming frequently enough, either because (1) when vacuum does eventually run it imposes a gigantic server load for a really long time or (2) too much bloat builds up between vacuum runs.  If these settings were adjusted in an exactly middle-of-the-road fashion, I ought to see 50% of the problems from vacuuming too often and the other 50% from not vacuuming often enough.  The reality is nothing like that; it's all on one side.

As I've said before (and I believe Simon has said similar things), I think we should be vacuuming the heap much more often but only doing index vac when we accumulate enough dead tuples to justify the cost of the index scan.  Pruning the heap is cheap and very effective.

> Our mistake was assuming that the same formula which worked well for vacuum would work well for analyze.  And since the DBT2 database has entirely medium-sized tables full of random data, no shortcomings in this thinking showed up in the tests.  Since the only counterproposal at the time was to have a flat percentage without a threshold, we got the current defaults.
>
> So, problem #1 is coming up with a mathematical formula.  My initial target values are in terms of # of rows in the table vs. # of writes before analyze is triggered:
>
> 1 : 3
> 10 : 5
> 100 : 10
> 1000 : 100
> 100000 : 2000
> 1000000 : 5000
> 10000000 : 25000
> 100000000 : 100000
>
> .... etc.  So problem #1 is a mathematical formula which gives this kind of curve.  I've tried some solution-seeking software, but I don't know how to use it well enough to get something useful.

That's a pretty funny-looking curve, because it doubles between 10 and 100 but then increases 10x between 100 and 1000.  It's similarly erratic further on.  But I do agree that some kind of log scale might be appropriate.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
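
For readers following the thread, the threshold+scale_factor behavior being discussed is the documented trigger condition: a table is vacuumed (or analyzed) once the number of dead (or changed) tuples exceeds a flat threshold plus scale_factor times the table's row count. Below is a minimal sketch of how that plays out across table sizes, assuming the stock defaults (autovacuum_vacuum_threshold = 50, autovacuum_vacuum_scale_factor = 0.2, autovacuum_analyze_threshold = 50, autovacuum_analyze_scale_factor = 0.1); it is an illustration, not code from the thread.

```python
# Sketch of the threshold + scale_factor trigger math discussed above,
# using the stock defaults.

def vacuum_trigger(reltuples, threshold=50, scale_factor=0.2):
    """Dead tuples needed before autovacuum picks up the table."""
    return threshold + scale_factor * reltuples

def analyze_trigger(reltuples, threshold=50, scale_factor=0.1):
    """Changed tuples needed before autoanalyze picks up the table."""
    return threshold + scale_factor * reltuples

for n in (100, 10_000, 1_000_000, 100_000_000):
    print(f"{n:>11,} rows: vacuum at {vacuum_trigger(n):>12,.0f} dead, "
          f"analyze at {analyze_trigger(n):>12,.0f} changed")
```

With those defaults a 100-row table needs 60 changed rows before it is analyzed, while a 100-million-row table waits for roughly 10 million changes, which is the linear scaling Josh's target table is pushing back against.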
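
Josh's target table and Robert's "funny-looking curve" remark are easy to poke at numerically. Here is a small sketch that tabulates the listed target points next to one arbitrary smooth candidate (a sub-linear power law with a small floor); the exponent and floor are purely illustrative assumptions for comparison, not values proposed by either poster.

```python
# Target points from the message above: (# rows in table, # writes
# before analyze should be triggered).
targets = [
    (1, 3), (10, 5), (100, 10), (1000, 100),
    (100_000, 2_000), (1_000_000, 5_000),
    (10_000_000, 25_000), (100_000_000, 100_000),
]

def candidate(n, exponent=0.625, floor=3):
    # Hypothetical smooth curve for comparison only; the exponent and
    # floor are illustrative knobs, not values from the thread.
    return max(floor, n ** exponent)

print(f"{'rows':>12}  {'target':>8}  {'candidate':>9}")
for rows, writes in targets:
    print(f"{rows:>12,}  {writes:>8,}  {candidate(rows):>9,.0f}")
```

Whatever formula is chosen, the erratic jumps Robert points out (2x from 10 to 100 rows, then 10x from 100 to 1000) are what make a single smooth curve hard to fit at the small-table end, which is the "new tables" case that started the thread.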