On Mon, Jun 3, 2013 at 6:34 AM, Kevin Grittner <kgrittn@ymail.com> wrote:
Where I hit a nightmare scenario with an anti-wraparound autovacuum, personally, was after an upgrade using pg_dump piped to psql. At a high OLTP transaction load time (obviously the most likely time for it to kick in, because it is triggered by xid consumption), it started to READ AND REWRITE every heap page of every table. This overwhelmed the battery-backed write cache, causing a series of "freezes" for a few minutes at a time, raising a very large number of end-user complaints.
But this is only after autovacuum_vacuum_cost_delay was already changed to zero, right? It is hard to imagine the write cache being overwhelmed by the default setting, or even substantially more aggressive than the default but still not zero. Anti-wraparound vacuums should generate almost purely sequential writes (at least if only btree indexes exist), so they should clear very quickly.
> "I'll whack in some manual VACUUM cron jobs during low load maintenance > hours and hope that keeps the worst of the problem away, that's what > random forum posts on the Internet say to do". > -> "oh my, why did my DB just do an emergency shutdown?"
Yeah, I've seen exactly that sequence, and some variations on it quite often. In fact, when I was first using PostgreSQL I got as far as "Maybe I didn't solve the autovacuum thing" but instead of "I'll just turn it off" my next step was "I wonder what would happen if I tried making it *more* aggressive so that it didn't have so much work to do each time it fired?" Of course, that vastly improved things. I have found it surprisingly difficult to convince other people to try that, though.
What is it you changed? Either a anti-wraparound happens, or it does not, so I'm not sure what you mean about making it more aggressive so there is less to do. It always has to do the whole thing. Was it the autovacuum_vacuum_scale_factor that you changed?