Thread: Unexplained Major Archive Activity During Vacuum
Hey guys,

I don't notice any errors, which just makes this even more strange. But after weeks of operating normally, our 10pm manual vacuum job generated transaction logs basically equivalent to 3/4 of our database, and I can't find any explanation. This amount is about 6x higher than usual.

Before I go crazy and tear the box apart, does anyone know of some internal change that may cause intermittent marking of pages to increase without related database activity? Or did I just vacuum a database with a corrupt CPU or piece of RAM? (Let me say again, I see no errors anywhere in the database logs.)

--
Shaun Thomas
OptionsHouse | 141 W. Jackson Blvd. | Suite 500 | Chicago IL, 60604
312-444-8534
sthomas@optionshouse.com
Shaun Thomas wrote:

> I don't notice any errors, which just makes this even more strange.
> But after weeks of operating normally, our 10pm manual vacuum job
> generated transaction logs basically equivalent to 3/4 of our
> database, and I can't find any explanation. This amount is about 6x
> higher than usual.
>
> Before I go crazy and tear the box apart, does anyone know of some
> internal change that may cause intermittent marking of pages to
> increase without related database activity?

Yes. Did you bulk load this data (possibly through restoring pg_dump output)? If so, and you have not explicitly run VACUUM FREEZE afterward, the vacuum noticed that it was time to freeze all of these tuples.

When I use pg_dump output to create a database, I always VACUUM FREEZE ANALYZE immediately afterward -- before I start archiving. Just be glad you got it with a manual vacuum during off-peak hours, rather than having this kick in via autovacuum during peak OLTP load.

> Or did I just vacuum a database with a corrupt CPU or piece of RAM?
> (Let me say again, I see no errors anywhere in the database logs.)

You haven't mentioned anything that should be taken as evidence of corruption or any unusual behavior on the part of PostgreSQL.

-Kevin
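[A minimal sketch of the post-restore step Kevin describes, assuming a psql session on the freshly restored database; the restore command itself is only shown as a comment.]

    -- psql -d mydb -f dump.sql      (restore the dump first; name is hypothetical)

    -- Freeze every tuple and refresh planner statistics in one pass, before
    -- the first base backup / WAL archiving, so a later routine vacuum does
    -- not have to rewrite most of the database:
    VACUUM FREEZE ANALYZE;

    -- VERBOSE can be added to watch per-table progress:
    -- VACUUM FREEZE VERBOSE ANALYZE;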
On 11/01/2012 09:18 AM, Kevin Grittner wrote:

> Did you bulk load this data (possibly through restoring pg_dump
> output)? If so, and you have not explicitly run VACUUM FREEZE
> afterward, the vacuum noticed that it was time to freeze all of these
> tuples.

Ok, that might explain it, then. We did in fact just upgrade from 8.2 to 9.1 about 2 weeks ago. And no, I didn't do a VACUUM FREEZE, just a VACUUM ANALYZE to make sure stats were ready.

I'm still a little uncertain what the tangible difference is between a FREEZE and a regular VACUUM. I get that it sets vacuum_freeze_min_age to 0, but why does that even matter? Is 50M out of 2B not good enough? Every VACUUM knocks the counter back to the minimum, so I guess I don't get the justification for magically forcing the minimum to be lower.

Of course, all that page marking would definitely produce a butt-ton of transaction logs. So at least that makes sense. :)

Thanks, Kevin!

> You haven't mentioned anything that should be taken as evidence of
> corruption or any unusual behavior on the part of PostgreSQL.

No, but I was a little freaked out by the unexplained activity.

--
Shaun Thomas
OptionsHouse | 141 W. Jackson Blvd. | Suite 500 | Chicago IL, 60604
312-444-8534
sthomas@optionshouse.com
Shaun Thomas wrote:

> Ok, that might explain it, then. We did in fact just upgrade from 8.2
> to 9.1 about 2 weeks ago. And no, I didn't do a VACUUM FREEZE, just a
> VACUUM ANALYZE to make sure stats were ready. I'm still a little
> uncertain what the tangible difference is between a FREEZE and a
> regular VACUUM. I get that it sets vacuum_freeze_min_age to 0, but
> why does that even matter? Is 50M out of 2B not good enough? Every
> VACUUM knocks the counter back to the minimum, so I guess I don't get
> the justification for magically forcing the minimum to be lower.

You might find this section of the docs illuminating:

http://www.postgresql.org/docs/9.2/interactive/routine-vacuuming.html#VACUUM-FOR-WRAPAROUND

-Kevin
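[A quick way to see what that docs section is talking about on a live system, in the spirit of the queries it shows; nothing here is specific to Shaun's setup.]

    -- How many transactions old is the oldest unfrozen XID in each database?
    -- When this approaches autovacuum_freeze_max_age, an anti-wraparound
    -- (freeze) vacuum is forced.
    SELECT datname, age(datfrozenxid) AS xid_age
    FROM pg_database
    ORDER BY 2 DESC;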
On 11/01/2012 10:28 AM, Kevin Grittner wrote:

> http://www.postgresql.org/docs/9.2/interactive/routine-vacuuming.html#VACUUM-FOR-WRAPAROUND

I read that several times, and I still don't get how it applies to this case.

Based on my past experience with 8.2, and my understanding of 9.1, I moved autovacuum_freeze_max_age up to 650M so we'd never get a mid-day freeze. And the default for vacuum_freeze_table_age is 150M, which I hadn't changed. So here's what I don't get:

* A manual vacuum vacuums a table.
* If the age of that table is > 150M (by default), it also freezes.
* Counters are reset to vacuum_freeze_min_age or 0... eh.

If that's the case, the freeze bit shouldn't have affected us, because I already basically crippled autovacuum from freezing anything. The nightly vacuum would freeze anyway, because we do more than 150M transactions per day. So with the new settings, we've been effectively doing a VACUUM FREEZE every night, and this has been going on for weeks without issue.

But last night? Total pandemonium. I suppose it could be related to the market being closed for 2 extra days, but we kept running our accounting jobs. The volume is just very suspicious. Either I'm totally misunderstanding a fundamental issue, or something still seems fishy here.

--
Shaun Thomas
OptionsHouse | 141 W. Jackson Blvd. | Suite 500 | Chicago IL, 60604
312-444-8534
sthomas@optionshouse.com
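[For anyone following along, the settings Shaun describes can be checked directly; a small sketch, assuming a psql session on the 9.1 server.]

    -- Current values vs. built-in defaults for the freeze-related GUCs.
    SELECT name, setting, boot_val, source
    FROM pg_settings
    WHERE name IN ('autovacuum_freeze_max_age',
                   'vacuum_freeze_table_age',
                   'vacuum_freeze_min_age');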
On 1 November 2012 17:19, Shaun Thomas <sthomas@optionshouse.com> wrote:
> On 11/01/2012 10:28 AM, Kevin Grittner wrote:
>
> Based on my past experience with 8.2, and my understanding of 9.1, I
> moved autovacuum_freeze_max_age up to 650M so we'd never get a mid-day
> freeze. And the default for vacuum_freeze_table_age is 150M, which I
> hadn't changed.

Instead of attempting to postpone freeze until beyond the life expectancy of our universe, what you probably should have done is vacuum more often so that vacuum has less work to do.

--
If you can't see the forest for the trees,
Cut the trees and you'll see there is no forest.
On 11/01/2012 11:40 AM, Alban Hertroys wrote:

> Instead of attempting to postpone freeze until beyond the life
> expectancy of our universe, what you probably should have done is
> vacuum more often so that vacuum has less work to do.

More often than every night, with autovacuum running in the background to get the regular stuff that happens during the day? 650M transactions is 3 or 4 days for us. That's hardly the lifetime of the universe. And since I didn't modify vacuum_freeze_table_age, any table vacuumed after 150M transactions is given a vacuum freeze anyway. No harm done.

It's my understanding you *don't* want to freeze excessively. I think once every day is bad enough, honestly.

--
Shaun Thomas
OptionsHouse | 141 W. Jackson Blvd. | Suite 500 | Chicago IL, 60604
312-444-8534
sthomas@optionshouse.com
On 1 Nov 2012, at 17:44, Shaun Thomas wrote:
> On 11/01/2012 11:40 AM, Alban Hertroys wrote:
>
>> Instead of attempting to postpone freeze until beyond the life
>> expectancy of our universe, what you probably should have done is
>> vacuum more often so that vacuum has less work to do.
>
> More often than every night, with autovacuum running in the background
> to get the regular stuff that happens during the day? 650M transactions
> is 3 or 4 days for us. That's hardly the lifetime of the universe. And
> since I didn't modify vacuum_freeze_table_age, any table vacuumed after
> 150M transactions is given a vacuum freeze anyway. No harm done.

150M database transactions a day sounds excessive; is there no way to reduce that number?

That aside, 650M transactions in 3 or 4 days is not equal to 150M transactions a day. It can be quite a few more. Since you mentioned that the market halted for 2 days, there were probably a lot more transactions waiting than usual; not just piled-up work, but lots of attempts at corrections as well. It wouldn't surprise me if you went over 650M transactions that day.

> It's my understanding you *don't* want to freeze excessively. I think
> once every day is bad enough, honestly.

That's not what I was suggesting. I wasn't talking about VACUUM FREEZE but normal autovacuum with more aggressive parameters. That should handle transaction wraparound automatically when it looks like you're getting close to the wraparound transaction ID. As per the 8.2 docs, VACUUM FREEZE was deprecated back then already. Knowing the devs a bit, there was a good reason to do so.

Alban Hertroys

--
If you can't see the forest for the trees,
cut the trees and you'll find there is no forest.
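[Alban doesn't spell out specific parameters; as an illustration only, per-table autovacuum settings are one way to make autovacuum more aggressive for the hottest tables. The table name and values below are hypothetical, not taken from the thread.]

    -- Vacuum/analyze this table after a smaller fraction of it has changed,
    -- and let the anti-wraparound vacuum trigger earlier for it as well.
    ALTER TABLE busy_table SET (
        autovacuum_vacuum_scale_factor  = 0.05,
        autovacuum_analyze_scale_factor = 0.02,
        autovacuum_freeze_max_age       = 300000000
    );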
On 11/02/2012 03:08 AM, Alban Hertroys wrote:

> 150M database transactions a day sounds excessive; is there no way to
> reduce that number?

I wish. 150M is actually a conservative estimate. In fact, we average 141M, but have been as high as 270M. It's all market dependent. Here's a quick look at the last two weeks of activity:

 reading_date | transactions
--------------+--------------
 2012-10-19   |    188548680
 2012-10-20   |     16722600
 2012-10-21   |     18326700
 2012-10-22   |    183141540
 2012-10-23   |    183269580
 2012-10-24   |    177945540
 2012-10-25   |    180901740
 2012-10-26   |    170482920
 2012-10-27   |     16103820
 2012-10-28   |     15026340
 2012-10-29   |     27431160
 2012-10-30   |     20299500
 2012-10-31   |    165263760
 2012-11-01   |    175540020

So you can see that even on the "off" days, we handle over 15M transactions per day. Monday and Tuesday were admittedly slow, but that means there was even less reason for VACUUM to go bonkers. We run it *every night*. Twice, in fact. Once after the main part of the day is done, and once after a lot of our nightly reconciliation scripts run, because of all the rows they touch. We can probably disable that second vacuum now that we are using autovacuum, but the first one has to stay.

> It wouldn't surprise me if you went over 650M transactions that day.

Yeah, and I thought that might be the case too -- until I looked at our graph of transaction totals for the week. The last vacuum that would have caused an automatic freeze happened on the 26th. So by the time the "bad" vacuum happened on the 31st, we'd accumulated about 244M transactions. Sure, that's quite a few, but not 6x more than usual, as the amount of transaction logs generated during the vacuum might suggest.

> That's not what I was suggesting. I wasn't talking about VACUUM FREEZE
> but normal autovacuum with more aggressive parameters.

It's already pretty aggressive. I cut autovacuum_vacuum_scale_factor and autovacuum_analyze_scale_factor in half from the defaults, and tweaked a couple tables that were getting vacuumed or analyzed every minute. What I don't want is for it to be so aggressive that it decides to freeze a 50M row table in the middle of an active trading day. That's why the 10PM vacuum stays. If I just depended on autovacuum_freeze_max_age, that would be a distinct possibility.

> As per the 8.2 docs, VACUUM FREEZE was deprecated back then already.
> Knowing the devs a bit, there was a good reason to do so.

VACUUM FREEZE is no longer deprecated according to the 9.1 and 9.2 docs. I've never used it myself, though. Besides that, autovacuum does a FREEZE automatically if you go over autovacuum_freeze_max_age. Due to the visibility map, they also added vacuum_freeze_table_age, which causes a regular VACUUM to be promoted to a VACUUM FREEZE if the age is higher than that value. By default, that's 150M; just about perfect for our system, honestly.

In pre-8.4, before the visibility map, you didn't need FREEZE at all. A regular vacuum could reset all XID counters down to some minimum value. That minimum was vacuum_freeze_min_age. But because of the visibility map, and regular vacuums using it, older XIDs could be missed during a regular VACUUM because it wasn't launched by the autovacuum thread. They introduced vacuum_freeze_table_age so you could get the old functionality back (always resetting down to vacuum_freeze_min_age during any VACUUM) if so desired.

Basically, VACUUM FREEZE doesn't mean what it once did. The visibility map made it a necessity because VACUUM doesn't always reclaim XIDs anymore. At least, that's the impression I got from the docs.
I could be way off. It bears out, though: I've got plenty of tables in our stage setup where I'll vacuum them after setting vacuum_freeze_min_age to 100 or something, and their relfrozenxid ages just keep climbing. VACUUM FREEZE *always* resets the value, though.

--
Shaun Thomas
OptionsHouse | 141 W. Jackson Blvd. | Suite 500 | Chicago IL, 60604
312-444-8534
sthomas@optionshouse.com
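[A rough sketch of the kind of stage test Shaun describes; the table name and the session-level setting are illustrative, not taken from his environment.]

    -- Build a throwaway table whose rows all share one xmin.
    CREATE TABLE freeze_test AS
        SELECT g AS id FROM generate_series(1, 100000) g;

    -- A plain VACUUM only freezes tuples older than vacuum_freeze_min_age,
    -- so age(relfrozenxid) generally keeps climbing between runs.
    SET vacuum_freeze_min_age = 100;
    VACUUM freeze_test;
    SELECT age(relfrozenxid) FROM pg_class
    WHERE oid = 'freeze_test'::regclass;

    -- VACUUM FREEZE behaves like vacuum_freeze_min_age = 0 and resets the
    -- table's relfrozenxid to (nearly) the current transaction ID.
    VACUUM FREEZE freeze_test;
    SELECT age(relfrozenxid) FROM pg_class
    WHERE oid = 'freeze_test'::regclass;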
On Thursday, November 01, 2012 05:40:23 PM Alban Hertroys wrote:
> On 1 November 2012 17:19, Shaun Thomas <sthomas@optionshouse.com> wrote:
> > On 11/01/2012 10:28 AM, Kevin Grittner wrote:
> >
> > Based on my past experience with 8.2, and my understanding of 9.1, I
> > moved autovacuum_freeze_max_age up to 650M so we'd never get a mid-day
> > freeze. And the default for vacuum_freeze_table_age is 150M, which I
> > hadn't changed.
>
> Instead of attempting to postpone freeze until beyond the life
> expectancy of our universe, what you probably should have done is
> vacuum more often so that vacuum has less work to do.

That's not really possible with freeze vacuums. When the table is older than the applicable freeze age, it's scanned completely instead of only the parts that are sensible according to the visibility map. The more expensive scans really only happen when they make sense...

Andres

--
Andres Freund    http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
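[A sketch, not Andres's, of how to spot the tables whose next VACUUM will be one of these full-table freeze scans because their age has passed vacuum_freeze_table_age.]

    SELECT c.oid::regclass      AS table_name,
           age(c.relfrozenxid)  AS xid_age
    FROM pg_class c
    WHERE c.relkind = 'r'
      AND age(c.relfrozenxid) >
          current_setting('vacuum_freeze_table_age')::bigint
    ORDER BY 2 DESC;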