Thread: Vacuum going "D"; crash or just impatience?
Folks, I've a 7.2.4 report-generation database that has been growing for some time, resulting in the nightly VACUUM FULL ANALYZE taking longer and longer. (Most of the data is copied nightly from other systems, so use of FSM is not very effective.)

The problem is that the nightly admin scripts are programmed to check for a locked-up nightly maintenance run and to "pg_ctl -m fast stop" it. As the VACUUM FULL now takes over an hour, the scripts falsely detected a lockup and shut down the database in the middle of VACUUM.

On restarting the database, I manually VACUUM FULLed it, and the VACUUM would speed through until hitting the spot where the database was shut down, at which point the VACUUM process went "D" and apparently locked up for 10 minutes. No error messages were written to the logs. Unfortunately, I could not give it longer to see if it recovered, because this is a production system and I had to get it up and running from backup by 9am.

Does this sound like a crash during VACUUM, or just like it needed more time? If anyone wants to analyze, I have a complete backup of the post-problem PGDATA directory. The host system is RH Linux 8.0.

--
Josh Berkus
Aglio Database Solutions
San Francisco
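A minimal sketch of the kind of timeout watchdog described above, for illustration only; the database name, data directory, and one-hour limit are assumptions, not the actual admin scripts:

    #!/bin/sh
    # Hypothetical watchdog: start the nightly VACUUM FULL ANALYZE and, if it
    # runs past a fixed limit, assume it is hung and force a fast shutdown.
    PGDATA=/var/lib/pgsql/data        # assumed data directory
    LIMIT=3600                        # seconds; the old "reasonable" runtime

    psql -d reportdb -c "VACUUM FULL ANALYZE;" &
    VPID=$!

    ELAPSED=0
    while kill -0 $VPID 2>/dev/null; do
        sleep 60
        ELAPSED=`expr $ELAPSED + 60`
        if [ $ELAPSED -ge $LIMIT ]; then
            # this is the step that cut the long-running VACUUM off mid-run
            pg_ctl -D $PGDATA -m fast stop
            break
        fi
    done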
Josh Berkus <josh@agliodbs.com> writes:
> Does this sound like a crash during VACUUM, or just like it needed more
> time?

I think it just needed more time. VACUUM goes to great lengths to be crash-safe. I doubt that a "fast stop" could have left the database in a corrupted state. It is entirely likely that an interrupted VACUUM FULL would leave the next VACUUM FULL with more, not less, work to do --- all of the tuples the first one tried to move before being killed would now be dead and need to be cleaned up.

> I've a 7.2.4 report-generation database that has been growing for some
> time, resulting in the nightly VACUUM FULL ANALYZE taking longer and
> longer. (Most of the data is copied nightly from other systems, so use
> of FSM is not very effective.)

Are you saying that you delete most or all of the rows, then vacuum? You might consider TRUNCATE if you delete all the rows, or CLUSTER if you delete most, as a substitute for VACUUM FULL. (You'd still want to run ANALYZE after you load fresh data.) VACUUM FULL is really designed for the case where there are not a huge number of dead rows --- it gets awfully slow if it has to move lots of data.

Also, I think you have probably not given the FSM enough chance. If the FSM settings are adequate then it should work fine to do

    -- delete unwanted rows
    VACUUM (not FULL, not ANALYZE)
    -- load new rows
    ANALYZE

			regards, tom lane
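A minimal sketch of the sequence Tom outlines, assuming a database "reportdb" and a staging table "nightly_load"; the names and the DELETE predicate are invented for illustration:

    #!/bin/sh
    # delete unwanted rows
    psql -d reportdb -c "DELETE FROM nightly_load WHERE batch_date < current_date;"
    # plain VACUUM (not FULL, not ANALYZE) records the freed space in the FSM
    psql -d reportdb -c "VACUUM nightly_load;"
    # ... load the new rows here, e.g. with COPY ...
    # refresh planner statistics once the fresh data is in place
    psql -d reportdb -c "ANALYZE nightly_load;"

    # If every row is replaced each night, TRUNCATE avoids dead rows entirely:
    #   psql -d reportdb -c "TRUNCATE TABLE nightly_load;"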
Tom,

> I think it just needed more time. VACUUM goes to great lengths to be
> crash-safe. I doubt that a "fast stop" could have left the database
> in a corrupted state.

OK, that's reassuring. I would have liked to give the process more time, but with users waiting ....

One thing I am puzzled by is the "D" status on the VACUUM process. That would seem to indicate that VACUUM was waiting for some other process ... but I can't imagine what it could be. Suggestions?

> Are you saying that you delete most or all of the rows, then vacuum?
> You might consider TRUNCATE if you delete all the rows, or CLUSTER
> if you delete most, as a substitute for VACUUM FULL. (You'd still want
> to run ANALYZE after you load fresh data.) VACUUM FULL is really
> designed for the case where there are not a huge number of dead rows
> --- it gets awfully slow if it has to move lots of data.

There are several "holding" tables which are truncated and then rebuilt. But the tables that are holding up VACUUM are the permanent ones, which are experiencing up to 900,000 updates every night.

> Also, I think you have probably not given the FSM enough chance.
> If the FSM settings are adequate then it should work fine to do

Well, the holdup is the indexes, which are recycling about 500,000 pages, and in 7.2.4 FSM doesn't help me. Unfortunately, dropping the indexes during the data transformation isn't really an option, because the indexes support some of the data transform steps.

I'm wondering if I need to REINDEX more often; I think I'll try that next.

--
-Josh Berkus
Aglio Database Solutions
San Francisco
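One possible shape for the "REINDEX more often" idea, sketched only; the table name is made up, and REINDEX takes an exclusive lock on the table, so it would have to fit inside the maintenance window:

    #!/bin/sh
    # rebuild the bloated indexes on the heavily-updated permanent table
    # after the nightly load, rather than relying on VACUUM FULL to shrink them
    psql -d reportdb -c "REINDEX TABLE permanent_facts;"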
Josh Berkus <josh@agliodbs.com> writes:
> One thing I am puzzled by is the "D" status on the VACUUM process.

Disk I/O wait state, no? Pretty much what I'd expect for VACUUM ...

			regards, tom lane
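For reference, the state can be watched from the shell; a "D" in the STAT column is uninterruptible sleep, which on Linux almost always means the process is waiting on disk I/O:

    # show state and wait channel of the postgres backends; a VACUUM that is
    # busy reading or writing will frequently show "D" here
    ps -eo pid,stat,wchan,args | grep '[p]ostgres'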