Thread: Vacuum problems
Hi everyone…
I have a database that is currently about 25 gigs on my primary DB server running Postgres 8.2.9, and two others that are less than 1 gig apiece. The DB server is a quad proc, quad core, 64 gigs of memory, 5 drive RAID5 array, so it has plenty of horsepower. Until about three weeks ago I was running a nightly vacuum analyze and a vacuum full analyze once per week.
This is what I was running for the vacuum full command:
vacuumdb -a -e -f -z -v -U postgres
The nightly vacuums have been working flawlessly, but about three weeks ago the vacuum full started failing. It was taking about 5-10 minutes normally, but all of a sudden it started hitting the command timeout that I have set, which is at 60 minutes. I thought that it may be a corrupt table or a large amount of content had been deleted from a database, so I built a script to loop through each database and run a vacuum full analyze on each table individually thinking I would find my problem table. The script finished in 5 minutes!
This is what I’m running on each table now in my script:
vacuumdb -d $DB -t $TABLE -e -f -z -v -U postgres
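The loop itself might be sketched like this (a hypothetical reconstruction, not the original script; the connection options and catalog queries are assumptions to adjust for your environment):

```shell
#!/bin/sh
# Sketch of a per-database, per-table VACUUM FULL ANALYZE loop.

# Compose one per-table invocation, mirroring the command from the thread.
vacuum_cmd() {
    printf 'vacuumdb -d %s -t %s -e -f -z -v -U postgres' "$1" "$2"
}

vacuum_all() {
    # Loop over every connectable database, then every user table in it.
    for DB in $(psql -U postgres -At -c \
        "SELECT datname FROM pg_database WHERE datallowconn"); do
        for TABLE in $(psql -U postgres -d "$DB" -At -c \
            "SELECT schemaname || '.' || tablename FROM pg_tables
             WHERE schemaname NOT IN ('pg_catalog','information_schema')"); do
            $(vacuum_cmd "$DB" "$TABLE")
        done
    done
}
```

Running each table separately like this also shows per-table timings in the `-v` output, which helps pinpoint a single slow relation.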
As I understand it, the "vacuumdb -a" command basically does the same thing as my script. So why is it timing out while my script finishes in 5 minutes or less? Is the "vacuumdb -a" command doing something that I'm not?
Now that the holiday season is past I will be upgrading to the latest 8.2 release as soon as possible, but I need to get this figured out first unless it’s a bug that the upgrade fixes.
I’d appreciate any advice or explanations you guys can send my way.
Thanks,
Scot Kreienkamp
La-Z-Boy Inc.
skreien@la-z-boy.com
734-242-1444 ext 6379
Well, upgrade to 8.2.11, even though the change log doesn't mention anything about vacuum there. Secondly, turn up the logging verbosity and see whether PostgreSQL actually complains about anything; otherwise it's pretty much a blind guess.
btw, is that on Windows? (Looking at la-z-boy.com, it uses .net.)
Nope, on RHEL5. This is the backend of an application that's not available or visible to the public.

Thanks,

Scot Kreienkamp

-----Original Message-----
From: Grzegorz Jaśkiewicz
Sent: Monday, January 05, 2009 11:19 AM

[snip]
On Mon, Jan 5, 2009 at 7:46 AM, Scot Kreienkamp <SKreien@la-z-boy.com> wrote:
> I have a database that is currently about 25 gigs on my primary DB server
> running Postgres 8.2.9 [...] Until about three weeks ago I was running a
> nightly vacuum analyze and a vacuum full analyze once per week.

Did you have a compelling reason for running vacuum full? It's generally discouraged unless you've got a usage pattern that demands it. If you are running vacuum full you likely have bloated indexes, so you might need to reindex the db as well.

> This is what I was running for the vacuum full command:
>
> vacuumdb -a -e -f -z -v -U postgres
>
> The nightly vacuums have been working flawlessly, but about three weeks ago
> the vacuum full started failing. It was taking about 5-10 minutes normally,
> but all of a sudden it started hitting the command timeout that I have set,
> which is at 60 minutes.

Since I assume vacuum is running under the superuser account, you can try this:

alter user postgres set statement_timeout=0;

to give it all the time it needs to finish.

> I thought that it may be a corrupt table or a large amount of content had
> been deleted from a database, so I built a script to loop through each
> database and run a vacuum full analyze on each table individually thinking
> I would find my problem table. The script finished in 5 minutes!

It might be that the previous vacuum full cleaned up enough stuff that the next one ran faster. But again, vacuum full is usually a bad idea as regular maintenance.
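A sketch of the session-scoped alternative to the role-level ALTER USER suggested above (the function name is illustrative; the `rms-prod` database name is the one mentioned elsewhere in the thread):

```shell
# Print the SQL for a one-off vacuum run with the timeout lifted; pipe
# it into psql. SET is session-local and expires with the connection,
# so a role-level timeout used by an application login stays untouched.
timeout_free_vacuum_sql() {
    printf 'SET statement_timeout = 0;\nVACUUM VERBOSE ANALYZE;\n'
}

# Usage (against the live server):
#   timeout_free_vacuum_sql | psql -U postgres -d rms-prod
```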
Scott,

Would the "alter user postgres set statement_timeout=0;" be a permanent change? I ask because our application is using that for its login to the database. (No lectures please, I inherited the system that way. I already read the riot act to our developers about that.) If so I'll have to set it back after the vacuum is done.

FYI, when I inherited the system it was doing nightly vacuum fulls. It was that way for several months. If that causes bloated indexes, then that's fairly likely a problem I have. Sounds like I should quit running vacuum fulls altogether except maybe once or twice per year. I'll try adding a reindex for tonight's vacuum run.

Thanks,

Scot Kreienkamp

-----Original Message-----
From: Scott Marlowe
Sent: Monday, January 05, 2009 11:35 AM

[snip]
"Scot Kreienkamp" <SKreien@la-z-boy.com> writes:
> The nightly vacuums have been working flawlessly, but about three weeks
> ago the vacuum full started failing. It was taking about 5-10 minutes
> normally, but all of a sudden it started hitting the command timeout
> that I have set, which is at 60 minutes. I thought that it may be a
> corrupt table or a large amount of content had been deleted from a
> database, so I built a script to loop through each database and run a
> vacuum full analyze on each table individually thinking I would find my
> problem table. The script finished in 5 minutes!

Maybe the vacuum was blocking waiting for someone's lock on a table? What else is running in parallel with your nightly vacuum script?

regards, tom lane
Nothing. I ran a query to see what other queries were running and what other locks existed at the time. No queries running, no locks. Nothing running at that time of night except the dump, which is finished before the vacuum job starts.

Thanks,

Scot Kreienkamp

-----Original Message-----
From: Tom Lane
Sent: Monday, January 05, 2009 1:05 PM

[snip]
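A check of that kind might look like the following sketch (column names are the 8.2-era ones from memory - `procpid` and `current_query`; newer releases renamed them to `pid` and `query` - so verify against your server):

```shell
# Emit the SQL for a "what's running / who's waiting on locks" check;
# pipe it into psql while the vacuum is stalled.
activity_check_sql() {
    cat <<'SQL'
SELECT procpid, usename, current_query FROM pg_stat_activity;
SELECT locktype, relation::regclass, mode, granted, pid
FROM pg_locks WHERE NOT granted;
SQL
}

# Usage: activity_check_sql | psql -U postgres -d rms-prod
```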
On Mon, Jan 5, 2009 at 10:15 AM, Scot Kreienkamp <SKreien@la-z-boy.com> wrote:
> Would the "alter user postgres set statement_timeout=0;" be a permanent
> change? I ask because our application is using that for its login to
> the database. (No lectures please, I inherited the system that way. I
> already read the riot act to our developers about that.) If so I'll
> have to set it back after the vacuum is done.

Then you can just issue a "set statement_timeout=0" before you run vacuum / vacuum full.

The update versus insert ratio isn't as important as how many rows are updated out of the total between each run of vacuum analyze.

Vacuum full is definitely NOT a regular, recommended practice. I don't think the docs really say it is, but a few other people have seemed to get the same idea from the docs, so there must be some gray area I'm not seeing when I read them. Given the usage pattern you described earlier, I'd say vacuum full is definitely NOT called for; regular vacuum should be plenty.

The best thing to do is to examine how many dead tuples you've got to keep track of, and if that number keeps rising, figure out whether fsm pages needs to be bumped up and/or autovacuum needs more aggressive settings. Note that autovacuum is kind of handcuffed on pg versions before 8.3 because it was single threaded, and one really big table could throw it behind, with other more frequently updated tables getting bloated while the vacuum thread runs against that one large table.

Use vacuum verbose to get an idea of how many dead tuples there are in the database, and see if they rise to a plateau or just keep rising. For most usage patterns with autovacuum enabled, you'll see a steady rise to about 10-20% dead tuples, then it should level off.

> FYI, when I inherited the system it was doing nightly vacuum fulls. It
> was that way for several months. If that causes bloated indexes, then
> that's fairly likely a problem I have.
> Sounds like I should quit running vacuum fulls altogether except maybe
> once or twice per year.

A lot of times a pgsql install doing nightly fulls is a sign of someone who started out with an old version that only supported full vacuum, applying the faulty knowledge gained there to a newer version which likely doesn't need it.

If you do find one table that really needs full vacuums because of its usage pattern, it's best to cron up a single vacuum (regular) to run more often on it, or make autovacuum more aggressive, or, failing those two, to make a regular nightly vacuum full / cluster / reindex for that one relation. Usually cluster is a better choice, as it doesn't bloat indexes and puts the table into index order (on the index you clustered on).
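A per-relation nightly job along those lines might be sketched as follows (the table and index names are hypothetical; note that on 8.2 the syntax is `CLUSTER index ON table`, while later releases prefer `CLUSTER table USING index`):

```shell
# Emit nightly maintenance SQL for one heavily-churned table. CLUSTER
# rewrites the table in index order and rebuilds its indexes, so follow
# it with ANALYZE to refresh the planner statistics.
single_table_maint_sql() {
    cat <<'SQL'
CLUSTER big_table_pkey ON big_table;
ANALYZE big_table;
SQL
}

# Usage: single_table_maint_sql | psql -U postgres -d rms-prod
```

Keep in mind CLUSTER takes an exclusive lock for the duration of the rewrite, so it belongs in a quiet maintenance window.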
Another strategy to look at is to make autovacuum more aggressive by putting entries for that one table into the pg_autovacuum table. I know that table exists in 8.3, but I don't remember if it exists in older versions or how it works there. If you have it available, you can make that one relation more aggressively autovacuumed without affecting the other relations' vacuum scheduling. On 8.3 you can set individual cost delays and all the other settings you'd need. If 8.2 doesn't support this, look at migrating for that functionality alone.
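For what it's worth, 8.2 does ship a pg_autovacuum catalog; a heavily hedged sketch of a per-table entry follows (the column list and values here are from memory and purely illustrative - check `\d pg_autovacuum` on your server before relying on any of it; the mechanism was later replaced by per-table storage parameters):

```shell
# Emit SQL to give one (hypothetical) table its own autovacuum
# thresholds via the pre-8.4 pg_autovacuum catalog. A value of -1
# is conventionally "fall back to the global setting".
per_table_autovac_sql() {
    cat <<'SQL'
INSERT INTO pg_autovacuum
    (vacrelid, enabled, vac_base_thresh, vac_scale_factor,
     anl_base_thresh, anl_scale_factor, vac_cost_delay, vac_cost_limit)
VALUES
    ('public.big_table'::regclass, true, 500, 0.05, 250, 0.02, -1, -1);
SQL
}

# Usage: per_table_autovac_sql | psql -U postgres -d rms-prod
```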
Thanks for the advice Scott. I've taken out the vacuum fulls entirely. I've now got a nightly vacuum analyze as well as a reindex. I'll probably drop both to every other night.

BTW, the database shrunk by 2 gigs just from reindexing last night. I expect I'll see a performance gain from actually doing reindexing, since this database has never been reindexed since it was put in production 6 months ago.

I've got about 12 tables that get caught by the autoanalyze and about 6 that get caught by autovacuum on a daily basis. I'm not sure how often the autovacuum and autoanalyze runs on those tables. I probably need to up the logging to find out. I'm not worried about making it more aggressive yet.

One other problem though... my database has a "-" in the name. When I try to run:

psql -U postgres -d rms-prod -c "REINDEX SYSTEM rms-prod"

I get this:

ERROR:  syntax error at or near "-"
LINE 1: REINDEX SYSTEM rms-prod

The user tables run fine. Should I reindex the system tables also? If so, how do I get around the dash in the db name?

Thanks,

Scot Kreienkamp

-----Original Message-----
From: Scott Marlowe
Sent: Monday, January 05, 2009 1:37 PM

[snip]
You don't have to reindex too often - it takes an exclusive lock on the whole table, just like vacuum full. Just do it every few months, depending on db growth.
----- "Scot Kreienkamp" <SKreien@la-z-boy.com> wrote:

> One other problem though... my database has a "-" in the name... when
> I try to run:
>
> psql -U postgres -d rms-prod -c "REINDEX SYSTEM rms-prod"

You need to quote the db name:

psql -U postgres -d rms-prod -c 'REINDEX SYSTEM "rms-prod"'

> I get this:
>
> ERROR: syntax error at or near "-"
> LINE 1: REINDEX SYSTEM rms-prod
>
> The user tables run fine. Should I reindex the system tables also? If
> so, how do I get around the dash in the db name?

Adrian Klaver
aklaver@comcast.net
Yep... dummy me. That works. I tried that before with the reindexdb command, where it doesn't work; I didn't try it with the psql command.

Thanks,

Scot Kreienkamp

-----Original Message-----
From: Adrian Klaver
Sent: Tuesday, January 06, 2009 12:02 PM

[snip]
On Tue, Jan 6, 2009 at 9:05 AM, Grzegorz Jaśkiewicz <gryzman@gmail.com> wrote:
> you don't have to reindex too often - it locks exclusively whole
> table, just like vacuum full. Just do it every few months, depending
> on db growth.

While I don't wholly disagree with periodic reindexing, I do recommend that one keeps track of bloat. It's easy enough to have an alarm that goes off if any index gets over 50% dead space, then go look at the database. Or go in every week and see what vacuum verbose looks like. Even if you just email yourself a copy of the last 10 or 15 lines every morning to see how bloated the db is in general, you'll catch most problems before they become problems.

One or two rogue updates without where clauses on medium to large sized tables can blow you right out of the water. Best to have some way to keep track of them.
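The "email yourself the tail of the verbose output" idea might be sketched like this (the recipient address and database name are placeholders):

```shell
# Keep only the last N lines (default 15) of a VACUUM VERBOSE run --
# the summary with total page counts and free-space-map usage is at
# the end of the output.
vacuum_tail() {
    tail -n "${1:-15}"
}

# Usage from cron (address and db name are placeholders):
#   vacuumdb -d rms-prod -z -v -U postgres 2>&1 | vacuum_tail \
#       | mail -s "nightly vacuum summary" dba@example.com
```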
Scott Marlowe wrote:
> While I don't wholly disagree with periodic reindexing, I do recommend
> that one keeps track of bloat. It's easy enough to have an alarm that
> goes off if any index gets over 50% dead space, then go look at the
> database.

Reading this list, I've noticed that:

- Many admins don't understand vacuum vs vacuum full at all, and are unaware of the probable need for a reindex after vacuum full. They're often landing up with very bloated indexes from excessive use of vacuum full, or very bloated tables due to insufficient fsm space / infrequent vacuuming.

- It's hard to spot table and (especially) index bloat. Pg doesn't warn about bloated tables or indexes in any way that people seem to notice, nor are there any built-in views or functions that might help the admin identify problem tables and indexes.

- Most people have a lot of trouble understanding where and how their storage is being used.

I'm wondering if it might be a good idea to adopt one of the handy views people have written for getting table/index bloat information as a standard part of Pg (probably as an SQL function rather than a view) so people can just "SELECT pg_bloat()" to get a useful summary of table/index status.

The other thing I wonder about is having EXPLAIN and EXPLAIN ANALYZE report information on the amount of free space that a table seq scan or an index scan is having to traverse. That might bring problems to the attention of people who're investigating query performance issues without being aware that the underlying issue is actually bloated indexes, not bad plans.

-- Craig Ringer