Thread: VACUUM FULL takes long time to complete
Hi! We've got a Postgres 8.1.5 installation with a 60GBish database: =# select version(); version ------------------------------------------------------------------------ --------------- PostgreSQL 8.1.5 on i686-pc-linux-gnu, compiled by GCC gcc (GCC) 4.1.1 (Gentoo 4.1.1) (1 row) =# select pg_size_pretty(pg_database_size('xxx')); pg_size_pretty ---------------- 59 GB (please don't freak over the word Gentoo! This is Postgres compiled, by us, using the sources from postgresql.org, not some patched-up Gentoo version) Anyways, we run a VACUUM FULL ANALYZE VERBOSE every night, and it's now taking 3+ hours to complete. We also have autovacuum turned on with default settings. This 3+ hour vacuum time is cutting into our production hours. :( Note that while we're vacuuming Postgres is running in a single user mode and not listening. There's no other connections to the database. The hardware itself is a dual-core Intel 2gHz w/ 4G ram and adequate space in a hardware Raid 5 configuration. I realize Raid 5 isn't ideal, but in the general use cases of our database it doesn't noticeably impact performance. How can we begin to cut down the vacuum time? My first thought is simply change the schedule to run weekly (or biweekly) since we're also running autovacuum. Are there any other Postgres configuration changes that might help to improve vacuum performance? The settings we've explicitly set are: max_connections = 256 shared_buffers = 40000 temp_buffers = 5000 work_mem = 32768 maintenance_work_mem = 65535 max_fsm_pages = 120000 fsync = on wal_buffers = 16 effective_cache_size = 5000 log_connections = on log_duration = off log_line_prefix = '%m [xid=%x] [%p]: ' stats_start_collector = on stats_command_string = on stats_block_level = on stats_row_level = on autovacuum = on Thanks in advance for any suggestions! eric
On Sat, Mar 31, 2007 at 03:18:35PM -0400, Eric B. Ridge wrote: > Anyways, we run a VACUUM FULL ANALYZE VERBOSE every night, and it's > now taking 3+ hours to complete. We also have autovacuum turned on > with default settings. This 3+ hour vacuum time is cutting into our > production hours. :( Firstly, with autovacuum you really don't need to be running VACUUM FULL anyway. VACUUM FULL is actually quite inefficient, you're better off having higher FSM settings and leaving it to autovacuum. For more info, post the actual output of VACUUM VERBOSE. That displays enough info to get to the cause. Have a nice day, -- Martijn van Oosterhout <kleptog@svana.org> http://svana.org/kleptog/ > From each according to his ability. To each according to his ability to litigate.
Attachment
"Eric B. Ridge" <ebr@tcdi.com> wrote: > > Hi! We've got a Postgres 8.1.5 installation with a 60GBish database: > > =# select version(); > version > ------------------------------------------------------------------------ > --------------- > PostgreSQL 8.1.5 on i686-pc-linux-gnu, compiled by GCC gcc (GCC) > 4.1.1 (Gentoo 4.1.1) > (1 row) > > =# select pg_size_pretty(pg_database_size('xxx')); > pg_size_pretty > ---------------- > 59 GB > > (please don't freak over the word Gentoo! This is Postgres compiled, > by us, using the sources from postgresql.org, not some patched-up > Gentoo version) > > Anyways, we run a VACUUM FULL ANALYZE VERBOSE every night, and it's > now taking 3+ hours to complete. We also have autovacuum turned on > with default settings. This 3+ hour vacuum time is cutting into our > production hours. :( Have you established any reason to run vacuum full at all? Generally speaking, vacuum full isn't necessary, and overall isn't a good idea. If you do have a good reason for running it (which I'd be interested to hear) you could reduce the impact by breaking the job up. Perhaps vacuum one table each night, so that if you have 30 tables, each table will get vacuum fulled once a month, and each night's job should only take about 5 minutes. Note that if you _do_ run vacuum full, it's a good idea to reindex afterward. Vacuum full seems to result in a lot of index bloat (which, BTW, is a good reason not to use it). > Note that while we're vacuuming Postgres is running in a single user > mode and not listening. There's no other connections to the database. > > The hardware itself is a dual-core Intel 2gHz w/ 4G ram and adequate > space in a hardware Raid 5 configuration. I realize Raid 5 isn't > ideal, but in the general use cases of our database it doesn't > noticeably impact performance. Grab some systems data while the process is running. A few snapshots of top output, iostat, + anything you can think of to narrow down where the bottleneck is. However, with a 65G database and 4G of RAM to work with, I would expect that IO is going to be the bottleneck, and that comes back to using something faster than RAID 5. > How can we begin to cut down the vacuum time? My first thought is > simply change the schedule to run weekly (or biweekly) since we're > also running autovacuum. Are there any other Postgres configuration > changes that might help to improve vacuum performance? > > The settings we've explicitly set are: > > max_connections = 256 > shared_buffers = 40000 I use ~250000 shared_buffers on our large memory systems. I don't have conclusive evidence that going that high is really worth it, but I haven't had any performance problems with it. Note that with a DB as large as yours, this may cause problems with the memory available for work_mem, but you'll have to test your use case to know for sure. > temp_buffers = 5000 > work_mem = 32768 > maintenance_work_mem = 65535 > max_fsm_pages = 120000 > fsync = on > wal_buffers = 16 > effective_cache_size = 5000 > log_connections = on > log_duration = off > log_line_prefix = '%m [xid=%x] [%p]: ' > stats_start_collector = on > stats_command_string = on > stats_block_level = on > stats_row_level = on > autovacuum = on -- Bill Moran http://www.potentialtech.com
On Mar 31, 2007, at 5:15 PM, Martijn van Oosterhout wrote: > Firstly, with autovacuum you really don't need to be running VACUUM > FULL anyway. VACUUM FULL is actually quite inefficient, you're better > off having higher FSM settings and leaving it to autovacuum. Thank you (and Bill Moran) for the response. The reason we run VACUUM FULL daily, I think, is just a hangover from the times when Postgres didn't have autovacuum in core. And I think using FULL in particular was from a desire to keep the database as trim as possible. We'll flip our automated process off and see how things go. Maybe implement a monthly CLUSTER and/or REINDEX on the largest tables to help keep their sizes in check. Thanks again! eric