Thread: disk space usage recovered on db rebuild
Greetings worthy men.

I have a question regarding filesystem disk space usage.

We have a production database containing 5 years of sales data.
Linux 2.6.5; PostgreSQL 7.4.7. We VACUUM ANALYZE the entire database
every night (about 40 min). Its size, SUM(pg_class.relpages) * 8192
bytes, is about 66 GB on disk.

When I rebuild the database (using pg_dump and psql), the resultant
database is about 48 GB on disk. A 27% space saving.

Can someone tell me why that is?

Thank you,
David Link
White Plains, NY

Long live Postgres.
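For anyone following along, the arithmetic behind the post can be sketched in shell. The SELECT in the comment is only a guess at how the 66 GB figure was obtained (pg_class.relpages counts 8 kB pages); the numbers below are the round figures quoted in the thread.

```shell
# The 66 GB figure presumably comes from something like (run via psql):
#   SELECT sum(relpages) * 8192 FROM pg_class;   -- relpages are 8 kB pages
# Dropping from 66 GB to 48 GB after dump/reload is the quoted ~27% saving:
before_gb=66
after_gb=48
echo $(( 100 * (before_gb - after_gb) / before_gb ))
```

Integer shell arithmetic is enough here; the exact figure is 27.27%, which the thread rounds to 27%.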
On Apr 1, 2005 9:21 AM, David Link <dlink@soundscan.com> wrote:
> Greetings worthy men.
>
> I have a question regarding filesystem disk space usage.
>
> We have a production database containing 5 years of sales data.
> Linux 2.6.5; PostgreSQL 7.4.7. We VACUUM ANALYZE the entire database
> every night (about 40 min). Its size, SUM(pg_class.relpages) * 8192
> bytes, is about 66 GB on disk.
>
> When I rebuild the database (using pg_dump and psql), the resultant
> database is about 48 GB on disk. A 27% space saving.
>
> Can someone tell me why that is?

Are you doing a vacuum full each night? What specific command(s) are
you using for vacuum, pg_dump, and the import?

-- 
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
L. Friedman                                    netllama@gmail.com
LlamaLand                          http://netllama.linux-sxs.org
On Fri, 2005-04-01 at 11:21, David Link wrote:
> Greetings worthy men.
>
> I have a question regarding filesystem disk space usage.
>
> We have a production database containing 5 years of sales data.
> Linux 2.6.5; PostgreSQL 7.4.7. We VACUUM ANALYZE the entire database
> every night (about 40 min). Its size, SUM(pg_class.relpages) * 8192
> bytes, is about 66 GB on disk.
>
> When I rebuild the database (using pg_dump and psql), the resultant
> database is about 48 GB on disk. A 27% space saving.
>
> Can someone tell me why that is?

There's a certain amount of left-over space that's OK. PostgreSQL,
under normal operation with regular, non-full vacuums, grows until it
reaches a "steady state" size in which some percentage of each file is
freed tuple space that can be reused by later inserts and updates.
This is a good thing as long as it doesn't go too far. Since inserts
and updates can be placed in already-allocated space, they complete
faster than if each one had to extend the file and then tack the tuple
onto the end. However, if there's too much free space in the table
files, the database becomes less efficient, because each sequential
scan of the tables has to read a lot of "blank" space. It's all about
balance.

You might want to look at running the pg_autovacuum daemon and letting
it decide when a vacuum is necessary, or at scheduling regular vacuums
more often than every night. You might also want to add a vacuum, or
possibly a vacuum full <tablename>, after updating large tables, to
clean them up individually.

Next time, try a vacuum full first to see how much space it can
reclaim. And lastly, use vacuum verbose to get an idea of whether your
fsm settings are high enough.
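Scott's suggestions boil down to a couple of commands. A minimal sketch, assuming a large table hypothetically named `sales` (the thread never names its tables); syntax as in PostgreSQL 7.4:

```sql
-- Reclaim space aggressively on one big table; VACUUM FULL takes an
-- exclusive lock and rewrites the table, so schedule it off-hours:
VACUUM FULL ANALYZE sales;

-- Routine, non-locking maintenance across the database; VERBOSE reports
-- per-table page and tuple counts, which helps judge whether the
-- max_fsm_pages / max_fsm_relations settings are large enough:
VACUUM VERBOSE ANALYZE;
```

The free-space map (FSM) settings matter because pages whose free space does not fit in the map cannot be reused, which is one way a nightly plain VACUUM can still let a database grow past its steady-state size.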
Thank you Scott and Lonni for your replies.

On Fri, 2005-04-01 at 11:21, David Link wrote:
> I have a question regarding filesystem disk space usage.
> [snip]
> Can someone tell me why that is?

Scott Marlowe wrote:
> There's a certain amount of left-over space that's OK.
> [snip]
> And lastly, use vacuum verbose to get an idea of whether your fsm
> settings are high enough.

Lonni J Friedman wrote:
> Are you doing a vacuum full each night? What specific command(s) are
> you using for vacuum, pg_dump, and the import?
David Link writes:

The answer to Lonni's question:

  $ vacuumdb --analyze $database 2>&1 | tee -a $log

  $ time (pg_dump $database | gzip > $backup_dir/$dump_file) 2>&1 | tee -a $log

  $ gunzip -c $backup_dir/$dump_file | psql -h $host -d $database >/dev/null
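Worth noting: `vacuumdb --analyze` runs a plain (lazy) vacuum, which marks dead space reusable but rarely shrinks files, so the on-disk footprint stays at the steady-state size until a dump/reload. A hypothetical variant of the nightly job, using vacuumdb's `--full` flag (present in 7.4) to return free space to the OS instead, is sketched below; this is not the poster's actual command, and the same exclusive-lock caveat from Scott's reply applies.

```shell
# Sketch only: full vacuum rewrites tables and compacts them on disk,
# holding exclusive locks, so run it in a maintenance window.
vacuumdb --full --analyze $database 2>&1 | tee -a $log
```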