Thread: diskspace usage recovered on db rebuild

diskspace usage recovered on db rebuild

From
David Link
Date:
Greetings worthymen.

I have a question regarding filesystem disk space usage.

We have a production database containing 5 years of sales data.
Linux 2.6.5; Postgresq 7.4.7.  VACUUM ANALZYE the entire database
everynight (about 40min).
It's size, @SUM(pg_class.relpages) * 8192K, is ...

  About 66 Gigabytes on disk.

When I rebuild the database (using pg_dump and pgsql ), the new
resultant database is ..

  About 48 Gigabytes on disk.

A 27% space savings.

Can someone tell me why that is?

Thank you,
David Link
White Plains, NY
Long live Postgres.




Re: diskspace usage recovered on db rebuild

From
Lonni J Friedman
Date:
On Apr 1, 2005 9:21 AM, David Link <dlink@soundscan.com> wrote:
> Greetings worthymen.
>
> I have a question regarding filesystem disk space usage.
>
> We have a production database containing 5 years of sales data.
> Linux 2.6.5; Postgresq 7.4.7.  VACUUM ANALZYE the entire database
> everynight (about 40min).
> It's size, @SUM(pg_class.relpages) * 8192K, is ...
>
>   About 66 Gigabytes on disk.
>
> When I rebuild the database (using pg_dump and pgsql ), the new
> resultant database is ..
>
>   About 48 Gigabytes on disk.
>
> A 27% space savings.
>
> Can someone tell me why that is?

Are you doing a vacuum full each night?  What is the specific
command(s) that you are using for vacuum, pg_dump and the import?


--
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
L. Friedman                                    netllama@gmail.com
LlamaLand                       http://netllama.linux-sxs.org

Re: diskspace usage recovered on db rebuild

From
Scott Marlowe
Date:
On Fri, 2005-04-01 at 11:21, David Link wrote:
> Greetings worthymen.
>
> I have a question regarding filesystem disk space usage.
>
> We have a production database containing 5 years of sales data.
> Linux 2.6.5; Postgresq 7.4.7.  VACUUM ANALZYE the entire database
> everynight (about 40min).
> It's size, @SUM(pg_class.relpages) * 8192K, is ...
>
>   About 66 Gigabytes on disk.
>
> When I rebuild the database (using pg_dump and pgsql ), the new
> resultant database is ..
>
>   About 48 Gigabytes on disk.
>
> A 27% space savings.
>
> Can someone tell me why that is?

There's a certain amount of left over space that's ok.  PostgreSQL,
under normal operation, with regular, non-full vacuums, grows until it
reaches a "steady state" size that has some percentage of each file
having freed tuple space that can be reused by later inserts / updates.
This is a "good thing" as long as it doesn't go to far.

Since inserts and updates can be placed in already allocated space, they
should be added faster than if each one had to allocate more space in a
file then tack the tuple on the end each time.

However, if there's too much free space in the table files, then the
database will be less efficient, because each sequential read of the
tables has to read a lot of "blank" space.

It's all about balance.

You might want to look at running the pg_autovacuum daemon and letting
it decide when a vacuum is necessary, or scheduling regular vacuums to
run more often than every night.  You might also want to look at adding
vacuum or possible vacuum full <tablename> when updating large tables to
individually clean up afterwards.

Next time, try a vacuum full first to see how much space it can
reclaim.  And lastly, use vacuum verbose to get an idea if your fsm
settings are high enough.

Re: diskspace usage recovered on db rebuild

From
David Link
Date:
Thank you Scott and Lonni for your replies ...

On Fri, 2005-04-01 at 11:21, David Link wrote:
> I have a question regarding filesystem disk space usage.
>
> We have a production database containing 5 years of sales data.
> Linux 2.6.5; Postgresq 7.4.7.  VACUUM ANALZYE the entire database
> everynight (about 40min).
> It's size, @SUM(pg_class.relpages) * 8192K, is ...
>
>   About 66 Gigabytes on disk.
>
> When I rebuild the database (using pg_dump and pgsql ), the new
> resultant database is ..
>
>   About 48 Gigabytes on disk.
>
> A 27% space savings.
>
> Can someone tell me why that is?

*Scott Marlowe wrote:

*There's a certain amount of left over space that's ok.  PostgreSQL,
under normal operation, with regular, non-full vacuums, grows until it
reaches a "steady state" size that has some percentage of each file
having freed tuple space that can be reused by later inserts / updates.
This is a "good thing" as long as it doesn't go to far.

Since inserts and updates can be placed in already allocated space, they
should be added faster than if each one had to allocate more space in a
file then tack the tuple on the end each time.

However, if there's too much free space in the table files, then the
database will be less efficient, because each sequential read of the
tables has to read a lot of "blank" space.

It's all about balance.

You might want to look at running the pg_autovacuum daemon and letting
it decide when a vacuum is necessary, or scheduling regular vacuums to
run more often than every night.  You might also want to look at adding
vacuum or possible vacuum full <tablename> when updating large tables to
individually clean up afterwards.

Next time, try a vacuum full first to see how much space it can
reclaim.  And lastly, use vacuum verbose to get an idea if your fsm
settings are high enough.

*Lonni J Friedman wrote:
*
Are you doing a vacuum full each night?  What is the specific
command(s) that you are using for vacuum, pg_dump and the import?



David Link writes:

The answer to Lonni question:

$ vacuumdb --analyze $database 2>&1 | tee -a $log

$ time (pg_dump $database | gzip > $backup_dir/$dump_file) 2>&1 | tee -a $log

$ gunzip -c $dumpfile | psql -h $host -d $database >/dev/null