Thread: Database backup mechanism

Database backup mechanism

From
RaviKumar.Mandala@versata.com
Date:
Hi Folks,

We have a requirement to deal with large databases, terabytes in size,
when we go into production. What is the best database back-up mechanism,
and what are the possible issues?

pg_dump can back up a database, but the dump file is limited by the OS
file-size limit. What about the option of compressing the dump file? How
much time does it generally take for large databases? I heard that it
would be way too long (even one or two days). I haven't tried it out,
though.

What about taking a zipped back-up of the database directory? We tried
this out, but the checkpoint data in the pg_xlog directory is also being
backed up. Since these logs keep on increasing from day 1 of database
creation, the back-up size is increasing drastically.
Can we back up certain subdirectories without loss of information or
consistency?

Any quick comments/suggestions in this regard would be very helpful.

Thanks in advance,
Ravi Kumar Mandala

Re: Database backup mechanism

From
Heikki Linnakangas
Date:
RaviKumar.Mandala@versata.com wrote:
> We have a requirement to deal with large databases, terabytes in size,
> when we go into production. What is the best database back-up
> mechanism, and what are the possible issues?

It depends.

Make sure you read Chapter 23. Backup and Restore of the user manual:

http://www.postgresql.org/docs/8.2/interactive/backup.html

It discusses pg_dump and restore, as well as file system level backup. 
You'll probably want to set up continuous archiving, which allows you to 
take a file-system level backup without shutting down the database.
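
In short: set archive_command in postgresql.conf to copy each completed
WAL segment somewhere safe, and bracket the file-system copy with
pg_start_backup()/pg_stop_backup(). A minimal sketch (the paths and the
backup label are just placeholders):

    # postgresql.conf -- archive each completed 16 MB WAL segment
    archive_command = 'cp %p /mnt/backup/wal_archive/%f'

    -- in psql, around the file-system level copy of the data directory:
    SELECT pg_start_backup('base_backup_label');
    -- ... tar or rsync the data directory (pg_xlog can be skipped) ...
    SELECT pg_stop_backup();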

> What about taking a zipped back-up of the database directory? We tried
> this out, but the checkpoint data in the pg_xlog directory is also
> being backed up. Since these logs keep on increasing from day 1 of
> database creation, the back-up size is increasing drastically.

The number of WAL files in the pg_xlog directory is controlled by the 
checkpoint_segments configuration parameter.
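
With the default setting, pg_xlog should level off at roughly
2 * checkpoint_segments + 1 files of 16 MB each; it should not keep
growing from day 1 unless something (a failing archive_command, for
example) is preventing old segments from being recycled. For example:

    # postgresql.conf -- pg_xlog normally levels off at about
    # 2 * checkpoint_segments + 1 files of 16 MB each
    checkpoint_segments = 3    # the default; larger values help bulk loads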

> Can we back up certain subdirectories without loss of information or
> consistency?

No.

-- 
Heikki Linnakangas
EnterpriseDB   http://www.enterprisedb.com


Re: Database backup mechanism

From
Doug Knight
Date:
I would also be interested in any "creative" ways to reduce the size and
time needed to back up databases/clusters. We were just having a
conversation about this yesterday. We were mulling over things like
using rsync to only back up files in the database directory tree that
actually changed, or maybe doing a selective backup of files based on
modified times, etc., but were unsure whether this would be a safe,
reliable way to back up a reduced set of data.

Doug Knight
WSI Inc.
Andover, MA

On Fri, 2007-02-09 at 12:45 +0530, RaviKumar.Mandala@versata.com wrote:
> Hi Folks,
>
> We have a requirement to deal with large databases, terabytes in size,
> when we go into production. What is the best database back-up
> mechanism, and what are the possible issues?
>
> pg_dump can back up a database, but the dump file is limited by the OS
> file-size limit. What about the option of compressing the dump file?
> How much time does it generally take for large databases? I heard that
> it would be way too long (even one or two days). I haven't tried it
> out, though.
>
> What about taking a zipped back-up of the database directory? We tried
> this out, but the checkpoint data in the pg_xlog directory is also
> being backed up. Since these logs keep on increasing from day 1 of
> database creation, the back-up size is increasing drastically.
> Can we back up certain subdirectories without loss of information or
> consistency?
>
> Any quick comments/suggestions in this regard would be very helpful.
>
> Thanks in advance,
> Ravi Kumar Mandala

Re: Database backup mechanism

From
Andrew Dunstan
Date:
RaviKumar.Mandala@versata.com wrote:
>
> Hi Folks,
>
> We have a requirement to deal with large databases, terabytes in size,
> when we go into production. What is the best database back-up
> mechanism, and what are the possible issues?
>
> pg_dump can back up a database, but the dump file is limited by the OS
> file-size limit. What about the option of compressing the dump file?
> How much time does it generally take for large databases? I heard that
> it would be way too long (even one or two days). I haven't tried it
> out, though.
>
> What about taking a zipped back-up of the database directory? We tried
> this out, but the checkpoint data in the pg_xlog directory is also
> being backed up. Since these logs keep on increasing from day 1 of
> database creation, the back-up size is increasing drastically.
> Can we back up certain subdirectories without loss of information or
> consistency?
>
> Any quick comments/suggestions in this regard would be very helpful.
>

Please ask in the correct forum, either pgsql-general or pgsql-admin. 
This list is strictly for discussion of development of postgres, not 
usage questions.

(If all you need is a pg_dump backup, maybe you could just pipe its 
output to something like 'split -a 5 -b 1000m - mybackup')
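
Spelled out, with compression thrown in (gzip and the 1 GB chunk size
are just one choice):

    # dump, compress, and split into ~1 GB pieces
    pg_dump mydb | gzip | split -a 5 -b 1000m - mybackup.gz.

    # restore by concatenating the pieces back in order
    cat mybackup.gz.* | gunzip | psql mydb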

cheers

andrew


Re: Database backup mechanism

From
Richard Huxton
Date:
Doug Knight wrote:
> I would also be interested in any "creative" ways to reduce the size and
> time needed to back up databases/clusters. We were just having a
> conversation about this yesterday. We were mulling over things like
> using rsync to only back up files in the database directory tree that
> actually changed, or maybe doing a selective backup of files based on
> modified times, etc., but were unsure whether this would be a safe,
> reliable way to back up a reduced set of data.

Most of your virtual-layer filesystems should offer a snapshot facility 
that lets the database think it's writing to the files while you see a 
static version. You could rsync that against an older file-level copy as 
the base copy in a PITR backup.

Note - even with a snapshot facility you need to use PITR or stop the 
database to get a guaranteed working copy.
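
With LVM, for instance, the dance might look roughly like this (the
volume and mount point names are made up for the example):

    # snapshot the volume holding the data directory
    lvcreate --snapshot --size 10G --name pgsnap /dev/vg0/pgdata
    mount -o ro /dev/vg0/pgsnap /mnt/pgsnap

    # rsync the frozen image against the previous base copy
    rsync -a --delete /mnt/pgsnap/ /backup/base/

    umount /mnt/pgsnap
    lvremove -f /dev/vg0/pgsnap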

-- 
Richard Huxton
Archonet Ltd


Re: Database backup mechanism

From
Robert Treat
Date:
On Friday 09 February 2007 08:16, Doug Knight wrote:
> I would also be interested in any "creative" ways to reduce the size and
> time needed to back up databases/clusters. We were just having a
> conversation about this yesterday. We were mulling over things like
> using rsync to only back up files in the database directory tree that
> actually changed, or maybe doing a selective backup of files based on
> modified times, etc., but were unsure whether this would be a safe,
> reliable way to back up a reduced set of data.
>

The biggest factor with backing up TB databases is the amount of activity you 
have going on.  If you have a fairly static data set, you can try using rsync 
or custom pg_dump scripts piping specific tables to something like split as 
needed. Both of these methods will require some finagling, so make sure to 
test them before using them.  You can probably also use something like Slony; 
the initial data copy will prove painful, but again, on a fairly static set 
of data it could work. 
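
Something like this per large, mostly-static table, say (the table name 
is invented):

    # dump one big table on its own schedule, split into ~1 GB pieces
    pg_dump -t big_history_table mydb | gzip | split -b 1000m - history.gz.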

If you're doing large amounts of transactional activity, these methods are 
going to become unworkable.  We push about 2 GB of WAL an hour on one of our 
systems, and the only method that seems workable is using PITR with weekly 
filesystem snapshots as the base and then copying the xlogs offline for 
re-play.  It's still tricky to get right, but it seems to work.  
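
For the re-play side, the restored base copy just needs a recovery.conf 
pointing at wherever the xlogs were copied, along the lines of (the 
archive path is an example):

    # recovery.conf -- dropped into the restored data directory
    restore_command = 'cp /mnt/archive/wal/%f "%p"'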

-- 
Robert Treat
Database Architect
OmniTI Computer Consulting, Inc.