Re: How to do fast, reliable backups? - Mailing list pgsql-admin

From Silvana Di Martino
Subject Re: How to do fast, reliable backups?
Date 2004-03-06 13:29
Msg-id 200403061329.17799.silvanadimartino@tin.it
In response to How to do fast, reliable backups?  (Chris Ruprecht <chris@ruprecht.org>)
Responses Re: How to do fast, reliable backups?
Re: How to do fast, reliable backups?
List pgsql-admin
At 21:37 on Friday, 5 March 2004, Chris Ruprecht wrote:
> I am wondering how you guys back up your databases. Say, I have a 20 GB
> database, data and indexes. If I run pg_dump on this, it backs up the
> schema and the data. When I have to restore this, I would have to run this
> through psql which would then re-build the indexes after it has inserted
> the records. That's not really a feasible option as it takes too long.

When you have large databases and you need consistent backups, an interesting
solution is master-slave replication. Just use a second server on your LAN as
a slave replica of your main database server. Should you have any problem
with the main server, just switch the IP addresses of the master and slave
servers and you are up and running again. The switch takes a few
milliseconds and can even be performed automatically by specific
programs (look for "fault-tolerant linux systems" with Google). While your
backup server serves your users, you have plenty of time to install a new hard
disk in the main server, restore your data (from the running slave server
or from a tape) and bring it back into service.
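
As a minimal sketch of such an automatic takeover (in Python, since that is
one of the languages suggested below), the slave could poll the master and,
on failure, bind a floating service IP with the Linux "ip" command. The
addresses, interface and polling interval here are hypothetical placeholders;
a real deployment would use a dedicated heartbeat/failover package:

    import socket
    import subprocess
    import time

    MASTER = ("192.168.1.10", 5432)   # hypothetical master address and port
    SERVICE_IP = "192.168.1.100/24"   # hypothetical floating service IP
    INTERFACE = "eth0"                # hypothetical network interface

    def master_is_alive(timeout=5):
        """Try to open a TCP connection to the master's PostgreSQL port."""
        try:
            with socket.create_connection(MASTER, timeout=timeout):
                return True
        except OSError:
            return False

    def take_over_service_ip():
        """Bind the floating service IP on this (slave) host."""
        subprocess.run(["ip", "addr", "add", SERVICE_IP, "dev", INTERFACE],
                       check=True)

    if __name__ == "__main__":
        # Poll the master; when it stops answering, take over its address.
        while master_is_alive():
            time.sleep(10)
        take_over_service_ip()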

When you need to put your data onto a tape or a CD-ROM, just stop the
master-slave mechanism, make a "static" backup (a pg_dump or a file-system
copy) of your _slave_ server and restart the replication program. The
master-slave replication mechanism will take care of re-aligning the two
servers in a few minutes. This way, you never need to stop the main server.
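
As a rough illustration, that sequence could be scripted like this (again in
Python). The replication service name, database name and dump path are
hypothetical; adapt them to whatever replication tool you actually run:

    import subprocess

    DBNAME = "mydb"                        # hypothetical database name
    DUMPFILE = "/backup/mydb.dump"         # hypothetical dump destination
    REPLICATOR = "/etc/init.d/replicator"  # hypothetical replication daemon

    # 1. Pause replication so the slave stops receiving changes.
    subprocess.run([REPLICATOR, "stop"], check=True)

    try:
        # 2. Take a static, consistent dump of the *slave* database.
        with open(DUMPFILE, "wb") as out:
            subprocess.run(["pg_dump", DBNAME], stdout=out, check=True)
    finally:
        # 3. Restart replication; it will re-align the two servers.
        subprocess.run([REPLICATOR, "start"], check=True)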

> If I back up the pgdata directory, I will get inconsistent data, as
> somebody might update data in 2 tables while my backup is running, one
> change in a table I have already backed up, one in a table I have not. The
> solution would be to shut the database server down, do the backup and then
> start everything back up. But that's not really an option in a 24/7
> environment.

Making a copy of your data directory while your server is running is
definitely a bad idea. Forget it. Backing up the file system used by an RDBMS
usually requires taking the server off-line and flushing all unwritten data
to disk. This cannot be done in a 24/7 environment.

> What I'd like to see is a transaction-aware backup which backs up indexes
> as well as data.

Have a look at one of the master-slave replication systems available on the
net. For this task you do not need anything as sophisticated as a
multi-master replication system (like Bettina Kemme's "PG-Replicator"
project). You could even write your own client program in Perl or Python.
The program just has to connect to your running master server, read the
relevant data and insert it into the database on the slave server. Using a
SERIAL primary key, it is quite easy to spot which rows of a table have not
been copied to the slave server yet. Using a timer, you can choose the time
granularity of the replication mechanism that best fits your needs. Using
transactions, you can be sure to get only consistent data. By reading and
saving the system tables, you can back up even the most hidden and subtle
parts of your DB.
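
A minimal sketch of such a client in Python, assuming the psycopg2 driver and
a table "items" with a SERIAL primary key "id" (the connection strings, table
and column names are all hypothetical):

    import psycopg2

    # Hypothetical connection strings for the two servers.
    master = psycopg2.connect("host=master dbname=mydb")
    slave = psycopg2.connect("host=slave dbname=mydb")

    with master, slave:  # commit (or roll back) both transactions together
        mcur = master.cursor()
        scur = slave.cursor()

        # Find the highest id already copied to the slave.
        scur.execute("SELECT COALESCE(MAX(id), 0) FROM items")
        last_id = scur.fetchone()[0]

        # Read only the rows the slave has not seen yet, inside a single
        # transaction, so we copy a consistent snapshot of the master.
        mcur.execute(
            "SELECT id, payload FROM items WHERE id > %s ORDER BY id",
            (last_id,))
        for row in mcur:
            scur.execute(
                "INSERT INTO items (id, payload) VALUES (%s, %s)", row)

    master.close()
    slave.close()

Run something like this from cron, or in a loop with a sleep, at whatever
interval gives you the time granularity you need.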

Beware that, if you use a custom-made system, restoring your data is up to
you. Test the restore procedure before using your program in a production
environment.

Good luck.
-----------------------------------------
Alessandro Bottoni and Silvana Di Martino
alessandrobottoni@interfree.it
silvanadimartino@tin.it
