Thread: Backup - filesystem snapshots

Backup - filesystem snapshots

From
Chris Jewell
Date:
Hi,

I'm trying to implement a backup strategy for a research database in
order to prevent again users accidentally dropping their data.

My preferred method would be to create regular snapshots of the data
directory, and then send this to the backup server using rsync, with
hard-linking backup rotation.  The backup data directories could then be
examined using a postmaster running on the backup server to extract any
accidentally deleted tables.

My problem is how to do these snapshots: is it enough to create a hard
link to the directory, or is there still a risk that a currently running
transaction might introduce inconsistencies?  I guess I could use the
pg_ctl -m 'Smart' command to stop the database after all clients have
disconnected, but I sometimes have users leaving their clients connected
all night.  Is there any other way to suspend the postmaster such that
it finishes its current transaction and queues any other transactions
while the snapshot is taking place?  Any other ideas of how I can create
such snapshots?

Thanks,

Chris
--
Chris Jewell
Dept of Maths and Statistics
Fylde College
Lancaster University
Lancaster
Lancs
LA1 4YF

Re: Backup - filesystem snapshots

From
Dmitri Kushak
Date:
Chris,

I am having the same problem here and I am considering using SVN for DB backup, i.e. every snapshot (whichever way you do it, pg_dump seems OK)
gets checked in on a regular basis, then if a table is accidentally deleted, it can still be checked out and restored. In case you do not know,
SVN (Subversion, http://subversion.tigris.org) is like CVS (Concurrent version system).
Of course the SVN repository itself needs to be backed up somewhere.

Best regards,
Dmitri



On Wed, 2006-01-18 at 20:11 +0000, Chris Jewell wrote:
Hi,

I'm trying to implement a backup strategy for a research database in
order to prevent again users accidentally dropping their data.

My preferred method would be to create regular snapshots of the data
directory, and then send this to the backup server using rsync, with
hard-linking backup rotation.  The backup data directories could then be
examined using a postmaster running on the backup server to extract any
accidentally deleted tables.

My problem is how to do these snapshots: is it enough to create a hard
link to the directory, or is there still a risk that a currently running
transaction might introduce inconsistencies?  I guess I could use the
pg_ctl -m 'Smart' command to stop the database after all clients have
disconnected, but I sometimes have users leaving their clients connected
all night.  Is there any other way to suspend the postmaster such that
it finishes its current transaction and queues any other transactions
while the snapshot is taking place?  Any other ideas of how I can create
such snapshots?

Thanks,

Chris

Re: Backup - filesystem snapshots

From
Chris Browne
Date:
c.jewell@lancaster.ac.uk (Chris Jewell) writes:
> I'm trying to implement a backup strategy for a research database in
> order to prevent again users accidentally dropping their data.
>
> My preferred method would be to create regular snapshots of the data
> directory, and then send this to the backup server using rsync, with
> hard-linking backup rotation.  The backup data directories could then be
> examined using a postmaster running on the backup server to extract any
> accidentally deleted tables.
>
> My problem is how to do these snapshots: is it enough to create a hard
> link to the directory, or is there still a risk that a currently running
> transaction might introduce inconsistencies?  I guess I could use the
> pg_ctl -m 'Smart' command to stop the database after all clients have
> disconnected, but I sometimes have users leaving their clients connected
> all night.  Is there any other way to suspend the postmaster such that
> it finishes its current transaction and queues any other transactions
> while the snapshot is taking place?  Any other ideas of how I can create
> such snapshots?

Short answer to "is there a risk?" is "You betcha!"

There's a fairly new feature called Point in Time Recovery that is
specifically designed to address those risks.

<http://www.postgresql.org/docs/8.1/static/backup-online.html>
--
let name="cbbrowne" and tld="ntlug.org" in String.concat "@" [name;tld];;
http://cbbrowne.com/info/advocacy.html
The English exam was a piece  of cake---which was a bit of a surprise,
actually, because I was expecting some questions on a sheet of paper.

Re: Backup - filesystem snapshots

From
Chris Jewell
Date:
Hi,

Thanks for all the suggestions.  One thing I had done, before we got the
backup server, was to write a Perl script that parsed the log file
looking for SQL commands that altered data (eg CREATE, ALTER, UPDATE
etc).  I had, of course, set the client_min_messages = notice.  Since I
had confined my users to their own schemas, the script simply queries
the database for the user list, checks to see if they have altered data,
and does a pg_dump on their schema if they have.  This actually was
working reasonably well, but required a good deal of monitoring.  I will
go back to using this system if the WAL logging seems clunky.  However,
the disadvantage is that you need to confine users to their own schemas.
 Also, the script was written pre-roles.  I'd probably have to modify it
nowadays!

If anyone is interested, I can post the Perl script.

Chris

PS The backups are really only done to protect users from themselves
since we're running a RAID mirroring system - hence the PITR for a given
user (as opposed to for the whole database).

Chris Jewell wrote:
> Hi,
>
> I'm trying to implement a backup strategy for a research database in
> order to prevent again users accidentally dropping their data.
>
> My preferred method would be to create regular snapshots of the data
> directory, and then send this to the backup server using rsync, with
> hard-linking backup rotation.  The backup data directories could then be
> examined using a postmaster running on the backup server to extract any
> accidentally deleted tables.
>
> My problem is how to do these snapshots: is it enough to create a hard
> link to the directory, or is there still a risk that a currently running
> transaction might introduce inconsistencies?  I guess I could use the
> pg_ctl -m 'Smart' command to stop the database after all clients have
> disconnected, but I sometimes have users leaving their clients connected
> all night.  Is there any other way to suspend the postmaster such that
> it finishes its current transaction and queues any other transactions
> while the snapshot is taking place?  Any other ideas of how I can create
> such snapshots?
>
> Thanks,
>
> Chris

--
Chris Jewell, BSc(Hons), BVSc, MRCVS
Dept of Maths and Statistics
Fylde College
Lancaster University
Lancaster
Lancs
LA1 4YF