backup/restore - another area. - Mailing list pgsql-performance

From Jeff
Subject backup/restore - another area.
Date
Msg-id Pine.BSF.4.44.0310091517320.71723-100000@torgo.978.org
Whole thread Raw
Responses Re: backup/restore - another area.
List pgsql-performance
Boy, I must be getting annoying by now huh?

Anyway, after the joys of Solaris being fast I'm moving onto another area
- backup & restore.  I've been checking the archives and haven't seen any
"good" tips for backing up big databases (and more importantly,
restoring).

I've noticed while doing a backup (with both -Fc and regular recipe) that
my IO is no where near being stressed.  According to vmstat, it sits
around reading about 512kB/sec (with occasional spikes) and every 5-6
seconds it writes out a 3MB hunk.

So as a test I decided to cp a 1GB file and got a constant read speed of
20MB/sec and the writes. well. were more sporatic (buffering most likely)
and it would write out 60MB every 3 seconds.

And. then.. on the restore I notice similar things - IO hardly being
stressed at all... reading in at ~512kB/sec and every now and then writing
out a few MB.


So, I've been thinking of various backup/restore strategies... some I'm
sure some people do, some need code written and may be controvertial..

Idea #1:
Use an LVM and take a snapshop - archive that.
From the way I see it. the downside is the LVM will use a lot of space
until the snapshot is removed.  Also PG may be in a slightly inconsistant
state - but this should "appear" to PG the same as if the power went out.

For restore, simply unarchive this snapshot and point postgres at it. Let
it recover and you are good to go.

Little overhead from what I see...
I'm leaning towards this method the more I think of it.

Idea #2:

a new program/internal "system". Lets call it pg_backup. It would generate
a very fast backup (that restores very fast) at the expense of disk space.
Pretty much what we would do is write out new copies of all the pages in
the db - both indexes and tables.

the pro's to this is it does not depend on an LVM and therefore is
accessable to all platforms.  it also has the other benfets mentioned
above, except speed.

For a restore PG would need something like a 'restore mode' where we can
just have it pump pages into it somehow.. It would not have to build
index, check constraints, and all that because by definition the backup
would contain valid data.

The downside for both of these are that the backup is only good for that
version of PG on that architecture.  Speaking in Informix world this is
how it is - it has a fast backup & fast restore that does essentially #2
and then it has export/import options (works like our current pg_dump and
restore).

and oh yeah -I've tried disabling fsync on load and while it did go faster
it was only 2 minutes faster (9m vs 11m).

Any thoughts on this? What do you ther folk with big db's do?

--
Jeff Trout <jeff@jefftrout.com>
http://www.jefftrout.com/
http://www.stuarthamm.net/



pgsql-performance by date:

Previous
From: "scott.marlowe"
Date:
Subject: Re: [pgsql-advocacy] OFFTOPIC: PostgreSQL vs MySQL
Next
From: "scott.marlowe"
Date:
Subject: Re: PostgreSQL vs MySQL