Thread: Differential Backups

Differential Backups

From
"Ian Harding"
Date:
I have been thinking about backups.  I currently do one a day.  However, I thought it might be nice to get differential
backups through the day.  I should be able to generate dumps throughout the day, generate a diff from my baseline dump,
and just keep the diff, right?  Then to do a restore I would just patch for the point in time I wanted to restore to?
Seems like it would work, but whether it would save any hard drive space would depend on how much activity the database
saw.  Anyone doing this now?
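
A minimal sketch of the scheme described above, in Python rather than with diff and patch: one delta per intraday dump, each taken against the same baseline, so a restore needs only the baseline plus the delta for the chosen time. The helper names, the sample INSERT lines and the timestamps are illustrative assumptions. difflib holds both inputs in memory, so this is for illustration on small dumps only.

```python
from difflib import SequenceMatcher

# A delta is a list of operations that rebuild a snapshot from the baseline:
#   ("copy", start, end) -> reuse baseline[start:end]
#   ("data", lines)      -> lines that exist only in the snapshot
def make_delta(baseline, snapshot):
    ops = []
    matcher = SequenceMatcher(a=baseline, b=snapshot, autojunk=False)
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag == "equal":
            ops.append(("copy", i1, i2))
        elif tag in ("replace", "insert"):
            ops.append(("data", snapshot[j1:j2]))
        # "delete": baseline lines missing from the snapshot, nothing to store
    return ops

def apply_delta(baseline, delta):
    out = []
    for op in delta:
        if op[0] == "copy":
            out.extend(baseline[op[1]:op[2]])
        else:
            out.extend(op[1])
    return out

# Hypothetical dump contents, one list entry per line of the dump file.
baseline = ["INSERT INTO t VALUES (1, 'a');", "INSERT INTO t VALUES (2, 'b');"]
noon = baseline + ["INSERT INTO t VALUES (3, 'c');"]
evening = [baseline[0], "INSERT INTO t VALUES (2, 'B');", noon[2]]

# Keep the baseline in full and only a delta for each later dump.
deltas = {
    "12:00": make_delta(baseline, noon),
    "18:00": make_delta(baseline, evening),
}

# Restore to a point in time: baseline plus the one delta for that time.
restored = apply_delta(baseline, deltas["18:00"])
```

Because every delta refers to the same baseline, the deltas grow through the day as the database drifts away from it, which is the space trade-off raised in the question.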

Ian A. Harding
Programmer/Analyst II
Tacoma-Pierce County Health Department
(253) 798-3549
mailto: ianh@tpchd.org


Re: Differential Backups

From
"Timothy H. Keitt"
Date:
Tried it. GNU diff chokes on very large files. It would be so nice if
incremental dumps were native to pgsql.

Tim

Ian Harding wrote:

>I have been thinking about backups.  I currently do one a day.  However, I thought it might be nice to get
>differential backups through the day.  I should be able to generate dumps throughout the day, generate a diff from my
>baseline dump, and just keep the diff, right?  Then to do a restore I would just patch for the point in time I wanted to
>restore to?  Seems like it would work, but whether it would save any hard drive space would depend on how much activity
>the database saw.  Anyone doing this now?
>
>Ian A. Harding
>Programmer/Analyst II
>Tacoma-Pierce County Health Department
>(253) 798-3549
>mailto: ianh@tpchd.org
>
>
>---------------------------(end of broadcast)---------------------------
>TIP 4: Don't 'kill -9' the postmaster
>

--
Timothy H. Keitt
Department of Ecology and Evolution
State University of New York at Stony Brook
Stony Brook, New York 11794 USA
Phone: 631-632-1101, FAX: 631-632-7626
http://life.bio.sunysb.edu/ee/keitt/




Re: Differential Backups

From
Doug McNaught
Date:
"Ian Harding" <ianh@tpchd.org> writes:

> I have been thinking about backups.  I currently do one a day.
> However, I thought it might be nice to get differential backups
> through the day.  I should be able to generate dumps throughout the
> day, generate a diff from my baseline dump, and just keep the diff,
> right?  Then to do a restore I would just patch for the point in
> time I wanted to restore to?  Seems like it would work, but whether
> it would save any hard drive space would depend on how much activity
> the database saw.  Anyone doing this now?

Interesting idea.  The one thing I might worry about is that 'diff'
might (I'm not familiar with its algorithm) eat a great deal of memory
if the dumps you're comparing are very large and significantly
different.

I'd say give it a try and see how you like it.

-Doug
--
Let us cross over the river, and rest under the shade of the trees.
   --T. J. Jackson, 1863

Re: Differential Backups

From
Alvaro Herrera
Date:
On 29 Oct 2001, Doug McNaught wrote:

> "Ian Harding" <ianh@tpchd.org> writes:
>
> > I have been thinking about backups.  I currently do one a day.
> > However, I thought it might be nice to get differential backups
> > through the day.

> Interesting idea.  The one thing I might worry about is that 'diff'
> might (I'm not familiar with its algorithm) eat a great deal of memory
> if the dumps you're comparing are very large and significantly
> different.

GNU diff reads in memory both files. You sure need lots to compare
medium sized databases, and I don't think this method will work on big
ones.

I think this has to be implemented inside the database; maybe there's a
way of extracting the data from WAL logs (committed transactions?). Then
you need to go to the tables and see what each transaction did...

Another way to do it could be to store a timestamp on each tuple, and
check that for the diff backup. Sounds like you're going to enlarge your
data a lot by just having the timestamps...
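
The timestamp idea can be sketched as follows. This is only an illustration: it uses SQLite from the Python standard library as a stand-in for PostgreSQL, and the table items, the column modified_at and the integer clock values are assumptions made up for the example. Note that deleted rows leave nothing behind to carry a timestamp, so this approach alone would miss deletions.

```python
import sqlite3

# In-memory SQLite database as a stand-in; not PostgreSQL.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE items ("
    " id INTEGER PRIMARY KEY,"
    " name TEXT,"
    " modified_at INTEGER NOT NULL)"  # the per-row timestamp
)

def upsert(conn, row_id, name, now):
    # Every write stamps the row with the current (hypothetical) clock value.
    conn.execute(
        "INSERT OR REPLACE INTO items (id, name, modified_at) VALUES (?, ?, ?)",
        (row_id, name, now),
    )

def rows_changed_since(conn, last_backup):
    # The differential backup: only rows touched after the last backup.
    cur = conn.execute(
        "SELECT id, name, modified_at FROM items"
        " WHERE modified_at > ? ORDER BY id",
        (last_backup,),
    )
    return cur.fetchall()

upsert(conn, 1, "alice", 100)
upsert(conn, 2, "bob", 100)
# ... full backup taken at time 100 ...
upsert(conn, 2, "robert", 150)
upsert(conn, 3, "carol", 160)

changed = rows_changed_since(conn, 100)
```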

--
Alvaro Herrera (<alvherre[@]atentus.com>)
"Coge la flor que hoy nace alegre, ufana. Quién sabe si nacera otra man~ana?"


Re: Differential Backups

From
Tod McQuillin
Date:
On Mon, 29 Oct 2001, Ian Harding wrote:

> I have been thinking about backups.  I currently do one a day.
> However, I thought it might be nice to get differential backups
> through the day.  I should be able to generate dumps throughout the
> day, generate a diff from my baseline dump, and just keep the diff,
> right?  Then to do a restore I would just patch for the point in time
> I wanted to restore to?

This is exactly what rcs (http://www.cs.purdue.edu/homes/trinkle/RCS/) and
cvs (http://www.cvshome.org/) do.

If you check each new pg_dump output into an rcs file, rcs stores only the
differences between revisions (the latest in full, older ones as deltas).

I'm not sure if this would meet your needs or not, but it's worth a look.
--
Tod McQuillin



Re: Differential Backups

From
Paul Tomblin
Date:
Quoting Alvaro Herrera (alvherre@atentus.com):
> > Interesting idea.  The one thing I might worry about is that 'diff'
> > might (I'm not familiar with its algorithm) eat a great deal of memory
> > if the dumps you're comparing are very large and significantly
> > different.
>
> GNU diff reads in memory both files. You sure need lots to compare
> medium sized databases, and I don't think this method will work on big
> ones.

Doesn't GNU diff have the "-h" option?

--
Paul Tomblin <ptomblin@xcski.com>, not speaking for anybody
Never underestimate the bandwidth of a station wagon full of
tapes hurtling down the highway.
                              -- Andrew Tanenbaum

Re: Differential Backups

From
Alvaro Herrera
Date:
On Mon, 29 Oct 2001, Paul Tomblin wrote:

> Quoting Alvaro Herrera (alvherre@atentus.com):
> > > Interesting idea.  The one thing I might worry about is that 'diff'
> > > might (I'm not familiar with its algorithm) eat a great deal of memory
> > > if the dumps you're comparing are very large and significantly
> > > different.
> >
> > GNU diff reads in memory both files. You sure need lots to compare
> > medium sized databases, and I don't think this method will work on big
> > ones.
>
> Doesn't GNU diff have the "-h" option?

No, at least in my version of it (2.7, which appears to be the latest in
my local mirror of GNU). What's that supposed to do? In fact, the help
text says

       -h     This  option currently has no effect; it is present
              for Unix compatibility.

--
Alvaro Herrera (<alvherre[@]atentus.com>)
"Hay quien adquiere la mala costumbre de ser infeliz" (M. A. Evans)





Re: Differential Backups

From
Nicholas Piper
Date:
On Tue, 30 Oct 2001, Alvaro Herrera wrote:

> On Mon, 29 Oct 2001, Paul Tomblin wrote:

> > Quoting Alvaro Herrera (alvherre@atentus.com):

> > > GNU diff reads in memory both files. You sure need lots to compare
> > > medium sized databases, and I don't think this method will work on big
> > > ones.

> > Doesn't GNU diff have the "-h" option?

> No, at least in my version of it (2.7, which appears to be the latest in
> my local mirror of GNU). What's that supposed to do? In fact, the help

Maybe the -H option was meant:

       -H     Use heuristics to speed  handling  of  large  files
              that have numerous scattered small changes.

In 2.7 also.
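
For reference, a hedged sketch of driving diff with -H from Python. In GNU diff, -H is the short form of --speed-large-files; the option targets speed, and the help text quoted above says nothing about memory use. The function names and file paths are hypothetical.

```python
import subprocess

def diff_command(old_dump, new_dump):
    # -H is the short form of GNU diff's --speed-large-files.
    return ["diff", "-H", old_dump, new_dump]

def write_patch(old_dump, new_dump, patch_path):
    """Write the diff to patch_path; return True if the dumps differ."""
    with open(patch_path, "wb") as out:
        result = subprocess.run(diff_command(old_dump, new_dump), stdout=out)
    # diff exits with 0 when the files are identical, 1 when they differ,
    # and 2 when something went wrong.
    if result.returncode > 1:
        raise RuntimeError("diff failed with status %d" % result.returncode)
    return result.returncode == 1
```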

--
Part 3 MEng Cybernetics; Reading, UK       http://www.nickpiper.co.uk/
Change PGP actions of mailer or fetch key see website   1024D/3ED8B27F
Choose life. Be Vegan :-) Please reduce needless cruelty + suffering !

Re: Differential Backups

From
Paul Tomblin
Date:
Quoting Alvaro Herrera (alvherre@atentus.com):
> On Mon, 29 Oct 2001, Paul Tomblin wrote:
> > Quoting Alvaro Herrera (alvherre@atentus.com):
> > > > Interesting idea.  The one thing I might worry about is that 'diff'
> > > > might (I'm not familiar with its algorithm) eat a great deal of memory
> > > > if the dumps you're comparing are very large and significantly
> > > > different.
> > >
> > > GNU diff reads in memory both files. You sure need lots to compare
> > > medium sized databases, and I don't think this method will work on big
> > > ones.
> >
> > Doesn't GNU diff have the "-h" option?
>
> No, at least in my version of it (2.7, which appears to be the latest in
> my local mirror of GNU). What's that supposed to do? In fact, the help
> text says
>
>        -h     This  option currently has no effect; it is present
>               for Unix compatibility.

The option I'm thinking of might be "-H".  The old man pages used to say
it stood for "half hearted", optimized for large files with few
differences.


--
Paul Tomblin <ptomblin@xcski.com>, not speaking for anybody
God does not play dice with the Universe.    -- Albert Einstein.

Re: Differential Backups

From
hubert depesz lubaczewski
Date:
On Mon, 29 Oct 2001 12:22:44 -0800
"Ian Harding" <ianh@tpchd.org> wrote:

> I have been thinking about backups.  I currently do one a day.  However, I
> thought it might be nice to get differential backups through the day.  I
> should be able to generate dumps throughout the day, generate a diff from my
> baseline dump, and just keep the diff, right?  Then to do a restore I would
> just patch for the point in time I wanted to restore to?  Seems like it would
> work, but whether it would save any hard drive space would depend on how much
> activity the database saw.  Anyone doing this now?

idea is good, but don't use the suggested diff program. go for xdelta. its
algorithm is much better - faster and definitely less memory-eating.

depesz

--
hubert depesz lubaczewski                          http://www.depesz.pl/
------------------------------------------------------------------------
... vows are spoken to be broken ...                 [enjoy the silence]
... words are meaningless and forgettable ...             [depeche mode]

Re: Differential Backups

From
"Jeff Lu"
Date:
Can you show me an example on doing a backup using xdelta?

Thanks
-Jeff




Re: Differential Backups

From
"Chris Dircks"
Date:
Quote from the GNU diff man page:

-h     This option currently has no effect; it is present for Unix
       compatibility.




Re: Differential Backups

From
hubert depesz lubaczewski
Date:
On Tue, 30 Oct 2001 11:05:48 -0800
"Jeff Lu" <jklcom@mindspring.com> wrote:

> Can you show me an example on doing a backup using xdelta?

sure.
what i will show assumes that you usually want the newest backup to be
available fastest; older backups can take some time to regenerate.

first make your standard pg_dump to some file. let's call it dump.sql
$ pg_dump -d dump.sql .........
o.k.
now next day (and every following day too) you do:
$ pg_dump -d new.dump ........
$ xdelta delta new.dump dump.sql patch_file_name
$ mv -f new.dump dump.sql

now in dump.sql you always have the newest dump file, while the patch file
contains the information needed to get the older dump from the newer one.
how to patch?

$ xdelta patch patch_file_name dump.sql old.dump.sql

all you have to do is to store these patch files forever, or just occasionally
(say, once a month) make a full backup instead of a differential one.
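
The rotation above can be sketched in Python as a chain of reverse deltas: the newest dump is kept in full and each patch rebuilds the previous dump from the next newer one. The make_delta and apply_delta helpers are simple line-based stand-ins for `xdelta delta` and `xdelta patch`, not xdelta's actual algorithm, and the class name and sample data are assumptions for the example.

```python
from difflib import SequenceMatcher

def make_delta(source, target):
    """Ops that rebuild `target` from `source` (stand-in for `xdelta delta`)."""
    ops = []
    matcher = SequenceMatcher(a=source, b=target, autojunk=False)
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag == "equal":
            ops.append(("copy", i1, i2))        # reuse lines from source
        elif tag != "delete":
            ops.append(("data", target[j1:j2]))  # lines only in target
    return ops

def apply_delta(source, ops):
    """Rebuild the target from `source` (stand-in for `xdelta patch`)."""
    out = []
    for op in ops:
        out.extend(source[op[1]:op[2]] if op[0] == "copy" else op[1])
    return out

class ReverseDeltaStore:
    """Newest dump in full, plus one patch per older dump."""

    def __init__(self, first_dump):
        self.latest = list(first_dump)
        self.patches = []  # patches[-1] turns `latest` into the previous dump

    def add_dump(self, new_dump):
        # Same order as in the thread: delta from the NEW dump to the OLD one,
        # then the new dump replaces the old one as the full copy.
        self.patches.append(make_delta(new_dump, self.latest))
        self.latest = list(new_dump)

    def restore(self, steps_back):
        if not 0 <= steps_back <= len(self.patches):
            raise ValueError("no dump that far back")
        dump = self.latest
        # Apply the most recent patch first, then walk towards older dumps.
        for ops in reversed(self.patches[len(self.patches) - steps_back:]):
            dump = apply_delta(dump, ops)
        return dump

# Hypothetical dumps from three consecutive days.
day1 = ["row 1", "row 2"]
day2 = ["row 1", "row 2", "row 3"]
day3 = ["row 1", "row 2 (edited)", "row 3"]

store = ReverseDeltaStore(day1)
store.add_dump(day2)
store.add_dump(day3)
yesterday = store.restore(1)
```

Restoring an old dump walks the chain backwards, so the cost grows with age; an occasional full backup, as suggested above, bounds the chain length.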

depesz

--
hubert depesz lubaczewski                          http://www.depesz.pl/
------------------------------------------------------------------------
... vows are spoken to be broken ...                 [enjoy the silence]
... words are meaningless and forgettable ...             [depeche mode]

Re: Differential Backups

From
"Mourad EL HADJ MIMOUNE"
Date:
Hi,

I have upgraded from Postgresql-7.1RC2 to Postgresql-7.1.3.

I made a backup of the old data and then ran initdb.

I tried to access the old data, without success. I used the postmaster's -D
/my-old-data option to change the data directory, but it doesn't work.

Could you suggest a way to do this?

thanks.

Mourad.