Thread: Differential Backups
I have been thinking about backups. I currently do one a day. However, I thought it might be nice to get differential backups through the day. I should be able to generate dumps throughout the day, generate a diff from my baseline dump, and just keep the diff, right? Then to do a restore I would just patch for the point in time I wanted to restore to? Seems like it would work, but whether it would save any hard drive space would depend on how much activity the database saw. Anyone doing this now?

Ian A. Harding
Programmer/Analyst II
Tacoma-Pierce County Health Department
(253) 798-3549
mailto: ianh@tpchd.org
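For illustration, the workflow being proposed might look roughly like this; the database name, file names, and times are invented, and the restore step assumes a freshly created database to load into:

    $ pg_dump mydb > base.sql                      # baseline, once a day
    $ pg_dump mydb > now.sql                       # later the same day
    $ diff base.sql now.sql > delta-1400.diff      # keep only the diff
    $ rm now.sql
    $ cp base.sql restore.sql                      # restore to 14:00:
    $ patch restore.sql delta-1400.diff            #   rebuild the dump
    $ psql restoredb < restore.sql                 #   and load it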
Tried it. GNU diff chokes on very large files. It would be so nice if incremental dumps were native to pgsql.

Tim

Ian Harding wrote:
> I have been thinking about backups. I currently do one a day. However, I
> thought it might be nice to get differential backups through the day. I
> should be able to generate dumps throughout the day, generate a diff from
> my baseline dump, and just keep the diff, right? [...]

--
Timothy H. Keitt
Department of Ecology and Evolution
State University of New York at Stony Brook
Stony Brook, New York 11794 USA
Phone: 631-632-1101, FAX: 631-632-7626
http://life.bio.sunysb.edu/ee/keitt/
"Ian Harding" <ianh@tpchd.org> writes: > I have been thinking about backups. I currently do one a day. > However, I thought it might be nice to get differential backups > through the day. I should be able to generate dumps throughout the > day, generate a diff from my baseline dump, and just keep the diff, > right? Then to do a restore I would just patch for the point in > time I wanted to restore to? Seems like it would work, but whether > it would save any hard drive space would depend on how much activity > the database saw. Anyone doing this now? Interesting idea. The one thing I might worry about is that 'diff' might (I'm not familiar with its algorithm) eat a great deal of memory if the dumps you're comparing are very large and significantly different. I'd say give it a try and see how you like it. -Doug -- Let us cross over the river, and rest under the shade of the trees. --T. J. Jackson, 1863
On 29 Oct 2001, Doug McNaught wrote:

> "Ian Harding" <ianh@tpchd.org> writes:
> > I have been thinking about backups. I currently do one a day. However,
> > I thought it might be nice to get differential backups through the day.
>
> Interesting idea. The one thing I might worry about is that 'diff' might
> (I'm not familiar with its algorithm) eat a great deal of memory if the
> dumps you're comparing are very large and significantly different.

GNU diff reads both files into memory. You will need a lot of it to compare even medium-sized databases, and I don't think this method will work on big ones.

I think this has to be implemented inside the database; maybe there's a way of extracting the data from the WAL logs (committed transactions?). Then you need to go to the tables and see what each transaction did... Another way to do it could be to store a timestamp on each tuple and check that for the diff backup. It sounds like you're going to enlarge your data a lot just by having the timestamps...

--
Alvaro Herrera (<alvherre[@]atentus.com>)
"Coge la flor que hoy nace alegre, ufana. Quién sabe si nacera otra man~ana?"
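As a rough sketch of the per-tuple timestamp idea: the table name "orders" and the column "last_modified" are invented here, and something (a trigger or the application) is assumed to keep that column current. Pulling the rows changed since the last backup could then be as simple as:

    $ psql mydb -c "SELECT * FROM orders WHERE last_modified > '2001-10-29 12:00'" > orders-since-noon.txt

Turning such an extract back into a restorable backup would of course take more work than this one query.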
On Mon, 29 Oct 2001, Ian Harding wrote:

> I have been thinking about backups. I currently do one a day. However, I
> thought it might be nice to get differential backups through the day. I
> should be able to generate dumps throughout the day, generate a diff from
> my baseline dump, and just keep the diff, right? Then to do a restore I
> would just patch for the point in time I wanted to restore to?

This is exactly what rcs (http://www.cs.purdue.edu/homes/trinkle/RCS/) and cvs (http://www.cvshome.org/) do. If you check each new pg_dump into an rcs file, rcs saves only the diffs from the prior revision. I'm not sure if this would meet your needs or not, but it's worth a look.

--
Tod McQuillin
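A sketch of how that might look with RCS (database and file names invented; RCS keeps the newest revision in full and stores reverse deltas for older ones, so the most recent dump stays quick to retrieve):

    $ pg_dump mydb > dump.sql
    $ ci -l dump.sql                      # first check-in; ci asks for a description
    (later the same day)
    $ pg_dump mydb > dump.sql
    $ ci -l -m"midday dump" dump.sql      # only the delta from the last revision is stored
    (to recover the state as of 14:00)
    $ co -p -d"2001-10-29 14:00" dump.sql > dump-1400.sql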
Quoting Alvaro Herrera (alvherre@atentus.com):
> > Interesting idea. The one thing I might worry about is that 'diff'
> > might (I'm not familiar with its algorithm) eat a great deal of memory
> > if the dumps you're comparing are very large and significantly
> > different.
>
> GNU diff reads both files into memory. You will need a lot of it to compare
> even medium-sized databases, and I don't think this method will work on
> big ones.

Doesn't GNU diff have the "-h" option?

--
Paul Tomblin <ptomblin@xcski.com>, not speaking for anybody
Never underestimate the bandwidth of a station wagon full of tapes
hurtling down the highway. -- Andrew Tanenbaum
On Mon, 29 Oct 2001, Paul Tomblin wrote:

> Quoting Alvaro Herrera (alvherre@atentus.com):
> > GNU diff reads both files into memory. You will need a lot of it to
> > compare even medium-sized databases, and I don't think this method will
> > work on big ones.
>
> Doesn't GNU diff have the "-h" option?

No, at least not in my version of it (2.7, which appears to be the latest in my local mirror of GNU). What's that supposed to do? In fact, the help text says:

    -h   This option currently has no effect; it is present
         for Unix compatibility.

--
Alvaro Herrera (<alvherre[@]atentus.com>)
"Hay quien adquiere la mala costumbre de ser infeliz" (M. A. Evans)
On Tue, 30 Oct 2001, Alvaro Herrera wrote:

> On Mon, 29 Oct 2001, Paul Tomblin wrote:
> > Doesn't GNU diff have the "-h" option?
>
> No, at least not in my version of it (2.7, which appears to be the latest
> in my local mirror of GNU). What's that supposed to do?

Maybe the -H option was meant:

    -H   Use heuristics to speed handling of large files that have
         numerous scattered small changes.

In 2.7 also.

--
Part 3 MEng Cybernetics; Reading, UK http://www.nickpiper.co.uk/
Change PGP actions of mailer or fetch key see website 1024D/3ED8B27F
Choose life. Be Vegan :-) Please reduce needless cruelty + suffering !
Quoting Alvaro Herrera (alvherre@atentus.com):
> On Mon, 29 Oct 2001, Paul Tomblin wrote:
> > Doesn't GNU diff have the "-h" option?
>
> No, at least not in my version of it (2.7, which appears to be the latest
> in my local mirror of GNU). What's that supposed to do? In fact, the help
> text says:
>
>     -h   This option currently has no effect; it is present
>          for Unix compatibility.

The option I'm thinking of might be "-H". The old man pages used to say it stood for "half hearted", optimized for large files with few differences.

--
Paul Tomblin <ptomblin@xcski.com>, not speaking for anybody
God does not play dice with the Universe. -- Albert Einstein.
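In the diff-based sketch earlier in the thread, using that heuristic would just mean adding -H to the diff step (file names carried over from that sketch):

    $ diff -H base.sql now.sql > delta-1400.diff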
On Mon, 29 Oct 2001 12:22:44 -0800 "Ian Harding" <ianh@tpchd.org> wrote:

> I have been thinking about backups. I currently do one a day. However, I
> thought it might be nice to get differential backups through the day. I
> should be able to generate dumps throughout the day, generate a diff from
> my baseline dump, and just keep the diff, right? [...] Anyone doing this now?

The idea is good, but don't use the suggested diff program. Go for xdelta. Its algorithm is much better -- faster and definitely less memory-hungry.

depesz

--
hubert depesz lubaczewski http://www.depesz.pl/
... vows are spoken to be broken ... [enjoy the silence] ... words are meaningless and forgettable ... [depeche mode]
Can you show me an example of doing a backup using xdelta?

Thanks
-Jeff
Quote from the GNU diff man page:

    -h   This option currently has no effect; it is present
         for Unix compatibility.
On Tue, 30 Oct 2001 11:05:48 -0800 "Jeff Lu" <jklcom@mindspring.com> wrote:

> Can you show me an example of doing a backup using xdelta?

Sure. What I will show assumes that you usually want the newest backup to be available fastest; older backups can take some time to regenerate.

First make your standard pg_dump to some file. Let's call it dump.sql:

    $ pg_dump mydb > dump.sql

Now the next day (and every following day too) you do:

    $ pg_dump mydb > new.dump
    $ xdelta delta new.dump dump.sql patch_file_name
    $ mv -f new.dump dump.sql

Now dump.sql always holds the newest dump, while the patch file contains the information needed to get the older dump back from the newer one. How to patch?

    $ xdelta patch patch_file_name dump.sql old.dump.sql

All you have to do is store these patch files forever, or just occasionally (once a month) make a full backup instead of a differential one.

depesz

--
hubert depesz lubaczewski http://www.depesz.pl/
... vows are spoken to be broken ... [enjoy the silence] ... words are meaningless and forgettable ... [depeche mode]
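Since each patch converts one day's dump into the previous day's, going back more than one step with this scheme means applying the patches in sequence, newest first. A sketch, with invented dated file names:

    $ xdelta patch patch_2001-10-30 dump.sql dump_2001-10-29.sql
    $ xdelta patch patch_2001-10-29 dump_2001-10-29.sql dump_2001-10-28.sql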
Hi,

I have moved from PostgreSQL 7.1RC2 to PostgreSQL 7.1.3. I made a backup of the old data and performed an initdb. I then wanted to access the old data, without success. I used the postmaster's -D /my-old-data option to change the data directory, but it doesn't work. Could you give me a solution for this?

Thanks.
Mourad.