Thread: Incrementally Updated Backup
Way past feature freeze, but this small change allows a powerful new
feature utilising the Restartable Recovery capability. Very useful for
very large database backups...

Includes full documentation.

Perhaps a bit rushed, but inclusion in 8.2 would be great. (Ouch, don't
shout back, read the patch first....)

-----------------------------
Docs copied here as a better explanation:

<title>Incrementally Updated Backups</title>

<para>
Restartable Recovery can also be utilised to avoid the need to take
regular complete base backups, thus improving backup performance in
situations where the server is heavily loaded or the database is very
large. This concept is known as incrementally updated backups.
</para>

<para>
If we take a backup of the server files after a recovery is partially
completed, we will be able to restart the recovery from the last
restartpoint. This backup is now further forward along the timeline
than the original base backup, so we can refer to it as an incrementally
updated backup. If we need to recover, it will be faster to recover from
the incrementally updated backup than from the base backup.
</para>

<para>
The <xref linkend="startup-after-recovery"> option in the recovery.conf
file is provided to allow the recovery to complete up to the current
last WAL segment, yet without starting the database. This option allows
us to stop the server and take a backup of the partially recovered
server files: this is the incrementally updated backup.
</para>

<para>
We can use the incrementally updated backup concept to come up with a
streamlined backup schedule. For example:
<orderedlist>
<listitem>
<para>
Set up continuous archiving
</para>
</listitem>
<listitem>
<para>
Take a weekly base backup
</para>
</listitem>
<listitem>
<para>
After 24 hours, restore the base backup to another server, then run a
partial recovery and take a backup of the latest database state to
produce an incrementally updated backup.
</para>
</listitem>
<listitem>
<para>
After the next 24 hours, restore the incrementally updated backup to the
second server, then run a partial recovery; at the end, take a backup
of the partially recovered files.
</para>
</listitem>
<listitem>
<para>
Repeat the previous step each day, until the end of the week.
</para>
</listitem>
</orderedlist>
</para>

<para>
A base backup need only be taken once per week, yet the same level of
protection is offered as if base backups were taken nightly.
</para>

</sect2>

--
Simon Riggs
EnterpriseDB  http://www.enterprisedb.com
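In recovery.conf terms, step 3 of the schedule above would amount to restoring the base backup on the second server and recovering with the proposed option disabled. A sketch, assuming the patch were applied (`restore_command` is a standard recovery.conf setting; `startup_after_recovery` is the option proposed by this patch; the archive path is illustrative only):

```ini
# recovery.conf on the second (recovery) server -- illustrative paths
restore_command = 'cp /mnt/server/archivedir/%f "%p"'

# Proposed by this patch: replay all currently available WAL, then stop
# without opening the database for connections, so that the partially
# recovered files can be copied off as the incrementally updated backup.
startup_after_recovery = false
```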
Simon Riggs wrote:
> +
> +	if (startupAfterRecovery)
> +		ereport(ERROR,
> +			(errmsg("recovery ends normally with startup_after_recovery=false")));
> +

I find this part of the patch a bit ugly. Isn't there a better way to
exit than throwing an error that's not really an error?

--
Heikki Linnakangas
EnterpriseDB  http://www.enterprisedb.com
Heikki Linnakangas <heikki@enterprisedb.com> writes:
> Simon Riggs wrote:
>> +
>> +	if (startupAfterRecovery)
>> +		ereport(ERROR,
>> +			(errmsg("recovery ends normally with startup_after_recovery=false")));
>> +

> I find this part of the patch a bit ugly. Isn't there a better way to
> exit than throwing an error that's not really an error?

This patch has obviously been thrown together with no thought and even
less testing. It breaks the normal case (I think the above if-test is
backwards), and I don't believe that it works for the advertised purpose
either (because nothing gets done to force a checkpoint before aborting,
thus the files on disk are not up to date with the end of WAL).

Also, I'm not sold that the concept is even useful. Apparently the idea
is to offload the expense of taking periodic base backups from a master
server, by instead backing up a PITR slave's fileset --- which is fine.
But why in the world would you want to stop the slave to do it? ISTM
we would want to arrange things so that you can copy the slave's files
while it continues replicating, just as with a standard base backup.

			regards, tom lane
No, too late.

---------------------------------------------------------------------------

Simon Riggs wrote:
> Way past feature freeze, but this small change allows a powerful new
> feature utilising the Restartable Recovery capability. Very useful for
> very large database backups...
>
> Includes full documentation.
>
> Perhaps a bit rushed, but inclusion in 8.2 would be great. (Ouch, don't
> shout back, read the patch first....)
>
> [...]

[ Attachment, skipping... ]

--
Bruce Momjian   bruce@momjian.us
EnterpriseDB    http://www.enterprisedb.com

+ If your life is a hard drive, Christ can be your backup. +
On Tue, 2006-09-19 at 12:13 -0400, Tom Lane wrote:
> Also, I'm not sold that the concept is even useful. Apparently the idea
> is to offload the expense of taking periodic base backups from a master
> server, by instead backing up a PITR slave's fileset --- which is fine.

Good. That's the key part of the idea and it's a useful one, so I was
looking to document it for 8.2.

I thought of this idea separately, then, as usual, realised that this
idea has a long heritage: Change Accumulation has been in production use
with IMS for at least 20 years.

> But why in the world would you want to stop the slave to do it? ISTM
> we would want to arrange things so that you can copy the slave's files
> while it continues replicating, just as with a standard base backup.

You can do that, of course, but my thinking was that people would regard
the technique as "unsupported", so I added a quick flag as a prototype.

On Tue, 2006-09-19 at 12:13 -0400, Tom Lane wrote:
> This patch has obviously been thrown together with no thought and even
> less testing. It breaks the normal case (I think the above if-test is
> backwards), and I don't believe that it works for the advertised purpose
> either (because nothing gets done to force a checkpoint before aborting,
> thus the files on disk are not up to date with the end of WAL).

Yes, it was done very quickly and submitted to ensure it could be
considered yesterday for inclusion. It was described by me as rushed,
which it certainly was because of personal time pressure yesterday: I
thought that made it clear that discussion was needed. Heikki mentions
to me it wasn't clear, so those criticisms are accepted.

On Tue, 2006-09-19 at 16:05 +0100, Heikki Linnakangas wrote:
> Simon Riggs wrote:
> > +
> > +	if (startupAfterRecovery)
> > +		ereport(ERROR,
> > +			(errmsg("recovery ends normally with startup_after_recovery=false")));
> > +
>
> I find this part of the patch a bit ugly.

Me too.

Overall, my own thoughts and Tom's and Heikki's comments indicate I
should withdraw the patch rather than fix it. Patch withdrawn.

Enclosed is a new doc patch to describe the capability, without s/w
change.

--
Simon Riggs
EnterpriseDB  http://www.enterprisedb.com
On Wed, Sep 20, 2006 at 02:09:43PM +0100, Simon Riggs wrote:
> > But why in the world would you want to stop the slave to do it? ISTM
> > we would want to arrange things so that you can copy the slave's files
> > while it continues replicating, just as with a standard base backup.
>
> You can do that, of course, but my thinking was that people would regard
> the technique as "unsupported", so I added a quick flag as a prototype.

An advantage to being able to stop the server is that you could have one
server processing backups for multiple PostgreSQL clusters by going
through them 1 (or more likely, 2, 4, etc) at a time, essentially
providing N+1 capability.

--
Jim Nasby                       jim@nasby.net
EnterpriseDB    http://enterprisedb.com     512.569.9461 (cell)
"Jim C. Nasby" <jim@nasby.net> writes:
> An advantage to being able to stop the server is that you could have one
> server processing backups for multiple PostgreSQL clusters by going
> through them 1 (or more likely, 2, 4, etc) at a time, essentially
> providing N+1 capability.

Why wouldn't you implement that by putting N postmasters onto the backup
server? It'd be far more efficient than the proposed patch, which by
aborting at random points is essentially guaranteeing a whole lot of
useless re-replay of WAL whenever you restart it.

			regards, tom lane
On Wed, Sep 20, 2006 at 04:26:30PM -0400, Tom Lane wrote:
> "Jim C. Nasby" <jim@nasby.net> writes:
> > An advantage to being able to stop the server is that you could have one
> > server processing backups for multiple PostgreSQL clusters by going
> > through them 1 (or more likely, 2, 4, etc) at a time, essentially
> > providing N+1 capability.
>
> Why wouldn't you implement that by putting N postmasters onto the backup
> server? It'd be far more efficient than the proposed patch, which by
> aborting at random points is essentially guaranteeing a whole lot of
> useless re-replay of WAL whenever you restart it.

My thought is that in many environments it would take much beefier
hardware to support N postmasters running simultaneously than to cycle
through them periodically bringing the backups up-to-date.

--
Jim Nasby                       jim@nasby.net
EnterpriseDB    http://enterprisedb.com     512.569.9461 (cell)
"Jim C. Nasby" <jim@nasby.net> writes:
> My thought is that in many environments it would take much beefier
> hardware to support N postmasters running simultaneously than to cycle
> through them periodically bringing the backups up-to-date.

How do you figure that? The cycling approach will require more total I/O
due to extra page re-reads ... particularly if it's built on a patch
like this one that abandons work-in-progress at arbitrary points.

A postmaster running WAL replay does not require all that much in the
way of CPU resources. It is going to need I/O comparable to the gross
I/O load of its master, but cycling isn't going to reduce that at all.

			regards, tom lane
On Wed, Sep 20, 2006 at 05:50:48PM -0400, Tom Lane wrote:
> "Jim C. Nasby" <jim@nasby.net> writes:
> > My thought is that in many environments it would take much beefier
> > hardware to support N postmasters running simultaneously than to cycle
> > through them periodically bringing the backups up-to-date.
>
> How do you figure that? The cycling approach will require more total I/O
> due to extra page re-reads ... particularly if it's built on a patch
> like this one that abandons work-in-progress at arbitrary points.
>
> A postmaster running WAL replay does not require all that much in the
> way of CPU resources. It is going to need I/O comparable to the gross
> I/O load of its master, but cycling isn't going to reduce that at all.

True, but running several dozen instances on a single machine will
require a lot more memory (or, conversely, each individual database gets
a lot less memory to use).

Of course, this is all hand-waving right now... it'd be interesting to
see which approach was actually better.

--
Jim Nasby                       jim@nasby.net
EnterpriseDB    http://enterprisedb.com     512.569.9461 (cell)
> True, but running several dozen instances on a single machine will
> require a lot more memory (or, conversely, each individual database gets
> a lot less memory to use).
>
> Of course, this is all hand-waving right now... it'd be interesting to
> see which approach was actually better.

I'm running 4 WAL logging standby clusters on a single machine. While
the load on the master servers occasionally goes up to >60, the load on
the standby machine has never climbed above 5.

Of course when the master servers are all loaded, the standby gets
behind with the recovery... but eventually it gets up to date again. I
would be very surprised if it would get less behind in the 1-by-1
scenario.

Cheers,
Csaba.
Your patch has been added to the PostgreSQL unapplied patches list at:

	http://momjian.postgresql.org/cgi-bin/pgpatches

It will be applied as soon as one of the PostgreSQL committers reviews
and approves it.

---------------------------------------------------------------------------

Simon Riggs wrote:
> [...]
>
> Overall, my own thoughts and Tom's and Heikki's comments indicate I
> should withdraw the patch rather than fix it. Patch withdrawn.
>
> Enclosed is a new doc patch to describe the capability, without s/w
> change.

[ Attachment, skipping... ]

--
Bruce Momjian   bruce@momjian.us
EnterpriseDB    http://www.enterprisedb.com

+ If your life is a hard drive, Christ can be your backup. +
Documentation patch applied. Thanks.

Your documentation changes can be viewed in five minutes using links on
the developer's page, http://www.postgresql.org/developer/testing.

---------------------------------------------------------------------------

Simon Riggs wrote:
> [...]
>
> Overall, my own thoughts and Tom's and Heikki's comments indicate I
> should withdraw the patch rather than fix it. Patch withdrawn.
>
> Enclosed is a new doc patch to describe the capability, without s/w
> change.

[ Attachment, skipping... ]

--
Bruce Momjian   bruce@momjian.us
EnterpriseDB    http://www.enterprisedb.com

+ If your life is a hard drive, Christ can be your backup. +