Thread: PITR as Online Backup Solution

PITR as Online Backup Solution

From
"Thomas F. O'Connell"
Date:
I'm administering a database that is not immediately a good candidate
for replication via Slony. As an interim solution, I'd like to give
PITR a shot. The primary goal is to have a failover scenario that
allows for recovery within a window that's much smaller than it would
be if the only option were the output of the previous night's
pg_dump, and the need has arisen jointly with the fact that pg_dump
has become an unwieldy process on this database.

Here's the main question: Is it possible to replay logs continuously
against a database that has already been recovered? The documentation
only covers the scenario of full recovery from scratch, and I'm
wondering if this is because this is the only scenario possible with
the current level of PITR functionality.

Ideally, I'd be able to take a base backup of a production system,
copy it to a remote system, which is also the repository for segment
files generated by archive_command, and complete the recovery process
outlined in the docs. From that point, it would make sense to me that
I should be able to continuously replay WAL files against the new
database (possibly as soon as archive_command generates a new one)
without having to purge my data directory. Is that a reasonable
assumption?
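
To make the setup concrete, here's roughly what I have in mind for the
archiving side (an untested sketch; the paths and script name are made
up, and a one-line cp or scp in archive_command would do just as well):

# postgresql.conf on the production system (hypothetical path):
#   archive_command = 'python /usr/local/bin/ship_wal.py "%p" "%f"'
#
# Copies each completed WAL segment into the archive directory that the
# standby reads from. A non-zero exit tells PostgreSQL to retry later.

import os
import shutil
import sys

ARCHIVE_DIR = "/mnt/wal_archive"      # hypothetical shared/remote mount


def main():
    full_path, file_name = sys.argv[1], sys.argv[2]   # %p and %f
    dest = os.path.join(ARCHIVE_DIR, file_name)
    if os.path.exists(dest):
        sys.exit(1)                   # never overwrite an archived segment
    tmp = dest + ".tmp"
    shutil.copy2(full_path, tmp)      # copy then rename so readers never
    os.rename(tmp, dest)              # see a half-written segment
    sys.exit(0)


if __name__ == "__main__":
    main()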

If so, then I need to know more about which steps to use from the
recovery scenario presented in the docs and how to identify the state
in which a given segment file exists. For instance, it's not
immediately clear from the docs what happens to the segment files
after restore_command runs during the recovery scenario. It says
those segments are copied from the archive directory, but then what?
Are they recycled as in a basic postgres installation?

Also, if this system eventually works as I expect it might be able
to, would there ever be a reason to redo it from scratch (i.e., is
one base backup sufficient ad infinitum)?

--
Thomas F. O'Connell
Database Architecture and Programming
Co-Founder
Sitening, LLC

http://www.sitening.com/
3004 B Poston Avenue
Nashville, TN 37203-1314
615-260-0005 (cell)
615-469-5150 (office)
615-469-5151 (fax)


Re: PITR as Online Backup Solution

From
"Andy Shellam"
Date:
Hi Thomas,

I've been trying to get a similar system up and running, and I have to say
one point in the documentation isn't particularly clear.

It says that you can string together an almost endless supply of WAL logs to
replay against a database, but you must always start from a base backup -
i.e. you cannot complete a recovery on your standby, add a new log from your
live server, and then re-run the recovery. You would have to restore your
base backup and then replay all the logs up to and including the new one.

Simon Riggs came up with the idea of taking a base backup and then having
your restore_command call a script that feeds the WAL logs in: if it cannot
find the segment it needs, it sits and waits and retries every so often, and
every time it finds a new log, that log is passed back to the PostgreSQL
recovery process. You then need some way of telling the script that you want
to bring the database up, at which point it exits and lets PostgreSQL come up
in its current state with the latest data.
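
Roughly, I picture the script looking something like this (completely
untested, and all the paths are made up):

# recovery.conf on the standby (hypothetical path):
#   restore_command = 'python /usr/local/bin/wait_for_wal.py "%f" "%p"'
#
# Waits for the requested WAL segment to appear in the archive and hands
# it to the recovery process. As long as it keeps returning segments,
# recovery never finishes and the standby keeps rolling forward.

import os
import shutil
import sys
import time

ARCHIVE_DIR = "/mnt/wal_archive"            # same directory archive_command writes to
TRIGGER_FILE = "/tmp/pg_failover_trigger"   # touch this to bring the standby up
POLL_SECONDS = 10


def main():
    wanted, dest = sys.argv[1], sys.argv[2]   # %f and %p from restore_command
    source = os.path.join(ARCHIVE_DIR, wanted)

    # Timeline-history and backup-label files may legitimately never show
    # up, so don't sit waiting for those.
    if wanted.endswith(".history") or wanted.endswith(".backup"):
        if not os.path.exists(source):
            sys.exit(1)
    else:
        while not os.path.exists(source):
            if os.path.exists(TRIGGER_FILE):
                # Report "not found" so recovery finishes with what it has
                # and PostgreSQL opens up for connections.
                sys.exit(1)
            time.sleep(POLL_SECONDS)

    shutil.copy2(source, dest)
    sys.exit(0)


if __name__ == "__main__":
    main()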

This is a script I am looking to develop myself in the coming months, as I
would like a very similar setup to yours.

Replication sounds like a better option, and I am also looking at Slony-I
but haven't got it up and running yet. Running it would keep the backup
system as up to date as possible, and you could simply tell your applications
to switch over to the standby as the need arises. However, I believe Slony is
master-to-slave only, so any writes on your slave will not be propagated back
to your master.

Regards

Andy



Re: PITR as Online Backup Solution

From
Simon Riggs
Date:
On Thu, 2006-03-02 at 16:38 -0600, Thomas F. O'Connell wrote:
> I'm administering a database that is not immediately a good candidate
> for replication via Slony. As an interim solution, I'd like to give
> PITR a shot. The primary goal is to have a failover scenario that
> allows for recovery within a window that's much smaller than it would
> be if the only option were the output of the previous night's
> pg_dump, and the need has arisen jointly with the fact that pg_dump
> has become an unwieldy process on this database.
>
> Here's the main question: Is it possible to replay logs continuously
> against a database that has already been recovered? The documentation
> only covers the scenario of full recovery from scratch, and I'm
> wondering if this is because this is the only scenario possible with
> the current level of PITR functionality.

> Ideally, I'd be able to take a base backup of a production system,
> copy it to a remote system, which is also the repository for segment
> files generated by archive_command, and complete the recovery process
> outlined in the docs. From that point, it would make sense to me that
> I should be able to continuously replay WAL files against the new
> database (possibly as soon as archive_command generates a new one)
> without having to purge my data directory. Is that a reasonable
> assumption?

Yes, it was designed to be able to do this.

> If so, then I need to know more about which steps to use from the
> recovery scenario presented in the docs and how to identify the state
> in which a given segment file exists.

> For instance, it's not
> immediately clear from the docs what happens to the segment files
> after restore_command runs during the recovery scenario. It says
> those segments are copied from the archive directory, but then what?
> Are they recycled as in a basic postgres installation?

They overwrite each other, thus avoiding a build-up of logs. It is
designed to support "infinite" recovery.

> Also, if this system eventually works as I expect it might be able
> to, would there ever be a reason to redo it from scratch (i.e., is
> one base backup sufficient ad infinitum)?

As long as you've read the documented caveats there are no design
limitations...but I would restart from a base backup weekly, to be sure.
It's all about certainty after all.
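
Untested sketch, but the weekly base backup need be no more than something
like this, run from cron on the primary (paths, label and destination are
placeholders):

# Weekly base backup sketch (untested; PGDATA and DEST are placeholders).
# The tarball goes wherever the standby can reach it.

import subprocess
import time

PGDATA = "/var/lib/pgsql/data"
DEST = "/backups/base-%s.tar.gz" % time.strftime("%Y%m%d")


def psql(sql):
    subprocess.check_call(["psql", "-d", "postgres", "-c", sql])


def main():
    psql("SELECT pg_start_backup('weekly base backup');")
    try:
        # pg_xlog is rebuilt from the archive, so it need not be included.
        # GNU tar may exit 1 because files change underneath it during an
        # online backup; only treat anything worse than that as fatal.
        rc = subprocess.call(
            ["tar", "czf", DEST, "--exclude=pg_xlog", "-C", PGDATA, "."])
        if rc > 1:
            raise RuntimeError("tar failed with exit code %d" % rc)
    finally:
        psql("SELECT pg_stop_backup();")


if __name__ == "__main__":
    main()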

Best Regards, Simon Riggs



Re: PITR as Online Backup Solution

From
Simon Riggs
Date:
On Fri, 2006-03-03 at 10:23 +0000, Andy Shellam wrote:

> You then need some way of telling the script that you want to bring the
> database up, at which point it exits and lets PostgreSQL come up in its
> current state with the latest data.

> Replication sounds like a better option

If you want auto failover, then you will need a package that detects
failures accurately and can initiate any required actions, whichever
method you choose for keeping multiple systems in sync.

IMHO auto failover needs *extensive* testing and failure analysis to
make it more robust and practically useful than manual failover, so it
is not quite the magic bullet it may appear to be.

Best Regards, Simon Riggs


Re: PITR as Online Backup Solution

From
"Thomas F. O'Connell"
Date:
On Mar 3, 2006, at 11:54 AM, Simon Riggs wrote:

> On Thu, 2006-03-02 at 16:38 -0600, Thomas F. O'Connell wrote:
>> Ideally, I'd be able to take a base backup of a production system,
>> copy it to a remote system, which is also the repository for segment
>> files generated by archive_command, and complete the recovery process
>> outlined in the docs. From that point, it would make sense to me that
>> I should be able to continuously replay WAL files against the new
>> database (possibly as soon as archive_command generates a new one)
>> without having to purge my data directory. Is that a reasonable
>> assumption?
>
> Yes, it was designed to be able to do this.

From the docs, I'm having a hard time determining which steps to
edit or omit in order to execute this scenario. Is it possible for
you (or anyone else on the list) to present an extension of section
23.3.3 <http://www.postgresql.org/docs/8.1/static/backup-online.html#BACKUP-PITR-RECOVERY>
that covers the continuous replay scenario? I'd be happy to help
contribute a patch to the docs once I understand the procedure a bit
better.

>> For instance, it's not
>> immediately clear from the docs what happens to the segment files
>> after restore_command runs during the recovery scenario. It says
>> those segments are copied from the archive directory, but then what?
>> Are they recycled as in a basic postgres installation?
>
> They overwrite each other, thus avoiding a build-up of logs. It is
> designed to support "infinite" recovery.

Is there any management of this process that I'd need to account for
in related scripts, restore_command or otherwise?

> As long as you've read the documented caveats there are no design
> limitations...but I would restart from a base backup weekly, to be
> sure.
> It's all about certainty after all.

Sounds good. Thanks for the tips!

--
Thomas F. O'Connell
Database Architecture and Programming
Co-Founder
Sitening, LLC

http://www.sitening.com/
3004 B Poston Avenue
Nashville, TN 37203-1314
615-260-0005 (cell)
615-469-5150 (office)
615-469-5151 (fax)

Re: PITR as Online Backup Solution

From
"Thomas F. O'Connell"
Date:
On Mar 3, 2006, at 12:00 PM, Simon Riggs wrote:

> On Fri, 2006-03-03 at 10:23 +0000, Andy Shellam wrote:
>
>> You then need some way of telling the script that you want to bring the
>> database up, at which point it exits and lets PostgreSQL come up in its
>> current state with the latest data.
>
>> Replication sounds like a better option
>
> If you want auto failover, then you will need a package that detects
> failures accurately and can initiate any required actions, whichever
> method you choose for keeping multiple systems in sync.
>
> IMHO auto failover needs *extensive* testing and failure analysis to
> make it more robust and practically useful than manual failover, so it
> is not quite the magic bullet it may appear to be.
>
> Best Regards, Simon Riggs

Auto failover is not a requirement for me. My primary goals are to
reduce the need for pg_dump to occur nightly against a large
production database and to reduce the window of potential data loss
in a failover scenario, where the assumption is that the failover
would be managed manually.

There's too much application-inspired DDL and there are too many
tables (as a result of an inheritance-heavy data model) for Slony to
be a viable solution in the short term.

--
Thomas F. O'Connell
Database Architecture and Programming
Co-Founder
Sitening, LLC

http://www.sitening.com/
3004 B Poston Avenue
Nashville, TN 37203-1314
615-260-0005 (cell)
615-469-5150 (office)
615-469-5151 (fax)

Re: PITR as Online Backup Solution

From
"Jim C. Nasby"
Date:
On Fri, Mar 03, 2006 at 10:23:57AM -0000, Andy Shellam wrote:
> Hi Thomas,
>
> I've been trying to get a similar system up and running, and I have to say
> one point in the documentation isn't particularly clear.
>
> It says that you can string together an almost endless supply of WAL logs to
> replay against a database, but you must always start from a base backup -
> i.e. you cannot complete a recovery on your standby, add a new log from your
> live server, and then re-run the recovery. You would have to restore your
> base backup and then replay all the logs up to and including the new one.
>
> Simon Riggs came up with the idea of taking a base backup and then having
> your restore_command call a script that feeds the WAL logs in: if it cannot
> find the segment it needs, it sits and waits and retries every so often, and
> every time it finds a new log, that log is passed back to the PostgreSQL
> recovery process. You then need some way of telling the script that you want
> to bring the database up, at which point it exits and lets PostgreSQL come up
> in its current state with the latest data.
>
> This is a script I am looking to develop myself in the coming months, as I
> would like a very similar setup to yours.

Take a look at http://pgfoundry.org/projects/pgpitrha/. It's empty right
now because it was just approved, but Jeff Frost should be adding some
code he currently has sometime this weekend. I'll also be working on
this, hopefully improving on some stuff next week.
--
Jim C. Nasby, Sr. Engineering Consultant      jnasby@pervasive.com
Pervasive Software      http://pervasive.com    work: 512-231-6117
vcard: http://jim.nasby.net/pervasive.vcf       cell: 512-569-9461

Re: PITR as Online Backup Solution

From
Simon Riggs
Date:
On Fri, 2006-03-03 at 12:03 -0600, Thomas F. O'Connell wrote:
> On Mar 3, 2006, at 11:54 AM, Simon Riggs wrote:
>
> > On Thu, 2006-03-02 at 16:38 -0600, Thomas F. O'Connell wrote:
> >> Ideally, I'd be able to take a base backup of a production system,
> >> copy it to a remote system, which is also the repository for segment
> >> files generated by archive_command, and complete the recovery process
> >> outlined in the docs. From that point, it would make sense to me that
> >> I should be able to continuously replay WAL files against the new
> >> database (possibly as soon as archive_command generates a new one)
> >> without having to purge my data directory. Is that a reasonable
> >> assumption?
> >
> > Yes, it was designed to be able to do this.
>
>  From the docs, I'm having a hard time determining which steps to
> edit or omit in order to execute this scenario. Is it possible for
> you (or anyone else on the list) to present an extension of section
> 23.3.3 <http://www.postgresql.org/docs/8.1/static/backup-
> online.html#BACKUP-PITR-RECOVERY> that covered the continuous replay
> scenario? I'd be happy to help contribute a patch to the docs once I
> understand the procedure a bit better.

The place the primary backs up to is the place where the secondary
restores from; otherwise all procedures are as documented. Neither
system knows about the other, so it's simpler than you may think.

The only thing you need is the wait-for-next-file script.

Best Regards, Simon Riggs


Re: PITR as Online Backup Solution

From
"Thomas F. O'Connell"
Date:
On Mar 4, 2006, at 3:56 AM, Simon Riggs wrote:

> On Fri, 2006-03-03 at 12:03 -0600, Thomas F. O'Connell wrote:
>> On Mar 3, 2006, at 11:54 AM, Simon Riggs wrote:
>>
>>> On Thu, 2006-03-02 at 16:38 -0600, Thomas F. O'Connell wrote:
>>>> Ideally, I'd be able to take a base backup of a production system,
>>>> copy it to a remote system, which is also the repository for
>>>> segment
>>>> files generated by archive_command, and complete the recovery
>>>> process
>>>> outlined in the docs. From that point, it would make sense to me
>>>> that
>>>> I should be able to continuously replay WAL files against the new
>>>> database (possibly as soon as archive_command generates a new one)
>>>> without having to purge my data directory. Is that a reasonable
>>>> assumption?
>>>
>>> Yes, it was designed to be able to do this.
>>
>> From the docs, I'm having a hard time determining which steps to
>> edit or omit in order to execute this scenario. Is it possible for
>> you (or anyone else on the list) to present an extension of section
>> 23.3.3 <http://www.postgresql.org/docs/8.1/static/backup-
>> online.html#BACKUP-PITR-RECOVERY> that covered the continuous replay
>> scenario? I'd be happy to help contribute a patch to the docs once I
>> understand the procedure a bit better.
>
> The place the primary backs up to is the place where the secondary
> restores from; otherwise all procedures are as documented. Neither
> system knows about the other, so it's simpler than you may think.
>
> The only thing you need is the wait-for-next-file script.

A few more questions in this thread:

1. With the wait-for-next-file script scenario, what happens with
recovery.conf? Does it ever become recovery.done? What happens in the
event of needing to recover? Does the script need a trigger to say,
"Okay, stop waiting. We need to use what we have right now."?

2. Are there any salient details when performing an online backup
from one machine to another? I'm assuming filesystem is an important
consideration since the recovery process is not like the pg_dump
recovery process. Any other gotchas?

--
Thomas F. O'Connell
Database Architecture and Programming
Co-Founder
Sitening, LLC

http://www.sitening.com/
3004 B Poston Avenue
Nashville, TN 37203-1314
615-260-0005 (cell)
615-469-5150 (office)
615-469-5151 (fax)

Re: PITR as Online Backup Solution

From
"Andy Shellam"
Date:
Regarding the following questions:

> A few more questions in this thread:

> 1. With the wait-for-next-file script scenario, what happens with
> recovery.conf? Does it ever become recovery.done? What happens in the
> event of needing to recover? Does the script need a trigger to say,
> "Okay, stop waiting. We need to use what we have right now."?

The recovery.conf file is still handled by PostgreSQL in the normal way. What
you would have to build in is something like a "flag" (or trigger) file: when
the script can't find the segment it needs, it checks for the existence of
the flag file, and if the flag file is not there it keeps waiting, re-checking
for the flag at regular intervals. When it does find the flag file, that is
the signal to "use what we have right now": the script returns a failure exit
status (as if the file were not found) to PostgreSQL, which then renames
recovery.conf to recovery.done and starts up (provided there have been no
errors in the logs).

Then, while the script is running and waiting, you bring the recovery box up
by touching the flag file at the location the script is watching. At the next
check interval the script sees the file and returns that exit status to
PostgreSQL, which brings the database up.
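
The "bring it up" side can be as small as touching that flag file and then
watching for recovery.done to appear - something like this (untested, paths
made up, and the trigger path must match whatever the restore script checks):

# Minimal "bring the standby up" helper (untested; the trigger path and
# PGDATA are placeholders).

import os
import time

TRIGGER_FILE = "/tmp/pg_failover_trigger"
PGDATA = "/var/lib/pgsql/data"


def main():
    open(TRIGGER_FILE, "a").close()     # same effect as "touch"
    # Recovery has finished once recovery.conf has been renamed.
    done = os.path.join(PGDATA, "recovery.done")
    while not os.path.exists(done):
        time.sleep(5)
    print("recovery.done is in place - the standby should now be accepting "
          "connections.")


if __name__ == "__main__":
    main()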

> 2. Are there any salient details when performing an online backup
> from one machine to another? I'm assuming filesystem is an important
> consideration since the recovery process is not like the pg_dump
> recovery process. Any other gotchas?

Mainly just watch out for symlinks. If you have tablespaces symlinked on the
live box, check that the links are still valid on your recovery box; the same
goes for the configs (postgresql.conf, pg_hba.conf, pg_ident.conf). If you
haven't symlinked the config files on your live box, your base backup will
contain a copy of your live configuration. That may be what you want, but if
you have things like listen_addresses explicitly set to an IP address, it
will be carried over to your recovery box and may cause PostgreSQL to fail to
start if things differ between the two systems. If you do symlink your
configs (as I do), check that the symlinks on your recovery box are still
valid.
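
A quick check along these lines after restoring the base backup on the
recovery box can save some head-scratching (untested; the data directory
path is a placeholder):

# Sanity-check symlinks after restoring a base backup on the recovery box
# (untested sketch; PGDATA is a placeholder).

import os

PGDATA = "/var/lib/pgsql/data"


def check_link(path):
    if os.path.islink(path) and not os.path.exists(path):
        print("BROKEN symlink: %s -> %s" % (path, os.readlink(path)))


def main():
    # Tablespaces live as symlinks under pg_tblspc.
    tblspc = os.path.join(PGDATA, "pg_tblspc")
    if os.path.isdir(tblspc):
        for name in os.listdir(tblspc):
            check_link(os.path.join(tblspc, name))
    # Config files may themselves be symlinked out of the data directory.
    for conf in ("postgresql.conf", "pg_hba.conf", "pg_ident.conf"):
        check_link(os.path.join(PGDATA, conf))


if __name__ == "__main__":
    main()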

Regards

Andy



Re: PITR as Online Backup Solution

From
Simon Riggs
Date:
On Tue, 2006-03-14 at 16:41 -0600, Thomas F. O'Connell wrote:
> On Mar 4, 2006, at 3:56 AM, Simon Riggs wrote:
>
> > On Fri, 2006-03-03 at 12:03 -0600, Thomas F. O'Connell wrote:
> >> On Mar 3, 2006, at 11:54 AM, Simon Riggs wrote:
> >>
> >>> On Thu, 2006-03-02 at 16:38 -0600, Thomas F. O'Connell wrote:
> >>>> Ideally, I'd be able to take a base backup of a production system,
> >>>> copy it to a remote system, which is also the repository for
> >>>> segment
> >>>> files generated by archive_command, and complete the recovery
> >>>> process
> >>>> outlined in the docs. From that point, it would make sense to me
> >>>> that
> >>>> I should be able to continuously replay WAL files against the new
> >>>> database (possibly as soon as archive_command generates a new one)
> >>>> without having to purge my data directory. Is that a reasonable
> >>>> assumption?
> >>>
> >>> Yes, it was designed to be able to do this.
> >>
> >> From the docs, I'm having a hard time determining which steps to
> >> edit or omit in order to execute this scenario. Is it possible for
> >> you (or anyone else on the list) to present an extension of section
> >> 23.3.3 <http://www.postgresql.org/docs/8.1/static/backup-
> >> online.html#BACKUP-PITR-RECOVERY> that covered the continuous replay
> >> scenario? I'd be happy to help contribute a patch to the docs once I
> >> understand the procedure a bit better.
> >
> > The place the primary backs up to is the place where the secondary
> > restores from; otherwise all procedures are as documented. Neither
> > system knows about the other, so it's simpler than you may think.
> >
> > The only thing you need is the wait-for-next-file script.
>
> A few more questions in this thread:
>
> 1. With the wait-for-next-file script scenario, what happens with
> recovery.conf? Does it ever become recovery.done? What happens in the
> event of needing to recover? Does the script need a trigger to say,
> "Okay, stop waiting. We need to use what we have right now."?

Yes.

Everything you need is discussed in the admin archives.

> 2. Are there any salient details when performing an online backup
> from one machine to another? I'm assuming filesystem is an important
> consideration since the recovery process is not like the pg_dump
> recovery process. Any other gotchas?

No

Best Regards, Simon Riggs