Thread: PITR Archive Recovery

PITR Archive Recovery

From
Simon Riggs
Date:
PITR Archive Recovery, 28 June 2004

What's in this patch?

- All of what was in previous PITR Archival patch, including reworking
of all the archiver startup/shutdown code to match that of pgstat
- New code to perform Archive Recovery mode, which streams xlogs
straight from archive to allow "infinite" recovery

[This is a full, working patch for discussion and testing, with a few
days left before 7.5dev freeze for changes and corrections]

Archive Recovery Overview

The rule is: when you startup with archive_mode on, take a backup.

This patch provides an additional mode for transactional log recovery.
The current mode provides Crash Recovery, allowing the database to
restart and to read the last few online transaction logs to recover. The
additional mode is known as Archive Recovery, enabled in postgresql.conf
using archive_mode=true.

The administrator also specifies and archive location and a program to
perform the copying (discussed later). As database writes occur and
transaction logs fill they are copied to the archive location by a new
sub-process of the postmaster, called the archiver process.

If the database crashes when in archive_mode, it will recover just as
before. If a major system crash, or other reason to restore occurs, then
the new Archive Recovery mode can be invoked by:
- restoring the full physical backup
- creating DataDir/recovery.conf
- restart PostgreSQL

PostgreSQL will recover using almost the same code path as before,
except: all transaction logs will be restored for use directly from
archive in a stream that prevents disk space from overflowing. Extended
recovery is possible, over many transaction logs - far more than was
previously possible.

The essential aspect to recovery is maintaining a solid backup/log
chain. This must consist of:
i) a full backup of every file in PostgreSQL - miss none!
ii) archived transaction logs from just before backup taken to whenever
you want to recover to...

If you "break the chain" i.e. the two sets of files don't match, or
you've deleted a transaction log or lost a tape - then you will only be
able to recover up to the point the chain broke. The longer the chain,
the easier it will be to break, so backup regularly and work out a
robust archiving policy.

Extensibility
Archive Recovery allows the administrator to specify a program or script
to use when both archiving and restoring transaction log files. As a
result, the administrator can choose to store transaction log files on
any drive, any other system or integrating with any Backup Archive
Recovery (BAR) software.

The Archive Program is provided with 3 parameters, which can be placed
anywhere within the supplied command string. Each parameter is
represented by a "%s" character string.
- The first parameter is replaced with the full path of the transaction
log to be archived.
- The second parameter is replaced with the value of the archive_dest
parameter
e.g. with these settings
archive_mode = true
archive_dest = '/mount/disk2/pgarchive'
archive_program = 'cp %s %s'
with transaction log 00000000000003A4 to archive
with DataDir of web123

would execute the following command
'cp /usr/local/pgsql/web123/00000000000003A4 /disk2/pgarchive'

To enter Archive Recovery the administrator must create a recovery.conf
file. [Currently an empty file will do, though intended as the one place
where all recovery options and parameters would be set]

The Restore Program is provided with 2 parameters
- The first parameter is replaced with the value of the archive_dest
parameter
- The second parameter is replaced with the value of the transaction log
which recovery is requesting should now be restored
- The third parameter is replaced with the full path of the filename
which should be the target of the single file recovery operation.
[In this patch, XlogArchRestoreProgram is hardcoded to "cp %s/%s %s",
pending short discussion on how to specify this...]

Possibilities
Using the current features, it is possible to implement an Automated
Standby Database. This is an active-passive High Availability option. In
this mode, the main server sends archived log files to a second, standby
server. The standby server is set up to be in "permanent recovery", by
using a RestoreProgram that waits for each file to be shipped to it. The
standby system receives each file, then recovers up to that point - so
the standby system is always a few seconds from completing its startup
should it be required.

Patch Status & Current Caveats
- A number of recovery scenarios have been tested and the patch is
believed to be stable and ready for others to begin commentary and
testing...please understand that there are many scenarios that work and
many that do not...these last are not bugs

- Recovery to a specific point in time is not yet implemented. The
administrator has the following recovery options:
--can recover the system fully, all the way to the end of logs
--stop recovery by withholding log files at an appropriate place,
thereby forcing the termination of recovery

How it Works

archive_debug is a hidden postgresql.conf setting which can be used to
show more debug output for the archive recovery facilities.
...archive_debug = true

[This currently generates an additional log file called recovery.log,
showing the record headers and types of all restored xlog records.]

Each PostgreSQL user has a corresponding backend process performing work
for them. These backends write any transaction log data to the log as
each operation occurs, then marks transactions as either committed or
aborted. The log is split into log segment files. When each log file
fills a notification file is created in archive_status, and at the same
time a signal is sent to the archiver (via the postmaster) to begin
archiving.

The archiver executes the administrator's program or script using a
system(3) call, archiving each file one at a time.

The notification file provides a number of features. First, its
simplicity ensures that we don't need a shared memory link between each
backend and the archiver, which would then be prone to failure - which
is exactly not what we want. Second, the archive_status file is written
whether or not the postmaster is still up, so will still function
correctly even when the worst has happened. Third, as a persistent
record, it will allow archiving to restart at the same point later on
(even after a backup/restore), allowing it to recover gracefully from
administrator's immediate shutdown requests as well as postmaster or
archive system failures.

When recovery completes, archive_status files are written to ensure
cleanup of all transaction logs leaves the database in a fit state for
production. The recovery.conf file is removed to ensure that a
subsequent crash doesn't accidentally begin recovery from archives...

The archiver process starts only when recovery has fully completed.

How to Fail

All of the following ways to fail have already occurred in testing...

1. if you do all 3 of these, you will not have a backup/log chain:
a) use the 'cp' command, or any other command that can write to either a
file or a directory
b) use archive_dest to specify a directory
c) forget to create the directory, which the command in a) then
interprets as a file and then all archived logs overwrite each other and
break the chain. You will be unable to rollforward AT ALL.

2. if you do all 5 of these, you will not rollforward very far, if any.
a) restore the database backup
b) forget to specify a recovery.conf file
c) startup and let database recover using only the xlogs that were
present when the backup was taken
d) then immediately after recovery run a job which causes xlog to be
written, and then a file to fill
e) use an archive command that allows overwriting of previously archived
files
This sequence causes the database to recover successfully, but not using
your full archive chain. The current redo pointer points half way
through your backup/log chain, so when the file is archived, it
overwrites a file in the middle of the chain, thus breaking the chain.
There isn't an option on cp that will prevent this...

3. turn archive_mode on and off...
Turning archive_mode off means that when an xlog fills, it won't EVER
get archived. This will break the backup/log chain. If you turn
archive_mode back on, and an xlog filled, then it will not be picked up
as having filled and you will be missing a link in the chain. Sometimes
you'll get away with it and then you'll think it will work all the time.
The rule is: when you startup with archive_mode on, take a backup.

4. Forget that xlogs are the same size, whether they are full or
not...you can't tell (yet) by looking at one whether it has a record
within it that will break the chain

How to succeed
- Test your recovery procedures before you need them
- When you're in a recovery situation backup everything you can lay your
hands on, to make sure you have a known position to return to while you
ATTEMPT recovery - you may make a mistake and need to retry
- be cool - we all make mistakes, just don't let errors multiply

- disable listen_addresses in postgresql.conf when you recover, to give
yourself some breathing space to check things, when you finally do get
the database ready
- recover with archive_debug = true to give me/others a chance to debug
any problems or answer any questions you may have

The rule is: when you startup with archive_mode on, take a backup.

When enabling archive_mode for the first time, if you have run
previously without enabling archive_mode, then when you will inevitably
get messages saying "cannot find archive_status file". After you have
taken a backup, you can manually create files in the archive_status
directory of <xlog>.done, which will then allow the bgwriter to clean
them up when it next takes a checkpoint.

Additional Work

- recovery.conf needs some minor work to specify recover options. It is
envisaged that this would be a short bison grammar, very very similar to
postgresql.conf

- It seems possible to easily work around the requirement to take the
backup while the database is open. It would be a good option to have
both hot and cold backup options...

- full documentation will also follow shortly

Credits

This work is the final push in a long series of patches and discussions
about how to achieve archive recovery with PostgreSQL. The work of
J.R.Nield and Patrick MacDonald has provided the detailed underpinnings
for this recent work, which in turn rests upon Vadim Mikheev's original
work on WAL and MVCC. Tom Lane has provided considerable technical
assistance and quality review, whilst Bruce Momjian has provided many of
the ideas and smoothed the way for much of the work. Thanks, all.

Simon Riggs, simon@2ndquadrant.com


Attachment

Re: PITR Archive Recovery

From
Simon Riggs
Date:
On Mon, 2004-06-28 at 21:58, Simon Riggs wrote:
> PITR Archive Recovery, 28 June 2004
>
> What's in this patch?
>
> - All of what was in previous PITR Archival patch, including reworking
> of all the archiver startup/shutdown code to match that of pgstat
> - New code to perform Archive Recovery mode, which streams xlogs
> straight from archive to allow "infinite" recovery
>
> [This is a full, working patch for discussion and testing, with a few
> days left before 7.5dev freeze for changes and corrections]
>

err...and these additional files are REQUIRED also:

src/backend/postmaster/pgarch.c

src/include/pgarch.h

...implemented to match the locations of pgstat code

Best regards, Simon Riggs

Attachment

Re: PITR Archive Recovery

From
Simon Riggs
Date:
On Tue, 2004-06-29 at 20:59, Simon Riggs wrote:
> On Mon, 2004-06-28 at 21:58, Simon Riggs wrote:
> > PITR Archive Recovery, 28 June 2004
> >
> > What's in this patch?

This my LAST, PLANNED patch before Freeze. Any questions?

This is a patch-on-patch, rather than a full patch. To use this, apply
earlier patches for pitr_v4_4*, then apply this. (Full patch available
upon request...just saving the good people of this list some annoyance
time from a 50k download).

This now provides:

- parsing of restore program from recovery.conf
- minor cosmetic changes to some error messages

...there's more to do, but I'm working on the...
if it ain't broke, don't fix it...

Best Regards, Simon Riggs

Attachment

Re: PITR Archive Recovery

From
ohp@pyrenet.fr
Date:
Dear Simon,

I've been following the PITR thread with great interest since a long time.
I apologie to jump it so late, but there's one question I can't find any
answer for.

Given that log files will be archieved, how can we purge them (ie know for
sure we won't need them anymore)

if I do a backup of the DATA dir, then obviously I won't need the logs
that were taken before. I can't just delete them all because maybe a few
will be archived during the backup.

Same thing goes for pg_dump.

Maybe I've missed something.

Regards, and many thanks for that great feature
 On Wed, 30 Jun 2004, Simon Riggs wrote:

> Date: Wed, 30 Jun 2004 02:46:57 +0100
> From: Simon Riggs <simon@2ndquadrant.com>
> To: pgsql-patches@postgresql.org
> Subject: Re: [PATCHES] PITR Archive Recovery
>
> On Tue, 2004-06-29 at 20:59, Simon Riggs wrote:
> > On Mon, 2004-06-28 at 21:58, Simon Riggs wrote:
> > > PITR Archive Recovery, 28 June 2004
> > >
> > > What's in this patch?
>
> This my LAST, PLANNED patch before Freeze. Any questions?
>
> This is a patch-on-patch, rather than a full patch. To use this, apply
> earlier patches for pitr_v4_4*, then apply this. (Full patch available
> upon request...just saving the good people of this list some annoyance
> time from a 50k download).
>
> This now provides:
>
> - parsing of restore program from recovery.conf
> - minor cosmetic changes to some error messages
>
> ...there's more to do, but I'm working on the...
> if it ain't broke, don't fix it...
>
> Best Regards, Simon Riggs
>

--
Olivier PRENANT                    Tel: +33-5-61-50-97-00 (Work)
6, Chemin d'Harraud Turrou           +33-5-61-50-97-01 (Fax)
31190 AUTERIVE                       +33-6-07-63-80-64 (GSM)
FRANCE                          Email: ohp@pyrenet.fr
------------------------------------------------------------------------------
Make your life a dream, make your dream a reality. (St Exupery)

Re: PITR Archive Recovery

From
Simon Riggs
Date:
On Wed, 2004-06-30 at 12:27, ohp@pyrenet.fr wrote:
> Given that log files will be archieved, how can we purge them (ie know for
> sure we won't need them anymore)
>

Good question - you're right I've not mentioned that.

The answer is straightforward. Whenever you do a backup, one of the
transaction logs will be the current one. That means any logs before the
earliest one you can see can now be purged from the archive.

So if you can see: 137,138,139 then that means anything at 136 or before
is able to be discarded.

However, I'd recommend keeping more than just one backup, usually 2 or
3, so the actual purge point is dependant upon your data retention
strategy, possibly linked to tape rotation etc..

> if I do a backup of the DATA dir, then obviously I won't need the logs
> that were taken before. I can't just delete them all because maybe a few
> will be archived during the backup.
>

Taking a full physical backup will normally need to exclude the pg_xlog
directory, or at least the current xlog. Since it is being written to
very regularly it is almost impossible to take a clean copy using
standard utilities - though filesystem level utilities work fine.

Best regards, Simon Riggs


Re: PITR Archive Recovery

From
Simon Riggs
Date:
....another full version of patch, now at v4_5

This now includes
- much clearer error messages
- removal of all use of elog
- recoded some scrappy left over code from v1
- improved error checking in some areas
- no new errcodes created...

otherwise, notes as before.

This patch is in need of a full test plan, since the standard regression
tests don't really apply to the situations catered for here. I'll
assemble a document as reports come in from initial use, then we can
perform a full formal test cycle half way through beta.

Best regards, Simon Riggs

Attachment

Re: PITR Archive Recovery

From
ohp@pyrenet.fr
Date:
Many thanks for your reply Simon
On Wed, 30 Jun 2004, Simon Riggs wrote:

> Date: Wed, 30 Jun 2004 19:29:14 +0100
> From: Simon Riggs <simon@2ndquadrant.com>
> To: ohp@pyrenet.fr
> Cc: pgsql-patches@postgresql.org
> Subject: Re: [PATCHES] PITR Archive Recovery
>
> On Wed, 2004-06-30 at 12:27, ohp@pyrenet.fr wrote:
> > Given that log files will be archieved, how can we purge them (ie know for
> > sure we won't need them anymore)
> >
>
> Good question - you're right I've not mentioned that.
>
> The answer is straightforward. Whenever you do a backup, one of the
> transaction logs will be the current one. That means any logs before the
> earliest one you can see can now be purged from the archive.
>
> So if you can see: 137,138,139 then that means anything at 136 or before
> is able to be discarded.
Ok, that's clear...
BUT not very easy to put in a backup stagtegy...
It may be ok if you user tar or cpio; but surely more complicated if you
use backup software like Netvault or Tapeware
>
> However, I'd recommend keeping more than just one backup, usually 2 or
> 3, so the actual purge point is dependant upon your data retention
> strategy, possibly linked to tape rotation etc..
>
sure
> > if I do a backup of the DATA dir, then obviously I won't need the logs
> > that were taken before. I can't just delete them all because maybe a few
> > will be archived during the backup.
> >
>
I agree
> Taking a full physical backup will normally need to exclude the pg_xlog
> directory, or at least the current xlog. Since it is being written to
> very regularly it is almost impossible to take a clean copy using
> standard utilities - though filesystem level utilities work fine.
>
Would it make sense to have SQL phrases (as I recall from my informix days
10 years ago)
like
START BACKUP LEVEL 0 where cluster would be archieved on whatever you
want, disallowing all writes and
SART BACKUP LEVEL 1 where cluster and logs would be archieved letting
read/write o databases possible...


> Best regards, Simon Riggs
>
>
> ---------------------------(end of broadcast)---------------------------
> TIP 4: Don't 'kill -9' the postmaster
>
Best regards
--
Olivier PRENANT                    Tel: +33-5-61-50-97-00 (Work)
6, Chemin d'Harraud Turrou           +33-5-61-50-97-01 (Fax)
31190 AUTERIVE                       +33-6-07-63-80-64 (GSM)
FRANCE                          Email: ohp@pyrenet.fr
------------------------------------------------------------------------------
Make your life a dream, make your dream a reality. (St Exupery)

Re: PITR Archive Recovery

From
Simon Riggs
Date:
On Thu, 2004-07-01 at 16:11, ohp@pyrenet.fr wrote:
> Many thanks for your reply Simon
> On Wed, 30 Jun 2004, Simon Riggs wrote:
>
> > Date: Wed, 30 Jun 2004 19:29:14 +0100
> > From: Simon Riggs <simon@2ndquadrant.com>
> > To: ohp@pyrenet.fr
> > Cc: pgsql-patches@postgresql.org
> > Subject: Re: [PATCHES] PITR Archive Recovery
> >
> > On Wed, 2004-06-30 at 12:27, ohp@pyrenet.fr wrote:
> > > Given that log files will be archieved, how can we purge them (ie know for
> > > sure we won't need them anymore)
> > >
> >
> > Good question - you're right I've not mentioned that.
> >
> > The answer is straightforward. Whenever you do a backup, one of the
> > transaction logs will be the current one. That means any logs before the
> > earliest one you can see can now be purged from the archive.
> >
> > So if you can see: 137,138,139 then that means anything at 136 or before
> > is able to be discarded.
> Ok, that's clear...
> BUT not very easy to put in a backup stagtegy...
> It may be ok if you user tar or cpio; but surely more complicated if you
> use backup software like Netvault or Tapeware

Of course, I CAN help with that, but you're right, it isn't in any
manual.

> >
> > However, I'd recommend keeping more than just one backup, usually 2 or
> > 3, so the actual purge point is dependant upon your data retention
> > strategy, possibly linked to tape rotation etc..
> >
> sure
> > > if I do a backup of the DATA dir, then obviously I won't need the logs
> > > that were taken before. I can't just delete them all because maybe a few
> > > will be archived during the backup.
> > >
> >
> I agree
> > Taking a full physical backup will normally need to exclude the pg_xlog
> > directory, or at least the current xlog. Since it is being written to
> > very regularly it is almost impossible to take a clean copy using
> > standard utilities - though filesystem level utilities work fine.
> >
> Would it make sense to have SQL phrases (as I recall from my informix days
> 10 years ago)
> like
> START BACKUP LEVEL 0 where cluster would be archieved on whatever you
> want, disallowing all writes and
> SART BACKUP LEVEL 1 where cluster and logs would be archieved letting
> read/write o databases possible...
>

These are possible in a variety of ways at operating system or device
level, so no these haven't been implemented (yet?) for PostgreSQL.

Best regards, Simon Riggs