Thread: PITR and WAL archiving

PITR and WAL archiving

From

"Adam Groves"

Date:

15 March 2007, 14:15:12

I have been ramming my head against the wall for the past two
evenings, trying to get PITR working and it's gotten to the point
where I need to ask someone for help.

I'm basically trying to follow the instructions in the manual in
chapter 23.3.

Here's how I'm starting out:
===================

Contents of 'data':

PG_VERSION
base
global
pg_clog
pg_hba.conf
pg_ident.conf
pg_multixact
pg_subtrans
pg_tblspc
pg_twophase
pg_xlog
postgresql.conf
postmaster.opts


data/pg_xlog/archive_status is empty.

I have the following line in postgresql.conf:

archive_command = 'cp -i %p /mnt/server/archive/wal/%f < /dev/null'

/mnt/server/archive/wal is empty.

Here's my procedure:
===============

1. pg_ctl start -l logfile
2. SELECT pg_start_backup('label');
3. tar -zcf /mnt/server/archive/file/backup.tar.gz . (the contents of
'data')
4. SELECT pg_stop_backup();
5. pg_ctl stop
6. I now delete the contents of 'data' (making a copy before hand)
7. In 'data': tar xzvf /mnt/server/archive/file/backup.tar.gz
8. echo \"restore_command = 'cp /mnt/server/archive/wal/%f %p'\" >
recovery.conf
9. pg_ctl start -l logfile

And here are the logs:
================

LOG:  database system was shut down at 2007-03-13 23:07:33 CET
LOG:  checkpoint record is at 0/1E000020
LOG:  redo record is at 0/1E000020; undo record is at 0/0; shutdown
TRUE
LOG:  next transaction ID: 0/161203; next OID: 17156
LOG:  next MultiXactId: 1; next MultiXactOffset: 0
LOG:  database system is ready
LOG:  received smart shutdown request
LOG:  shutting down
LOG:  database system is shut down
LOG:  database system was shut down at 2007-03-13 23:09:21 CET
LOG:  checkpoint record is at 0/1E000068
LOG:  redo record is at 0/1E000068; undo record is at 0/0; shutdown
TRUE
LOG:  next transaction ID: 0/161203; next OID: 17156
LOG:  next MultiXactId: 1; next MultiXactOffset: 0
LOG:  database system is ready
LOG:  database system was interrupted at 2007-03-13 23:11:21 CET
LOG:  starting archive recovery
LOG:  restore_command = "cp /mnt/server/archive/wal/%f %p"
cp: /mnt/server/archive/wal/00000001.history: No such file or
directory
LOG:  restored log file "00000001000000000000001E.000000B0.backup"
from archive
LOG:  restored log file "00000001000000000000001E" from archive
LOG:  checkpoint record is at 0/1E0000B0
LOG:  redo record is at 0/1E0000B0; undo record is at 0/0; shutdown
FALSE
LOG:  next transaction ID: 0/161205; next OID: 17156
LOG:  next MultiXactId: 1; next MultiXactOffset: 0
LOG:  automatic recovery in progress
LOG:  redo starts at 0/1E0000F8
cp: /mnt/server/archive/wal/00000001000000000000001F: No such file or
directory
LOG:  could not open file "pg_xlog/00000001000000000000001F" (log file
0, segment 31): No such file or directory
LOG:  redo done at 0/1E0000F8
LOG:  restored log file "00000001000000000000001E" from archive
LOG:  archive recovery complete
LOG:  database system is ready
LOG:  received smart shutdown request
LOG:  shutting down
LOG:  database system is shut down
LOG:  database system was shut down at 2007-03-13 23:17:32 CET
LOG:  checkpoint record is at 0/1F000068
LOG:  redo record is at 0/1F000068; undo record is at 0/0; shutdown
TRUE
LOG:  next transaction ID: 0/161207; next OID: 17156
LOG:  next MultiXactId: 1; next MultiXactOffset: 0
LOG:  database system is ready
overwrite /mnt/server/archive/wal/00000001000000000000001E? (y/n [n])
not overwritten
LOG:  archive command "cp -i pg_xlog/00000001000000000000001E /mnt/
server/archive/wal/00000001000000000000001E < /dev/null" failed:
return code 256

Where oh where might I be going wrong? I'd be grateful for any
suggestions.

Regards

Adam

Re: PITR and WAL archiving

From

"Florian G. Pflug"

Date:

15 March 2007, 19:52:14

Adam Groves wrote:
> I have been ramming my head against the wall for the past two
> evenings, trying to get PITR working and it's gotten to the point
> where I need to ask someone for help.
>
> I'm basically trying to follow the instructions in the manual in
> chapter 23.3.
Which postgres version are you using?

> 1. pg_ctl start -l logfile
> 2. SELECT pg_start_backup('label');
> 3. tar -zcf /mnt/server/archive/file/backup.tar.gz . (the contents of
> 'data')
> 4. SELECT pg_stop_backup();
> 5. pg_ctl stop
> 6. I now delete the contents of 'data' (making a copy before hand)
> 7. In 'data': tar xzvf /mnt/server/archive/file/backup.tar.gz
> 8. echo \"restore_command = 'cp /mnt/server/archive/wal/%f %p'\" >
> recovery.conf
> 9. pg_ctl start -l logfile
Are you doing this all in one row, without any other queries
(especially updates) hitting the database?

The reason that I ask is that postgres 8.1 will *only* archive a
wal segment (a wal file) if it's completly filled. Since every
WAL segment in 8.1 is *exactly* 16MB large, archive_command will
only get called after 16MB worth of wal traffic has happened.

In 8.1, pg_stop_backup() won't trigger archive_command either -
it can't, because postgres 8.1 can't close a wal segment early -
it *has* to write 16MB first. Therefore, the part of the wal
that is bring your filesystem backup into a consistent state
lives in the currently open wal after pg_stop_backup is called.

Now, you remove your datadir, and restore your filesystem backup.
Postgres realizes that it needs the wal segement that was current
when pg_start_backup() was called, and requests this from
restore_command. But since this segement was the current segment
up to the point where you removed your datadir, it got never archived :-(.

Postgres 8.2 improves the situation. It can now switch wal segments
early, and will do so if you call pg_stop_backup().

greetings, Florian Pflug