Thread: Hot Backup using WAL files

Hot Backup using WAL files

From

Damian Lubosch

Date:

04 April 2007, 11:15:32

Hello!

I already searched in the archives for similar problems but didn't find any.
I want to have a hot backup server of my master server's database.

I am using NetBSD 3.0.1 with Postgresql 8.2.3.

It works so far, that I receive the WAL Files on the slave every few
minutes from the master. I have a small Script that rebuilds the slave
server with master's data and feeds it with the WAL files that appeared
since then:

(I am doing a binary backup of the master with a perl script from
http://pgfoundry.org/frs/?group_id=1000253 )

Script on the slave:
# Source www.postgresqlforums.com/wiki/Backup_&_Recovery
# Adjusted by Damian dl@xiqit.de

PGDIR=/var/databases/pgsql
/etc/rc.d/pgsql stop
cp -Rp $PGDIR/data/pg_xlog /tmp
rm -fR $PGDIR/data
umask 077
su pgsql -c "mkdir -p $PGDIR/data/pg_xlog/archive_status"
# get newest binary backup
FILE=`ls -t $PGDIR/bak/pg* | head -1`
cd $PGDIR/data
gzip -dc $FILE | tar xpf -
cp -Rp /tmp/pg_xlog/* $PGDIR/data/pg_xlog
su pgsql -c "cp $PGDIR/recovery.done $PGDIR/data/recovery.conf"
su pgsql -c "cp $PGDIR/postgresql.conf $PGDIR/data/postgresql.conf"
/etc/rc.d/pgsql start
rm -fR /tmp/pg_xlog



My problem now is, how to continually feed the slave server with new
received WAL files?
The slave just restores up to the state when I ran my script.

I also tried to stop the server rename recovery.done to recovery.conf
and start the server again (without deleting anything). It looks in the
received WAL files' directory and complains about some errors, but
starts after that. I am worried about the errors.

From the logfile:
LOG:  received fast shutdown request
LOG:  shutting down
LOG:  database system is shut down
LOG:  database system was shut down at 2007-04-04 15:10:11 CEST
LOG:  starting archive recovery
LOG:  restore_command = "cp ../bak/%f %p"
cp: ../bak/00000001.history: No such file or directory
FATAL:  the database system is starting up
LOG:  restored log file "00000001000000010000004F" from archive
LOG:  invalid xl_info in primary checkpoint record
LOG:  using previous checkpoint record at 1/4F000020
LOG:  redo record is at 1/4F000020; undo record is at 0/0; shutdown FALSE
LOG:  next transaction ID: 0/25133; next OID: 57344
LOG:  next MultiXactId: 1; next MultiXactOffset: 0
LOG:  automatic recovery in progress
LOG:  redo starts at 1/4F000068
FATAL:  the database system is starting up
LOG:  restored log file "000000010000000100000050" from archive
cp: ../bak/000000010000000100000051: No such file or directory
LOG:  unexpected pageaddr 0/32000000 in log file 1, segment 81, offset 0
LOG:  redo done at 1/50000068
LOG:  restored log file "000000010000000100000050" from archive
FATAL:  the database system is starting up
LOG:  archive recovery complete
LOG:  database system is ready

Is this the way it works?


Bye, Damian

Re: Hot Backup using WAL files

From

Tom Lane

Date:

04 April 2007, 12:08:09

Damian Lubosch <dl@xiqit.de> writes:
> My problem now is, how to continually feed the slave server with new
> received WAL files?

You need to make the restore_command script wait until the next WAL file
is available, instead of failing immediately.

This is actually not that simple to get right.  You might want to try
pg_standby which will ship with 8.3:
http://developer.postgresql.org/cvsweb.cgi/pgsql/contrib/pg_standby/

            regards, tom lane

Re: Hot Backup using WAL files

From

"Filippi Ghislain"

Date:

04 April 2007, 12:26:28

Hi,

Found on the dev mailing list (hidden well)...

On the master postgresql.conf:
archive_command = 'rsync -arv "%p"
slave_hostname:/.../wal_archive/"%f";'
(and maybe '&cp "%p" /.../wal_archive/"%f";')
archive_timeout= 900

So, you need to perform a ssh-keygen on both sides.

Now, the slave recovery.conf:
restore_command = '/.../RECOVER/restore.sh /.../wal_archive/%f "%p"'

and the restore.sh script:
************
#!/bin/bash

RESTORE_FROM=$1
RESTORE_TO=$2
DELAY=100000
TRIGGERED=0
TRIGGER_FILE="{$POSTGRES_PATH}/RECOVER /trigger"

copyfunc() {
if [ "$TRIGGERED" -eq "0" ]; then
        cp -v -i $RESTORE_FROM $RESTORE_TO
fi
}

k=`expr $1 : '.*\(history\)'`

if [ "$k" == "history" ]; then
copyfunc;
exit $?;
fi

while [ ! -f "$RESTORE_FROM" -a "$TRIGGERED" -eq "0" ]; do
                usleep $DELAY;
                if [ -e $TRIGGER_FILE ]; then
                        TRIGGERED=1;
                fi
done

copyfunc;
***********


finally, you just need to launch /etc/rc.d/pgsql start (nohup)

The logfile:
2007-03-26 10:19:02 CEST    LOG:  database system was interrupted at
200703-25 20:05:01 CEST
2007-03-26 10:19:02 CEST    LOG:  starting archive recovery
2007-03-26 10:19:02 CEST    LOG:  restore_command = "/.../restore.sh
/.../wal_archive/%f "%p""
cp: cannot stat `/.../wal_archive/00000001.history': No such file or
directory
`/.../wal_archive/0000000100000023000000AD.00000068.backup' ->
`pg_xlog/RECOVERYHISTORY'
2007-03-26 10:19:02 CEST    LOG:  restored log file
"0000000100000023000000AD.00000068.backup" from archive
`/..../wal_archive/0000000100000023000000AD' -> `pg_xlog/RECOVERYXLOG'
2007-03-26 10:19:02 CEST    LOG:  restored log file
"0000000100000023000000AD" from archive
2007-03-26 10:19:02 CEST    LOG:  checkpoint record is at 23/AD000068
2007-03-26 10:19:02 CEST    LOG:  redo record is at 23/AD000068; undo
record is at 0/0; shutdown FALSE
2007-03-26 10:19:02 CEST    LOG:  next transaction ID: 0/1068654; next
OID: 85648
2007-03-26 10:19:02 CEST    LOG:  next MultiXactId: 31; next
MultiXactOffset: 62
2007-03-26 10:19:02 CEST    LOG:  automatic recovery in progress
2007-03-26 10:19:02 CEST    LOG:  redo starts at 23/AD0000B0
`/.../wal_archive/0000000100000023000000AE' -> `pg_xlog/RECOVERYXLOG'
2007-03-26 10:19:02 CEST    LOG:  restored log file
"0000000100000023000000AE" from archive
`/.../wal_archive/0000000100000023000000AF' -> `pg_xlog/RECOVERYXLOG'
2007-03-26 10:19:02 CEST    LOG:  restored log file
"0000000100000023000000AF" from archive

today...
2007-04-04 15:50:38 CEST    LOG:  restored log file
"000000010000002700000071" from archive
`/.../wal_archive/000000010000002700000072' -> `pg_xlog/RECOVERYXLOG'
2007-04-04 16:05:38 CEST    LOG:  restored log file
"000000010000002700000072" from archive
`/.../wal_archive/000000010000002700000073' -> `pg_xlog/RECOVERYXLOG'
2007-04-04 16:20:38 CEST    LOG:  restored log file
"000000010000002700000073" from archive
`/.../wal_archive/000000010000002700000074' -> `pg_xlog/RECOVERYXLOG'
2007-04-04 16:35:38 CEST    LOG:  restored log file
"000000010000002700000074" from archive

And if you try to connect with psql:
FATAL:  the database system is starting up

Regards,
Ghislain

Re: Hot Backup using WAL files

From

"Simon Riggs"

Date:

07 April 2007, 19:37:37

On Wed, 2007-04-04 at 11:07 -0400, Tom Lane wrote:
> Damian Lubosch <dl@xiqit.de> writes:
> > My problem now is, how to continually feed the slave server with new
> > received WAL files?
>
> You need to make the restore_command script wait until the next WAL file
> is available, instead of failing immediately.
>
> This is actually not that simple to get right.  You might want to try
> pg_standby which will ship with 8.3:
> http://developer.postgresql.org/cvsweb.cgi/pgsql/contrib/pg_standby/

Thanks Tom.

pg_standby will be back compatible with 8.2 also, though isn't available
as part of the 8.2 distribution. The README will contain information for
both 8.2 and 8.3 releases, since there will be one minor difference.

--
  Simon Riggs
  EnterpriseDB   http://www.enterprisedb.com