Thread: Hot Backup using WAL files
Hello! I already searched in the archives for similar problems but didn't find any. I want to have a hot backup server of my master server's database. I am using NetBSD 3.0.1 with Postgresql 8.2.3. It works so far, that I receive the WAL Files on the slave every few minutes from the master. I have a small Script that rebuilds the slave server with master's data and feeds it with the WAL files that appeared since then: (I am doing a binary backup of the master with a perl script from http://pgfoundry.org/frs/?group_id=1000253 ) Script on the slave: # Source www.postgresqlforums.com/wiki/Backup_&_Recovery # Adjusted by Damian dl@xiqit.de PGDIR=/var/databases/pgsql /etc/rc.d/pgsql stop cp -Rp $PGDIR/data/pg_xlog /tmp rm -fR $PGDIR/data umask 077 su pgsql -c "mkdir -p $PGDIR/data/pg_xlog/archive_status" # get newest binary backup FILE=`ls -t $PGDIR/bak/pg* | head -1` cd $PGDIR/data gzip -dc $FILE | tar xpf - cp -Rp /tmp/pg_xlog/* $PGDIR/data/pg_xlog su pgsql -c "cp $PGDIR/recovery.done $PGDIR/data/recovery.conf" su pgsql -c "cp $PGDIR/postgresql.conf $PGDIR/data/postgresql.conf" /etc/rc.d/pgsql start rm -fR /tmp/pg_xlog My problem now is, how to continually feed the slave server with new received WAL files? The slave just restores up to the state when I ran my script. I also tried to stop the server rename recovery.done to recovery.conf and start the server again (without deleting anything). It looks in the received WAL files' directory and complains about some errors, but starts after that. I am worried about the errors. From the logfile: LOG: received fast shutdown request LOG: shutting down LOG: database system is shut down LOG: database system was shut down at 2007-04-04 15:10:11 CEST LOG: starting archive recovery LOG: restore_command = "cp ../bak/%f %p" cp: ../bak/00000001.history: No such file or directory FATAL: the database system is starting up LOG: restored log file "00000001000000010000004F" from archive LOG: invalid xl_info in primary checkpoint record LOG: using previous checkpoint record at 1/4F000020 LOG: redo record is at 1/4F000020; undo record is at 0/0; shutdown FALSE LOG: next transaction ID: 0/25133; next OID: 57344 LOG: next MultiXactId: 1; next MultiXactOffset: 0 LOG: automatic recovery in progress LOG: redo starts at 1/4F000068 FATAL: the database system is starting up LOG: restored log file "000000010000000100000050" from archive cp: ../bak/000000010000000100000051: No such file or directory LOG: unexpected pageaddr 0/32000000 in log file 1, segment 81, offset 0 LOG: redo done at 1/50000068 LOG: restored log file "000000010000000100000050" from archive FATAL: the database system is starting up LOG: archive recovery complete LOG: database system is ready Is this the way it works? Bye, Damian
Damian Lubosch <dl@xiqit.de> writes: > My problem now is, how to continually feed the slave server with new > received WAL files? You need to make the restore_command script wait until the next WAL file is available, instead of failing immediately. This is actually not that simple to get right. You might want to try pg_standby which will ship with 8.3: http://developer.postgresql.org/cvsweb.cgi/pgsql/contrib/pg_standby/ regards, tom lane
Hi, Found on the dev mailing list (hidden well)... On the master postgresql.conf: archive_command = 'rsync -arv "%p" slave_hostname:/.../wal_archive/"%f";' (and maybe '&cp "%p" /.../wal_archive/"%f";') archive_timeout= 900 So, you need to perform a ssh-keygen on both sides. Now, the slave recovery.conf: restore_command = '/.../RECOVER/restore.sh /.../wal_archive/%f "%p"' and the restore.sh script: ************ #!/bin/bash RESTORE_FROM=$1 RESTORE_TO=$2 DELAY=100000 TRIGGERED=0 TRIGGER_FILE="{$POSTGRES_PATH}/RECOVER /trigger" copyfunc() { if [ "$TRIGGERED" -eq "0" ]; then cp -v -i $RESTORE_FROM $RESTORE_TO fi } k=`expr $1 : '.*\(history\)'` if [ "$k" == "history" ]; then copyfunc; exit $?; fi while [ ! -f "$RESTORE_FROM" -a "$TRIGGERED" -eq "0" ]; do usleep $DELAY; if [ -e $TRIGGER_FILE ]; then TRIGGERED=1; fi done copyfunc; *********** finally, you just need to launch /etc/rc.d/pgsql start (nohup) The logfile: 2007-03-26 10:19:02 CEST LOG: database system was interrupted at 200703-25 20:05:01 CEST 2007-03-26 10:19:02 CEST LOG: starting archive recovery 2007-03-26 10:19:02 CEST LOG: restore_command = "/.../restore.sh /.../wal_archive/%f "%p"" cp: cannot stat `/.../wal_archive/00000001.history': No such file or directory `/.../wal_archive/0000000100000023000000AD.00000068.backup' -> `pg_xlog/RECOVERYHISTORY' 2007-03-26 10:19:02 CEST LOG: restored log file "0000000100000023000000AD.00000068.backup" from archive `/..../wal_archive/0000000100000023000000AD' -> `pg_xlog/RECOVERYXLOG' 2007-03-26 10:19:02 CEST LOG: restored log file "0000000100000023000000AD" from archive 2007-03-26 10:19:02 CEST LOG: checkpoint record is at 23/AD000068 2007-03-26 10:19:02 CEST LOG: redo record is at 23/AD000068; undo record is at 0/0; shutdown FALSE 2007-03-26 10:19:02 CEST LOG: next transaction ID: 0/1068654; next OID: 85648 2007-03-26 10:19:02 CEST LOG: next MultiXactId: 31; next MultiXactOffset: 62 2007-03-26 10:19:02 CEST LOG: automatic recovery in progress 2007-03-26 10:19:02 CEST LOG: redo starts at 23/AD0000B0 `/.../wal_archive/0000000100000023000000AE' -> `pg_xlog/RECOVERYXLOG' 2007-03-26 10:19:02 CEST LOG: restored log file "0000000100000023000000AE" from archive `/.../wal_archive/0000000100000023000000AF' -> `pg_xlog/RECOVERYXLOG' 2007-03-26 10:19:02 CEST LOG: restored log file "0000000100000023000000AF" from archive today... 2007-04-04 15:50:38 CEST LOG: restored log file "000000010000002700000071" from archive `/.../wal_archive/000000010000002700000072' -> `pg_xlog/RECOVERYXLOG' 2007-04-04 16:05:38 CEST LOG: restored log file "000000010000002700000072" from archive `/.../wal_archive/000000010000002700000073' -> `pg_xlog/RECOVERYXLOG' 2007-04-04 16:20:38 CEST LOG: restored log file "000000010000002700000073" from archive `/.../wal_archive/000000010000002700000074' -> `pg_xlog/RECOVERYXLOG' 2007-04-04 16:35:38 CEST LOG: restored log file "000000010000002700000074" from archive And if you try to connect with psql: FATAL: the database system is starting up Regards, Ghislain
On Wed, 2007-04-04 at 11:07 -0400, Tom Lane wrote: > Damian Lubosch <dl@xiqit.de> writes: > > My problem now is, how to continually feed the slave server with new > > received WAL files? > > You need to make the restore_command script wait until the next WAL file > is available, instead of failing immediately. > > This is actually not that simple to get right. You might want to try > pg_standby which will ship with 8.3: > http://developer.postgresql.org/cvsweb.cgi/pgsql/contrib/pg_standby/ Thanks Tom. pg_standby will be back compatible with 8.2 also, though isn't available as part of the 8.2 distribution. The README will contain information for both 8.2 and 8.3 releases, since there will be one minor difference. -- Simon Riggs EnterpriseDB http://www.enterprisedb.com