Re: replication using WAL archives - Mailing list pgsql-admin
From | Gaetano Mendola |
---|---|
Subject | Re: replication using WAL archives |
Date | |
Msg-id | 41782EEF.5040708@bigfoot.com Whole thread Raw |
In response to | Re: replication using WAL archives (Robert Treat <xzilla@users.sourceforge.net>) |
Responses |
Re: replication using WAL archives
|
List | pgsql-admin |
Robert Treat wrote: > On Thu, 2004-10-21 at 02:44, Iain wrote: > >>Hi, >> >>I thought I read something about this in relation to v8, but I can't >>find any reference to it now... is it (or will it be) possible to do >>master-slave style database replication by transmitting log files to the >>standby server and having it process them? >> > > > I'm not certain if this is 8.0, but some folks have created a working > version against the 8.0 code that will do something like this. Search > the pgsql-hacker mail list archives for more information on it. I sent a post on hackers, I put it here: ======================================================================= Hi all, I seen that Eric Kerin did the work suggested by Tom about how to use the PITR in order to have an hot spare postgres, writing a C program. I did the same writing 2 shell scripts, one of them perform the restore the other one deliver the partial filled wal and check if the postmaster is alive ( check if the pid process still exist ). With these two scripts I'm able to have an hot spare installation, and the spare one go alive when the first postmaster dies. How test it: 1) Master node: modify postgresql.conf using: ~ archive_command = 'cp %p /mnt/server/archivedir/%f' ~ launch postgres and perform a backup as doc ~ http://developer.postgresql.org/docs/postgres/backup-online.html suggest to do launch the script: partial_wal_deliver.sh <PID> /mnt/server/partialdir <pg_xlog path> ~ this script will delivery each 10 seconds the "current" wal file, ~ and touch the "alive" file in order to notify the spare node that ~ the master node is up and running 2) Spare node: create a recovery.conf with the line: ~ restore_command = 'restore.sh /mnt/server/archivedir/%f %p /mnt/server/partialdir' ~ replace the content of data directory with the backup performed at point 1, ~ remove any file present in the pg_xlog directory ( leaving there the archive_status ~ directory ) and remove the postmaster.pid file ( this is necessary if you are running ~ the spare postgres on the same hw ). ~ launch the postmaster, the restore will continue till the "alive" file present in the ~ /mnt/server/partialdir directory is not updated for 60 seconds ( you can modify this ~ values inside the restore.sh script ). Be sure that restore.sh and all directories involved are accessible Let me know. This is a first step, of course, as Eric Kerin did, is better port these script in C and make it more robust. Postgres can help this process, as suggested by Tom creating a pg_current_wal() or even better having two new GUC parameters: archive_current_wal_command and archive_current_wal_delay. I problem I discover during the tests is that if you shut down the spare node and the restore_command is still waiting for a file then the postmaster will never exit :-( ========================================================================== I hope that is clear. Regards Gaetano Mendola #!/bin/bash SOURCE=$1 TARGET=$2 PARTIAL=$3 SIZE_EXPECTED=16777216 #bytes 16 MB DIED_TIME=60 #seconds function test_existence { if [ -f ${SOURCE} ] then COUNTER=0 #I have to check if the file is begin copied #I assume that it will reach the right #size in a few seconds while [ $(stat -c '%s' ${SOURCE} ) -lt $SIZE_EXPECTED ] do sleep 1 let COUNTER+=1 if [ 20 -lt $COUNTER ] then exit 1 # BAILING OUT fi done cp $SOURCE $TARGET exit 0 fi echo ${SOURCE}"> not found" #if is looking for a history file and not exist #I have suddenly exit echo $SOURCE | grep history > /dev/null 2>&1 && exit 1 } while [ 1 ] do test_existence #CHECK IF THE MASTER IS ALIVE DELTA_TIME=$(( $( date +'%s' ) - $( stat -c '%Z' ${PARTIAL}/alive ) )) if [ $DIED_TIME -lt $DELTA_TIME ] then echo "Master is dead..." # Master is dead CURRENT_WAL=$( basename $SOURCE ) echo "Partial: " ${PARTIAL} echo "Current wal: " ${CURRENT_WAL} echo "Target: " ${TARGET} cp ${PARTIAL}/${CURRENT_WAL}.partial ${TARGET} > /dev/null 2>&1 && exit 0 exit 1 fi sleep 1 done #!/bin/bash PID=$1 PARTIAL=$2 PGXLOG=$3 function copy_last_wal { FILE=$( ls -t1p $PGXLOG | grep -v / | head -1 ) echo "Last Wal> " $FILE cp ${PGXLOG}/${FILE} ${PARTIAL}/${FILE}.tmp mv ${PARTIAL}/${FILE}.tmp ${PARTIAL}/${FILE}.partial find ${PARTIAL} -name *.partial | grep -v ${FILE} | xargs -i rm -fr {} } while [ 1 ] do ps --pid $PID > /dev/null 2>&1 ALIVE=$? if [ "${ALIVE}" == "1" ] then #The process is dead echo "Process dead" copy_last_wal exit 1 fi #The process still exist touch ${PARTIAL}/alive copy_last_wal sleep 10 done
pgsql-admin by date: