Re: replication using WAL archives - Mailing list pgsql-admin
From | Bruce Momjian |
---|---|
Subject | Re: replication using WAL archives |
Date | |
Msg-id | 200506050029.j550T2W29877@candle.pha.pa.us Whole thread Raw |
In response to | Re: replication using WAL archives (Gaetano Mendola <mendola@bigfoot.com>) |
List | pgsql-admin |
This thread has been added as a link on the TODO list under TODO.detail. --------------------------------------------------------------------------- Gaetano Mendola wrote: > Robert Treat wrote: > > On Thu, 2004-10-21 at 02:44, Iain wrote: > > > >>Hi, > >> > >>I thought I read something about this in relation to v8, but I can't > >>find any reference to it now... is it (or will it be) possible to do > >>master-slave style database replication by transmitting log files to the > >>standby server and having it process them? > >> > > > > > > I'm not certain if this is 8.0, but some folks have created a working > > version against the 8.0 code that will do something like this. Search > > the pgsql-hacker mail list archives for more information on it. > > I sent a post on hackers, I put it here: > > ======================================================================= > Hi all, > I seen that Eric Kerin did the work suggested by Tom about > how to use the PITR in order to have an hot spare postgres, > writing a C program. > > I did the same writing 2 shell scripts, one of them perform > the restore the other one deliver the partial filled wal and > check if the postmaster is alive ( check if the pid process > still exist ). > > With these two scripts I'm able to have an hot spare installation, > and the spare one go alive when the first postmaster dies. > > How test it: > > 1) Master node: > modify postgresql.conf using: > > ~ archive_command = 'cp %p /mnt/server/archivedir/%f' > > ~ launch postgres and perform a backup as doc > > ~ http://developer.postgresql.org/docs/postgres/backup-online.html > > suggest to do > > launch the script: > > partial_wal_deliver.sh <PID> /mnt/server/partialdir <pg_xlog path> > > ~ this script will delivery each 10 seconds the "current" wal file, > ~ and touch the "alive" file in order to notify the spare node that > ~ the master node is up and running > > > 2) Spare node: > create a recovery.conf with the line: > > ~ restore_command = 'restore.sh /mnt/server/archivedir/%f %p /mnt/server/partialdir' > > ~ replace the content of data directory with the backup performed at point 1, > ~ remove any file present in the pg_xlog directory ( leaving there the archive_status > ~ directory ) and remove the postmaster.pid file ( this is necessary if you are running > ~ the spare postgres on the same hw ). > > ~ launch the postmaster, the restore will continue till the "alive" file present in the > ~ /mnt/server/partialdir directory is not updated for 60 seconds ( you can modify this > ~ values inside the restore.sh script ). > > Be sure that restore.sh and all directories involved are accessible > > Let me know. > > > This is a first step, of course, as Eric Kerin did, is better port these script > in C and make it more robust. > > Postgres can help this process, as suggested by Tom creating a pg_current_wal() > or even better having two new GUC parameters: archive_current_wal_command and > archive_current_wal_delay. > > I problem I discover during the tests is that if you shut down the spare node > and the restore_command is still waiting for a file then the postmaster will never > exit :-( > ========================================================================== > > I hope that is clear. > > > > Regards > Gaetano Mendola > > > #!/bin/bash > > > SOURCE=$1 > TARGET=$2 > PARTIAL=$3 > > SIZE_EXPECTED=16777216 #bytes 16 MB > DIED_TIME=60 #seconds > > function test_existence > { > if [ -f ${SOURCE} ] > then > COUNTER=0 > > #I have to check if the file is begin copied > #I assume that it will reach the right > #size in a few seconds > > while [ $(stat -c '%s' ${SOURCE} ) -lt $SIZE_EXPECTED ] > do > sleep 1 > let COUNTER+=1 > if [ 20 -lt $COUNTER ] > then > exit 1 # BAILING OUT > fi > done > > cp $SOURCE $TARGET > exit 0 > fi > echo ${SOURCE}"> not found" > > #if is looking for a history file and not exist > #I have suddenly exit > echo $SOURCE | grep history > /dev/null 2>&1 && exit 1 > } > > > while [ 1 ] > do > > test_existence > > #CHECK IF THE MASTER IS ALIVE > DELTA_TIME=$(( $( date +'%s' ) - $( stat -c '%Z' ${PARTIAL}/alive ) )) > if [ $DIED_TIME -lt $DELTA_TIME ] > then > echo "Master is dead..." > # Master is dead > CURRENT_WAL=$( basename $SOURCE ) > echo "Partial: " ${PARTIAL} > echo "Current wal: " ${CURRENT_WAL} > echo "Target: " ${TARGET} > cp ${PARTIAL}/${CURRENT_WAL}.partial ${TARGET} > /dev/null 2>&1 && exit 0 > exit 1 > fi > > sleep 1 > > done > #!/bin/bash > > PID=$1 > PARTIAL=$2 > PGXLOG=$3 > > function copy_last_wal > { > FILE=$( ls -t1p $PGXLOG | grep -v / | head -1 ) > > echo "Last Wal> " $FILE > > cp ${PGXLOG}/${FILE} ${PARTIAL}/${FILE}.tmp > mv ${PARTIAL}/${FILE}.tmp ${PARTIAL}/${FILE}.partial > find ${PARTIAL} -name *.partial | grep -v ${FILE} | xargs -i rm -fr {} > } > > > while [ 1 ] > do > ps --pid $PID > /dev/null 2>&1 > ALIVE=$? > > if [ "${ALIVE}" == "1" ] > then > #The process is dead > echo "Process dead" > copy_last_wal > exit 1 > fi > > #The process still exist > touch ${PARTIAL}/alive > copy_last_wal > > sleep 10 > done > > ---------------------------(end of broadcast)--------------------------- > TIP 5: Have you checked our extensive FAQ? > > http://www.postgresql.org/docs/faqs/FAQ.html -- Bruce Momjian | http://candle.pha.pa.us pgman@candle.pha.pa.us | (610) 359-1001 + If your life is a hard drive, | 13 Roberts Road + Christ can be your backup. | Newtown Square, Pennsylvania 19073
pgsql-admin by date: