Re: replication using WAL archives - Mailing list pgsql-admin
From | Iain |
---|---|
Subject | Re: replication using WAL archives |
Date | |
Msg-id | 011a01c4b7d9$c121a390$7201a8c0@mst1x5r347kymb Whole thread Raw |
In response to | replication using WAL archives ("Iain" <iain@mst.co.jp>) |
List | pgsql-admin |
Thanks Gaetano, and Robert. I'm not in the habit of reading the hackers list, so I still have no idea where I first read about it :) Anyway, I did some reading and it seems like there are good possibilities here, but still quite a bit of work to do. I'll add this to my list of HA options and keep an eye on it's progress. regards Iain ----- Original Message ----- From: "Gaetano Mendola" <mendola@bigfoot.com> To: "Robert Treat" <xzilla@users.sourceforge.net>; <pgsql-admin@postgresql.org> Cc: <iain@mst.co.jp> Sent: Friday, October 22, 2004 6:49 AM Subject: Re: [ADMIN] replication using WAL archives > Robert Treat wrote: >> On Thu, 2004-10-21 at 02:44, Iain wrote: >> >>>Hi, >>> >>>I thought I read something about this in relation to v8, but I can't >>>find any reference to it now... is it (or will it be) possible to do >>>master-slave style database replication by transmitting log files to the >>>standby server and having it process them? >>> >> >> >> I'm not certain if this is 8.0, but some folks have created a working >> version against the 8.0 code that will do something like this. Search >> the pgsql-hacker mail list archives for more information on it. > > I sent a post on hackers, I put it here: > > ======================================================================= > Hi all, > I seen that Eric Kerin did the work suggested by Tom about > how to use the PITR in order to have an hot spare postgres, > writing a C program. > > I did the same writing 2 shell scripts, one of them perform > the restore the other one deliver the partial filled wal and > check if the postmaster is alive ( check if the pid process > still exist ). > > With these two scripts I'm able to have an hot spare installation, > and the spare one go alive when the first postmaster dies. > > How test it: > > 1) Master node: > modify postgresql.conf using: > > ~ archive_command = 'cp %p /mnt/server/archivedir/%f' > > ~ launch postgres and perform a backup as doc > > ~ http://developer.postgresql.org/docs/postgres/backup-online.html > > suggest to do > > launch the script: > > partial_wal_deliver.sh <PID> /mnt/server/partialdir <pg_xlog path> > > ~ this script will delivery each 10 seconds the "current" wal file, > ~ and touch the "alive" file in order to notify the spare node that > ~ the master node is up and running > > > 2) Spare node: > create a recovery.conf with the line: > > ~ restore_command = 'restore.sh /mnt/server/archivedir/%f %p > /mnt/server/partialdir' > > ~ replace the content of data directory with the backup performed > at point 1, > ~ remove any file present in the pg_xlog directory ( leaving there > the archive_status > ~ directory ) and remove the postmaster.pid file ( this is > necessary if you are running > ~ the spare postgres on the same hw ). > > ~ launch the postmaster, the restore will continue till the "alive" > file present in the > ~ /mnt/server/partialdir directory is not updated for 60 seconds > ( you can modify this > ~ values inside the restore.sh script ). > > Be sure that restore.sh and all directories involved are accessible > > Let me know. > > > This is a first step, of course, as Eric Kerin did, is better port these > script > in C and make it more robust. > > Postgres can help this process, as suggested by Tom creating a > pg_current_wal() > or even better having two new GUC parameters: archive_current_wal_command > and > archive_current_wal_delay. > > I problem I discover during the tests is that if you shut down the spare > node > and the restore_command is still waiting for a file then the postmaster > will never > exit :-( > ========================================================================== > > I hope that is clear. > > > > Regards > Gaetano Mendola > > > -------------------------------------------------------------------------------- > #!/bin/bash > > > SOURCE=$1 > TARGET=$2 > PARTIAL=$3 > > SIZE_EXPECTED=16777216 #bytes 16 MB > DIED_TIME=60 #seconds > > function test_existence > { > if [ -f ${SOURCE} ] > then > COUNTER=0 > > #I have to check if the file is begin copied > #I assume that it will reach the right > #size in a few seconds > > while [ $(stat -c '%s' ${SOURCE} ) -lt $SIZE_EXPECTED ] > do > sleep 1 > let COUNTER+=1 > if [ 20 -lt $COUNTER ] > then > exit 1 # BAILING OUT > fi > done > > cp $SOURCE $TARGET > exit 0 > fi > echo ${SOURCE}"> not found" > > #if is looking for a history file and not exist > #I have suddenly exit > echo $SOURCE | grep history > /dev/null 2>&1 && exit 1 > } > > > while [ 1 ] > do > > test_existence > > #CHECK IF THE MASTER IS ALIVE > DELTA_TIME=$(( $( date +'%s' ) - $( stat -c '%Z' ${PARTIAL}/alive ) )) > if [ $DIED_TIME -lt $DELTA_TIME ] > then > echo "Master is dead..." > # Master is dead > CURRENT_WAL=$( basename $SOURCE ) > echo "Partial: " ${PARTIAL} > echo "Current wal: " ${CURRENT_WAL} > echo "Target: " ${TARGET} > cp ${PARTIAL}/${CURRENT_WAL}.partial ${TARGET} > /dev/null 2>&1 && > exit 0 > exit 1 > fi > > sleep 1 > > done > -------------------------------------------------------------------------------- > #!/bin/bash > > PID=$1 > PARTIAL=$2 > PGXLOG=$3 > > function copy_last_wal > { > FILE=$( ls -t1p $PGXLOG | grep -v / | head -1 ) > > echo "Last Wal> " $FILE > > cp ${PGXLOG}/${FILE} ${PARTIAL}/${FILE}.tmp > mv ${PARTIAL}/${FILE}.tmp ${PARTIAL}/${FILE}.partial > find ${PARTIAL} -name *.partial | grep -v ${FILE} | xargs -i rm -fr {} > } > > > while [ 1 ] > do > ps --pid $PID > /dev/null 2>&1 > ALIVE=$? > > if [ "${ALIVE}" == "1" ] > then > #The process is dead > echo "Process dead" > copy_last_wal > exit 1 > fi > > #The process still exist > touch ${PARTIAL}/alive > copy_last_wal > > sleep 10 > done > -------------------------------------------------------------------------------- > > ---------------------------(end of broadcast)--------------------------- > TIP 5: Have you checked our extensive FAQ? > > http://www.postgresql.org/docs/faqs/FAQ.html >
pgsql-admin by date: