Re: replication using WAL archives - Mailing list pgsql-admin

From Iain
Subject Re: replication using WAL archives
Date
Msg-id 011a01c4b7d9$c121a390$7201a8c0@mst1x5r347kymb
Whole thread Raw
In response to replication using WAL archives  ("Iain" <iain@mst.co.jp>)
List pgsql-admin
Thanks Gaetano, and Robert.

I'm not in the habit of reading the hackers list, so I still have no idea
where I first read about it :)

Anyway, I did some reading and it seems like there are good possibilities
here, but still quite a bit of work to do. I'll add this to my list of HA
options and keep an eye on it's progress.

regards
Iain
----- Original Message -----
From: "Gaetano Mendola" <mendola@bigfoot.com>
To: "Robert Treat" <xzilla@users.sourceforge.net>;
<pgsql-admin@postgresql.org>
Cc: <iain@mst.co.jp>
Sent: Friday, October 22, 2004 6:49 AM
Subject: Re: [ADMIN] replication using WAL archives


> Robert Treat wrote:
>> On Thu, 2004-10-21 at 02:44, Iain wrote:
>>
>>>Hi,
>>>
>>>I thought I read something about this in relation to v8, but I can't
>>>find any reference to it now... is it (or will it be) possible to do
>>>master-slave style database replication by transmitting log files to the
>>>standby server and having it process them?
>>>
>>
>>
>> I'm not certain if this is 8.0, but some folks have created a working
>> version against the 8.0 code that will do something like this. Search
>> the pgsql-hacker mail list archives for more information on it.
>
> I sent a post on hackers, I put it here:
>
> =======================================================================
> Hi all,
> I seen that Eric Kerin did the work suggested by Tom about
> how to use the PITR in order to have an hot spare postgres,
> writing a C program.
>
> I did the same writing 2 shell scripts, one of them perform
> the restore the other one deliver the partial filled wal and
> check if the postmaster is alive ( check if the pid process
> still exist ).
>
> With these two scripts I'm able to have an hot spare installation,
> and the spare one go alive when the first postmaster dies.
>
> How test it:
>
> 1) Master node:
> modify postgresql.conf using:
>
> ~        archive_command = 'cp %p /mnt/server/archivedir/%f'
>
> ~        launch postgres and perform a backup as doc
>
> ~        http://developer.postgresql.org/docs/postgres/backup-online.html
>
> suggest to do
>
> launch the script:
>
> partial_wal_deliver.sh <PID> /mnt/server/partialdir <pg_xlog path>
>
> ~        this script will delivery each 10 seconds the "current" wal file,
> ~        and touch the "alive" file in order to notify the spare node that
> ~        the master node is up and running
>
>
> 2) Spare node:
> create a recovery.conf with the line:
>
> ~        restore_command = 'restore.sh /mnt/server/archivedir/%f %p
> /mnt/server/partialdir'
>
> ~        replace the content of data directory with the backup performed
> at point 1,
> ~        remove any file present in the pg_xlog directory ( leaving there
> the archive_status
> ~        directory ) and remove the postmaster.pid file ( this is
> necessary if you are running
> ~        the spare postgres on the same hw ).
>
> ~        launch the postmaster, the restore will continue till the "alive"
> file present in the
> ~        /mnt/server/partialdir directory is not updated for 60 seconds
> ( you can modify this
> ~        values inside the restore.sh script ).
>
> Be sure that restore.sh and all directories involved are accessible
>
> Let me know.
>
>
> This is a first step, of course, as Eric Kerin did, is better port these
> script
> in C and make it more robust.
>
> Postgres can help this process, as suggested by Tom creating a
> pg_current_wal()
> or even better having two new GUC parameters: archive_current_wal_command
> and
> archive_current_wal_delay.
>
> I problem I discover during the tests is that if you shut down the spare
> node
> and the restore_command is still waiting for a file then the postmaster
> will never
> exit  :-(
> ==========================================================================
>
> I hope that is clear.
>
>
>
> Regards
> Gaetano Mendola
>
>
>


--------------------------------------------------------------------------------


> #!/bin/bash
>
>
> SOURCE=$1
> TARGET=$2
> PARTIAL=$3
>
> SIZE_EXPECTED=16777216 #bytes     16 MB
> DIED_TIME=60                    #seconds
>
> function test_existence
> {
>    if [ -f ${SOURCE}   ]
>    then
>       COUNTER=0
>
>       #I have to check if the file is begin copied
>       #I assume that it will reach the right
>       #size in a few seconds
>
>       while [ $(stat -c '%s' ${SOURCE} ) -lt $SIZE_EXPECTED ]
>       do
>          sleep 1
>          let COUNTER+=1
>          if [ 20 -lt $COUNTER ]
>          then
>             exit 1    # BAILING OUT
>          fi
>       done
>
>       cp $SOURCE $TARGET
>       exit 0
>    fi
>    echo ${SOURCE}"> not found"
>
>    #if is looking for a history file and not exist
>    #I have suddenly exit
>    echo $SOURCE | grep history > /dev/null 2>&1 && exit 1
> }
>
>
> while [ 1 ]
> do
>
>   test_existence
>
>   #CHECK IF THE MASTER IS ALIVE
>   DELTA_TIME=$(( $( date +'%s' ) - $( stat -c '%Z' ${PARTIAL}/alive ) ))
>   if [ $DIED_TIME -lt $DELTA_TIME ]
>   then
>       echo "Master is dead..."
>       # Master is dead
>       CURRENT_WAL=$( basename $SOURCE )
>       echo "Partial: " ${PARTIAL}
>       echo "Current wal: " ${CURRENT_WAL}
>       echo "Target: " ${TARGET}
>       cp ${PARTIAL}/${CURRENT_WAL}.partial ${TARGET}  > /dev/null 2>&1 &&
> exit 0
>       exit 1
>   fi
>
>   sleep 1
>
> done
>


--------------------------------------------------------------------------------


> #!/bin/bash
>
> PID=$1
> PARTIAL=$2
> PGXLOG=$3
>
> function copy_last_wal
> {
>   FILE=$( ls -t1p $PGXLOG | grep -v / | head -1 )
>
>   echo "Last Wal> " $FILE
>
>   cp ${PGXLOG}/${FILE} ${PARTIAL}/${FILE}.tmp
>   mv ${PARTIAL}/${FILE}.tmp ${PARTIAL}/${FILE}.partial
>   find ${PARTIAL} -name *.partial | grep -v ${FILE} | xargs -i rm -fr {}
> }
>
>
> while [ 1 ]
> do
>   ps --pid $PID > /dev/null 2>&1
>   ALIVE=$?
>
>   if [ "${ALIVE}" == "1" ]
>   then
>       #The process is dead
>       echo "Process dead"
>       copy_last_wal
>       exit 1
>   fi
>
>   #The process still exist
>   touch ${PARTIAL}/alive
>   copy_last_wal
>
>   sleep 10
> done
>


--------------------------------------------------------------------------------


>
> ---------------------------(end of broadcast)---------------------------
> TIP 5: Have you checked our extensive FAQ?
>
>               http://www.postgresql.org/docs/faqs/FAQ.html
>


pgsql-admin by date:

Previous
From: Tom Lane
Date:
Subject: Re: About System Catalogs
Next
From: "Chris White (cjwhite)"
Date:
Subject: Upgrading to 7.4.5