Re: adding another node to our pitr config - Mailing list pgsql-general

From Yaroslav Tykhiy
Subject Re: adding another node to our pitr config
Date
Msg-id 3CE45549-C144-4DBB-BEF5-C89B40178521@barnet.com.au
In response to adding another node to our pitr config  (Geoffrey <lists@serioustechnology.com>)
List pgsql-general
On 06/10/2009, at 11:51 PM, Geoffrey wrote:

> We are currently using WAL shipping to have a hot spare of our
> databases.  We want to add another node to this configuration.  The
> question is, what is the best way to go about this?
>
> Currently, our script checks to see if the WAL file already exists
> on the target server, if not, then we scp the file over.  This is a
> local machine, so the scp overhead is not considered to be an issue.
>
> So, the current approach is:
>
> ssh $1 "test ! -f $2/$4" && scp $3 $1:$2/$4
>
> So, should I simply duplicate that line for the second server and
> place it below this one, or should they be dependent upon each
> other?  That is:
>
>
> archive_command = 'archive.sh node1 /esc/master/pitr %p %f node2'
>
>
> ssh $1 "test ! -f $2/$4" && scp $3 $1:$2/$4 &&
> ssh $5 "test ! -f $2/$4" && scp $3 $5:$2/$4
>
> The second node will not be at the same location, thus the network
> reliability is less.
>
> Thanks for any insights.

I've been interested in a similar setup, too, although I haven't
implemented it yet.  I think there are at least three obvious
approaches to consider.  They are all basically solutions or
workarounds for the following issue with multiple spare nodes: if the
WAL copy to spare node 1 succeeds but the copy to spare node 2 fails,
you have to handle the partial success somehow.  Your suggested
archive_command will keep returning failure because the WAL segment
already exists on node 1.

1. A shared WAL spool on a fault-tolerant SAN, mounted via NFS or
similar by all nodes.  Then you can use a trivial `test && cp && mv'
archive_command on the master node and have all the spare nodes fetch
WAL files from the spool.  (The mv is there to make the copy atomic;
a sketch follows the list.)  An additional advantage is that you can
use the same WAL spool for periodic DB backups: you run a file-level
backup once in a while and rely on the spool to accumulate the
matching WALs for you.  A problem with this approach is that a truly
fault-tolerant shared spool can be non-trivial to implement, and the
spool itself becomes a single point of failure.

2. Destructive copy.  Just let your archive_command overwrite existing
WAL segments.  Then a failure with node 2 will result in a retry from
scratch.

3. Delegation of failure handling to archive_command.  Instead of
relying on the pg archiver process to retry archive_command when it
returns failure, run the copy to each spare node in a loop until it
succeeds.  Then you can be sure all the nodes have received the WAL
by the end of the script.  (A sketch appears below.)
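
For approach (1), a minimal sketch of the spool-side archive_command,
assuming the shared spool is mounted on the master at /mnt/walspool
(the path is just a placeholder):

   # /mnt/walspool is a placeholder for wherever the spool is mounted.
   # Copy under a temporary name first, then rename: the mv is atomic
   # within the spool, so the spares never see a half-written segment.
   archive_command = 'test ! -f /mnt/walspool/%f && cp %p /mnt/walspool/%f.tmp && mv /mnt/walspool/%f.tmp /mnt/walspool/%f'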

From a probabilistic point of view, (3) will be notably better than
(2) only if the probability of failure for each node is high: when
failures are rare, a retry from scratch is rare too, so the extra
copies cost little.
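
And for approach (3), a rough sketch of an archive.sh in the spirit of
the one quoted above, with the spare nodes and target directory
hard-coded for brevity; it keeps retrying each node until the segment
is in place, so a flaky link to node2 only delays archiving instead of
failing it:

   #!/bin/sh
   # Called from postgresql.conf as:  archive_command = 'archive.sh %p %f'
   WALPATH=$1
   WALFILE=$2
   DEST=/esc/master/pitr

   for NODE in node1 node2; do
       # Skip the node if it already holds the segment, otherwise scp
       # it over, and retry until one of the two succeeds.  (Approach
       # (2) would instead scp unconditionally, with no loop, and let
       # the archiver retry the whole command on failure.)
       until ssh $NODE "test -f $DEST/$WALFILE" || \
             scp $WALPATH $NODE:$DEST/$WALFILE
       do
           sleep 5
       done
   done
   exit 0

Bear in mind that while this loop is stuck the archiver is stuck with
it, so WAL segments will pile up on the master until the copy finally
goes through.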


