Thread: Warm standby failover mechanism

Warm standby failover mechanism

From
Thom Brown
Date:
Hi all,

We're looking at setting up a warm-standby server using log shipping and aren't too sure about how we should trigger failover.  Is there a commonly-used approach which is reliable enough to recommend?  Looking at the documentation, there doesn't seem to be any recommendation.  I preferrably don't want to use a witness server.

Also, what would you say is the best way to tell the failed primary server that it is no longer the primary server?

Thanks

Thom

Re: Warm standby failover mechanism

From
Simon Riggs
Date:
On Tue, 2009-02-24 at 16:55 +0000, Thom Brown wrote:

> We're looking at setting up a warm-standby server using log shipping
> and aren't too sure about how we should trigger failover.  Is there a
> commonly-used approach which is reliable enough to recommend?  Looking
> at the documentation, there doesn't seem to be any recommendation.  I
> preferrably don't want to use a witness server.
>
> Also, what would you say is the best way to tell the failed primary
> server that it is no longer the primary server?

http://www.postgresql.org/docs/8.3/static/pgstandby.html

--
 Simon Riggs           www.2ndQuadrant.com
 PostgreSQL Training, Services and Support


Re: Warm standby failover mechanism

From
Thom Brown
Date:
Thanks for the link Simon, but this doesn't recommend any method for triggering failover, or telling the primary that another server is now primary.

We've set up a primary server in archive mode to continuously archive to an NFS mount, and the standby server to continuously recovery from that directory (although I'm not sure that's actually working... I've probably overlooked something).  The problem we face is working out how to tell the standby server that it is the primary.  Yes this can be done with a trigger file in /tmp, but how would that automatically appear there?  And when the failed server actually restarted, or the Postgres service restarts, how do we tell it that it is no longer the primary?

Thanks

Thom

2009/2/24 Simon Riggs <simon@2ndquadrant.com>

On Tue, 2009-02-24 at 16:55 +0000, Thom Brown wrote:

> We're looking at setting up a warm-standby server using log shipping
> and aren't too sure about how we should trigger failover.  Is there a
> commonly-used approach which is reliable enough to recommend?  Looking
> at the documentation, there doesn't seem to be any recommendation.  I
> preferrably don't want to use a witness server.
>
> Also, what would you say is the best way to tell the failed primary
> server that it is no longer the primary server?

http://www.postgresql.org/docs/8.3/static/pgstandby.html

--
 Simon Riggs           www.2ndQuadrant.com
 PostgreSQL Training, Services and Support


Re: Warm standby failover mechanism

From
Karsten Hilbert
Date:
On Wed, Feb 25, 2009 at 09:26:26AM +0000, Thom Brown wrote:

> We've set up a primary server in archive mode to continuously archive to an
> NFS mount, and the standby server to continuously recovery from that
> directory (although I'm not sure that's actually working... I've probably
> overlooked something).  The problem we face is working out how to tell the
> standby server that it is the primary.  Yes this can be done with a trigger
> file in /tmp, but how would that automatically appear there?

Think backwards: How would it automatically *dis*appear from
there. Have the secondary create it and check for existance
at regular intervals. Have the primary delete it at slightly
shorter intervals.

When the secondary finds it again after the check interval
the primary didn't delete it. Wait some more for good
measure and/or try to ping the primary some other way...

Just my 2 cents,
Karsten
--
GPG key ID E4071346 @ wwwkeys.pgp.net
E167 67FD A291 2BEA 73BD  4537 78B9 A9F9 E407 1346

Re: Warm standby failover mechanism

From
Thom Brown
Date:
2009/2/25 Karsten Hilbert <Karsten.Hilbert@gmx.net>

Think backwards: How would it automatically *dis*appear from
there. Have the secondary create it and check for existance
at regular intervals. Have the primary delete it at slightly
shorter intervals.

When the secondary finds it again after the check interval
the primary didn't delete it. Wait some more for good
measure and/or try to ping the primary some other way...


Thanks for the suggestion.  I'm wondering what the chances of such a create/delete system would have of accidentally causing failover?  Is it possible to use the archive command parameter to firstly run the pg_standby command, and follow it by a touch /tmp/pg.triggerfile by using &&?

Re: Warm standby failover mechanism

From
Thom Brown
Date:
I'm still trying to work out how this file creation/deletion thing will work.  If I can tag a "&& touch /tmp/pg.trigger" command to the end of the recovery command, how often will that be called?  If I can't, I still need to ensure that it is created and deleted before the recovery command is called, otherwise it will see it before it is deleted and put itself online.

Has anyone got a practical example of what they've set up, or know what others have set up?

Re: Warm standby failover mechanism

From
"Joshua D. Drake"
Date:
On Wed, 2009-02-25 at 13:54 +0000, Thom Brown wrote:
> I'm still trying to work out how this file creation/deletion thing
> will work.  If I can tag a "&& touch /tmp/pg.trigger" command to the
> end of the recovery command, how often will that be called?  If I
> can't, I still need to ensure that it is created and deleted before
> the recovery command is called, otherwise it will see it before it is
> deleted and put itself online.
>
> Has anyone got a practical example of what they've set up, or know
> what others have set up?

You are doing this the hard way. Grab PITRTTools.

https://projects.commandprompt.com/public/pitrtools

Just pull down the stable branch:

svn co
https://projects.commandprompt.com/public/pitrtools/repo/branches/1.2

Sincerely,

Joshua D. Drake


--
PostgreSQL - XMPP: jdrake@jabber.postgresql.org
   Consulting, Development, Support, Training
   503-667-4564 - http://www.commandprompt.com/
   The PostgreSQL Company, serving since 1997


Re: Warm standby failover mechanism

From
Thom Brown
Date:
You are doing this the hard way. Grab PITRTTools.

https://projects.commandprompt.com/public/pitrtools


I can't really dispute a recommendation from JD.  I'll have to look into that.  It's a shame because we've spent ages trying to work out where we've been going wrong in this whole process (we still haven't got it picking up WALs from the archive).

Thanks!

Thom

Re: Warm standby failover mechanism

From
"Joshua D. Drake"
Date:
On Wed, 2009-02-25 at 16:33 +0000, Thom Brown wrote:
>         You are doing this the hard way. Grab PITRTTools.
>
>         https://projects.commandprompt.com/public/pitrtools
>
>
> I can't really dispute a recommendation from JD.  I'll have to look
> into that.  It's a shame because we've spent ages trying to work out
> where we've been going wrong in this whole process (we still haven't
> got it picking up WALs from the archive).

As a note, all PITRTools does is wrap around all the tools that you are
trying to make work. So you will still need pg_standby, rsync, ssh
etc...

I have updated the wiki to make it a bit more friendly.

https://projects.commandprompt.com/public/pitrtools/wiki

Sincerely,

Joshua D. Drake

--
PostgreSQL - XMPP: jdrake@jabber.postgresql.org
   Consulting, Development, Support, Training
   503-667-4564 - http://www.commandprompt.com/
   The PostgreSQL Company, serving since 1997


Re: Warm standby failover mechanism

From
Thom Brown
Date:
As a note, all PITRTools does is wrap around all the tools that you are
trying to make work. So you will still need pg_standby, rsync, ssh
etc...

I have updated the wiki to make it a bit more friendly.

https://projects.commandprompt.com/public/pitrtools/wiki

I've given PITRTools a try, but I have been unsuccessful in utilising it.  I've configured both ini files, everything owned by user postgres, and upon running "./cmd_archiver -C cmd_archiver.ini" I get:

NOTICE: check_config_func()
NOTICE: Performing standard archive
NOTICE: archive_func()
NOTICE: send_queue_func()
NOTICE: list_queue_func()
NOTICE: generate_slave_list_func()
NOTICE: Your slaves are: ['192.168.1.17']
Traceback (most recent call last):
  File "../pitr_tools/cmd_archiver", line 343, in <module>
    archive_func()
  File "../pitr_tools/cmd_archiver", line 254, in archive_func
    queue = send_queue_func()
  File "../pitr_tools/cmd_archiver", line 219, in send_queue_func
    for host in list_queue_func():
  File "../pitr_tools/cmd_archiver", line 202, in list_queue_func
    list_archives = os.listdir(queuedir)
OSError: [Errno 2] No such file or directory: '/var/lib/postgresql/archive/192.168.1.17'

The only time that IP address appears in the INI file is on the slaves line, so I'm not sure why it's trying to find a directory called that.

Re: Warm standby failover mechanism

From
"Joshua D. Drake"
Date:
On Wed, 2009-02-25 at 21:39 +0000, Thom Brown wrote:
>
>         As a note, all PITRTools does is wrap around all the tools
>         that you are
>         trying to make work. So you will still need pg_standby, rsync,
>         ssh
>         etc...
>
>         I have updated the wiki to make it a bit more friendly.
>
>         https://projects.commandprompt.com/public/pitrtools/wiki
>
>
>
> I've given PITRTools a try, but I have been unsuccessful in utilising
> it.  I've configured both ini files, everything owned by user
> postgres, and upon running "./cmd_archiver -C cmd_archiver.ini" I get:
>
> NOTICE: check_config_func()
> NOTICE: Performing standard archive
> NOTICE: archive_func()
> NOTICE: send_queue_func()
> NOTICE: list_queue_func()
> NOTICE: generate_slave_list_func()
> NOTICE: Your slaves are: ['192.168.1.17']
> Traceback (most recent call last):
>   File "../pitr_tools/cmd_archiver", line 343, in <module>
>     archive_func()
>   File "../pitr_tools/cmd_archiver", line 254, in archive_func
>     queue = send_queue_func()
>   File "../pitr_tools/cmd_archiver", line 219, in send_queue_func
>     for host in list_queue_func():
>   File "../pitr_tools/cmd_archiver", line 202, in list_queue_func
>     list_archives = os.listdir(queuedir)
> OSError: [Errno 2] No such file or directory:
> '/var/lib/postgresql/archive/192.168.1.17'
>
> The only time that IP address appears in the INI file is on the slaves
> line, so I'm not sure why it's trying to find a directory called that.

Looks like you didn't run cmd_archiver -C <config_file> -I

Joshua D. Drake


--
PostgreSQL - XMPP: jdrake@jabber.postgresql.org
   Consulting, Development, Support, Training
   503-667-4564 - http://www.commandprompt.com/
   The PostgreSQL Company, serving since 1997


Re: Warm standby failover mechanism

From
Thom Brown
Date:
Looks like you didn't run cmd_archiver -C <config_file> -I

Ahh, okay, that did something, which I think means it created a directory named after the slave IP in the archive directory. I didn't see any mention of that switch in the README file, unless it's mentioned elsewhere and I missed it.

Now when I run "./cmd_archiver -C cmd_archiver.ini" I get:

rsync: link_stat "/var/lib/postgresql/pitr_tools/None" failed: No such file or directory (2)
rsync error: some files could not be transferred (code 23) at main.c(1058) [sender=3.0.3]
rsync: link_stat "/var/lib/postgresql/pitr_tools/None" failed: No such file or directory (2)
rsync error: some files could not be transferred (code 23) at main.c(1058) [sender=3.0.3]
FATAL: Unabled to rsync_transfer or queue_transfer
CRITICAL 5888

Re: Warm standby failover mechanism

From
"Joshua D. Drake"
Date:
On Wed, 2009-02-25 at 22:06 +0000, Thom Brown wrote:
>
>         Looks like you didn't run cmd_archiver -C <config_file> -I
>
>
>
> Ahh, okay, that did something, which I think means it created a
> directory named after the slave IP in the archive directory.

Right that is the queue directory.

> Now when I run "./cmd_archiver -C cmd_archiver.ini" I get:
>
That command really shouldn't do anything but error. The whole point of
the archiver is to be placed in the archive_command option in the
postgresql.conf. E.g;

cmd_archive -C cmd_archive.ini -F %p

This really should be happening on the pitrtools list. Let's bounce over
there and resolve this.

Joshua D. Drake

>
--
PostgreSQL - XMPP: jdrake@jabber.postgresql.org
   Consulting, Development, Support, Training
   503-667-4564 - http://www.commandprompt.com/
   The PostgreSQL Company, serving since 1997