9.0 replication -- multiple hot_standby servers - Mailing list pgsql-general

From Dean Gibson (DB Administrator)
Subject 9.0 replication -- multiple hot_standby servers
Date
Msg-id 4CCB056A.8010006@ultimeth.com
Whole thread Raw
Responses Re: 9.0 replication -- multiple hot_standby servers
List pgsql-general
Oops; previously sent from the wrong eMail address, so I don't know if this actually got sent:

Two days ago I upgraded five DB boxes (for load balancing) from 8.3.0 to 9.0.1 in order to use replication. The replication configuration went reasonably well, and now all the four "hot_standby" servers are (streaming) replicating just fine from the primary DB server.  If the primary fails and I "touch" the trigger file on one of the standby boxes, that goes into primary mode just as it should.  Of course, I have to externally redirect updates to the new server.

My question is, how do I configure the other three (still) hot_standby boxes to now use the new primary?  Clearly I can change the "recovery.conf" file on each standby box, but that seems like an unnecessary nuisance.

What I would like to do is put multiple "primary_conninfo" lines into the "recovery.conf" file, and have each standby server dynamically pick the one that is the primary.  Further, when the primary goes down, the standby should re-choose the new primary using the multiple "primary_conninfo" lines.

Such a feature (if it does not already exist) would really be useful !!!

Question:  While the documentation described how to switch a standby server from hot_standby to primary, I didn't see how to switch it back to hot_standby.  Is the following the best (or only) method ???
  1. Optionally, bring the previous primary back up into hot_standby mode.
  2. STOP the current primary server.
  3. Using the trigger file on another hot-standby box, bring it into primary mode.
  4. Externally, start redirecting updates to the new primary server.
  5. On the stopped server, delete the trigger file, and rename the "recovery.done" file back to "recovery.conf".
  6. Restart the stopped server;  it should come back up into hot_standby mode.

Ideally, this could be done seamlessly without the (small) possibility of lost updates, but I don't see how to do that.  Fortunately, it's not a big issue in my case.  Here's what I think would be ideal from an update integrity standpoint:

  1. If a hot_standby box gets a non-read-only SQL transaction, if checks to see if it is still talking to a primary server:
    • If it is, it does whatever is done now (I presume it reports an error).
    • If not, it "touches" the trigger file and switches into primary mode.  If there are other hot_standby servers, ideally the new primary signals them that it is the new primary (in order to keep them from accessing the old primary, which may still be visible to them).
  2. This way, an external application feeding SQL update-type transactions, upon finding that the old primary is down, just switches to any hot_standby server as its new target and retries the update;  this will automatically cause the desired hot_standby server to switch to primary.
  3. Since we don't know what the issue was with the the previous primary, it needs to be recovered manually (and isolated from other hot_standby servers in the meantime).

Sincerely, Dean

-- 
Mail to my list address MUST be sent via the mailing list.
All other mail to my list address will bounce.

pgsql-general by date:

Previous
From: John R Pierce
Date:
Subject: Re: 9.0 replication -- multiple hot_standby servers
Next
From: Dave Page
Date:
Subject: [ANNOUNCE] PGDay Europe 2010 schedule announced