Thread: Postgresql HA questions
I have a couple of questions regarding setting up and administering a high availability Postgresql "cluster". At the moment the setup I have consists of two servers set up with streaming replication using the named replication slots feature of Postgresql 9.4 (currently running 9.4.1). Let me know if more details are needed about the current setup So, on to my questions:
- Is there any way to do zero-downtime updates to Postgres? At first I was thinking I could just update the secondary, fail over to it, update the primary, and fail back, but then I realized that this would be much more complicated than my first thought, since I would have to either a) find a way to point all my apps that use the database to the secondary while updating, or b) re-ip the machines. Add to that the difficulty with the fail-back, and it seems like it would be a simpler solution to simply take down the primary server (and all my companies apps) for the (assumedly short) duration of the update.
With my application servers, I have a system set up using corosync and pacemaker that allows for seamless fail-over between the two machines, with the IP address and all services moving smoothly between the two at will. Ideally, I would have a similar setup with my database servers, so the applications never even know that there was a switch. Is this possible with Postgresql at all? Does it make a difference that at least one app has an "always on" connection to the DB Server?
- When using named replication slots, I know the primary server will hold on to as many WAL files as the secondary needs to be able to recover to a consistent state, which runs the risk of filling up the hard drive with WAL files if the secondary is off-line for an extended period of time. As such, I have a system set up to monitor the number of WAL files on both the primary and secondary machines. Watching that, I have on occasion seen the number hit as high as 300 some odd files on the secondary, although it always comes back down fairly quickly (on the day I saw the count up at 378, it was back down to 30 within 10 minutes). At the moment I have 13 WAL files, even though my wall_keep_segments parameter is set to 5. Interestingly (to me at least) I have not seen this same sort of variation on the primary - it generally stays right around 13 files. Is this sort of fluctuation normal, or should I be looking for some sort of an issue?
-What is the best way to monitor that the two servers are actually staying in sync? At the moment, I am looking at the value of the "active" field in the pg_replication_slots table. Is this sufficient to be sure replication is taking place properly, or should I do something like run a query on a frequently updated table and make sure the results are identical?
Thanks for any suggestions and information you can provide!
-----------------------------------------------
Israel Brewster
Systems Analyst II
Ravn Alaska
5245 Airport Industrial Rd
Fairbanks, AK 99709
(907) 450-7293
-----------------------------------------------
Attachment
Israel,
I can't answer all your questions, but we've just set up a HA pair with Hot Standby using Corosync/Pacemaker. However we haven't deployed this 'live' yet.
We originally found a presentation from The PostgreSQL Conference PostgreSQL High Availability with Corosync/Pacemaker, and then bought the book PostgreSQL 9.0 High Performance (a sample chapter is available as a PDF).
All working out well in testing at the moment.
Steve Pritchard
British Trust for Ornithology
On 23 September 2015 at 17:36, Israel Brewster <israel@ravnalaska.net> wrote:
<snip>With my application servers, I have a system set up using corosync and pacemaker that allows for seamless fail-over between the two machines, with the IP address and all services moving smoothly between the two at will. Ideally, I would have a similar setup with my database servers, so the applications never even know that there was a switch. Is this possible with Postgresql at all? Does it make a difference that at least one app has an "always on" connection to the DB Server?<snip>
Thanks. I'd taken some stabs at setting up Corosync/Pacemaker, but could never get the PostgreSQL portion to work properly. One difficulty is that I am using named replication slots, which don't appear to be supported. The other issue is that the system tended to try to start my secondary as primary, which of course doesn't work to bring up the old primary as secondary (at least, that's my understanding. Let me know if that's wrong).
I'll take another stab at it given the steps outlined in the presentation you posted, and given your success - I must just be doing something wrong here.
-----------------------------------------------
Israel Brewster
Systems Analyst II
Ravn Alaska
5245 Airport Industrial Rd
Fairbanks, AK 99709
(907) 450-7293
-----------------------------------------------
On Sep 25, 2015, at 1:46 AM, Steve Pritchard <steve.pritchard@bto.org> wrote:Israel,I can't answer all your questions, but we've just set up a HA pair with Hot Standby using Corosync/Pacemaker. However we haven't deployed this 'live' yet.We originally found a presentation from The PostgreSQL Conference PostgreSQL High Availability with Corosync/Pacemaker, and then bought the book PostgreSQL 9.0 High Performance (a sample chapter is available as a PDF).All working out well in testing at the moment.Steve PritchardBritish Trust for OrnithologyOn 23 September 2015 at 17:36, Israel Brewster <israel@ravnalaska.net> wrote:<snip>With my application servers, I have a system set up using corosync and pacemaker that allows for seamless fail-over between the two machines, with the IP address and all services moving smoothly between the two at will. Ideally, I would have a similar setup with my database servers, so the applications never even know that there was a switch. Is this possible with Postgresql at all? Does it make a difference that at least one app has an "always on" connection to the DB Server?<snip>