Synchronous Standalone Master Redoux - Mailing list pgsql-hackers
From:      Shaun Thomas
Subject:   Synchronous Standalone Master Redoux
Date:
Msg-id:    4FFB3F49.4050108@optionshouse.com
Responses: Re: Synchronous Standalone Master Redoux
           Re: Synchronous Standalone Master Redoux
List:      pgsql-hackers
Hey everyone,

Upon doing some usability tests with PostgreSQL 9.1 recently, I ran across this discussion:

http://archives.postgresql.org/pgsql-hackers/2011-12/msg01224.php

After reading the entire thing, I found it odd that the overriding pushback was that nobody could think of a use case. The argument was: if you don't care whether the slave dies, why not just use asynchronous replication?

I'd like to introduce all of you to DRBD. DRBD, for those who aren't familiar, is distributed (network) block-level replication. Right now, this is what we're using, and will use in the future, to ensure a stable synchronous PostgreSQL copy on our backup node. I was excited to read about synchronous replication, because with it came the possibility of having two readable nodes with the servers we already have. You can't do that with DRBD; secondary nodes can't even mount the device.

So here's your use case:

1. Slave wants to be synchronous with master. Master wants replication on at least one slave. They have this, and are happy.
2. For whatever reason, the slave crashes or becomes unavailable.
3. Master notices no more slaves are available, and operates in standalone mode, accumulating WAL files until a suitable slave appears.
4. Slave finishes rebooting/rebuilding/upgrading/whatever, and re-subscribes to the feed.
5. Slave stays in degraded sync (asynchronous) mode until it is caught up, and then switches to synchronous.

This makes both master and slave happy, because the *intent* of synchronous replication is fulfilled. PostgreSQL's implementation instead means the master will block until someone or something notices and tells it to stop waiting, or the slave comes back. For pretty much any high-availability environment, this is not viable. Based on that alone, I can't imagine a scenario where synchronous replication would be considered beneficial.

The current setup doubles our unplanned-outage scenarios in such a way that I'd never use it in a production environment. Right now, we only care if the master server dies. With sync rep, we'd have to watch both servers like a hawk and be ready to tell the master to disable sync rep, lest our 10k TPS system come to an absolute halt because the slave died. (A rough sketch of the monitoring and manual downgrade I mean is at the end of this message.)

With DRBD, when a slave node goes offline, the master operates in standalone mode until the secondary reappears, after which it re-synchronizes the missing data and then resumes operating synchronously. Just because the data is temporarily out of sync does *not* mean we want asynchronous replication. I think you'd be hard pressed to find many users taking advantage of DRBD's async mode. Just because data is temporarily catching up doesn't mean it will remain in that state.

I would *love* to have the functionality discussed in that patch. If I can make a case for it, I might even be able to convince my company to sponsor its addition, provided someone has time to integrate it. Right now, we're using DRBD so we can have a very short outage window while the offline node gets promoted, and it works, but that means a basically idle server at all times. I'd gladly accept a 10-20% performance hit for sync rep if it meant that other server could reliably act as a read slave. That's currently impossible, because async replication is too slow, and sync is too fragile for the reasons stated above.

Am I totally off-base here? I was shocked when I actually read the documentation on how sync rep worked and saw that no servers would function properly until at least two were online.
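For anyone who hasn't dug into the 9.1 knobs: once synchronous_standby_names is set on the master and synchronous_commit is on, every commit waits for a listed standby to confirm, and if no such standby is connected, commits simply hang. As far as I can tell, the only ways out are clearing synchronous_standby_names and reloading, or downgrading commits to local. A rough sketch of the monitoring and per-session downgrade side (the standby is identified by whatever application_name it connects with):

    -- On the master: which standbys are attached, and is any of them synchronous?
    SELECT application_name, state, sync_state
      FROM pg_stat_replication;

    -- Emergency per-session downgrade while the standby is gone:
    -- commits stop waiting for standby confirmation.
    SET synchronous_commit = local;

The point is that every one of those steps needs a human or an external watchdog; the master will never make that decision on its own.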
--
Shaun Thomas
OptionsHouse | 141 W. Jackson Blvd. | Suite 500 | Chicago IL, 60604
312-444-8534
sthomas@optionshouse.com

______________________________________________

See http://www.peak6.com/email_disclaimer/ for terms and conditions related to this email