Re: Synchronous Standalone Master Redoux - Mailing list pgsql-hackers
From | Aidan Van Dyk |
---|---|
Subject | Re: Synchronous Standalone Master Redoux |
Date | |
Msg-id | CAC_2qU9rDFkUMO6ChANQsnsKQN9N0v5mhUru0r6BqowiNPaO=A@mail.gmail.com |
In response to | Re: Synchronous Standalone Master Redoux (Shaun Thomas <sthomas@optionshouse.com>) |
Responses | Re: Synchronous Standalone Master Redoux |
List | pgsql-hackers |
On Thu, Jul 12, 2012 at 9:21 AM, Shaun Thomas <sthomas@optionshouse.com> wrote:

> So far as transaction durability is concerned... we have a continuous
> background rsync over dark fiber for archived transaction logs, DRBD for
> block-level sync, filesystem snapshots for our backups, a redundant async DR
> cluster, an offsite backup location, and a tape archival service stretching
> back for seven years. And none of that will cause the master to stop
> processing transactions unless the master itself dies and triggers a
> failover.

Right, so suppose the dark fiber between New Orleans and Seattle (pick two
places for your datacenters) happens to be the first thing to fail in your
NO data center. Disconnect the sync-ness, and continue. Not a problem,
unless it happens to be Aug 29, 2005.

You have lost data. Maybe only a bit. Maybe it wasn't even important. But
that's not for PostgreSQL to decide.

But because your PG on DRBD "continued" when it couldn't replicate to
Seattle, it told its clients the data was durable, just minutes before the
whole DC was under water.

OK, so a wise admin team would have removed the NO DC from its primary role
days before that hit.

Change the NO to NYC and the date to Sept 11, 2001.

OK, so maybe we can concede that these types of major catastrophes are more
devastating to us than losing some data.

Now your primary server was in AWS US East last week. Its sync slave was in
the affected AZ, but your PG primary continues on, until, since it was an
EC2 instance, it disappears. Now where is your data?

Or the fire marshal orders the data center (or whole building) EPO, and the
connection to your backup goes down minutes before your servers or other
network peers.

> Using PG sync in its current incarnation would introduce an extra failure
> scenario that wasn't there before. I'm pretty sure we're not the only ones
> avoiding it for exactly that reason. Our queue discards messages it can't
> fulfil within ten seconds and then throws an error for each one. We need to
> decouple the secondary as quickly as possible if it becomes unresponsive,
> and there's really no way to do that without something in the database, one
> way or another.

It introduces an "extra failure" because it has introduced an "extra data
durability guarantee".

Sure, many people don't *really* want that data durability guarantee, even
though they would like the "maybe guaranteed" version of it.

But that fine line is actually a difficult (impossible?) one to define if
you don't know, at the moment of decision, what the next few moments
will/could become.

a.

--
Aidan Van Dyk                                             Create like a god,
aidan@highrise.ca                                       command like a king,
http://www.highrise.ca/                                   work like a slave.
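
The trade-off argued above can be sketched as a toy model. This is only an illustration under stated assumptions, not PostgreSQL's implementation: the `Primary` class, the `degrade_on_timeout` switch, and `lost_after_site_failure` are hypothetical names invented here, and a refused commit stands in for what would really be a commit blocked waiting on the standby.

```python
from dataclasses import dataclass, field


@dataclass
class Primary:
    """Toy primary server. `degrade_on_timeout` is a hypothetical policy
    switch for this sketch, not a real PostgreSQL setting."""
    degrade_on_timeout: bool
    standby_reachable: bool = True
    local_log: list = field(default_factory=list)
    standby_log: list = field(default_factory=list)
    acked: list = field(default_factory=list)  # commits reported durable to clients

    def commit(self, txn: str) -> bool:
        """Return True if the client is told the transaction is durable."""
        self.local_log.append(txn)
        if self.standby_reachable:
            # Normal synchronous path: the standby has the data before the ack.
            self.standby_log.append(txn)
            self.acked.append(txn)
            return True
        if self.degrade_on_timeout:
            # "Disconnect the sync-ness, and continue": ack with one copy only.
            self.acked.append(txn)
            return True
        # Strict policy: no ack without the standby (modelled here as a refusal).
        return False


def lost_after_site_failure(primary: Primary) -> list:
    """Transactions clients were told were durable but that exist only at
    the primary site, i.e. what is lost if that whole site disappears."""
    return [t for t in primary.acked if t not in primary.standby_log]


if __name__ == "__main__":
    for degrade in (False, True):
        p = Primary(degrade_on_timeout=degrade)
        p.commit("t1")
        p.standby_reachable = False  # the link to the standby fails first
        acked = p.commit("t2")
        print(degrade, acked, lost_after_site_failure(p))
```

Under the strict policy the link failure becomes an availability problem (the "extra failure"), but nothing acknowledged is lost. Under the degrading policy the primary stays up, and the transaction acknowledged after the link failed is exactly the one that vanishes with the site.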