Re: Standalone synchronous master - Mailing list pgsql-hackers
From | Jeff Janes |
---|---|
Subject | Re: Standalone synchronous master |
Date | |
Msg-id | CAMkU=1yqF6KY9B2xCO9uLtG8dZAxn2y=oud7UsyCvQ+ia1_bQQ@mail.gmail.com |
In response to | Re: Standalone synchronous master (Josh Berkus <josh@agliodbs.com>) |
Responses | Re: Standalone synchronous master |
List | pgsql-hackers |
On Wed, Jan 8, 2014 at 3:00 PM, Josh Berkus <josh@agliodbs.com> wrote:
> On 01/08/2014 01:49 PM, Tom Lane wrote:
>> Josh Berkus <josh@agliodbs.com> writes:
>>> If we really want auto-degrading sync rep, then we'd (at a minimum) need
>>> a way to determine *from the replica* whether or not it was in degraded
>>> mode when the master died. What good do messages to the master log do
>>> you if the master no longer exists?
>>
>> How would it be possible for a replica to know whether the master had
>> committed more transactions while communication was lost, if the master
>> dies without ever restoring communication? It sounds like pie in the
>> sky from here ...
>
> Oh, right. Because the main reason for a sync replica degrading is that
> it's down. In which case it isn't going to record anything. This would
> still be useful for sync rep candidates, though, and I'll document why
> below. But first, lemme demolish the case for auto-degrade.
>
> So here's the case that we can't possibly solve for auto-degrade.
> Anyone who wants auto-degrade needs to come up with a solution for this
> case as a first requirement:
It seems like the only deterministically useful thing to do is to send a NOTICE to the *client* that the commit has succeeded, but in degraded mode, so keep your receipts and have your lawyer's number handy. Whether anyone is willing to add code to the client to process that message is doubtful, as is whether the client will ever even receive it if we are in the middle of a major disruption.
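Just to make that concrete: if such a NOTICE ever existed, the client-side handling could be as dumb as the sketch below. psycopg2 is used only because it already collects server notices in `conn.notices`; the connection string, the table, and the "degraded" message text are entirely made up, since no such NOTICE exists today.

```python
# Hypothetical sketch: psycopg2 exposes server NOTICEs in conn.notices,
# so a client *could* watch for a (made-up) "commit acknowledged in
# degraded mode" message after each commit.  No such NOTICE exists today.
import psycopg2

conn = psycopg2.connect("dbname=app host=master.example.com")  # assumed DSN
cur = conn.cursor()

cur.execute("INSERT INTO receipts (body) VALUES (%s)", ("order 42",))
conn.commit()

# conn.notices is a plain list of notice strings accumulated on this
# connection; scan it for the hypothetical degraded-mode warning.
for notice in conn.notices:
    if "degraded" in notice.lower():
        print("commit was only locally durable:", notice.strip())
del conn.notices[:]   # clear the list so the next commit starts fresh
```

Even then, the application still has to decide what "keep your receipts" actually means for it, which is exactly the part nobody wants to write.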
But I think there is a good probabilistic justification for an auto-degrade mode. (And really, what else is there? There are never any real guarantees of anything. Maybe none of your replicas ever come back up. Maybe none of your customers do, either.)
> 1. A data center network/power event starts.
> 2. The sync replica goes down.
> 3. A short time later, the master goes down.
> 4. Data center power is restored.
> 5. The master is fried and is a permanent loss. The replica is ok, though.
>
> Question: how does the DBA know whether data has been lost or not?
What if he had a way of knowing that some data *has* been lost? What can he do about it? What is the value in knowing it was lost after the fact, but without the ability to do anything about it?
But let's say that instead of a permanent loss, the master can be brought back up in a few days after replacing a few components, or in a few weeks after sending the drives out to clean-room data recovery specialists. Writing has already failed over to the replica, because you couldn't wait that long to bring things back up.
Once you get your old master back, you can see whether transactions have been lost. If they have, you can dump the tables out to a human-readable format, use PITR to restore a copy of the replica to the point just before the failover (although I'm not really sure exactly how to identify that point) and dump that out too, then use 'diff' tools to figure out which changes to the database were lost. From there you consult with the application specialists to figure out what the application was doing that led to those changes (if that is not obvious), with business operations people to figure out how to apply the analogous changes on top of the current database, and with the customer service VP or someone to figure out how to retroactively fix transactions that were done after the failover but would have been done differently had the lost transactions not been lost. Or, instead of all that, you could look at the recovered data and learn that in fact nothing had been lost, so nothing further needs to be done.
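A rough sketch of the 'diff' step, for whatever it is worth: pull one table from the recovered old master and from the PITR'd copy of the replica and report rows that exist on one side but not the other. The connection strings, table name, and key column here are illustrative only, not anything from the thread.

```python
# Compare one table between the recovered old master and the PITR'd copy
# of the replica, reporting rows present on only one side.
import psycopg2

def fetch_ids(dsn):
    conn = psycopg2.connect(dsn)
    try:
        cur = conn.cursor()
        cur.execute("SELECT id FROM orders")   # hypothetical table/key
        return {row[0] for row in cur.fetchall()}
    finally:
        conn.close()

old_master = fetch_ids("host=old-master dbname=app")  # recovered hardware
pitr_copy  = fetch_ids("host=pitr-copy dbname=app")   # restored to pre-failover point

print("rows only on the old master (candidate lost commits):",
      sorted(old_master - pitr_copy))
print("rows only on the PITR copy:", sorted(pitr_copy - old_master))
```

Of course that only catches rows missing outright on one side; updates to existing rows need a column-by-column comparison, which is where it stops being a quick check and turns into the project described above.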
If you were running in async replication mode on a busy server, there is a virtual certainty that some transactions have been lost. If you were running in sync mode with the possibility of auto-degrade, it is far from certain: it depends on how long the power event lasted compared to how long you had the timeout set to.
Or rather than a data-center-wide power spike, what if your master just "done fell over" with no drama to the rest of the neighborhood? Inspection after the failover to the replica shows that the RAID controller card failed. There is no reason to think that a RAID controller, in the process of failing, would have caused the replication to kick into degraded mode. You know from the surviving logs that the master spent 60 seconds total in degraded mode over the last 3 months, so there is a 99.999% chance no confirmed transactions were lost. To be conservative, let's drop it to 99.99%, because maybe some unknown mechanism did allow a failing RAID controller to blip the network card without leaving any evidence behind. That's a lot better than the chances of lost transactions while in async replication mode, which could be 99.9% in the other direction.
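For what it's worth, the arithmetic behind that 99.999% (using the 60 seconds and roughly 3 months from the example above, and assuming the crash moment is uncorrelated with the degraded windows) is just:

```python
# Back-of-the-envelope version of the 99.999% figure: if the master spent
# 60 seconds total in degraded mode over ~3 months, and the RAID failure
# is assumed to be uncorrelated with those windows, the chance the crash
# landed inside one of them is the fraction of time spent degraded.
degraded_seconds = 60.0
observed_seconds = 90 * 24 * 3600          # ~3 months

p_degraded_at_crash = degraded_seconds / observed_seconds
print("P(crash during degraded window) = {:.4%}".format(p_degraded_at_crash))
# -> about 0.0008%, i.e. a > 99.999% chance no confirmed transaction was lost
```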
Cheers,
Jeff