Re: 2-phase commit - Mailing list pgsql-hackers
From | Andrew Sullivan |
---|---|
Subject | Re: 2-phase commit |
Date | |
Msg-id | 20030926194018.GB18244@libertyrms.info |
In response to | Re: 2-phase commit (Tom Lane <tgl@sss.pgh.pa.us>) |
Responses | Re: 2-phase commit |
List | pgsql-hackers |
On Fri, Sep 26, 2003 at 01:34:28PM -0400, Tom Lane wrote:
> > Example:
> >     Master          Slave
>       ------          -----
>       commit ready-->
>                       <--OK
>       commit done->XX
> maybe he didn't.  Both sides are forced to keep information about the
> open transaction indefinitely.  Timing out on either side could yield
> the wrong result.

If I understand the complaints, I think there are two big issues.

The first problem is the restart/rejoin problem.  When a 2PC member
goes away, it is supposed to come back with all its former locks and
everything in place, so that it can know what to do.  This is also
extremely tricky, but I think the answer is sort of easy.  A member
which re-joins without crashing (that is, it has open transactions,
&c.) just has to complete its transactions with the other member(s).
If other members have processed new transactions since the member
left, the member is kicked out.  It's not allowed to join without
being re-initialised.  A member which crashes is just a special case
of this.

This is not elegant, not nice, &c.  But I don't think anyone can
really guarantee that a crashed member will start up correctly (it
crashed, after all; maybe there's a bug).  So this is the safest
approach, and I don't think it's a big deal.  It's not cheap, of
course, and there may be problems arising from the conditions I
describe below.  But I think they can be handled intelligently (see
the section on "compromises", below).

The second, stickier problem is just as Tom describes.  When the
master has sent "commit done" and that message doesn't make it to the
other host(s), you might have to wait forever.  Of course, that's not
acceptable.  But I can think of some options for handling this.  Note
that these may not guarantee no loss of data.  That's not a compromise
one is usually willing to make; but just because I don't want to
accept that compromise doesn't mean it is unacceptable to everyone.

Some possible compromises
=========================

1. One machine always wins.
One could decide to pick one machine that, in case of some sort of
failure, always wins.  You need some sort of heartbeat system which
checks for the other member(s) of the cluster.  In the event of
failure, whatever is on the "winner" machine is deemed to be correct,
and everyone else has to lose.  If the point of your 2PC is to provide
synchronous access to high loads of read-only clients, this would
probably be a good solution, since only one machine would ever see
data changes.

2. Quorum rule.

One could decide on a quorum of machines, and the group which has
quorum wins.  (Naturally, this has to be an absolute majority.)  The
quorum can continue to process queries, and the folks who left the
room have to re-sync to join.

3. Fail to read-only status and let the DBA sort it out.

4. Mark the contentious rows as "bad" and let the DBA sort it out.

This option is not dissimilar to what Access/SQL Server disconnected
multi-master replication does.  It's not elegant, but it might be a
good answer for the cases where 2PC gets used.

Note that none of these can guarantee that some apparently committed
data will not later be lost.  To real database hounds, that will sound
like apostasy, but I suspect it is the sort of trade-off that real
products make all the time.  You have to have a way of collecting the
"yeah, we told you it was committed, but we lied" data and being able
to track it; and that has to be enough.  The real security-of-data
work is going to have to be done by ultra-reliable hardware, good
maintenance practices, &c.  Then when losses are down in the .001%
range from this sort of mistake, no one will care.

This is not, by the way, the fully-formed set of suggestions I said
I'd deliver when I started the thread; but since it came up again
today, I thought I'd respond with what I had so far.

A

-- 
----
Andrew Sullivan                         204-4141 Yonge Street
Liberty RMS                           Toronto, Ontario Canada
<andrew@libertyrms.info>                              M2P 2A8
                                         +1 416 646 3304 x110
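A minimal sketch of the quorum rule from compromise 2, assuming a fixed
cluster size that every member knows in advance.  The names
`has_quorum` and `partition_action` are hypothetical and are not part
of any PostgreSQL interface.  It illustrates why the majority has to be
absolute: on an even split, neither half may keep processing.

```python
def has_quorum(reachable: int, cluster_size: int) -> bool:
    """Return True if `reachable` members (counting this one) form an
    absolute majority of a cluster with `cluster_size` members."""
    if cluster_size <= 0:
        raise ValueError("cluster_size must be positive")
    if not 0 <= reachable <= cluster_size:
        raise ValueError("reachable must be between 0 and cluster_size")
    # Strictly more than half; an exact half is NOT a quorum.
    return 2 * reachable > cluster_size


def partition_action(reachable: int, cluster_size: int) -> str:
    """Hypothetical policy: the side with quorum continues, and any
    other side must re-sync before it may rejoin."""
    return "continue" if has_quorum(reachable, cluster_size) else "resync"
```

With three members, a group of two continues and the lone member
re-syncs; with four members split two and two, both sides must stop,
which is the price of never letting two groups both believe they won.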