Re: 2-phase commit - Mailing list pgsql-hackers

From Bruce Momjian
Subject Re: 2-phase commit
Date
Msg-id 200309261720.h8QHKhq10420@candle.pha.pa.us
In response to Re: 2-phase commit  ("Zeugswetter Andreas SB SD" <ZeugswetterA@spardat.at>)
Responses Re: 2-phase commit  (Tom Lane <tgl@sss.pgh.pa.us>)
List pgsql-hackers
Zeugswetter Andreas SB SD wrote:
> 
> > > From our previous discussion of 2-phase commit, there was concern that
> > > the failure modes of 2-phase commit were not solvable.  However, I think
> > > multi-master replication is going to have similar non-solvable failure
> > > modes, yet people still want multi-master replication.
> > 
> > No.  The real problem with 2PC in my mind is that its failure modes
> > occur *after* you have promised commit to one or more parties.  In
> > multi-master, if you fail you know it before you have told the client
> > his data is committed.
> 
> Hmm ? The appl cannot take the first phase commit as its commit info. It 
> needs to wait for the second phase commit. The second phase is only finished
> when all coservers have reported back. 2PC is synchronous.
> 
> The problems with 2PC are when after second phase commit was sent to all
> servers and before all report back one of them becomes unreachable/down ...
> (did it receive and do the 2nd commit or not) Such a transaction must stay
> open until the coserver is reachable again or an administrator committed/aborted it. 
> 
> It is multi master replication that usually has an asynchronous mode for
> performance, and there the trouble starts.

Let me diagram this so we can see the issues.  Normal operation is:
        Master                  Slave
        ------                  -----
        commit ready-->
                                <--OK
        commit done--->
                                <--OK
        completed

One possible failure is:
        Master                  Slave
        ------                  -----
        commit ready-->
                                <--OK
        commit done--->
                                dies here
        stuck waiting

Another possible failure is:
        Master                  Slave
        ------                  -----
        commit ready-->
                                <--OK
        dies here
                                stuck waiting

Are these the issues?  Can't we just add GUC timeouts to cause the
commit to fail, and the slave to stop waiting?  I suppose a problem is:
        Master                  Slave
        ------                  -----
        commit ready-->
                                <--OK
        sleep
                                stuck waiting, times out
        commit done
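The race in that last diagram can be sketched as a small simulation. This is only an illustration under stated assumptions, not PostgreSQL code: the `Slave` class, the `run` function, and the `timeout` field (standing in for a GUC-style setting) are all hypothetical names, time is a plain number rather than a real clock, and the master is assumed to treat the transaction as committed once every slave has answered OK.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class Slave:
    timeout: float             # hypothetical GUC-style timeout, in seconds
    state: str = "idle"
    prepared_at: float = 0.0

    def on_commit_ready(self, now: float) -> str:
        # Phase 1: promise to commit and start waiting for the decision.
        self.state = "prepared"
        self.prepared_at = now
        return "OK"

    def tick(self, now: float) -> None:
        # Give up unilaterally once the timeout has expired.
        if self.state == "prepared" and now - self.prepared_at >= self.timeout:
            self.state = "aborted"

    def on_commit_done(self, now: float) -> str:
        # Phase 2: the decision arrives, possibly after the slave gave up.
        self.tick(now)
        if self.state == "prepared":
            self.state = "committed"
            return "OK"
        return "TOO LATE"


def run(master_delay: float, slave_timeout: float) -> tuple[str, str]:
    """Return (master outcome, slave outcome) for one transaction."""
    slave = Slave(timeout=slave_timeout)
    now = 0.0
    assert slave.on_commit_ready(now) == "OK"
    master_state = "committed"   # assumption: all OKs received means commit
    now += master_delay          # the master "sleeps" before phase 2
    slave.on_commit_done(now)
    return master_state, slave.state


print(run(master_delay=1.0, slave_timeout=5.0))
print(run(master_delay=10.0, slave_timeout=5.0))
```

With a short master delay both sides commit. With a delay longer than the timeout, the master commits while the slave has already aborted, which is the disagreement the diagram shows.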

Could we allow slaves to check whether the master's backend is still
alive, perhaps by asking the postmaster, similar to what we do with the
cancel signal?  That way, the slave would never time out, and would
always keep waiting as long as the master was alive.
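A minimal sketch of that idea, replacing the fixed timeout with a liveness check. Everything here is hypothetical: `wait_for_decision` and `master_alive` are illustrative names, the liveness probe is just a callable (how a slave would really ask the postmaster is left open), and what the slave does once it learns the master is gone is not decided here, so the sketch only reports "in doubt".

```python
from typing import Callable, Iterable, Optional


def wait_for_decision(
    messages: Iterable[Optional[str]],
    master_alive: Callable[[], bool],
) -> str:
    """Wait for phase 2, polling the master's liveness instead of a clock.

    Each item of `messages` stands for one polling interval: either the
    message received during that interval, or None if nothing arrived.
    """
    for msg in messages:
        if msg == "commit done":
            return "committed"
        # No decision yet: keep waiting only while the master is alive.
        if not master_alive():
            return "in doubt"
    return "still waiting"


# The master is slow but alive, so the slave waits and eventually commits.
print(wait_for_decision([None, None, "commit done"], lambda: True))

# The master dies during the second interval, so the slave stops waiting.
probe = iter([True, False])
print(wait_for_decision([None, None], lambda: next(probe)))
```

Unlike the fixed timeout, a slow master never causes the slave to give up; the slave stops waiting only when the probe reports that the master is gone.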

-- 
  Bruce Momjian                        |  http://candle.pha.pa.us
  pgman@candle.pha.pa.us               |  (610) 359-1001
  +  If your life is a hard drive,     |  13 Roberts Road
  +  Christ can be your backup.        |  Newtown Square, Pennsylvania 19073
 

