Re: Re: [COMMITTERS] pgsql: Efficient transaction-controlled synchronous replication. - Mailing list pgsql-hackers

From MARK CALLAGHAN
Subject Re: Re: [COMMITTERS] pgsql: Efficient transaction-controlled synchronous replication.
Date
Msg-id AANLkTi=v5n4ODwfUU+Df_BKpk49r_U=FMHtOnYUNPFa5@mail.gmail.com
In response to Re: Re: [COMMITTERS] pgsql: Efficient transaction-controlled synchronous replication.  (Markus Wanner <markus@bluegap.ch>)
List pgsql-hackers
On Fri, Mar 18, 2011 at 2:37 PM, Markus Wanner <markus@bluegap.ch> wrote:
> Hi,
>
> On 03/18/2011 02:40 PM, Kevin Grittner wrote:
>> Then the only thing you would consider sync replication, as far as I
>> can see, is two phase commit
>
> I think waiting for the ACK before actually making the changes from the
> transaction visible (COMMIT) would suffice for disallowing such an
> inconsistency to manifest.  But obviously, MySQL decided it's not worth
> doing that, as it's such a rare event and a short period of time that
> may show inconsistencies...

There are fewer options for implementing this in MySQL because
replication requires a binlog on the master and that requires the
internal use of XA to keep the binlog and InnoDB in sync as they are
separate resource managers. In theory, this could be changed so that
commit is forced only for the binlog, and after a crash any missing
transactions are copied from the binlog to InnoDB, but I don't think
this will ever change.

By "fewer options" I mean that commit in MySQL with InnoDB and the
binlog requires:
1) prepare to InnoDB (force transaction log to disk for changes from
this transaction)
2) write binlog events from this transaction to the binlog
3) write XID event to the binlog (at this point transaction commit is
official, will survive a crash)
4) force binlog to disk
5) release row locks held by transaction in InnoDB
6) write commit record to InnoDB transaction log
7) force write of commit record to disk
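
To make the ordering above concrete, here is a rough Python sketch of what
crash recovery would conclude if the server died after a given number of
steps. It is only a toy model, not MySQL code: the names (COMMIT_STEPS,
CrashState, state_after, recover, outcomes) are made up for illustration,
and it assumes that anything written in step 3 survives the crash (for
example a process crash rather than a power loss) and that recovery commits
a prepared InnoDB transaction only when its XID event is found in the
binlog.

```python
from dataclasses import dataclass

# The seven commit steps from the list above, in order.
COMMIT_STEPS = [
    "prepare in InnoDB (force transaction log)",
    "write transaction events to binlog",
    "write XID event to binlog",
    "force binlog to disk",
    "release InnoDB row locks",
    "write commit record to InnoDB log",
    "force commit record to disk",
]


@dataclass
class CrashState:
    """What recovery can observe after a crash (hypothetical model)."""
    prepared_in_innodb: bool
    xid_in_binlog: bool
    commit_record_durable: bool


def state_after(steps_done: int) -> CrashState:
    """State left behind if the server crashes after `steps_done` steps."""
    return CrashState(
        prepared_in_innodb=steps_done >= 1,
        # Assumption: the XID event written in step 3 survives the crash.
        xid_in_binlog=steps_done >= 3,
        commit_record_durable=steps_done >= 7,
    )


def recover(state: CrashState) -> str:
    """Decide the fate of the transaction during crash recovery."""
    if state.commit_record_durable:
        return "committed"
    if state.prepared_in_innodb and state.xid_in_binlog:
        # Prepared in InnoDB and XID found in the binlog: finish the commit.
        return "committed"
    return "rolled back"


def outcomes() -> list:
    """Recovery outcome for a crash after 0, 1, ..., 7 completed steps."""
    return [recover(state_after(n)) for n in range(len(COMMIT_STEPS) + 1)]


if __name__ == "__main__":
    for n, result in enumerate(outcomes()):
        print(f"crash after {n} step(s): {result}")
```

In this model the outcome flips from "rolled back" to "committed" once step
3 has completed, which is the sense in which the commit becomes official at
that point.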

Group commit is done for the fsyncs from steps 1 and 7. It is not done
for the fsync done in step 4.
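
As a back-of-the-envelope illustration of why that matters, the sketch
below counts fsyncs for a batch of concurrent transactions. It assumes the
best case for group commit, where every transaction in the batch shares a
single fsync in step 1 and a single fsync in step 7; real batching depends
on timing and will usually be less perfect. The function names are
hypothetical.

```python
def fsyncs_for_batch(n_transactions: int) -> dict:
    """Fsync counts per step for a batch, assuming ideal group commit."""
    grouped = 1 if n_transactions > 0 else 0
    return {
        "step 1 (InnoDB prepare, grouped)": grouped,
        "step 4 (binlog, not grouped)": n_transactions,
        "step 7 (InnoDB commit record, grouped)": grouped,
    }


def total_fsyncs(n_transactions: int) -> int:
    """Total fsyncs for the batch with group commit on steps 1 and 7."""
    return sum(fsyncs_for_batch(n_transactions).values())


def total_fsyncs_without_grouping(n_transactions: int) -> int:
    """Total fsyncs if every transaction does its own three fsyncs."""
    return 3 * n_transactions


if __name__ == "__main__":
    n = 10
    print(total_fsyncs(n), "fsyncs with group commit on steps 1 and 7")
    print(total_fsyncs_without_grouping(n), "fsyncs with no grouping")
```

Under these assumptions the binlog fsync in step 4 dominates: the total
still grows by one fsync per transaction no matter how well steps 1 and 7
are batched.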

Regardless, the processing above is complicated even without
semi-sync. AFAIK, the semi-sync code runs after step 7, but I have not
looked at the official version of the semi-sync code in MySQL and my
memory of the work we did at Google is vague.

It is great if Postgres doesn't have this issue. That wasn't clear to
me from lurking on this list. I hope your docs highlight this behavior,
since not having the issue is a big deal.

--
Mark Callaghan
mdcallag@gmail.com

