Re: Transactions involving multiple postgres foreign servers - Mailing list pgsql-hackers

From Michael Paquier
Subject Re: Transactions involving multiple postgres foreign servers
Msg-id CAB7nPqQrRpTR1RzCeee9LS3vAShcHMX4CGsGyrvF=Ldb4jpZ0w@mail.gmail.com
In response to Re: Transactions involving multiple postgres foreign servers  (Jim Nasby <Jim.Nasby@BlueTreble.com>)
List pgsql-hackers
On Sat, Jan 10, 2015 at 9:02 AM, Jim Nasby <Jim.Nasby@bluetreble.com> wrote:
> On 1/8/15, 12:00 PM, Kevin Grittner wrote:
>> The key point is that the distributed transaction data must be
>> flagged as needing to commit rather than roll back between the
>> prepare phase and the final commit.  If you try to avoid the
>> PREPARE, flagging, COMMIT PREPARED sequence by building the
>> flagging of the distributed transaction metadata into the COMMIT
>> process, you still have the problem of what to do on crash
>> recovery.  You really need to use 2PC to keep that clean, I think.
Yes, 2PC is needed as soon as two or more nodes perform write
operations within a transaction.
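
To make the PREPARE, flagging, COMMIT PREPARED sequence and its role in
crash recovery concrete, here is a minimal in-memory sketch. Everything in
it is an assumption made for illustration: the Participant class, the
decision_log set and the recover() function are hypothetical names, not
part of PostgreSQL or postgres_fdw, and the log is only assumed to be
durable. The method comments name the SQL commands each step stands for.

```python
# Sketch only: in-memory stand-ins for foreign servers, not a PostgreSQL API.
from dataclasses import dataclass, field


@dataclass
class Participant:
    """Hypothetical stand-in for one foreign server."""
    name: str
    prepared: set = field(default_factory=set)   # gids waiting for a decision
    committed: set = field(default_factory=set)  # gids that were committed

    def prepare(self, gid):
        # Stands for: PREPARE TRANSACTION 'gid'
        self.prepared.add(gid)

    def commit_prepared(self, gid):
        # Stands for: COMMIT PREPARED 'gid'
        self.prepared.discard(gid)
        self.committed.add(gid)

    def rollback_prepared(self, gid):
        # Stands for: ROLLBACK PREPARED 'gid'
        self.prepared.discard(gid)


def recover(decision_log, participants):
    """Resolve in-doubt transactions after a coordinator crash.

    decision_log holds the gids flagged as "must commit"; it is assumed
    to have been written durably before any COMMIT PREPARED was sent.
    """
    for node in participants:
        for gid in list(node.prepared):
            if gid in decision_log:
                node.commit_prepared(gid)    # decision was commit: finish it
            else:
                node.rollback_prepared(gid)  # never flagged: roll back


nodes = [Participant("fs1"), Participant("fs2")]
decision_log = set()

# tx1: prepared everywhere, flagged, then the coordinator "crashes"
# after committing on the first node only.
for node in nodes:
    node.prepare("tx1")
decision_log.add("tx1")
nodes[0].commit_prepared("tx1")

# tx2: prepared everywhere, but the crash happens before the flag is written.
for node in nodes:
    node.prepare("tx2")

# After restart, the flag alone decides the fate of each prepared transaction.
recover(decision_log, nodes)
```

The point of the flag is visible in recover(): without it, a restarted
coordinator could not tell whether tx1 has to be completed or tx2 has to be
rolled back.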

> If we had an independent transaction coordinator then I agree with you
> Kevin. I think Robert is proposing that if we are controlling one of the
> nodes that's participating as well as coordinating the overall transaction
> that we can take some shortcuts. AIUI a PREPARE means you are completely
> ready to commit. In essence you're just waiting to write and fsync the
> commit message. That is in fact the state that a coordinating PG node would
> be in by the time everyone else has done their prepare. So from that
> standpoint we're OK.
>
> Now, as soon as ANY of the nodes commit, our coordinating node MUST be able
> to commit as well! That would require it to have a real prepared transaction
> of its own created. However, as long as there is zero chance of any other
> prepared transactions committing before our local transaction, that step
> isn't actually needed. Our local transaction will either commit or abort,
> and that will determine what needs to happen on all other nodes.

It is a property of 2PC to ensure that a prepared transaction will
commit. Now, once it is confirmed on the coordinator that all the
remote nodes have successfully PREPAREd, the coordinator issues COMMIT
PREPARED to each node. What do you do if some nodes report ABORT
PREPARED while other nodes report COMMIT PREPARED? Do you abort the
transaction on the coordinator, commit it, or FATAL? This leaves the
cluster in an inconsistent state, meaning that some consistent cluster-wide
recovery point is needed as well (Postgres-XC and XL have introduced
the concept of barriers for such problems, work first done by
Pavan Deolasee).
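
As a rough illustration of the mixed-outcome case, the sketch below
classifies what the coordinator sees after sending COMMIT PREPARED to every
node. The Outcome enum, the classify() function and the node names are
hypothetical, and this is not how Postgres-XC or XL implement barriers; it
only shows why a mixed answer cannot be repaired by the coordinator alone.

```python
from enum import Enum


class Outcome(Enum):
    COMMITTED = "committed"      # node confirmed COMMIT PREPARED
    ABORTED = "aborted"          # node reports the prepared xact was rolled back
    UNREACHABLE = "unreachable"  # no answer, the xact may still be prepared


def classify(outcomes):
    """Classify the cluster state from a {node name: Outcome} mapping."""
    seen = set(outcomes.values())
    if seen == {Outcome.COMMITTED}:
        return "consistent-commit"
    if seen == {Outcome.ABORTED}:
        return "consistent-abort"
    if Outcome.COMMITTED in seen and Outcome.ABORTED in seen:
        # Some nodes made the changes durable and others discarded them:
        # neither committing nor aborting locally restores atomicity.
        return "inconsistent"
    # Only unreachable nodes are missing; retrying can still converge.
    return "in-doubt"


mixed = {"fs1": Outcome.COMMITTED, "fs2": Outcome.ABORTED}
pending = {"fs1": Outcome.COMMITTED, "fs2": Outcome.UNREACHABLE}
clean = {"fs1": Outcome.COMMITTED, "fs2": Outcome.COMMITTED}

print(classify(mixed))    # inconsistent
print(classify(pending))  # in-doubt
print(classify(clean))    # consistent-commit
```

The "in-doubt" case can be resolved by retrying COMMIT PREPARED later; the
"inconsistent" case is the one that calls for a cluster-wide recovery point.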
-- 
Michael


