From: Fujii Masao <masao.fujii@oss.nttdata.com>
> Originally start(), commit() and rollback() are supported as FDW interfaces.
> As far as I and Sawada-san discussed this upthread, to support MySQL,
> another type of start() would be necessary to issue "XA START id" command.
> end() might be also necessary to issue "XA END id", but that command can be
> issued via prepare() together with "XA PREPARE id".
Yeah, I think we can call xa_end and xa_prepare in the FDW's prepare function.
The issue is when to call xa_start, which requires an XID as an argument. We probably don't want to call it in
transactions that access only one node...?
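To make the timing question concrete, here is a minimal sketch of the statements an FDW might send to MySQL, assuming the choice between one-phase and two-phase commit is deferred until commit time. The function names and the xid format are hypothetical and come from neither patch; only the XA statements themselves are MySQL syntax.

```python
from typing import List

# Hypothetical helpers; the xid is assumed to contain no quote characters.


def xa_start_sql(xid: str) -> str:
    """Statement sent before the first remote statement of the transaction."""
    return f"XA START '{xid}'"


def xa_prepare_sql(xid: str) -> List[str]:
    """prepare() callback: XA END and XA PREPARE are issued together."""
    return [f"XA END '{xid}'", f"XA PREPARE '{xid}'"]


def xa_commit_sql(xid: str, prepared: bool) -> List[str]:
    """commit() callback.

    If the branch was prepared, finish the second phase. Otherwise (for
    example, only one node was accessed) end the branch and commit it in
    one phase, skipping the prepare step.
    """
    if prepared:
        return [f"XA COMMIT '{xid}'"]
    return [f"XA END '{xid}'", f"XA COMMIT '{xid}' ONE PHASE"]
```

With this shape, XA START is still sent in every transaction that touches a MySQL server, but a transaction that turns out to involve only one node skips the prepare step. Whether that overhead is acceptable is exactly the open question.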
> With his patch, prepare() is supported. What other interfaces need to be
> supported per XA/JTA?
>
> I'm not familiar with XA/JTA and XA transaction interfaces on other major
> DBMS. So I'd like to know what other interfaces are necessary additionally?
I think xa_start, xa_end, xa_prepare, xa_commit, xa_rollback, and xa_recover are sufficient. The XA specification is
here:
https://pubs.opengroup.org/onlinepubs/009680699/toc.pdf
You can see the function reference in Chapter 5 and the concepts in Chapter 3. If I remember correctly, Chapter 6
shows the state transitions (the function call sequence).
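As a rough illustration of the call sequence, the following is a simplified sketch of the states one transaction branch goes through. The state names are my own labels, not the specification's, and the sketch leaves out suspend/resume, one-phase commit, and xa_recover, so please treat the specification as the authority.

```python
# Simplified state table for a single transaction branch.
# Keys are (current state, function called); values are the next state.
TRANSITIONS = {
    ("initial", "xa_start"): "active",
    ("active", "xa_end"): "idle",
    ("idle", "xa_prepare"): "prepared",
    ("idle", "xa_rollback"): "initial",
    ("prepared", "xa_commit"): "initial",
    ("prepared", "xa_rollback"): "initial",
}


def run(calls, state="initial"):
    """Apply a sequence of XA calls and return the final state.

    Raises ValueError if a call is not allowed in the current state.
    """
    for call in calls:
        key = (state, call)
        if key not in TRANSITIONS:
            raise ValueError(f"{call} is not allowed in state {state}")
        state = TRANSITIONS[key]
    return state
```

For example, run(["xa_start", "xa_end", "xa_prepare", "xa_commit"]) walks the normal two-phase path, while run(["xa_start", "xa_prepare"]) is rejected because xa_end has not been called.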
> IMO Sawada-san's version of 2PC is less performant, but it's because his
> patch provides more functionality. For example, with his patch, WAL is written
> to automatically complete the unresolve foreign transactions in the case of
> failure. OTOH, Alexey patch introduces no new WAL for 2PC.
> Of course, generating more WAL would cause more overhead.
> But if we need automatic resolution feature, it's inevitable to introduce new
> WAL whichever the patch we choose.
Please do not get me wrong. I know Sawada-san is trying to ensure durability. I just wanted to know what each patch
does and at what cost in terms of disk and network I/O, and whether one patch can borrow something from the other to
reduce that cost.
I'm simply guessing (without having read the code yet) that each transaction basically does:
- two round trips (prepare, commit) to each remote node
- two WAL writes (prepare, commit) on the local node and each remote node
- one write for two-phase state file on each remote node
- one write to record participants on the local node
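The guess above can be written as simple arithmetic over the number of remote nodes. This is only a sketch of my guess, not a measurement of either patch, and the names CommitCost and estimate_commit_cost are made up for illustration.

```python
from dataclasses import dataclass


@dataclass
class CommitCost:
    round_trips: int          # network round trips from the local node
    wal_writes: int           # WAL writes on the local and remote nodes
    state_file_writes: int    # two-phase state file writes on remote nodes
    participant_writes: int   # participant records on the local node


def estimate_commit_cost(remote_nodes: int) -> CommitCost:
    """Per-transaction I/O counts under the guessed cost model."""
    return CommitCost(
        round_trips=2 * remote_nodes,        # prepare + commit per remote node
        wal_writes=2 * (1 + remote_nodes),   # prepare + commit on every node
        state_file_writes=remote_nodes,      # one per remote node
        participant_writes=1,                # one on the local node
    )
```

For a transaction that touches two remote nodes, this gives 4 round trips, 6 WAL writes, 2 state file writes, and 1 participant record.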
I found it hard to reason about the efficiency of the algorithm from the source code alone. As you may have seen,
DBMS textbooks and papers evaluate algorithms by counting disk and network I/Os. I thought such information would be
useful before going deeper into the source code. Maybe it could eventually be written in Sawada-san's wiki page
below or in a README.
Atomic Commit of Distributed Transactions
https://wiki.postgresql.org/wiki/Atomic_Commit_of_Distributed_Transactions
Regards
Takayuki Tsunakawa