On Nov 16, 2015, at 11:21 PM, Kevin Grittner wrote:
I'm not entirely clear on what you're saying here. I admit I've not kept in close touch with the distributed processing discussions lately -- is there a write-up and/or diagram to give an overview of where we're at with this effort?
If you are saying that DTM tries to roll back a transaction after any participating server has entered the RecordTransactionCommit() critical section, then IMO it is broken. Full stop. That can't work with any reasonable semantics as far as I can see.
DTM is not trying to rollback committed transaction.
What he tries to do is to hide this commit.
As I already wrote, the idea was to implement "lightweight" 2PC because prepared transactions mechanism in PostgreSQL adds too much overhead and cause soe problems with recovery.
The transaction is normally committed in xlog, so that it can always be recovered in case of node fault.
But before setting correspondent bit(s) in CLOG and releasing locks we first contact arbiter to get global status of transaction.
If it is successfully locally committed by all nodes, then arbiter approves commit and commit of transaction normally completed.
Otherwise arbiter rejects commit. In this case DTM marks transaction as aborted in CLOG and returns error to the client.
XLOG is not changed and in case of failure PostgreSQL will try to replay this transaction.
But during recovery it also tries to restore transaction status in CLOG.
And at this placeDTM contacts arbiter to know status of transaction.
I think the general idea is that if Commit is WAL logged, then the
operation is considered to committed on local node and commit should
happen on any node, only once prepare from all nodes is successful.
And after that transaction is not supposed to abort. But I think you are
trying to optimize the DTM in some way to not follow that kind of protocol.
By the way, how will arbiter does the recovery in a scenario where it
crashes, won't it need to contact all nodes for the status of in-progress or
prepared transactions?
I think it would be better if more detailed design of DTM with respect to
transaction management and recovery could be updated on wiki for having
discussion on this topic. I have seen that you have already updated many
details of the system, but still the complete picture of DTM is not clear.