Re: eXtensible Transaction Manager API - Mailing list pgsql-hackers

From Konstantin Knizhnik
Subject Re: eXtensible Transaction Manager API
Date
Msg-id 563E2C8C.5000204@postgrespro.ru
In response to Re: eXtensible Transaction Manager API  (Amit Kapila <amit.kapila16@gmail.com>)
Responses Re: eXtensible Transaction Manager API  (Michael Paquier <michael.paquier@gmail.com>)
Re: eXtensible Transaction Manager API  (Simon Riggs <simon@2ndQuadrant.com>)
Re: eXtensible Transaction Manager API  (Amit Kapila <amit.kapila16@gmail.com>)
List pgsql-hackers
Hi,
Thank you for your feedback.
My comments are inline.

On 11/07/2015 05:11 PM, Amit Kapila wrote:

Today, while studying your proposal and related material, I noticed
that in both the approaches DTM and tsDTM, you are talking about
committing a transaction and acquiring the snapshot consistently, but
not touched upon how the locks will be managed across nodes and
how deadlock detection across nodes will work.  This will also be one
of the crucial points in selecting one of the approaches.

The lock manager is one of the tasks we are currently working on.
There are still a lot of open questions:
1. Should the distributed lock manager (DLM) do anything other than detect distributed deadlocks?
2. Should the DLM be part of the XTM API, or should it be a separate API?
3. Should the DLM be implemented as a separate process, or should it be part of the arbiter (dtmd)?
4. How do we globally identify resource owners (transactions) in the global lock graph? In the case of DTM we have global (shared) XIDs,
and in tsDTM we have global transaction IDs assigned by the application (and it is not so clear how to retrieve them).
In other cases we may need a local->global transaction ID mapping, so it looks like the DLM should be part of the DTM...
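
To make the deadlock-detection question more concrete, here is a minimal sketch in Go of detecting a cycle in a global wait-for graph. It is only an illustration, not part of the DTM code: the names GTID, WaitForGraph, Merge and FindDeadlock are hypothetical, and it assumes every node can already report its local lock waits in terms of global transaction IDs (which is exactly the open question 4).

```go
package main

// GTID is a hypothetical global transaction identifier; how such an ID
// is obtained on each node is one of the open questions above.
type GTID string

// WaitForGraph maps a transaction to the transactions whose locks it waits for.
type WaitForGraph map[GTID][]GTID

// Merge combines the wait-for edges reported by individual nodes
// into one global graph.
func Merge(local ...WaitForGraph) WaitForGraph {
	global := WaitForGraph{}
	for _, g := range local {
		for t, waits := range g {
			global[t] = append(global[t], waits...)
		}
	}
	return global
}

// FindDeadlock returns the transactions forming one cycle in the graph,
// or nil if the graph has no cycle. It uses a depth-first search and
// treats an edge back to a transaction on the current path as a deadlock.
func (g WaitForGraph) FindDeadlock() []GTID {
	const (
		unvisited = iota
		onPath
		done
	)
	state := make(map[GTID]int)
	var path, cycle []GTID

	var visit func(t GTID) bool
	visit = func(t GTID) bool {
		state[t] = onPath
		path = append(path, t)
		for _, next := range g[t] {
			switch state[next] {
			case onPath:
				// Back edge: the cycle is the part of the path starting at next.
				for i, p := range path {
					if p == next {
						cycle = append([]GTID(nil), path[i:]...)
						return true
					}
				}
			case unvisited:
				if visit(next) {
					return true
				}
			}
		}
		path = path[:len(path)-1]
		state[t] = done
		return false
	}

	for t := range g {
		if state[t] == unvisited && visit(t) {
			return cycle
		}
	}
	return nil
}
```

For example, if node 1 reports that T1 waits for T2 and node 2 reports that T2 waits for T1, neither local graph contains a cycle, but FindDeadlock on the merged graph returns both transactions. Which of them to abort is a separate policy decision that this sketch does not address.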



Also I have
noticed that discussion about Rollback is not there, example how will
Rollback happen with API's provided in your second approach (tsDTM)?
 
In the tsDTM approach, two-phase commit is performed by the coordinator, and it currently uses the standard PostgreSQL two-phase commit:

Go code performing two-phase commit:

          // Phase 1: prepare the transaction on both nodes
          exec(conn1, "prepare transaction '" + gtid + "'")
          exec(conn2, "prepare transaction '" + gtid + "'")
          exec(conn1, "select dtm_begin_prepare($1)", gtid)
          exec(conn2, "select dtm_begin_prepare($1)", gtid)
          // Get a csn from the first node, pass it to the second,
          // then send the resulting csn to both nodes
          csn = _execQuery(conn1, "select dtm_prepare($1, 0)", gtid)
          csn = _execQuery(conn2, "select dtm_prepare($1, $2)", gtid, csn)
          exec(conn1, "select dtm_end_prepare($1, $2)", gtid, csn)
          exec(conn2, "select dtm_end_prepare($1, $2)", gtid, csn)
          // Phase 2: commit the prepared transaction on both nodes
          exec(conn1, "commit prepared '" + gtid + "'")
          exec(conn2, "commit prepared '" + gtid + "'")

If the commit fails on any node, the coordinator should roll back the prepared transaction on all nodes.
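
The rollback path could look roughly like the following Go sketch. It is not the actual coordinator code: the Node interface, fakeNode and CommitDistributed are hypothetical names, and the dtm_* calls are left out for brevity. It assumes the failure is detected during the prepare phase, where ROLLBACK PREPARED is still possible on every node that has already prepared. Once some node has executed COMMIT PREPARED, that node can no longer be rolled back, so in the commit phase the sketch only reports the error.

```go
package main

import (
	"errors"
	"fmt"
	"strings"
)

// Node is a hypothetical stand-in for a connection to one PostgreSQL node.
type Node interface {
	Exec(query string) error
}

// CommitDistributed prepares gtid on every node and then commits it.
// If PREPARE TRANSACTION fails on some node, the nodes prepared so far
// are rolled back with ROLLBACK PREPARED.
func CommitDistributed(nodes []Node, gtid string) error {
	for i, n := range nodes {
		err := n.Exec("prepare transaction '" + gtid + "'")
		if err == nil {
			continue
		}
		// A failed PREPARE TRANSACTION aborts the transaction on that node,
		// so only the nodes before it hold a prepared transaction.
		for _, p := range nodes[:i] {
			if rbErr := p.Exec("rollback prepared '" + gtid + "'"); rbErr != nil {
				err = fmt.Errorf("%w; rollback also failed: %v", err, rbErr)
			}
		}
		return fmt.Errorf("prepare failed on node %d: %w", i, err)
	}
	for i, n := range nodes {
		if err := n.Exec("commit prepared '" + gtid + "'"); err != nil {
			// Earlier nodes may already be committed; the caller has to
			// retry or resolve this manually (an assumption of this sketch).
			return fmt.Errorf("commit prepared failed on node %d: %w", i, err)
		}
	}
	return nil
}

// fakeNode records the queries it receives and fails on any query that
// starts with failOn. It exists only to exercise the sketch without a database.
type fakeNode struct {
	failOn string
	log    []string
}

func (f *fakeNode) Exec(query string) error {
	f.log = append(f.log, query)
	if f.failOn != "" && strings.HasPrefix(query, f.failOn) {
		return errors.New("simulated failure")
	}
	return nil
}
```

With two fake nodes where the second one fails on "prepare", the first node receives "prepare transaction" followed by "rollback prepared", and CommitDistributed returns an error.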

Similarly, having some discussion on parts of recovery that could be affected
would be great.

We are currently implementing fault tolerance and recovery for the DTM approach (with a centralized arbiter).
There are several replicas of the arbiter, synchronized using the Raft protocol.
But with the tsDTM approach the recovery model is still unclear...
We are thinking about it.

I think in this patch, it is important to see the completeness of all the
API's that needs to be exposed for the implementation of distributed
transactions and the same is difficult to visualize without having complete
picture of all the components that has some interaction with the distributed
transaction system.  On the other hand we can do it in incremental fashion
as and when more parts of the design are clear.

That is exactly what we are going to do: we are trying to integrate DTM with existing systems (pg_shard, postgres_fdw, BDR) and find out what is missing and should be added. In parallel, we are trying to compare the efficiency and scalability of the different solutions.
For example, we are still considering scalability problems with the tsDTM approach: to provide acceptable performance, it requires very precise clock synchronization (we have to use PTP instead of NTP). So it may be a waste of time to provide fault tolerance for tsDTM if we eventually find out that this approach cannot provide better scalability than the simpler DTM approach.


With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com
