Re: Transactions involving multiple postgres foreign servers - Mailing list pgsql-hackers

From Konstantin Knizhnik
Subject Re: Transactions involving multiple postgres foreign servers
Date
Msg-id 56406F10.9030508@postgrespro.ru
Whole thread Raw
In response to Re: Transactions involving multiple postgres foreign servers  (Ashutosh Bapat <ashutosh.bapat@enterprisedb.com>)
Responses Re: Transactions involving multiple postgres foreign servers  (Ashutosh Bapat <ashutosh.bapat@enterprisedb.com>)
List pgsql-hackers


On 09.11.2015 09:59, Ashutosh Bapat wrote:


Since the foreign server (referred to in the slides as secondary server) requires to call "create extension pg_dtm" and select dtm_join_transaction(xid);, I assume that the foreign server has to be a PostgreSQL server and one which has this extension installed and has a version that can support this extension. So, we can not use the extension for all FDWs and even for postgres_fdw it can be used only for a foreign server with above capabilities. The slides mention just FDW but I think they mean postgres_fdw and not all FDWs.

DTM approach is based on sharing XIDs and snapshots between different cluster nodes, so it really can be easily implemented only for PostgreSQL. So I really have in mind postgres_fdw rather than abstract FDW.
Approach with timestamps is more universal and in principle can be used for any DBMS where visibility is based on CSNs.

 

I think that this API is intended to provide not only consistent
cross-node decisions about whether a particular transaction has
committed, but also consistent visibility.  If the API is sufficient
for that and if it can be made sufficiently performant, that's a
strictly stronger guarantee than what this proposal would provide.

On the other hand, I see a couple of problems:

1. The extensible transaction manager API is meant to be pluggable.
Depending on which XTM module you choose to load, the SQL that needs
to be executed by postgres_fdw on the remote node will vary.
postgres_fdw shouldn't have knowledge of all the possible XTMs out
there, so it would need some way to know what SQL to send.
2. If the remote server isn't running the same XTM as the local
server, or if it is running the same XTM but is not part of the same
group of cooperating nodes as the local server, then we can't send a
command to join the distributed transaction at all.  In that case, the
2PC for FDW approach is still, maybe, useful.

Elaborating more on this: Slide 11 shows arbiter protocol to start a transaction and next slide shows the same for commit. Slide 15 shows the transaction flow diagram for tsDTM. In DTM approach it doesn't specify how xids are communicated between nodes, but it's implicit in the protocol that xid space is shared by the nodes. Similarly for tsDTM it assumes that CSN space is shared by all the nodes (see synchronization for max(CSN)). This can not be assumed for FDWs (not even postgres_fdw) where foreign servers are independent entities with independent xid space.

Proposed architecture of DTM includes "coordinator". Coordinator is a process  responsible for managing logic of distributed transaction. It can be just a normal client application, or it can be intermediate master node (like in case of pg_shard).
It can be also PostgreSQL instance (as in case of postgres_fdw) or not.  We try to put as less restriction on "coordinator" as possible.
It should just communicate with PostgreSQL backends using any communication protocol it likes (i.e. libpq) and invokes some special stored procedures which are part of particular DTM extension. Such functions also impose some protocol of exchanging data between different nodes involved in distributed transaction. In such way we are propagating XIDs/CSNs between different nodes which may even do not know about each other.
In DTM approach nodes only know about location of "arbiter". In tsDTM approach there is even not arbiter...


 

On the whole, I'm inclined to think that the XTM-based approach is
probably more useful and more general, if we can work out the problems
with it.  I'm not sure that I'm right, though, nor am I sure how hard
it will be.


2PC for FDW and XTM are trying to solve different problems with some commonality. 2PC for FDW is trying to solve problem of atomic commit (I am borrowing from the terminology you used in PGCon 2015) for FDWs in general (although limited to FDWs which can support 2 phase commit) and XTM tries to solve problems of atomic visibility, atomic commit and consistency for postgres_fdw where foreign servers support XTM. The only thing common between these two is atomic visibility.

If we accept XTM and discard 2PC for FDW, we will not be able to support atomic commit for FDWs in general. That, I think would be serious limitation for Postgres FDW, esp. now that DMLs are allowed. If we accept only 2PC for FDW and discard XTM, we won't be able to get atomic visibility and consistency for postgres_fdw with foreign servers supporting XTM. That would be again serious limitation for solutions implementing sharding, multi-master clusters etc.

There are approaches like [1] by which cluster of heterogenous servers (with some level of snapshot isolation) can be constructed. Ideally that will enable PostgreSQL users to maximize their utilization of FDWs.

Any distributed transaction management requires 2PC in some or other form. So, we should implement 2PC for FDW keeping in mind various forms of 2PC used practically. Use that infrastructure to build XTM like capabilities for restricted postgres_fdw uses. Previously, I have requested the authors of XTM to look at my patch and provide me feedback about their requirements for implementing 2PC part of XTM. But I have not heard anything from them.


Sorry, may be I missed some message. but I have not received request from you to provide feedback concerning your patch.


--
Best Wishes,
Ashutosh Bapat
EnterpriseDB Corporation
The Postgres Database Company

pgsql-hackers by date:

Previous
From: Victor Wagner
Date:
Subject: Re: Patch: Implement failover on libpq connect level.
Next
From: kawamichi@tkl.iis.u-tokyo.ac.jp
Date:
Subject: Erroneous cost estimation for nested loop join