RE: Transactions involving multiple postgres foreign servers, take 2 - Mailing list pgsql-hackers

From tsunakawa.takay@fujitsu.com
Subject RE: Transactions involving multiple postgres foreign servers, take 2
Date
Msg-id TYAPR01MB2990FAD79C5414740C507A58FE7B0@TYAPR01MB2990.jpnprd01.prod.outlook.com
Whole thread Raw
In response to Re: Transactions involving multiple postgres foreign servers, take 2  (Masahiko Sawada <masahiko.sawada@2ndquadrant.com>)
List pgsql-hackers
From: Masahiko Sawada <masahiko.sawada@2ndquadrant.com>
> I also believe they do. But I'm concerned that some FDW needs to start
> a transaction differently when using 2PC. For instance, IIUC MySQL
> also supports 2PC but the transaction needs to be started with "XA
> START id” when the transaction needs to be prepared. The transaction
> started with XA START can be closed by XA END followed by XA PREPARE
> or XA COMMIT ONE PHASE. It means that when starts a new transaction
> the transaction needs to prepare the transaction identifier and to
> know that 2PC might be used. It’s quite different from PostgreSQL. In
> PostgreSQL, we can start a transaction by BEGIN and end it by PREPARE
> TRANSACTION, COMMIT, or ROLLBACK. The transaction identifier is
> required when PREPARE TRANSACTION.

I guess Postgres is rather a minority in this regard.  All I know is XA and its Java counterpart (Java Transaction API:
JTA). In XA, the connection needs to be associated with an XID before its transaction work is performed.
 
If some transaction work is already done before associating with XID, xa_start() returns an error like this:

[XA specification]
--------------------------------------------------
[XAER_OUTSIDE] 
The resource manager is doing work outside any global transaction on behalf of 
the application. 
--------------------------------------------------


[Java Transaction API (JTA)]
--------------------------------------------------
void start(Xid xid, int flags) throws XAException 

This method starts work on behalf of a transaction branch. 
...

3.4.7 Local and Global Transactions 
The resource adapter is encouraged to support the usage of both local and global 
transactions within the same transactional connection. Local transactions are 
transactions that are started and coordinated by the resource manager internally. The 
XAResource interface is not used for local transactions. 

When using the same connection to perform both local and global transactions, the 
following rules apply: 

. The local transaction must be committed (or rolled back) before starting a 
global transaction in the connection. 
. The global transaction must be disassociated from the connection before any 
local transaction is started. 
--------------------------------------------------


(FWIW, jdbc_fdw would expect to use JTA for this FDW 2PC?)



> I haven’t tested the above yet and it’s just a desk plan. it's
> definitely a good idea to try integrating this 2PC feature to FDWs
> other than postgres_fdw to see if design and interfaces are
> implemented sophisticatedly.

Yes, if we address this 2PC feature as an FDW enhancement, we need to make sure that at least some well-known DBMSs
shouldbe able to implement the new interface.  The following part may help devise the interface:
 


[References from XA specification]
--------------------------------------------------
The primary use of xa_start() is to register a new transaction branch with the RM. 
This marks the start of the branch. Subsequently, the AP, using the same thread of 
control, uses the RM’s native interface to do useful work. All requests for service 
made by the same thread are part of the same branch until the thread dissociates 
from the branch (see below). 

3.3.1 Registration of Resource Managers 
Normally, a TM involves all associated RMs in a transaction branch. (The TM’s set of 
RM switches, described in Section 4.3 on page 21 tells the TM which RMs are 
associated with it.) The TM calls all these RMs with xa_start(), xa_end(), and 
xa_prepare (), although an RM that is not active in a branch need not participate further 
(see Section 2.3.2 on page 8). A technique to reduce overhead for infrequently-used 
RMs is discussed below. 

Dynamic Registration 

Certain RMs, especially those involved in relatively few global transactions, may ask 
the TM to assume they are not involved in a transaction. These RMs must register with 
the TM before they do application work, to see whether the work is part of a global 
transaction. The TM never calls these RMs with any form of xa_start(). An RM 
declares dynamic registration in its switch (see Section 4.3 on page 21). An RM can 
make this declaration only on its own behalf, and doing so does not change the TM’s 
behaviour with respect to other RMs. 

When an AP requests work from such an RM, before doing any work, the RM contacts 
the TM by calling ax_reg(). The RM must call ax_reg() from the same thread of control 
that the AP would use if it called ax_reg() directly. The TM returns to the RM the 
appropriate XID if the AP is in a global transaction. 

The implications of dynamically registering are as follows: when a thread of control 
begins working on behalf of a transaction branch, the transaction manager calls 
xa_start() for all resource managers known to the thread except those having 
TMREGISTER set in their xa_switch_t structure. Thus, those resource managers with 
this flag set must explicitly join a branch with ax_reg(). Secondly, when a thread of 
control is working on behalf of a branch, a transaction manager calls xa_end() for all 
resource managers known to the thread that either do not have TMREGISTER set in 
their xa_switch_t structure or have dynamically registered with ax_reg(). 


int 
xa_start(XID *xid, int rmid, long flags) 

DESCRIPTION 
A transaction manager calls xa_start() to inform a resource manager that an application 
may do work on behalf of a transaction branch.
...
A transaction manager calls xa_start() only for those resource managers that do not 
have TMREGISTER set in the flags element of their xa_switch_t structure. Resource 
managers with TMREGISTER set must use ax_reg() to join a transaction branch (see 
ax_reg() for details). 
--------------------------------------------------


> > To track the foreign transaction status, we can add GetTransactionStatus() to
> the FDW interface as an alternative, can't we?
> 
> I haven't thought such an interface but it sounds like the transaction
> status is managed on both the core and FDWs. Could you elaborate on
> that?

I don't have such deep analysis.  I just thought that the core could keep track of the local transaction status, and
askeach participant FDW about its transaction status to determine an action.
 


> > If the WAL records of multiple concurrent transactions are written and
> synced separately, i.e. group commit doesn't take effect, then the OLTP
> transaction performance will be unacceptable.
> 
> I agree that it'll be a large performance penalty. I'd like to have it
> but I’m not sure we should have it in the first version from the
> perspective of complexity.

I think at least we should have a rough image of how we can reach the goal.  Otherwise, the current
design/implementationmay have to be overhauled with great efforts in the near future.  Apart from that, I feel it's
unnaturalthat the commit processing is serialized at the transaction resolver while the DML processing of multiple
foreigntransactions can be performed in parallel.
 


> Since the procedure of 2PC is originally
> high cost, in my opinion, the user should not use as much as possible
> in terms of performance. Especially in OLTP, its cost will directly
> affect the latency. I’d suggest designing database schema so
> transaction touches only one foreign server but do you have concrete
> OLTP usecase where normally requires 2PC, and how many servers
> involved within a distributed transaction?

I can't share the details, but some of our customers show interest in Postgres scale-out or FDW 2PC for the following
usecases:
 

* Multitenant OLTP where the data specific to one tenant is stored on one database server.  On the other hand, some
dataare shared among all tenants, and they are stored on a separate server.  The shared data and the tenant-specific
datais updated in the same transaction (I don't know the frequency of such transactions.)
 

* An IoT use case where each edge database server monitors and tracks the movement of objects in one area.  Those edge
databaseservers store the records of objects they manage.  When an object gets out of one area and moves to another,
therecord for the object is moved between the two edge database servers using an atomic distributed transaction.
 

(I wonder if TPC-C or TPC-E needs distributed transaction...)


Regards
Takayuki Tsunakawa





pgsql-hackers by date:

Previous
From: Dilip Kumar
Date:
Subject: Re: PATCH: logical_work_mem and logical streaming of large in-progress transactions
Next
From: Masahiko Sawada
Date:
Subject: Re: Transactions involving multiple postgres foreign servers, take 2