RE: Transactions involving multiple postgres foreign servers, take 2 - Mailing list pgsql-hackers
From | tsunakawa.takay@fujitsu.com |
---|---|
Subject | RE: Transactions involving multiple postgres foreign servers, take 2 |
Date | |
Msg-id | TYAPR01MB2990FAD79C5414740C507A58FE7B0@TYAPR01MB2990.jpnprd01.prod.outlook.com Whole thread Raw |
In response to | Re: Transactions involving multiple postgres foreign servers, take 2 (Masahiko Sawada <masahiko.sawada@2ndquadrant.com>) |
List | pgsql-hackers |
From: Masahiko Sawada <masahiko.sawada@2ndquadrant.com> > I also believe they do. But I'm concerned that some FDW needs to start > a transaction differently when using 2PC. For instance, IIUC MySQL > also supports 2PC but the transaction needs to be started with "XA > START id” when the transaction needs to be prepared. The transaction > started with XA START can be closed by XA END followed by XA PREPARE > or XA COMMIT ONE PHASE. It means that when starts a new transaction > the transaction needs to prepare the transaction identifier and to > know that 2PC might be used. It’s quite different from PostgreSQL. In > PostgreSQL, we can start a transaction by BEGIN and end it by PREPARE > TRANSACTION, COMMIT, or ROLLBACK. The transaction identifier is > required when PREPARE TRANSACTION. I guess Postgres is rather a minority in this regard. All I know is XA and its Java counterpart (Java Transaction API: JTA). In XA, the connection needs to be associated with an XID before its transaction work is performed. If some transaction work is already done before associating with XID, xa_start() returns an error like this: [XA specification] -------------------------------------------------- [XAER_OUTSIDE] The resource manager is doing work outside any global transaction on behalf of the application. -------------------------------------------------- [Java Transaction API (JTA)] -------------------------------------------------- void start(Xid xid, int flags) throws XAException This method starts work on behalf of a transaction branch. ... 3.4.7 Local and Global Transactions The resource adapter is encouraged to support the usage of both local and global transactions within the same transactional connection. Local transactions are transactions that are started and coordinated by the resource manager internally. The XAResource interface is not used for local transactions. When using the same connection to perform both local and global transactions, the following rules apply: . The local transaction must be committed (or rolled back) before starting a global transaction in the connection. . The global transaction must be disassociated from the connection before any local transaction is started. -------------------------------------------------- (FWIW, jdbc_fdw would expect to use JTA for this FDW 2PC?) > I haven’t tested the above yet and it’s just a desk plan. it's > definitely a good idea to try integrating this 2PC feature to FDWs > other than postgres_fdw to see if design and interfaces are > implemented sophisticatedly. Yes, if we address this 2PC feature as an FDW enhancement, we need to make sure that at least some well-known DBMSs shouldbe able to implement the new interface. The following part may help devise the interface: [References from XA specification] -------------------------------------------------- The primary use of xa_start() is to register a new transaction branch with the RM. This marks the start of the branch. Subsequently, the AP, using the same thread of control, uses the RM’s native interface to do useful work. All requests for service made by the same thread are part of the same branch until the thread dissociates from the branch (see below). 3.3.1 Registration of Resource Managers Normally, a TM involves all associated RMs in a transaction branch. (The TM’s set of RM switches, described in Section 4.3 on page 21 tells the TM which RMs are associated with it.) The TM calls all these RMs with xa_start(), xa_end(), and xa_prepare (), although an RM that is not active in a branch need not participate further (see Section 2.3.2 on page 8). A technique to reduce overhead for infrequently-used RMs is discussed below. Dynamic Registration Certain RMs, especially those involved in relatively few global transactions, may ask the TM to assume they are not involved in a transaction. These RMs must register with the TM before they do application work, to see whether the work is part of a global transaction. The TM never calls these RMs with any form of xa_start(). An RM declares dynamic registration in its switch (see Section 4.3 on page 21). An RM can make this declaration only on its own behalf, and doing so does not change the TM’s behaviour with respect to other RMs. When an AP requests work from such an RM, before doing any work, the RM contacts the TM by calling ax_reg(). The RM must call ax_reg() from the same thread of control that the AP would use if it called ax_reg() directly. The TM returns to the RM the appropriate XID if the AP is in a global transaction. The implications of dynamically registering are as follows: when a thread of control begins working on behalf of a transaction branch, the transaction manager calls xa_start() for all resource managers known to the thread except those having TMREGISTER set in their xa_switch_t structure. Thus, those resource managers with this flag set must explicitly join a branch with ax_reg(). Secondly, when a thread of control is working on behalf of a branch, a transaction manager calls xa_end() for all resource managers known to the thread that either do not have TMREGISTER set in their xa_switch_t structure or have dynamically registered with ax_reg(). int xa_start(XID *xid, int rmid, long flags) DESCRIPTION A transaction manager calls xa_start() to inform a resource manager that an application may do work on behalf of a transaction branch. ... A transaction manager calls xa_start() only for those resource managers that do not have TMREGISTER set in the flags element of their xa_switch_t structure. Resource managers with TMREGISTER set must use ax_reg() to join a transaction branch (see ax_reg() for details). -------------------------------------------------- > > To track the foreign transaction status, we can add GetTransactionStatus() to > the FDW interface as an alternative, can't we? > > I haven't thought such an interface but it sounds like the transaction > status is managed on both the core and FDWs. Could you elaborate on > that? I don't have such deep analysis. I just thought that the core could keep track of the local transaction status, and askeach participant FDW about its transaction status to determine an action. > > If the WAL records of multiple concurrent transactions are written and > synced separately, i.e. group commit doesn't take effect, then the OLTP > transaction performance will be unacceptable. > > I agree that it'll be a large performance penalty. I'd like to have it > but I’m not sure we should have it in the first version from the > perspective of complexity. I think at least we should have a rough image of how we can reach the goal. Otherwise, the current design/implementationmay have to be overhauled with great efforts in the near future. Apart from that, I feel it's unnaturalthat the commit processing is serialized at the transaction resolver while the DML processing of multiple foreigntransactions can be performed in parallel. > Since the procedure of 2PC is originally > high cost, in my opinion, the user should not use as much as possible > in terms of performance. Especially in OLTP, its cost will directly > affect the latency. I’d suggest designing database schema so > transaction touches only one foreign server but do you have concrete > OLTP usecase where normally requires 2PC, and how many servers > involved within a distributed transaction? I can't share the details, but some of our customers show interest in Postgres scale-out or FDW 2PC for the following usecases: * Multitenant OLTP where the data specific to one tenant is stored on one database server. On the other hand, some dataare shared among all tenants, and they are stored on a separate server. The shared data and the tenant-specific datais updated in the same transaction (I don't know the frequency of such transactions.) * An IoT use case where each edge database server monitors and tracks the movement of objects in one area. Those edge databaseservers store the records of objects they manage. When an object gets out of one area and moves to another, therecord for the object is moved between the two edge database servers using an atomic distributed transaction. (I wonder if TPC-C or TPC-E needs distributed transaction...) Regards Takayuki Tsunakawa
pgsql-hackers by date: