RE: Transactions involving multiple postgres foreign servers, take 2 - Mailing list pgsql-hackers
From | tsunakawa.takay@fujitsu.com |
---|---|
Subject | RE: Transactions involving multiple postgres foreign servers, take 2 |
Date | |
Msg-id | TYAPR01MB29905116075A10713D01AE30FE359@TYAPR01MB2990.jpnprd01.prod.outlook.com |
In response to | Re: Transactions involving multiple postgres foreign servers, take 2 (Kyotaro Horiguchi <horikyota.ntt@gmail.com>) |
List | pgsql-hackers |
From: Kyotaro Horiguchi <horikyota.ntt@gmail.com>
> If we accept each elementary-commit (via FDW connection) to fail, the
> parent(?) there's no way the root 2pc-commit can succeed. How can we
> ignore the fdw-error in that case?

No, we don't ignore the error during FDW commit. As mentioned at the end of this mail, the question is how the FDW reports the error to the caller (the transaction manager in Postgres core), and how we should handle it.

As shown below, Glassfish catches the resource manager's error during commit, retries the commit if the error is transient or a communication failure, and hands off the processing of the failed commit to the recovery manager. (I used all of my energy today; I'd be grateful if someone could figure out whether Glassfish reports the error to the application.)

[XATerminatorImpl.java]
    public void commit(Xid xid, boolean onePhase) throws XAException {
    ...
        } else {
            coord.commit();
        }

[TopCoordinator.java]
    // Commit all participants. If a fatal error occurs during
    // this method, then the process must be ended with a fatal error.
    ...
    try {
        participants.distributeCommit();
    } catch (Throwable exc) {

[RegisteredResources.java]
    void distributeCommit() throws HeuristicMixed, HeuristicHazard, NotPrepared {
    ...
        // Browse through the participants, committing them. The following is
        // intended to be done asynchronously as a group of operations.
    ...
        // Tell the resource to commit.
        // Catch any exceptions here; keep going until
        // no exception is left.
    ...
        // If the exception is neither TRANSIENT or
        // COMM_FAILURE, it is unexpected, so display a
        // message and give up with this Resource.
    ...
        // For TRANSIENT or COMM_FAILURE, wait
        // for a while, then retry the commit.
    ...
        // If the retry limit has been exceeded,
        // end the process with a fatal error.
    ...
        if (!transactionCompleted) {
            if (coord != null)
                RecoveryManager.addToIncompleTx(coord, true);

> > No. Taking the description literally and considering the relevant XA
> > specification, it's not about the remote commit failure. The remote server is
> > not allowed to fail the commit once it has reported successful prepare, which is
> > the contract of 2PC. HeuristicMixedException is about the manual resolution,
> > typically by the DBA, using the DBMS-specific tool or the standard
> > commit()/rollback() API.
>
> Mmm. The above seems as if saying that 2pc-comit does not interact
> with remotes. The interface contract does not cover everything that
> happens in the real world. If remote-commit fails, that is just an
> issue outside of the 2pc world. In reality remote-commit may fail for
> all reasons.

The following part of the XA specification is relevant. We're considering modeling the FDW 2PC interface on XA, because it seems to be the only standard interface and is thus one that other FDWs would naturally take advantage of, aren't we? If so, we need to take care of things like this. The interface design is not easy, so proper design and its review should come first, before going deeper into the huge code patch.

2.3.3 Heuristic Branch Completion
--------------------------------------------------
Some RMs may employ heuristic decision-making: an RM that has prepared to commit a transaction branch may decide to commit or roll back its work independently of the TM. It could then unlock shared resources. This may leave them in an inconsistent state. When the TM ultimately directs an RM to complete the branch, the RM may respond that it has already done so. The RM reports whether it committed the branch, rolled it back, or completed it with mixed results (committed some work and rolled back other work).

An RM that reports heuristic completion to the TM must not discard its knowledge of the transaction branch. The TM calls the RM once more to authorise it to forget the branch. This requirement means that the RM must notify the TM of all heuristic decisions, even those that match the decision the TM requested. The referenced OSI DTP specifications (model) and (service) define heuristics more precisely.
--------------------------------------------------

> https://www.ibm.com/docs/ja/db2-for-zos/11?topic=support-example-distributed-transaction-that-uses-jta-methods
> This suggests that both XAResoruce.prepare() and commit() can throw a
> exception.

Yes, XAResource.commit() can throw an exception:

    void commit(Xid xid, boolean onePhase) throws XAException

    Throws: XAException
    An error has occurred. Possible XAExceptions are XA_HEURHAZ, XA_HEURCOM, XA_HEURRB, XA_HEURMIX, XAER_RMERR, XAER_RMFAIL, XAER_NOTA, XAER_INVAL, or XAER_PROTO.

This is equivalent to xa_commit() in the XA specification. xa_commit() can return error codes that have the same names as above.

The questions we're trying to answer here are:

* How should such an error be handled? Glassfish (and possibly other Java EE servers) catches the error, continues to commit the rest of the participants, and handles the failed resource manager's commit in the background. In Postgres, if we allow FDWs to do ereport(ERROR), how can we do similar things?

* Should we report the error to the client? If yes, should it be reported as a failure of the commit, or as an informational message (WARNING) accompanying a successful commit? Why would the client want to know about the error when the global transaction's commit has already been promised?

Regards
Takayuki Tsunakawa
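
For illustration only, the "catch, keep committing the rest, hand failures to the background" behaviour described in the message can be sketched in Java roughly as follows. This is a minimal sketch, not Glassfish's code and not a proposal for the FDW API: the class CommitDistributor, the interface Branch, and the parameter maxRetries are hypothetical names invented for this example, and treating XA_RETRY and XAER_RMFAIL as the retryable cases is an assumption. Only javax.transaction.xa.XAException and its error-code constants come from the standard library. The wait between retries and the handling of heuristic outcomes are left out.

```java
import java.util.ArrayList;
import java.util.List;
import javax.transaction.xa.XAException;

final class CommitDistributor {

    /** Hypothetical stand-in for one prepared participant, e.g. one foreign server. */
    interface Branch {
        void commit() throws XAException;
    }

    /** Assumption: only these two codes are treated as "transient or communication failure". */
    static boolean isRetryable(XAException e) {
        return e.errorCode == XAException.XA_RETRY
            || e.errorCode == XAException.XAER_RMFAIL;
    }

    /**
     * Tries to commit every branch. A failure on one branch never stops the
     * loop; branches that could not be committed are returned so that a
     * background resolver (hypothetical) can finish them later.
     */
    static List<Branch> distributeCommit(List<Branch> branches, int maxRetries) {
        List<Branch> unresolved = new ArrayList<>();
        for (Branch branch : branches) {
            boolean committed = false;
            for (int attempt = 0; attempt <= maxRetries && !committed; attempt++) {
                try {
                    branch.commit();
                    committed = true;
                } catch (XAException e) {
                    if (!isRetryable(e)) {
                        break; // unexpected error: give up on this branch only
                    }
                    // retryable: fall through and try again (a real
                    // implementation would wait before retrying)
                }
            }
            if (!committed) {
                unresolved.add(branch);
            }
        }
        return unresolved;
    }
}
```

In this sketch the caller learns about failed branches only through the returned list, which leaves open the question raised above: whether, and at what severity, such failures should be surfaced to the client.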