RE: Transactions involving multiple postgres foreign servers, take 2 - Mailing list pgsql-hackers

From tsunakawa.takay@fujitsu.com
Subject RE: Transactions involving multiple postgres foreign servers, take 2
Date
Msg-id TYAPR01MB299068D29020EBCC6EF4065AFE210@TYAPR01MB2990.jpnprd01.prod.outlook.com
In response to Re: Transactions involving multiple postgres foreign servers, take 2  (Masahiko Sawada <masahiko.sawada@2ndquadrant.com>)
List pgsql-hackers
From: Masahiko Sawada <masahiko.sawada@2ndquadrant.com>
> The resolver process has two functionalities: resolving foreign
> transactions automatically when the user issues COMMIT (the case you
> described in the second paragraph), and resolving foreign transaction
> when the corresponding backend no longer exists or when the server
> crashes in the middle of 2PC (described in the third
> paragraph).
> 
> Considering the design without the resolver process, I think we can
> easily replace the latter with the manual resolution. OTOH, it's not
> easy for the former. I have no idea about better design for now,
> although, as you described, if we could ensure that the process
> doesn't raise an error during resolving foreign transactions after
> committing the local transaction we would not need the resolver
> process.

Yeah, the resolver background process -- someone independent of client sessions -- is necessary, because the client
session sometimes disappears.  When the server that hosts the 2PC coordinator crashes, there are no client sessions.  Our
DBMS Symfoware also runs background threads that take care of resolving in-doubt transactions caused by a server or
network failure.
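To make the recovery side of the resolver concrete, here is a minimal Python sketch; it is not code from the patch. It assumes the coordinator has durably recorded a commit/abort decision for each foreign transaction before the crash. InDoubtXact, FakeRemote and resolve_in_doubt are hypothetical names; only COMMIT PREPARED and ROLLBACK PREPARED are real PostgreSQL commands.

```python
from dataclasses import dataclass
from enum import Enum


class Decision(Enum):
    COMMIT = "commit"
    ABORT = "abort"


@dataclass
class InDoubtXact:
    gid: str            # global transaction id given to PREPARE TRANSACTION
    server: str         # foreign server holding the prepared transaction
    decision: Decision  # outcome assumed to be recorded durably by the coordinator


class FakeRemote:
    """Hypothetical stand-in for a connection to a foreign server."""

    def __init__(self, failures_before_success=0):
        self.failures_left = failures_before_success
        self.log = []

    def execute(self, sql):
        if self.failures_left > 0:
            self.failures_left -= 1
            raise ConnectionError("foreign server unreachable")
        self.log.append(sql)


def resolve_in_doubt(pending, remotes, max_rounds=5):
    """Retry each in-doubt transaction; return the ones still unresolved."""
    for _ in range(max_rounds):
        still_pending = []
        for x in pending:
            verb = "COMMIT PREPARED" if x.decision is Decision.COMMIT else "ROLLBACK PREPARED"
            try:
                remotes[x.server].execute(f"{verb} '{x.gid}'")
            except ConnectionError:
                still_pending.append(x)  # keep it for the next round
        pending = still_pending
        if not pending:
            break
        # A real resolver would sleep or wait for a wakeup here before retrying.
    return pending
```

A real resolver would also have to treat "prepared transaction does not exist" as already resolved, because an earlier attempt may have succeeded just before the connection dropped.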
 

Then, how does the resolver get involved in 2PC to enable parallel 2PC?  Two ideas quickly come to mind:

(1) Each client backend issues prepare and commit to multiple remote nodes asynchronously.
If the communication fails during commit, the client backend leaves the commit notification task to the resolver.
That is, the resolver lends a hand during failure recovery, and doesn't interfere with the transaction processing
during normal operation.
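Idea (1) could look roughly like the following Python sketch. Threads stand in for asynchronous libpq calls, and Participant, two_phase_commit and resolver_queue are hypothetical names used only for illustration. Both phases are sent to all nodes in parallel, and a commit-phase failure is queued for the resolver instead of being reported to the client as an error.

```python
from concurrent.futures import ThreadPoolExecutor


class Participant:
    """Hypothetical remote node. vote=False refuses PREPARE; fail_commit simulates
    a connection loss while sending COMMIT PREPARED."""

    def __init__(self, name, vote=True, fail_commit=False):
        self.name = name
        self.vote = vote
        self.fail_commit = fail_commit
        self.state = "active"

    def prepare(self, gid):
        if self.vote:
            self.state = "prepared"
        return self.vote

    def rollback_prepared(self, gid):
        self.state = "aborted"

    def commit_prepared(self, gid):
        if self.fail_commit:
            raise ConnectionError(f"{self.name}: connection lost")
        self.state = "committed"


def two_phase_commit(gid, participants, resolver_queue):
    with ThreadPoolExecutor(max_workers=max(1, len(participants))) as pool:
        # Phase 1: send PREPARE to every participant at once, then collect all votes.
        votes = list(pool.map(lambda p: p.prepare(gid), participants))
        if not all(votes):
            # Roll back only the participants that actually prepared.
            prepared = [p for p in participants if p.state == "prepared"]
            list(pool.map(lambda p: p.rollback_prepared(gid), prepared))
            return "aborted"
        # The coordinator's local commit would happen here; from now on the outcome is COMMIT.
        # Phase 2: send COMMIT PREPARED to every participant concurrently.
        futures = {p.name: pool.submit(p.commit_prepared, gid) for p in participants}
        for name, fut in futures.items():
            if fut.exception() is not None:
                # Do not raise: hand the unfinished commit over to the resolver.
                resolver_queue.append((gid, name))
    return "committed"
```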
 

(2) The resolver takes some responsibility in 2PC processing during normal operation
(it sends prepare and/or commit to remote nodes and gets the results).
To avoid serial execution per transaction, the resolver bundles multiple requests, sends them in bulk, and waits for
multiple replies at once.
 
This allows the coordinator to do its own prepare processing in parallel with those of participants.
However, in Postgres, this requires context switches between the client backend and the resolver.
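The bundling in idea (2) might be sketched as below, again in Python with hypothetical names (Request, send_bulk, resolver_drain_once). send_bulk is assumed to carry several statements to one server in a single round trip. Each backend enqueues a request and then blocks on its event until the resolver answers, which is where the backend/resolver context switch comes from.

```python
import queue
import threading
from collections import defaultdict


class Request:
    """One backend's request to the resolver (hypothetical)."""

    def __init__(self, gid, server, sql_verb):
        self.gid, self.server, self.sql_verb = gid, server, sql_verb
        self.done = threading.Event()  # set by the resolver when the reply arrives
        self.ok = None


def send_bulk(server, statements):
    """Hypothetical: one network round trip carrying several statements."""
    return [True for _ in statements]


def resolver_drain_once(req_queue, round_trips):
    # Block for the first request, then take everything else already queued.
    batch = [req_queue.get()]
    while True:
        try:
            batch.append(req_queue.get_nowait())
        except queue.Empty:
            break
    # Group requests from different backends by destination server.
    by_server = defaultdict(list)
    for r in batch:
        by_server[r.server].append(r)
    for server, reqs in by_server.items():
        stmts = [f"{r.sql_verb} '{r.gid}'" for r in reqs]
        results = send_bulk(server, stmts)  # one round trip per server, not per request
        round_trips.append((server, len(stmts)))
        for r, ok in zip(reqs, results):
            r.ok = ok
            r.done.set()  # wake the waiting backend
```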


Our Symfoware takes (2).  However, it doesn't suffer from the context switch overhead, because the server is multi-threaded and
further implements or uses entities that are more lightweight than threads.
 


> Or the second idea would be that the backend commits only the local
> transaction then returns the acknowledgment of COMMIT to the user
> without resolving foreign transactions. Then the user manually
> resolves the foreign transactions by, for example, using the SQL
> function pg_resolve_foreign_xact() within a separate transaction. That
> way, even if an error occurred while resolving foreign transactions
> (e.g., executing COMMIT PREPARED), it’s okay as the user is already
> aware of the local transaction having been committed and can retry
> resolving the unresolved foreign transaction. So we won't need the
> resolver process while avoiding such inconsistency.
> 
> But a drawback would be that the transaction commit doesn't ensure
> that all foreign transactions are completed. The subsequent
> transactions would need to check if the previous distributed
> transaction is completed to see its results. I’m not sure it’s a good
> design in terms of usability.

I share your worry; I don't think it's a good design.  I guess that's why Postgres-XL had to create a tool called
pgxc_clean and ask the user to resolve transactions with it.
 

pgxc_clean
https://www.postgres-xl.org/documentation/pgxcclean.html

"pgxc_clean is a Postgres-XL utility to maintain transaction status after a crash. When a Postgres-XL node crashes and
recovers or fails over, the commit status of the node may be inconsistent with other nodes. pgxc_clean checks
transaction commit status and corrects them."
 


Regards
Takayuki Tsunakawa

