Thread: Re: [HACKERS] Transactions involving multiple postgres foreign servers, take 2
Re: [HACKERS] Transactions involving multiple postgres foreign servers, take 2
On Tue, Jun 5, 2018 at 7:13 PM, Masahiko Sawada <sawada.mshk@gmail.com> wrote:
> On Sat, May 26, 2018 at 12:25 AM, Robert Haas <robertmhaas@gmail.com> wrote:
>> On Fri, May 18, 2018 at 11:21 AM, Masahiko Sawada <sawada.mshk@gmail.com> wrote:
>>> Regarding the API design, should we use 2PC for a distributed
>>> transaction if both two or more 2PC-capable foreign servers and a
>>> 2PC-non-capable foreign server are involved in it? Or should we end
>>> up with an error? The 2PC-non-capable server might be one that has
>>> 2PC functionality but disables it, or one that doesn't have it at all.
>>
>> It seems to me that this is functionality that many people will not
>> want to use. First, doing a PREPARE and then a COMMIT for each FDW
>> write transaction is bound to be more expensive than just doing a
>> COMMIT. Second, because the default value of
>> max_prepared_transactions is 0, this can only work at all if special
>> configuration has been done on the remote side. Because of the second
>> point in particular, it seems to me that the default for this new
>> feature must be "off". It would make no sense to ship a default
>> configuration of PostgreSQL that doesn't work with the default
>> configuration of postgres_fdw, and I do not think we want to change
>> the default value of max_prepared_transactions. It was changed from 5
>> to 0 a number of years back for good reason.
>
> I'm not sure that many people will not want to use this feature,
> because it seems to me that there are many people who don't want to
> use a database that lacks transaction atomicity. But I agree that
> this feature should not be enabled by default, just as we disable 2PC
> by default.
>
>> So, I think the question could be broadened a bit: how do you enable
>> this feature if you want it, and what happens if you want it but it's
>> not available for your choice of FDW? One possible enabling method is
>> a GUC (e.g. foreign_twophase_commit).
>> It could be true/false, with true meaning use PREPARE for all FDW
>> writes and fail if that's not supported, or it could be three-valued,
>> like require/prefer/disable, with require throwing an error if
>> PREPARE support is not available and prefer using PREPARE where
>> available but without failing when it isn't available. Another
>> possibility could be to make it an FDW option, possibly capable of
>> being set at multiple levels (e.g. server or foreign table). If any
>> FDW involved in the transaction demands distributed 2PC semantics
>> then the whole transaction must have those semantics or it fails. I
>> was previously leaning toward the latter approach, but I guess now
>> the former approach is sounding better. I'm not totally certain I
>> know what's best here.
>
> I agree that the former is better. That way, we can also control that
> parameter at the transaction level. If we allow the 'prefer' behavior
> we need to manage not only 2PC-capable foreign servers but also
> 2PC-non-capable foreign servers, which requires all FDWs to call the
> registration function. So I think a two-valued parameter would be
> better.
>
> BTW, sorry for being late in submitting the updated patch. I'll post
> the updated patch this week, but I'd like to share the new API design
> beforehand.

Attached are the updated patches.

I've changed the new APIs to 5 functions and 1 registration function,
because having the rollback API callable by both the backend process and
the resolver process was not a good design. The latest version of the
patches incorporates all comments I got, except for documentation of the
overall picture for users. I'm still considering what to document there;
I'll write it while the code patch is being reviewed. The basic design
of the new patches is almost the same as in the previous mail I sent.

I introduced 5 new FDW APIs: PrepareForeignTransaction,
CommitForeignTransaction, RollbackForeignTransaction,
ResolveForeignTransaction and IsTwophaseCommitEnabled.
ResolveForeignTransaction is normally called by the resolver process,
whereas the other four functions are called by the backend process. I
also introduced a registration function, FdwXactRegisterForeignTransaction.
An FDW that wishes to support atomic commit is required to call this
function when a transaction opens on the foreign server. Registered
foreign transactions are controlled by the foreign transaction manager
in the Postgres core, which calls the APIs at the appropriate times.
This means the foreign transaction manager controls only foreign
servers that are capable of 2PC; for a 2PC-non-capable foreign server,
the FDW must use a XactCallback to control the foreign transaction.

2PC is used at commit when the distributed transaction has modified
data on two or more servers including the local server, and the user
has requested it via the foreign_twophase_commit GUC parameter. All
foreign transactions are prepared during pre-commit, and then we commit
locally. After committing locally, the backend waits for the resolver
process to resolve all prepared foreign transactions. The waiting
backend is released (that is, returns the prompt to the client) either
when all foreign transactions are resolved or when the user cancels the
wait. If 2PC is not required, a foreign transaction is committed during
the pre-commit phase of the local transaction. IsTwophaseCommitEnabled
is called whenever the transaction begins to modify data on a foreign
server. This is required to track whether the transaction modified data
on a foreign server that doesn't support or enable 2PC.

Atomic commit among multiple foreign servers is crash-safe. If the
coordinator server crashes during atomic commit, the foreign
transaction participants and their statuses are recovered during WAL
apply. Recovered foreign transactions are in-doubt, a.k.a. dangling,
transactions. If the database has such transactions, the resolver
process periodically tries to resolve them.

I'll register this patch to the next CF. Feedback is very welcome.
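The registration-then-callback flow described above can be sketched in C. This is an editor's illustration only: the function names come from the mail, but the signatures, the callback struct, and the manager logic are all invented for the sketch and are not the patch's actual code.

```c
/* Hypothetical sketch of the proposed FDW transaction API.  Names
 * (PrepareForeignTransaction etc.) follow the mail; the signatures
 * and the struct are invented for illustration. */
#include <stdbool.h>
#include <stddef.h>

typedef struct FdwXactCallbacks
{
    bool (*prepare)(const char *server);          /* PrepareForeignTransaction */
    bool (*commit)(const char *server);           /* CommitForeignTransaction */
    bool (*rollback)(const char *server);         /* RollbackForeignTransaction */
    bool (*resolve)(const char *server);          /* ResolveForeignTransaction */
    bool (*twophase_enabled)(const char *server); /* IsTwophaseCommitEnabled */
} FdwXactCallbacks;

#define MAX_PARTICIPANTS 8

typedef struct Participant
{
    const char *server;
    const FdwXactCallbacks *cbs;
} Participant;

static Participant participants[MAX_PARTICIPANTS];
static int  nparticipants = 0;

/* FdwXactRegisterForeignTransaction: an FDW calls this when a
 * transaction opens on a foreign server. */
static void
FdwXactRegisterForeignTransaction(const char *server, const FdwXactCallbacks *cbs)
{
    participants[nparticipants].server = server;
    participants[nparticipants].cbs = cbs;
    nparticipants++;
}

/* Pre-commit step of the manager: prepare every registered
 * 2PC-capable participant.  Returns the number prepared. */
static int
PreCommit_PrepareAll(void)
{
    int prepared = 0;

    for (int i = 0; i < nparticipants; i++)
    {
        const Participant *p = &participants[i];

        if (p->cbs->twophase_enabled(p->server) && p->cbs->prepare(p->server))
            prepared++;
    }
    return prepared;
}

/* A dummy FDW whose callbacks always succeed. */
static bool dummy_ok(const char *server) { (void) server; return true; }
static const FdwXactCallbacks dummy_cbs =
    { dummy_ok, dummy_ok, dummy_ok, dummy_ok, dummy_ok };

/* Demo: two foreign servers join the transaction; both get prepared. */
int
run_demo(void)
{
    FdwXactRegisterForeignTransaction("server_b", &dummy_cbs);
    FdwXactRegisterForeignTransaction("server_c", &dummy_cbs);
    return PreCommit_PrepareAll();
}
```

In the real proposal the manager would call these at transaction pre-commit and hand resolution off to the resolver process; the sketch only shows the registration and prepare-all shape.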
Regards,

--
Masahiko Sawada
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center
Attachment
Re: [HACKERS] Transactions involving multiple postgres foreign servers, take 2
On Mon, Jun 11, 2018 at 1:53 PM, Masahiko Sawada <sawada.mshk@gmail.com> wrote:
> Attached updated patches.
>
> [description of the new FDW APIs and atomic commit design, quoted in
> full from the previous mail]
>
> I'll register this patch to next CF. Feedback is very welcome.

I attached the updated version patch, as the previous versions conflict
with the current HEAD.

Regards,

--
Masahiko Sawada
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center
Attachment
Re: [HACKERS] Transactions involving multiple postgres foreign servers, take 2
On Fri, Aug 03, 2018 at 05:52:24PM +0900, Masahiko Sawada wrote:
> I attached the updated version patch as the previous versions conflict
> with the current HEAD.

Please note that the latest patch set does not apply anymore, so this
patch is moved to the next CF, waiting on author.

--
Michael
Attachment
Re: [HACKERS] Transactions involving multiple postgres foreign servers, take 2
On Tue, Oct 2, 2018 at 3:10 PM Michael Paquier <michael@paquier.xyz> wrote:
>
> On Fri, Aug 03, 2018 at 05:52:24PM +0900, Masahiko Sawada wrote:
> > I attached the updated version patch as the previous versions conflict
> > with the current HEAD.
>
> Please note that the latest patch set does not apply anymore, so this
> patch is moved to next CF, waiting on author.

Thank you! Attached the latest version patches.

Regards,

--
Masahiko Sawada
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center
Attachment
Re: [HACKERS] Transactions involving multiple postgres foreign servers, take 2
The following review has been posted through the commitfest application:
make installcheck-world: tested, failed
Implements feature: not tested
Spec compliant: not tested
Documentation: tested, failed
(errmsg("preparing foreign transactions (max_prepared_foreign_transactions > 0) requires maX_foreign_xact_resolvers > 0")));
Re: [HACKERS] Transactions involving multiple postgres foreign servers, take 2
On Wed, Oct 3, 2018 at 9:41 AM Chris Travers <chris.travers@gmail.com> wrote:

> (errmsg("preparing foreign transactions (max_prepared_foreign_transactions > 0) requires maX_foreign_xact_resolvers > 0")));
max_prepared_foreign_transactions, max_prepared_transactions
--
Best Regards,
Chris Travers
Head of Database

Saarbrücker Straße 37a, 10405 Berlin
Re: [HACKERS] Transactions involving multiple postgres foreign servers, take 2
The following review has been posted through the commitfest application:
make installcheck-world: tested, failed
Implements feature: not tested
Spec compliant: not tested
Documentation: tested, failed
I am hoping I am not out of order in writing this before the commitfest starts. The patch is big and long, and so I wanted to start on this while traffic is slow.
I find this patch quite welcome and very close to a minimum viable version. The few significant limitations can be resolved later. One thing I may have missed in the documentation is a discussion of the limits of the current approach. I think this would be important to document because the caveats of the current approach are significant, but the people who need it will have the knowledge to work with issues if they come up.
The major caveat I see, from our past discussions and (if I read the patch correctly) from the code, is that the resolver goes through global transactions sequentially and does not move on to the next until the previous one is resolved. This means that if I have a global transaction on server A with foreign servers B and C, and another one on server A with foreign servers C and D, and server B goes down at the wrong moment, it does not look like the background worker will detect the failure and move on to try to resolve the second, so server D will have a badly set vacuum horizon until this is resolved. Also, if I read the patch correctly, it looks like one can invoke SQL commands to remove the bad transaction, allowing processing to continue after manual resolution (this is good and necessary, because in this area there is no ability to have perfect recoverability without occasional administrative action). I would really like to see more documentation of failure cases and appropriate administrative action. Otherwise this is, I think, a minimum viable addition, and I think we want it.
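A toy model of this head-of-line-blocking caveat (an editor's illustration with invented names, not code from the patch): a resolver that processes its queue strictly in order cannot get past an entry whose participant is down, so everything queued behind it waits too.

```c
/* Toy model: a strictly sequential resolver stalls on a down
 * participant, so transactions behind it are never resolved. */
#include <stdbool.h>

typedef struct GlobalXact
{
    int  xid;
    bool participant_up;  /* is every foreign participant reachable? */
    bool resolved;
} GlobalXact;

/* Returns how many transactions were resolved before the resolver
 * got stuck on an unreachable participant. */
static int
sequential_resolver(GlobalXact *queue, int n)
{
    int done = 0;

    for (int i = 0; i < n; i++)
    {
        if (!queue[i].participant_up)
            break;              /* server down: everything behind waits */
        queue[i].resolved = true;
        done++;
    }
    return done;
}

/* Demo: xid 1 involves server B (down); xid 2 involves only servers
 * that are up, yet it is never resolved. */
int
demo_stuck(void)
{
    GlobalXact queue[2] = {
        {1, false, false},      /* server B down at the wrong moment */
        {2, true, false}        /* healthy, but stuck behind xid 1 */
    };

    return sequential_resolver(queue, 2);
}
```

The later mails in this thread discuss mitigations: multiple resolvers (one per database) and, eventually, a separate retry queue so a failing transaction is parked rather than blocking the queue.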
It is possible I missed that in the documentation. If so, my objection stands aside. If it is welcome, I am happy to take a first crack at such docs.
To my mind that's the only blocker in the code (but see below). I can say without a doubt that I would expect we would use this feature once available.
------------------
Testing however failed.
make installcheck-world fails with errors like the following:
-- Modify foreign server and raise an error
BEGIN;
INSERT INTO ft7_twophase VALUES(8);
+ ERROR: prepread foreign transactions are disabled
+ HINT: Set max_prepared_foreign_transactions to a nonzero value.
INSERT INTO ft8_twophase VALUES(NULL); -- violation
! ERROR: current transaction is aborted, commands ignored until end of transaction block
ROLLBACK;
SELECT * FROM ft7_twophase;
! ERROR: prepread foreign transactions are disabled
! HINT: Set max_prepared_foreign_transactions to a nonzero value.
SELECT * FROM ft8_twophase;
! ERROR: prepread foreign transactions are disabled
! HINT: Set max_prepared_foreign_transactions to a nonzero value.
-- Rollback foreign transaction that involves both 2PC-capable
-- and 2PC-non-capable foreign servers.
BEGIN;
INSERT INTO ft8_twophase VALUES(7);
+ ERROR: prepread foreign transactions are disabled
+ HINT: Set max_prepared_foreign_transactions to a nonzero value.
INSERT INTO ft9_not_twophase VALUES(7);
+ ERROR: current transaction is aborted, commands ignored until end of transaction block
ROLLBACK;
SELECT * FROM ft8_twophase;
! ERROR: prepread foreign transactions are disabled
! HINT: Set max_prepared_foreign_transactions to a nonzero value.
make installcheck in the contrib directory shows the same, so that's the easiest way of reproducing, at least on a new installation. I think the test cases will have to handle that sort of setup.
make check in the contrib directory passes.
For reasons of test failures, I am setting this back to waiting on author.
------------------
I had a few other thoughts that I figure are worth sharing with the community on this patch, with the idea that once it is in place, this may open up more options for collaboration in the area of federated and distributed storage generally. I could imagine other foreign data wrappers using this API, and folks might want to refactor out the atomic handling part so that extensions that do not use the foreign data wrapper structure could use it as well (while this looks like a classic SQL/MED issue, I am not sure that only foreign data wrappers would be interested in the API).
The new status of this patch is: Waiting on Author
Re: [HACKERS] Transactions involving multiple postgres foreign servers, take 2
On Wed, Oct 3, 2018 at 6:02 PM Chris Travers <chris.travers@adjust.com> wrote:
>
> On Wed, Oct 3, 2018 at 9:41 AM Chris Travers <chris.travers@gmail.com> wrote:
>> The major caveat I see in our past discussions and (if I read the
>> patch correctly) is that the resolver goes through global transactions
>> sequentially and does not move on to the next until the previous one
>> is resolved.

Thank you for reviewing the patch!

> After further testing I am pretty sure I misread the patch. It looks
> like one can have multiple resolvers which can, in fact, work through
> a queue together, solving this problem. So the objection above is not
> valid and I withdraw that objection. I will re-review the docs in
> light of the experience.

Actually the patch doesn't solve this problem; the foreign transaction
resolver processes distributed transactions sequentially. But since one
resolver process is responsible for one database, a backend connecting
to another database can complete its distributed transaction. I
understand your concern and agree that we should solve this problem.
I'll address it in the next patch.

>> make installcheck in the contrib directory shows the same, so that's
>> the easiest way of reproducing, at least on a new installation. I
>> think the test cases will have to handle that sort of setup.

'make installcheck' is a regression test mode that runs the tests
against an existing installation. If that installation disables the
atomic commit feature (e.g. max_prepared_foreign_transactions etc.),
the tests will fail, because the feature is disabled by default.

>> make check in the contrib directory passes.
>>
>> For reasons of test failures, I am setting this back to waiting on
>> author.

Also, I'll update the doc in the next patch that I'll post this week.

Regards,

--
Masahiko Sawada
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center
Re: [HACKERS] Transactions involving multiple postgres foreign servers, take 2
On Wed, Oct 10, 2018 at 1:34 PM Masahiko Sawada <sawada.mshk@gmail.com> wrote: > > On Wed, Oct 3, 2018 at 6:02 PM Chris Travers <chris.travers@adjust.com> wrote: > > > > > > > > On Wed, Oct 3, 2018 at 9:41 AM Chris Travers <chris.travers@gmail.com> wrote: > >> > >> The following review has been posted through the commitfest application: > >> make installcheck-world: tested, failed > >> Implements feature: not tested > >> Spec compliant: not tested > >> Documentation: tested, failed > >> > >> I am hoping I am not out of order in writing this before the commitfest starts. The patch is big and long and so wantedto start on this while traffic is slow. > >> > >> I find this patch quite welcome and very close to a minimum viable version. The few significant limitations can beresolved later. One thing I may have missed in the documentation is a discussion of the limits of the current approach. I think this would be important to document because the caveats of the current approach are significant, but thepeople who need it will have the knowledge to work with issues if they come up. > >> > >> The major caveat I see in our past discussions and (if I read the patch correctly) is that the resolver goes throughglobal transactions sequentially and does not move on to the next until the previous one is resolved. This meansthat if I have a global transaction on server A, with foreign servers B and C, and I have another one on server A withforeign servers C and D, if server B goes down at the wrong moment, the background worker does not look like it willdetect the failure and move on to try to resolve the second, so server D will have a badly set vacuum horizon until thisis resolved. Also if I read the patch correctly, it looks like one can invoke SQL commands to remove the bad transactionto allow processing to continue and manual resolution (this is good and necessary because in this area there isno ability to have perfect recoverability without occasional administrative action). 
I would really like to see more documentationof failure cases and appropriate administrative action at present. Otherwise this is I think a minimum viableaddition and I think we want it. > >> > >> It is possible i missed that in the documentation. If so, my objection stands aside. If it is welcome I am happy totake a first crack at such docs. > > > > Thank you for reviewing the patch! > > > > > After further testing I am pretty sure I misread the patch. It looks like one can have multiple resolvers which can,in fact, work through a queue together solving this problem. So the objection above is not valid and I withdraw thatobjection. I will re-review the docs in light of the experience. > > Actually the patch doesn't solve this problem; the foreign transaction > resolver processes distributed transactions sequentially. But since > one resolver process is responsible for one database the backend > connecting to another database can complete the distributed > transaction. I understood the your concern and agreed to solve this > problem. I'll address it in the next patch. > > > > >> > >> > >> To my mind thats the only blocker in the code (but see below). I can say without a doubt that I would expect we woulduse this feature once available. > >> > >> ------------------ > >> > >> Testing however failed. > >> > >> make installcheck-world fails with errors like the following: > >> > >> -- Modify foreign server and raise an error > >> BEGIN; > >> INSERT INTO ft7_twophase VALUES(8); > >> + ERROR: prepread foreign transactions are disabled > >> + HINT: Set max_prepared_foreign_transactions to a nonzero value. > >> INSERT INTO ft8_twophase VALUES(NULL); -- violation > >> ! ERROR: current transaction is aborted, commands ignored until end of transaction block > >> ROLLBACK; > >> SELECT * FROM ft7_twophase; > >> ! ERROR: prepread foreign transactions are disabled > >> ! HINT: Set max_prepared_foreign_transactions to a nonzero value. > >> SELECT * FROM ft8_twophase; > >> ! 
ERROR: prepread foreign transactions are disabled > >> ! HINT: Set max_prepared_foreign_transactions to a nonzero value. > >> -- Rollback foreign transaction that involves both 2PC-capable > >> -- and 2PC-non-capable foreign servers. > >> BEGIN; > >> INSERT INTO ft8_twophase VALUES(7); > >> + ERROR: prepread foreign transactions are disabled > >> + HINT: Set max_prepared_foreign_transactions to a nonzero value. > >> INSERT INTO ft9_not_twophase VALUES(7); > >> + ERROR: current transaction is aborted, commands ignored until end of transaction block > >> ROLLBACK; > >> SELECT * FROM ft8_twophase; > >> ! ERROR: prepread foreign transactions are disabled > >> ! HINT: Set max_prepared_foreign_transactions to a nonzero value. > >> > >> make installcheck in the contrib directory shows the same, so that's the easiest way of reproducing, at least on a new installation. I think the test cases will have to handle that sort of setup. > > The 'make installcheck' is a regression test mode that runs the tests against an existing installation. If the installation disables the atomic commit feature (e.g. max_prepared_foreign_transactions etc.) the test will fail because the feature is disabled by default. > > >> > >> make check in the contrib directory passes. > >> > >> For reasons of test failures, I am setting this back to waiting on author. > >> > >> ------------------ > >> I had a few other thoughts that I figure are worth sharing with the community on this patch with the idea that once it is in place, this may open up more options for collaboration in the area of federated and distributed storage generally. I could imagine other foreign data wrappers using this API, and folks might want to refactor out the atomic handling part so that extensions that do not use the foreign data wrapper structure could use it as well (while this looks like a classic SQL/MED issue, I am not sure that only foreign data wrappers would be interested in the API). 
> >> > >> The new status of this patch is: Waiting on Author > > Also, I'll update the doc in the next patch that I'll post on this week. > Attached is the updated version of the patches. What I changed from the previous version: * Enabled processing of subsequent distributed transactions even when a previous distributed transaction continues to fail due to a participant's error. To implement this, I've split the waiting queue into two queues: the active queue and the retry queue. Each backend first inserts itself into the active queue and changes its state to FDW_XACT_WAITING. If the resolver process fails to resolve a distributed transaction, it moves the backend entry from the active queue to the retry queue and changes its state to FDW_XACT_WAITING_RETRY. The backend entries in the active queue are processed at each commit, whereas entries in the retry queue are processed at intervals of foreign_transaction_resolution_retry_interval. * Updated docs, added the new section "Distributed Transaction" at Chapter 33 to explain the concept to users * Moved the atomic commit code into the src/backend/access/fdwxact directory. * Some bug fixes. Please review them. Regards, -- Masahiko Sawada NIPPON TELEGRAPH AND TELEPHONE CORPORATION NTT Open Source Software Center
Attachment
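[Editorial sketch] The two-queue scheme described above can be modeled in a few lines of self-contained C. The type and function names (QueueEntry, process_active_queue, demo_resolve) are illustrative assumptions, not the patch's actual code; the point is only that a failed entry is parked on the retry queue so later transactions keep getting resolved:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Illustrative stand-ins for the patch's entry states
 * (FDW_XACT_WAITING / FDW_XACT_WAITING_RETRY). */
typedef enum
{
    FDW_XACT_WAITING,
    FDW_XACT_WAITING_RETRY,
    FDW_XACT_RESOLVED
} FdwXactStatus;

typedef struct
{
    int           backend_id;
    FdwXactStatus status;
} QueueEntry;

/* Stand-in resolver: pretend the server used by backend 2 is down. */
static bool
demo_resolve(int backend_id)
{
    return backend_id != 2;
}

/*
 * Walk the active queue once.  A failed entry is parked on the retry
 * queue (to be retried at foreign_transaction_resolution_retry_interval)
 * instead of blocking every later transaction, which is the point of
 * splitting the queue in two.  Returns the number of entries resolved.
 */
static size_t
process_active_queue(QueueEntry *active, size_t n_active,
                     QueueEntry *retry, size_t *n_retry,
                     bool (*resolve)(int))
{
    size_t resolved = 0;

    for (size_t i = 0; i < n_active; i++)
    {
        if (resolve(active[i].backend_id))
        {
            active[i].status = FDW_XACT_RESOLVED;
            resolved++;
        }
        else
        {
            /* Park the failure; keep going with the rest. */
            active[i].status = FDW_XACT_WAITING_RETRY;
            retry[(*n_retry)++] = active[i];
        }
    }
    return resolved;
}
```

With backends 1..3 queued and backend 2's server unreachable, one pass resolves 1 and 3 and moves only entry 2 to the retry queue.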
Re: [HACKERS] Transactions involving multiple postgres foreign servers, take 2
Hello. # It took a long time to come here.. At Fri, 19 Oct 2018 21:38:35 +0900, Masahiko Sawada <sawada.mshk@gmail.com> wrote in <CAD21AoCBf-AJup-_ARfpqR42gJQ_XjNsvv-XE0rCOCLEkT=HCg@mail.gmail.com> > On Wed, Oct 10, 2018 at 1:34 PM Masahiko Sawada <sawada.mshk@gmail.com> wrote: ... > * Updated docs, added the new section "Distributed Transaction" at > Chapter 33 to explain the concept to users > > * Moved atomic commit codes into src/backend/access/fdwxact directory. > > * Some bug fixes. > > Please reivew them. I have some comments, with apologies in advance for possible duplicates of or conflicts with others' comments so far. 0001: This sets XACT_FLAG_WROTENONTEMPREL when a RELPERSISTENT_PERMANENT relation is modified. Isn't it also needed when UNLOGGED tables are modified? It may be better to have a dedicated classification macro or function. The flag is handled in heapam.c. I suppose that it should be done in the upper layer considering the coming pluggable storage. (X_F_ACCESSEDTEMPREL is set in heapam, but..) 0002: The name FdwXactParticipantsForAC doesn't sound good to me. How about FdwXactAtomicCommitParticipants? Well, as the file comment of fdwxact.c says, FdwXactRegisterTransaction is called from the FDW driver and F_X_MarkForeignTransactionModified is called from the executor. I think that we should clarify who is responsible for the whole sequence. Since the state of local tables affects it, I suppose the executor is. Couldn't we do the whole thing on the executor side? I'm not sure, but I feel that F_X_RegisterForeignTransaction can be a part of F_X_MarkForeignTransactionModified. The callers of MarkForeignTransactionModified can find whether the table is involved in 2PC via the IsTwoPhaseCommitEnabled interface. 
> if (foreign_twophase_commit == true && > ((MyXactFlags & XACT_FLAGS_FDWNOPREPARE) != 0) ) > ereport(ERROR, > (errcode(ERRCODE_FEATURE_NOT_SUPPORTED), > errmsg("cannot COMMIT a distributed transaction that has operated on foreign server that doesn't support atomic commit"))); The error is emitted when the GUC is turned off in the transaction where MarkTransactionModified was called. I think that the number of the variables' possible states should be reduced for simplicity. For example, in this case, once foreign_twophase_commit is checked in a transaction, subsequent changes to it should be ignored for the rest of the transaction. regards. -- Kyotaro Horiguchi NTT Open Source Software Center
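[Editorial sketch] The state-reduction idea above — consult the GUC once and ignore later changes until transaction end — is essentially latching. A minimal model, with all names invented for illustration (the patch does not necessarily contain these functions):

```c
#include <assert.h>
#include <stdbool.h>

/* The live GUC (PGC_USERSET, so it can change mid-session via SET). */
static bool guc_foreign_twophase_commit = false;

/* Value latched at first use within the current transaction. */
static bool latched_value;
static bool latched = false;

/* The first call in a transaction reads the GUC; later calls return
 * the latched value, so a mid-transaction SET cannot flip the outcome
 * and the commit path sees only one consistent state. */
static bool
ForeignTwophaseCommitThisXact(void)
{
    if (!latched)
    {
        latched_value = guc_foreign_twophase_commit;
        latched = true;
    }
    return latched_value;
}

/* Called at transaction end so the next transaction re-reads the GUC. */
static void
AtEOXact_ResetTwophaseLatch(void)
{
    latched = false;
}
```

This keeps Sawada-san's later point intact too: the value is still (re-)read at most once per transaction, which is compatible with a PGC_USERSET parameter.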
Re: [HACKERS] Transactions involving multiple postgres foreign servers, take 2
On Tue, Oct 23, 2018 at 12:54 PM Kyotaro HORIGUCHI <horiguchi.kyotaro@lab.ntt.co.jp> wrote: > > Hello. > > # It took a long time to come here.. > > At Fri, 19 Oct 2018 21:38:35 +0900, Masahiko Sawada <sawada.mshk@gmail.com> wrote in <CAD21AoCBf-AJup-_ARfpqR42gJQ_XjNsvv-XE0rCOCLEkT=HCg@mail.gmail.com> > > On Wed, Oct 10, 2018 at 1:34 PM Masahiko Sawada <sawada.mshk@gmail.com> wrote: > ... > > * Updated docs, added the new section "Distributed Transaction" at > > Chapter 33 to explain the concept to users > > > > * Moved atomic commit codes into src/backend/access/fdwxact directory. > > > > * Some bug fixes. > > > > Please reivew them. > > I have some comments, with apologize in advance for possible > duplicate or conflict with others' comments so far. Thank you so much for reviewing this patch! > > 0001: > > This sets XACT_FLAG_WROTENONTEMPREL when RELPERSISTENT_PERMANENT > relation is modified. Isn't it needed when UNLOGGED tables are > modified? It may be better that we have dedicated classification > macro or function. I think even if we do atomic commit for modifying an UNLOGGED table and a remote table, the data will get inconsistent if the local server crashes. For example, if the local server crashes after preparing the transaction on the foreign server but before the local commit, we will lose all the data of the local UNLOGGED table whereas the modification of the remote table is rolled back. In the case of persistent tables, data consistency is preserved. So I think keeping data consistency between remote data and a local unlogged table is difficult, and I want to leave it as a restriction for now. Am I missing something? > > The flag is handled in heapam.c. I suppose that it should be done > in the upper layer considering coming pluggable storage. > (X_F_ACCESSEDTEMPREL is set in heapam, but..) > Yeah, or we can set the flag after heap_insert in ExecInsert. > > > 0002: > > The name FdwXactParticipantsForAC doesn't sound good for me. 
How > about FdwXactAtomicCommitPartitcipants? +1, will fix it. > > Well, as the file comment of fdwxact.c, > FdwXactRegisterTransaction is called from FDW driver and > F_X_MarkForeignTransactionModified is called from executor. I > think that we should clarify who is responsible to the whole > sequence. Since the state of local tables affects, I suppose > executor is that. Couldn't we do the whole thing within executor > side? I'm not sure but I feel that > F_X_RegisterForeignTransaction can be a part of > F_X_MarkForeignTransactionModified. The callers of > MarkForeignTransactionModified can find whether the table is > involved in 2pc by IsTwoPhaseCommitEnabled interface. Indeed. We can register foreign servers in the executor while FDWs don't need to register anything. I will remove the registration function so that FDW developers don't need to call the register function but only need to provide the atomic commit APIs. > > > > if (foreign_twophase_commit == true && > > ((MyXactFlags & XACT_FLAGS_FDWNOPREPARE) != 0) ) > > ereport(ERROR, > > (errcode(ERRCODE_FEATURE_NOT_SUPPORTED), > > errmsg("cannot COMMIT a distributed transaction that has operated on foreign server that doesn't support atomic commit"))); > > The error is emitted when a the GUC is turned off in the > trasaction where MarkTransactionModify'ed. I think that the > number of the variables' possible states should be reduced for > simplicity. For example in the case, once foreign_twopase_commit > is checked in a transaction, subsequent changes in the > transaction should be ignored during the transaction. > I might not have gotten your comment correctly, but since foreign_twophase_commit is a PGC_USERSET parameter I think we need to check it at commit time. Also, we need to keep participant servers even when foreign_twophase_commit is off if both max_prepared_foreign_xacts and max_foreign_xact_resolvers are > 0. I will post the updated patch this week. 
Regards, -- Masahiko Sawada NIPPON TELEGRAPH AND TELEPHONE CORPORATION NTT Open Source Software Center
Re: [HACKERS] Transactions involving multiple postgres foreign servers, take 2
On Wed, Oct 24, 2018 at 9:06 AM Masahiko Sawada <sawada.mshk@gmail.com> wrote: > > On Tue, Oct 23, 2018 at 12:54 PM Kyotaro HORIGUCHI > <horiguchi.kyotaro@lab.ntt.co.jp> wrote: > > > > Hello. > > > > # It took a long time to come here.. > > > > At Fri, 19 Oct 2018 21:38:35 +0900, Masahiko Sawada <sawada.mshk@gmail.com> wrote in <CAD21AoCBf-AJup-_ARfpqR42gJQ_XjNsvv-XE0rCOCLEkT=HCg@mail.gmail.com> > > > On Wed, Oct 10, 2018 at 1:34 PM Masahiko Sawada <sawada.mshk@gmail.com> wrote: > > ... > > > * Updated docs, added the new section "Distributed Transaction" at > > > Chapter 33 to explain the concept to users > > > > > > * Moved atomic commit codes into src/backend/access/fdwxact directory. > > > > > > * Some bug fixes. > > > > > > Please reivew them. > > > > I have some comments, with apologize in advance for possible > > duplicate or conflict with others' comments so far. > > Thank youf so much for reviewing this patch! > > > > > 0001: > > > > This sets XACT_FLAG_WROTENONTEMPREL when RELPERSISTENT_PERMANENT > > relation is modified. Isn't it needed when UNLOGGED tables are > > modified? It may be better that we have dedicated classification > > macro or function. > > I think even if we do atomic commit for modifying the an UNLOGGED > table and a remote table the data will get inconsistent if the local > server crashes. For example, if the local server crashes after > prepared the transaction on foreign server but before the local commit > and, we will lose the all data of the local UNLOGGED table whereas the > modification of remote table is rollbacked. In case of persistent > tables, the data consistency is left. So I think the keeping data > consistency between remote data and local unlogged table is difficult > and want to leave it as a restriction for now. Am I missing something? > > > > > The flag is handled in heapam.c. I suppose that it should be done > > in the upper layer considering coming pluggable storage. 
> > (X_F_ACCESSEDTEMPREL is set in heapam, but..) > > > > Yeah, or we can set the flag after heap_insert in ExecInsert. > > > > > 0002: > > > > The name FdwXactParticipantsForAC doesn't sound good for me. How > > about FdwXactAtomicCommitPartitcipants? > > +1, will fix it. > > > > > Well, as the file comment of fdwxact.c, > > FdwXactRegisterTransaction is called from FDW driver and > > F_X_MarkForeignTransactionModified is called from executor. I > > think that we should clarify who is responsible to the whole > > sequence. Since the state of local tables affects, I suppose > > executor is that. Couldn't we do the whole thing within executor > > side? I'm not sure but I feel that > > F_X_RegisterForeignTransaction can be a part of > > F_X_MarkForeignTransactionModified. The callers of > > MarkForeignTransactionModified can find whether the table is > > involved in 2pc by IsTwoPhaseCommitEnabled interface. > > Indeed. We can register foreign servers by executor while FDWs don't > need to register anything. I will remove the registration function so > that FDW developers don't need to call the register function but only > need to provide atomic commit APIs. > > > > > > > > if (foreign_twophase_commit == true && > > > ((MyXactFlags & XACT_FLAGS_FDWNOPREPARE) != 0) ) > > > ereport(ERROR, > > > (errcode(ERRCODE_FEATURE_NOT_SUPPORTED), > > > errmsg("cannot COMMIT a distributed transaction that has operated on foreign serverthat doesn't support atomic commit"))); > > > > The error is emitted when a the GUC is turned off in the > > trasaction where MarkTransactionModify'ed. I think that the > > number of the variables' possible states should be reduced for > > simplicity. For example in the case, once foreign_twopase_commit > > is checked in a transaction, subsequent changes in the > > transaction should be ignored during the transaction. 
> > > > I might have not gotten your comment correctly but since the > foreign_twophase_commit is a PGC_USERSET parameter I think we need to > check it at commit time. Also we need to keep participant servers even > when foreign_twophase_commit is off if both max_prepared_foreign_xacts > and max_foreign_xact_resolvers are > 0. > > I will post the updated patch in this week. > Attached is the updated version of the patches. Based on the review comment from Horiguchi-san, I've changed the atomic commit API so that FDW developers who wish to support atomic commit don't need to call the register function. The atomic commit APIs are the following: * GetPrepareId * PrepareForeignTransaction * CommitForeignTransaction * RollbackForeignTransaction * ResolveForeignTransaction * IsTwophaseCommitEnabled All APIs except for GetPrepareId are required for atomic commit. Also, I've changed the foreign_twophase_commit parameter to an enum parameter based on the suggestion from Robert[1]. Valid values are 'required', 'prefer' and 'disabled' (default). When set to either 'required' or 'prefer', atomic commit will be used. The difference between 'required' and 'prefer' is that when set to 'required' we require *all* modified servers to be able to use 2PC, whereas with 'prefer' we use 2PC where available. So with 'required', if any of the written participants disables 2PC or doesn't support the atomic commit API, the transaction fails. IOW, with 'required' we can commit only when data consistency among all participants can be preserved. Please review the patches. [1] https://www.postgresql.org/message-id/CA%2BTgmob4EqxbaMp0e--jUKYT44RL4xBXkPMxF9EEAD%2ByBGAdxw%40mail.gmail.com Regards, -- Masahiko Sawada NIPPON TELEGRAPH AND TELEPHONE CORPORATION NTT Open Source Software Center
Attachment
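[Editorial sketch] The 'required' / 'prefer' / 'disabled' semantics described above reduce to a small per-participant decision function. This is a sketch of the described behavior with made-up names, not the patch's implementation:

```c
#include <assert.h>
#include <stdbool.h>

/* Illustrative stand-ins for the proposed enum GUC's values. */
typedef enum { FTC_DISABLED, FTC_PREFER, FTC_REQUIRED } ForeignTwophaseCommitLevel;

typedef enum { COMMIT_ONEPHASE, COMMIT_TWOPHASE, COMMIT_ERROR } CommitChoice;

/*
 * How one modified participant would be committed under each setting:
 *   disabled - never prepare; always a plain one-phase commit;
 *   prefer   - prepare where the server supports 2PC, else plain commit;
 *   required - every modified server must support 2PC, else error out,
 *              since only then can atomicity be preserved.
 */
static CommitChoice
choose_commit_protocol(ForeignTwophaseCommitLevel level, bool supports_2pc)
{
    switch (level)
    {
        case FTC_DISABLED:
            return COMMIT_ONEPHASE;
        case FTC_PREFER:
            return supports_2pc ? COMMIT_TWOPHASE : COMMIT_ONEPHASE;
        case FTC_REQUIRED:
            return supports_2pc ? COMMIT_TWOPHASE : COMMIT_ERROR;
    }
    return COMMIT_ERROR;        /* not reached */
}
```

So 'prefer' degrades gracefully on a 2PC-incapable participant, while 'required' turns the same situation into a commit-time error.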
Re: [HACKERS] Transactions involving multiple postgres foreign servers, take 2
On Mon, Oct 29, 2018 at 10:16 AM Masahiko Sawada <sawada.mshk@gmail.com> wrote: > > On Wed, Oct 24, 2018 at 9:06 AM Masahiko Sawada <sawada.mshk@gmail.com> wrote: > > > > On Tue, Oct 23, 2018 at 12:54 PM Kyotaro HORIGUCHI > > <horiguchi.kyotaro@lab.ntt.co.jp> wrote: > > > > > > Hello. > > > > > > # It took a long time to come here.. > > > > > > At Fri, 19 Oct 2018 21:38:35 +0900, Masahiko Sawada <sawada.mshk@gmail.com> wrote in <CAD21AoCBf-AJup-_ARfpqR42gJQ_XjNsvv-XE0rCOCLEkT=HCg@mail.gmail.com> > > > > On Wed, Oct 10, 2018 at 1:34 PM Masahiko Sawada <sawada.mshk@gmail.com> wrote: > > > ... > > > > * Updated docs, added the new section "Distributed Transaction" at > > > > Chapter 33 to explain the concept to users > > > > > > > > * Moved atomic commit codes into src/backend/access/fdwxact directory. > > > > > > > > * Some bug fixes. > > > > > > > > Please reivew them. > > > > > > I have some comments, with apologize in advance for possible > > > duplicate or conflict with others' comments so far. > > > > Thank youf so much for reviewing this patch! > > > > > > > > 0001: > > > > > > This sets XACT_FLAG_WROTENONTEMPREL when RELPERSISTENT_PERMANENT > > > relation is modified. Isn't it needed when UNLOGGED tables are > > > modified? It may be better that we have dedicated classification > > > macro or function. > > > > I think even if we do atomic commit for modifying the an UNLOGGED > > table and a remote table the data will get inconsistent if the local > > server crashes. For example, if the local server crashes after > > prepared the transaction on foreign server but before the local commit > > and, we will lose the all data of the local UNLOGGED table whereas the > > modification of remote table is rollbacked. In case of persistent > > tables, the data consistency is left. So I think the keeping data > > consistency between remote data and local unlogged table is difficult > > and want to leave it as a restriction for now. Am I missing something? 
> > > > > > > > The flag is handled in heapam.c. I suppose that it should be done > > > in the upper layer considering coming pluggable storage. > > > (X_F_ACCESSEDTEMPREL is set in heapam, but..) > > > > > > > Yeah, or we can set the flag after heap_insert in ExecInsert. > > > > > > > > 0002: > > > > > > The name FdwXactParticipantsForAC doesn't sound good for me. How > > > about FdwXactAtomicCommitPartitcipants? > > > > +1, will fix it. > > > > > > > > Well, as the file comment of fdwxact.c, > > > FdwXactRegisterTransaction is called from FDW driver and > > > F_X_MarkForeignTransactionModified is called from executor. I > > > think that we should clarify who is responsible to the whole > > > sequence. Since the state of local tables affects, I suppose > > > executor is that. Couldn't we do the whole thing within executor > > > side? I'm not sure but I feel that > > > F_X_RegisterForeignTransaction can be a part of > > > F_X_MarkForeignTransactionModified. The callers of > > > MarkForeignTransactionModified can find whether the table is > > > involved in 2pc by IsTwoPhaseCommitEnabled interface. > > > > Indeed. We can register foreign servers by executor while FDWs don't > > need to register anything. I will remove the registration function so > > that FDW developers don't need to call the register function but only > > need to provide atomic commit APIs. > > > > > > > > > > > > if (foreign_twophase_commit == true && > > > > ((MyXactFlags & XACT_FLAGS_FDWNOPREPARE) != 0) ) > > > > ereport(ERROR, > > > > (errcode(ERRCODE_FEATURE_NOT_SUPPORTED), > > > > errmsg("cannot COMMIT a distributed transaction that has operated on foreign serverthat doesn't support atomic commit"))); > > > > > > The error is emitted when a the GUC is turned off in the > > > trasaction where MarkTransactionModify'ed. I think that the > > > number of the variables' possible states should be reduced for > > > simplicity. 
For example in the case, once foreign_twopase_commit > > > is checked in a transaction, subsequent changes in the > > > transaction should be ignored during the transaction. > > > > > > > I might have not gotten your comment correctly but since the > > foreign_twophase_commit is a PGC_USERSET parameter I think we need to > > check it at commit time. Also we need to keep participant servers even > > when foreign_twophase_commit is off if both max_prepared_foreign_xacts > > and max_foreign_xact_resolvers are > 0. > > > > I will post the updated patch in this week. > > > > Attached the updated version patches. > > Based on the review comment from Horiguchi-san, I've changed the > atomic commit API so that the FDW developer who wish to support atomic > commit don't need to call the register function. The atomic commit > APIs are following: > > * GetPrepareId > * PrepareForeignTransaction > * CommitForeignTransaction > * RollbackForeignTransaction > * ResolveForeignTransaction > * IsTwophaseCommitEnabled > > The all APIs except for GetPreapreId is required for atomic commit. > > Also, I've changed the foreign_twophase_commit parameter to an enum > parameter based on the suggestion from Robert[1]. Valid values are > 'required', 'prefer' and 'disabled' (default). When set to either > 'required' or 'prefer' the atomic commit will be used. The difference > between 'required' and 'prefer' is that when set to 'requried' we > require for *all* modified server to be able to use 2pc whereas when > 'prefer' we require 2pc where available. So if any of written > participants disables 2pc or doesn't support atomic comit API the > transaction fails. IOW, when 'required' we can commit only when data > consistency among all participant can be left. > > Please review the patches. > Since the previous patch conflicts with current HEAD attached updated set of patches. Regards, -- Masahiko Sawada NIPPON TELEGRAPH AND TELEPHONE CORPORATION NTT Open Source Software Center
Attachment
Re: [HACKERS] Transactions involving multiple postgres foreign servers, take 2
On Mon, Oct 29, 2018 at 6:03 PM Masahiko Sawada <sawada.mshk@gmail.com> wrote: > > On Mon, Oct 29, 2018 at 10:16 AM Masahiko Sawada <sawada.mshk@gmail.com> wrote: > > > > On Wed, Oct 24, 2018 at 9:06 AM Masahiko Sawada <sawada.mshk@gmail.com> wrote: > > > > > > On Tue, Oct 23, 2018 at 12:54 PM Kyotaro HORIGUCHI > > > <horiguchi.kyotaro@lab.ntt.co.jp> wrote: > > > > > > > > Hello. > > > > > > > > # It took a long time to come here.. > > > > > > > > At Fri, 19 Oct 2018 21:38:35 +0900, Masahiko Sawada <sawada.mshk@gmail.com> wrote in <CAD21AoCBf-AJup-_ARfpqR42gJQ_XjNsvv-XE0rCOCLEkT=HCg@mail.gmail.com> > > > > > On Wed, Oct 10, 2018 at 1:34 PM Masahiko Sawada <sawada.mshk@gmail.com> wrote: > > > > ... > > > > > * Updated docs, added the new section "Distributed Transaction" at > > > > > Chapter 33 to explain the concept to users > > > > > > > > > > * Moved atomic commit codes into src/backend/access/fdwxact directory. > > > > > > > > > > * Some bug fixes. > > > > > > > > > > Please reivew them. > > > > > > > > I have some comments, with apologize in advance for possible > > > > duplicate or conflict with others' comments so far. > > > > > > Thank youf so much for reviewing this patch! > > > > > > > > > > > 0001: > > > > > > > > This sets XACT_FLAG_WROTENONTEMPREL when RELPERSISTENT_PERMANENT > > > > relation is modified. Isn't it needed when UNLOGGED tables are > > > > modified? It may be better that we have dedicated classification > > > > macro or function. > > > > > > I think even if we do atomic commit for modifying the an UNLOGGED > > > table and a remote table the data will get inconsistent if the local > > > server crashes. For example, if the local server crashes after > > > prepared the transaction on foreign server but before the local commit > > > and, we will lose the all data of the local UNLOGGED table whereas the > > > modification of remote table is rollbacked. In case of persistent > > > tables, the data consistency is left. 
So I think the keeping data > > > consistency between remote data and local unlogged table is difficult > > > and want to leave it as a restriction for now. Am I missing something? > > > > > > > > > > > The flag is handled in heapam.c. I suppose that it should be done > > > > in the upper layer considering coming pluggable storage. > > > > (X_F_ACCESSEDTEMPREL is set in heapam, but..) > > > > > > > > > > Yeah, or we can set the flag after heap_insert in ExecInsert. > > > > > > > > > > > 0002: > > > > > > > > The name FdwXactParticipantsForAC doesn't sound good for me. How > > > > about FdwXactAtomicCommitPartitcipants? > > > > > > +1, will fix it. > > > > > > > > > > > Well, as the file comment of fdwxact.c, > > > > FdwXactRegisterTransaction is called from FDW driver and > > > > F_X_MarkForeignTransactionModified is called from executor. I > > > > think that we should clarify who is responsible to the whole > > > > sequence. Since the state of local tables affects, I suppose > > > > executor is that. Couldn't we do the whole thing within executor > > > > side? I'm not sure but I feel that > > > > F_X_RegisterForeignTransaction can be a part of > > > > F_X_MarkForeignTransactionModified. The callers of > > > > MarkForeignTransactionModified can find whether the table is > > > > involved in 2pc by IsTwoPhaseCommitEnabled interface. > > > > > > Indeed. We can register foreign servers by executor while FDWs don't > > > need to register anything. I will remove the registration function so > > > that FDW developers don't need to call the register function but only > > > need to provide atomic commit APIs. 
> > > > > > > > > > > > > > > > if (foreign_twophase_commit == true && > > > > > ((MyXactFlags & XACT_FLAGS_FDWNOPREPARE) != 0) ) > > > > > ereport(ERROR, > > > > > (errcode(ERRCODE_FEATURE_NOT_SUPPORTED), > > > > > errmsg("cannot COMMIT a distributed transaction that has operated on foreign serverthat doesn't support atomic commit"))); > > > > > > > > The error is emitted when a the GUC is turned off in the > > > > trasaction where MarkTransactionModify'ed. I think that the > > > > number of the variables' possible states should be reduced for > > > > simplicity. For example in the case, once foreign_twopase_commit > > > > is checked in a transaction, subsequent changes in the > > > > transaction should be ignored during the transaction. > > > > > > > > > > I might have not gotten your comment correctly but since the > > > foreign_twophase_commit is a PGC_USERSET parameter I think we need to > > > check it at commit time. Also we need to keep participant servers even > > > when foreign_twophase_commit is off if both max_prepared_foreign_xacts > > > and max_foreign_xact_resolvers are > 0. > > > > > > I will post the updated patch in this week. > > > > > > > Attached the updated version patches. > > > > Based on the review comment from Horiguchi-san, I've changed the > > atomic commit API so that the FDW developer who wish to support atomic > > commit don't need to call the register function. The atomic commit > > APIs are following: > > > > * GetPrepareId > > * PrepareForeignTransaction > > * CommitForeignTransaction > > * RollbackForeignTransaction > > * ResolveForeignTransaction > > * IsTwophaseCommitEnabled > > > > The all APIs except for GetPreapreId is required for atomic commit. > > > > Also, I've changed the foreign_twophase_commit parameter to an enum > > parameter based on the suggestion from Robert[1]. Valid values are > > 'required', 'prefer' and 'disabled' (default). When set to either > > 'required' or 'prefer' the atomic commit will be used. 
The difference > > between 'required' and 'prefer' is that when set to 'requried' we > > require for *all* modified server to be able to use 2pc whereas when > > 'prefer' we require 2pc where available. So if any of written > > participants disables 2pc or doesn't support atomic comit API the > > transaction fails. IOW, when 'required' we can commit only when data > > consistency among all participant can be left. > > > > Please review the patches. > > > > Since the previous patch conflicts with current HEAD attached updated > set of patches. > Rebased and fixed a few bugs. Regards, -- Masahiko Sawada NIPPON TELEGRAPH AND TELEPHONE CORPORATION NTT Open Source Software Center
Attachment
Re: [HACKERS] Transactions involving multiple postgres foreign servers, take 2
On Thu, Nov 15, 2018 at 7:36 PM Masahiko Sawada <sawada.mshk@gmail.com> wrote: > > On Mon, Oct 29, 2018 at 6:03 PM Masahiko Sawada <sawada.mshk@gmail.com> wrote: > > > > On Mon, Oct 29, 2018 at 10:16 AM Masahiko Sawada <sawada.mshk@gmail.com> wrote: > > > > > > On Wed, Oct 24, 2018 at 9:06 AM Masahiko Sawada <sawada.mshk@gmail.com> wrote: > > > > > > > > On Tue, Oct 23, 2018 at 12:54 PM Kyotaro HORIGUCHI > > > > <horiguchi.kyotaro@lab.ntt.co.jp> wrote: > > > > > > > > > > Hello. > > > > > > > > > > # It took a long time to come here.. > > > > > > > > > > At Fri, 19 Oct 2018 21:38:35 +0900, Masahiko Sawada <sawada.mshk@gmail.com> wrote in <CAD21AoCBf-AJup-_ARfpqR42gJQ_XjNsvv-XE0rCOCLEkT=HCg@mail.gmail.com> > > > > > > On Wed, Oct 10, 2018 at 1:34 PM Masahiko Sawada <sawada.mshk@gmail.com> wrote: > > > > > ... > > > > > > * Updated docs, added the new section "Distributed Transaction" at > > > > > > Chapter 33 to explain the concept to users > > > > > > > > > > > > * Moved atomic commit codes into src/backend/access/fdwxact directory. > > > > > > > > > > > > * Some bug fixes. > > > > > > > > > > > > Please reivew them. > > > > > > > > > > I have some comments, with apologize in advance for possible > > > > > duplicate or conflict with others' comments so far. > > > > > > > > Thank youf so much for reviewing this patch! > > > > > > > > > > > > > > 0001: > > > > > > > > > > This sets XACT_FLAG_WROTENONTEMPREL when RELPERSISTENT_PERMANENT > > > > > relation is modified. Isn't it needed when UNLOGGED tables are > > > > > modified? It may be better that we have dedicated classification > > > > > macro or function. > > > > > > > > I think even if we do atomic commit for modifying the an UNLOGGED > > > > table and a remote table the data will get inconsistent if the local > > > > server crashes. 
Re: [HACKERS] Transactions involving multiple postgres foreignservers, take 2
On Thu, Nov 15, 2018 at 7:36 PM Masahiko Sawada <sawada.mshk@gmail.com> wrote:
>
> On Mon, Oct 29, 2018 at 6:03 PM Masahiko Sawada <sawada.mshk@gmail.com> wrote:
> >
> > On Mon, Oct 29, 2018 at 10:16 AM Masahiko Sawada <sawada.mshk@gmail.com> wrote:
> > >
> > > On Wed, Oct 24, 2018 at 9:06 AM Masahiko Sawada <sawada.mshk@gmail.com> wrote:
> > > >
> > > > On Tue, Oct 23, 2018 at 12:54 PM Kyotaro HORIGUCHI
> > > > <horiguchi.kyotaro@lab.ntt.co.jp> wrote:
> > > > >
> > > > > Hello.
> > > > >
> > > > > # It took a long time to come here..
> > > > >
> > > > > At Fri, 19 Oct 2018 21:38:35 +0900, Masahiko Sawada <sawada.mshk@gmail.com> wrote in <CAD21AoCBf-AJup-_ARfpqR42gJQ_XjNsvv-XE0rCOCLEkT=HCg@mail.gmail.com>
> > > > > > On Wed, Oct 10, 2018 at 1:34 PM Masahiko Sawada <sawada.mshk@gmail.com> wrote:
> > > > > ...
> > > > > > * Updated docs, added the new section "Distributed Transaction" at
> > > > > > Chapter 33 to explain the concept to users
> > > > > >
> > > > > > * Moved atomic commit codes into src/backend/access/fdwxact directory.
> > > > > >
> > > > > > * Some bug fixes.
> > > > > >
> > > > > > Please review them.
> > > > >
> > > > > I have some comments, with apologize in advance for possible
> > > > > duplicate or conflict with others' comments so far.
> > > >
> > > > Thank you so much for reviewing this patch!
> > > >
> > > > >
> > > > > 0001:
> > > > >
> > > > > This sets XACT_FLAG_WROTENONTEMPREL when RELPERSISTENT_PERMANENT
> > > > > relation is modified. Isn't it needed when UNLOGGED tables are
> > > > > modified? It may be better that we have dedicated classification
> > > > > macro or function.
> > > >
> > > > I think even if we do atomic commit for modifying an UNLOGGED
> > > > table and a remote table, the data will get inconsistent if the local
> > > > server crashes. For example, if the local server crashes after
> > > > preparing the transaction on the foreign server but before the local
> > > > commit, we will lose all data of the local UNLOGGED table whereas the
> > > > modification of the remote table is rolled back. In case of persistent
> > > > tables, data consistency is preserved. So I think keeping data
> > > > consistency between remote data and a local unlogged table is difficult
> > > > and want to leave it as a restriction for now. Am I missing something?
> > > >
> > > > >
> > > > > The flag is handled in heapam.c. I suppose that it should be done
> > > > > in the upper layer considering coming pluggable storage.
> > > > > (X_F_ACCESSEDTEMPREL is set in heapam, but..)
> > > > >
> > > >
> > > > Yeah, or we can set the flag after heap_insert in ExecInsert.
> > > >
> > > > >
> > > > > 0002:
> > > > >
> > > > > The name FdwXactParticipantsForAC doesn't sound good to me. How
> > > > > about FdwXactAtomicCommitParticipants?
> > > >
> > > > +1, will fix it.
> > > >
> > > > >
> > > > > Well, as the file comment of fdwxact.c says,
> > > > > FdwXactRegisterTransaction is called from the FDW driver and
> > > > > F_X_MarkForeignTransactionModified is called from the executor. I
> > > > > think that we should clarify who is responsible for the whole
> > > > > sequence. Since the state of local tables matters, I suppose the
> > > > > executor is. Couldn't we do the whole thing within the executor
> > > > > side? I'm not sure, but I feel that
> > > > > F_X_RegisterForeignTransaction can be a part of
> > > > > F_X_MarkForeignTransactionModified. The callers of
> > > > > MarkForeignTransactionModified can find whether the table is
> > > > > involved in 2PC via the IsTwoPhaseCommitEnabled interface.
> > > >
> > > > Indeed. We can register foreign servers by executor while FDWs don't
> > > > need to register anything. I will remove the registration function so
> > > > that FDW developers don't need to call the register function but only
> > > > need to provide atomic commit APIs.
> > > >
> > > > >
> > > > >
> > > > > > if (foreign_twophase_commit == true &&
> > > > > > ((MyXactFlags & XACT_FLAGS_FDWNOPREPARE) != 0) )
> > > > > > ereport(ERROR,
> > > > > > (errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
> > > > > > errmsg("cannot COMMIT a distributed transaction that has operated on foreign server that doesn't support atomic commit")));
> > > > >
> > > > > The error is emitted when the GUC is turned off in a
> > > > > transaction where MarkTransactionModified has been called. I think
> > > > > that the number of the variables' possible states should be reduced
> > > > > for simplicity. For example, in this case, once foreign_twophase_commit
> > > > > is checked in a transaction, subsequent changes to it
> > > > > should be ignored for the rest of the transaction.
> > > > >
> > > >
> > > > I might not have gotten your comment correctly, but since
> > > > foreign_twophase_commit is a PGC_USERSET parameter, I think we need to
> > > > check it at commit time. Also, we need to keep participant servers even
> > > > when foreign_twophase_commit is off, if both max_prepared_foreign_xacts
> > > > and max_foreign_xact_resolvers are > 0.
> > > >
> > > > I will post the updated patch this week.
> > > >
> > >
> > > Attached the updated version patches.
> > >
> > > Based on the review comment from Horiguchi-san, I've changed the
> > > atomic commit API so that FDW developers who wish to support atomic
> > > commit don't need to call the register function. The atomic commit
> > > APIs are as follows:
> > >
> > > * GetPrepareId
> > > * PrepareForeignTransaction
> > > * CommitForeignTransaction
> > > * RollbackForeignTransaction
> > > * ResolveForeignTransaction
> > > * IsTwophaseCommitEnabled
> > >
> > > All of the APIs except for GetPrepareId are required for atomic commit.
> > >
> > > Also, I've changed the foreign_twophase_commit parameter to an enum
> > > parameter based on the suggestion from Robert[1]. Valid values are
> > > 'required', 'prefer' and 'disabled' (default). When set to either
> > > 'required' or 'prefer', atomic commit will be used. The difference
> > > between 'required' and 'prefer' is that when set to 'required' we
> > > require *all* modified servers to be able to use 2PC, whereas with
> > > 'prefer' we use 2PC where available. So with 'required', if any of the
> > > written participants disables 2PC or doesn't support the atomic commit
> > > API, the transaction fails. IOW, when 'required', we can commit only
> > > when data consistency among all participants can be preserved.
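To make the three-way behavior concrete, here is a small standalone C sketch of the decision logic described above. The type and function names are illustrative, not the patch's actual code, and the local server is ignored for simplicity:

```c
#include <assert.h>
#include <stdbool.h>

typedef enum { FTC_DISABLED, FTC_PREFER, FTC_REQUIRED } ForeignTwophaseCommit;

/* One foreign server participant, with its 2PC capability (illustrative) */
typedef struct Participant
{
    bool modified;         /* did the transaction write to this server? */
    bool twophase_capable; /* does it support and enable 2PC? */
} Participant;

/*
 * Decide whether a distributed commit may proceed and whether it must use
 * two-phase commit.  Returns false when the commit must fail: 'required'
 * was set but some written participant cannot do 2PC.
 */
static bool
decide_commit(ForeignTwophaseCommit guc, const Participant *p, int n,
              bool *use_twophase)
{
    int nmodified = 0, ncapable = 0;

    for (int i = 0; i < n; i++)
    {
        if (p[i].modified)
        {
            nmodified++;
            if (p[i].twophase_capable)
                ncapable++;
        }
    }

    if (guc == FTC_DISABLED || nmodified <= 1)
    {
        *use_twophase = false;  /* plain one-phase commit is enough */
        return true;
    }
    if (guc == FTC_REQUIRED && ncapable < nmodified)
        return false;           /* a written participant cannot do 2PC */

    /* 'prefer': 2PC where available; 'required': everyone is capable */
    *use_twophase = (ncapable > 0);
    return true;
}
```

With two written participants of which only one is 2PC-capable, 'required' fails the commit, 'prefer' still uses 2PC for the capable one, and 'disabled' falls back to one-phase commit everywhere.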
> > >
> > > Please review the patches.
> > >
> >
> > Since the previous patch conflicts with current HEAD, attached is an updated
> > set of patches.
> >
>
> Rebased and fixed a few bugs.
>
I got feedback regarding the transaction management FDW APIs at the Japan
PostgreSQL Developer Meetup[1] and am considering changing these APIs
to make them consistent with the XA interface[2] (xa_prepare(),
xa_commit() and xa_rollback()) as follows[3].
* FdwXactResult PrepareForeignTransaction(FdwXactState *state, int flags)
* FdwXactResult CommitForeignTransaction(FdwXactState *state, int flags)
* FdwXactResult RollbackForeignTransaction(FdwXactState *state, int flags)
* char *GetPrepareId(TransactionId xid, Oid serverid, Oid userid, int
*prep_id_len)
The flags argument carries various settings; currently it would contain only
FDW_XACT_FLAG_ONEPHASE, which requires the FDW to commit in one phase (i.e.
without preparation). The *state argument contains the information necessary
to identify the transaction: serverid, userid, usermappingid and prepared id.
The GetPrepareId API is optional. Also, I've removed the two_phase_commit
parameter from the postgres_fdw options because we can disable the use of the
two-phase commit protocol for distributed transactions via the
distributed_atomic_commit GUC parameter.
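For illustration, the proposed callback set could be declared in C roughly as below. The struct layout, typedef names and the flag value are my assumptions based on the description above, not the actual patch:

```c
#include <assert.h>
#include <stddef.h>

typedef unsigned int Oid;           /* stand-in for PostgreSQL's Oid */
typedef unsigned int TransactionId; /* stand-in for PostgreSQL's TransactionId */

typedef enum { FDW_XACT_OK, FDW_XACT_ERROR } FdwXactResult;

/* Commit in one phase, i.e. without a preceding PREPARE (assumed value) */
#define FDW_XACT_FLAG_ONEPHASE 0x01

/* Information identifying one foreign transaction, per the description above */
typedef struct FdwXactState
{
    Oid   serverid;
    Oid   userid;
    Oid   usermappingid;
    char *prepared_id;
} FdwXactState;

/* The XA-style callback set: prepare/commit/rollback, plus optional id maker */
typedef FdwXactResult (*PrepareForeignTransaction_function) (FdwXactState *state, int flags);
typedef FdwXactResult (*CommitForeignTransaction_function) (FdwXactState *state, int flags);
typedef FdwXactResult (*RollbackForeignTransaction_function) (FdwXactState *state, int flags);
typedef char *(*GetPrepareId_function) (TransactionId xid, Oid serverid,
                                        Oid userid, int *prep_id_len);
```

An FDW that only supports one-phase commit would see FDW_XACT_FLAG_ONEPHASE set in flags on every CommitForeignTransaction/RollbackForeignTransaction call.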
Foreign transactions whose FDW provides both the CommitForeignTransaction
API and the RollbackForeignTransaction API will be managed by the global
transaction manager automatically. In addition, if the FDW also
provides the PrepareForeignTransaction API, it will participate in the
two-phase commit protocol as a participant. So the existing FDWs that don't
provide the transaction management FDW APIs can continue to work as before
even after this patch gets committed.
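The capability-based behavior described here could be sketched as follows; FdwRoutine here is a hypothetical, reduced stand-in for the real FDW routine table:

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, reduced subset of an FDW routine table */
typedef struct FdwRoutine
{
    void *CommitForeignTransaction;
    void *RollbackForeignTransaction;
    void *PrepareForeignTransaction;
} FdwRoutine;

typedef enum
{
    FDW_UNMANAGED,        /* no transaction callbacks: works as before */
    FDW_MANAGED_ONEPHASE, /* commit/rollback only: managed, no 2PC */
    FDW_MANAGED_TWOPHASE  /* also provides prepare: full 2PC participant */
} FdwXactCapability;

/* Classify an FDW by which transaction callbacks it provides */
static FdwXactCapability
classify_fdw(const FdwRoutine *r)
{
    if (r->CommitForeignTransaction == NULL ||
        r->RollbackForeignTransaction == NULL)
        return FDW_UNMANAGED;
    if (r->PrepareForeignTransaction == NULL)
        return FDW_MANAGED_ONEPHASE;
    return FDW_MANAGED_TWOPHASE;
}
```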
The one point I'm concerned about with this API design is that since
both the CommitForeignTransaction API and the RollbackForeignTransaction
API will be used by two different kinds of processes (backend and
transaction resolver processes), it might be hard for FDW developers
to understand them correctly.
I'd like to define new APIs so that FDW developers don't get confused.
Feedback is very welcome.
[1] https://wiki.postgresql.org/wiki/Japan_PostgreSQL_Developer_Meetup
[2] https://en.wikipedia.org/wiki/X/Open_XA
[3] The current API design I'm proposing has 6 APIs: Prepare, Commit,
Rollback, Resolve, IsTwoPhaseEnabled and GetPrepareId. And these APIs
are divided based on who executes them.
Regards,
--
Masahiko Sawada
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center
Re: [HACKERS] Transactions involving multiple postgres foreignservers, take 2
On Tue, Jan 29, 2019 at 5:47 PM Ildar Musin <ildar@adjust.com> wrote: > > Hello, > > The patch needs rebase as it doesn't apply to the current master. I applied it > to the older commit to test it. It worked fine so far. Thank you for testing the patch! > > I found one bug though which would cause resolver to finish by timeout even > though there are unresolved foreign transactions in the list. The > `fdw_xact_exists()` function expects database id as the first argument and xid > as the second. But everywhere it is called arguments specified in the different > order (xid first, then dbid). Also function declaration in header doesn't > match its definition. Will fix. > > There are some other things I found. > * In `FdwXactResolveAllDanglingTransactions()` variable `n_resolved` is > declared as bool but used as integer. > * In fdwxact.c's module comment there are `FdwXactRegisterForeignTransaction()` > and `FdwXactMarkForeignTransactionModified()` functions mentioned that are > not there anymore. > * In documentation (storage.sgml) there is no mention of `pg_fdw_xact` > directory. > > Couple of stylistic notes. > * In `FdwXactCtlData struct` there are both camel case and snake case naming > used. > * In `get_fdw_xacts()` `xid != InvalidTransactionId` can be replaced with > `TransactionIdIsValid(xid)`. > * In `generate_fdw_xact_identifier()` the `fx` prefix could be a part of format > string instead of being processed by `sprintf` as an extra argument. > I'll incorporate them at the next patch set. > I'll continue looking into the patch. Thanks! Thanks. Actually I'm updating the patch set, changing API interface as I proposed before and improving the document and README. I'll submit the latest patch next week. -- Regards, -- Masahiko Sawada NIPPON TELEGRAPH AND TELEPHONE CORPORATION NTT Open Source Software Center
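The last stylistic note in the review above (folding the "fx" prefix into the format string instead of passing it as an extra sprintf argument) could look like this hypothetical reconstruction of generate_fdw_xact_identifier(); the real signature and format in the patch may differ:

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/*
 * Build an identifier for a prepared foreign transaction.  The "fx"
 * prefix is part of the format string, as the review suggests, rather
 * than a separate argument.  (Hypothetical sketch, not the patch code.)
 */
static void
generate_fdw_xact_identifier(char *buf, size_t buflen,
                             unsigned int xid, unsigned int serverid,
                             unsigned int userid)
{
    snprintf(buf, buflen, "fx_%u_%u_%u", xid, serverid, userid);
}
```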
Re: [HACKERS] Transactions involving multiple postgres foreignservers, take 2
On Thu, Jan 31, 2019 at 11:09:09AM +0100, Masahiko Sawada wrote: > Thanks. Actually I'm updating the patch set, changing API interface as > I proposed before and improving the document and README. I'll submit > the latest patch next week. Cool, I have moved the patch to next CF. -- Michael
Attachment
Re: [HACKERS] Transactions involving multiple postgres foreignservers, take 2
On Thu, Jan 31, 2019 at 7:09 PM Masahiko Sawada <sawada.mshk@gmail.com> wrote: > > On Tue, Jan 29, 2019 at 5:47 PM Ildar Musin <ildar@adjust.com> wrote: > > > > Hello, > > > > The patch needs rebase as it doesn't apply to the current master. I applied it > > to the older commit to test it. It worked fine so far. > > Thank you for testing the patch! > > > > > I found one bug though which would cause resolver to finish by timeout even > > though there are unresolved foreign transactions in the list. The > > `fdw_xact_exists()` function expects database id as the first argument and xid > > as the second. But everywhere it is called arguments specified in the different > > order (xid first, then dbid). Also function declaration in header doesn't > > match its definition. > > Will fix. > > > > > There are some other things I found. > > * In `FdwXactResolveAllDanglingTransactions()` variable `n_resolved` is > > declared as bool but used as integer. > > * In fdwxact.c's module comment there are `FdwXactRegisterForeignTransaction()` > > and `FdwXactMarkForeignTransactionModified()` functions mentioned that are > > not there anymore. > > * In documentation (storage.sgml) there is no mention of `pg_fdw_xact` > > directory. > > > > Couple of stylistic notes. > > * In `FdwXactCtlData struct` there are both camel case and snake case naming > > used. > > * In `get_fdw_xacts()` `xid != InvalidTransactionId` can be replaced with > > `TransactionIdIsValid(xid)`. > > * In `generate_fdw_xact_identifier()` the `fx` prefix could be a part of format > > string instead of being processed by `sprintf` as an extra argument. > > > > I'll incorporate them at the next patch set. > > > I'll continue looking into the patch. Thanks! > > Thanks. Actually I'm updating the patch set, changing API interface as > I proposed before and improving the document and README. I'll submit > the latest patch next week. > Sorry for the very late. Attached updated version patches. 
The basic mechanism has not been changed since the previous version, but the
updated version of the patch uses a single wait queue instead of the two queues
(active and retry) that were used in the previous version. Every backend
process has a timestamp in PGPROC (fdwXactNextResolutionTs), which is the time
at which it expects to be processed by a foreign transaction resolver process.
Entries in the wait queue are ordered by their timestamps. The wait queue and
the timestamp are used after a backend process has prepared all transactions on
foreign servers and is waiting for all of them to be resolved.

Backend processes that are committing/aborting a distributed transaction insert
themselves into the wait queue (FdwXactRslvCtl->fdwxact_queue) with the current
timestamp, and then request to launch a new resolver process if one is not
launched yet. If there is already a resolver connected to the same database,
they just set its latch. The wait queue is protected by the LWLock
FdwXactResolutionLock. The backend then sleeps until either the user requests
a cancel (presses ctrl-c) or it is woken up by the resolver process.

The foreign transaction resolver process keeps polling the wait queue, checking
whether there is any waiter on the database that the resolver process connects
to. If there is a waiter, it fetches it and checks its timestamp. If the
current time has passed that timestamp, the resolver process starts to resolve
all of its foreign transactions. Usually backend processes insert themselves
into the wait queue first and then wake up the resolver, and they use the same
wall clock, so the resolver can fetch the waiter just inserted. Once all
foreign transactions are resolved, the resolver process deletes the backend
entry from the wait queue and then wakes up the waiting backend. On failure
during foreign transaction resolution, while the backend is still sleeping, the
resolver process removes the backend and re-inserts it with a new timestamp
(its timestamp plus foreign_transaction_resolution_interval) at the appropriate
position in the wait queue.
This mechanism ensures that a distributed transaction is resolved as soon as
the waiter is inserted, while ensuring that the resolver can retry resolving
failed foreign transactions at an interval of
foreign_transaction_resolution_interval.

For handling in-doubt transactions, I've removed the automatic foreign
transaction resolution code from the first version of the patch since it's not
an essential feature and we can add it later. Therefore the user needs to
resolve unresolved foreign transactions manually using the
pg_resolve_fdwxacts() function in three cases: where the foreign server crashed
or we lost connectivity to it while preparing the foreign transaction, where
the coordinator node crashed while preparing/resolving the foreign transaction,
and where the user canceled resolving the foreign transaction.

Foreign transaction resolver processes exit if they have had no foreign
transaction to resolve for longer than foreign_transaction_resolver_timeout.
Since we cannot drop a database while a resolver process is connected to it, we
can stop a resolver by calling the pg_stop_fdwxact_resolver() function.

The comment at the top of the fdwxact.c file describes the locking mechanism
and recovery, and src/backend/fdwxact/README describes the status transitions
of FdwXact. Also, the wiki page[1] describes how to use this feature with some
examples.

[1] https://wiki.postgresql.org/wiki/Atomic_Commit_of_Distributed_Transactions

Regards,

--
Masahiko Sawada
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center
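As a rough illustration of the timestamp-ordered wait queue and the retry behavior described above (all names and structures here are simplified stand-ins, not the patch's code):

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

typedef long TimestampTz;   /* stand-in for PostgreSQL's TimestampTz */

typedef struct Waiter
{
    int            backend_id;
    TimestampTz    next_resolution_ts; /* like fdwXactNextResolutionTs */
    struct Waiter *next;
} Waiter;

/* Insert keeping the queue ordered by timestamp, as the text describes */
static void
queue_insert(Waiter **head, Waiter *w)
{
    while (*head && (*head)->next_resolution_ts <= w->next_resolution_ts)
        head = &(*head)->next;
    w->next = *head;
    *head = w;
}

/*
 * One polling step of the resolver: if the head waiter's timestamp has
 * passed, try to resolve it.  On failure it is re-inserted with the
 * retry interval added; on success the sleeping backend would be woken.
 */
static void
resolver_poll(Waiter **head, TimestampTz now, TimestampTz retry_interval,
              bool (*resolve)(Waiter *))
{
    Waiter *w = *head;

    if (w == NULL || w->next_resolution_ts > now)
        return;                 /* nothing due yet */

    *head = w->next;            /* dequeue */
    if (!resolve(w))
    {
        w->next_resolution_ts = now + retry_interval;
        queue_insert(head, w);  /* retry later, at the interval above */
    }
    /* on success the waiting backend would be woken up here */
}

/* trivial resolver callbacks for illustration */
static bool resolve_ok(Waiter *w)   { (void) w; return true; }
static bool resolve_fail(Waiter *w) { (void) w; return false; }
```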
Attachment
Re: [HACKERS] Transactions involving multiple postgres foreignservers, take 2
On Wed, Apr 17, 2019 at 10:23 PM Masahiko Sawada <sawada.mshk@gmail.com> wrote: > Sorry for the very late. Attached updated version patches. Hello Sawada-san, Can we please have a fresh rebase? Thanks, -- Thomas Munro https://enterprisedb.com
Re: [HACKERS] Transactions involving multiple postgres foreignservers, take 2
On Mon, Jul 1, 2019 at 8:32 PM Thomas Munro <thomas.munro@gmail.com> wrote: > > On Wed, Apr 17, 2019 at 10:23 PM Masahiko Sawada <sawada.mshk@gmail.com> wrote: > > Sorry for the very late. Attached updated version patches. > > Hello Sawada-san, > > Can we please have a fresh rebase? > Thank you for the notice. Attached rebased patches. Regards, -- Masahiko Sawada NIPPON TELEGRAPH AND TELEPHONE CORPORATION NTT Open Source Software Center
Attachment
Re: [HACKERS] Transactions involving multiple postgres foreignservers, take 2
Hello Sawada-san, On 2019-Jul-02, Masahiko Sawada wrote: > On Mon, Jul 1, 2019 at 8:32 PM Thomas Munro <thomas.munro@gmail.com> wrote: > > Can we please have a fresh rebase? > > Thank you for the notice. Attached rebased patches. ... and again? -- Álvaro Herrera https://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Re: [HACKERS] Transactions involving multiple postgres foreignservers, take 2
On Wed, Sep 4, 2019 at 7:36 AM Alvaro Herrera <alvherre@2ndquadrant.com> wrote: > > Hello Sawada-san, > > On 2019-Jul-02, Masahiko Sawada wrote: > > > On Mon, Jul 1, 2019 at 8:32 PM Thomas Munro <thomas.munro@gmail.com> wrote: > > > > Can we please have a fresh rebase? > > > > Thank you for the notice. Attached rebased patches. > > ... and again? > Thank you for the notice. I've attached rebased patch set. Regards, -- Masahiko Sawada NIPPON TELEGRAPH AND TELEPHONE CORPORATION NTT Open Source Software Center
Attachment
Re: [HACKERS] Transactions involving multiple postgres foreignservers, take 2
On Wed, Sep 4, 2019 at 10:43 AM Masahiko Sawada <sawada.mshk@gmail.com> wrote: > > On Wed, Sep 4, 2019 at 7:36 AM Alvaro Herrera <alvherre@2ndquadrant.com> wrote: > > > > Hello Sawada-san, > > > > On 2019-Jul-02, Masahiko Sawada wrote: > > > > > On Mon, Jul 1, 2019 at 8:32 PM Thomas Munro <thomas.munro@gmail.com> wrote: > > > > > > Can we please have a fresh rebase? > > > > > > Thank you for the notice. Attached rebased patches. > > > > ... and again? > > > > Thank you for the notice. I've attached rebased patch set. I forgot to include some new header files. Attached the updated patches. Regards, -- Masahiko Sawada NIPPON TELEGRAPH AND TELEPHONE CORPORATION NTT Open Source Software Center
Attachment
Re: [HACKERS] Transactions involving multiple postgres foreignservers, take 2
On Wed, Sep 04, 2019 at 12:44:20PM +0900, Masahiko Sawada wrote: > I forgot to include some new header files. Attached the updated patches. No reviews since and the patch does not apply anymore. I am moving it to next CF, waiting on author. -- Michael
Attachment
Hello. This is the reased (and a bit fixed) version of the patch. This applies on the master HEAD and passes all provided tests. I took over this work from Sawada-san. I'll begin with reviewing the current patch. regards. -- Kyotaro Horiguchi NTT Open Source Software Center From 733f1e413ef2b2fe1d3ecba41eb4cd8e355ab826 Mon Sep 17 00:00:00 2001 From: Kyotaro Horiguchi <horikyota.ntt@gmail.com> Date: Thu, 5 Dec 2019 16:59:47 +0900 Subject: [PATCH v26 1/5] Keep track of writing on non-temporary relation Original Author: Masahiko Sawada <sawada.mshk@gmail.com> --- src/backend/executor/nodeModifyTable.c | 12 ++++++++++++ src/include/access/xact.h | 6 ++++++ 2 files changed, 18 insertions(+) diff --git a/src/backend/executor/nodeModifyTable.c b/src/backend/executor/nodeModifyTable.c index e3eb9d7b90..cd91f9c8a8 100644 --- a/src/backend/executor/nodeModifyTable.c +++ b/src/backend/executor/nodeModifyTable.c @@ -587,6 +587,10 @@ ExecInsert(ModifyTableState *mtstate, estate->es_output_cid, 0, NULL); + /* Make note that we've wrote on non-temprary relation */ + if (RelationNeedsWAL(resultRelationDesc)) + MyXactFlags |= XACT_FLAGS_WROTENONTEMPREL; + /* insert index entries for tuple */ if (resultRelInfo->ri_NumIndices > 0) recheckIndexes = ExecInsertIndexTuples(slot, estate, false, NULL, @@ -938,6 +942,10 @@ ldelete:; if (tupleDeleted) *tupleDeleted = true; + /* Make note that we've wrote on non-temprary relation */ + if (RelationNeedsWAL(resultRelationDesc)) + MyXactFlags |= XACT_FLAGS_WROTENONTEMPREL; + /* * If this delete is the result of a partition key update that moved the * tuple to a new partition, put this row into the transition OLD TABLE, @@ -1447,6 +1455,10 @@ lreplace:; recheckIndexes = ExecInsertIndexTuples(slot, estate, false, NULL, NIL); } + /* Make note that we've wrote on non-temprary relation */ + if (RelationNeedsWAL(resultRelationDesc)) + MyXactFlags |= XACT_FLAGS_WROTENONTEMPREL; + if (canSetTag) (estate->es_processed)++; diff --git 
a/src/include/access/xact.h b/src/include/access/xact.h index 9d2899dea1..cb5c4935d2 100644 --- a/src/include/access/xact.h +++ b/src/include/access/xact.h @@ -102,6 +102,12 @@ extern int MyXactFlags; */ #define XACT_FLAGS_ACQUIREDACCESSEXCLUSIVELOCK (1U << 1) +/* + * XACT_FLAGS_WROTENONTEMPREL - set when we wrote data on non-temporary + * relation. + */ +#define XACT_FLAGS_WROTENONTEMPREL (1U << 2) + /* * start- and end-of-transaction callbacks for dynamically loaded modules */ -- 2.23.0 From d21c72a7db85c2211504f60fca8d39c0bd0ee5a6 Mon Sep 17 00:00:00 2001 From: Kyotaro Horiguchi <horikyota.ntt@gmail.com> Date: Thu, 5 Dec 2019 17:00:50 +0900 Subject: [PATCH v26 2/5] Support atomic commit among multiple foreign servers. Original Author: Masahiko Sawada <sawada.mshk@gmail.com> --- src/backend/access/Makefile | 2 +- src/backend/access/fdwxact/Makefile | 17 + src/backend/access/fdwxact/README | 130 + src/backend/access/fdwxact/fdwxact.c | 2816 +++++++++++++++++ src/backend/access/fdwxact/launcher.c | 644 ++++ src/backend/access/fdwxact/resolver.c | 344 ++ src/backend/access/rmgrdesc/Makefile | 1 + src/backend/access/rmgrdesc/fdwxactdesc.c | 58 + src/backend/access/rmgrdesc/xlogdesc.c | 6 +- src/backend/access/transam/rmgr.c | 1 + src/backend/access/transam/twophase.c | 42 + src/backend/access/transam/xact.c | 27 +- src/backend/access/transam/xlog.c | 34 +- src/backend/catalog/system_views.sql | 11 + src/backend/commands/copy.c | 6 + src/backend/commands/foreigncmds.c | 30 + src/backend/executor/execPartition.c | 8 + src/backend/executor/nodeForeignscan.c | 24 + src/backend/executor/nodeModifyTable.c | 18 + src/backend/foreign/foreign.c | 57 + src/backend/postmaster/bgworker.c | 8 + src/backend/postmaster/pgstat.c | 20 + src/backend/postmaster/postmaster.c | 15 +- src/backend/replication/logical/decode.c | 1 + src/backend/storage/ipc/ipci.c | 6 + src/backend/storage/ipc/procarray.c | 46 + src/backend/storage/lmgr/lwlocknames.txt | 3 + src/backend/storage/lmgr/proc.c | 
8 + src/backend/tcop/postgres.c | 14 + src/backend/utils/misc/guc.c | 82 + src/backend/utils/misc/postgresql.conf.sample | 16 + src/backend/utils/probes.d | 2 + src/bin/initdb/initdb.c | 1 + src/bin/pg_controldata/pg_controldata.c | 2 + src/bin/pg_resetwal/pg_resetwal.c | 2 + src/bin/pg_waldump/fdwxactdesc.c | 1 + src/bin/pg_waldump/rmgrdesc.c | 1 + src/include/access/fdwxact.h | 165 + src/include/access/fdwxact_launcher.h | 29 + src/include/access/fdwxact_resolver.h | 23 + src/include/access/fdwxact_xlog.h | 54 + src/include/access/resolver_internal.h | 66 + src/include/access/rmgrlist.h | 1 + src/include/access/twophase.h | 1 + src/include/access/xact.h | 7 + src/include/access/xlog_internal.h | 1 + src/include/catalog/pg_control.h | 1 + src/include/catalog/pg_proc.dat | 29 + src/include/foreign/fdwapi.h | 12 + src/include/foreign/foreign.h | 1 + src/include/pgstat.h | 9 +- src/include/storage/proc.h | 11 + src/include/storage/procarray.h | 5 + src/include/utils/guc_tables.h | 3 + src/test/regress/expected/rules.out | 13 + 55 files changed, 4917 insertions(+), 18 deletions(-) create mode 100644 src/backend/access/fdwxact/Makefile create mode 100644 src/backend/access/fdwxact/README create mode 100644 src/backend/access/fdwxact/fdwxact.c create mode 100644 src/backend/access/fdwxact/launcher.c create mode 100644 src/backend/access/fdwxact/resolver.c create mode 100644 src/backend/access/rmgrdesc/fdwxactdesc.c create mode 120000 src/bin/pg_waldump/fdwxactdesc.c create mode 100644 src/include/access/fdwxact.h create mode 100644 src/include/access/fdwxact_launcher.h create mode 100644 src/include/access/fdwxact_resolver.h create mode 100644 src/include/access/fdwxact_xlog.h create mode 100644 src/include/access/resolver_internal.h diff --git a/src/backend/access/Makefile b/src/backend/access/Makefile index 0880e0a8bb..49480dd039 100644 --- a/src/backend/access/Makefile +++ b/src/backend/access/Makefile @@ -9,6 +9,6 @@ top_builddir = ../../.. 
include $(top_builddir)/src/Makefile.global SUBDIRS = brin common gin gist hash heap index nbtree rmgrdesc spgist \ - table tablesample transam + table tablesample transam fdwxact include $(top_srcdir)/src/backend/common.mk diff --git a/src/backend/access/fdwxact/Makefile b/src/backend/access/fdwxact/Makefile new file mode 100644 index 0000000000..0207a66fb4 --- /dev/null +++ b/src/backend/access/fdwxact/Makefile @@ -0,0 +1,17 @@ +#------------------------------------------------------------------------- +# +# Makefile-- +# Makefile for access/fdwxact +# +# IDENTIFICATION +# src/backend/access/fdwxact/Makefile +# +#------------------------------------------------------------------------- + +subdir = src/backend/access/fdwxact +top_builddir = ../../../.. +include $(top_builddir)/src/Makefile.global + +OBJS = fdwxact.o resolver.o launcher.o + +include $(top_srcdir)/src/backend/common.mk diff --git a/src/backend/access/fdwxact/README b/src/backend/access/fdwxact/README new file mode 100644 index 0000000000..46ccb7eeae --- /dev/null +++ b/src/backend/access/fdwxact/README @@ -0,0 +1,130 @@ +src/backend/access/fdwxact/README + +Atomic Commit for Distributed Transactions +=========================================== + +The atomic commit feature enables us to commit and rollback either all of +foreign servers or nothing. This ensures that the database data is always left +in a conssitent state in term of federated database. + + +Commit Sequence of Global Transactions +-------------------------------- + +We employee two-phase commit protocol to achieve commit among all foreign +servers atomically. The sequence of distributed transaction commit consisnts +of the following four steps: + +1. 
Foriegn Server Registration +During executor node initialization, accessed foreign servers are registered +to the list FdwXactAtomicCommitParticipants, which is maintained by +PostgreSQL's the global transaction manager (GTM), as a distributed transaction +participant The registered foreign transactions are tracked until the end of +transaction. + +2. Pre-Commit phase (1st phase of two-phase commit) +we record the corresponding WAL indicating that the foreign server is involved +with the current transaction before doing PREPARE all foreign transactions. +Thus in case we loose connectivity to the foreign server or crash ourselves, +we will remember that we might have prepared tranascation on the foreign +server, and try to resolve it when connectivity is restored or after crash +recovery. + +The two-phase commit is required only if the transaction modified two or more +servers including the local node. In other case, we can commit them at this +step by calling CommitForeignTransaction() API and no need further operation. + +After that we prepare all foreign transactions by calling +PrepareForeignTransaction() API. If we failed on any of them we change to +rollback, therefore at this time some participants might be prepared whereas +some are not prepared. The former foreign transactions need to be resolved +using pg_resolve_foreign_xact() manually and the latter ends transaction +in one-phase by calling RollbackForeignTransaction() API. + +3. Commit locally +Once we've prepared all of them, commit the transaction locally. + +4. Post-Commit Phase (2nd phase of two-phase commit) +The steps so far are done by the backend process committing the transaction but +this resolution step(commit or rollback) is done by the foreign transaction +resolver process. The backend process inserts itselft to the wait queue, and +then wake up the resolver process (or request to launch new one if necessary). 
+The resolver process dequeues the waiter and fetches the distributed transaction +information that the backend is waiting for. Once all foreign transactions are +committed or rolled back, the resolver process wakes up the waiter. + + +API Contract With Transaction Management Callback Functions +----------------------------------------------------------- + +The core GTM manages the status of individual foreign transactions and calls +transaction management callback functions according to their status. The +callback functions PrepareForeignTransaction, CommitForeignTransaction and +RollbackForeignTransaction are responsible for PREPARE, COMMIT and +ROLLBACK of the transaction on the foreign server, respectively. +FdwXactRslvState->flags can contain FDWXACT_FLAG_ONEPHASE, meaning the FDW can +commit or rollback the foreign transaction in one phase. On failure while +processing a foreign transaction, the FDW needs to raise an error. However, the FDW +must accept an ERRCODE_UNDEFINED_OBJECT error while committing or rolling back a +foreign transaction, because there is a race condition in which the coordinator +could crash between completing the resolution and writing the WAL record +removing the FdwXact entry. + + +Foreign Transactions Status +---------------------------- + +Every foreign transaction has an FdwXact entry. When preparing a foreign +transaction, an FdwXact entry whose status starts at FDWXACT_STATUS_INITIAL +is created with WAL logging. The status changes to FDWXACT_STATUS_PREPARED +after the foreign transaction is prepared, and it changes to +FDWXACT_STATUS_PREPARING, FDWXACT_STATUS_COMMITTING and FDWXACT_STATUS_ABORTING +before the foreign transaction is prepared, committed and aborted by the FDW +callback functions, respectively (*1). The status then changes to +FDWXACT_STATUS_RESOLVED once the foreign transaction is resolved, and then +the corresponding FdwXact entry is removed with WAL logging. If we fail while +processing a foreign transaction (e.g.
preparing, committing or aborting), the +status changes back to the previous one. Therefore the +FDWXACT_STATUS_xxxING statuses appear only while the foreign transaction is being +processed by an FDW callback function. + +FdwXact entries recovered during recovery are marked as in-doubt if the +corresponding local transaction is not a prepared transaction. The initial +status is FDWXACT_STATUS_PREPARED (*2). Because the foreign transaction was +being processed, we cannot know the exact status, so we regard it as PREPARED +for safety. + +The foreign transaction status transition is illustrated by the following graph +describing the FdwXact->status: + + +----------------------------------------------------+ + | INVALID | + +----------------------------------------------------+ + | | | + | v | + | +---------------------+ | + | | INITIAL | | + | +---------------------+ | + (*2) | (*2) + | v | + | +---------------------+ | + | | PREPARING(*1) | | + | +---------------------+ | + | | | + v v v + +----------------------------------------------------+ + | PREPARED | + +----------------------------------------------------+ + | | + v v + +--------------------+ +--------------------+ + | COMMITTING(*1) | | ABORTING(*1) | + +--------------------+ +--------------------+ + | | + v v + +----------------------------------------------------+ + | RESOLVED | + +----------------------------------------------------+ + +(*1) Statuses that appear only while being processed by the FDW +(*2) Paths for recovered FdwXact entries diff --git a/src/backend/access/fdwxact/fdwxact.c b/src/backend/access/fdwxact/fdwxact.c new file mode 100644 index 0000000000..058a416f81 --- /dev/null +++ b/src/backend/access/fdwxact/fdwxact.c @@ -0,0 +1,2816 @@ +/*------------------------------------------------------------------------- + * + * fdwxact.c + * PostgreSQL global transaction manager for foreign servers.
+ * + * To achieve commit among all foreign servers atomically, we employ the + * two-phase commit protocol, which is a type of atomic commitment + * protocol (ACP). The basic strategy is that we prepare all of the remote + * transactions before committing locally and commit them after committing + * locally. + * + * During executor node initialization, executor nodes can register a foreign server + * by calling either RegisterFdwXactByRelId() or RegisterFdwXactByServerId() + * to make it participate in a group for global commit. A foreign server is + * registered if its FDW has both the CommitForeignTransaction API and the + * RollbackForeignTransaction API. Registered participant servers are identified + * by the OIDs of the foreign server and user. + * + * During pre-commit of the local transaction, we prepare the transaction on + * every foreign server. And after committing or rolling back locally, + * we notify the resolver process and tell it to commit or rollback those + * transactions. If we ask it to commit, we also tell it to notify us when + * it's done, so that we can wait interruptibly for it to finish, and so + * that we're not trying to locally do work that might fail after the foreign + * transactions are committed. + * + * The best performing way to manage the waiting backends is to have a + * queue of waiting backends, so that we can avoid searching through all + * foreign transactions each time we receive a request. We have one queue + * whose elements are ordered by the timestamp at which they expect to be + * processed. Before waiting for foreign transactions to be resolved, the + * backend enqueues itself with the timestamp at which it expects to be processed. + * Similarly, if it fails to resolve them, it enqueues again with a new timestamp + * (its timestamp + foreign_xact_resolution_interval). + * + * If a network failure or server crash occurs, or the user stops waiting, + * prepared foreign transactions are left in an in-doubt state (aka in-doubt + * transactions).
Foreign transactions in the in-doubt state are not resolved + * automatically, so they must be processed manually using the pg_resolve_fdwxact() + * function. + * + * The two-phase commit protocol is required if the transaction modified two or + * more servers including itself. Otherwise, all foreign transactions are + * committed or rolled back during pre-commit. + * + * LOCKING + * + * Whenever a foreign transaction is processed by the FDW, the corresponding + * FdwXact entry is updated. In order to protect the entry from concurrent + * removal we need to hold a lock on the entry or a lock on the entire global + * array. However, we don't want to hold the lock while the FDW is processing the + * foreign transaction, which may take an unpredictable amount of time. To avoid this, the + * in-memory data of foreign transactions follows a locking model based on + * four linked concepts: + * + * * A foreign transaction's status variable is switched using the LWLock + * FdwXactLock, which needs to be held in exclusive mode when updating the + * status, while readers need to hold it in shared mode when looking at the + * status. + * * A process that is going to update an FdwXact entry cannot process a foreign + * transaction that is being resolved. + * * So setting the status to FDWXACT_STATUS_PREPARING, + * FDWXACT_STATUS_COMMITTING or FDWXACT_STATUS_ABORTING, which are the foreign + * transaction in-progress states, means owning the FdwXact entry, which + * protects it from being updated or removed by concurrent writers. + * * Individual fields are protected by a mutex, and only the backend owning + * the foreign transaction is authorized to update the fields. + + * Therefore, before doing PREPARE, COMMIT PREPARED or ROLLBACK PREPARED, a + * process that is going to call the transaction callback functions needs to change + * the status to the corresponding status above while holding FdwXactLock in + * exclusive mode, and call the callback function after releasing the lock.
+ * + * RECOVERY + * + * During WAL replay and replication, FdwXactCtl also holds information about + * active prepared foreign transactions that haven't been moved to disk yet. + * + * Replay of fdwxact records happens by the following rules: + * + * * At the beginning of recovery, pg_fdwxacts is scanned once, filling FdwXact + * with entries marked with fdwxact->inredo and fdwxact->ondisk. FdwXact file + * data older than the XID horizon of the redo position are discarded. + * * On PREPARE redo, the foreign transaction is added to FdwXactCtl->fdwxacts. + * We set fdwxact->inredo to true for such entries. + * * On Checkpoint we iterate through FdwXactCtl->fdwxacts entries that + * have fdwxact->inredo set and are behind the redo_horizon. We save + * them to disk and then set fdwxact->ondisk to true. + * * On resolution we delete the entry from FdwXactCtl->fdwxacts. If + * fdwxact->ondisk is true, the corresponding entry from the disk is + * additionally deleted. + * * RecoverFdwXacts() and PrescanFdwXacts() have been modified to go through + * fdwxact->inredo entries that have not made it to disk.
+ * + * These replay rules are borrowed from twophase.c + * + * Portions Copyright (c) 2019, PostgreSQL Global Development Group + * + * IDENTIFICATION + * src/backend/access/fdwxact/fdwxact.c + *------------------------------------------------------------------------- + */ +#include "postgres.h" + +#include <sys/types.h> +#include <sys/stat.h> +#include <unistd.h> + +#include "access/fdwxact.h" +#include "access/fdwxact_resolver.h" +#include "access/fdwxact_launcher.h" +#include "access/fdwxact_xlog.h" +#include "access/resolver_internal.h" +#include "access/heapam.h" +#include "access/htup_details.h" +#include "access/twophase.h" +#include "access/xact.h" +#include "access/xlog.h" +#include "access/xloginsert.h" +#include "access/xlogutils.h" +#include "catalog/pg_type.h" +#include "foreign/fdwapi.h" +#include "foreign/foreign.h" +#include "funcapi.h" +#include "libpq/pqsignal.h" +#include "miscadmin.h" +#include "parser/parsetree.h" +#include "pg_trace.h" +#include "pgstat.h" +#include "storage/fd.h" +#include "storage/ipc.h" +#include "storage/latch.h" +#include "storage/lock.h" +#include "storage/proc.h" +#include "storage/procarray.h" +#include "storage/pmsignal.h" +#include "storage/shmem.h" +#include "tcop/tcopprot.h" +#include "utils/builtins.h" +#include "utils/guc.h" +#include "utils/memutils.h" +#include "utils/ps_status.h" +#include "utils/rel.h" +#include "utils/snapmgr.h" + +/* Atomic commit is enabled by configuration */ +#define IsForeignTwophaseCommitEnabled() \ + (max_prepared_foreign_xacts > 0 && \ + max_foreign_xact_resolvers > 0) + +/* Foreign twophase commit is enabled and requested by user */ +#define IsForeignTwophaseCommitRequested() \ + (IsForeignTwophaseCommitEnabled() && \ + (foreign_twophase_commit > FOREIGN_TWOPHASE_COMMIT_DISABLED)) + +/* Check the FdwXactParticipant is capable of two-phase commit */ +#define IsSeverCapableOfTwophaseCommit(fdw_part) \ + (((FdwXactParticipant *)(fdw_part))->prepare_foreign_xact_fn != NULL) + +/* Check 
the FdwXact is being resolved */ +#define FdwXactIsBeingResolved(fx) \ + (((((FdwXact)(fx))->status) == FDWXACT_STATUS_PREPARING) || \ + ((((FdwXact)(fx))->status) == FDWXACT_STATUS_COMMITTING) || \ + ((((FdwXact)(fx))->status) == FDWXACT_STATUS_ABORTING)) + +/* + * Structure to bundle the foreign transaction participant. This struct + * is created at the beginning of execution for each foreign server and + * is used until the end of the transaction, when we cannot look at syscaches. + * Therefore, this is allocated in the TopTransactionContext. + */ +typedef struct FdwXactParticipant +{ + /* + * Pointer to a FdwXact entry in the global array. NULL if the entry + * is not inserted yet but this is registered as a participant. + */ + FdwXact fdwxact; + + /* Foreign server and user mapping info, passed to callback routines */ + ForeignServer *server; + UserMapping *usermapping; + + /* Transaction identifier used for PREPARE */ + char *fdwxact_id; + + /* true if we modified the data on the server */ + bool modified; + + /* Callbacks for foreign transaction */ + PrepareForeignTransaction_function prepare_foreign_xact_fn; + CommitForeignTransaction_function commit_foreign_xact_fn; + RollbackForeignTransaction_function rollback_foreign_xact_fn; + GetPrepareId_function get_prepareid_fn; +} FdwXactParticipant; + +/* + * List of foreign transaction participants for atomic commit. This list + * has only foreign servers that provide transaction management callbacks, + * that is CommitForeignTransaction and RollbackForeignTransaction. + */ +static List *FdwXactParticipants = NIL; +static bool ForeignTwophaseCommitIsRequired = false; + +/* Directory where the foreign prepared transaction files will reside */ +#define FDWXACTS_DIR "pg_fdwxact" + +/* + * The name of a foreign prepared transaction file is the 8-hex-digit + * database oid, xid, foreign server oid and user oid, separated by '_'.
+ * + * Since FdwXact stat file is created per foreign transaction in a + * distributed transaction and the xid of unresolved distributed + * transaction never reused, the name is fairly enough to ensure + * uniqueness. + */ +#define FDWXACT_FILE_NAME_LEN (8 + 1 + 8 + 1 + 8 + 1 + 8) +#define FdwXactFilePath(path, dbid, xid, serverid, userid) \ + snprintf(path, MAXPGPATH, FDWXACTS_DIR "/%08X_%08X_%08X_%08X", \ + dbid, xid, serverid, userid) + +/* Guc parameters */ +int max_prepared_foreign_xacts = 0; +int max_foreign_xact_resolvers = 0; +int foreign_twophase_commit = FOREIGN_TWOPHASE_COMMIT_DISABLED; + +/* Keep track of registering process exit call back. */ +static bool fdwXactExitRegistered = false; + +static FdwXact FdwXactInsertFdwXactEntry(TransactionId xid, + FdwXactParticipant *fdw_part); +static void FdwXactPrepareForeignTransactions(void); +static void FdwXactOnePhaseEndForeignTransaction(FdwXactParticipant *fdw_part, + bool for_commit); +static void FdwXactResolveForeignTransaction(FdwXact fdwxact, + FdwXactRslvState *state, + FdwXactStatus fallback_status); +static void FdwXactComputeRequiredXmin(void); +static void FdwXactCancelWait(void); +static void FdwXactRedoAdd(char *buf, XLogRecPtr start_lsn, XLogRecPtr end_lsn); +static void FdwXactRedoRemove(Oid dbid, TransactionId xid, Oid serverid, + Oid userid, bool give_warnings); +static void FdwXactQueueInsert(PGPROC *waiter); +static void AtProcExit_FdwXact(int code, Datum arg); +static void ForgetAllFdwXactParticipants(void); +static char *ReadFdwXactFile(Oid dbid, TransactionId xid, Oid serverid, + Oid userid); +static void RemoveFdwXactFile(Oid dbid, TransactionId xid, Oid serverid, + Oid userid, bool giveWarning); +static void RecreateFdwXactFile(Oid dbid, TransactionId xid, Oid serverid, + Oid userid, void *content, int len); +static void XlogReadFdwXactData(XLogRecPtr lsn, char **buf, int *len); +static char *ProcessFdwXactBuffer(Oid dbid, TransactionId local_xid, + Oid serverid, Oid userid, + 
XLogRecPtr insert_start_lsn, + bool from_disk); +static void FdwXactDetermineTransactionFate(FdwXact fdwxact, bool need_lock); +static bool is_foreign_twophase_commit_required(void); +static void register_fdwxact(Oid serverid, Oid userid, bool modified); +static List *get_fdwxacts(Oid dbid, TransactionId xid, Oid serverid, Oid userid, + bool including_indoubts, bool include_in_progress, + bool need_lock); +static FdwXact get_all_fdwxacts(int *num_p); +static FdwXact insert_fdwxact(Oid dbid, TransactionId xid, Oid serverid, + Oid userid, Oid umid, char *fdwxact_id); +static char *get_fdwxact_identifier(FdwXactParticipant *fdw_part, + TransactionId xid); +static void remove_fdwxact(FdwXact fdwxact); +static FdwXact get_fdwxact_to_resolve(Oid dbid, TransactionId xid); +static FdwXactRslvState *create_fdwxact_state(void); + +#ifdef USE_ASSERT_CHECKING +static bool FdwXactQueueIsOrderedByTimestamp(void); +#endif + +/* + * Remember accessed foreign transaction. Both RegisterFdwXactByRelId and + * RegisterFdwXactByServerId are called by executor during initialization. + */ +void +RegisterFdwXactByRelId(Oid relid, bool modified) +{ + Relation rel; + Oid serverid; + Oid userid; + + rel = relation_open(relid, NoLock); + serverid = GetForeignServerIdByRelId(relid); + userid = rel->rd_rel->relowner ? rel->rd_rel->relowner : GetUserId(); + relation_close(rel, NoLock); + + register_fdwxact(serverid, userid, modified); +} + +void +RegisterFdwXactByServerId(Oid serverid, bool modified) +{ + register_fdwxact(serverid, GetUserId(), modified); +} + +/* + * Register given foreign transaction identified by given arguments as + * a participant of the transaction. + * + * The foreign transaction identified by given server id and user id. + * Registered foreign transactions are managed by the global transaction + * manager until the end of the transaction. 
+ */ +static void +register_fdwxact(Oid serverid, Oid userid, bool modified) +{ + FdwXactParticipant *fdw_part; + ForeignServer *foreign_server; + UserMapping *user_mapping; + MemoryContext old_ctx; + FdwRoutine *routine; + ListCell *lc; + + foreach(lc, FdwXactParticipants) + { + FdwXactParticipant *fdw_part = (FdwXactParticipant *) lfirst(lc); + + if (fdw_part->server->serverid == serverid && + fdw_part->usermapping->userid == userid) + { + /* The foreign server is already registered, return */ + fdw_part->modified |= modified; + return; + } + } + + /* + * Participant's information is also needed at the end of a transaction, + * where system cache are not available. Save it in TopTransactionContext + * so that these can live until the end of transaction. + */ + old_ctx = MemoryContextSwitchTo(TopTransactionContext); + routine = GetFdwRoutineByServerId(serverid); + + /* + * Don't register foreign server if it doesn't provide both commit and + * rollback transaction management callbacks. + */ + if (!routine->CommitForeignTransaction || + !routine->RollbackForeignTransaction) + { + MyXactFlags |= XACT_FLAGS_FDWNOPREPARE; + pfree(routine); + return; + } + + /* + * Remember we touched the foreign server that is not capable of two-phase + * commit. 
+ */ + if (!routine->PrepareForeignTransaction) + MyXactFlags |= XACT_FLAGS_FDWNOPREPARE; + + foreign_server = GetForeignServer(serverid); + user_mapping = GetUserMapping(userid, serverid); + + + fdw_part = (FdwXactParticipant *) palloc(sizeof(FdwXactParticipant)); + + fdw_part->fdwxact_id = NULL; + fdw_part->server = foreign_server; + fdw_part->usermapping = user_mapping; + fdw_part->fdwxact = NULL; + fdw_part->modified = modified; + fdw_part->prepare_foreign_xact_fn = routine->PrepareForeignTransaction; + fdw_part->commit_foreign_xact_fn = routine->CommitForeignTransaction; + fdw_part->rollback_foreign_xact_fn = routine->RollbackForeignTransaction; + fdw_part->get_prepareid_fn = routine->GetPrepareId; + + /* Add to the participants list */ + FdwXactParticipants = lappend(FdwXactParticipants, fdw_part); + + /* Revert back the context */ + MemoryContextSwitchTo(old_ctx); +} + +/* + * Calculates the size of shared memory allocated for maintaining foreign + * prepared transaction entries. + */ +Size +FdwXactShmemSize(void) +{ + Size size; + + /* Size for foreign transaction information array */ + size = offsetof(FdwXactCtlData, fdwxacts); + size = add_size(size, mul_size(max_prepared_foreign_xacts, + sizeof(FdwXact))); + size = MAXALIGN(size); + size = add_size(size, mul_size(max_prepared_foreign_xacts, + sizeof(FdwXactData))); + + return size; +} + +/* + * Initialization of shared memory for maintaining foreign prepared transaction + * entries. The shared memory layout is defined in definition of FdwXactCtlData + * structure. 
+ */ +void +FdwXactShmemInit(void) +{ + bool found; + + if (!fdwXactExitRegistered) + { + before_shmem_exit(AtProcExit_FdwXact, 0); + fdwXactExitRegistered = true; + } + + FdwXactCtl = ShmemInitStruct("Foreign transactions table", + FdwXactShmemSize(), + &found); + if (!IsUnderPostmaster) + { + FdwXact fdwxacts; + int cnt; + + Assert(!found); + FdwXactCtl->free_fdwxacts = NULL; + FdwXactCtl->num_fdwxacts = 0; + + /* Initialize the linked list of free FDW transactions */ + fdwxacts = (FdwXact) + ((char *) FdwXactCtl + + MAXALIGN(offsetof(FdwXactCtlData, fdwxacts) + + sizeof(FdwXact) * max_prepared_foreign_xacts)); + for (cnt = 0; cnt < max_prepared_foreign_xacts; cnt++) + { + fdwxacts[cnt].status = FDWXACT_STATUS_INVALID; + fdwxacts[cnt].fdwxact_free_next = FdwXactCtl->free_fdwxacts; + FdwXactCtl->free_fdwxacts = &fdwxacts[cnt]; + SpinLockInit(&(fdwxacts[cnt].mutex)); + } + } + else + { + Assert(FdwXactCtl); + Assert(found); + } +} + +/* + * Prepare all foreign transactions if foreign twophase commit is required. + * If foreign twophase commit is required, the behavior depends on the value + * of foreign_twophase_commit; when 'required' we strictly require for all + * foreign server's FDWs to support two-phase commit protocol and ask them to + * prepare foreign transactions, when 'prefer' we ask only foreign servers + * that are capable of two-phase commit to prepare foreign transactions and ask + * for other servers to commit, and for 'disabled' we ask all foreign servers + * to commit foreign transaction in one-phase. If we failed to commit any of + * them we change to aborting. + * + * Note that non-modified foreign servers always can be committed without + * preparation. 
+ */ +void +PreCommit_FdwXacts(void) +{ + bool need_twophase_commit; + ListCell *lc = NULL; + + /* If there are no foreign servers involved, we have no business here */ + if (FdwXactParticipants == NIL) + return; + + /* + * We require that all modified servers be capable of the two-phase + * commit protocol. + */ + if (foreign_twophase_commit == FOREIGN_TWOPHASE_COMMIT_REQUIRED && + (MyXactFlags & XACT_FLAGS_FDWNOPREPARE) != 0) + ereport(ERROR, + (errcode(ERRCODE_FEATURE_NOT_SUPPORTED), + errmsg("cannot COMMIT a distributed transaction that has operated on a foreign server that doesn't support atomic commit"))); + + /* + * Check if we need to use foreign twophase commit. It's always false + * if foreign twophase commit is disabled. + */ + need_twophase_commit = is_foreign_twophase_commit_required(); + + /* + * First, we consider committing the foreign transactions in one phase. + */ + foreach(lc, FdwXactParticipants) + { + FdwXactParticipant *fdw_part = (FdwXactParticipant *) lfirst(lc); + bool commit = false; + + /* Can commit in one-phase if two-phase commit is not required */ + if (!need_twophase_commit) + commit = true; + + /* A non-modified foreign transaction can always be committed in one-phase */ + if (!fdw_part->modified) + commit = true; + + /* + * In the 'prefer' case, a server that is not capable of two-phase + * commit can be committed in one-phase.
+ */ + if (foreign_twophase_commit == FOREIGN_TWOPHASE_COMMIT_PREFER && + !IsSeverCapableOfTwophaseCommit(fdw_part)) + commit = true; + + if (commit) + { + /* Commit the foreign transaction in one-phase */ + FdwXactOnePhaseEndForeignTransaction(fdw_part, true); + + /* Delete it from the participant list */ + FdwXactParticipants = foreach_delete_current(FdwXactParticipants, + lc); + continue; + } + } + + /* All done if we committed all foreign transactions */ + if (FdwXactParticipants == NIL) + return; + + /* + * Second, if only one transaction remains in the participant list + * and we didn't modify the local data, we can commit it without + * preparation. + */ + if (list_length(FdwXactParticipants) == 1 && + (MyXactFlags & XACT_FLAGS_WROTENONTEMPREL) == 0) + { + /* Commit the foreign transaction in one-phase */ + FdwXactOnePhaseEndForeignTransaction(linitial(FdwXactParticipants), + true); + + /* All foreign transactions have been committed */ + list_free(FdwXactParticipants); + return; + } + + /* + * Finally, prepare the foreign transactions. Note that we keep + * FdwXactParticipants until the end of the transaction. + */ + FdwXactPrepareForeignTransactions(); +} + +/* + * Insert FdwXact entries and prepare foreign transactions. Before inserting + * an FdwXact entry we call the GetPrepareId callback to get a transaction + * identifier from the FDW. + * + * We can still switch to rollback here. If any error occurs, we roll back + * non-prepared foreign transactions and leave the others to the resolver.
+ */ +static void +FdwXactPrepareForeignTransactions(void) +{ + ListCell *lcell; + TransactionId xid; + + if (FdwXactParticipants == NIL) + return; + + /* Parameter check */ + if (max_prepared_foreign_xacts == 0) + ereport(ERROR, + (errcode(ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE), + errmsg("prepared foreign transactions are disabled"), + errhint("Set max_prepared_foreign_transactions to a nonzero value."))); + + if (max_foreign_xact_resolvers == 0) + ereport(ERROR, + (errcode(ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE), + errmsg("prepared foreign transactions are disabled"), + errhint("Set max_foreign_transaction_resolvers to a nonzero value."))); + + xid = GetTopTransactionId(); + + /* Loop over the foreign connections */ + foreach(lcell, FdwXactParticipants) + { + FdwXactParticipant *fdw_part = (FdwXactParticipant *) lfirst(lcell); + FdwXactRslvState *state; + FdwXact fdwxact; + + fdw_part->fdwxact_id = get_fdwxact_identifier(fdw_part, xid); + + Assert(fdw_part->fdwxact_id); + + /* + * Insert the foreign transaction entry with the FDWXACT_STATUS_PREPARING + * status. Registration persists this information to the disk and logs + * (that way it is relayed to standbys). Thus in case we lose connectivity + * to the foreign server or crash ourselves, we will remember that we + * might have a prepared transaction on the foreign server and try to + * resolve it when connectivity is restored or after crash recovery. + * + * If we prepare the transaction on the foreign server before persisting + * the information to the disk and crash in-between these two steps, + * we will forget that we prepared the transaction on the foreign server + * and will not be able to resolve it after the crash. Hence we persist + * first, then prepare.
+ */ + fdwxact = FdwXactInsertFdwXactEntry(xid, fdw_part); + + state = create_fdwxact_state(); + state->server = fdw_part->server; + state->usermapping = fdw_part->usermapping; + state->fdwxact_id = pstrdup(fdw_part->fdwxact_id); + + /* Update the status */ + LWLockAcquire(FdwXactLock, LW_EXCLUSIVE); + Assert(fdwxact->status == FDWXACT_STATUS_INITIAL); + fdwxact->status = FDWXACT_STATUS_PREPARING; + LWLockRelease(FdwXactLock); + + /* + * Prepare the foreign transaction. + * + * Between the FdwXactInsertFdwXactEntry call and the time this backend + * hears an acknowledgement from the foreign server, the backend may abort + * the local transaction (say, because of a signal). + * + * During abort processing, we might try to resolve a never-prepared + * transaction, and get an error. This is fine as long as the FDW + * provides us unique prepared transaction identifiers. + */ + PG_TRY(); + { + fdw_part->prepare_foreign_xact_fn(state); + } + PG_CATCH(); + { + /* failed, back to the initial state */ + LWLockAcquire(FdwXactLock, LW_EXCLUSIVE); + fdwxact->status = FDWXACT_STATUS_INITIAL; + LWLockRelease(FdwXactLock); + + PG_RE_THROW(); + } + PG_END_TRY(); + + /* succeeded, update status */ + LWLockAcquire(FdwXactLock, LW_EXCLUSIVE); + fdwxact->status = FDWXACT_STATUS_PREPARED; + LWLockRelease(FdwXactLock); + } +} + +/* + * One-phase commit or rollback the given foreign transaction participant. + */ +static void +FdwXactOnePhaseEndForeignTransaction(FdwXactParticipant *fdw_part, + bool for_commit) +{ + FdwXactRslvState *state; + + Assert(fdw_part->commit_foreign_xact_fn); + Assert(fdw_part->rollback_foreign_xact_fn); + + state = create_fdwxact_state(); + state->server = fdw_part->server; + state->usermapping = fdw_part->usermapping; + state->flags = FDWXACT_FLAG_ONEPHASE; + + /* + * Commit or rollback the foreign transaction in one-phase. Since we didn't + * insert an FdwXact entry for this transaction we don't need to care about + * failures. On failure we switch to rollback.
+ */ + if (for_commit) + fdw_part->commit_foreign_xact_fn(state); + else + fdw_part->rollback_foreign_xact_fn(state); +} + +/* + * This function is used to create new foreign transaction entry before an FDW + * prepares and commit/rollback. The function adds the entry to WAL and it will + * be persisted to the disk under pg_fdwxact directory when checkpoint. + */ +static FdwXact +FdwXactInsertFdwXactEntry(TransactionId xid, FdwXactParticipant *fdw_part) +{ + FdwXact fdwxact; + FdwXactOnDiskData *fdwxact_file_data; + MemoryContext old_context; + int data_len; + + old_context = MemoryContextSwitchTo(TopTransactionContext); + + /* + * Enter the foreign transaction in the shared memory structure. + */ + LWLockAcquire(FdwXactLock, LW_EXCLUSIVE); + fdwxact = insert_fdwxact(MyDatabaseId, xid, fdw_part->server->serverid, + fdw_part->usermapping->userid, + fdw_part->usermapping->umid, fdw_part->fdwxact_id); + fdwxact->status = FDWXACT_STATUS_INITIAL; + fdwxact->held_by = MyBackendId; + LWLockRelease(FdwXactLock); + + fdw_part->fdwxact = fdwxact; + MemoryContextSwitchTo(old_context); + + /* + * Prepare to write the entry to a file. Also add xlog entry. The contents + * of the xlog record are same as what is written to the file. 
+ */ + data_len = offsetof(FdwXactOnDiskData, fdwxact_id); + data_len = data_len + strlen(fdw_part->fdwxact_id) + 1; + data_len = MAXALIGN(data_len); + fdwxact_file_data = (FdwXactOnDiskData *) palloc0(data_len); + fdwxact_file_data->dbid = MyDatabaseId; + fdwxact_file_data->local_xid = xid; + fdwxact_file_data->serverid = fdw_part->server->serverid; + fdwxact_file_data->userid = fdw_part->usermapping->userid; + fdwxact_file_data->umid = fdw_part->usermapping->umid; + memcpy(fdwxact_file_data->fdwxact_id, fdw_part->fdwxact_id, + strlen(fdw_part->fdwxact_id) + 1); + + /* See note in RecordTransactionCommit */ + MyPgXact->delayChkpt = true; + + START_CRIT_SECTION(); + + /* Add the entry in the xlog and save LSN for checkpointer */ + XLogBeginInsert(); + XLogRegisterData((char *) fdwxact_file_data, data_len); + fdwxact->insert_end_lsn = XLogInsert(RM_FDWXACT_ID, XLOG_FDWXACT_INSERT); + XLogFlush(fdwxact->insert_end_lsn); + + /* If we crash now, we have prepared: WAL replay will fix things */ + + /* Store record's start location to read that later on CheckPoint */ + fdwxact->insert_start_lsn = ProcLastRecPtr; + + /* File is written completely, checkpoint can proceed with syncing */ + fdwxact->valid = true; + + /* Checkpoint can process now */ + MyPgXact->delayChkpt = false; + + END_CRIT_SECTION(); + + pfree(fdwxact_file_data); + return fdwxact; +} + +/* + * Insert a new entry for a given foreign transaction identified by transaction + * id, foreign server and user mapping, into the shared memory array. Caller + * must hold FdwXactLock in exclusive mode. + * + * If the entry already exists, the function raises an error. 
+ */ +static FdwXact +insert_fdwxact(Oid dbid, TransactionId xid, Oid serverid, Oid userid, + Oid umid, char *fdwxact_id) +{ + int i; + FdwXact fdwxact; + + Assert(LWLockHeldByMeInMode(FdwXactLock, LW_EXCLUSIVE)); + + /* Check for duplicated foreign transaction entry */ + for (i = 0; i < FdwXactCtl->num_fdwxacts; i++) + { + fdwxact = FdwXactCtl->fdwxacts[i]; + if (fdwxact->dbid == dbid && + fdwxact->local_xid == xid && + fdwxact->serverid == serverid && + fdwxact->userid == userid) + ereport(ERROR, (errmsg("could not insert a foreign transaction entry"), + errdetail("duplicate entry with transaction id %u, serverid %u, userid %u", + xid, serverid, userid))); + } + + /* + * Get a next free foreign transaction entry. Raise error if there are + * none left. + */ + if (!FdwXactCtl->free_fdwxacts) + { + ereport(ERROR, + (errcode(ERRCODE_OUT_OF_MEMORY), + errmsg("maximum number of foreign transactions reached"), + errhint("Increase max_prepared_foreign_transactions: \"%d\".", + max_prepared_foreign_xacts))); + } + fdwxact = FdwXactCtl->free_fdwxacts; + FdwXactCtl->free_fdwxacts = fdwxact->fdwxact_free_next; + + /* Insert the entry to shared memory array */ + Assert(FdwXactCtl->num_fdwxacts < max_prepared_foreign_xacts); + FdwXactCtl->fdwxacts[FdwXactCtl->num_fdwxacts++] = fdwxact; + + fdwxact->held_by = InvalidBackendId; + fdwxact->dbid = dbid; + fdwxact->local_xid = xid; + fdwxact->serverid = serverid; + fdwxact->userid = userid; + fdwxact->umid = umid; + fdwxact->insert_start_lsn = InvalidXLogRecPtr; + fdwxact->insert_end_lsn = InvalidXLogRecPtr; + fdwxact->valid = false; + fdwxact->ondisk = false; + fdwxact->inredo = false; + fdwxact->indoubt = false; + memcpy(fdwxact->fdwxact_id, fdwxact_id, strlen(fdwxact_id) + 1); + + return fdwxact; +} + +/* + * Remove the foreign prepared transaction entry from shared memory. + * Caller must hold FdwXactLock in exclusive mode. 
+ */
+static void
+remove_fdwxact(FdwXact fdwxact)
+{
+	int			i;
+
+	Assert(fdwxact != NULL);
+	Assert(LWLockHeldByMeInMode(FdwXactLock, LW_EXCLUSIVE));
+
+	if (FdwXactIsBeingResolved(fdwxact))
+		elog(ERROR, "cannot remove fdwxact entry that is being resolved");
+
+	/* Search for the slot where this entry resides */
+	for (i = 0; i < FdwXactCtl->num_fdwxacts; i++)
+	{
+		if (FdwXactCtl->fdwxacts[i] == fdwxact)
+			break;
+	}
+
+	/* We did not find the given entry in the array */
+	if (i >= FdwXactCtl->num_fdwxacts)
+		ereport(ERROR,
+				(errmsg("could not remove a foreign transaction entry"),
+				 errdetail("failed to find entry for xid %u, foreign server %u, and user %u",
+						   fdwxact->local_xid, fdwxact->serverid, fdwxact->userid)));
+
+	elog(DEBUG2, "remove fdwxact entry id %s, xid %u db %u user %u",
+		 fdwxact->fdwxact_id, fdwxact->local_xid, fdwxact->dbid,
+		 fdwxact->userid);
+
+	/* Remove the entry from the active array */
+	FdwXactCtl->num_fdwxacts--;
+	FdwXactCtl->fdwxacts[i] = FdwXactCtl->fdwxacts[FdwXactCtl->num_fdwxacts];
+
+	/* Put it back into the free list */
+	fdwxact->fdwxact_free_next = FdwXactCtl->free_fdwxacts;
+	FdwXactCtl->free_fdwxacts = fdwxact;
+
+	/* Reset information */
+	fdwxact->status = FDWXACT_STATUS_INVALID;
+	fdwxact->held_by = InvalidBackendId;
+	fdwxact->indoubt = false;
+
+	if (!RecoveryInProgress())
+	{
+		xl_fdwxact_remove record;
+		XLogRecPtr	recptr;
+
+		/* Fill up the log record before releasing the entry */
+		record.serverid = fdwxact->serverid;
+		record.dbid = fdwxact->dbid;
+		record.xid = fdwxact->local_xid;
+		record.userid = fdwxact->userid;
+
+		/*
+		 * Now writing FdwXact state data to WAL.  We have to set delayChkpt
+		 * here, otherwise a checkpoint starting immediately after the
+		 * WAL record is inserted could complete without fsync'ing our
+		 * state file.  (This is essentially the same kind of race condition
+		 * as the COMMIT-to-clog-write case that RecordTransactionCommit
+		 * uses delayChkpt for; see notes there.)
+		 */
+		START_CRIT_SECTION();
+
+		MyPgXact->delayChkpt = true;
+
+		/*
+		 * Log that we are removing the foreign transaction entry and
+		 * remove the file from the disk as well.
+		 */
+		XLogBeginInsert();
+		XLogRegisterData((char *) &record, sizeof(xl_fdwxact_remove));
+		recptr = XLogInsert(RM_FDWXACT_ID, XLOG_FDWXACT_REMOVE);
+		XLogFlush(recptr);
+
+		/*
+		 * Now we can mark ourselves as out of the commit critical section: a
+		 * checkpoint starting after this will certainly see the entry as a
+		 * candidate for fsyncing.
+		 */
+		MyPgXact->delayChkpt = false;
+
+		END_CRIT_SECTION();
+	}
+}
+
+/*
+ * Return true, and set ForeignTwophaseCommitIsRequired, if the current
+ * transaction modified data on two or more servers, counting the remote
+ * servers in FdwXactParticipants and the local server itself.
+ */
+static bool
+is_foreign_twophase_commit_required(void)
+{
+	ListCell   *lc;
+	int			nserverswritten = 0;
+
+	if (!IsForeignTwophaseCommitRequested())
+		return false;
+
+	foreach(lc, FdwXactParticipants)
+	{
+		FdwXactParticipant *fdw_part = (FdwXactParticipant *) lfirst(lc);
+
+		if (fdw_part->modified)
+			nserverswritten++;
+	}
+
+	if ((MyXactFlags & XACT_FLAGS_WROTENONTEMPREL) != 0)
+		++nserverswritten;
+
+	/*
+	 * Atomic commit is required if we modified data on two or more
+	 * participants.
+	 */
+	if (nserverswritten <= 1)
+		return false;
+
+	ForeignTwophaseCommitIsRequired = true;
+	return true;
+}
+
+bool
+FdwXactIsForeignTwophaseCommitRequired(void)
+{
+	return ForeignTwophaseCommitIsRequired;
+}
+
+/*
+ * Compute the oldest xmin across all unresolved foreign transactions
+ * and store it in the ProcArray.
+ */
+static void
+FdwXactComputeRequiredXmin(void)
+{
+	int				i;
+	TransactionId	agg_xmin = InvalidTransactionId;
+
+	Assert(FdwXactCtl != NULL);
+
+	LWLockAcquire(FdwXactLock, LW_SHARED);
+
+	for (i = 0; i < FdwXactCtl->num_fdwxacts; i++)
+	{
+		FdwXact		fdwxact = FdwXactCtl->fdwxacts[i];
+
+		if (!fdwxact->valid)
+			continue;
+
+		Assert(TransactionIdIsValid(fdwxact->local_xid));
+
+		if (!TransactionIdIsValid(agg_xmin) ||
+			TransactionIdPrecedes(fdwxact->local_xid, agg_xmin))
+			agg_xmin = fdwxact->local_xid;
+	}
+
+	LWLockRelease(FdwXactLock);
+
+	ProcArraySetFdwXactUnresolvedXmin(agg_xmin);
+}
+
+/*
+ * Mark my foreign transaction participants as in-doubt and clear
+ * the FdwXactParticipants list.
+ *
+ * If any foreign transactions are left behind, update the oldest xmin of
+ * unresolved transactions so that the local transaction ids of in-doubt
+ * transactions are not truncated.
+ */
+static void
+ForgetAllFdwXactParticipants(void)
+{
+	ListCell   *cell;
+	int			n_lefts = 0;
+
+	if (FdwXactParticipants == NIL)
+		return;
+
+	foreach(cell, FdwXactParticipants)
+	{
+		FdwXactParticipant *fdw_part = (FdwXactParticipant *) lfirst(cell);
+		FdwXact		fdwxact = fdw_part->fdwxact;
+
+		/* Nothing to do if we didn't register an FdwXact entry yet */
+		if (!fdw_part->fdwxact)
+			continue;
+
+		/*
+		 * There is a race condition: the resolver process could remove the
+		 * FdwXact entry and another backend could reuse it before we forget
+		 * it here.  So we need to check that the entry is still associated
+		 * with our transaction.
+		 */
+		SpinLockAcquire(&fdwxact->mutex);
+		if (fdwxact->held_by == MyBackendId)
+		{
+			fdwxact->held_by = InvalidBackendId;
+			fdwxact->indoubt = true;
+			n_lefts++;
+		}
+		SpinLockRelease(&fdwxact->mutex);
+	}
+
+	/*
+	 * If we left any FdwXact entries behind, update the oldest local
+	 * transaction id of unresolved distributed transactions and hand the
+	 * entries over to the foreign transaction resolver.
+	 */
+	if (n_lefts > 0)
+	{
+		elog(DEBUG1, "left %d foreign transactions in in-doubt status", n_lefts);
+		FdwXactComputeRequiredXmin();
+	}
+
+	FdwXactParticipants = NIL;
+}
+
+/*
+ * When the process exits, forget all the entries.
+ */
+static void
+AtProcExit_FdwXact(int code, Datum arg)
+{
+	ForgetAllFdwXactParticipants();
+}
+
+void
+FdwXactCleanupAtProcExit(void)
+{
+	if (!SHMQueueIsDetached(&(MyProc->fdwXactLinks)))
+	{
+		LWLockAcquire(FdwXactResolutionLock, LW_EXCLUSIVE);
+		SHMQueueDelete(&(MyProc->fdwXactLinks));
+		LWLockRelease(FdwXactResolutionLock);
+	}
+}
+
+/*
+ * Wait for the foreign transaction to be resolved.
+ *
+ * A backend starts in state FDWXACT_NOT_WAITING and changes that state to
+ * FDWXACT_WAITING before adding itself to the wait queue.  During
+ * FdwXactResolveForeignTransaction, an fdwxact resolver changes the state
+ * to FDWXACT_WAIT_COMPLETE once all foreign transactions are resolved.
+ * The backend then resets its state to FDWXACT_NOT_WAITING.
+ * If a resolver fails to resolve the waiting transaction, it moves the
+ * backend to the retry queue.
+ *
+ * This function is inspired by SyncRepWaitForLSN.
+ */
+void
+FdwXactWaitToBeResolved(TransactionId wait_xid, bool is_commit)
+{
+	char	   *new_status = NULL;
+	const char *old_status;
+
+	Assert(FdwXactCtl != NULL);
+	Assert(TransactionIdIsValid(wait_xid));
+	Assert(SHMQueueIsDetached(&(MyProc->fdwXactLinks)));
+	Assert(MyProc->fdwXactState == FDWXACT_NOT_WAITING);
+
+	/* Quick exit if atomic commit is not requested */
+	if (!IsForeignTwophaseCommitRequested())
+		return;
+
+	/*
+	 * Also exit if the transaction itself has no foreign transaction
+	 * participants.
+	 */
+	if (FdwXactParticipants == NIL && wait_xid == MyPgXact->xid)
+		return;
+
+	/* Set backend status and enqueue ourselves in the active queue */
+	LWLockAcquire(FdwXactResolutionLock, LW_EXCLUSIVE);
+	MyProc->fdwXactState = FDWXACT_WAITING;
+	MyProc->fdwXactWaitXid = wait_xid;
+	MyProc->fdwXactNextResolutionTs = GetCurrentTransactionStopTimestamp();
+	FdwXactQueueInsert(MyProc);
+	Assert(FdwXactQueueIsOrderedByTimestamp());
+	LWLockRelease(FdwXactResolutionLock);
+
+	/* Launch a resolver process if none is running yet, or wake one up */
+	FdwXactLaunchOrWakeupResolver();
+
+	/*
+	 * Alter ps display to show that we are waiting for foreign transaction
+	 * resolution.
+	 */
+	if (update_process_title)
+	{
+		int			len;
+
+		old_status = get_ps_display(&len);
+		new_status = (char *) palloc(len + 35);	/* " waiting for resolution " + 10 digits + '\0' */
+		memcpy(new_status, old_status, len);
+		sprintf(new_status + len, " waiting for resolution %u", wait_xid);
+		set_ps_display(new_status, false);
+		new_status[len] = '\0'; /* truncate off "waiting ..." */
+	}
+
+	/* Wait for all foreign transactions to be resolved */
+	for (;;)
+	{
+		/* Must reset the latch before testing state */
+		ResetLatch(MyLatch);
+
+		/*
+		 * Acquiring the lock is not needed; the latch ensures proper
+		 * barriers.  If it looks like we're done, we must really be done,
+		 * because once the resolver changes the state to
+		 * FDWXACT_WAIT_COMPLETE, it will never update it again, so we can't
+		 * be seeing a stale value in that case.
+		 */
+		if (MyProc->fdwXactState == FDWXACT_WAIT_COMPLETE)
+			break;
+
+		/*
+		 * If a wait for foreign transaction resolution is pending, we can
+		 * neither acknowledge the commit nor raise ERROR or FATAL.  The
+		 * latter would lead the client to believe that the distributed
+		 * transaction aborted, which is not true: it's already committed
+		 * locally.
+		 * The former is no good either: the client has requested commit of
+		 * a distributed transaction, and is entitled to assume that an
+		 * acknowledged commit has also been committed on all foreign
+		 * servers, which might not be true.  So in this case we issue a
+		 * WARNING (which some clients may be able to interpret) and shut
+		 * off further output.  We do NOT reset ProcDiePending, so that the
+		 * process will die after the commit is cleaned up.
+		 */
+		if (ProcDiePending)
+		{
+			ereport(WARNING,
+					(errcode(ERRCODE_ADMIN_SHUTDOWN),
+					 errmsg("canceling the wait for resolving foreign transaction and terminating connection due to administrator command"),
+					 errdetail("The transaction has already committed locally, but might not have been committed on the foreign server.")));
+			whereToSendOutput = DestNone;
+			FdwXactCancelWait();
+			break;
+		}
+
+		/*
+		 * If a query cancel interrupt arrives we just terminate the wait
+		 * with a suitable warning.  The foreign transactions can be
+		 * orphaned, but the foreign xact resolver can pick them up and try
+		 * to resolve them later.
+		 */
+		if (QueryCancelPending)
+		{
+			QueryCancelPending = false;
+			ereport(WARNING,
+					(errmsg("canceling wait for resolving foreign transaction due to user request"),
+					 errdetail("The transaction has already committed locally, but might not have been committed on the foreign server.")));
+			FdwXactCancelWait();
+			break;
+		}
+
+		/*
+		 * If the postmaster dies, we'll probably never get an
+		 * acknowledgement, because all the resolver processes will exit.
+		 * So just bail out.
+		 */
+		if (!PostmasterIsAlive())
+		{
+			ProcDiePending = true;
+			whereToSendOutput = DestNone;
+			FdwXactCancelWait();
+			break;
+		}
+
+		/*
+		 * Wait on latch.  Any condition that should wake us up will set the
+		 * latch, so no need for timeout.
+		 */
+		WaitLatch(MyLatch, WL_LATCH_SET | WL_POSTMASTER_DEATH, -1,
+				  WAIT_EVENT_FDWXACT_RESOLUTION);
+	}
+
+	pg_read_barrier();
+
+	Assert(SHMQueueIsDetached(&(MyProc->fdwXactLinks)));
+	MyProc->fdwXactState = FDWXACT_NOT_WAITING;
+
+	if (new_status)
+	{
+		set_ps_display(new_status, false);
+		pfree(new_status);
+	}
+}
+
+/*
+ * Return true if there is at least one backend in the wait queue that is
+ * connected to the given database.  The caller must hold
+ * FdwXactResolutionLock.
+ */
+bool
+FdwXactWaiterExists(Oid dbid)
+{
+	PGPROC	   *proc;
+
+	Assert(LWLockHeldByMeInMode(FdwXactResolutionLock, LW_SHARED));
+
+	proc = (PGPROC *) SHMQueueNext(&(FdwXactRslvCtl->fdwxact_queue),
+								   &(FdwXactRslvCtl->fdwxact_queue),
+								   offsetof(PGPROC, fdwXactLinks));
+
+	while (proc)
+	{
+		if (proc->databaseId == dbid)
+			return true;
+
+		proc = (PGPROC *) SHMQueueNext(&(FdwXactRslvCtl->fdwxact_queue),
+									   &(proc->fdwXactLinks),
+									   offsetof(PGPROC, fdwXactLinks));
+	}
+
+	return false;
+}
+
+/*
+ * Insert the waiter into the wait queue in fdwXactNextResolutionTs order.
+ */
+static void
+FdwXactQueueInsert(PGPROC *waiter)
+{
+	PGPROC	   *proc;
+
+	Assert(LWLockHeldByMeInMode(FdwXactResolutionLock, LW_EXCLUSIVE));
+
+	proc = (PGPROC *) SHMQueuePrev(&(FdwXactRslvCtl->fdwxact_queue),
+								   &(FdwXactRslvCtl->fdwxact_queue),
+								   offsetof(PGPROC, fdwXactLinks));
+
+	while (proc)
+	{
+		if (proc->fdwXactNextResolutionTs < waiter->fdwXactNextResolutionTs)
+			break;
+
+		proc = (PGPROC *) SHMQueuePrev(&(FdwXactRslvCtl->fdwxact_queue),
+									   &(proc->fdwXactLinks),
+									   offsetof(PGPROC, fdwXactLinks));
+	}
+
+	if (proc)
+		SHMQueueInsertAfter(&(proc->fdwXactLinks), &(waiter->fdwXactLinks));
+	else
+		SHMQueueInsertAfter(&(FdwXactRslvCtl->fdwxact_queue), &(waiter->fdwXactLinks));
+}
+
+#ifdef USE_ASSERT_CHECKING
+static bool
+FdwXactQueueIsOrderedByTimestamp(void)
+{
+	PGPROC	   *proc;
+	TimestampTz lastTs;
+
+	proc = (PGPROC *) SHMQueueNext(&(FdwXactRslvCtl->fdwxact_queue),
+								   &(FdwXactRslvCtl->fdwxact_queue),
+								   offsetof(PGPROC, fdwXactLinks));
+	lastTs = 0;
+
+	while (proc)
+	{
+		if (proc->fdwXactNextResolutionTs < lastTs)
+			return false;
+
+		lastTs = proc->fdwXactNextResolutionTs;
+
+		proc = (PGPROC *) SHMQueueNext(&(FdwXactRslvCtl->fdwxact_queue),
+									   &(proc->fdwXactLinks),
+									   offsetof(PGPROC, fdwXactLinks));
+	}
+
+	return true;
+}
+#endif
+
+/*
+ * Acquire FdwXactResolutionLock and cancel any wait currently in progress.
+ */
+static void
+FdwXactCancelWait(void)
+{
+	LWLockAcquire(FdwXactResolutionLock, LW_EXCLUSIVE);
+	if (!SHMQueueIsDetached(&(MyProc->fdwXactLinks)))
+		SHMQueueDelete(&(MyProc->fdwXactLinks));
+	MyProc->fdwXactState = FDWXACT_NOT_WAITING;
+	LWLockRelease(FdwXactResolutionLock);
+}
+
+/*
+ * AtEOXact_FdwXacts
+ */
+void
+AtEOXact_FdwXacts(bool is_commit)
+{
+	ListCell   *lcell;
+
+	if (!is_commit)
+	{
+		foreach(lcell, FdwXactParticipants)
+		{
+			FdwXactParticipant *fdw_part = (FdwXactParticipant *) lfirst(lcell);
+
+			/*
+			 * If the foreign transaction has an FdwXact entry, we might
+			 * have prepared it.
+			 * Skip already-prepared foreign transactions, because they have
+			 * already closed their transactions.  But we cannot be sure
+			 * whether a foreign transaction with status ==
+			 * FDWXACT_STATUS_PREPARING has been prepared or not, so we call
+			 * the rollback API to close its transaction for safety.  Any
+			 * prepared foreign transactions that we might have will be
+			 * resolved by the foreign transaction resolver.
+			 */
+			if (fdw_part->fdwxact)
+			{
+				bool		is_prepared;
+
+				LWLockAcquire(FdwXactLock, LW_SHARED);
+				is_prepared = fdw_part->fdwxact &&
+					fdw_part->fdwxact->status == FDWXACT_STATUS_PREPARED;
+				LWLockRelease(FdwXactLock);
+
+				if (is_prepared)
+					continue;
+			}
+
+			/* One-phase rollback of the foreign transaction */
+			FdwXactOnePhaseEndForeignTransaction(fdw_part, false);
+		}
+	}
+
+	/*
+	 * In commit cases, we have already prepared foreign transactions during
+	 * the pre-commit phase, and those prepared transactions will be
+	 * resolved by the resolver process.
+	 */
+
+	ForgetAllFdwXactParticipants();
+	ForeignTwophaseCommitIsRequired = false;
+}
+
+/*
+ * Prepare foreign transactions.
+ *
+ * Note that it's possible for the transaction to abort after we have
+ * prepared some of the participants.  In that case we change over to
+ * rollback and roll back all foreign transactions.
+ */
+void
+AtPrepare_FdwXacts(void)
+{
+	if (FdwXactParticipants == NIL)
+		return;
+
+	/* Check for an invalid condition */
+	if (!IsForeignTwophaseCommitRequested())
+		ereport(ERROR,
+				(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
+				 errmsg("cannot PREPARE a distributed transaction when foreign_twophase_commit is 'disabled'")));
+
+	/*
+	 * We cannot prepare if any participating foreign server is not capable
+	 * of two-phase commit.
+	 */
+	if (is_foreign_twophase_commit_required() &&
+		(MyXactFlags & XACT_FLAGS_FDWNOPREPARE) != 0)
+		ereport(ERROR,
+				(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
+				 errmsg("cannot prepare the transaction because some foreign servers involved in the transaction cannot prepare the transaction")));
+
+	/* Prepare transactions on participating foreign servers. */
+	FdwXactPrepareForeignTransactions();
+
+	FdwXactParticipants = NIL;
+}
+
+/*
+ * Return one backend that connects to my database and is waiting for
+ * resolution.
+ */
+PGPROC *
+FdwXactGetWaiter(TimestampTz *nextResolutionTs_p, TransactionId *waitXid_p)
+{
+	PGPROC	   *proc;
+
+	LWLockAcquire(FdwXactResolutionLock, LW_SHARED);
+	Assert(FdwXactQueueIsOrderedByTimestamp());
+
+	proc = (PGPROC *) SHMQueueNext(&(FdwXactRslvCtl->fdwxact_queue),
+								   &(FdwXactRslvCtl->fdwxact_queue),
+								   offsetof(PGPROC, fdwXactLinks));
+
+	while (proc)
+	{
+		if (proc->databaseId == MyDatabaseId)
+			break;
+
+		proc = (PGPROC *) SHMQueueNext(&(FdwXactRslvCtl->fdwxact_queue),
+									   &(proc->fdwXactLinks),
+									   offsetof(PGPROC, fdwXactLinks));
+	}
+
+	if (proc)
+	{
+		*nextResolutionTs_p = proc->fdwXactNextResolutionTs;
+		*waitXid_p = proc->fdwXactWaitXid;
+	}
+	else
+	{
+		*nextResolutionTs_p = -1;
+		*waitXid_p = InvalidTransactionId;
+	}
+
+	LWLockRelease(FdwXactResolutionLock);
+
+	return proc;
+}
+
+/*
+ * Get one FdwXact entry to resolve.  This function is intended to be used
+ * when a resolver process fetches FdwXact entries to resolve, so the
+ * search excludes in-doubt transactions and in-progress transactions.
+ */
+static FdwXact
+get_fdwxact_to_resolve(Oid dbid, TransactionId xid)
+{
+	List	   *fdwxacts = NIL;
+
+	Assert(LWLockHeldByMeInMode(FdwXactLock, LW_EXCLUSIVE));
+
+	/* Include neither in-doubt transactions nor in-progress transactions */
+	fdwxacts = get_fdwxacts(dbid, xid, InvalidOid, InvalidOid,
+							false, false, false);
+
+	return fdwxacts == NIL ?
+		NULL : (FdwXact) linitial(fdwxacts);
+}
+
+/*
+ * Resolve one distributed transaction on the given database.  The target
+ * distributed transaction is fetched from the wait queue and its
+ * transaction participants are fetched from the global array.
+ *
+ * Release the waiter once all of its foreign transaction participants have
+ * been resolved.  On failure, we re-enqueue the waiting backend after
+ * incrementing its next resolution time.
+ */
+void
+FdwXactResolveTransactionAndReleaseWaiter(Oid dbid, TransactionId xid,
+										  PGPROC *waiter)
+{
+	FdwXact		fdwxact;
+
+	Assert(TransactionIdIsValid(xid));
+
+	LWLockAcquire(FdwXactLock, LW_EXCLUSIVE);
+
+	while ((fdwxact = get_fdwxact_to_resolve(MyDatabaseId, xid)) != NULL)
+	{
+		FdwXactRslvState *state;
+		ForeignServer *server;
+		UserMapping *usermapping;
+
+		CHECK_FOR_INTERRUPTS();
+
+		server = GetForeignServer(fdwxact->serverid);
+		usermapping = GetUserMapping(fdwxact->userid, fdwxact->serverid);
+
+		state = create_fdwxact_state();
+		SpinLockAcquire(&fdwxact->mutex);
+		state->server = server;
+		state->usermapping = usermapping;
+		state->fdwxact_id = pstrdup(fdwxact->fdwxact_id);
+		SpinLockRelease(&fdwxact->mutex);
+
+		FdwXactDetermineTransactionFate(fdwxact, false);
+
+		/* Do not hold the lock during foreign transaction resolution */
+		LWLockRelease(FdwXactLock);
+
+		PG_TRY();
+		{
+			/*
+			 * Resolve the foreign transaction.  When committing or aborting
+			 * prepared foreign transactions the previous status is always
+			 * FDWXACT_STATUS_PREPARED.
+			 */
+			FdwXactResolveForeignTransaction(fdwxact, state,
+											 FDWXACT_STATUS_PREPARED);
+		}
+		PG_CATCH();
+		{
+			/*
+			 * Failed to resolve.  Re-insert the waiter at the tail of the
+			 * retry queue if the waiter is still waiting.
+			 */
+			LWLockAcquire(FdwXactResolutionLock, LW_EXCLUSIVE);
+			if (waiter->fdwXactState == FDWXACT_WAITING)
+			{
+				SHMQueueDelete(&(waiter->fdwXactLinks));
+				pg_write_barrier();
+				waiter->fdwXactNextResolutionTs =
+					TimestampTzPlusMilliseconds(waiter->fdwXactNextResolutionTs,
+												foreign_xact_resolution_retry_interval);
+				FdwXactQueueInsert(waiter);
+			}
+			LWLockRelease(FdwXactResolutionLock);
+
+			PG_RE_THROW();
+		}
+		PG_END_TRY();
+
+		elog(DEBUG2, "resolved one foreign transaction xid %u, serverid %u, userid %u",
+			 fdwxact->local_xid, fdwxact->serverid, fdwxact->userid);
+
+		LWLockAcquire(FdwXactLock, LW_EXCLUSIVE);
+	}
+
+	LWLockRelease(FdwXactLock);
+
+	LWLockAcquire(FdwXactResolutionLock, LW_EXCLUSIVE);
+
+	/*
+	 * Remove the waiter from the shmem queue, if not detached yet.  The
+	 * waiter could already be detached if the user cancelled the wait
+	 * before resolution.
+	 */
+	if (!SHMQueueIsDetached(&(waiter->fdwXactLinks)))
+	{
+		TransactionId wait_xid = waiter->fdwXactWaitXid;
+
+		SHMQueueDelete(&(waiter->fdwXactLinks));
+		pg_write_barrier();
+
+		/* Set state to complete */
+		waiter->fdwXactState = FDWXACT_WAIT_COMPLETE;
+
+		/* Wake the waiter only after setting the state and removing it from the queue */
+		SetLatch(&(waiter->procLatch));
+
+		elog(DEBUG2, "released the proc with xid %u", wait_xid);
+	}
+	else
+		elog(DEBUG2, "the waiter backend has already been detached");
+
+	LWLockRelease(FdwXactResolutionLock);
+}
+
+/*
+ * Determine whether the given foreign transaction should be committed or
+ * rolled back according to the result of the local transaction.  This
+ * function changes fdwxact->status, so the caller must hold FdwXactLock in
+ * exclusive mode or pass need_lock = true.
+ */
+static void
+FdwXactDetermineTransactionFate(FdwXact fdwxact, bool need_lock)
+{
+	if (need_lock)
+		LWLockAcquire(FdwXactLock, LW_EXCLUSIVE);
+
+	/*
+	 * The transaction being resolved must either have been cancelled and
+	 * marked in-doubt, or have been prepared.
+	 */
+	Assert(fdwxact->indoubt ||
+		   fdwxact->status == FDWXACT_STATUS_PREPARED);
+
+	/*
+	 * If the local transaction is already committed, commit the prepared
+	 * foreign transaction.
+	 */
+	if (TransactionIdDidCommit(fdwxact->local_xid))
+		fdwxact->status = FDWXACT_STATUS_COMMITTING;
+
+	/*
+	 * If the local transaction is already aborted, abort the prepared
+	 * foreign transaction.
+	 */
+	else if (TransactionIdDidAbort(fdwxact->local_xid))
+		fdwxact->status = FDWXACT_STATUS_ABORTING;
+
+	/*
+	 * The local transaction is not in progress, but the foreign transaction
+	 * is not prepared on the foreign server.  This can happen when the
+	 * transaction failed after registering this entry but before actually
+	 * preparing it on the foreign server.  So let's assume it aborted.
+	 */
+	else if (!TransactionIdIsInProgress(fdwxact->local_xid))
+		fdwxact->status = FDWXACT_STATUS_ABORTING;
+
+	/*
+	 * The local transaction is in progress and the foreign transaction is
+	 * about to be committed or aborted.  This should not happen except for
+	 * the case where the local transaction is prepared and this foreign
+	 * transaction is being resolved manually by pg_resolve_foreign_xact().
+	 * Raise an error anyway, since we cannot determine the fate of this
+	 * foreign transaction from a local transaction whose fate is also not
+	 * yet determined.
+	 */
+	else
+		ereport(ERROR,
+				(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
+				 errmsg("cannot resolve the foreign transaction associated with in-progress transaction %u on server %u",
+						fdwxact->local_xid, fdwxact->serverid),
+				 errhint("The local transaction with xid %u might have been prepared.",
+						 fdwxact->local_xid)));
+
+	if (need_lock)
+		LWLockRelease(FdwXactLock);
+}
+
+/*
+ * Resolve the foreign transaction using the foreign data wrapper's
+ * transaction callback function.  The 'state' is passed to the callback
+ * function.  The fate of the foreign transaction must already be
+ * determined.
+ * If the foreign transaction is resolved successfully, remove the FdwXact
+ * entry from shared memory and also remove the corresponding on-disk file.
+ * On failure, the status of the FdwXact entry is set back to
+ * 'fallback_status' before erroring out.
+ */
+static void
+FdwXactResolveForeignTransaction(FdwXact fdwxact, FdwXactRslvState *state,
+								 FdwXactStatus fallback_status)
+{
+	ForeignServer *server;
+	ForeignDataWrapper *fdw;
+	FdwRoutine *fdw_routine;
+	bool		is_commit;
+
+	Assert(state != NULL);
+	Assert(state->server && state->usermapping && state->fdwxact_id);
+	Assert(fdwxact != NULL);
+
+	LWLockAcquire(FdwXactLock, LW_SHARED);
+
+	if (fdwxact->status != FDWXACT_STATUS_COMMITTING &&
+		fdwxact->status != FDWXACT_STATUS_ABORTING)
+		elog(ERROR, "cannot resolve foreign transaction whose fate is not determined");
+
+	is_commit = fdwxact->status == FDWXACT_STATUS_COMMITTING;
+	LWLockRelease(FdwXactLock);
+
+	server = GetForeignServer(fdwxact->serverid);
+	fdw = GetForeignDataWrapper(server->fdwid);
+	fdw_routine = GetFdwRoutine(fdw->fdwhandler);
+
+	PG_TRY();
+	{
+		if (is_commit)
+			fdw_routine->CommitForeignTransaction(state);
+		else
+			fdw_routine->RollbackForeignTransaction(state);
+	}
+	PG_CATCH();
+	{
+		/* Fall back to the fallback status */
+		LWLockAcquire(FdwXactLock, LW_EXCLUSIVE);
+		fdwxact->status = fallback_status;
+		LWLockRelease(FdwXactLock);
+
+		PG_RE_THROW();
+	}
+	PG_END_TRY();
+
+	/* Resolution was a success, remove the entry */
+	LWLockAcquire(FdwXactLock, LW_EXCLUSIVE);
+
+	elog(DEBUG1, "successfully %s the foreign transaction with xid %u db %u server %u user %u",
+		 is_commit ?
+		 "committed" : "rolled back",
+		 fdwxact->local_xid, fdwxact->dbid, fdwxact->serverid,
+		 fdwxact->userid);
+
+	fdwxact->status = FDWXACT_STATUS_RESOLVED;
+	if (fdwxact->ondisk)
+		RemoveFdwXactFile(fdwxact->dbid, fdwxact->local_xid,
+						  fdwxact->serverid, fdwxact->userid,
+						  true);
+	remove_fdwxact(fdwxact);
+	LWLockRelease(FdwXactLock);
+}
+
+/*
+ * Return a palloc'd and initialized FdwXactRslvState.
+ */
+static FdwXactRslvState *
+create_fdwxact_state(void)
+{
+	FdwXactRslvState *state;
+
+	state = palloc(sizeof(FdwXactRslvState));
+	state->server = NULL;
+	state->usermapping = NULL;
+	state->fdwxact_id = NULL;
+	state->flags = 0;
+
+	return state;
+}
+
+/*
+ * Return the one FdwXact entry that matches the given arguments, or NULL
+ * if there is none.  All arguments must be valid values, so that the
+ * search identifies exactly one (or no) entry.  Note that this function is
+ * intended to be used for modifying the returned FdwXact entry, so the
+ * caller must hold FdwXactLock in exclusive mode; in-progress FdwXact
+ * entries are not included.
+ */
+static FdwXact
+get_one_fdwxact(Oid dbid, TransactionId xid, Oid serverid, Oid userid)
+{
+	List	   *fdwxact_list;
+
+	Assert(LWLockHeldByMeInMode(FdwXactLock, LW_EXCLUSIVE));
+
+	/* All search conditions must be valid values */
+	Assert(TransactionIdIsValid(xid));
+	Assert(OidIsValid(serverid));
+	Assert(OidIsValid(userid));
+	Assert(OidIsValid(dbid));
+
+	/* Include in-doubt transactions but not in-progress ones */
+	fdwxact_list = get_fdwxacts(dbid, xid, serverid, userid,
+								true, false, false);
+
+	/* Must be at most one entry since we search by the unique key */
+	Assert(list_length(fdwxact_list) <= 1);
+
+	/* Could not find an entry */
+	if (fdwxact_list == NIL)
+		return NULL;
+
+	return (FdwXact) linitial(fdwxact_list);
+}
+
+/*
+ * Return true if there is at least one prepared foreign transaction
+ * matching the given arguments.
+ */
+bool
+fdwxact_exists(Oid dbid, Oid serverid, Oid userid)
+{
+	List	   *fdwxact_list;
+
+	/* Search among all FdwXact entries */
+	fdwxact_list = get_fdwxacts(dbid, InvalidTransactionId, serverid,
+								userid, true, true, true);
+
+	return fdwxact_list != NIL;
+}
+
+/*
+ * Returns an array of all foreign prepared transactions for the user-level
+ * function pg_foreign_xacts, and stores the number of entries into *num_p.
+ *
+ * WARNING -- we return even those transactions whose information is not
+ * completely filled in yet.  The caller should filter them out if they are
+ * not wanted.
+ *
+ * The returned array is palloc'd.
+ */
+static FdwXact
+get_all_fdwxacts(int *num_p)
+{
+	List	   *all_fdwxacts;
+	ListCell   *lc;
+	FdwXact		fdwxacts;
+	int			num_fdwxacts = 0;
+
+	Assert(num_p != NULL);
+
+	/* Get all entries */
+	all_fdwxacts = get_fdwxacts(InvalidOid, InvalidTransactionId,
+								InvalidOid, InvalidOid, true,
+								true, true);
+
+	if (all_fdwxacts == NIL)
+	{
+		*num_p = 0;
+		return NULL;
+	}
+
+	fdwxacts = (FdwXact)
+		palloc(sizeof(FdwXactData) * list_length(all_fdwxacts));
+	*num_p = list_length(all_fdwxacts);
+
+	/* Convert the list to an array of FdwXactData */
+	foreach(lc, all_fdwxacts)
+	{
+		FdwXact		fx = (FdwXact) lfirst(lc);
+
+		memcpy(fdwxacts + num_fdwxacts, fx,
+			   sizeof(FdwXactData));
+		num_fdwxacts++;
+	}
+
+	list_free(all_fdwxacts);
+
+	return fdwxacts;
+}
+
+/*
+ * Return a list of FdwXact entries matching the given arguments, or NIL if
+ * none match.  The search condition is defined by the arguments that carry
+ * valid values for their respective datatypes.  'include_indoubt' and
+ * 'include_in_progress' control whether the result includes in-doubt
+ * transactions and in-progress transactions, respectively.
+ */
+static List *
+get_fdwxacts(Oid dbid, TransactionId xid, Oid serverid, Oid userid,
+			 bool include_indoubt, bool include_in_progress, bool need_lock)
+{
+	int			i;
+	List	   *fdwxact_list = NIL;
+
+	if (need_lock)
+		LWLockAcquire(FdwXactLock, LW_SHARED);
+
+	for (i = 0; i < FdwXactCtl->num_fdwxacts; i++)
+	{
+		FdwXact		fdwxact = FdwXactCtl->fdwxacts[i];
+
+		/* dbid */
+		if (OidIsValid(dbid) && fdwxact->dbid != dbid)
+			continue;
+
+		/* xid */
+		if (TransactionIdIsValid(xid) && xid != fdwxact->local_xid)
+			continue;
+
+		/* serverid */
+		if (OidIsValid(serverid) && serverid != fdwxact->serverid)
+			continue;
+
+		/* userid */
+		if (OidIsValid(userid) && fdwxact->userid != userid)
+			continue;
+
+		/* include in-doubt transactions? */
+		if (!include_indoubt && fdwxact->indoubt)
+			continue;
+
+		/* include in-progress transactions? */
+		if (!include_in_progress && FdwXactIsBeingResolved(fdwxact))
+			continue;
+
+		/* Append it since it matched */
+		fdwxact_list = lappend(fdwxact_list, fdwxact);
+	}
+
+	if (need_lock)
+		LWLockRelease(FdwXactLock);
+
+	return fdwxact_list;
+}
+
+/* Apply the redo log for a foreign transaction */
+void
+fdwxact_redo(XLogReaderState *record)
+{
+	char	   *rec = XLogRecGetData(record);
+	uint8		info = XLogRecGetInfo(record) & ~XLR_INFO_MASK;
+
+	if (info == XLOG_FDWXACT_INSERT)
+	{
+		/*
+		 * Add an fdwxact entry and set the start/end LSNs of the WAL record
+		 * in the FdwXact entry.
+		 */
+		LWLockAcquire(FdwXactLock, LW_EXCLUSIVE);
+		FdwXactRedoAdd(XLogRecGetData(record),
+					   record->ReadRecPtr,
+					   record->EndRecPtr);
+		LWLockRelease(FdwXactLock);
+	}
+	else if (info == XLOG_FDWXACT_REMOVE)
+	{
+		xl_fdwxact_remove *rrec = (xl_fdwxact_remove *) rec;
+
+		/* Delete the FdwXact entry and its file, if it exists */
+		LWLockAcquire(FdwXactLock, LW_EXCLUSIVE);
+		FdwXactRedoRemove(rrec->dbid, rrec->xid, rrec->serverid,
+						  rrec->userid, false);
+		LWLockRelease(FdwXactLock);
+	}
+	else
+		elog(ERROR, "invalid log type %d in foreign transaction log record", info);
+}
+
+/*
+ * Return a null-terminated foreign transaction identifier.  If the given
+ * foreign server's FDW provides the getPrepareId callback, we return the
+ * identifier it returns.  Otherwise we generate a unique identifier of the
+ * form "fx_<random number>_<xid>_<serverid>_<userid>", whose length is
+ * less than FDWXACT_ID_MAX_LEN.
+ *
+ * The returned string value is used to identify the foreign transaction.
+ * The identifier must not be the same as that of any other concurrent
+ * prepared transaction.
+ *
+ * To make the foreign transaction id unique, we should ideally use
+ * something like UUID, which gives unique ids with high probability, but
+ * that may be expensive here, and the UUID extension which provides a
+ * function to generate UUIDs is not part of the core code.
+ */
+static char *
+get_fdwxact_identifier(FdwXactParticipant *fdw_part, TransactionId xid)
+{
+	char	   *id;
+	int			id_len = 0;
+
+	if (!fdw_part->get_prepareid_fn)
+	{
+		char		buf[FDWXACT_ID_MAX_LEN] = {0};
+
+		/*
+		 * The FDW doesn't provide the callback function; generate a unique
+		 * identifier.
+		 */
+		snprintf(buf, FDWXACT_ID_MAX_LEN, "fx_%ld_%u_%u_%u",
+				 Abs(random()), xid, fdw_part->server->serverid,
+				 fdw_part->usermapping->userid);
+
+		return pstrdup(buf);
+	}
+
+	/* Get a unique identifier from the callback function */
+	id = fdw_part->get_prepareid_fn(xid, fdw_part->server->serverid,
+									fdw_part->usermapping->userid,
+									&id_len);
+
+	if (id == NULL)
+		ereport(ERROR,
+				(errcode(ERRCODE_UNDEFINED_OBJECT),
+				 (errmsg("foreign transaction identifier is not provided"))));
+
+	/* Check the length of the foreign transaction identifier */
+	if (id_len > FDWXACT_ID_MAX_LEN)
+	{
+		id[FDWXACT_ID_MAX_LEN] = '\0';
+		ereport(ERROR,
+				(errcode(ERRCODE_NAME_TOO_LONG),
+				 errmsg("foreign transaction identifier \"%s\" is too long",
+						id),
+				 errdetail("Foreign transaction identifier must be less than %d characters.",
+						   FDWXACT_ID_MAX_LEN)));
+	}
+
+	id[id_len] = '\0';
+	return pstrdup(id);
+}
+
+/*
+ * We must fsync any foreign transaction state file that is valid or
+ * generated during redo and has an inserted LSN <= the checkpoint's redo
+ * horizon.  The foreign transaction entries, and hence the corresponding
+ * files, are expected to be very short-lived.  By executing this function
+ * late, we might have fewer files to fsync, thus reducing some I/O.  This
+ * is similar to CheckPointTwoPhase().
+ *
+ * This is deliberately run as late as possible in the checkpoint sequence,
+ * because FdwXacts ordinarily have short lifespans, and so it is quite
+ * possible that FdwXacts that were valid at checkpoint start will no
+ * longer exist if we wait a little bit.  With typical checkpoint settings
+ * this will be about 3 minutes for an online checkpoint, so as a result we
+ * expect that there will be no FdwXacts that need to be copied to disk.
+ *
+ * If an FdwXact remains valid across multiple checkpoints, it will already
+ * be on disk so we don't bother to repeat that write.
+ */ +void +CheckPointFdwXacts(XLogRecPtr redo_horizon) +{ + int cnt; + int serialized_fdwxacts = 0; + + if (max_prepared_foreign_xacts <= 0) + return; /* nothing to do */ + + TRACE_POSTGRESQL_FDWXACT_CHECKPOINT_START(); + + /* + * We are expecting there to be zero FdwXacts that need to be copied to + * disk, so we perform all I/O while holding FdwXactLock for simplicity. + * This prevents any new foreign xacts from preparing while this occurs, + * which shouldn't be a problem since the presence of long-lived prepared + * foreign xacts indicates the transaction manager isn't active. + * + * It's also possible to move the I/O out of the lock, but on every error + * we would have to check whether somebody committed our transaction in a + * different backend. Let's leave this optimization for the future, in + * case somebody spots that this place causes a bottleneck. + * + * Note that it isn't possible for there to be an FdwXact with an + * insert_end_lsn set prior to the last checkpoint that is still marked + * invalid, because of the efforts with delayChkpt. + */ + LWLockAcquire(FdwXactLock, LW_SHARED); + for (cnt = 0; cnt < FdwXactCtl->num_fdwxacts; cnt++) + { + FdwXact fdwxact = FdwXactCtl->fdwxacts[cnt]; + + if ((fdwxact->valid || fdwxact->inredo) && + !fdwxact->ondisk && + fdwxact->insert_end_lsn <= redo_horizon) + { + char *buf; + int len; + + XlogReadFdwXactData(fdwxact->insert_start_lsn, &buf, &len); + RecreateFdwXactFile(fdwxact->dbid, fdwxact->local_xid, + fdwxact->serverid, fdwxact->userid, + buf, len); + fdwxact->ondisk = true; + fdwxact->insert_start_lsn = InvalidXLogRecPtr; + fdwxact->insert_end_lsn = InvalidXLogRecPtr; + pfree(buf); + serialized_fdwxacts++; + } + } + + LWLockRelease(FdwXactLock); + + /* + * Unconditionally flush the parent directory to make any information + * durable on disk. FdwXact files could have been removed and those + * removals need to be made persistent as well as any files newly created. 
+ */ + fsync_fname(FDWXACTS_DIR, true); + + TRACE_POSTGRESQL_FDWXACT_CHECKPOINT_DONE(); + + if (log_checkpoints && serialized_fdwxacts > 0) + ereport(LOG, + (errmsg_plural("%u foreign transaction state file was written " + "for long-running prepared transactions", + "%u foreign transaction state files were written " + "for long-running prepared transactions", + serialized_fdwxacts, + serialized_fdwxacts))); +} + +/* + * Reads foreign transaction data from xlog. During checkpoint this data will + * be moved to fdwxact files and ReadFdwXactFile should be used instead. + * + * Note clearly that this function accesses WAL during normal operation, similarly + * to the way WALSender or Logical Decoding would do. It does not run during + * crash recovery or standby processing. + */ +static void +XlogReadFdwXactData(XLogRecPtr lsn, char **buf, int *len) +{ + XLogRecord *record; + XLogReaderState *xlogreader; + char *errormsg; + + xlogreader = XLogReaderAllocate(wal_segment_size, NULL, + &read_local_xlog_page, NULL); + if (!xlogreader) + ereport(ERROR, + (errcode(ERRCODE_OUT_OF_MEMORY), + errmsg("out of memory"), + errdetail("Failed while allocating an XLog reading processor."))); + + record = XLogReadRecord(xlogreader, lsn, &errormsg); + if (record == NULL) + ereport(ERROR, + (errcode_for_file_access(), + errmsg("could not read foreign transaction state from xlog at %X/%X", + (uint32) (lsn >> 32), + (uint32) lsn))); + + if (XLogRecGetRmid(xlogreader) != RM_FDWXACT_ID || + (XLogRecGetInfo(xlogreader) & ~XLR_INFO_MASK) != XLOG_FDWXACT_INSERT) + ereport(ERROR, + (errcode_for_file_access(), + errmsg("expected foreign transaction state data is not present in xlog at %X/%X", + (uint32) (lsn >> 32), + (uint32) lsn))); + + if (len != NULL) + *len = XLogRecGetDataLen(xlogreader); + + *buf = palloc(sizeof(char) * XLogRecGetDataLen(xlogreader)); + memcpy(*buf, XLogRecGetData(xlogreader), sizeof(char) * XLogRecGetDataLen(xlogreader)); + + XLogReaderFree(xlogreader); +} + +/* + * 
Recreates a foreign transaction state file. This is used in WAL replay + * and during checkpoint creation. + * + * Note: content and len don't include CRC. + */ +void +RecreateFdwXactFile(Oid dbid, TransactionId xid, Oid serverid, + Oid userid, void *content, int len) +{ + char path[MAXPGPATH]; + pg_crc32c statefile_crc; + int fd; + + /* Recompute CRC */ + INIT_CRC32C(statefile_crc); + COMP_CRC32C(statefile_crc, content, len); + FIN_CRC32C(statefile_crc); + + FdwXactFilePath(path, dbid, xid, serverid, userid); + + fd = OpenTransientFile(path, O_CREAT | O_TRUNC | O_WRONLY | PG_BINARY); + + if (fd < 0) + ereport(ERROR, + (errcode_for_file_access(), + errmsg("could not recreate foreign transaction state file \"%s\": %m", + path))); + + /* Write content and CRC */ + pgstat_report_wait_start(WAIT_EVENT_FDWXACT_FILE_WRITE); + if (write(fd, content, len) != len) + { + /* if write didn't set errno, assume problem is no disk space */ + if (errno == 0) + errno = ENOSPC; + ereport(ERROR, + (errcode_for_file_access(), + errmsg("could not write foreign transaction state file: %m"))); + } + if (write(fd, &statefile_crc, sizeof(pg_crc32c)) != sizeof(pg_crc32c)) + { + if (errno == 0) + errno = ENOSPC; + ereport(ERROR, + (errcode_for_file_access(), + errmsg("could not write foreign transaction state file: %m"))); + } + pgstat_report_wait_end(); + + /* + * We must fsync the file because the end-of-replay checkpoint will not do + * so, there being no FDWXACT in shared memory yet to tell it to. 
+ */ + pgstat_report_wait_start(WAIT_EVENT_FDWXACT_FILE_SYNC); + if (pg_fsync(fd) != 0) + ereport(ERROR, + (errcode_for_file_access(), + errmsg("could not fsync foreign transaction state file: %m"))); + pgstat_report_wait_end(); + + if (CloseTransientFile(fd) != 0) + ereport(ERROR, + (errcode_for_file_access(), + errmsg("could not close foreign transaction file: %m"))); +} + +/* + * Given a transaction id, userid and serverid, read the foreign transaction + * state either from disk or directly from WAL via the provided + * "insert_start_lsn". + */ +static char * +ProcessFdwXactBuffer(Oid dbid, TransactionId xid, Oid serverid, + Oid userid, XLogRecPtr insert_start_lsn, bool fromdisk) +{ + TransactionId origNextXid = + XidFromFullTransactionId(ShmemVariableCache->nextFullXid); + char *buf; + + Assert(LWLockHeldByMeInMode(FdwXactLock, LW_EXCLUSIVE)); + + if (!fromdisk) + Assert(!XLogRecPtrIsInvalid(insert_start_lsn)); + + /* Reject XID if too new */ + if (TransactionIdFollowsOrEquals(xid, origNextXid)) + { + if (fromdisk) + { + ereport(WARNING, + (errmsg("removing future fdwxact state file for xid %u, server %u and user %u", + xid, serverid, userid))); + RemoveFdwXactFile(dbid, xid, serverid, userid, true); + } + else + { + ereport(WARNING, + (errmsg("removing future fdwxact state from memory for xid %u, server %u and user %u", + xid, serverid, userid))); + FdwXactRedoRemove(dbid, xid, serverid, userid, true); + } + return NULL; + } + + if (fromdisk) + { + /* Read and validate file */ + buf = ReadFdwXactFile(dbid, xid, serverid, userid); + } + else + { + /* Read xlog data */ + XlogReadFdwXactData(insert_start_lsn, &buf, NULL); + } + + return buf; +} + +/* + * Read and validate the foreign transaction state file. + * + * If it looks OK (has a valid magic number and CRC), return the palloc'd + * contents of the file, issuing an error when finding corrupted data. + * This state can be reached when doing recovery. 
+ */ +static char * +ReadFdwXactFile(Oid dbid, TransactionId xid, Oid serverid, Oid userid) +{ + char path[MAXPGPATH]; + int fd; + FdwXactOnDiskData *fdwxact_file_data; + struct stat stat; + uint32 crc_offset; + pg_crc32c calc_crc; + pg_crc32c file_crc; + char *buf; + int r; + + FdwXactFilePath(path, dbid, xid, serverid, userid); + + fd = OpenTransientFile(path, O_RDONLY | PG_BINARY); + if (fd < 0) + ereport(ERROR, + (errcode_for_file_access(), + errmsg("could not open FDW transaction state file \"%s\": %m", + path))); + + /* + * Check file length. We can determine a lower bound pretty easily. We + * set an upper bound to avoid palloc() failure on a corrupt file, though + * we can't guarantee that we won't get an out of memory error anyway, + * even on a valid file. + */ + if (fstat(fd, &stat)) + ereport(ERROR, + (errcode_for_file_access(), + errmsg("could not stat FDW transaction state file \"%s\": %m", + path))); + + if (stat.st_size < (offsetof(FdwXactOnDiskData, fdwxact_id) + + sizeof(pg_crc32c)) || + stat.st_size > MaxAllocSize) + + ereport(ERROR, + (errcode_for_file_access(), + errmsg("too large FDW transaction state file \"%s\": %m", + path))); + + crc_offset = stat.st_size - sizeof(pg_crc32c); + if (crc_offset != MAXALIGN(crc_offset)) + ereport(ERROR, + (errcode(ERRCODE_DATA_CORRUPTED), + errmsg("incorrect alignment of CRC offset for file \"%s\"", + path))); + + /* + * Ok, slurp in the file. 
+ */ + buf = (char *) palloc(stat.st_size); + + /* Slurp the file */ + pgstat_report_wait_start(WAIT_EVENT_FDWXACT_FILE_READ); + r = read(fd, buf, stat.st_size); + if (r != stat.st_size) + { + if (r < 0) + ereport(ERROR, + (errcode_for_file_access(), + errmsg("could not read file \"%s\": %m", path))); + else + ereport(ERROR, + (errmsg("could not read file \"%s\": read %d of %zu", + path, r, (Size) stat.st_size))); + } + pgstat_report_wait_end(); + + if (CloseTransientFile(fd)) + ereport(ERROR, + (errcode_for_file_access(), + errmsg("could not close file \"%s\": %m", path))); + + /* + * Check the CRC. + */ + INIT_CRC32C(calc_crc); + COMP_CRC32C(calc_crc, buf, crc_offset); + FIN_CRC32C(calc_crc); + + file_crc = *((pg_crc32c *) (buf + crc_offset)); + + if (!EQ_CRC32C(calc_crc, file_crc)) + ereport(ERROR, + (errcode(ERRCODE_DATA_CORRUPTED), + errmsg("calculated CRC checksum does not match value stored in file \"%s\"", + path))); + + /* Check that the contents match the expected data */ + fdwxact_file_data = (FdwXactOnDiskData *) buf; + if (fdwxact_file_data->dbid != dbid || + fdwxact_file_data->serverid != serverid || + fdwxact_file_data->userid != userid || + fdwxact_file_data->local_xid != xid) + ereport(ERROR, + (errcode(ERRCODE_DATA_CORRUPTED), + errmsg("invalid foreign transaction state file \"%s\"", + path))); + + return buf; +} + +/* + * Scan the shared memory entries of FdwXact and determine the range of valid + * XIDs present. This is run during database startup, after we have completed + * reading WAL. ShmemVariableCache->nextFullXid has been set to one more than + * the highest XID for which evidence exists in WAL. + * + * On corrupted state files, fail immediately. Keeping broken entries around + * and letting replay continue causes harm to the system, and a new backup + * should be rolled in. + * + * Our other responsibility is to update and return the oldest valid XID + * among the distributed transactions. 
This is needed to synchronize pg_subtrans + * startup properly. + */ +TransactionId +PrescanFdwXacts(TransactionId oldestActiveXid) +{ + FullTransactionId nextFullXid = ShmemVariableCache->nextFullXid; + TransactionId origNextXid = XidFromFullTransactionId(nextFullXid); + TransactionId result = origNextXid; + int i; + + LWLockAcquire(FdwXactLock, LW_EXCLUSIVE); + for (i = 0; i < FdwXactCtl->num_fdwxacts; i++) + { + FdwXact fdwxact = FdwXactCtl->fdwxacts[i]; + char *buf; + + buf = ProcessFdwXactBuffer(fdwxact->dbid, fdwxact->local_xid, + fdwxact->serverid, fdwxact->userid, + fdwxact->insert_start_lsn, fdwxact->ondisk); + + if (buf == NULL) + continue; + + if (TransactionIdPrecedes(fdwxact->local_xid, result)) + result = fdwxact->local_xid; + + pfree(buf); + } + LWLockRelease(FdwXactLock); + + return result; +} + +/* + * Scan pg_fdwxact and fill FdwXact depending on the on-disk data. + * This is called once at the beginning of recovery, saving any extra + * lookups in the future. FdwXact files that are newer than the + * minimum XID horizon are discarded on the way. 
+ */ +void +restoreFdwXactData(void) +{ + DIR *cldir; + struct dirent *clde; + + LWLockAcquire(FdwXactLock, LW_EXCLUSIVE); + cldir = AllocateDir(FDWXACTS_DIR); + while ((clde = ReadDir(cldir, FDWXACTS_DIR)) != NULL) + { + if (strlen(clde->d_name) == FDWXACT_FILE_NAME_LEN && + strspn(clde->d_name, "0123456789ABCDEF_") == FDWXACT_FILE_NAME_LEN) + { + TransactionId local_xid; + Oid dbid; + Oid serverid; + Oid userid; + char *buf; + + sscanf(clde->d_name, "%08x_%08x_%08x_%08x", + &dbid, &local_xid, &serverid, &userid); + + /* Read fdwxact data from disk */ + buf = ProcessFdwXactBuffer(dbid, local_xid, serverid, userid, + InvalidXLogRecPtr, true); + + if (buf == NULL) + continue; + + /* Add this entry into the table of foreign transactions */ + FdwXactRedoAdd(buf, InvalidXLogRecPtr, InvalidXLogRecPtr); + } + } + + LWLockRelease(FdwXactLock); + FreeDir(cldir); +} + +/* + * Remove the foreign transaction file for given entry. + * + * If giveWarning is false, do not complain about file-not-present; + * this is an expected case during WAL replay. + */ +static void +RemoveFdwXactFile(Oid dbid, TransactionId xid, Oid serverid, Oid userid, + bool giveWarning) +{ + char path[MAXPGPATH]; + + FdwXactFilePath(path, dbid, xid, serverid, userid); + if (unlink(path) < 0 && (errno != ENOENT || giveWarning)) + ereport(WARNING, + (errcode_for_file_access(), + errmsg("could not remove foreign transaction state file \"%s\": %m", + path))); +} + +/* + * Store pointer to the start/end of the WAL record along with the xid in + * a fdwxact entry in shared memory FdwXactData structure. + */ +static void +FdwXactRedoAdd(char *buf, XLogRecPtr start_lsn, XLogRecPtr end_lsn) +{ + FdwXactOnDiskData *fdwxact_data = (FdwXactOnDiskData *) buf; + FdwXact fdwxact; + + Assert(LWLockHeldByMeInMode(FdwXactLock, LW_EXCLUSIVE)); + Assert(RecoveryInProgress()); + + /* + * Add this entry into the table of foreign transactions. 
+ */ + fdwxact = insert_fdwxact(fdwxact_data->dbid, fdwxact_data->local_xid, + fdwxact_data->serverid, fdwxact_data->userid, + fdwxact_data->umid, fdwxact_data->fdwxact_id); + + elog(DEBUG2, "added fdwxact entry in shared memory for foreign transaction, db %u xid %u server %u user %u id %s", + fdwxact_data->dbid, fdwxact_data->local_xid, + fdwxact_data->serverid, fdwxact_data->userid, + fdwxact_data->fdwxact_id); + + /* + * Set status as PREPARED and as in-doubt, since we do not know + * the xact status right now. Resolver will set it later based on + * the status of the local transaction that prepared this fdwxact + * entry. + */ + fdwxact->status = FDWXACT_STATUS_PREPARED; + fdwxact->insert_start_lsn = start_lsn; + fdwxact->insert_end_lsn = end_lsn; + fdwxact->inredo = true; /* added in redo */ + fdwxact->indoubt = true; + fdwxact->valid = false; + fdwxact->ondisk = XLogRecPtrIsInvalid(start_lsn); +} + +/* + * Remove the corresponding fdwxact entry from FdwXactCtl. Also remove + * the FdwXact file if the foreign transaction was saved via an earlier + * checkpoint. The FdwXact entry may not be found in the case where crash + * recovery starts from a point after the entry was added but before it + * was removed. 
+ */ +void +FdwXactRedoRemove(Oid dbid, TransactionId xid, Oid serverid, + Oid userid, bool givewarning) +{ + FdwXact fdwxact; + + Assert(LWLockHeldByMeInMode(FdwXactLock, LW_EXCLUSIVE)); + Assert(RecoveryInProgress()); + + fdwxact = get_one_fdwxact(dbid, xid, serverid, userid); + + if (fdwxact == NULL) + return; + + elog(DEBUG2, "removed fdwxact entry from shared memory for foreign transaction, db %u xid %u server %u user %u id %s", + fdwxact->dbid, fdwxact->local_xid, fdwxact->serverid, + fdwxact->userid, fdwxact->fdwxact_id); + + /* Clean up entry and any files we may have left */ + if (fdwxact->ondisk) + RemoveFdwXactFile(fdwxact->dbid, fdwxact->local_xid, + fdwxact->serverid, fdwxact->userid, + givewarning); + remove_fdwxact(fdwxact); +} + +/* + * Scan the shared memory entries of FdwXact and validate them. + * + * This is run at the end of recovery, but before we allow backends to write + * WAL. + */ +void +RecoverFdwXacts(void) +{ + int i; + + LWLockAcquire(FdwXactLock, LW_EXCLUSIVE); + for (i = 0; i < FdwXactCtl->num_fdwxacts; i++) + { + FdwXact fdwxact = FdwXactCtl->fdwxacts[i]; + char *buf; + + buf = ProcessFdwXactBuffer(fdwxact->dbid, fdwxact->local_xid, + fdwxact->serverid, fdwxact->userid, + fdwxact->insert_start_lsn, fdwxact->ondisk); + + if (buf == NULL) + continue; + + ereport(LOG, + (errmsg("recovering foreign transaction %u for server %u and user %u from shared memory", + fdwxact->local_xid, fdwxact->serverid, fdwxact->userid))); + + /* recovered, so reset the flag for entries generated by redo */ + fdwxact->inredo = false; + fdwxact->valid = true; + + /* + * If the foreign transaction is part of a prepared local + * transaction, it's not in-doubt. A future COMMIT/ROLLBACK + * PREPARED will determine the fate of this foreign transaction. 
+ */ + if (TwoPhaseExists(fdwxact->local_xid)) + { + ereport(DEBUG2, + (errmsg("clear in-doubt flag from foreign transaction %u, server %u, user %u as found the corresponding local prepared transaction", + fdwxact->local_xid, fdwxact->serverid, + fdwxact->userid))); + fdwxact->indoubt = false; + } + + pfree(buf); + } + LWLockRelease(FdwXactLock); +} + +bool +check_foreign_twophase_commit(int *newval, void **extra, GucSource source) +{ + ForeignTwophaseCommitLevel newForeignTwophaseCommitLevel = *newval; + + /* Parameter check */ + if (newForeignTwophaseCommitLevel > FOREIGN_TWOPHASE_COMMIT_DISABLED && + (max_prepared_foreign_xacts == 0 || max_foreign_xact_resolvers == 0)) + { + GUC_check_errdetail("Cannot enable \"foreign_twophase_commit\" when " + "\"max_prepared_foreign_transactions\" or " + "\"max_foreign_transaction_resolvers\" is zero."); + return false; + } + + return true; +} + +/* Built in functions */ + +/* + * Structure to hold and iterate over the foreign transactions to be displayed + * by the built-in functions. 
+ */ +typedef struct +{ + FdwXact fdwxacts; + int num_xacts; + int cur_xact; +} WorkingStatus; + +Datum +pg_foreign_xacts(PG_FUNCTION_ARGS) +{ +#define PG_PREPARED_FDWXACTS_COLS 7 + FuncCallContext *funcctx; + WorkingStatus *status; + char *xact_status; + + if (SRF_IS_FIRSTCALL()) + { + TupleDesc tupdesc; + MemoryContext oldcontext; + int num_fdwxacts = 0; + + /* create a function context for cross-call persistence */ + funcctx = SRF_FIRSTCALL_INIT(); + + /* + * Switch to memory context appropriate for multiple function calls + */ + oldcontext = MemoryContextSwitchTo(funcctx->multi_call_memory_ctx); + + /* build tupdesc for result tuples */ + /* this had better match pg_fdwxacts view in system_views.sql */ + tupdesc = CreateTemplateTupleDesc(PG_PREPARED_FDWXACTS_COLS); + TupleDescInitEntry(tupdesc, (AttrNumber) 1, "dbid", + OIDOID, -1, 0); + TupleDescInitEntry(tupdesc, (AttrNumber) 2, "transaction", + XIDOID, -1, 0); + TupleDescInitEntry(tupdesc, (AttrNumber) 3, "serverid", + OIDOID, -1, 0); + TupleDescInitEntry(tupdesc, (AttrNumber) 4, "userid", + OIDOID, -1, 0); + TupleDescInitEntry(tupdesc, (AttrNumber) 5, "status", + TEXTOID, -1, 0); + TupleDescInitEntry(tupdesc, (AttrNumber) 6, "indoubt", + BOOLOID, -1, 0); + TupleDescInitEntry(tupdesc, (AttrNumber) 7, "identifier", + TEXTOID, -1, 0); + + funcctx->tuple_desc = BlessTupleDesc(tupdesc); + + /* + * Collect status information that we will format and send out as a + * result set. 
+ */ + status = (WorkingStatus *) palloc(sizeof(WorkingStatus)); + funcctx->user_fctx = (void *) status; + + status->fdwxacts = get_all_fdwxacts(&num_fdwxacts); + status->num_xacts = num_fdwxacts; + status->cur_xact = 0; + + MemoryContextSwitchTo(oldcontext); + } + + funcctx = SRF_PERCALL_SETUP(); + status = funcctx->user_fctx; + + while (status->cur_xact < status->num_xacts) + { + FdwXact fdwxact = &status->fdwxacts[status->cur_xact++]; + Datum values[PG_PREPARED_FDWXACTS_COLS]; + bool nulls[PG_PREPARED_FDWXACTS_COLS]; + HeapTuple tuple; + Datum result; + + if (!fdwxact->valid) + continue; + + /* + * Form tuple with appropriate data. + */ + MemSet(values, 0, sizeof(values)); + MemSet(nulls, 0, sizeof(nulls)); + + values[0] = ObjectIdGetDatum(fdwxact->dbid); + values[1] = TransactionIdGetDatum(fdwxact->local_xid); + values[2] = ObjectIdGetDatum(fdwxact->serverid); + values[3] = ObjectIdGetDatum(fdwxact->userid); + + switch (fdwxact->status) + { + case FDWXACT_STATUS_INITIAL: + xact_status = "initial"; + break; + case FDWXACT_STATUS_PREPARING: + xact_status = "preparing"; + break; + case FDWXACT_STATUS_PREPARED: + xact_status = "prepared"; + break; + case FDWXACT_STATUS_COMMITTING: + xact_status = "committing"; + break; + case FDWXACT_STATUS_ABORTING: + xact_status = "aborting"; + break; + case FDWXACT_STATUS_RESOLVED: + xact_status = "resolved"; + break; + default: + xact_status = "unknown"; + break; + } + values[4] = CStringGetTextDatum(xact_status); + values[5] = BoolGetDatum(fdwxact->indoubt); + values[6] = PointerGetDatum(cstring_to_text_with_len(fdwxact->fdwxact_id, + strlen(fdwxact->fdwxact_id))); + + tuple = heap_form_tuple(funcctx->tuple_desc, values, nulls); + result = HeapTupleGetDatum(tuple); + SRF_RETURN_NEXT(funcctx, result); + } + + SRF_RETURN_DONE(funcctx); +} + +/* + * Built-in function to resolve a prepared foreign transaction manually. 
+ */ +Datum +pg_resolve_foreign_xact(PG_FUNCTION_ARGS) +{ + TransactionId xid = DatumGetTransactionId(PG_GETARG_DATUM(0)); + Oid serverid = PG_GETARG_OID(1); + Oid userid = PG_GETARG_OID(2); + ForeignServer *server; + UserMapping *usermapping; + FdwXact fdwxact; + FdwXactRslvState *state; + FdwXactStatus prev_status; + + if (!superuser()) + ereport(ERROR, + (errcode(ERRCODE_INSUFFICIENT_PRIVILEGE), + (errmsg("must be superuser to resolve foreign transactions")))); + + server = GetForeignServer(serverid); + usermapping = GetUserMapping(userid, serverid); + state = create_fdwxact_state(); + + LWLockAcquire(FdwXactLock, LW_EXCLUSIVE); + + fdwxact = get_one_fdwxact(MyDatabaseId, xid, serverid, userid); + + if (fdwxact == NULL) + { + LWLockRelease(FdwXactLock); + PG_RETURN_BOOL(false); + } + + state->server = server; + state->usermapping = usermapping; + state->fdwxact_id = pstrdup(fdwxact->fdwxact_id); + + SpinLockAcquire(&fdwxact->mutex); + prev_status = fdwxact->status; + SpinLockRelease(&fdwxact->mutex); + + FdwXactDetermineTransactionFate(fdwxact, false); + + LWLockRelease(FdwXactLock); + + FdwXactResolveForeignTransaction(fdwxact, state, prev_status); + + PG_RETURN_BOOL(true); +} + +/* + * Built-in function to remove a prepared foreign transaction entry without + * resolution. The function gives a way to forget about such a prepared + * transaction in cases where the foreign server on which it was prepared is + * no longer available, or the user that prepared the transaction needs to be + * dropped. 
+ */ +Datum +pg_remove_foreign_xact(PG_FUNCTION_ARGS) +{ + TransactionId xid = DatumGetTransactionId(PG_GETARG_DATUM(0)); + Oid serverid = PG_GETARG_OID(1); + Oid userid = PG_GETARG_OID(2); + FdwXact fdwxact; + + if (!superuser()) + ereport(ERROR, + (errcode(ERRCODE_INSUFFICIENT_PRIVILEGE), + (errmsg("must be superuser to remove foreign transactions")))); + + LWLockAcquire(FdwXactLock, LW_EXCLUSIVE); + + fdwxact = get_one_fdwxact(MyDatabaseId, xid, serverid, userid); + + if (fdwxact == NULL) + { + LWLockRelease(FdwXactLock); + PG_RETURN_BOOL(false); + } + + remove_fdwxact(fdwxact); + + LWLockRelease(FdwXactLock); + + PG_RETURN_BOOL(true); +} diff --git a/src/backend/access/fdwxact/launcher.c b/src/backend/access/fdwxact/launcher.c new file mode 100644 index 0000000000..45fb530916 --- /dev/null +++ b/src/backend/access/fdwxact/launcher.c @@ -0,0 +1,644 @@ +/*------------------------------------------------------------------------- + * + * launcher.c + * + * The foreign transaction resolver launcher process starts foreign + * transaction resolver processes. The launcher starts a resolver + * process when a request arrives from a backend process. 
+ * + * Portions Copyright (c) 2019, PostgreSQL Global Development Group + * + * IDENTIFICATION + * src/backend/access/fdwxact/launcher.c + * + *------------------------------------------------------------------------- + */ + +#include "postgres.h" + +#include "funcapi.h" +#include "pgstat.h" + +#include "access/fdwxact.h" +#include "access/fdwxact_launcher.h" +#include "access/fdwxact_resolver.h" +#include "access/resolver_internal.h" +#include "commands/dbcommands.h" +#include "nodes/pg_list.h" +#include "postmaster/bgworker.h" +#include "storage/ipc.h" +#include "storage/proc.h" +#include "tcop/tcopprot.h" +#include "utils/builtins.h" + +/* max sleep time between cycles (3min) */ +#define DEFAULT_NAPTIME_PER_CYCLE 180000L + +static void fdwxact_launcher_onexit(int code, Datum arg); +static void fdwxact_launcher_sighup(SIGNAL_ARGS); +static void fdwxact_launch_resolver(Oid dbid); +static bool fdwxact_relaunch_resolvers(void); + +static volatile sig_atomic_t got_SIGHUP = false; +static volatile sig_atomic_t got_SIGUSR2 = false; +FdwXactResolver *MyFdwXactResolver = NULL; + +/* + * Wake up the launcher process to retry resolution. + */ +void +FdwXactLauncherRequestToLaunchForRetry(void) +{ + if (FdwXactRslvCtl->launcher_pid != InvalidPid) + SetLatch(FdwXactRslvCtl->launcher_latch); +} + +/* + * Wake up the launcher process to request launching new resolvers + * immediately. + */ +void +FdwXactLauncherRequestToLaunch(void) +{ + if (FdwXactRslvCtl->launcher_pid != InvalidPid) + kill(FdwXactRslvCtl->launcher_pid, SIGUSR2); +} + +/* Report shared memory space needed by FdwXactRslvShmemInit */ +Size +FdwXactRslvShmemSize(void) +{ + Size size = 0; + + size = add_size(size, SizeOfFdwXactRslvCtlData); + size = add_size(size, mul_size(max_foreign_xact_resolvers, + sizeof(FdwXactResolver))); + + return size; +} + +/* + * Allocate and initialize foreign transaction resolver shared + * memory. 
+ */ +void +FdwXactRslvShmemInit(void) +{ + bool found; + + FdwXactRslvCtl = ShmemInitStruct("Foreign transactions resolvers", + FdwXactRslvShmemSize(), + &found); + + if (!IsUnderPostmaster) + { + int slot; + + /* First time through, so initialize */ + MemSet(FdwXactRslvCtl, 0, FdwXactRslvShmemSize()); + + SHMQueueInit(&(FdwXactRslvCtl->fdwxact_queue)); + + for (slot = 0; slot < max_foreign_xact_resolvers; slot++) + { + FdwXactResolver *resolver = &FdwXactRslvCtl->resolvers[slot]; + + resolver->pid = InvalidPid; + resolver->dbid = InvalidOid; + resolver->in_use = false; + resolver->last_resolved_time = 0; + resolver->latch = NULL; + SpinLockInit(&(resolver->mutex)); + } + } +} + +/* + * Cleanup function for fdwxact launcher + * + * Called on fdwxact launcher exit. + */ +static void +fdwxact_launcher_onexit(int code, Datum arg) +{ + FdwXactRslvCtl->launcher_pid = InvalidPid; +} + +/* SIGHUP: set flag to reload configuration at next convenient time */ +static void +fdwxact_launcher_sighup(SIGNAL_ARGS) +{ + int save_errno = errno; + + got_SIGHUP = true; + + SetLatch(MyLatch); + + errno = save_errno; +} + +/* SIGUSR2: set flag to launch new resolver process immediately */ +static void +fdwxact_launcher_sigusr2(SIGNAL_ARGS) +{ + int save_errno = errno; + + got_SIGUSR2 = true; + SetLatch(MyLatch); + + errno = save_errno; +} + +/* + * Main loop for the fdwxact launcher process. 
+ */ +void +FdwXactLauncherMain(Datum main_arg) +{ + TimestampTz last_start_time = 0; + + ereport(DEBUG1, + (errmsg("fdwxact resolver launcher started"))); + + before_shmem_exit(fdwxact_launcher_onexit, (Datum) 0); + + Assert(FdwXactRslvCtl->launcher_pid == 0); + FdwXactRslvCtl->launcher_pid = MyProcPid; + FdwXactRslvCtl->launcher_latch = &MyProc->procLatch; + + pqsignal(SIGHUP, fdwxact_launcher_sighup); + pqsignal(SIGUSR2, fdwxact_launcher_sigusr2); + pqsignal(SIGTERM, die); + BackgroundWorkerUnblockSignals(); + + BackgroundWorkerInitializeConnection(NULL, NULL, 0); + + /* Enter main loop */ + for (;;) + { + TimestampTz now; + long wait_time = DEFAULT_NAPTIME_PER_CYCLE; + int rc; + + CHECK_FOR_INTERRUPTS(); + ResetLatch(MyLatch); + + now = GetCurrentTimestamp(); + + /* + * Limit start retries to once per + * foreign_xact_resolution_retry_interval, but always start + * immediately when a backend requests it. + */ + if (got_SIGUSR2 || + TimestampDifferenceExceeds(last_start_time, now, + foreign_xact_resolution_retry_interval)) + { + MemoryContext oldctx; + MemoryContext subctx; + bool launched; + + if (got_SIGUSR2) + got_SIGUSR2 = false; + + subctx = AllocSetContextCreate(TopMemoryContext, + "Foreign Transaction Launcher", + ALLOCSET_DEFAULT_SIZES); + oldctx = MemoryContextSwitchTo(subctx); + + /* + * Launch foreign transaction resolvers that are requested + * but not running. + */ + launched = fdwxact_relaunch_resolvers(); + if (launched) + { + last_start_time = now; + wait_time = foreign_xact_resolution_retry_interval; + } + + /* Switch back to original memory context. */ + MemoryContextSwitchTo(oldctx); + /* Clean the temporary memory. */ + MemoryContextDelete(subctx); + } + else + { + /* + * The wait in the previous cycle was interrupted less than + * foreign_xact_resolution_retry_interval after the last + * resolver started; this usually means the resolver crashed, + * so wait the full retry interval before trying again. 
+ */ + wait_time = foreign_xact_resolution_retry_interval; + } + + /* Wait for more work */ + rc = WaitLatch(MyLatch, + WL_LATCH_SET | WL_TIMEOUT | WL_POSTMASTER_DEATH, + wait_time, + WAIT_EVENT_FDWXACT_LAUNCHER_MAIN); + + if (rc & WL_POSTMASTER_DEATH) + proc_exit(1); + + if (rc & WL_LATCH_SET) + { + ResetLatch(MyLatch); + CHECK_FOR_INTERRUPTS(); + } + + if (got_SIGHUP) + { + got_SIGHUP = false; + ProcessConfigFile(PGC_SIGHUP); + } + } + + /* Not reachable */ +} + +/* + * Request launcher to launch a new foreign transaction resolver process + * or wake up the resolver if it's already running. + */ +void +FdwXactLaunchOrWakeupResolver(void) +{ + volatile FdwXactResolver *resolver; + bool found = false; + int i; + + /* + * Look for a resolver process that is running and working on the + * same database. + */ + LWLockAcquire(FdwXactResolverLock, LW_SHARED); + for (i = 0; i < max_foreign_xact_resolvers; i++) + { + resolver = &FdwXactRslvCtl->resolvers[i]; + + if (resolver->in_use && + resolver->dbid == MyDatabaseId) + { + found = true; + break; + } + } + LWLockRelease(FdwXactResolverLock); + + if (found) + { + /* Found the running resolver */ + elog(DEBUG1, + "found a running foreign transaction resolver process for database %u", + MyDatabaseId); + + /* + * Wake up the resolver. It's possible that the resolver is starting + * up and hasn't attached to its slot yet. Since the resolver will + * soon find the FdwXact entry we inserted, we don't need to do + * anything here. + */ + if (resolver->latch) + SetLatch(resolver->latch); + + return; + } + + /* Otherwise wake up the launcher to launch new resolver */ + FdwXactLauncherRequestToLaunch(); +} + +/* + * Launch a foreign transaction resolver process that will connect to given + * 'dbid'. 
+ */ +static void +fdwxact_launch_resolver(Oid dbid) +{ + BackgroundWorker bgw; + BackgroundWorkerHandle *bgw_handle; + FdwXactResolver *resolver; + int unused_slot = -1; + int i; + + LWLockAcquire(FdwXactResolverLock, LW_EXCLUSIVE); + + /* Find unused resolver slot */ + for (i = 0; i < max_foreign_xact_resolvers; i++) + { + FdwXactResolver *resolver = &FdwXactRslvCtl->resolvers[i]; + + if (!resolver->in_use) + { + unused_slot = i; + break; + } + } + + /* No unused slot found */ + if (unused_slot < 0) + ereport(ERROR, + (errcode(ERRCODE_CONFIGURATION_LIMIT_EXCEEDED), + errmsg("out of foreign transaction resolver slots"), + errhint("You might need to increase max_foreign_transaction_resolvers."))); + + resolver = &FdwXactRslvCtl->resolvers[unused_slot]; + resolver->in_use = true; + resolver->dbid = dbid; + LWLockRelease(FdwXactResolverLock); + + /* Register the new dynamic worker */ + memset(&bgw, 0, sizeof(bgw)); + bgw.bgw_flags = BGWORKER_SHMEM_ACCESS | + BGWORKER_BACKEND_DATABASE_CONNECTION; + bgw.bgw_start_time = BgWorkerStart_RecoveryFinished; + snprintf(bgw.bgw_library_name, BGW_MAXLEN, "postgres"); + snprintf(bgw.bgw_function_name, BGW_MAXLEN, "FdwXactResolverMain"); + snprintf(bgw.bgw_name, BGW_MAXLEN, + "foreign transaction resolver for database %u", resolver->dbid); + snprintf(bgw.bgw_type, BGW_MAXLEN, "foreign transaction resolver"); + bgw.bgw_restart_time = BGW_NEVER_RESTART; + bgw.bgw_notify_pid = MyProcPid; + bgw.bgw_main_arg = Int32GetDatum(unused_slot); + + if (!RegisterDynamicBackgroundWorker(&bgw, &bgw_handle)) + { + /* Failed to launch, cleanup the worker slot */ + SpinLockAcquire(&(resolver->mutex)); + resolver->in_use = false; + SpinLockRelease(&(resolver->mutex)); + + ereport(WARNING, + (errcode(ERRCODE_CONFIGURATION_LIMIT_EXCEEDED), + errmsg("out of background worker slots"), + errhint("You might need to increase max_worker_processes."))); + } + + /* + * We don't need to wait until it attaches here because 
we're going to wait + * until all foreign transactions are resolved. + */ +} + +/* + * Launch or relaunch foreign transaction resolvers on databases that have + * at least one FdwXact entry but no resolver running on them. + */ +static bool +fdwxact_relaunch_resolvers(void) +{ + HTAB *resolver_dbs; /* DBs resolvers are running on */ + HTAB *fdwxact_dbs; /* DBs having at least one FdwXact entry */ + HASHCTL ctl; + HASH_SEQ_STATUS status; + Oid *entry; + bool launched = false; + int i; + + memset(&ctl, 0, sizeof(ctl)); + ctl.keysize = sizeof(Oid); + ctl.entrysize = sizeof(Oid); + resolver_dbs = hash_create("resolver dblist", + 32, &ctl, HASH_ELEM | HASH_BLOBS); + fdwxact_dbs = hash_create("fdwxact dblist", + 32, &ctl, HASH_ELEM | HASH_BLOBS); + + /* Collect database oids that have at least one non-in-doubt FdwXact entry */ + LWLockAcquire(FdwXactLock, LW_SHARED); + for (i = 0; i < FdwXactCtl->num_fdwxacts; i++) + { + FdwXact fdwxact = FdwXactCtl->fdwxacts[i]; + + if (fdwxact->indoubt) + continue; + + hash_search(fdwxact_dbs, &(fdwxact->dbid), HASH_ENTER, NULL); + } + LWLockRelease(FdwXactLock); + + /* There is no FdwXact entry, so no need to launch a new resolver */ + if (hash_get_num_entries(fdwxact_dbs) == 0) + return false; + + /* Collect database oids on which resolvers are running */ + LWLockAcquire(FdwXactResolverLock, LW_SHARED); + for (i = 0; i < max_foreign_xact_resolvers; i++) + { + FdwXactResolver *resolver = &FdwXactRslvCtl->resolvers[i]; + + if (!resolver->in_use) + continue; + + hash_search(resolver_dbs, &(resolver->dbid), HASH_ENTER, NULL); + } + LWLockRelease(FdwXactResolverLock); + + /* Find DBs on which no resolvers are running and launch a new one on them */ + hash_seq_init(&status, fdwxact_dbs); + while ((entry = (Oid *) hash_seq_search(&status)) != NULL) + { + bool found; + + hash_search(resolver_dbs, entry, HASH_FIND, &found); + + if (!found) + { + /* No resolver is running on this database, launch a new one */ + fdwxact_launch_resolver(*entry); + launched = true; + } + 
} + + return launched; +} + +/* + * FdwXactLauncherRegister + * Register a background worker running the foreign transaction + * launcher. + */ +void +FdwXactLauncherRegister(void) +{ + BackgroundWorker bgw; + + if (max_foreign_xact_resolvers == 0) + return; + + memset(&bgw, 0, sizeof(bgw)); + bgw.bgw_flags = BGWORKER_SHMEM_ACCESS | + BGWORKER_BACKEND_DATABASE_CONNECTION; + bgw.bgw_start_time = BgWorkerStart_RecoveryFinished; + snprintf(bgw.bgw_library_name, BGW_MAXLEN, "postgres"); + snprintf(bgw.bgw_function_name, BGW_MAXLEN, "FdwXactLauncherMain"); + snprintf(bgw.bgw_name, BGW_MAXLEN, + "foreign transaction launcher"); + snprintf(bgw.bgw_type, BGW_MAXLEN, + "foreign transaction launcher"); + bgw.bgw_restart_time = 5; + bgw.bgw_notify_pid = 0; + bgw.bgw_main_arg = (Datum) 0; + + RegisterBackgroundWorker(&bgw); +} + +bool +IsFdwXactLauncher(void) +{ + return FdwXactRslvCtl->launcher_pid == MyProcPid; +} + +/* + * Stop the fdwxact resolver running on the given database. + */ +Datum +pg_stop_foreign_xact_resolver(PG_FUNCTION_ARGS) +{ + Oid dbid = PG_GETARG_OID(0); + FdwXactResolver *resolver = NULL; + int i; + + /* Must be superuser */ + if (!superuser()) + ereport(ERROR, + (errcode(ERRCODE_INSUFFICIENT_PRIVILEGE), + errmsg("permission denied to stop foreign transaction resolver"))); + + if (!OidIsValid(dbid)) + ereport(ERROR, + (errcode(ERRCODE_INVALID_PARAMETER_VALUE), + errmsg("invalid database id"))); + + LWLockAcquire(FdwXactResolverLock, LW_SHARED); + + /* Find the running resolver process on the given database */ + for (i = 0; i < max_foreign_xact_resolvers; i++) + { + resolver = &FdwXactRslvCtl->resolvers[i]; + + /* found! */ + if (resolver->in_use && resolver->dbid == dbid) + break; + } + + if (i >= max_foreign_xact_resolvers) + ereport(ERROR, + (errmsg("there is no running foreign transaction resolver process on database %u", + dbid))); + + /* Found the resolver, terminate it ... */ + kill(resolver->pid, SIGTERM); + + /* ... 
and wait for it to die */ + for (;;) + { + int rc; + + /* is it gone? */ + if (!resolver->in_use) + break; + + LWLockRelease(FdwXactResolverLock); + + /* Wait a bit --- we don't expect to have to wait long. */ + rc = WaitLatch(MyLatch, + WL_LATCH_SET | WL_TIMEOUT | WL_EXIT_ON_PM_DEATH, + 10L, WAIT_EVENT_BGWORKER_SHUTDOWN); + + if (rc & WL_LATCH_SET) + { + ResetLatch(MyLatch); + CHECK_FOR_INTERRUPTS(); + } + + LWLockAcquire(FdwXactResolverLock, LW_SHARED); + } + + LWLockRelease(FdwXactResolverLock); + + PG_RETURN_BOOL(true); +} + +/* + * Returns activity of all foreign transaction resolvers. + */ +Datum +pg_stat_get_foreign_xact(PG_FUNCTION_ARGS) +{ +#define PG_STAT_GET_FDWXACT_RESOLVERS_COLS 3 + ReturnSetInfo *rsinfo = (ReturnSetInfo *) fcinfo->resultinfo; + TupleDesc tupdesc; + Tuplestorestate *tupstore; + MemoryContext per_query_ctx; + MemoryContext oldcontext; + int i; + + /* check to see if caller supports us returning a tuplestore */ + if (rsinfo == NULL || !IsA(rsinfo, ReturnSetInfo)) + ereport(ERROR, + (errcode(ERRCODE_FEATURE_NOT_SUPPORTED), + errmsg("set-valued function called in context that cannot accept a set"))); + if (!(rsinfo->allowedModes & SFRM_Materialize)) + ereport(ERROR, + (errcode(ERRCODE_FEATURE_NOT_SUPPORTED), + errmsg("materialize mode required, but it is not " \ + "allowed in this context"))); + + /* Build a tuple descriptor for our result type */ + if (get_call_result_type(fcinfo, NULL, &tupdesc) != TYPEFUNC_COMPOSITE) + elog(ERROR, "return type must be a row type"); + + per_query_ctx = rsinfo->econtext->ecxt_per_query_memory; + oldcontext = MemoryContextSwitchTo(per_query_ctx); + + tupstore = tuplestore_begin_heap(true, false, work_mem); + rsinfo->returnMode = SFRM_Materialize; + rsinfo->setResult = tupstore; + rsinfo->setDesc = tupdesc; + + MemoryContextSwitchTo(oldcontext); + + for (i = 0; i < max_foreign_xact_resolvers; i++) + { + FdwXactResolver *resolver = &FdwXactRslvCtl->resolvers[i]; + pid_t pid; + Oid dbid; + TimestampTz 
last_resolved_time; + Datum values[PG_STAT_GET_FDWXACT_RESOLVERS_COLS]; + bool nulls[PG_STAT_GET_FDWXACT_RESOLVERS_COLS]; + + + SpinLockAcquire(&(resolver->mutex)); + if (resolver->pid == InvalidPid) + { + SpinLockRelease(&(resolver->mutex)); + continue; + } + + pid = resolver->pid; + dbid = resolver->dbid; + last_resolved_time = resolver->last_resolved_time; + SpinLockRelease(&(resolver->mutex)); + + memset(nulls, 0, sizeof(nulls)); + /* pid */ + values[0] = Int32GetDatum(pid); + + /* dbid */ + values[1] = ObjectIdGetDatum(dbid); + + /* last_resolved_time */ + if (last_resolved_time == 0) + nulls[2] = true; + else + values[2] = TimestampTzGetDatum(last_resolved_time); + + tuplestore_putvalues(tupstore, tupdesc, values, nulls); + } + + /* clean up and return the tuplestore */ + tuplestore_donestoring(tupstore); + + return (Datum) 0; +} diff --git a/src/backend/access/fdwxact/resolver.c b/src/backend/access/fdwxact/resolver.c new file mode 100644 index 0000000000..9298877f10 --- /dev/null +++ b/src/backend/access/fdwxact/resolver.c @@ -0,0 +1,344 @@ +/*------------------------------------------------------------------------- + * + * resolver.c + * + * The foreign transaction resolver background worker resolves foreign + * transactions that participate in a distributed transaction. A resolver + * process is started by the foreign transaction launcher for each database. + * + * A resolver process continues to resolve foreign transactions on the + * database for which backend processes are waiting for resolution. + * + * Normal termination is by SIGTERM, which instructs the resolver process + * to exit(0) at the next convenient moment. Emergency termination is by + * SIGQUIT, as with any backend. The resolver process also terminates on + * timeout, but only if there are no pending foreign transactions on the + * database waiting to be resolved. 
+ * + * Portions Copyright (c) 2019, PostgreSQL Global Development Group + * + * IDENTIFICATION + * src/backend/access/fdwxact/resolver.c + * + *------------------------------------------------------------------------- + */ + +#include "postgres.h" + +#include <signal.h> +#include <unistd.h> + +#include "access/fdwxact.h" +#include "access/fdwxact_resolver.h" +#include "access/fdwxact_launcher.h" +#include "access/resolver_internal.h" +#include "access/transam.h" +#include "access/xact.h" +#include "commands/dbcommands.h" +#include "funcapi.h" +#include "libpq/libpq.h" +#include "miscadmin.h" +#include "pgstat.h" +#include "postmaster/bgworker.h" +#include "storage/ipc.h" +#include "tcop/tcopprot.h" +#include "utils/builtins.h" +#include "utils/timeout.h" +#include "utils/timestamp.h" + +/* max sleep time between cycles (3min) */ +#define DEFAULT_NAPTIME_PER_CYCLE 180000L + +/* GUC parameters */ +int foreign_xact_resolution_retry_interval; +int foreign_xact_resolver_timeout = 60 * 1000; +bool foreign_xact_resolve_indoubt_xacts; + +FdwXactRslvCtlData *FdwXactRslvCtl; + +static void FXRslvLoop(void); +static long FXRslvComputeSleepTime(TimestampTz now, TimestampTz targetTime); +static void FXRslvCheckTimeout(TimestampTz now); + +static void fdwxact_resolver_sighup(SIGNAL_ARGS); +static void fdwxact_resolver_onexit(int code, Datum arg); +static void fdwxact_resolver_detach(void); +static void fdwxact_resolver_attach(int slot); + +/* Flags set by signal handlers */ +static volatile sig_atomic_t got_SIGHUP = false; + +/* Set flag to reload configuration at next convenient time */ +static void +fdwxact_resolver_sighup(SIGNAL_ARGS) +{ + int save_errno = errno; + + got_SIGHUP = true; + + SetLatch(MyLatch); + + errno = save_errno; +} + +/* + * Detach the resolver and cleanup the resolver info. 
+ */ +static void +fdwxact_resolver_detach(void) +{ + /* Block concurrent access */ + LWLockAcquire(FdwXactResolverLock, LW_EXCLUSIVE); + + MyFdwXactResolver->pid = InvalidPid; + MyFdwXactResolver->in_use = false; + MyFdwXactResolver->dbid = InvalidOid; + + LWLockRelease(FdwXactResolverLock); +} + +/* + * Clean up the foreign transaction resolver info. + */ +static void +fdwxact_resolver_onexit(int code, Datum arg) +{ + fdwxact_resolver_detach(); + + FdwXactLauncherRequestToLaunch(); +} + +/* + * Attach to a slot. + */ +static void +fdwxact_resolver_attach(int slot) +{ + /* Block concurrent access */ + LWLockAcquire(FdwXactResolverLock, LW_EXCLUSIVE); + + Assert(slot >= 0 && slot < max_foreign_xact_resolvers); + MyFdwXactResolver = &FdwXactRslvCtl->resolvers[slot]; + + if (!MyFdwXactResolver->in_use) + { + LWLockRelease(FdwXactResolverLock); + ereport(ERROR, + (errcode(ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE), + errmsg("foreign transaction resolver slot %d is empty, cannot attach", + slot))); + } + + Assert(OidIsValid(MyFdwXactResolver->dbid)); + + MyFdwXactResolver->pid = MyProcPid; + MyFdwXactResolver->latch = &MyProc->procLatch; + MyFdwXactResolver->last_resolved_time = 0; + + before_shmem_exit(fdwxact_resolver_onexit, (Datum) 0); + + LWLockRelease(FdwXactResolverLock); +} + +/* Foreign transaction resolver entry point */ +void +FdwXactResolverMain(Datum main_arg) +{ + int slot = DatumGetInt32(main_arg); + + /* Attach to a slot */ + fdwxact_resolver_attach(slot); + + /* Establish signal handlers */ + pqsignal(SIGHUP, fdwxact_resolver_sighup); + pqsignal(SIGTERM, die); + BackgroundWorkerUnblockSignals(); + + /* Connect to our database */ + BackgroundWorkerInitializeConnectionByOid(MyFdwXactResolver->dbid, InvalidOid, 0); + + StartTransactionCommand(); + + ereport(LOG, + (errmsg("foreign transaction resolver for database \"%s\" has started", + get_database_name(MyFdwXactResolver->dbid)))); + + CommitTransactionCommand(); + + /* Initialize stats to a sane value */ 
+ MyFdwXactResolver->last_resolved_time = GetCurrentTimestamp(); + + /* Run the main loop */ + FXRslvLoop(); + + proc_exit(0); +} + +/* + * Fdwxact resolver main loop + */ +static void +FXRslvLoop(void) +{ + MemoryContext resolver_ctx; + + resolver_ctx = AllocSetContextCreate(TopMemoryContext, + "Foreign Transaction Resolver", + ALLOCSET_DEFAULT_SIZES); + + /* Enter main loop */ + for (;;) + { + PGPROC *waiter = NULL; + TransactionId waitXid = InvalidTransactionId; + TimestampTz resolutionTs = -1; + int rc; + TimestampTz now; + long sleep_time = DEFAULT_NAPTIME_PER_CYCLE; + + ResetLatch(MyLatch); + + CHECK_FOR_INTERRUPTS(); + + MemoryContextSwitchTo(resolver_ctx); + + if (got_SIGHUP) + { + got_SIGHUP = false; + ProcessConfigFile(PGC_SIGHUP); + } + + now = GetCurrentTimestamp(); + + /* + * Process waiters until either the queue becomes empty or we find a + * waiter whose resolution time is in the future. + */ + while ((waiter = FdwXactGetWaiter(&resolutionTs, &waitXid)) != NULL) + { + CHECK_FOR_INTERRUPTS(); + Assert(TransactionIdIsValid(waitXid)); + + if (resolutionTs > now) + break; + + elog(DEBUG2, "resolver got one waiter with xid %u", waitXid); + + /* Resolve the waiting distributed transaction */ + StartTransactionCommand(); + FdwXactResolveTransactionAndReleaseWaiter(MyDatabaseId, waitXid, + waiter); + CommitTransactionCommand(); + + /* Update my stats */ + SpinLockAcquire(&(MyFdwXactResolver->mutex)); + MyFdwXactResolver->last_resolved_time = GetCurrentTimestamp(); + SpinLockRelease(&(MyFdwXactResolver->mutex)); + } + + FXRslvCheckTimeout(now); + + sleep_time = FXRslvComputeSleepTime(now, resolutionTs); + + MemoryContextResetAndDeleteChildren(resolver_ctx); + MemoryContextSwitchTo(TopMemoryContext); + + rc = WaitLatch(MyLatch, + WL_LATCH_SET | WL_TIMEOUT | WL_POSTMASTER_DEATH, + sleep_time, + WAIT_EVENT_FDWXACT_RESOLVER_MAIN); + + if (rc & WL_POSTMASTER_DEATH) + proc_exit(1); + } +} + +/* + * Check whether any foreign transaction has been resolved within + * foreign_xact_resolver_timeout and shut down if not. + */ +static void +FXRslvCheckTimeout(TimestampTz now) +{ + TimestampTz last_resolved_time; + TimestampTz timeout; + + if (foreign_xact_resolver_timeout == 0) + return; + + last_resolved_time = MyFdwXactResolver->last_resolved_time; + timeout = TimestampTzPlusMilliseconds(last_resolved_time, + foreign_xact_resolver_timeout); + + if (now < timeout) + return; + + LWLockAcquire(FdwXactResolutionLock, LW_SHARED); + if (!FdwXactWaiterExists(MyDatabaseId)) + { + StartTransactionCommand(); + ereport(LOG, + (errmsg("foreign transaction resolver for database \"%s\" will stop because of the timeout", + get_database_name(MyDatabaseId)))); + CommitTransactionCommand(); + + /* + * Keep holding FdwXactResolutionLock until the slot is detached. This + * is necessary to prevent a race condition where a waiter enqueues + * itself after we have checked FdwXactWaiterExists. + */ + fdwxact_resolver_detach(); + LWLockRelease(FdwXactResolutionLock); + proc_exit(0); + } + else + elog(DEBUG2, "resolver reached the timeout but keeps running as the queue is not empty"); + + LWLockRelease(FdwXactResolutionLock); +} + +/* + * Compute how long we should sleep until the next cycle. We can sleep until + * the timeout or until the next resolution time given by nextResolutionTs. + */ +static long +FXRslvComputeSleepTime(TimestampTz now, TimestampTz nextResolutionTs) +{ + long sleeptime = DEFAULT_NAPTIME_PER_CYCLE; + + if (foreign_xact_resolver_timeout > 0) + { + TimestampTz timeout; + long sec_to_timeout; + int microsec_to_timeout; + + /* Compute relative time until wakeup. 
*/ + timeout = TimestampTzPlusMilliseconds(MyFdwXactResolver->last_resolved_time, + foreign_xact_resolver_timeout); + TimestampDifference(now, timeout, + &sec_to_timeout, &microsec_to_timeout); + + sleeptime = Min(sleeptime, + sec_to_timeout * 1000 + microsec_to_timeout / 1000); + } + + if (nextResolutionTs > 0) + { + long sec_to_timeout; + int microsec_to_timeout; + + TimestampDifference(now, nextResolutionTs, + &sec_to_timeout, &microsec_to_timeout); + + sleeptime = Min(sleeptime, + sec_to_timeout * 1000 + microsec_to_timeout / 1000); + } + + return sleeptime; +} + +bool +IsFdwXactResolver(void) +{ + return MyFdwXactResolver != NULL; +} diff --git a/src/backend/access/rmgrdesc/Makefile b/src/backend/access/rmgrdesc/Makefile index f88d72fd86..982c1a36cc 100644 --- a/src/backend/access/rmgrdesc/Makefile +++ b/src/backend/access/rmgrdesc/Makefile @@ -13,6 +13,7 @@ OBJS = \ clogdesc.o \ committsdesc.o \ dbasedesc.o \ + fdwxactdesc.o \ genericdesc.o \ gindesc.o \ gistdesc.o \ diff --git a/src/backend/access/rmgrdesc/fdwxactdesc.c b/src/backend/access/rmgrdesc/fdwxactdesc.c new file mode 100644 index 0000000000..fe0cef9472 --- /dev/null +++ b/src/backend/access/rmgrdesc/fdwxactdesc.c @@ -0,0 +1,58 @@ +/*------------------------------------------------------------------------- + * + * fdwxactdesc.c + * PostgreSQL global transaction manager for foreign servers. + * + * This module describes the WAL records for the foreign transaction manager. 
+ * + * Portions Copyright (c) 2019, PostgreSQL Global Development Group + * + * src/backend/access/rmgrdesc/fdwxactdesc.c + * + *------------------------------------------------------------------------- + */ +#include "postgres.h" + +#include "access/fdwxact_xlog.h" + +void +fdwxact_desc(StringInfo buf, XLogReaderState *record) +{ + char *rec = XLogRecGetData(record); + uint8 info = XLogRecGetInfo(record) & ~XLR_INFO_MASK; + + if (info == XLOG_FDWXACT_INSERT) + { + FdwXactOnDiskData *fdwxact_insert = (FdwXactOnDiskData *) rec; + + appendStringInfo(buf, "server: %u,", fdwxact_insert->serverid); + appendStringInfo(buf, " user: %u,", fdwxact_insert->userid); + appendStringInfo(buf, " database: %u,", fdwxact_insert->dbid); + appendStringInfo(buf, " local xid: %u,", fdwxact_insert->local_xid); + appendStringInfo(buf, " id: %s", fdwxact_insert->fdwxact_id); + } + else + { + xl_fdwxact_remove *fdwxact_remove = (xl_fdwxact_remove *) rec; + + appendStringInfo(buf, "server: %u,", fdwxact_remove->serverid); + appendStringInfo(buf, " user: %u,", fdwxact_remove->userid); + appendStringInfo(buf, " database: %u,", fdwxact_remove->dbid); + appendStringInfo(buf, " local xid: %u", fdwxact_remove->xid); + } + +} + +const char * +fdwxact_identify(uint8 info) +{ + switch (info & ~XLR_INFO_MASK) + { + case XLOG_FDWXACT_INSERT: + return "NEW FOREIGN TRANSACTION"; + case XLOG_FDWXACT_REMOVE: + return "REMOVE FOREIGN TRANSACTION"; + } + /* Keep compiler happy */ + return NULL; +} diff --git a/src/backend/access/rmgrdesc/xlogdesc.c b/src/backend/access/rmgrdesc/xlogdesc.c index 33060f3042..1d4e1c82e1 100644 --- a/src/backend/access/rmgrdesc/xlogdesc.c +++ b/src/backend/access/rmgrdesc/xlogdesc.c @@ -114,7 +114,8 @@ xlog_desc(StringInfo buf, XLogReaderState *record) appendStringInfo(buf, "max_connections=%d max_worker_processes=%d " "max_wal_senders=%d max_prepared_xacts=%d " "max_locks_per_xact=%d wal_level=%s " - "wal_log_hints=%s track_commit_timestamp=%s", + "wal_log_hints=%s 
track_commit_timestamp=%s " + "max_prepared_foreign_transactions=%d", xlrec.MaxConnections, xlrec.max_worker_processes, xlrec.max_wal_senders, @@ -122,7 +123,8 @@ xlog_desc(StringInfo buf, XLogReaderState *record) xlrec.max_locks_per_xact, wal_level_str, xlrec.wal_log_hints ? "on" : "off", - xlrec.track_commit_timestamp ? "on" : "off"); + xlrec.track_commit_timestamp ? "on" : "off", + xlrec.max_prepared_foreign_xacts); } else if (info == XLOG_FPW_CHANGE) { diff --git a/src/backend/access/transam/rmgr.c b/src/backend/access/transam/rmgr.c index 58091f6b52..200cf9d067 100644 --- a/src/backend/access/transam/rmgr.c +++ b/src/backend/access/transam/rmgr.c @@ -10,6 +10,7 @@ #include "access/brin_xlog.h" #include "access/clog.h" #include "access/commit_ts.h" +#include "access/fdwxact.h" #include "access/generic_xlog.h" #include "access/ginxlog.h" #include "access/gistxlog.h" diff --git a/src/backend/access/transam/twophase.c b/src/backend/access/transam/twophase.c index 529976885f..2c9af36bbb 100644 --- a/src/backend/access/transam/twophase.c +++ b/src/backend/access/transam/twophase.c @@ -77,6 +77,7 @@ #include <unistd.h> #include "access/commit_ts.h" +#include "access/fdwxact.h" #include "access/htup_details.h" #include "access/subtrans.h" #include "access/transam.h" @@ -850,6 +851,35 @@ TwoPhaseGetGXact(TransactionId xid, bool lock_held) return result; } +/* + * TwoPhaseExists + * Return true if there is a prepared transaction specified by XID + */ +bool +TwoPhaseExists(TransactionId xid) +{ + int i; + bool found = false; + + LWLockAcquire(TwoPhaseStateLock, LW_SHARED); + + for (i = 0; i < TwoPhaseState->numPrepXacts; i++) + { + GlobalTransaction gxact = TwoPhaseState->prepXacts[i]; + PGXACT *pgxact = &ProcGlobal->allPgXact[gxact->pgprocno]; + + if (pgxact->xid == xid) + { + found = true; + break; + } + } + + LWLockRelease(TwoPhaseStateLock); + + return found; +} + /* * TwoPhaseGetDummyBackendId * Get the dummy backend ID for prepared transaction specified by XID @@ 
-2262,6 +2292,12 @@ RecordTransactionCommitPrepared(TransactionId xid, * in the procarray and continue to hold locks. */ SyncRepWaitForLSN(recptr, true); + + /* + * Wait for foreign transactions prepared as part of this prepared + * transaction to be committed. + */ + FdwXactWaitToBeResolved(xid, true); } /* @@ -2321,6 +2357,12 @@ RecordTransactionAbortPrepared(TransactionId xid, * in the procarray and continue to hold locks. */ SyncRepWaitForLSN(recptr, false); + + /* + * Wait for foreign transactions prepared as part of this prepared + * transaction to be aborted. + */ + FdwXactWaitToBeResolved(xid, false); } /* diff --git a/src/backend/access/transam/xact.c b/src/backend/access/transam/xact.c index 5353b6ab0b..5b67056c65 100644 --- a/src/backend/access/transam/xact.c +++ b/src/backend/access/transam/xact.c @@ -21,6 +21,7 @@ #include <unistd.h> #include "access/commit_ts.h" +#include "access/fdwxact.h" #include "access/multixact.h" #include "access/parallel.h" #include "access/subtrans.h" @@ -1218,6 +1219,7 @@ RecordTransactionCommit(void) SharedInvalidationMessage *invalMessages = NULL; bool RelcacheInitFileInval = false; bool wrote_xlog; + bool need_commit_globally; /* Get data needed for commit record */ nrels = smgrGetPendingDeletes(true, &rels); @@ -1226,6 +1228,7 @@ RecordTransactionCommit(void) nmsgs = xactGetCommittedInvalidationMessages(&invalMessages, &RelcacheInitFileInval); wrote_xlog = (XactLastRecEnd != 0); + need_commit_globally = FdwXactIsForeignTwophaseCommitRequired(); /* * If we haven't been assigned an XID yet, we neither can, nor do we want @@ -1264,12 +1267,13 @@ RecordTransactionCommit(void) } /* - * If we didn't create XLOG entries, we're done here; otherwise we - * should trigger flushing those entries the same as a commit record + * If we didn't create XLOG entries and the transaction does not need + * to be committed using two-phase commit, 
we're done here; otherwise + * we should trigger flushing those entries the same as a commit record * would. This will primarily happen for HOT pruning and the like; we * want these to be flushed to disk in due time. */ - if (!wrote_xlog) + if (!wrote_xlog && !need_commit_globally) goto cleanup; } else @@ -1427,6 +1431,14 @@ RecordTransactionCommit(void) if (wrote_xlog && markXidCommitted) SyncRepWaitForLSN(XactLastRecEnd, true); + /* + * Wait for prepared foreign transaction to be resolved, if required. + * We only want to wait if we prepared foreign transaction in this + * transaction. + */ + if (need_commit_globally && markXidCommitted) + FdwXactWaitToBeResolved(xid, true); + /* remember end of last commit record */ XactLastCommitEnd = XactLastRecEnd; @@ -2086,6 +2098,10 @@ CommitTransaction(void) break; } + + /* Pre-commit step for foreign transactions */ + PreCommit_FdwXacts(); + CallXactCallbacks(is_parallel_worker ? XACT_EVENT_PARALLEL_PRE_COMMIT : XACT_EVENT_PRE_COMMIT); @@ -2246,6 +2262,7 @@ CommitTransaction(void) AtEOXact_PgStat(true, is_parallel_worker); AtEOXact_Snapshot(true, false); AtEOXact_ApplyLauncher(true); + AtEOXact_FdwXacts(true); pgstat_report_xact_timestamp(0); CurrentResourceOwner = NULL; @@ -2333,6 +2350,8 @@ PrepareTransaction(void) * the transaction-abort path. 
*/ + AtPrepare_FdwXacts(); + /* Shut down the deferred-trigger manager */ AfterTriggerEndXact(true); @@ -2527,6 +2546,7 @@ PrepareTransaction(void) AtEOXact_Files(true); AtEOXact_ComboCid(); AtEOXact_HashTables(true); + AtEOXact_FdwXacts(true); /* don't call AtEOXact_PgStat here; we fixed pgstat state above */ AtEOXact_Snapshot(true, true); pgstat_report_xact_timestamp(0); @@ -2732,6 +2752,7 @@ AbortTransaction(void) AtEOXact_HashTables(false); AtEOXact_PgStat(false, is_parallel_worker); AtEOXact_ApplyLauncher(false); + AtEOXact_FdwXacts(false); pgstat_report_xact_timestamp(0); } diff --git a/src/backend/access/transam/xlog.c b/src/backend/access/transam/xlog.c index 6bc1a6b46d..428a974c51 100644 --- a/src/backend/access/transam/xlog.c +++ b/src/backend/access/transam/xlog.c @@ -24,6 +24,7 @@ #include "access/clog.h" #include "access/commit_ts.h" +#include "access/fdwxact.h" #include "access/heaptoast.h" #include "access/multixact.h" #include "access/rewriteheap.h" @@ -5246,6 +5247,7 @@ BootStrapXLOG(void) ControlFile->max_worker_processes = max_worker_processes; ControlFile->max_wal_senders = max_wal_senders; ControlFile->max_prepared_xacts = max_prepared_xacts; + ControlFile->max_prepared_foreign_xacts = max_prepared_foreign_xacts; ControlFile->max_locks_per_xact = max_locks_per_xact; ControlFile->wal_level = wal_level; ControlFile->wal_log_hints = wal_log_hints; @@ -6189,6 +6191,9 @@ CheckRequiredParameterValues(void) RecoveryRequiresIntParameter("max_wal_senders", max_wal_senders, ControlFile->max_wal_senders); + RecoveryRequiresIntParameter("max_prepared_foreign_transactions", + max_prepared_foreign_xacts, + ControlFile->max_prepared_foreign_xacts); RecoveryRequiresIntParameter("max_prepared_transactions", max_prepared_xacts, ControlFile->max_prepared_xacts); @@ -6729,14 +6734,15 @@ StartupXLOG(void) restoreTimeLineHistoryFiles(ThisTimeLineID, recoveryTargetTLI); /* - * Before running in recovery, scan pg_twophase and fill in its status to - * be able to work 
on entries generated by redo. Doing a scan before - * taking any recovery action has the merit to discard any 2PC files that - * are newer than the first record to replay, saving from any conflicts at - * replay. This avoids as well any subsequent scans when doing recovery - * of the on-disk two-phase data. + * Before running in recovery, scan pg_twophase and pg_fdwxacts, and then + * fill in its status to be able to work on entries generated by redo. + * Doing a scan before taking any recovery action has the merit to discard + * any state files that are newer than the first record to replay, saving + * from any conflicts at replay. This avoids as well any subsequent scans + * when doing recovery of the on-disk two-phase or fdwxact data. */ restoreTwoPhaseData(); + restoreFdwXactData(); lastFullPageWrites = checkPoint.fullPageWrites; @@ -6928,7 +6934,10 @@ StartupXLOG(void) InitRecoveryTransactionEnvironment(); if (wasShutdown) + { oldestActiveXID = PrescanPreparedTransactions(&xids, &nxids); + oldestActiveXID = PrescanFdwXacts(oldestActiveXID); + } else oldestActiveXID = checkPoint.oldestActiveXid; Assert(TransactionIdIsValid(oldestActiveXID)); @@ -7424,6 +7433,7 @@ StartupXLOG(void) * as potential problems are detected before any on-disk change is done. */ oldestActiveXID = PrescanPreparedTransactions(NULL, NULL); + oldestActiveXID = PrescanFdwXacts(oldestActiveXID); /* * Consider whether we need to assign a new timeline ID. @@ -7754,6 +7764,9 @@ StartupXLOG(void) /* Reload shared-memory state for prepared transactions */ RecoverPreparedTransactions(); + /* Load all foreign transaction entries from disk to memory */ + RecoverFdwXacts(); + /* * Shutdown the recovery environment. 
This must occur after * RecoverPreparedTransactions(), see notes for lock_twophase_recover() @@ -9029,6 +9042,7 @@ CheckPointGuts(XLogRecPtr checkPointRedo, int flags) CheckPointReplicationOrigin(); /* We deliberately delay 2PC checkpointing as long as possible */ CheckPointTwoPhase(checkPointRedo); + CheckPointFdwXacts(checkPointRedo); } /* @@ -9462,8 +9476,10 @@ XLogReportParameters(void) max_worker_processes != ControlFile->max_worker_processes || max_wal_senders != ControlFile->max_wal_senders || max_prepared_xacts != ControlFile->max_prepared_xacts || + max_prepared_foreign_xacts != ControlFile->max_prepared_foreign_xacts || max_locks_per_xact != ControlFile->max_locks_per_xact || - track_commit_timestamp != ControlFile->track_commit_timestamp) + track_commit_timestamp != ControlFile->track_commit_timestamp || + max_prepared_foreign_xacts != ControlFile->max_prepared_foreign_xacts) { /* * The change in number of backend slots doesn't need to be WAL-logged @@ -9481,6 +9497,7 @@ XLogReportParameters(void) xlrec.max_worker_processes = max_worker_processes; xlrec.max_wal_senders = max_wal_senders; xlrec.max_prepared_xacts = max_prepared_xacts; + xlrec.max_prepared_foreign_xacts = max_prepared_foreign_xacts; xlrec.max_locks_per_xact = max_locks_per_xact; xlrec.wal_level = wal_level; xlrec.wal_log_hints = wal_log_hints; @@ -9497,6 +9514,7 @@ XLogReportParameters(void) ControlFile->max_worker_processes = max_worker_processes; ControlFile->max_wal_senders = max_wal_senders; ControlFile->max_prepared_xacts = max_prepared_xacts; + ControlFile->max_prepared_foreign_xacts = max_prepared_foreign_xacts; ControlFile->max_locks_per_xact = max_locks_per_xact; ControlFile->wal_level = wal_level; ControlFile->wal_log_hints = wal_log_hints; @@ -9702,6 +9720,7 @@ xlog_redo(XLogReaderState *record) RunningTransactionsData running; oldestActiveXID = PrescanPreparedTransactions(&xids, &nxids); + oldestActiveXID = PrescanFdwXacts(oldestActiveXID); /* * Construct a RunningTransactions 
snapshot representing a shut @@ -9901,6 +9920,7 @@ xlog_redo(XLogReaderState *record) ControlFile->max_worker_processes = xlrec.max_worker_processes; ControlFile->max_wal_senders = xlrec.max_wal_senders; ControlFile->max_prepared_xacts = xlrec.max_prepared_xacts; + ControlFile->max_prepared_foreign_xacts = xlrec.max_prepared_foreign_xacts; ControlFile->max_locks_per_xact = xlrec.max_locks_per_xact; ControlFile->wal_level = xlrec.wal_level; ControlFile->wal_log_hints = xlrec.wal_log_hints; diff --git a/src/backend/catalog/system_views.sql b/src/backend/catalog/system_views.sql index f7800f01a6..b4c1cce1f0 100644 --- a/src/backend/catalog/system_views.sql +++ b/src/backend/catalog/system_views.sql @@ -332,6 +332,9 @@ CREATE VIEW pg_prepared_xacts AS CREATE VIEW pg_prepared_statements AS SELECT * FROM pg_prepared_statement() AS P; +CREATE VIEW pg_foreign_xacts AS + SELECT * FROM pg_foreign_xacts() AS F; + CREATE VIEW pg_seclabels AS SELECT l.objoid, l.classoid, l.objsubid, @@ -818,6 +821,14 @@ CREATE VIEW pg_stat_subscription AS LEFT JOIN pg_stat_get_subscription(NULL) st ON (st.subid = su.oid); +CREATE VIEW pg_stat_foreign_xact AS + SELECT + r.pid, + r.dbid, + r.last_resolved_time + FROM pg_stat_get_foreign_xact() r + WHERE r.pid IS NOT NULL; + CREATE VIEW pg_stat_ssl AS SELECT S.pid, diff --git a/src/backend/commands/copy.c b/src/backend/commands/copy.c index 42a147b67d..e3caef7ef9 100644 --- a/src/backend/commands/copy.c +++ b/src/backend/commands/copy.c @@ -2857,8 +2857,14 @@ CopyFrom(CopyState cstate) if (resultRelInfo->ri_FdwRoutine != NULL && resultRelInfo->ri_FdwRoutine->BeginForeignInsert != NULL) + { + /* Remember the transaction modifies data on a foreign server*/ + RegisterFdwXactByRelId(RelationGetRelid(resultRelInfo->ri_RelationDesc), + true); + resultRelInfo->ri_FdwRoutine->BeginForeignInsert(mtstate, resultRelInfo); + } /* Prepare to catch AFTER triggers. 
*/ AfterTriggerBeginQuery(); diff --git a/src/backend/commands/foreigncmds.c b/src/backend/commands/foreigncmds.c index 766c9f95c8..43bbe8356d 100644 --- a/src/backend/commands/foreigncmds.c +++ b/src/backend/commands/foreigncmds.c @@ -13,6 +13,8 @@ */ #include "postgres.h" +#include "access/fdwxact.h" +#include "access/heapam.h" #include "access/htup_details.h" #include "access/reloptions.h" #include "access/table.h" @@ -1101,6 +1103,18 @@ RemoveForeignServerById(Oid srvId) if (!HeapTupleIsValid(tp)) elog(ERROR, "cache lookup failed for foreign server %u", srvId); + /* + * If there is a foreign prepared transaction with this foreign server, + * dropping it might result in a dangling prepared transaction. + */ + if (fdwxact_exists(MyDatabaseId, srvId, InvalidOid)) + { + Form_pg_foreign_server srvForm = (Form_pg_foreign_server) GETSTRUCT(tp); + ereport(WARNING, + (errmsg("server \"%s\" has unresolved prepared transactions on it", + NameStr(srvForm->srvname)))); + } + CatalogTupleDelete(rel, &tp->t_self); ReleaseSysCache(tp); @@ -1419,6 +1433,15 @@ RemoveUserMapping(DropUserMappingStmt *stmt) user_mapping_ddl_aclcheck(useId, srv->serverid, srv->servername); + /* + * If there is a foreign prepared transaction with this user mapping, + * dropping it might result in a dangling prepared transaction. + */ + if (fdwxact_exists(MyDatabaseId, srv->serverid, useId)) + ereport(WARNING, + (errmsg("server \"%s\" has unresolved prepared transactions for user \"%s\"", + srv->servername, MappingUserName(useId)))); + /* * Do the deletion */ @@ -1572,6 +1595,13 @@ ImportForeignSchema(ImportForeignSchemaStmt *stmt) errmsg("foreign-data wrapper \"%s\" does not support IMPORT FOREIGN SCHEMA", fdw->fdwname))); + /* + * Remember that the transaction accesses a foreign server. Normally during + * ImportForeignSchema we don't modify data on foreign servers, so register + * the server as not modified. 
+ */ + RegisterFdwXactByServerId(server->serverid, false); + /* Call FDW to get a list of commands */ cmd_list = fdw_routine->ImportForeignSchema(stmt, server->serverid); diff --git a/src/backend/executor/execPartition.c b/src/backend/executor/execPartition.c index d23f292cb0..690717c34e 100644 --- a/src/backend/executor/execPartition.c +++ b/src/backend/executor/execPartition.c @@ -13,6 +13,7 @@ */ #include "postgres.h" +#include "access/fdwxact.h" #include "access/table.h" #include "access/tableam.h" #include "catalog/partition.h" @@ -944,7 +945,14 @@ ExecInitRoutingInfo(ModifyTableState *mtstate, */ if (partRelInfo->ri_FdwRoutine != NULL && partRelInfo->ri_FdwRoutine->BeginForeignInsert != NULL) + { + Relation child = partRelInfo->ri_RelationDesc; + + /* Remember that the transaction modifies data on a foreign server */ + RegisterFdwXactByRelId(RelationGetRelid(child), true); + partRelInfo->ri_FdwRoutine->BeginForeignInsert(mtstate, partRelInfo); + } partRelInfo->ri_PartitionInfo = partrouteinfo; partRelInfo->ri_CopyMultiInsertBuffer = NULL; diff --git a/src/backend/executor/nodeForeignscan.c b/src/backend/executor/nodeForeignscan.c index 52af1dac5c..3ac56d1678 100644 --- a/src/backend/executor/nodeForeignscan.c +++ b/src/backend/executor/nodeForeignscan.c @@ -22,6 +22,8 @@ */ #include "postgres.h" +#include "access/fdwxact.h" +#include "access/xact.h" #include "executor/executor.h" #include "executor/nodeForeignscan.h" #include "foreign/fdwapi.h" @@ -224,9 +226,31 @@ ExecInitForeignScan(ForeignScan *node, EState *estate, int eflags) * Tell the FDW to initialize the scan. */ if (node->operation != CMD_SELECT) + { + RangeTblEntry *rte; + + rte = exec_rt_fetch(estate->es_result_relation_info->ri_RangeTableIndex, + estate); + + /* Remember that the transaction modifies data on a foreign server */ + RegisterFdwXactByRelId(rte->relid, true); + fdwroutine->BeginDirectModify(scanstate, eflags); + } else + { + RangeTblEntry *rte; + int rtindex = (scanrelid > 0) ?
+ scanrelid : + bms_next_member(node->fs_relids, -1); + + rte = exec_rt_fetch(rtindex, estate); + + /* Remember that the transaction accesses a foreign server */ + RegisterFdwXactByRelId(rte->relid, false); + fdwroutine->BeginForeignScan(scanstate, eflags); + } return scanstate; } diff --git a/src/backend/executor/nodeModifyTable.c b/src/backend/executor/nodeModifyTable.c index cd91f9c8a8..c1ab3d829a 100644 --- a/src/backend/executor/nodeModifyTable.c +++ b/src/backend/executor/nodeModifyTable.c @@ -37,6 +37,7 @@ #include "postgres.h" +#include "access/fdwxact.h" #include "access/heapam.h" #include "access/htup_details.h" #include "access/tableam.h" @@ -47,6 +48,7 @@ #include "executor/executor.h" #include "executor/nodeModifyTable.h" #include "foreign/fdwapi.h" +#include "foreign/foreign.h" #include "miscadmin.h" #include "nodes/nodeFuncs.h" #include "rewrite/rewriteHandler.h" @@ -549,6 +551,10 @@ ExecInsert(ModifyTableState *mtstate, NULL, specToken); + /* Make note that we've written to a non-temporary relation */ + if (RelationNeedsWAL(resultRelationDesc)) + MyXactFlags |= XACT_FLAGS_WROTENONTEMPREL; + /* insert index entries for tuple */ recheckIndexes = ExecInsertIndexTuples(slot, estate, true, &specConflict, @@ -777,6 +783,10 @@ ldelete:; &tmfd, changingPart); + /* Make note that we've written to a non-temporary relation */ + if (RelationNeedsWAL(resultRelationDesc)) + MyXactFlags |= XACT_FLAGS_WROTENONTEMPREL; + switch (result) { case TM_SelfModified: @@ -1323,6 +1333,10 @@ lreplace:; true /* wait for commit */ , &tmfd, &lockmode, &update_indexes); + /* Make note that we've written to a non-temporary relation */ + if (RelationNeedsWAL(resultRelationDesc)) + MyXactFlags |= XACT_FLAGS_WROTENONTEMPREL; + switch (result) { case TM_SelfModified: @@ -2382,6 +2396,10 @@ ExecInitModifyTable(ModifyTable *node, EState *estate, int eflags) resultRelInfo->ri_FdwRoutine->BeginForeignModify != NULL) { List *fdw_private = (List *) list_nth(node->fdwPrivLists, i); + Oid relid =
RelationGetRelid(resultRelInfo->ri_RelationDesc); + + /* Remember that the transaction modifies data on a foreign server */ + RegisterFdwXactByRelId(relid, true); resultRelInfo->ri_FdwRoutine->BeginForeignModify(mtstate, resultRelInfo, diff --git a/src/backend/foreign/foreign.c b/src/backend/foreign/foreign.c index c917ec40ff..0b17505aac 100644 --- a/src/backend/foreign/foreign.c +++ b/src/backend/foreign/foreign.c @@ -187,6 +187,49 @@ GetForeignServerByName(const char *srvname, bool missing_ok) return GetForeignServer(serverid); } +/* + * GetUserMappingByOid - look up the user mapping by user mapping oid. + * + * If the mapping's userid is invalid, we set it to the current userid. + */ +UserMapping * +GetUserMappingByOid(Oid umid) +{ + Datum datum; + HeapTuple tp; + UserMapping *um; + bool isnull; + Form_pg_user_mapping tableform; + + tp = SearchSysCache1(USERMAPPINGOID, + ObjectIdGetDatum(umid)); + + if (!HeapTupleIsValid(tp)) + ereport(ERROR, + (errcode(ERRCODE_UNDEFINED_OBJECT), + errmsg("user mapping not found for %u", umid))); + + tableform = (Form_pg_user_mapping) GETSTRUCT(tp); + um = (UserMapping *) palloc(sizeof(UserMapping)); + um->umid = umid; + um->userid = OidIsValid(tableform->umuser) ? + tableform->umuser : GetUserId(); + um->serverid = tableform->umserver; + + /* Extract the umoptions */ + datum = SysCacheGetAttr(USERMAPPINGUSERSERVER, + tp, + Anum_pg_user_mapping_umoptions, + &isnull); + if (isnull) + um->options = NIL; + else + um->options = untransformRelOptions(datum); + + ReleaseSysCache(tp); + + return um; +} /* * GetUserMapping - look up the user mapping.
@@ -328,6 +371,20 @@ GetFdwRoutine(Oid fdwhandler) elog(ERROR, "foreign-data wrapper handler function %u did not return an FdwRoutine struct", fdwhandler); + /* Sanity check for transaction management callbacks */ + if ((routine->CommitForeignTransaction && + !routine->RollbackForeignTransaction) || + (!routine->CommitForeignTransaction && + routine->RollbackForeignTransaction)) + elog(ERROR, + "foreign-data wrapper must support both commit and rollback routines or neither"); + + if (routine->PrepareForeignTransaction && + (!routine->CommitForeignTransaction || + !routine->RollbackForeignTransaction)) + elog(ERROR, + "foreign-data wrapper that supports prepare routine must support both commit and rollback routines"); + return routine; } diff --git a/src/backend/postmaster/bgworker.c b/src/backend/postmaster/bgworker.c index 5f8a007e73..0a8890a984 100644 --- a/src/backend/postmaster/bgworker.c +++ b/src/backend/postmaster/bgworker.c @@ -14,6 +14,8 @@ #include <unistd.h> +#include "access/fdwxact_launcher.h" +#include "access/fdwxact_resolver.h" #include "access/parallel.h" #include "libpq/pqsignal.h" #include "miscadmin.h" @@ -129,6 +131,12 @@ static const struct }, { "ApplyWorkerMain", ApplyWorkerMain + }, + { + "FdwXactResolverMain", FdwXactResolverMain + }, + { + "FdwXactLauncherMain", FdwXactLauncherMain } }; diff --git a/src/backend/postmaster/pgstat.c b/src/backend/postmaster/pgstat.c index fabcf31de8..0d3932c2cf 100644 --- a/src/backend/postmaster/pgstat.c +++ b/src/backend/postmaster/pgstat.c @@ -3650,6 +3650,12 @@ pgstat_get_wait_activity(WaitEventActivity w) case WAIT_EVENT_CHECKPOINTER_MAIN: event_name = "CheckpointerMain"; break; + case WAIT_EVENT_FDWXACT_RESOLVER_MAIN: + event_name = "FdwXactResolverMain"; + break; + case WAIT_EVENT_FDWXACT_LAUNCHER_MAIN: + event_name = "FdwXactLauncherMain"; + break; case WAIT_EVENT_LOGICAL_APPLY_MAIN: event_name = "LogicalApplyMain"; break; @@ -3853,6 +3859,12 @@ pgstat_get_wait_ipc(WaitEventIPC w) case WAIT_EVENT_SYNC_REP: event_name = "SyncRep"; break; + case WAIT_EVENT_FDWXACT: + event_name = "FdwXact"; + break; + case WAIT_EVENT_FDWXACT_RESOLUTION: + event_name = "FdwXactResolution"; + break; /* no default case, so that compiler will warn */ } @@ -4068,6 +4079,15 @@ pgstat_get_wait_io(WaitEventIO w) case WAIT_EVENT_TWOPHASE_FILE_WRITE: event_name = "TwophaseFileWrite"; break; + case WAIT_EVENT_FDWXACT_FILE_WRITE: + event_name = "FdwXactFileWrite"; + break; + case WAIT_EVENT_FDWXACT_FILE_READ: + event_name = "FdwXactFileRead"; + break; + case WAIT_EVENT_FDWXACT_FILE_SYNC: + event_name = "FdwXactFileSync"; + break; case WAIT_EVENT_WALSENDER_TIMELINE_HISTORY_READ: event_name = "WALSenderTimelineHistoryRead"; break; diff --git a/src/backend/postmaster/postmaster.c b/src/backend/postmaster/postmaster.c index 9ff2832c00..f92be8387d 100644 --- a/src/backend/postmaster/postmaster.c +++ b/src/backend/postmaster/postmaster.c @@ -93,6 +93,8 @@ #include <pthread.h> #endif +#include "access/fdwxact_resolver.h" +#include "access/fdwxact_launcher.h" #include "access/transam.h" #include "access/xlog.h" #include "bootstrap/bootstrap.h" @@ -909,6 +911,10 @@ PostmasterMain(int argc, char *argv[]) ereport(ERROR, (errmsg("WAL streaming (max_wal_senders > 0) requires wal_level \"replica\" or \"logical\""))); + if (max_prepared_foreign_xacts > 0 && max_foreign_xact_resolvers == 0) + ereport(ERROR, + (errmsg("preparing foreign transactions (max_prepared_foreign_transactions > 0) requires max_foreign_transaction_resolvers > 0"))); + /* * Other one-time internal sanity checks can go here, if they are fast. * (Put any slow processing further down, after postmaster.pid creation.) @@ -984,12 +990,13 @@ PostmasterMain(int argc, char *argv[]) #endif /* - * Register the apply launcher. Since it registers a background worker, - * it needs to be called before InitializeMaxBackends(), and it's probably - * a good idea to call it before any modules had chance to take the - * background worker slots.
+ * Register the apply launcher and foreign transaction launcher. Since + * these register background workers, they need to be called before + * InitializeMaxBackends(), and it's probably a good idea to call them + * before any modules have a chance to take the background worker slots. */ ApplyLauncherRegister(); + FdwXactLauncherRegister(); /* * process any libraries that should be preloaded at postmaster start diff --git a/src/backend/replication/logical/decode.c b/src/backend/replication/logical/decode.c index bc532d027b..6269f384af 100644 --- a/src/backend/replication/logical/decode.c +++ b/src/backend/replication/logical/decode.c @@ -151,6 +151,7 @@ LogicalDecodingProcessRecord(LogicalDecodingContext *ctx, XLogReaderState *recor case RM_COMMIT_TS_ID: case RM_REPLORIGIN_ID: case RM_GENERIC_ID: + case RM_FDWXACT_ID: /* just deal with xid, and done */ ReorderBufferProcessXid(ctx->reorder, XLogRecGetXid(record), buf.origptr); diff --git a/src/backend/storage/ipc/ipci.c b/src/backend/storage/ipc/ipci.c index 4829953ee6..6bde7a735a 100644 --- a/src/backend/storage/ipc/ipci.c +++ b/src/backend/storage/ipc/ipci.c @@ -16,6 +16,8 @@ #include "access/clog.h" #include "access/commit_ts.h" +#include "access/fdwxact.h" +#include "access/fdwxact_launcher.h" #include "access/heapam.h" #include "access/multixact.h" #include "access/nbtree.h" @@ -147,6 +149,8 @@ CreateSharedMemoryAndSemaphores(void) size = add_size(size, BTreeShmemSize()); size = add_size(size, SyncScanShmemSize()); size = add_size(size, AsyncShmemSize()); + size = add_size(size, FdwXactShmemSize()); + size = add_size(size, FdwXactRslvShmemSize()); #ifdef EXEC_BACKEND size = add_size(size, ShmemBackendArraySize()); #endif @@ -263,6 +267,8 @@ CreateSharedMemoryAndSemaphores(void) BTreeShmemInit(); SyncScanShmemInit(); AsyncShmemInit(); + FdwXactShmemInit(); + FdwXactRslvShmemInit(); #ifdef EXEC_BACKEND diff --git a/src/backend/storage/ipc/procarray.c b/src/backend/storage/ipc/procarray.c index 13bcbe77de..020eb76b6a
100644 --- a/src/backend/storage/ipc/procarray.c +++ b/src/backend/storage/ipc/procarray.c @@ -93,6 +93,8 @@ typedef struct ProcArrayStruct TransactionId replication_slot_xmin; /* oldest catalog xmin of any replication slot */ TransactionId replication_slot_catalog_xmin; + /* local transaction id of oldest unresolved distributed transaction */ + TransactionId fdwxact_unresolved_xmin; /* indexes into allPgXact[], has PROCARRAY_MAXPROCS entries */ int pgprocnos[FLEXIBLE_ARRAY_MEMBER]; @@ -248,6 +250,7 @@ CreateSharedProcArray(void) procArray->lastOverflowedXid = InvalidTransactionId; procArray->replication_slot_xmin = InvalidTransactionId; procArray->replication_slot_catalog_xmin = InvalidTransactionId; + procArray->fdwxact_unresolved_xmin = InvalidTransactionId; } allProcs = ProcGlobal->allProcs; @@ -1312,6 +1315,7 @@ GetOldestXmin(Relation rel, int flags) TransactionId replication_slot_xmin = InvalidTransactionId; TransactionId replication_slot_catalog_xmin = InvalidTransactionId; + TransactionId fdwxact_unresolved_xmin = InvalidTransactionId; /* * If we're not computing a relation specific limit, or if a shared @@ -1377,6 +1381,7 @@ GetOldestXmin(Relation rel, int flags) */ replication_slot_xmin = procArray->replication_slot_xmin; replication_slot_catalog_xmin = procArray->replication_slot_catalog_xmin; + fdwxact_unresolved_xmin = procArray->fdwxact_unresolved_xmin; if (RecoveryInProgress()) { @@ -1426,6 +1431,15 @@ GetOldestXmin(Relation rel, int flags) NormalTransactionIdPrecedes(replication_slot_xmin, result)) result = replication_slot_xmin; + /* + * Check whether there are unresolved distributed transactions + * requiring an older xmin.
+ */ + if (!(flags & PROCARRAY_FDWXACT_XMIN) && + TransactionIdIsValid(fdwxact_unresolved_xmin) && + NormalTransactionIdPrecedes(fdwxact_unresolved_xmin, result)) + result = fdwxact_unresolved_xmin; + /* * After locks have been released and vacuum_defer_cleanup_age has been * applied, check whether we need to back up further to make logical @@ -3128,6 +3142,38 @@ ProcArrayGetReplicationSlotXmin(TransactionId *xmin, LWLockRelease(ProcArrayLock); } +/* + * ProcArraySetFdwXactUnresolvedXmin + * + * Install a limit on future computations of the xmin horizon to prevent + * vacuum and clog truncation from removing transactions still needed for + * resolving distributed transactions. + */ +void +ProcArraySetFdwXactUnresolvedXmin(TransactionId xmin) +{ + LWLockAcquire(ProcArrayLock, LW_EXCLUSIVE); + procArray->fdwxact_unresolved_xmin = xmin; + LWLockRelease(ProcArrayLock); +} + +/* + * ProcArrayGetFdwXactUnresolvedXmin + * + * Return the current unresolved xmin limit. + */ +TransactionId +ProcArrayGetFdwXactUnresolvedXmin(void) +{ + TransactionId xmin; + + LWLockAcquire(ProcArrayLock, LW_SHARED); + xmin = procArray->fdwxact_unresolved_xmin; + LWLockRelease(ProcArrayLock); + + return xmin; +} #define XidCacheRemove(i) \ do { \ diff --git a/src/backend/storage/lmgr/lwlocknames.txt b/src/backend/storage/lmgr/lwlocknames.txt index db47843229..adb276370c 100644 --- a/src/backend/storage/lmgr/lwlocknames.txt +++ b/src/backend/storage/lmgr/lwlocknames.txt @@ -49,3 +49,6 @@ MultiXactTruncationLock 41 OldSnapshotTimeMapLock 42 LogicalRepWorkerLock 43 CLogTruncationLock 44 +FdwXactLock 45 +FdwXactResolverLock 46 +FdwXactResolutionLock 47 diff --git a/src/backend/storage/lmgr/proc.c b/src/backend/storage/lmgr/proc.c index fff0628e58..af5e418a03 100644 --- a/src/backend/storage/lmgr/proc.c +++ b/src/backend/storage/lmgr/proc.c @@ -35,6 +35,7 @@ #include <unistd.h> #include <sys/time.h> +#include "access/fdwxact.h" #include "access/transam.h" #include "access/twophase.h" #include "access/xact.h" @@
-421,6 +422,10 @@ InitProcess(void) MyProc->syncRepState = SYNC_REP_NOT_WAITING; SHMQueueElemInit(&(MyProc->syncRepLinks)); + /* Initialize fields for fdw xact */ + MyProc->fdwXactState = FDWXACT_NOT_WAITING; + SHMQueueElemInit(&(MyProc->fdwXactLinks)); + /* Initialize fields for group XID clearing. */ MyProc->procArrayGroupMember = false; MyProc->procArrayGroupMemberXid = InvalidTransactionId; @@ -822,6 +827,9 @@ ProcKill(int code, Datum arg) /* Make sure we're out of the sync rep lists */ SyncRepCleanupAtProcExit(); + /* Make sure we're out of the fdwxact lists */ + FdwXactCleanupAtProcExit(); + #ifdef USE_ASSERT_CHECKING { int i; diff --git a/src/backend/tcop/postgres.c b/src/backend/tcop/postgres.c index 3b85e48333..a0f8498862 100644 --- a/src/backend/tcop/postgres.c +++ b/src/backend/tcop/postgres.c @@ -36,6 +36,8 @@ #include "rusagestub.h" #endif +#include "access/fdwxact_resolver.h" +#include "access/fdwxact_launcher.h" #include "access/parallel.h" #include "access/printtup.h" #include "access/xact.h" @@ -3029,6 +3031,18 @@ ProcessInterrupts(void) */ proc_exit(1); } + else if (IsFdwXactResolver()) + ereport(FATAL, + (errcode(ERRCODE_ADMIN_SHUTDOWN), + errmsg("terminating foreign transaction resolver due to administrator command"))); + else if (IsFdwXactLauncher()) + { + /* + * The foreign transaction launcher can be stopped at any time. + * Use exit status 1 so the background worker is restarted. 
+ */ + proc_exit(1); + } else if (RecoveryConflictPending && RecoveryConflictRetryable) { pgstat_report_recovery_conflict(RecoveryConflictReason); diff --git a/src/backend/utils/misc/guc.c b/src/backend/utils/misc/guc.c index ba74bf9f7d..d38c33b64c 100644 --- a/src/backend/utils/misc/guc.c +++ b/src/backend/utils/misc/guc.c @@ -27,6 +27,7 @@ #endif #include "access/commit_ts.h" +#include "access/fdwxact.h" #include "access/gin.h" #include "access/rmgr.h" #include "access/tableam.h" @@ -399,6 +400,25 @@ static const struct config_enum_entry synchronous_commit_options[] = { {NULL, 0, false} }; +/* + * Although only "required", "prefer", and "disabled" are documented, + * we accept all the likely variants of "on" and "off". + */ +static const struct config_enum_entry foreign_twophase_commit_options[] = { + {"required", FOREIGN_TWOPHASE_COMMIT_REQUIRED, false}, + {"prefer", FOREIGN_TWOPHASE_COMMIT_PREFER, false}, + {"disabled", FOREIGN_TWOPHASE_COMMIT_DISABLED, false}, + {"on", FOREIGN_TWOPHASE_COMMIT_REQUIRED, false}, + {"off", FOREIGN_TWOPHASE_COMMIT_DISABLED, false}, + {"true", FOREIGN_TWOPHASE_COMMIT_REQUIRED, true}, + {"false", FOREIGN_TWOPHASE_COMMIT_DISABLED, true}, + {"yes", FOREIGN_TWOPHASE_COMMIT_REQUIRED, true}, + {"no", FOREIGN_TWOPHASE_COMMIT_DISABLED, true}, + {"1", FOREIGN_TWOPHASE_COMMIT_REQUIRED, true}, + {"0", FOREIGN_TWOPHASE_COMMIT_DISABLED, true}, + {NULL, 0, false} +}; + /* * Although only "on", "off", "try" are documented, we accept all the likely * variants of "on" and "off". 
@@ -725,6 +745,12 @@ const char *const config_group_names[] = gettext_noop("Client Connection Defaults / Other Defaults"), /* LOCK_MANAGEMENT */ gettext_noop("Lock Management"), + /* FDWXACT */ + gettext_noop("Foreign Transaction Management"), + /* FDWXACT_SETTINGS */ + gettext_noop("Foreign Transaction Management / Settings"), + /* FDWXACT_RESOLVER */ + gettext_noop("Foreign Transaction Management / Resolver"), /* COMPAT_OPTIONS */ gettext_noop("Version and Platform Compatibility"), /* COMPAT_OPTIONS_PREVIOUS */ @@ -2370,6 +2396,52 @@ static struct config_int ConfigureNamesInt[] = NULL, NULL, NULL }, + /* + * See also CheckRequiredParameterValues() if this parameter changes + */ + { + {"max_prepared_foreign_transactions", PGC_POSTMASTER, RESOURCES_MEM, + gettext_noop("Sets the maximum number of simultaneously prepared transactions on foreign servers."), + NULL + }, + &max_prepared_foreign_xacts, + 0, 0, INT_MAX, + NULL, NULL, NULL + }, + + { + {"foreign_transaction_resolver_timeout", PGC_SIGHUP, FDWXACT_RESOLVER, + gettext_noop("Sets the maximum time to wait for foreign transaction resolution."), + NULL, + GUC_UNIT_MS + }, + &foreign_xact_resolver_timeout, + 60 * 1000, 0, INT_MAX, + NULL, NULL, NULL + }, + + { + {"max_foreign_transaction_resolvers", PGC_POSTMASTER, RESOURCES_MEM, + gettext_noop("Maximum number of foreign transaction resolution processes."), + NULL + }, + &max_foreign_xact_resolvers, + 0, 0, INT_MAX, + NULL, NULL, NULL + }, + + { + {"foreign_transaction_resolution_retry_interval", PGC_SIGHUP, FDWXACT_RESOLVER, + gettext_noop("Sets the time to wait before retrying to resolve foreign transaction " + "after a failed attempt."), + NULL, + GUC_UNIT_MS + }, + &foreign_xact_resolution_retry_interval, + 5000, 1, INT_MAX, + NULL, NULL, NULL + }, + #ifdef LOCK_DEBUG { {"trace_lock_oidmin", PGC_SUSET, DEVELOPER_OPTIONS, @@ -4413,6 +4485,16 @@ static struct config_enum ConfigureNamesEnum[] = NULL, assign_synchronous_commit, NULL }, + { + 
{"foreign_twophase_commit", PGC_USERSET, FDWXACT_SETTINGS, + gettext_noop("Use of foreign twophase commit for the current transaction."), + NULL + }, + &foreign_twophase_commit, + FOREIGN_TWOPHASE_COMMIT_DISABLED, foreign_twophase_commit_options, + check_foreign_twophase_commit, NULL, NULL + }, + { {"archive_mode", PGC_POSTMASTER, WAL_ARCHIVING, gettext_noop("Allows archiving of WAL files using archive_command."), diff --git a/src/backend/utils/misc/postgresql.conf.sample b/src/backend/utils/misc/postgresql.conf.sample index 9541879c1f..22e014aecd 100644 --- a/src/backend/utils/misc/postgresql.conf.sample +++ b/src/backend/utils/misc/postgresql.conf.sample @@ -125,6 +125,8 @@ #temp_buffers = 8MB # min 800kB #max_prepared_transactions = 0 # zero disables the feature # (change requires restart) +#max_prepared_foreign_transactions = 0 # zero disables the feature + # (change requires restart) # Caution: it is not advisable to set max_prepared_transactions nonzero unless # you actively intend to use prepared transactions. 
#work_mem = 4MB # min 64kB @@ -341,6 +343,20 @@ #max_sync_workers_per_subscription = 2 # taken from max_logical_replication_workers +#------------------------------------------------------------------------------ +# FOREIGN TRANSACTION +#------------------------------------------------------------------------------ + +#foreign_twophase_commit = off + +#max_foreign_transaction_resolvers = 0 # max number of resolver processes + # (change requires restart) +#foreign_transaction_resolver_timeout = 60s # in milliseconds; 0 disables +#foreign_transaction_resolution_retry_interval = 5s # time to wait before + # retrying to resolve + # foreign transactions + # after a failed attempt + #------------------------------------------------------------------------------ # QUERY TUNING #------------------------------------------------------------------------------ diff --git a/src/backend/utils/probes.d b/src/backend/utils/probes.d index f08a49c9dd..dd8878025b 100644 --- a/src/backend/utils/probes.d +++ b/src/backend/utils/probes.d @@ -81,6 +81,8 @@ provider postgresql { probe multixact__checkpoint__done(bool); probe twophase__checkpoint__start(); probe twophase__checkpoint__done(); + probe fdwxact__checkpoint__start(); + probe fdwxact__checkpoint__done(); probe smgr__md__read__start(ForkNumber, BlockNumber, Oid, Oid, Oid, int); probe smgr__md__read__done(ForkNumber, BlockNumber, Oid, Oid, Oid, int, int, int); diff --git a/src/bin/initdb/initdb.c b/src/bin/initdb/initdb.c index 1f6d8939be..49dc5a519f 100644 --- a/src/bin/initdb/initdb.c +++ b/src/bin/initdb/initdb.c @@ -210,6 +210,7 @@ static const char *const subdirs[] = { "pg_snapshots", "pg_subtrans", "pg_twophase", + "pg_fdwxact", "pg_multixact", "pg_multixact/members", "pg_multixact/offsets", diff --git a/src/bin/pg_controldata/pg_controldata.c b/src/bin/pg_controldata/pg_controldata.c index 19e21ab491..9ae3bfe4dd 100644 --- a/src/bin/pg_controldata/pg_controldata.c +++ b/src/bin/pg_controldata/pg_controldata.c @@ -301,6 +301,8
@@ main(int argc, char *argv[]) ControlFile->max_wal_senders); printf(_("max_prepared_xacts setting: %d\n"), ControlFile->max_prepared_xacts); + printf(_("max_prepared_foreign_transactions setting: %d\n"), + ControlFile->max_prepared_foreign_xacts); printf(_("max_locks_per_xact setting: %d\n"), ControlFile->max_locks_per_xact); printf(_("track_commit_timestamp setting: %s\n"), diff --git a/src/bin/pg_resetwal/pg_resetwal.c b/src/bin/pg_resetwal/pg_resetwal.c index 2e286f6339..c5ee22132e 100644 --- a/src/bin/pg_resetwal/pg_resetwal.c +++ b/src/bin/pg_resetwal/pg_resetwal.c @@ -710,6 +710,7 @@ GuessControlValues(void) ControlFile.max_wal_senders = 10; ControlFile.max_worker_processes = 8; ControlFile.max_prepared_xacts = 0; + ControlFile.max_prepared_foreign_xacts = 0; ControlFile.max_locks_per_xact = 64; ControlFile.maxAlign = MAXIMUM_ALIGNOF; @@ -914,6 +915,7 @@ RewriteControlFile(void) ControlFile.max_wal_senders = 10; ControlFile.max_worker_processes = 8; ControlFile.max_prepared_xacts = 0; + ControlFile.max_prepared_foreign_xacts = 0; ControlFile.max_locks_per_xact = 64; /* The control file gets flushed here. 
*/ diff --git a/src/bin/pg_waldump/fdwxactdesc.c b/src/bin/pg_waldump/fdwxactdesc.c new file mode 120000 index 0000000000..ce8c21880c --- /dev/null +++ b/src/bin/pg_waldump/fdwxactdesc.c @@ -0,0 +1 @@ +../../../src/backend/access/rmgrdesc/fdwxactdesc.c \ No newline at end of file diff --git a/src/bin/pg_waldump/rmgrdesc.c b/src/bin/pg_waldump/rmgrdesc.c index 852d8ca4b1..b616cea347 100644 --- a/src/bin/pg_waldump/rmgrdesc.c +++ b/src/bin/pg_waldump/rmgrdesc.c @@ -11,6 +11,7 @@ #include "access/brin_xlog.h" #include "access/clog.h" #include "access/commit_ts.h" +#include "access/fdwxact_xlog.h" #include "access/generic_xlog.h" #include "access/ginxlog.h" #include "access/gistxlog.h" diff --git a/src/include/access/fdwxact.h b/src/include/access/fdwxact.h new file mode 100644 index 0000000000..147d41c708 --- /dev/null +++ b/src/include/access/fdwxact.h @@ -0,0 +1,165 @@ +/* + * fdwxact.h + * + * PostgreSQL global transaction manager + * + * Portions Copyright (c) 2018, PostgreSQL Global Development Group + * + * src/include/access/fdwxact.h + */ +#ifndef FDWXACT_H +#define FDWXACT_H + +#include "access/fdwxact_xlog.h" +#include "access/xlogreader.h" +#include "foreign/foreign.h" +#include "lib/stringinfo.h" +#include "miscadmin.h" +#include "nodes/pg_list.h" +#include "nodes/execnodes.h" +#include "storage/backendid.h" +#include "storage/proc.h" +#include "storage/shmem.h" +#include "utils/guc.h" +#include "utils/timeout.h" +#include "utils/timestamp.h" + +/* fdwXactState */ +#define FDWXACT_NOT_WAITING 0 +#define FDWXACT_WAITING 1 +#define FDWXACT_WAIT_COMPLETE 2 + +/* Flag passed to FDW transaction management APIs */ +#define FDWXACT_FLAG_ONEPHASE 0x01 /* transaction can commit/rollback + without preparation */ + +/* Enum for foreign_twophase_commit parameter */ +typedef enum +{ + FOREIGN_TWOPHASE_COMMIT_DISABLED, /* disable foreign twophase commit */ + FOREIGN_TWOPHASE_COMMIT_PREFER, /* use twophase commit where available */ + FOREIGN_TWOPHASE_COMMIT_REQUIRED /* 
all foreign servers have to support + twophase commit */ } ForeignTwophaseCommitLevel; + +/* Enum to track the status of a foreign transaction */ +typedef enum +{ + FDWXACT_STATUS_INVALID, + FDWXACT_STATUS_INITIAL, + FDWXACT_STATUS_PREPARING, /* foreign transaction is being prepared */ + FDWXACT_STATUS_PREPARED, /* foreign transaction is prepared */ + FDWXACT_STATUS_COMMITTING, /* foreign prepared transaction is to + * be committed */ + FDWXACT_STATUS_ABORTING, /* foreign prepared transaction is to be + * aborted */ + FDWXACT_STATUS_RESOLVED +} FdwXactStatus; + +typedef struct FdwXactData *FdwXact; + +/* + * Shared memory state of a single foreign transaction. + */ +typedef struct FdwXactData +{ + FdwXact fdwxact_free_next; /* Next free FdwXact entry */ + + Oid dbid; /* database oid where to find foreign server + * and user mapping */ + TransactionId local_xid; /* XID of local transaction */ + Oid serverid; /* foreign server where transaction takes + * place */ + Oid userid; /* user who initiated the foreign + * transaction */ + Oid umid; + bool indoubt; /* Is an in-doubt transaction? */ + slock_t mutex; /* Protect the above fields */ + + /* The status of the foreign transaction, protected by FdwXactLock */ + FdwXactStatus status; + /* + * Note that we need to keep track of two LSNs for each FdwXact. We keep + * track of the start LSN because this is the address we must use to read + * state data back from WAL when committing a FdwXact. We keep track of + * the end LSN because that is the LSN we need to wait for prior to + * commit. + */ + XLogRecPtr insert_start_lsn; /* XLOG offset where inserting this entry starts */ + XLogRecPtr insert_end_lsn; /* XLOG offset where inserting this entry ends */ + + bool valid; /* has the entry been completed and written to file?
*/ + BackendId held_by; /* backend that is holding this entry */ + bool ondisk; /* true if prepare state file is on disk */ + bool inredo; /* true if entry was added via xlog_redo */ + + char fdwxact_id[FDWXACT_ID_MAX_LEN]; /* prepared transaction identifier */ +} FdwXactData; + +/* + * Shared memory layout for maintaining foreign prepared transaction entries. + * Adding or removing an FdwXact entry needs to hold FdwXactLock in exclusive + * mode, and iterating fdwXacts needs it in shared mode. + */ +typedef struct +{ + /* Head of linked list of free FdwXactData structs */ + FdwXact free_fdwxacts; + + /* Number of valid foreign transaction entries */ + int num_fdwxacts; + + /* Up to max_prepared_foreign_xacts entries in the array */ + FdwXact fdwxacts[FLEXIBLE_ARRAY_MEMBER]; /* Variable length array */ +} FdwXactCtlData; + +/* Pointer to the shared memory holding the foreign transaction data */ +extern FdwXactCtlData *FdwXactCtl; + +/* State data for foreign transaction resolution, passed to FDW callbacks */ +typedef struct FdwXactRslvState +{ + /* Foreign transaction information */ + char *fdwxact_id; + + ForeignServer *server; + UserMapping *usermapping; + + int flags; /* OR of FDWXACT_FLAG_xx flags */ +} FdwXactRslvState; + +/* GUC parameters */ +extern int max_prepared_foreign_xacts; +extern int max_foreign_xact_resolvers; +extern int foreign_xact_resolution_retry_interval; +extern int foreign_xact_resolver_timeout; +extern int foreign_twophase_commit; + +/* Function declarations */ +extern Size FdwXactShmemSize(void); +extern void FdwXactShmemInit(void); +extern void restoreFdwXactData(void); +extern TransactionId PrescanFdwXacts(TransactionId oldestActiveXid); +extern void RecoverFdwXacts(void); +extern void AtEOXact_FdwXacts(bool is_commit); +extern void AtPrepare_FdwXacts(void); +extern bool fdwxact_exists(Oid dboid, Oid serverid, Oid userid); +extern void CheckPointFdwXacts(XLogRecPtr redo_horizon); +extern bool FdwTwoPhaseNeeded(void); +extern void PreCommit_FdwXacts(void);
+extern void KnownFdwXactRecreateFiles(XLogRecPtr redo_horizon); +extern void FdwXactWaitToBeResolved(TransactionId wait_xid, bool commit); +extern bool FdwXactIsForeignTwophaseCommitRequired(void); +extern void FdwXactResolveTransactionAndReleaseWaiter(Oid dbid, TransactionId xid, + PGPROC *waiter); +extern bool FdwXactResolveInDoubtTransactions(Oid dbid); +extern PGPROC *FdwXactGetWaiter(TimestampTz *nextResolutionTs_p, TransactionId *waitXid_p); +extern void FdwXactCleanupAtProcExit(void); +extern void RegisterFdwXactByRelId(Oid relid, bool modified); +extern void RegisterFdwXactByServerId(Oid serverid, bool modified); +extern void FdwXactMarkForeignServerAccessed(Oid relid, bool modified); +extern bool check_foreign_twophase_commit(int *newval, void **extra, + GucSource source); +extern bool FdwXactWaiterExists(Oid dbid); + +#endif /* FDWXACT_H */ diff --git a/src/include/access/fdwxact_launcher.h b/src/include/access/fdwxact_launcher.h new file mode 100644 index 0000000000..dd0f5d16ff --- /dev/null +++ b/src/include/access/fdwxact_launcher.h @@ -0,0 +1,29 @@ +/*------------------------------------------------------------------------- + * + * fdwxact_launcher.h + * PostgreSQL foreign transaction launcher definitions + * + * + * Portions Copyright (c) 2019, PostgreSQL Global Development Group + * + * src/include/access/fdwxact_launcher.h + * + *------------------------------------------------------------------------- + */ + +#ifndef FDWXACT_LAUNCHER_H +#define FDWXACT_LAUNCHER_H + +#include "access/fdwxact.h" + +extern void FdwXactLauncherRegister(void); +extern void FdwXactLauncherMain(Datum main_arg); +extern void FdwXactLauncherRequestToLaunch(void); +extern void FdwXactLauncherRequestToLaunchForRetry(void); +extern void FdwXactLaunchOrWakeupResolver(void); +extern Size FdwXactRslvShmemSize(void); +extern void FdwXactRslvShmemInit(void); +extern bool IsFdwXactLauncher(void); + + +#endif /* FDWXACT_LAUNCHER_H */ diff --git 
a/src/include/access/fdwxact_resolver.h b/src/include/access/fdwxact_resolver.h new file mode 100644 index 0000000000..2607654024 --- /dev/null +++ b/src/include/access/fdwxact_resolver.h @@ -0,0 +1,23 @@ +/*------------------------------------------------------------------------- + * + * fdwxact_resolver.h + * PostgreSQL foreign transaction resolver definitions + * + * + * Portions Copyright (c) 2019, PostgreSQL Global Development Group + * + * src/include/access/fdwxact_resolver.h + * + *------------------------------------------------------------------------- + */ +#ifndef FDWXACT_RESOLVER_H +#define FDWXACT_RESOLVER_H + +#include "access/fdwxact.h" + +extern void FdwXactResolverMain(Datum main_arg); +extern bool IsFdwXactResolver(void); + +extern int foreign_xact_resolver_timeout; + +#endif /* FDWXACT_RESOLVER_H */ diff --git a/src/include/access/fdwxact_xlog.h b/src/include/access/fdwxact_xlog.h new file mode 100644 index 0000000000..39ca66beef --- /dev/null +++ b/src/include/access/fdwxact_xlog.h @@ -0,0 +1,54 @@ +/*------------------------------------------------------------------------- + * + * fdwxact_xlog.h + * Foreign transaction XLOG definitions. 
+ * + * + * Portions Copyright (c) 2019, PostgreSQL Global Development Group + * + * src/include/access/fdwxact_xlog.h + * + *------------------------------------------------------------------------- + */ +#ifndef FDWXACT_XLOG_H +#define FDWXACT_XLOG_H + +#include "access/xlogreader.h" +#include "lib/stringinfo.h" + +/* Info types for logs related to FDW transactions */ +#define XLOG_FDWXACT_INSERT 0x00 +#define XLOG_FDWXACT_REMOVE 0x10 + +/* Maximum length of the prepared transaction id, borrowed from twophase.c */ +#define FDWXACT_ID_MAX_LEN 200 + +/* + * On-disk file structure, also used for WAL + */ +typedef struct +{ + TransactionId local_xid; + Oid dbid; /* database oid where to find foreign server + * and user mapping */ + Oid serverid; /* foreign server where transaction takes + * place */ + Oid userid; /* user who initiated the foreign transaction */ + Oid umid; + char fdwxact_id[FDWXACT_ID_MAX_LEN]; /* foreign txn prepare id */ +} FdwXactOnDiskData; + +typedef struct xl_fdwxact_remove +{ + TransactionId xid; + Oid serverid; + Oid userid; + Oid dbid; + bool force; +} xl_fdwxact_remove; + +extern void fdwxact_redo(XLogReaderState *record); +extern void fdwxact_desc(StringInfo buf, XLogReaderState *record); +extern const char *fdwxact_identify(uint8 info); + +#endif /* FDWXACT_XLOG_H */ diff --git a/src/include/access/resolver_internal.h b/src/include/access/resolver_internal.h new file mode 100644 index 0000000000..55fc970b69 --- /dev/null +++ b/src/include/access/resolver_internal.h @@ -0,0 +1,66 @@ +/*------------------------------------------------------------------------- + * + * resolver_internal.h + * Internal headers shared by fdwxact resolvers. 
+ * + * Portions Copyright (c) 2019, PostgreSQL Global Development Group + * + * src/include/access/resolver_internal.h + * + *------------------------------------------------------------------------- + */ + +#ifndef RESOLVER_INTERNAL_H +#define RESOLVER_INTERNAL_H + +#include "storage/latch.h" +#include "storage/shmem.h" +#include "storage/spin.h" +#include "utils/timestamp.h" + +/* + * Each foreign transaction resolver has a FdwXactResolver struct in + * shared memory. This struct is protected by FdwXactResolverLaunchLock. + */ +typedef struct FdwXactResolver +{ + pid_t pid; /* this resolver's PID, or 0 if not active */ + Oid dbid; /* database oid */ + + /* Indicates if this slot is used or free */ + bool in_use; + + /* Stats */ + TimestampTz last_resolved_time; + + /* Protect shared variables shown above */ + slock_t mutex; + + /* + * Pointer to the resolver's latch. Used by backends to wake up this + * resolver when it has work to do. NULL if the resolver isn't active. + */ + Latch *latch; +} FdwXactResolver; + +/* There is one FdwXactRslvCtlData struct for the whole database cluster */ +typedef struct FdwXactRslvCtlData +{ + /* Foreign transaction resolution queue. 
Protected by FdwXactLock */ + SHM_QUEUE fdwxact_queue; + + /* Supervisor process and latch */ + pid_t launcher_pid; + Latch *launcher_latch; + + FdwXactResolver resolvers[FLEXIBLE_ARRAY_MEMBER]; +} FdwXactRslvCtlData; +#define SizeOfFdwXactRslvCtlData \ + (offsetof(FdwXactRslvCtlData, resolvers) + sizeof(FdwXactResolver)) + +extern FdwXactRslvCtlData *FdwXactRslvCtl; + +extern FdwXactResolver *MyFdwXactResolver; +extern FdwXactRslvCtlData *FdwXactRslvCtl; + +#endif /* RESOLVER_INTERNAL_H */ diff --git a/src/include/access/rmgrlist.h b/src/include/access/rmgrlist.h index 3c0db2ccf5..5798b4cd99 100644 --- a/src/include/access/rmgrlist.h +++ b/src/include/access/rmgrlist.h @@ -47,3 +47,4 @@ PG_RMGR(RM_COMMIT_TS_ID, "CommitTs", commit_ts_redo, commit_ts_desc, commit_ts_i PG_RMGR(RM_REPLORIGIN_ID, "ReplicationOrigin", replorigin_redo, replorigin_desc, replorigin_identify, NULL, NULL, NULL) PG_RMGR(RM_GENERIC_ID, "Generic", generic_redo, generic_desc, generic_identify, NULL, NULL, generic_mask) PG_RMGR(RM_LOGICALMSG_ID, "LogicalMessage", logicalmsg_redo, logicalmsg_desc, logicalmsg_identify, NULL, NULL, NULL) +PG_RMGR(RM_FDWXACT_ID, "Foreign Transactions", fdwxact_redo, fdwxact_desc, fdwxact_identify, NULL, NULL, NULL) diff --git a/src/include/access/twophase.h b/src/include/access/twophase.h index 02b5315c43..e8c094d708 100644 --- a/src/include/access/twophase.h +++ b/src/include/access/twophase.h @@ -36,6 +36,7 @@ extern void PostPrepare_Twophase(void); extern PGPROC *TwoPhaseGetDummyProc(TransactionId xid, bool lock_held); extern BackendId TwoPhaseGetDummyBackendId(TransactionId xid, bool lock_held); +extern bool TwoPhaseExists(TransactionId xid); extern GlobalTransaction MarkAsPreparing(TransactionId xid, const char *gid, TimestampTz prepared_at, diff --git a/src/include/access/xact.h b/src/include/access/xact.h index cb5c4935d2..a75e6998f0 100644 --- a/src/include/access/xact.h +++ b/src/include/access/xact.h @@ -108,6 +108,13 @@ extern int MyXactFlags; */ #define 
XACT_FLAGS_WROTENONTEMPREL (1U << 2) +/* + * XACT_FLAGS_FDWNOPREPARE - set when we wrote data on a foreign table whose + * server isn't capable of two-phase + * commit. + */ +#define XACT_FLAGS_FDWNOPREPARE (1U << 3) + /* * start- and end-of-transaction callbacks for dynamically loaded modules */ diff --git a/src/include/access/xlog_internal.h b/src/include/access/xlog_internal.h index e295dc65fb..d1ce20242f 100644 --- a/src/include/access/xlog_internal.h +++ b/src/include/access/xlog_internal.h @@ -232,6 +232,7 @@ typedef struct xl_parameter_change int max_worker_processes; int max_wal_senders; int max_prepared_xacts; + int max_prepared_foreign_xacts; int max_locks_per_xact; int wal_level; bool wal_log_hints; diff --git a/src/include/catalog/pg_control.h b/src/include/catalog/pg_control.h index cf7d4485e9..f2174a0208 100644 --- a/src/include/catalog/pg_control.h +++ b/src/include/catalog/pg_control.h @@ -179,6 +179,7 @@ typedef struct ControlFileData int max_worker_processes; int max_wal_senders; int max_prepared_xacts; + int max_prepared_foreign_xacts; int max_locks_per_xact; bool track_commit_timestamp; diff --git a/src/include/catalog/pg_proc.dat b/src/include/catalog/pg_proc.dat index ac8f64b219..1072c38aa6 100644 --- a/src/include/catalog/pg_proc.dat +++ b/src/include/catalog/pg_proc.dat @@ -5184,6 +5184,13 @@ proargmodes => '{i,o,o,o,o,o,o,o,o}', proargnames => '{subid,subid,relid,pid,received_lsn,last_msg_send_time,last_msg_receipt_time,latest_end_lsn,latest_end_time}', prosrc => 'pg_stat_get_subscription' }, +{ oid => '9705', descr => 'statistics: information about foreign transaction resolver', + proname => 'pg_stat_get_foreign_xact', proisstrict => 'f', provolatile => 's', + proparallel => 'r', prorettype => 'record', proargtypes => '', + proallargtypes => '{oid,oid,timestamptz}', + proargmodes => '{o,o,o}', + proargnames => '{pid,dbid,last_resolved_time}', + prosrc => 'pg_stat_get_foreign_xact' }, { oid => '2026', descr => 'statistics: 
current backend PID', proname => 'pg_backend_pid', provolatile => 's', proparallel => 'r', prorettype => 'int4', proargtypes => '', prosrc => 'pg_backend_pid' }, @@ -5897,6 +5904,24 @@ proargnames => '{type,object_names,object_args,classid,objid,objsubid}', prosrc => 'pg_get_object_address' }, +{ oid => '9706', descr => 'view foreign transactions', + proname => 'pg_foreign_xacts', prorows => '1000', proretset => 't', + provolatile => 'v', prorettype => 'record', proargtypes => '', + proallargtypes => '{oid,xid,oid,oid,text,bool,text}', + proargmodes => '{o,o,o,o,o,o,o}', + proargnames => '{dbid,xid,serverid,userid,status,in_doubt,identifier}', + prosrc => 'pg_foreign_xacts' }, +{ oid => '9707', descr => 'remove foreign transaction without resolution', + proname => 'pg_remove_foreign_xact', provolatile => 'v', prorettype => 'bool', + proargtypes => 'xid oid oid', + proargnames => '{xid,serverid,userid}', + prosrc => 'pg_remove_foreign_xact' }, +{ oid => '9708', descr => 'resolve one foreign transaction', + proname => 'pg_resolve_foreign_xact', provolatile => 'v', prorettype => 'bool', + proargtypes => 'xid oid oid', + proargnames => '{xid,serverid,userid}', + prosrc => 'pg_resolve_foreign_xact' }, + { oid => '2079', descr => 'is table visible in search path?', proname => 'pg_table_is_visible', procost => '10', provolatile => 's', prorettype => 'bool', proargtypes => 'oid', prosrc => 'pg_table_is_visible' }, @@ -6015,6 +6040,10 @@ { oid => '2851', descr => 'wal filename, given a wal location', proname => 'pg_walfile_name', prorettype => 'text', proargtypes => 'pg_lsn', prosrc => 'pg_walfile_name' }, +{ oid => '9709', + descr => 'stop a foreign transaction resolver process running on the given database', + proname => 'pg_stop_foreign_xact_resolver', provolatile => 'v', prorettype => 'bool', + proargtypes => 'oid', prosrc => 'pg_stop_foreign_xact_resolver'}, { oid => '3165', descr => 'difference in bytes, given two wal locations', proname => 'pg_wal_lsn_diff', 
prorettype => 'numeric', diff --git a/src/include/foreign/fdwapi.h b/src/include/foreign/fdwapi.h index 822686033e..c7b33d72ec 100644 --- a/src/include/foreign/fdwapi.h +++ b/src/include/foreign/fdwapi.h @@ -12,6 +12,7 @@ #ifndef FDWAPI_H #define FDWAPI_H +#include "access/fdwxact.h" #include "access/parallel.h" #include "nodes/execnodes.h" #include "nodes/pathnodes.h" @@ -169,6 +170,11 @@ typedef bool (*IsForeignScanParallelSafe_function) (PlannerInfo *root, typedef List *(*ReparameterizeForeignPathByChild_function) (PlannerInfo *root, List *fdw_private, RelOptInfo *child_rel); +typedef void (*PrepareForeignTransaction_function) (FdwXactRslvState *frstate); +typedef void (*CommitForeignTransaction_function) (FdwXactRslvState *frstate); +typedef void (*RollbackForeignTransaction_function) (FdwXactRslvState *frstate); +typedef char *(*GetPrepareId_function) (TransactionId xid, Oid serverid, + Oid userid, int *prep_id_len); /* * FdwRoutine is the struct returned by a foreign-data wrapper's handler @@ -236,6 +242,12 @@ typedef struct FdwRoutine /* Support functions for IMPORT FOREIGN SCHEMA */ ImportForeignSchema_function ImportForeignSchema; + /* Support functions for transaction management */ + PrepareForeignTransaction_function PrepareForeignTransaction; + CommitForeignTransaction_function CommitForeignTransaction; + RollbackForeignTransaction_function RollbackForeignTransaction; + GetPrepareId_function GetPrepareId; + /* Support functions for parallelism under Gather node */ IsForeignScanParallelSafe_function IsForeignScanParallelSafe; EstimateDSMForeignScan_function EstimateDSMForeignScan; diff --git a/src/include/foreign/foreign.h b/src/include/foreign/foreign.h index 4de157c19c..91c2276915 100644 --- a/src/include/foreign/foreign.h +++ b/src/include/foreign/foreign.h @@ -69,6 +69,7 @@ extern ForeignServer *GetForeignServerExtended(Oid serverid, bits16 flags); extern ForeignServer *GetForeignServerByName(const char *name, bool missing_ok); extern UserMapping 
*GetUserMapping(Oid userid, Oid serverid); +extern UserMapping *GetUserMappingByOid(Oid umid); extern ForeignDataWrapper *GetForeignDataWrapper(Oid fdwid); extern ForeignDataWrapper *GetForeignDataWrapperExtended(Oid fdwid, bits16 flags); diff --git a/src/include/pgstat.h b/src/include/pgstat.h index fe076d823d..d82d8f7abc 100644 --- a/src/include/pgstat.h +++ b/src/include/pgstat.h @@ -776,6 +776,8 @@ typedef enum WAIT_EVENT_BGWRITER_HIBERNATE, WAIT_EVENT_BGWRITER_MAIN, WAIT_EVENT_CHECKPOINTER_MAIN, + WAIT_EVENT_FDWXACT_RESOLVER_MAIN, + WAIT_EVENT_FDWXACT_LAUNCHER_MAIN, WAIT_EVENT_LOGICAL_APPLY_MAIN, WAIT_EVENT_LOGICAL_LAUNCHER_MAIN, WAIT_EVENT_PGSTAT_MAIN, @@ -853,7 +855,9 @@ typedef enum WAIT_EVENT_REPLICATION_ORIGIN_DROP, WAIT_EVENT_REPLICATION_SLOT_DROP, WAIT_EVENT_SAFE_SNAPSHOT, - WAIT_EVENT_SYNC_REP + WAIT_EVENT_SYNC_REP, + WAIT_EVENT_FDWXACT, + WAIT_EVENT_FDWXACT_RESOLUTION } WaitEventIPC; /* ---------- @@ -933,6 +937,9 @@ typedef enum WAIT_EVENT_TWOPHASE_FILE_READ, WAIT_EVENT_TWOPHASE_FILE_SYNC, WAIT_EVENT_TWOPHASE_FILE_WRITE, + WAIT_EVENT_FDWXACT_FILE_READ, + WAIT_EVENT_FDWXACT_FILE_WRITE, + WAIT_EVENT_FDWXACT_FILE_SYNC, WAIT_EVENT_WALSENDER_TIMELINE_HISTORY_READ, WAIT_EVENT_WAL_BOOTSTRAP_SYNC, WAIT_EVENT_WAL_BOOTSTRAP_WRITE, diff --git a/src/include/storage/proc.h b/src/include/storage/proc.h index 281e1db725..c802201193 100644 --- a/src/include/storage/proc.h +++ b/src/include/storage/proc.h @@ -16,6 +16,7 @@ #include "access/clog.h" #include "access/xlogdefs.h" +#include "datatype/timestamp.h" #include "lib/ilist.h" #include "storage/latch.h" #include "storage/lock.h" @@ -152,6 +153,16 @@ struct PGPROC int syncRepState; /* wait state for sync rep */ SHM_QUEUE syncRepLinks; /* list link if process is in syncrep queue */ + /* + * Info to allow us to wait for foreign transaction to be resolved, if + * needed. 
+ */ + TransactionId fdwXactWaitXid; /* waiting for foreign transaction involved with + * this transaction id to be resolved */ + int fdwXactState; /* wait state for foreign transaction resolution */ + SHM_QUEUE fdwXactLinks; /* list link if process is in queue */ + TimestampTz fdwXactNextResolutionTs; + /* * All PROCLOCK objects for locks held or awaited by this backend are * linked into one of these lists, according to the partition number of diff --git a/src/include/storage/procarray.h b/src/include/storage/procarray.h index 8f67b860e7..deb293c1a9 100644 --- a/src/include/storage/procarray.h +++ b/src/include/storage/procarray.h @@ -36,6 +36,8 @@ #define PROCARRAY_SLOTS_XMIN 0x20 /* replication slot xmin, * catalog_xmin */ +#define PROCARRAY_FDWXACT_XMIN 0x40 /* unresolved distributed + transaction xmin */ /* * Only flags in PROCARRAY_PROC_FLAGS_MASK are considered when matching * PGXACT->vacuumFlags. Other flags are used for different purposes and @@ -125,4 +127,7 @@ extern void ProcArraySetReplicationSlotXmin(TransactionId xmin, extern void ProcArrayGetReplicationSlotXmin(TransactionId *xmin, TransactionId *catalog_xmin); + +extern void ProcArraySetFdwXactUnresolvedXmin(TransactionId xmin); +extern TransactionId ProcArrayGetFdwXactUnresolvedXmin(void); #endif /* PROCARRAY_H */ diff --git a/src/include/utils/guc_tables.h b/src/include/utils/guc_tables.h index d68976fafa..d5fec50969 100644 --- a/src/include/utils/guc_tables.h +++ b/src/include/utils/guc_tables.h @@ -96,6 +96,9 @@ enum config_group CLIENT_CONN_PRELOAD, CLIENT_CONN_OTHER, LOCK_MANAGEMENT, + FDWXACT, + FDWXACT_SETTINGS, + FDWXACT_RESOLVER, COMPAT_OPTIONS, COMPAT_OPTIONS_PREVIOUS, COMPAT_OPTIONS_CLIENT, diff --git a/src/test/regress/expected/rules.out b/src/test/regress/expected/rules.out index c9cc569404..ed229d5a67 100644 --- a/src/test/regress/expected/rules.out +++ b/src/test/regress/expected/rules.out @@ -1341,6 +1341,14 @@ pg_file_settings| SELECT a.sourcefile, a.applied, a.error FROM 
pg_show_all_file_settings() a(sourcefile, sourceline, seqno, name, setting, applied, error); +pg_foreign_xacts| SELECT f.dbid, + f.xid, + f.serverid, + f.userid, + f.status, + f.in_doubt, + f.identifier + FROM pg_foreign_xacts() f(dbid, xid, serverid, userid, status, in_doubt, identifier); pg_group| SELECT pg_authid.rolname AS groname, pg_authid.oid AS grosysid, ARRAY( SELECT pg_auth_members.member @@ -1841,6 +1849,11 @@ pg_stat_database_conflicts| SELECT d.oid AS datid, pg_stat_get_db_conflict_bufferpin(d.oid) AS confl_bufferpin, pg_stat_get_db_conflict_startup_deadlock(d.oid) AS confl_deadlock FROM pg_database d; +pg_stat_foreign_xact| SELECT r.pid, + r.dbid, + r.last_resolved_time + FROM pg_stat_get_foreign_xact() r(pid, dbid, last_resolved_time) + WHERE (r.pid IS NOT NULL); pg_stat_gssapi| SELECT s.pid, s.gss_auth AS gss_authenticated, s.gss_princ AS principal, -- 2.23.0 From 3363abd531595233fb59e0ab6078a011ab8060e9 Mon Sep 17 00:00:00 2001 From: Kyotaro Horiguchi <horikyota.ntt@gmail.com> Date: Thu, 5 Dec 2019 17:01:08 +0900 Subject: [PATCH v26 3/5] Documentation update. 
Original Author: Masahiko Sawada <sawada.mshk@gmail.com> --- doc/src/sgml/catalogs.sgml | 145 +++++++++++++ doc/src/sgml/config.sgml | 146 ++++++++++++- doc/src/sgml/distributed-transaction.sgml | 158 +++++++++++++++ doc/src/sgml/fdwhandler.sgml | 236 ++++++++++++++++++++++ doc/src/sgml/filelist.sgml | 1 + doc/src/sgml/func.sgml | 89 ++++++++ doc/src/sgml/monitoring.sgml | 60 ++++++ doc/src/sgml/postgres.sgml | 1 + doc/src/sgml/storage.sgml | 6 + 9 files changed, 841 insertions(+), 1 deletion(-) create mode 100644 doc/src/sgml/distributed-transaction.sgml diff --git a/doc/src/sgml/catalogs.sgml b/doc/src/sgml/catalogs.sgml index 55694c4368..1b720da03d 100644 --- a/doc/src/sgml/catalogs.sgml +++ b/doc/src/sgml/catalogs.sgml @@ -8267,6 +8267,11 @@ SCRAM-SHA-256$<replaceable><iteration count></replaceable>:<replaceable>&l <entry>open cursors</entry> </row> + <row> + <entry><link linkend="view-pg-foreign-xacts"><structname>pg_foreign_xacts</structname></link></entry> + <entry>foreign transactions</entry> + </row> + <row> <entry><link linkend="view-pg-file-settings"><structname>pg_file_settings</structname></link></entry> <entry>summary of configuration file contents</entry> @@ -9712,6 +9717,146 @@ SELECT * FROM pg_locks pl LEFT JOIN pg_prepared_xacts ppx </sect1> + <sect1 id="view-pg-foreign-xacts"> + <title><structname>pg_foreign_xacts</structname></title> + + <indexterm zone="view-pg-foreign-xacts"> + <primary>pg_foreign_xacts</primary> + </indexterm> + + <para> + The view <structname>pg_foreign_xacts</structname> displays + information about foreign transactions that are opened on + foreign servers for atomic distributed transaction commit (see + <xref linkend="atomic-commit"/> for details). + </para> + + <para> + <structname>pg_foreign_xacts</structname> contains one row per foreign + transaction. An entry is removed when the foreign transaction is + committed or rolled back. 
+ </para> + + <table> + <title><structname>pg_foreign_xacts</structname> Columns</title> + + <tgroup cols="4"> + <thead> + <row> + <entry>Name</entry> + <entry>Type</entry> + <entry>References</entry> + <entry>Description</entry> + </row> + </thead> + <tbody> + <row> + <entry><structfield>dbid</structfield></entry> + <entry><type>oid</type></entry> + <entry><literal><link linkend="catalog-pg-database"><structname>pg_database</structname></link>.oid</literal></entry> + <entry> + OID of the database in which the foreign transaction resides + </entry> + </row> + <row> + <entry><structfield>xid</structfield></entry> + <entry><type>xid</type></entry> + <entry></entry> + <entry> + Numeric identifier of the local transaction with which this foreign + transaction is associated + </entry> + </row> + <row> + <entry><structfield>serverid</structfield></entry> + <entry><type>oid</type></entry> + <entry><literal><link linkend="catalog-pg-foreign-server"><structname>pg_foreign_server</structname></link>.oid</literal></entry> + <entry> + The OID of the foreign server on which the foreign transaction is prepared + </entry> + </row> + <row> + <entry><structfield>userid</structfield></entry> + <entry><type>oid</type></entry> + <entry><literal><link linkend="view-pg-user"><structname>pg_user</structname></link>.oid</literal></entry> + <entry> + The OID of the user that prepared this foreign transaction. + </entry> + </row> + <row> + <entry><structfield>status</structfield></entry> + <entry><type>text</type></entry> + <entry></entry> + <entry> + Status of the foreign transaction. Possible values are: + <itemizedlist> + <listitem> + <para> + <literal>initial</literal> : Initial status. + </para> + </listitem> + <listitem> + <para> + <literal>preparing</literal> : This foreign transaction is being prepared. + </para> + </listitem> + <listitem> + <para> + <literal>prepared</literal> : This foreign transaction has been prepared. 
+ </para> + </listitem> + <listitem> + <para> + <literal>committing</literal> : This foreign transaction is being committed. + </para> + </listitem> + <listitem> + <para> + <literal>aborting</literal> : This foreign transaction is being aborted. + </para> + </listitem> + <listitem> + <para> + <literal>resolved</literal> : This foreign transaction has been resolved. + </para> + </listitem> + </itemizedlist> + </entry> + </row> + <row> + <entry><structfield>in_doubt</structfield></entry> + <entry><type>boolean</type></entry> + <entry></entry> + <entry> + If <literal>true</literal>, this foreign transaction is in in-doubt state and + needs to be resolved by calling the + <function>pg_resolve_foreign_xact</function> function. + </entry> + </row> + <row> + <entry><structfield>identifier</structfield></entry> + <entry><type>text</type></entry> + <entry></entry> + <entry> + The identifier of the prepared foreign transaction. + </entry> + </row> + </tbody> + </tgroup> + </table> + + <para> + When the <structname>pg_foreign_xacts</structname> view is accessed, the + internal transaction manager data structures are momentarily locked, and + a copy is made for the view to display. This ensures that the + view produces a consistent set of results, while not blocking + normal operations longer than necessary. Nonetheless + there could be some impact on database performance if this view is + frequently accessed. 
+ </para> + + </sect1> + <sect1 id="view-pg-publication-tables"> <title><structname>pg_publication_tables</structname></title> diff --git a/doc/src/sgml/config.sgml b/doc/src/sgml/config.sgml index 53ac14490a..69778750f3 100644 --- a/doc/src/sgml/config.sgml +++ b/doc/src/sgml/config.sgml @@ -4378,7 +4378,6 @@ ANY <replaceable class="parameter">num_sync</replaceable> ( <replaceable class=" </variablelist> </sect2> - </sect1> <sect1 id="runtime-config-query"> @@ -8818,6 +8817,151 @@ dynamic_library_path = 'C:\tools\postgresql;H:\my_project\lib;$libdir' </variablelist> </sect1> + <sect1 id="runtime-config-distributed-transaction"> + <title>Distributed Transaction Management</title> + + <sect2 id="runtime-config-distributed-transaction-settings"> + <title>Settings</title> + <variablelist> + + <varlistentry id="guc-foreign-twophase-commit" xreflabel="foreign_twophase_commit"> + <term><varname>foreign_twophase_commit</varname> (<type>enum</type>) + <indexterm> + <primary><varname>foreign_twophase_commit</varname> configuration parameter</primary> + </indexterm> + </term> + <listitem> + <para> + Specifies whether transaction commit will wait for all involved foreign + transactions to be resolved before the command returns a "success" + indication to the client. Valid values are <literal>required</literal>, + <literal>prefer</literal> and <literal>disabled</literal>. The default + setting is <literal>disabled</literal>. A setting of + <literal>disabled</literal> does not use the two-phase commit protocol to + commit or roll back distributed transactions. When set to + <literal>required</literal>, the distributed transaction strictly + requires that all servers written to can use the two-phase commit protocol. + That is, the distributed transaction cannot commit if even one server + does not support the transaction management callback routines + (described in <xref linkend="fdw-callbacks-transaction-managements"/>). 
+ When set to <literal>prefer</literal>, the distributed transaction uses the + two-phase commit protocol only on servers where it is available, and + commits directly on the others. Note that with <literal>disabled</literal> or + <literal>prefer</literal> there is a risk of inconsistency among the + servers involved in the distributed transaction if a foreign server + crashes while the distributed transaction is being committed. + </para> + + <para> + Both <varname>max_prepared_foreign_transactions</varname> and + <varname>max_foreign_transaction_resolvers</varname> must be set to a + non-zero value to set this parameter to either <literal>required</literal> or + <literal>prefer</literal>. + </para> + + <para> + This parameter can be changed at any time; the behavior for any one + transaction is determined by the setting in effect when it commits. + </para> + </listitem> + </varlistentry> + + <varlistentry id="guc-max-prepared-foreign-transactions" xreflabel="max_prepared_foreign_transactions"> + <term><varname>max_prepared_foreign_transactions</varname> (<type>integer</type>) + <indexterm> + <primary><varname>max_prepared_foreign_transactions</varname> configuration parameter</primary> + </indexterm> + </term> + <listitem> + <para> + Sets the maximum number of foreign transactions that can be prepared + simultaneously. A single local transaction can give rise to multiple + foreign transactions. If <literal>N</literal> local transactions each + span <literal>K</literal> foreign servers, this value needs to be at least + <literal>N * K</literal>, not just <literal>N</literal>. + This parameter can only be set at server start. + </para> + <para> + When running a standby server, you must set this parameter to a value + equal to or higher than on the master server. Otherwise, queries + will not be allowed in the standby server. 
+ </para> + </listitem> + </varlistentry> + + </variablelist> + </sect2> + + <sect2 id="runtime-config-foreign-transaction-resolver"> + <title>Foreign Transaction Resolvers</title> + + <para> + These settings control the behavior of a foreign transaction resolver. + </para> + + <variablelist> + <varlistentry id="guc-max-foreign-transaction-resolvers" xreflabel="max_foreign_transaction_resolvers"> + <term><varname>max_foreign_transaction_resolvers</varname> (<type>int</type>) + <indexterm> + <primary><varname>max_foreign_transaction_resolvers</varname> configuration parameter</primary> + </indexterm> + </term> + <listitem> + <para> + Specifies the maximum number of foreign transaction resolution workers. A foreign transaction + resolver is responsible for foreign transaction resolution on one database. + </para> + <para> + Foreign transaction resolution workers are taken from the pool defined by + <varname>max_worker_processes</varname>. + </para> + <para> + The default value is 0. + </para> + </listitem> + </varlistentry> + + <varlistentry id="guc-foreign-transaction-resolution-retry-interval" xreflabel="foreign_transaction_resolution_retry_interval"> + <term><varname>foreign_transaction_resolution_retry_interval</varname> (<type>integer</type>) + <indexterm> + <primary><varname>foreign_transaction_resolution_retry_interval</varname> configuration parameter</primary> + </indexterm> + </term> + <listitem> + <para> + Specifies how long the foreign transaction resolver should wait after the last resolution + fails before retrying to resolve foreign transactions. This parameter can only be set in the + <filename>postgresql.conf</filename> file or on the server command line. + </para> + <para> + The default value is 10 seconds. 
+ </para> + </listitem> + </varlistentry> + + <varlistentry id="guc-foreign-transaction-resolver-timeout" xreflabel="foreign_transaction_resolver_timeout"> + <term><varname>foreign_transaction_resolver_timeout</varname> (<type>integer</type>) + <indexterm> + <primary><varname>foreign_transaction_resolver_timeout</varname> configuration parameter</primary> + </indexterm> + </term> + <listitem> + <para> + Terminates foreign transaction resolver processes that have had no foreign + transactions to resolve for longer than the specified number of milliseconds. + A value of zero disables the timeout mechanism, meaning a resolver stays connected to one + database until stopped manually. This parameter can only be set in the + <filename>postgresql.conf</filename> file or on the server command line. + </para> + <para> + The default value is 60 seconds. + </para> + </listitem> + </varlistentry> + </variablelist> + </sect2> + </sect1> + <sect1 id="runtime-config-compatible"> <title>Version and Platform Compatibility</title> diff --git a/doc/src/sgml/distributed-transaction.sgml b/doc/src/sgml/distributed-transaction.sgml new file mode 100644 index 0000000000..350b1afe68 --- /dev/null +++ b/doc/src/sgml/distributed-transaction.sgml @@ -0,0 +1,158 @@ +<!-- doc/src/sgml/distributed-transaction.sgml --> + +<chapter id="distributed-transaction"> + <title>Distributed Transaction</title> + + <para> + A distributed transaction is a transaction in which two or more network hosts + are involved. <productname>PostgreSQL</productname>'s global transaction + manager supports distributed transactions that access foreign servers using + Foreign Data Wrappers. The global transaction manager is responsible for + managing transactions on foreign servers. + </para> + + <sect1 id="atomic-commit"> + <title>Atomic Commit</title> + + <para> + Atomic commit of a distributed transaction is an operation that applies a set + of changes as a single operation globally. 
This guarantees all-or-nothing + results for the changes on all remote hosts involved. + <productname>PostgreSQL</productname> provides a way to perform read-write + transactions with foreign resources using foreign data wrappers. + Using <productname>PostgreSQL</productname>'s atomic commit ensures that + all changes on foreign servers end in either commit or rollback using the + transaction callback routines + (see <xref linkend="fdw-callbacks-transaction-managements"/>). + </para> + + <sect2> + <title>Atomic Commit Using Two-phase Commit Protocol</title> + + <para> + To achieve commit among all foreign servers atomically, + <productname>PostgreSQL</productname> employs the two-phase commit protocol, + which is a type of atomic commitment protocol (ACP). + A <productname>PostgreSQL</productname> server that receives the SQL is called the + <firstterm>coordinator node</firstterm>, which is responsible for coordinating + all the participating transactions. Using the two-phase commit protocol, the commit + sequence of a distributed transaction performs the following steps. + <orderedlist> + <listitem> + <para> + Prepare all transactions on foreign servers. + </para> + </listitem> + <listitem> + <para> + Commit locally. + </para> + </listitem> + <listitem> + <para> + Resolve all prepared transactions on foreign servers. + </para> + </listitem> + </orderedlist> + + </para> + + <para> + At the first step, the <productname>PostgreSQL</productname> distributed + transaction manager prepares all transactions on the foreign servers if + two-phase commit is required. Two-phase commit is required when the + transaction modifies data on two or more servers including the local server + itself and <xref linkend="guc-foreign-twophase-commit"/> is + <literal>required</literal> or <literal>prefer</literal>. If all preparations + on foreign servers succeed, it goes to the next step. 
If any failure happens + in this step, <productname>PostgreSQL</productname> switches to rollback and + rolls back all transactions on both local and foreign servers. + </para> + + <para> + At the local commit step, <productname>PostgreSQL</productname> commits the + transaction locally. If any failure happens in this step, + <productname>PostgreSQL</productname> switches to rollback and rolls back all + transactions on both local and foreign servers. + </para> + + <para> + At the final step, prepared transactions are resolved by a foreign transaction + resolver process. + </para> + </sect2> + + <sect2 id="atomic-commit-transaction-resolution"> + <title>Foreign Transaction Resolver Processes</title> + + <para> + Foreign transaction resolver processes are auxiliary processes that are + responsible for foreign transaction resolution. They commit or roll back all + prepared transactions on foreign servers if the coordinator received agreement + messages from all foreign servers during the first step. + </para> + + <para> + One foreign transaction resolver is responsible for transaction resolutions + on one database on the coordinator side. On failure during resolution, it + retries resolution at intervals of + <varname>foreign_transaction_resolution_retry_interval</varname>. + </para> + + <note> + <para> + While a foreign transaction resolver process is connected to a database, that + database cannot be dropped. To drop the database, call the + <function>pg_stop_foreign_xact_resolver</function> function before dropping + the database. + </para> + </note> + </sect2> + + <sect2 id="atomic-commit-in-doubt-transaction"> + <title>Manual Resolution of In-Doubt Transactions</title> + + <para> + The atomic commit mechanism ensures that all foreign servers either commit + or roll back using the two-phase commit protocol. 
However, distributed transactions + become <firstterm>in-doubt</firstterm> in three cases: when the foreign + server crashed or became unreachable while the foreign transaction was being + prepared, when the coordinator node crashed while either preparing or + resolving the distributed transaction, and when the user canceled the query. You can + check for in-doubt transactions in the <xref linkend="pg-stat-foreign-xact-view"/> + view. These foreign transactions need to be resolved by using the + <function>pg_resolve_foreign_xact</function> function. + <productname>PostgreSQL</productname> doesn't have facilities to automatically + resolve in-doubt transactions. This behavior might change in a future release. + </para> + </sect2> + + <sect2 id="atomic-commit-monitoring"> + <title>Monitoring</title> + <para> + The monitoring information about foreign transaction resolvers is visible in + the <link linkend="pg-stat-foreign-xact-view"><literal>pg_stat_foreign_xact</literal></link> + view. This view contains one row for every foreign transaction resolver worker. + </para> + </sect2> + + <sect2> + <title>Configuration Settings</title> + + <para> + Atomic commit requires several configuration options to be set. + </para> + + <para> + On the coordinator side, <xref linkend="guc-max-prepared-foreign-transactions"/> and + <xref linkend="guc-max-foreign-transaction-resolvers"/> must be set to non-zero values. + Additionally, <varname>max_worker_processes</varname> may need to be adjusted to + accommodate foreign transaction resolver workers, at least + (<varname>max_foreign_transaction_resolvers</varname> + <literal>1</literal>). + Note that some extensions and parallel queries also take worker slots from + <varname>max_worker_processes</varname>. 
+ </para> + + </sect2> + </sect1> +</chapter> diff --git a/doc/src/sgml/fdwhandler.sgml b/doc/src/sgml/fdwhandler.sgml index 6587678af2..dd0358ef22 100644 --- a/doc/src/sgml/fdwhandler.sgml +++ b/doc/src/sgml/fdwhandler.sgml @@ -1415,6 +1415,127 @@ ReparameterizeForeignPathByChild(PlannerInfo *root, List *fdw_private, </para> </sect2> + <sect2 id="fdw-callbacks-transaction-managements"> + <title>FDW Routines for Transaction Management</title> + + <para> + Transaction management callbacks are used to commit, roll back, and + prepare foreign transactions. If an FDW wishes its foreign + transactions to be managed by <productname>PostgreSQL</productname>'s global + transaction manager, it must provide both + <function>CommitForeignTransaction</function> and + <function>RollbackForeignTransaction</function>. In addition, if an FDW + wishes to support <firstterm>atomic commit</firstterm> (as described in + <xref linkend="fdw-transaction-managements"/>), it must provide + <function>PrepareForeignTransaction</function> as well and can optionally + provide the <function>GetPrepareId</function> callback. + </para> + + <para> +<programlisting> +void +PrepareForeignTransaction(FdwXactRslvState *frstate); +</programlisting> + Prepare the transaction on the foreign server. This function is called at the + pre-commit phase of the local transaction if foreign two-phase commit is + required. This function is used only for distributed transaction management + (see <xref linkend="distributed-transaction"/>). + </para> + + <para> + Note that this callback function is always executed by backend processes. + </para> + <para> +<programlisting> +bool +CommitForeignTransaction(FdwXactRslvState *frstate); +</programlisting> + Commit the foreign transaction. This function is called either at + the pre-commit phase of the local transaction if the transaction + can be committed in one phase, or at the post-commit phase if + two-phase commit is required. 
If <literal>frstate->flag</literal> has + the flag <literal>FDW_XACT_FLAG_ONEPHASE</literal>, the transaction + can be committed in one phase; otherwise this function must commit the prepared + transaction identified by <literal>frstate->fdwxact_id</literal>. + </para> + + <para> + The foreign transaction identified by <literal>frstate->fdwxact_id</literal> + might not exist on the foreign servers. This can happen when, for instance, + the <productname>PostgreSQL</productname> server crashed while preparing or + committing the foreign transaction. Therefore, this function needs to + tolerate the undefined object error + (<literal>ERRCODE_UNDEFINED_OBJECT</literal>) rather than raising an error. + </para> + + <para> + Note that in all cases except for calls via the <function>pg_resolve_fdwxact</function> + SQL function, this callback function is executed by foreign transaction + resolver processes. + </para> + <para> +<programlisting> +bool +RollbackForeignTransaction(FdwXactRslvState *frstate); +</programlisting> + Roll back the foreign transaction. This function is called at + the end of the local transaction after it has been rolled back locally. Foreign + transactions are rolled back when the user requests a rollback or when + any error occurs during the transaction. This function must tolerate + being called recursively if any error occurs while rolling back the foreign + transaction, so you would need to track recursion and prevent infinite + recursion. If <literal>frstate->flag</literal> has the flag + <literal>FDW_XACT_FLAG_ONEPHASE</literal>, the transaction can be rolled + back in one phase; otherwise this function must roll back the prepared + transaction identified by <literal>frstate->fdwxact_id</literal>. + </para> + + <para> + The foreign transaction identified by <literal>frstate->fdwxact_id</literal> + might not exist on the foreign servers. 
This can happen when, for instance, + the <productname>PostgreSQL</productname> server crashed while preparing or + committing the foreign transaction. Therefore, this function needs to + tolerate the undefined object error + (<literal>ERRCODE_UNDEFINED_OBJECT</literal>) rather than raising an error. + </para> + + <para> + Note that in all cases except for calls via the <function>pg_resolve_fdwxact</function> + SQL function, this callback function is executed by foreign transaction + resolver processes. + </para> + <para> +<programlisting> +char * +GetPrepareId(TransactionId xid, Oid serverid, Oid userid, int *prep_id_len); +</programlisting> + Return a null-terminated string that represents the prepared transaction + identifier, setting its length in <varname>*prep_id_len</varname>. + This optional function is called during executor startup, once per + foreign server. Note that the transaction identifier must be a string literal, + less than <symbol>NAMEDATALEN</symbol> bytes long, and must not be the same + as any other concurrent prepared transaction id. If this callback routine + is not provided, <productname>PostgreSQL</productname>'s distributed + transaction manager generates a unique identifier in the form of + <literal>fx_<random value up to 2<superscript>31</superscript>>_<server oid>_<user oid></literal>. + </para> + + <para> + Note that this callback function is always executed by backend processes. + </para> + + <note> + <para> + The functions <function>PrepareForeignTransaction</function>, + <function>CommitForeignTransaction</function> and + <function>RollbackForeignTransaction</function> are called + outside of a valid transaction state, so please note that + you cannot use functions that rely on the system catalog cache, + such as the Foreign Data Wrapper helper functions described in + <xref linkend="fdw-helpers"/>. 
+ </para> + </note> + </sect2> </sect1> <sect1 id="fdw-helpers"> @@ -1894,4 +2015,119 @@ GetForeignServerByName(const char *name, bool missing_ok); </sect1> + <sect1 id="fdw-transaction-managements"> + <title>Transaction Management for Foreign Data Wrappers</title> + <para> + If an FDW's remote server supports transactions, it is usually worthwhile for the + FDW to manage the transactions opened on the foreign server. The FDW callback + functions <literal>CommitForeignTransaction</literal>, + <literal>RollbackForeignTransaction</literal> and + <literal>PrepareForeignTransaction</literal> are used to manage transactions + and must fit into the workings of + <productname>PostgreSQL</productname> transaction processing. + </para> + + <para> + The information in <literal>FdwXactRslvState</literal> can be used to get + information about the foreign server being processed, such as the server name + and the OIDs of the server, user and user mapping. The <literal>flags</literal> + field contains flag bits describing the foreign transaction state for + transaction management. + </para> + + <sect2 id="fdw-transaction-commit-rollback"> + <title>Commit and Rollback of a Single Foreign Transaction</title> + <para> + The FDW callback functions <literal>CommitForeignTransaction</literal> + and <literal>RollbackForeignTransaction</literal> can be used to commit + and roll back the foreign transaction. During transaction processing, the core + transaction manager calls the <literal>CommitForeignTransaction</literal> function + in the pre-commit phase and the + <literal>RollbackForeignTransaction</literal> function in the post-rollback + phase. 
+ </para> + </sect2> + + <sect2 id="fdw-transaction-distributed-transaction-commit"> + <title>Atomic Commit and Rollback of a Distributed Transaction</title> + <para> + In addition to simple commit and rollback of foreign transactions as described in + <xref linkend="fdw-transaction-commit-rollback"/>, the + <productname>PostgreSQL</productname> global transaction manager enables + distributed transactions to commit and roll back atomically among all foreign + servers, which is known as atomic commit in the literature. To achieve atomic + commit, <productname>PostgreSQL</productname> employs the two-phase commit + protocol, which is a type of atomic commitment protocol. Every FDW that wishes + to support the two-phase commit protocol is required to provide the FDW callback + function <function>PrepareForeignTransaction</function> and optionally + <function>GetPrepareId</function>, in addition to + <function>CommitForeignTransaction</function> and + <function>RollbackForeignTransaction</function> + (see <xref linkend="fdw-callbacks-transaction-managements"/> for details). + </para> + + <para> + An example of a distributed transaction is as follows: +<programlisting> +BEGIN; +UPDATE ft1 SET col = 'a'; +UPDATE ft2 SET col = 'b'; +COMMIT; +</programlisting> + ft1 and ft2 are foreign tables on different foreign servers, possibly using different + Foreign Data Wrappers. + </para> + + <para> + When the core executor accesses the foreign servers, each foreign server whose FDW + supports the transaction management callback routines is registered as a participant. + During registration, <function>GetPrepareId</function> is called, if provided, to + generate a unique transaction identifier. + </para> + + <para> + During the pre-commit phase of the local transaction, the foreign transaction manager + persists the foreign transaction information to disk and WAL, and then + prepares all foreign transactions by calling + <function>PrepareForeignTransaction</function> if the two-phase commit protocol + is required. 
Two-phase commit is required when the transaction modified data + on more than one server including the local server itself and the user requests + foreign two-phase commit (see <xref linkend="guc-foreign-twophase-commit"/>). + </para> + + <para> + <productname>PostgreSQL</productname> can commit locally and go to the next + step if and only if all foreign transactions are prepared successfully. + If any failure happens or the user requests cancellation during preparation, + the distributed transaction manager switches over to rollback and calls + <function>RollbackForeignTransaction</function>. + </para> + + <para> + Note that when <literal>(frstate->flags & FDWXACT_FLAG_ONEPHASE)</literal> + is true, both the <literal>CommitForeignTransaction</literal> function and + the <literal>RollbackForeignTransaction</literal> function should commit or + roll back directly, rather than processing prepared transactions. This can + happen when two-phase commit is not required or the foreign server was not + modified within the transaction. + </para> + + <para> + Once all foreign transactions are prepared, the core transaction manager commits + locally. After that, the transaction commit waits for all prepared foreign + transactions to be committed; once all prepared foreign + transactions are resolved, the transaction commit completes. + </para> + + <para> + One foreign transaction resolver process is responsible for foreign + transaction resolution on a database. The foreign transaction resolver process + calls either <function>CommitForeignTransaction</function> or + <function>RollbackForeignTransaction</function> to resolve the foreign + transaction identified by <literal>frstate->fdwxact_id</literal>. If it fails + to resolve, the resolver process will exit with an error message. The foreign + transaction launcher will launch the resolver process again at the + <xref linkend="guc-foreign-transaction-resolution-rety-interval"/> interval. 
+ </para> + </sect2> + </sect1> </chapter> diff --git a/doc/src/sgml/filelist.sgml b/doc/src/sgml/filelist.sgml index 3da2365ea9..80a87fa5d1 100644 --- a/doc/src/sgml/filelist.sgml +++ b/doc/src/sgml/filelist.sgml @@ -48,6 +48,7 @@ <!ENTITY wal SYSTEM "wal.sgml"> <!ENTITY logical-replication SYSTEM "logical-replication.sgml"> <!ENTITY jit SYSTEM "jit.sgml"> +<!ENTITY distributed-transaction SYSTEM "distributed-transaction.sgml"> <!-- programmer's guide --> <!ENTITY bgworker SYSTEM "bgworker.sgml"> diff --git a/doc/src/sgml/func.sgml b/doc/src/sgml/func.sgml index 57a1539506..b9a918b9ee 100644 --- a/doc/src/sgml/func.sgml +++ b/doc/src/sgml/func.sgml @@ -22355,6 +22355,95 @@ SELECT (pg_stat_file('filename')).modification; </sect2> + <sect2 id="functions-foreign-transaction"> + <title>Foreign Transaction Management Functions</title> + + <indexterm> + <primary>pg_resolve_foreign_xact</primary> + </indexterm> + <indexterm> + <primary>pg_remove_foreign_xact</primary> + </indexterm> + + <para> + <xref linkend="functions-fdw-transaction-control-table"/> shows the functions + available for foreign transaction management. + These functions cannot be executed during recovery. Use of these functions + is restricted to superusers. + </para> + + <table id="functions-fdw-transaction-control-table"> + <title>Foreign Transaction Management Functions</title> + <tgroup cols="3"> + <thead> + <row><entry>Name</entry> <entry>Return Type</entry> <entry>Description</entry></row> + </thead> + + <tbody> + <row> + <entry> + <literal><function>pg_resolve_foreign_xact(<parameter>transaction</parameter> <type>xid</type>, <parameter>serverid</parameter> <type>oid</type>, <parameter>userid</parameter> <type>oid</type>)</function></literal> + </entry> + <entry><type>bool</type></entry> + <entry> + Resolve a foreign transaction. This function searches for the foreign + transaction matching the arguments and resolves it. 
Once the foreign + transaction is resolved successfully, this function removes the + corresponding entry from <xref linkend="view-pg-foreign-xacts"/>. + This function won't resolve a foreign transaction which is currently being + processed. + </entry> + </row> + <row> + <entry> + <literal><function>pg_remove_foreign_xact(<parameter>transaction</parameter> <type>xid</type>, <parameter>serverid</parameter> <type>oid</type>, <parameter>userid</parameter> <type>oid</type>)</function></literal> + </entry> + <entry><type>void</type></entry> + <entry> + This function works the same as <function>pg_resolve_foreign_xact</function> + except that it removes the foreign transaction entry without resolution. + </entry> + </row> + </tbody> + </tgroup> + </table> + + <para> + The function shown in <xref linkend="functions-fdwxact-resolver-control-table"/> + controls the foreign transaction resolvers. + </para> + + <table id="functions-fdwxact-resolver-control-table"> + <title>Foreign Transaction Resolver Control Functions</title> + <tgroup cols="3"> + <thead> + <row> + <entry>Name</entry> <entry>Return Type</entry> <entry>Description</entry> + </row> + </thead> + + <tbody> + <row> + <entry> + <literal><function>pg_stop_fdwxact_resolver(<parameter>dbid</parameter> <type>oid</type>)</function></literal> + </entry> + <entry><type>bool</type></entry> + <entry> + Stop the foreign transaction resolver running on the given database. + This function is useful for stopping a resolver process on a database + that you want to drop. + </entry> + </row> + </tbody> + </tgroup> + </table> + + <para> + <function>pg_stop_fdwxact_resolver</function> is useful before + dropping a database to which a foreign transaction resolver is connected. 
+ </para> + + </sect2> </sect1> <sect1 id="functions-trigger"> diff --git a/doc/src/sgml/monitoring.sgml b/doc/src/sgml/monitoring.sgml index a3c5f86b7e..65938e81ca 100644 --- a/doc/src/sgml/monitoring.sgml +++ b/doc/src/sgml/monitoring.sgml @@ -368,6 +368,14 @@ postgres 27093 0.0 0.0 30096 2752 ? Ss 11:34 0:00 postgres: ser </entry> </row> + <row> + <entry><structname>pg_stat_foreign_xact</structname><indexterm><primary>pg_stat_fdw_xact_resolver</primary></indexterm></entry> + <entry>One row per foreign transaction resolver process, showing statistics about + foreign transaction resolution. See <xref linkend="pg-stat-foreign-xact-view"/> for + details. + </entry> + </row> + </tbody> </tgroup> </table> @@ -1236,6 +1244,18 @@ postgres 27093 0.0 0.0 30096 2752 ? Ss 11:34 0:00 postgres: ser <entry><literal>CheckpointerMain</literal></entry> <entry>Waiting in main loop of checkpointer process.</entry> </row> + <row> + <entry><literal>FdwXactLauncherMain</literal></entry> + <entry>Waiting in main loop of foreign transaction resolution launcher process.</entry> + </row> + <row> + <entry><literal>FdwXactResolverMain</literal></entry> + <entry>Waiting in main loop of foreign transaction resolution worker process.</entry> + </row> + <row> + <entry><literal>LogicalLauncherMain</literal></entry> + <entry>Waiting in main loop of logical launcher process.</entry> + </row> <row> <entry><literal>LogicalApplyMain</literal></entry> <entry>Waiting in main loop of logical apply process.</entry> @@ -1459,6 +1479,10 @@ postgres 27093 0.0 0.0 30096 2752 ? 
Ss 11:34 0:00 postgres: ser <entry><literal>SyncRep</literal></entry> <entry>Waiting for confirmation from remote server during synchronous replication.</entry> </row> + <row> + <entry><literal>FdwXactResolution</literal></entry> + <entry>Waiting for all foreign transaction participants to be resolved during atomic commit among foreign servers.</entry> + </row> <row> <entry morerows="2"><literal>Timeout</literal></entry> <entry><literal>BaseBackupThrottle</literal></entry> @@ -2359,6 +2383,42 @@ SELECT pid, wait_event_type, wait_event FROM pg_stat_activity WHERE wait_event i connection. </para> + <table id="pg-stat-foreign-xact-view" xreflabel="pg_stat_foreign_xact"> + <title><structname>pg_stat_foreign_xact</structname> View</title> + <tgroup cols="3"> + <thead> + <row> + <entry>Column</entry> + <entry>Type</entry> + <entry>Description</entry> + </row> + </thead> + + <tbody> + <row> + <entry><structfield>pid</structfield></entry> + <entry><type>integer</type></entry> + <entry>Process ID of a foreign transaction resolver process</entry> + </row> + <row> + <entry><structfield>dbid</structfield></entry> + <entry><type>oid</type></entry> + <entry>OID of the database to which the foreign transaction resolver is connected</entry> + </row> + <row> + <entry><structfield>last_resolved_time</structfield></entry> + <entry><type>timestamp with time zone</type></entry> + <entry>Time at which the process last resolved a foreign transaction</entry> + </row> + </tbody> + </tgroup> + </table> + + <para> + The <structname>pg_stat_foreign_xact</structname> view will contain one + row per foreign transaction resolver process, showing the state of resolution + of foreign transactions. 
+ </para> <table id="pg-stat-archiver-view" xreflabel="pg_stat_archiver"> <title><structname>pg_stat_archiver</structname> View</title> diff --git a/doc/src/sgml/postgres.sgml b/doc/src/sgml/postgres.sgml index e59cba7997..dee3f72f7e 100644 --- a/doc/src/sgml/postgres.sgml +++ b/doc/src/sgml/postgres.sgml @@ -163,6 +163,7 @@ &wal; &logical-replication; &jit; + &distributed-transaction; ®ress; </part> diff --git a/doc/src/sgml/storage.sgml b/doc/src/sgml/storage.sgml index 1c19e863d2..3f4c806ed1 100644 --- a/doc/src/sgml/storage.sgml +++ b/doc/src/sgml/storage.sgml @@ -83,6 +83,12 @@ Item subsystem</entry> </row> +<row> + <entry><filename>pg_fdwxact</filename></entry> + <entry>Subdirectory containing files used by the distributed transaction + manager subsystem</entry> +</row> + <row> <entry><filename>pg_logical</filename></entry> <entry>Subdirectory containing status data for logical decoding</entry> -- 2.23.0 From 84f81fdcb2bd823e34edba79c81c29871d7906fb Mon Sep 17 00:00:00 2001 From: Kyotaro Horiguchi <horikyota.ntt@gmail.com> Date: Thu, 5 Dec 2019 17:01:15 +0900 Subject: [PATCH v26 4/5] postgres_fdw supports atomic commit APIs. 
Original Author: Masahiko Sawada <sawada.mshk@gmail.com> --- contrib/postgres_fdw/Makefile | 7 +- contrib/postgres_fdw/connection.c | 604 +++++++++++------- .../postgres_fdw/expected/postgres_fdw.out | 265 +++++++- contrib/postgres_fdw/fdwxact.conf | 3 + contrib/postgres_fdw/postgres_fdw.c | 21 +- contrib/postgres_fdw/postgres_fdw.h | 7 +- contrib/postgres_fdw/sql/postgres_fdw.sql | 120 +++- doc/src/sgml/postgres-fdw.sgml | 45 ++ 8 files changed, 822 insertions(+), 250 deletions(-) create mode 100644 contrib/postgres_fdw/fdwxact.conf diff --git a/contrib/postgres_fdw/Makefile b/contrib/postgres_fdw/Makefile index ee8a80a392..91fa6e39fc 100644 --- a/contrib/postgres_fdw/Makefile +++ b/contrib/postgres_fdw/Makefile @@ -16,7 +16,7 @@ SHLIB_LINK_INTERNAL = $(libpq) EXTENSION = postgres_fdw DATA = postgres_fdw--1.0.sql -REGRESS = postgres_fdw +REGRESSCHECK = postgres_fdw ifdef USE_PGXS PG_CONFIG = pg_config @@ -29,3 +29,8 @@ top_builddir = ../.. include $(top_builddir)/src/Makefile.global include $(top_srcdir)/contrib/contrib-global.mk endif + +check: + $(pg_regress_check) \ + --temp-config $(top_srcdir)/contrib/postgres_fdw/fdwxact.conf \ + $(REGRESSCHECK) diff --git a/contrib/postgres_fdw/connection.c b/contrib/postgres_fdw/connection.c index 27b86a03f8..0b07e6c5cc 100644 --- a/contrib/postgres_fdw/connection.c +++ b/contrib/postgres_fdw/connection.c @@ -1,7 +1,7 @@ /*------------------------------------------------------------------------- * * connection.c - * Connection management functions for postgres_fdw + * Connection and transaction management functions for postgres_fdw * * Portions Copyright (c) 2012-2019, PostgreSQL Global Development Group * @@ -12,6 +12,7 @@ */ #include "postgres.h" +#include "access/fdwxact.h" #include "access/htup_details.h" #include "access/xact.h" #include "catalog/pg_user_mapping.h" @@ -54,6 +55,7 @@ typedef struct ConnCacheEntry bool have_error; /* have any subxacts aborted in this xact? 
*/ bool changing_xact_state; /* xact state change in process */ bool invalidated; /* true if reconnect is pending */ + bool xact_got_connection; uint32 server_hashvalue; /* hash value of foreign server OID */ uint32 mapping_hashvalue; /* hash value of user mapping OID */ } ConnCacheEntry; @@ -67,17 +69,13 @@ static HTAB *ConnectionHash = NULL; static unsigned int cursor_number = 0; static unsigned int prep_stmt_number = 0; -/* tracks whether any work is needed in callback functions */ -static bool xact_got_connection = false; - /* prototypes of private functions */ static PGconn *connect_pg_server(ForeignServer *server, UserMapping *user); static void disconnect_pg_server(ConnCacheEntry *entry); static void check_conn_params(const char **keywords, const char **values, UserMapping *user); static void configure_remote_session(PGconn *conn); static void do_sql_command(PGconn *conn, const char *sql); -static void begin_remote_xact(ConnCacheEntry *entry); -static void pgfdw_xact_callback(XactEvent event, void *arg); +static void begin_remote_xact(ConnCacheEntry *entry, Oid serverid, Oid userid); static void pgfdw_subxact_callback(SubXactEvent event, SubTransactionId mySubid, SubTransactionId parentSubid, @@ -89,24 +87,26 @@ static bool pgfdw_exec_cleanup_query(PGconn *conn, const char *query, bool ignore_errors); static bool pgfdw_get_cleanup_result(PGconn *conn, TimestampTz endtime, PGresult **result); - +static void pgfdw_end_prepared_xact(ConnCacheEntry *entry, char *fdwxact_id, + bool is_commit); +static void pgfdw_cleanup_after_transaction(ConnCacheEntry *entry); +static ConnCacheEntry *GetConnectionState(Oid umid, bool will_prep_stmt, + bool start_transaction); +static ConnCacheEntry *GetConnectionCacheEntry(Oid umid); /* - * Get a PGconn which can be used to execute queries on the remote PostgreSQL - * server with the user's authorization. 
A new connection is established - if we don't already have a suitable one, and a transaction is opened at - the right subtransaction nesting depth if we didn't do that already. - * - * will_prep_stmt must be true if caller intends to create any prepared - * statements. Since those don't go away automatically at transaction end - * (not even on error), we need this flag to cue manual cleanup. + * Get connection cache entry. Unlike the GetConnectionState function, this function + * doesn't establish a new connection if there isn't one yet. */ -PGconn * -GetConnection(UserMapping *user, bool will_prep_stmt) +static ConnCacheEntry * +GetConnectionCacheEntry(Oid umid) { - bool found; ConnCacheEntry *entry; - ConnCacheKey key; + ConnCacheKey key; + bool found; + + /* Create hash key for the entry. Assume no pad bytes in key struct */ + key = umid; /* First time through, initialize connection cache hashtable */ if (ConnectionHash == NULL) @@ -126,7 +126,6 @@ GetConnection(UserMapping *user, bool will_prep_stmt) * Register some callback functions that manage connection cleanup. * This should be done just once in each backend. */ - RegisterXactCallback(pgfdw_xact_callback, NULL); RegisterSubXactCallback(pgfdw_subxact_callback, NULL); CacheRegisterSyscacheCallback(FOREIGNSERVEROID, pgfdw_inval_callback, (Datum) 0); @@ -134,12 +133,6 @@ GetConnection(UserMapping *user, bool will_prep_stmt) pgfdw_inval_callback, (Datum) 0); } - /* Set flag that we did GetConnection during the current transaction */ - xact_got_connection = true; - - /* Create hash key for the entry. Assume no pad bytes in key struct */ - key = user->umid; - /* * Find or create cached entry for requested connection. 
*/ @@ -153,6 +146,21 @@ GetConnection(UserMapping *user, bool will_prep_stmt) entry->conn = NULL; } + return entry; +} + +/* + * This function gets the connection cache entry, establishes a connection + * to the foreign server if there is none, and starts a new transaction + * if 'start_transaction' is true. + */ +static ConnCacheEntry * +GetConnectionState(Oid umid, bool will_prep_stmt, bool start_transaction) +{ + ConnCacheEntry *entry; + + entry = GetConnectionCacheEntry(umid); + /* Reject further use of connections which failed abort cleanup. */ pgfdw_reject_incomplete_xact_state_change(entry); @@ -180,6 +188,7 @@ GetConnection(UserMapping *user, bool will_prep_stmt) */ if (entry->conn == NULL) { + UserMapping *user = GetUserMappingByOid(umid); ForeignServer *server = GetForeignServer(user->serverid); /* Reset all transient state fields, to be sure all are clean */ @@ -188,6 +197,7 @@ GetConnection(UserMapping *user, bool will_prep_stmt) entry->have_error = false; entry->changing_xact_state = false; entry->invalidated = false; + entry->xact_got_connection = false; entry->server_hashvalue = GetSysCacheHashValue1(FOREIGNSERVEROID, ObjectIdGetDatum(server->serverid)); @@ -198,6 +208,15 @@ GetConnection(UserMapping *user, bool will_prep_stmt) /* Now try to make the connection */ entry->conn = connect_pg_server(server, user); + Assert(entry->conn); + + if (!entry->conn) + { + elog(DEBUG3, "attempt to connect to server \"%s\" by postgres_fdw failed", + server->servername); + return NULL; + } + elog(DEBUG3, "new postgres_fdw connection %p for server \"%s\" (user mapping oid %u, userid %u)", entry->conn, server->servername, user->umid, user->userid); } @@ -205,11 +224,39 @@ GetConnection(UserMapping *user, bool will_prep_stmt) /* * Start a new transaction or subtransaction if needed. 
*/ - begin_remote_xact(entry); + if (start_transaction) + { + UserMapping *user = GetUserMappingByOid(umid); + + begin_remote_xact(entry, user->serverid, user->userid); + + /* Set flag that we did GetConnection during the current transaction */ + entry->xact_got_connection = true; + } /* Remember if caller will prepare statements */ entry->have_prep_stmt |= will_prep_stmt; + return entry; +} + +/* + * Get a PGconn which can be used to execute queries on the remote PostgreSQL + * server with the user's authorization. A new connection is established + * if we don't already have a suitable one, and a transaction is opened at + * the right subtransaction nesting depth if we didn't do that already. + * + * will_prep_stmt must be true if caller intends to create any prepared + * statements. Since those don't go away automatically at transaction end + * (not even on error), we need this flag to cue manual cleanup. + */ +PGconn * +GetConnection(Oid umid, bool will_prep_stmt, bool start_transaction) +{ + ConnCacheEntry *entry; + + entry = GetConnectionState(umid, will_prep_stmt, start_transaction); + return entry->conn; } @@ -412,7 +459,7 @@ do_sql_command(PGconn *conn, const char *sql) * control which remote queries share a snapshot. */ static void -begin_remote_xact(ConnCacheEntry *entry) +begin_remote_xact(ConnCacheEntry *entry, Oid serverid, Oid userid) { int curlevel = GetCurrentTransactionNestLevel(); @@ -639,193 +686,6 @@ pgfdw_report_error(int elevel, PGresult *res, PGconn *conn, PG_END_TRY(); } -/* - * pgfdw_xact_callback --- cleanup at main-transaction end. - */ -static void -pgfdw_xact_callback(XactEvent event, void *arg) -{ - HASH_SEQ_STATUS scan; - ConnCacheEntry *entry; - - /* Quick exit if no connections were touched in this transaction. */ - if (!xact_got_connection) - return; - - /* - * Scan all connection cache entries to find open remote transactions, and - * close them. 
- */ - hash_seq_init(&scan, ConnectionHash); - while ((entry = (ConnCacheEntry *) hash_seq_search(&scan))) - { - PGresult *res; - - /* Ignore cache entry if no open connection right now */ - if (entry->conn == NULL) - continue; - - /* If it has an open remote transaction, try to close it */ - if (entry->xact_depth > 0) - { - bool abort_cleanup_failure = false; - - elog(DEBUG3, "closing remote transaction on connection %p", - entry->conn); - - switch (event) - { - case XACT_EVENT_PARALLEL_PRE_COMMIT: - case XACT_EVENT_PRE_COMMIT: - - /* - * If abort cleanup previously failed for this connection, - * we can't issue any more commands against it. - */ - pgfdw_reject_incomplete_xact_state_change(entry); - - /* Commit all remote transactions during pre-commit */ - entry->changing_xact_state = true; - do_sql_command(entry->conn, "COMMIT TRANSACTION"); - entry->changing_xact_state = false; - - /* - * If there were any errors in subtransactions, and we - * made prepared statements, do a DEALLOCATE ALL to make - * sure we get rid of all prepared statements. This is - * annoying and not terribly bulletproof, but it's - * probably not worth trying harder. - * - * DEALLOCATE ALL only exists in 8.3 and later, so this - * constrains how old a server postgres_fdw can - * communicate with. We intentionally ignore errors in - * the DEALLOCATE, so that we can hobble along to some - * extent with older servers (leaking prepared statements - * as we go; but we don't really support update operations - * pre-8.3 anyway). - */ - if (entry->have_prep_stmt && entry->have_error) - { - res = PQexec(entry->conn, "DEALLOCATE ALL"); - PQclear(res); - } - entry->have_prep_stmt = false; - entry->have_error = false; - break; - case XACT_EVENT_PRE_PREPARE: - - /* - * We disallow any remote transactions, since it's not - * very reasonable to hold them open until the prepared - * transaction is committed. For the moment, throw error - * unconditionally; later we might allow read-only cases. 
- * Note that the error will cause us to come right back - * here with event == XACT_EVENT_ABORT, so we'll clean up - * the connection state at that point. - */ - ereport(ERROR, - (errcode(ERRCODE_FEATURE_NOT_SUPPORTED), - errmsg("cannot PREPARE a transaction that has operated on postgres_fdw foreign tables"))); - break; - case XACT_EVENT_PARALLEL_COMMIT: - case XACT_EVENT_COMMIT: - case XACT_EVENT_PREPARE: - /* Pre-commit should have closed the open transaction */ - elog(ERROR, "missed cleaning up connection during pre-commit"); - break; - case XACT_EVENT_PARALLEL_ABORT: - case XACT_EVENT_ABORT: - - /* - * Don't try to clean up the connection if we're already - * in error recursion trouble. - */ - if (in_error_recursion_trouble()) - entry->changing_xact_state = true; - - /* - * If connection is already unsalvageable, don't touch it - * further. - */ - if (entry->changing_xact_state) - break; - - /* - * Mark this connection as in the process of changing - * transaction state. - */ - entry->changing_xact_state = true; - - /* Assume we might have lost track of prepared statements */ - entry->have_error = true; - - /* - * If a command has been submitted to the remote server by - * using an asynchronous execution function, the command - * might not have yet completed. Check to see if a - * command is still being processed by the remote server, - * and if so, request cancellation of the command. - */ - if (PQtransactionStatus(entry->conn) == PQTRANS_ACTIVE && - !pgfdw_cancel_query(entry->conn)) - { - /* Unable to cancel running query. */ - abort_cleanup_failure = true; - } - else if (!pgfdw_exec_cleanup_query(entry->conn, - "ABORT TRANSACTION", - false)) - { - /* Unable to abort remote transaction. */ - abort_cleanup_failure = true; - } - else if (entry->have_prep_stmt && entry->have_error && - !pgfdw_exec_cleanup_query(entry->conn, - "DEALLOCATE ALL", - true)) - { - /* Trouble clearing prepared statements. 
*/ - abort_cleanup_failure = true; - } - else - { - entry->have_prep_stmt = false; - entry->have_error = false; - } - - /* Disarm changing_xact_state if it all worked. */ - entry->changing_xact_state = abort_cleanup_failure; - break; - } - } - - /* Reset state to show we're out of a transaction */ - entry->xact_depth = 0; - - /* - * If the connection isn't in a good idle state, discard it to - * recover. Next GetConnection will open a new connection. - */ - if (PQstatus(entry->conn) != CONNECTION_OK || - PQtransactionStatus(entry->conn) != PQTRANS_IDLE || - entry->changing_xact_state) - { - elog(DEBUG3, "discarding connection %p", entry->conn); - disconnect_pg_server(entry); - } - } - - /* - * Regardless of the event type, we can now mark ourselves as out of the - * transaction. (Note: if we are here during PRE_COMMIT or PRE_PREPARE, - * this saves a useless scan of the hashtable during COMMIT or PREPARE.) - */ - xact_got_connection = false; - - /* Also reset cursor numbering for next transaction */ - cursor_number = 0; -} - /* * pgfdw_subxact_callback --- cleanup at subtransaction end. */ @@ -842,10 +702,6 @@ pgfdw_subxact_callback(SubXactEvent event, SubTransactionId mySubid, event == SUBXACT_EVENT_ABORT_SUB)) return; - /* Quick exit if no connections were touched in this transaction. */ - if (!xact_got_connection) - return; - /* * Scan all connection cache entries to find open remote subtransactions * of the current level, and close them. @@ -856,6 +712,10 @@ pgfdw_subxact_callback(SubXactEvent event, SubTransactionId mySubid, { char sql[100]; + /* Quick exit if no connections were touched in this transaction. */ + if (!entry->xact_got_connection) + continue; + /* * We only care about connections with open remote subtransactions of * the current level. @@ -1190,3 +1050,309 @@ exit: ; *result = last_res; return timed_out; } + +/* + * Prepare a transaction on foreign server. 
+ */
+void
+postgresPrepareForeignTransaction(FdwXactRslvState *state)
+{
+	ConnCacheEntry *entry = NULL;
+	PGresult   *res;
+	StringInfo	command;
+
+	/* The transaction should have started already; get the cache entry */
+	entry = GetConnectionCacheEntry(state->usermapping->umid);
+
+	Assert(entry->xact_got_connection && entry->conn);
+
+	pgfdw_reject_incomplete_xact_state_change(entry);
+
+	command = makeStringInfo();
+	appendStringInfo(command, "PREPARE TRANSACTION '%s'", state->fdwxact_id);
+
+	/* Prepare the foreign transaction */
+	entry->changing_xact_state = true;
+	res = pgfdw_exec_query(entry->conn, command->data);
+	entry->changing_xact_state = false;
+
+	if (PQresultStatus(res) != PGRES_COMMAND_OK)
+		ereport(ERROR, (errmsg("could not prepare transaction on server %s with ID %s",
+							   state->server->servername, state->fdwxact_id)));
+
+	elog(DEBUG1, "prepared foreign transaction on server %s with ID %s",
+		 state->server->servername, state->fdwxact_id);
+
+	if (entry->have_prep_stmt && entry->have_error)
+	{
+		res = PQexec(entry->conn, "DEALLOCATE ALL");
+		PQclear(res);
+	}
+
+	pgfdw_cleanup_after_transaction(entry);
+}
+
+/*
+ * Commit a transaction or a prepared transaction on the foreign server. If
+ * state->flags contains FDWXACT_FLAG_ONEPHASE, this function commits the
+ * foreign transaction without preparation; otherwise it commits the prepared
+ * transaction.
+ */
+void
+postgresCommitForeignTransaction(FdwXactRslvState *state)
+{
+	ConnCacheEntry *entry = NULL;
+	bool		is_onephase = (state->flags & FDWXACT_FLAG_ONEPHASE) != 0;
+	PGresult   *res;
+
+	if (!is_onephase)
+	{
+		/*
+		 * In the two-phase commit case, the foreign transaction has been
+		 * prepared and closed, so we might not have a connection to it.
+		 * Get a connection but don't start a transaction.
+		 */
+		entry = GetConnectionState(state->usermapping->umid, false, false);
+
+		/* COMMIT PREPARED the transaction */
+		pgfdw_end_prepared_xact(entry, state->fdwxact_id, true);
+		return;
+	}
+
+	/*
+	 * In the simple commit case, we must have a connection to the foreign
+	 * server because the foreign transaction is not closed yet.  Get the
+	 * connection entry from the cache.
+	 */
+	entry = GetConnectionCacheEntry(state->usermapping->umid);
+	Assert(entry);
+
+	if (!entry->conn || !entry->xact_got_connection)
+		return;
+
+	/*
+	 * If abort cleanup previously failed for this connection, we can't issue
+	 * any more commands against it.
+	 */
+	pgfdw_reject_incomplete_xact_state_change(entry);
+
+	entry->changing_xact_state = true;
+	res = pgfdw_exec_query(entry->conn, "COMMIT TRANSACTION");
+	entry->changing_xact_state = false;
+
+	if (PQresultStatus(res) != PGRES_COMMAND_OK)
+		ereport(ERROR, (errmsg("could not commit transaction on server %s",
+							   state->server->servername)));
+
+	/*
+	 * If there were any errors in subtransactions, and we made prepared
+	 * statements, do a DEALLOCATE ALL to make sure we get rid of all
+	 * prepared statements.  This is annoying and not terribly bulletproof,
+	 * but it's probably not worth trying harder.
+	 *
+	 * DEALLOCATE ALL only exists in 8.3 and later, so this constrains how
+	 * old a server postgres_fdw can communicate with.  We intentionally
+	 * ignore errors in the DEALLOCATE, so that we can hobble along to some
+	 * extent with older servers (leaking prepared statements as we go; but
+	 * we don't really support update operations pre-8.3 anyway).
+	 */
+	if (entry->have_prep_stmt && entry->have_error)
+	{
+		res = PQexec(entry->conn, "DEALLOCATE ALL");
+		PQclear(res);
+	}
+
+	/* Cleanup transaction status */
+	pgfdw_cleanup_after_transaction(entry);
+}
+
+/*
+ * Rollback a transaction on the foreign server.
As with the commit case, if state->flags
+ * contains FDWXACT_FLAG_ONEPHASE this function rolls back the foreign
+ * transaction without preparation; otherwise it rolls back the prepared
+ * transaction.  This function must tolerate being called recursively,
+ * as an error can occur while aborting.
+ */
+void
+postgresRollbackForeignTransaction(FdwXactRslvState *state)
+{
+	bool		is_onephase = (state->flags & FDWXACT_FLAG_ONEPHASE) != 0;
+	ConnCacheEntry *entry = NULL;
+	bool		abort_cleanup_failure = false;
+
+	if (!is_onephase)
+	{
+		/*
+		 * In the two-phase commit case, the foreign transaction has been
+		 * prepared and closed, so we might not have a connection to it.
+		 * Get a connection but don't start a transaction.
+		 */
+		entry = GetConnectionState(state->usermapping->umid, false, false);
+
+		/* ROLLBACK PREPARED the transaction */
+		pgfdw_end_prepared_xact(entry, state->fdwxact_id, false);
+		return;
+	}
+
+	/*
+	 * In the simple rollback case, we must have a connection to the foreign
+	 * server because the foreign transaction is not closed yet.  Get the
+	 * connection entry from the cache.
+	 */
+	entry = GetConnectionCacheEntry(state->usermapping->umid);
+	Assert(entry);
+
+	/*
+	 * If the transaction failed before establishing a connection or starting
+	 * a transaction, just clean up the connection entry.
+	 */
+	if (!entry->conn || !entry->xact_got_connection)
+	{
+		pgfdw_cleanup_after_transaction(entry);
+		return;
+	}
+
+	/*
+	 * Don't try to clean up the connection if we're already in error
+	 * recursion trouble.
+	 */
+	if (in_error_recursion_trouble())
+		entry->changing_xact_state = true;
+
+	/*
+	 * If the connection hasn't started a transaction yet or is already
+	 * unsalvageable, do only the cleanup and don't touch it further.
+	 */
+	if (entry->changing_xact_state || !entry->xact_got_connection)
+	{
+		pgfdw_cleanup_after_transaction(entry);
+		return;
+	}
+
+	/*
+	 * Mark this connection as in the process of changing transaction
+	 * state.
+ */ + entry->changing_xact_state = true; + + /* Assume we might have lost track of prepared statements */ + entry->have_error = true; + + /* + * If a command has been submitted to the remote server by + * using an asynchronous execution function, the command + * might not have yet completed. Check to see if a + * command is still being processed by the remote server, + * and if so, request cancellation of the command. + */ + if (PQtransactionStatus(entry->conn) == PQTRANS_ACTIVE && + !pgfdw_cancel_query(entry->conn)) + { + /* Unable to cancel running query. */ + abort_cleanup_failure = true; + } + else if (!pgfdw_exec_cleanup_query(entry->conn, + "ABORT TRANSACTION", + false)) + { + /* Unable to abort remote transaction. */ + abort_cleanup_failure = true; + } + else if (entry->have_prep_stmt && entry->have_error && + !pgfdw_exec_cleanup_query(entry->conn, + "DEALLOCATE ALL", + true)) + { + /* Trouble clearing prepared statements. */ + abort_cleanup_failure = true; + } + + /* Disarm changing_xact_state if it all worked. */ + entry->changing_xact_state = abort_cleanup_failure; + + /* Cleanup transaction status */ + pgfdw_cleanup_after_transaction(entry); + + return; +} + +/* + * Commit or rollback prepared transaction on the foreign server. + */ +static void +pgfdw_end_prepared_xact(ConnCacheEntry *entry, char *fdwxact_id, bool is_commit) +{ + StringInfo command; + PGresult *res; + + command = makeStringInfo(); + appendStringInfo(command, "%s PREPARED '%s'", + is_commit ? 
"COMMIT" : "ROLLBACK",
+					 fdwxact_id);
+
+	res = pgfdw_exec_query(entry->conn, command->data);
+
+	if (PQresultStatus(res) != PGRES_COMMAND_OK)
+	{
+		int			sqlstate;
+		char	   *diag_sqlstate = PQresultErrorField(res, PG_DIAG_SQLSTATE);
+
+		if (diag_sqlstate)
+		{
+			sqlstate = MAKE_SQLSTATE(diag_sqlstate[0],
+									 diag_sqlstate[1],
+									 diag_sqlstate[2],
+									 diag_sqlstate[3],
+									 diag_sqlstate[4]);
+		}
+		else
+			sqlstate = ERRCODE_CONNECTION_FAILURE;
+
+		/*
+		 * As the core global transaction manager notes, it's possible that
+		 * the given foreign transaction doesn't exist on the foreign server,
+		 * so we should accept an UNDEFINED_OBJECT error.
+		 */
+		if (sqlstate != ERRCODE_UNDEFINED_OBJECT)
+			pgfdw_report_error(ERROR, res, entry->conn, false, command->data);
+	}
+
+	elog(DEBUG1, "%s prepared foreign transaction with ID %s",
+		 is_commit ? "commit" : "rollback",
+		 fdwxact_id);
+
+	/* Cleanup transaction status */
+	pgfdw_cleanup_after_transaction(entry);
+}
+
+/* Cleanup at main-transaction end */
+static void
+pgfdw_cleanup_after_transaction(ConnCacheEntry *entry)
+{
+	/* Reset state to show we're out of a transaction */
+	entry->xact_depth = 0;
+	entry->have_prep_stmt = false;
+	entry->have_error = false;
+	entry->xact_got_connection = false;
+
+	/*
+	 * If the connection isn't in a good idle state, discard it to
+	 * recover.  Next GetConnection will open a new connection.
+ */ + if (PQstatus(entry->conn) != CONNECTION_OK || + PQtransactionStatus(entry->conn) != PQTRANS_IDLE || + entry->changing_xact_state) + { + elog(DEBUG3, "discarding connection %p", entry->conn); + disconnect_pg_server(entry); + } + + entry->changing_xact_state = false; + + /* Also reset cursor numbering for next transaction */ + cursor_number = 0; +} diff --git a/contrib/postgres_fdw/expected/postgres_fdw.out b/contrib/postgres_fdw/expected/postgres_fdw.out index 48282ab151..0ee91a49ac 100644 --- a/contrib/postgres_fdw/expected/postgres_fdw.out +++ b/contrib/postgres_fdw/expected/postgres_fdw.out @@ -13,12 +13,17 @@ DO $d$ OPTIONS (dbname '$$||current_database()||$$', port '$$||current_setting('port')||$$' )$$; + EXECUTE $$CREATE SERVER loopback3 FOREIGN DATA WRAPPER postgres_fdw + OPTIONS (dbname '$$||current_database()||$$', + port '$$||current_setting('port')||$$' + )$$; END; $d$; CREATE USER MAPPING FOR public SERVER testserver1 OPTIONS (user 'value', password 'value'); CREATE USER MAPPING FOR CURRENT_USER SERVER loopback; CREATE USER MAPPING FOR CURRENT_USER SERVER loopback2; +CREATE USER MAPPING FOR CURRENT_USER SERVER loopback3; -- =================================================================== -- create objects used through FDW loopback server -- =================================================================== @@ -52,6 +57,13 @@ CREATE TABLE "S 1"."T 4" ( c3 text, CONSTRAINT t4_pkey PRIMARY KEY (c1) ); +CREATE TABLE "S 1"."T 5" ( + c1 int NOT NULL +); +CREATE TABLE "S 1"."T 6" ( + c1 int NOT NULL, + CONSTRAINT t6_pkey PRIMARY KEY (c1) +); -- Disable autovacuum for these tables to avoid unexpected effects of that ALTER TABLE "S 1"."T 1" SET (autovacuum_enabled = 'false'); ALTER TABLE "S 1"."T 2" SET (autovacuum_enabled = 'false'); @@ -87,6 +99,7 @@ ANALYZE "S 1"."T 1"; ANALYZE "S 1"."T 2"; ANALYZE "S 1"."T 3"; ANALYZE "S 1"."T 4"; +ANALYZE "S 1"."T 5"; -- =================================================================== -- create foreign tables 
-- =================================================================== @@ -129,6 +142,12 @@ CREATE FOREIGN TABLE ft6 ( c2 int NOT NULL, c3 text ) SERVER loopback2 OPTIONS (schema_name 'S 1', table_name 'T 4'); +CREATE FOREIGN TABLE ft7_2pc ( + c1 int NOT NULL +) SERVER loopback OPTIONS (schema_name 'S 1', table_name 'T 5'); +CREATE FOREIGN TABLE ft8_2pc ( + c1 int NOT NULL +) SERVER loopback2 OPTIONS (schema_name 'S 1', table_name 'T 5'); -- =================================================================== -- tests for validator -- =================================================================== @@ -179,15 +198,17 @@ ALTER FOREIGN TABLE ft2 OPTIONS (schema_name 'S 1', table_name 'T 1'); ALTER FOREIGN TABLE ft1 ALTER COLUMN c1 OPTIONS (column_name 'C 1'); ALTER FOREIGN TABLE ft2 ALTER COLUMN c1 OPTIONS (column_name 'C 1'); \det+ - List of foreign tables - Schema | Table | Server | FDW options | Description ---------+-------+-----------+---------------------------------------+------------- - public | ft1 | loopback | (schema_name 'S 1', table_name 'T 1') | - public | ft2 | loopback | (schema_name 'S 1', table_name 'T 1') | - public | ft4 | loopback | (schema_name 'S 1', table_name 'T 3') | - public | ft5 | loopback | (schema_name 'S 1', table_name 'T 4') | - public | ft6 | loopback2 | (schema_name 'S 1', table_name 'T 4') | -(5 rows) + List of foreign tables + Schema | Table | Server | FDW options | Description +--------+---------+-----------+---------------------------------------+------------- + public | ft1 | loopback | (schema_name 'S 1', table_name 'T 1') | + public | ft2 | loopback | (schema_name 'S 1', table_name 'T 1') | + public | ft4 | loopback | (schema_name 'S 1', table_name 'T 3') | + public | ft5 | loopback | (schema_name 'S 1', table_name 'T 4') | + public | ft6 | loopback2 | (schema_name 'S 1', table_name 'T 4') | + public | ft7_2pc | loopback | (schema_name 'S 1', table_name 'T 5') | + public | ft8_2pc | loopback2 | (schema_name 'S 1', 
table_name 'T 5') | +(7 rows) -- Test that alteration of server options causes reconnection -- Remote's errors might be non-English, so hide them to ensure stable results @@ -8781,16 +8802,226 @@ SELECT b, avg(a), max(a), count(*) FROM pagg_tab GROUP BY b HAVING sum(a) < 700 -- Clean-up RESET enable_partitionwise_aggregate; --- Two-phase transactions are not supported. + +-- =================================================================== +-- test distributed atomic commit across foreign servers +-- =================================================================== +-- Enable atomic commit +SET foreign_twophase_commit TO 'required'; +-- Modify single foreign server and then commit and rollback. +BEGIN; +INSERT INTO ft7_2pc VALUES(1); +COMMIT; +SELECT * FROM ft7_2pc; + c1 +---- + 1 +(1 row) + +BEGIN; +INSERT INTO ft7_2pc VALUES(1); +ROLLBACK; +SELECT * FROM ft7_2pc; + c1 +---- + 1 +(1 row) + +-- Modify two servers then commit and rollback. This requires to use 2PC. +BEGIN; +INSERT INTO ft7_2pc VALUES(2); +INSERT INTO ft8_2pc VALUES(2); +COMMIT; +SELECT * FROM ft8_2pc; + c1 +---- + 1 + 2 + 2 +(3 rows) + +BEGIN; +INSERT INTO ft7_2pc VALUES(2); +INSERT INTO ft8_2pc VALUES(2); +ROLLBACK; +SELECT * FROM ft8_2pc; + c1 +---- + 1 + 2 + 2 +(3 rows) + +-- Modify both local data and 2PC-capable server then commit and rollback. +-- This also requires to use 2PC. +BEGIN; +INSERT INTO ft7_2pc VALUES(3); +INSERT INTO "S 1"."T 6" VALUES (3); +COMMIT; +SELECT * FROM ft7_2pc; + c1 +---- + 1 + 2 + 2 + 3 +(4 rows) + +SELECT * FROM "S 1"."T 6"; + c1 +---- + 3 +(1 row) + BEGIN; -SELECT count(*) FROM ft1; +INSERT INTO ft7_2pc VALUES(3); +INSERT INTO "S 1"."T 6" VALUES (3); +ERROR: duplicate key value violates unique constraint "t6_pkey" +DETAIL: Key (c1)=(3) already exists. +ROLLBACK; +SELECT * FROM ft7_2pc; + c1 +---- + 1 + 2 + 2 + 3 +(4 rows) + +SELECT * FROM "S 1"."T 6"; + c1 +---- + 3 +(1 row) + +-- Modify foreign server and raise an error. No data changed. 
+BEGIN; +INSERT INTO ft7_2pc VALUES(4); +INSERT INTO ft8_2pc VALUES(NULL); -- violation +ERROR: null value in column "c1" violates not-null constraint +DETAIL: Failing row contains (null). +CONTEXT: remote SQL command: INSERT INTO "S 1"."T 5"(c1) VALUES ($1) +ROLLBACK; +SELECT * FROM ft8_2pc; + c1 +---- + 1 + 2 + 2 + 3 +(4 rows) + +BEGIN; +INSERT INTO ft7_2pc VALUES (5); +INSERT INTO ft8_2pc VALUES (5); +SAVEPOINT S1; +INSERT INTO ft7_2pc VALUES (6); +INSERT INTO ft8_2pc VALUES (6); +ROLLBACK TO S1; +COMMIT; +SELECT * FROM ft7_2pc; + c1 +---- + 1 + 2 + 2 + 3 + 5 + 5 +(6 rows) + +SELECT * FROM ft8_2pc; + c1 +---- + 1 + 2 + 2 + 3 + 5 + 5 +(6 rows) + +RELEASE SAVEPOINT S1; +ERROR: RELEASE SAVEPOINT can only be used in transaction blocks +-- When set to 'disabled', we can commit it +SET foreign_twophase_commit TO 'disabled'; +BEGIN; +INSERT INTO ft7_2pc VALUES(8); +INSERT INTO ft8_2pc VALUES(8); +COMMIT; -- success +SELECT * FROM ft7_2pc; + c1 +---- + 1 + 2 + 2 + 3 + 5 + 5 + 8 + 8 +(8 rows) + +SELECT * FROM ft8_2pc; + c1 +---- + 1 + 2 + 2 + 3 + 5 + 5 + 8 + 8 +(8 rows) + +SET foreign_twophase_commit TO 'required'; +-- Commit and rollback foreign transactions that are part of +-- prepare transaction. 
+BEGIN; +INSERT INTO ft7_2pc VALUES(9); +INSERT INTO ft8_2pc VALUES(9); +PREPARE TRANSACTION 'gx1'; +COMMIT PREPARED 'gx1'; +SELECT * FROM ft8_2pc; + c1 +---- + 1 + 2 + 2 + 3 + 5 + 5 + 8 + 8 + 9 + 9 +(10 rows) + +BEGIN; +INSERT INTO ft7_2pc VALUES(9); +INSERT INTO ft8_2pc VALUES(9); +PREPARE TRANSACTION 'gx1'; +ROLLBACK PREPARED 'gx1'; +SELECT * FROM ft8_2pc; + c1 +---- + 1 + 2 + 2 + 3 + 5 + 5 + 8 + 8 + 9 + 9 +(10 rows) + +-- No entry remained +SELECT count(*) FROM pg_foreign_xacts; count ------- - 822 + 0 (1 row) --- error here -PREPARE TRANSACTION 'fdw_tpc'; -ERROR: cannot PREPARE a transaction that has operated on postgres_fdw foreign tables -ROLLBACK; -WARNING: there is no transaction in progress diff --git a/contrib/postgres_fdw/fdwxact.conf b/contrib/postgres_fdw/fdwxact.conf new file mode 100644 index 0000000000..3fdbf93cdb --- /dev/null +++ b/contrib/postgres_fdw/fdwxact.conf @@ -0,0 +1,3 @@ +max_prepared_transactions = 3 +max_prepared_foreign_transactions = 3 +max_foreign_transaction_resolvers = 2 diff --git a/contrib/postgres_fdw/postgres_fdw.c b/contrib/postgres_fdw/postgres_fdw.c index bdc21b36d1..9c63f0aa3b 100644 --- a/contrib/postgres_fdw/postgres_fdw.c +++ b/contrib/postgres_fdw/postgres_fdw.c @@ -14,6 +14,7 @@ #include <limits.h> +#include "access/fdwxact.h" #include "access/htup_details.h" #include "access/sysattr.h" #include "access/table.h" @@ -504,7 +505,6 @@ static void merge_fdw_options(PgFdwRelationInfo *fpinfo, const PgFdwRelationInfo *fpinfo_o, const PgFdwRelationInfo *fpinfo_i); - /* * Foreign-data wrapper handler function: return a struct with pointers * to my callback routines. 
@@ -558,6 +558,11 @@ postgres_fdw_handler(PG_FUNCTION_ARGS) /* Support functions for upper relation push-down */ routine->GetForeignUpperPaths = postgresGetForeignUpperPaths; + /* Support functions for foreign transactions */ + routine->PrepareForeignTransaction = postgresPrepareForeignTransaction; + routine->CommitForeignTransaction = postgresCommitForeignTransaction; + routine->RollbackForeignTransaction = postgresRollbackForeignTransaction; + PG_RETURN_POINTER(routine); } @@ -1434,7 +1439,7 @@ postgresBeginForeignScan(ForeignScanState *node, int eflags) * Get connection to the foreign server. Connection manager will * establish new connection if necessary. */ - fsstate->conn = GetConnection(user, false); + fsstate->conn = GetConnection(user->umid, false, true); /* Assign a unique ID for my cursor */ fsstate->cursor_number = GetCursorNumber(fsstate->conn); @@ -2372,7 +2377,7 @@ postgresBeginDirectModify(ForeignScanState *node, int eflags) * Get connection to the foreign server. Connection manager will * establish new connection if necessary. */ - dmstate->conn = GetConnection(user, false); + dmstate->conn = GetConnection(user->umid, false, true); /* Update the foreign-join-related fields. */ if (fsplan->scan.scanrelid == 0) @@ -2746,7 +2751,7 @@ estimate_path_cost_size(PlannerInfo *root, false, &retrieved_attrs, NULL); /* Get the remote estimate */ - conn = GetConnection(fpinfo->user, false); + conn = GetConnection(fpinfo->user->umid, false, true); get_remote_estimate(sql.data, conn, &rows, &width, &startup_cost, &total_cost); ReleaseConnection(conn); @@ -3566,7 +3571,7 @@ create_foreign_modify(EState *estate, user = GetUserMapping(userid, table->serverid); /* Open connection; report that we'll create a prepared statement. */ - fmstate->conn = GetConnection(user, true); + fmstate->conn = GetConnection(user->umid, true, true); fmstate->p_name = NULL; /* prepared statement not made yet */ /* Set up remote query information. 
*/ @@ -4441,7 +4446,7 @@ postgresAnalyzeForeignTable(Relation relation, */ table = GetForeignTable(RelationGetRelid(relation)); user = GetUserMapping(relation->rd_rel->relowner, table->serverid); - conn = GetConnection(user, false); + conn = GetConnection(user->umid, false, true); /* * Construct command to get page count for relation. @@ -4527,7 +4532,7 @@ postgresAcquireSampleRowsFunc(Relation relation, int elevel, table = GetForeignTable(RelationGetRelid(relation)); server = GetForeignServer(table->serverid); user = GetUserMapping(relation->rd_rel->relowner, table->serverid); - conn = GetConnection(user, false); + conn = GetConnection(user->umid, false, true); /* * Construct cursor that retrieves whole rows from remote. @@ -4755,7 +4760,7 @@ postgresImportForeignSchema(ImportForeignSchemaStmt *stmt, Oid serverOid) */ server = GetForeignServer(serverOid); mapping = GetUserMapping(GetUserId(), server->serverid); - conn = GetConnection(mapping, false); + conn = GetConnection(mapping->umid, false, true); /* Don't attempt to import collation if remote server hasn't got it */ if (PQserverVersion(conn) < 90100) diff --git a/contrib/postgres_fdw/postgres_fdw.h b/contrib/postgres_fdw/postgres_fdw.h index ea052872c3..d7ba45c8d2 100644 --- a/contrib/postgres_fdw/postgres_fdw.h +++ b/contrib/postgres_fdw/postgres_fdw.h @@ -13,6 +13,7 @@ #ifndef POSTGRES_FDW_H #define POSTGRES_FDW_H +#include "access/fdwxact.h" #include "foreign/foreign.h" #include "lib/stringinfo.h" #include "libpq-fe.h" @@ -129,7 +130,7 @@ extern int set_transmission_modes(void); extern void reset_transmission_modes(int nestlevel); /* in connection.c */ -extern PGconn *GetConnection(UserMapping *user, bool will_prep_stmt); +extern PGconn *GetConnection(Oid umid, bool will_prep_stmt, bool start_transaction); extern void ReleaseConnection(PGconn *conn); extern unsigned int GetCursorNumber(PGconn *conn); extern unsigned int GetPrepStmtNumber(PGconn *conn); @@ -137,6 +138,9 @@ extern PGresult 
*pgfdw_get_result(PGconn *conn, const char *query); extern PGresult *pgfdw_exec_query(PGconn *conn, const char *query); extern void pgfdw_report_error(int elevel, PGresult *res, PGconn *conn, bool clear, const char *sql); +extern void postgresPrepareForeignTransaction(FdwXactRslvState *state); +extern void postgresCommitForeignTransaction(FdwXactRslvState *state); +extern void postgresRollbackForeignTransaction(FdwXactRslvState *state); /* in option.c */ extern int ExtractConnectionOptions(List *defelems, @@ -203,6 +207,7 @@ extern void deparseSelectStmtForRel(StringInfo buf, PlannerInfo *root, bool is_subquery, List **retrieved_attrs, List **params_list); extern const char *get_jointype_name(JoinType jointype); +extern bool server_uses_twophase_commit(ForeignServer *server); /* in shippable.c */ extern bool is_builtin(Oid objectId); diff --git a/contrib/postgres_fdw/sql/postgres_fdw.sql b/contrib/postgres_fdw/sql/postgres_fdw.sql index 1c5c37b783..572077c57c 100644 --- a/contrib/postgres_fdw/sql/postgres_fdw.sql +++ b/contrib/postgres_fdw/sql/postgres_fdw.sql @@ -15,6 +15,10 @@ DO $d$ OPTIONS (dbname '$$||current_database()||$$', port '$$||current_setting('port')||$$' )$$; + EXECUTE $$CREATE SERVER loopback3 FOREIGN DATA WRAPPER postgres_fdw + OPTIONS (dbname '$$||current_database()||$$', + port '$$||current_setting('port')||$$' + )$$; END; $d$; @@ -22,6 +26,7 @@ CREATE USER MAPPING FOR public SERVER testserver1 OPTIONS (user 'value', password 'value'); CREATE USER MAPPING FOR CURRENT_USER SERVER loopback; CREATE USER MAPPING FOR CURRENT_USER SERVER loopback2; +CREATE USER MAPPING FOR CURRENT_USER SERVER loopback3; -- =================================================================== -- create objects used through FDW loopback server @@ -56,6 +61,14 @@ CREATE TABLE "S 1"."T 4" ( c3 text, CONSTRAINT t4_pkey PRIMARY KEY (c1) ); +CREATE TABLE "S 1"."T 5" ( + c1 int NOT NULL +); + +CREATE TABLE "S 1"."T 6" ( + c1 int NOT NULL, + CONSTRAINT t6_pkey PRIMARY KEY (c1) 
+); -- Disable autovacuum for these tables to avoid unexpected effects of that ALTER TABLE "S 1"."T 1" SET (autovacuum_enabled = 'false'); @@ -94,6 +107,7 @@ ANALYZE "S 1"."T 1"; ANALYZE "S 1"."T 2"; ANALYZE "S 1"."T 3"; ANALYZE "S 1"."T 4"; +ANALYZE "S 1"."T 5"; -- =================================================================== -- create foreign tables @@ -142,6 +156,15 @@ CREATE FOREIGN TABLE ft6 ( c3 text ) SERVER loopback2 OPTIONS (schema_name 'S 1', table_name 'T 4'); +CREATE FOREIGN TABLE ft7_2pc ( + c1 int NOT NULL +) SERVER loopback OPTIONS (schema_name 'S 1', table_name 'T 5'); + +CREATE FOREIGN TABLE ft8_2pc ( + c1 int NOT NULL +) SERVER loopback2 OPTIONS (schema_name 'S 1', table_name 'T 5'); + + -- =================================================================== -- tests for validator -- =================================================================== @@ -2480,9 +2503,98 @@ SELECT b, avg(a), max(a), count(*) FROM pagg_tab GROUP BY b HAVING sum(a) < 700 -- Clean-up RESET enable_partitionwise_aggregate; --- Two-phase transactions are not supported. +-- =================================================================== +-- test distributed atomic commit across foreign servers +-- =================================================================== + +-- Enable atomic commit +SET foreign_twophase_commit TO 'required'; + +-- Modify single foreign server and then commit and rollback. +BEGIN; +INSERT INTO ft7_2pc VALUES(1); +COMMIT; +SELECT * FROM ft7_2pc; + BEGIN; -SELECT count(*) FROM ft1; --- error here -PREPARE TRANSACTION 'fdw_tpc'; +INSERT INTO ft7_2pc VALUES(1); ROLLBACK; +SELECT * FROM ft7_2pc; + +-- Modify two servers then commit and rollback. This requires to use 2PC. 
+BEGIN; +INSERT INTO ft7_2pc VALUES(2); +INSERT INTO ft8_2pc VALUES(2); +COMMIT; +SELECT * FROM ft8_2pc; + +BEGIN; +INSERT INTO ft7_2pc VALUES(2); +INSERT INTO ft8_2pc VALUES(2); +ROLLBACK; +SELECT * FROM ft8_2pc; + +-- Modify both local data and 2PC-capable server then commit and rollback. +-- This also requires to use 2PC. +BEGIN; +INSERT INTO ft7_2pc VALUES(3); +INSERT INTO "S 1"."T 6" VALUES (3); +COMMIT; +SELECT * FROM ft7_2pc; +SELECT * FROM "S 1"."T 6"; + +BEGIN; +INSERT INTO ft7_2pc VALUES(3); +INSERT INTO "S 1"."T 6" VALUES (3); +ROLLBACK; +SELECT * FROM ft7_2pc; +SELECT * FROM "S 1"."T 6"; + +-- Modify foreign server and raise an error. No data changed. +BEGIN; +INSERT INTO ft7_2pc VALUES(4); +INSERT INTO ft8_2pc VALUES(NULL); -- violation +ROLLBACK; +SELECT * FROM ft8_2pc; + +BEGIN; +INSERT INTO ft7_2pc VALUES (5); +INSERT INTO ft8_2pc VALUES (5); +SAVEPOINT S1; +INSERT INTO ft7_2pc VALUES (6); +INSERT INTO ft8_2pc VALUES (6); +ROLLBACK TO S1; +COMMIT; +SELECT * FROM ft7_2pc; +SELECT * FROM ft8_2pc; +RELEASE SAVEPOINT S1; + +-- When set to 'disabled', we can commit it +SET foreign_twophase_commit TO 'disabled'; +BEGIN; +INSERT INTO ft7_2pc VALUES(8); +INSERT INTO ft8_2pc VALUES(8); +COMMIT; -- success +SELECT * FROM ft7_2pc; +SELECT * FROM ft8_2pc; + +SET foreign_twophase_commit TO 'required'; + +-- Commit and rollback foreign transactions that are part of +-- prepare transaction. 
+BEGIN;
+INSERT INTO ft7_2pc VALUES(9);
+INSERT INTO ft8_2pc VALUES(9);
+PREPARE TRANSACTION 'gx1';
+COMMIT PREPARED 'gx1';
+SELECT * FROM ft8_2pc;
+
+BEGIN;
+INSERT INTO ft7_2pc VALUES(9);
+INSERT INTO ft8_2pc VALUES(9);
+PREPARE TRANSACTION 'gx1';
+ROLLBACK PREPARED 'gx1';
+SELECT * FROM ft8_2pc;
+
+-- No entry remained
+SELECT count(*) FROM pg_foreign_xacts;
diff --git a/doc/src/sgml/postgres-fdw.sgml b/doc/src/sgml/postgres-fdw.sgml
index 1d4bafd9f0..362f7be9e3 100644
--- a/doc/src/sgml/postgres-fdw.sgml
+++ b/doc/src/sgml/postgres-fdw.sgml
@@ -441,6 +441,43 @@
    </para>
   </sect3>
+
+  <sect3>
+   <title>Transaction Management Options</title>
+
+   <para>
+    By default, if a transaction involves multiple remote servers, each
+    remote transaction is committed or aborted independently; some remote
+    transactions may fail to commit while others commit successfully.
+    This behavior can be overridden using the following option:
+   </para>
+
+   <variablelist>
+
+    <varlistentry>
+     <term><literal>two_phase_commit</literal></term>
+     <listitem>
+      <para>
+       This option controls whether <filename>postgres_fdw</filename> uses
+       two-phase commit when committing a transaction.  This option can
+       only be specified for foreign servers, not per-table.
+       The default is <literal>false</literal>.
+      </para>
+
+      <para>
+       If this option is enabled, <filename>postgres_fdw</filename> prepares
+       the transaction on the remote server and <productname>PostgreSQL</productname>
+       keeps track of the distributed transaction.
+       <xref linkend="guc-max-prepared-foreign-transactions"/> must be set to
+       a nonzero value on the local server, and
+       <xref linkend="guc-max-prepared-transactions"/> must be set to a
+       nonzero value on the remote server.
+      </para>
+     </listitem>
+    </varlistentry>
+
+   </variablelist>
+  </sect3>
 </sect2>
 
 <sect2>
@@ -468,6 +505,14 @@ managed by creating corresponding remote savepoints.
   </para>
+
+  <para>
+   <filename>postgres_fdw</filename> uses the two-phase commit protocol
+   during transaction commit or abort when atomic commit of the
+   distributed transaction (see <xref linkend="atomic-commit"/>) is
+   required. The remote server should therefore have
+   <xref linkend="guc-max-prepared-transactions"/> set to at least 1 so
+   that it can prepare the remote transaction.
+  </para>
+
   <para>
    The remote transaction uses <literal>SERIALIZABLE</literal>
    isolation level when the local transaction has
    <literal>SERIALIZABLE</literal>
-- 
2.23.0

From 639d9156323594430ec4b2217a95bfcf08195e9d Mon Sep 17 00:00:00 2001
From: Kyotaro Horiguchi <horikyota.ntt@gmail.com>
Date: Thu, 5 Dec 2019 17:01:26 +0900
Subject: [PATCH v26 5/5] Add regression tests for atomic commit.

Original Author: Masahiko Sawada <sawada.mshk@gmail.com>
---
 src/test/recovery/Makefile         |   2 +-
 src/test/recovery/t/016_fdwxact.pl | 175 +++++++++++++++++++++++++++++
 src/test/regress/pg_regress.c      |  13 ++-
 3 files changed, 185 insertions(+), 5 deletions(-)
 create mode 100644 src/test/recovery/t/016_fdwxact.pl

diff --git a/src/test/recovery/Makefile b/src/test/recovery/Makefile
index e66e69521f..b17429f501 100644
--- a/src/test/recovery/Makefile
+++ b/src/test/recovery/Makefile
@@ -9,7 +9,7 @@
 #
 #-------------------------------------------------------------------------

-EXTRA_INSTALL=contrib/test_decoding
+EXTRA_INSTALL=contrib/test_decoding contrib/pageinspect contrib/postgres_fdw

 subdir = src/test/recovery
 top_builddir = ../../..
diff --git a/src/test/recovery/t/016_fdwxact.pl b/src/test/recovery/t/016_fdwxact.pl
new file mode 100644
index 0000000000..9af9bb81dc
--- /dev/null
+++ b/src/test/recovery/t/016_fdwxact.pl
@@ -0,0 +1,175 @@
+# Tests for transaction involving foreign servers
+use strict;
+use warnings;
+use PostgresNode;
+use TestLib;
+use Test::More tests => 7;
+
+# Setup master node
+my $node_master = get_new_node("master");
+my $node_standby = get_new_node("standby");
+
+$node_master->init(allows_streaming => 1);
+$node_master->append_conf('postgresql.conf', qq(
+max_prepared_transactions = 10
+max_prepared_foreign_transactions = 10
+max_foreign_transaction_resolvers = 2
+foreign_transaction_resolver_timeout = 0
+foreign_transaction_resolution_retry_interval = 5s
+foreign_twophase_commit = on
+));
+$node_master->start;
+
+# Take backup from master node
+my $backup_name = 'master_backup';
+$node_master->backup($backup_name);
+
+# Set up standby node
+$node_standby->init_from_backup($node_master, $backup_name,
+	has_streaming => 1);
+$node_standby->start;
+
+# Set up foreign nodes
+my $node_fs1 = get_new_node("fs1");
+my $node_fs2 = get_new_node("fs2");
+my $fs1_port = $node_fs1->port;
+my $fs2_port = $node_fs2->port;
+$node_fs1->init;
+$node_fs2->init;
+$node_fs1->append_conf('postgresql.conf', qq(max_prepared_transactions = 10));
+$node_fs2->append_conf('postgresql.conf', qq(max_prepared_transactions = 10));
+$node_fs1->start;
+$node_fs2->start;
+
+# Create foreign servers on the master node
+$node_master->safe_psql('postgres', qq(
+CREATE EXTENSION postgres_fdw
+));
+$node_master->safe_psql('postgres', qq(
+CREATE SERVER fs1 FOREIGN DATA WRAPPER postgres_fdw
+OPTIONS (dbname 'postgres', port '$fs1_port');
+));
+$node_master->safe_psql('postgres', qq(
+CREATE SERVER fs2 FOREIGN DATA WRAPPER postgres_fdw
+OPTIONS (dbname 'postgres', port '$fs2_port');
+));
+
+# Create user mapping on the master node
+$node_master->safe_psql('postgres', qq(
+CREATE USER MAPPING FOR CURRENT_USER SERVER fs1;
+CREATE USER MAPPING FOR CURRENT_USER SERVER fs2;
+));
+
+# Create tables on foreign nodes and import them to the master node
+$node_fs1->safe_psql('postgres', qq(
+CREATE SCHEMA fs;
+CREATE TABLE fs.t1 (c int);
+));
+$node_fs2->safe_psql('postgres', qq(
+CREATE SCHEMA fs;
+CREATE TABLE fs.t2 (c int);
+));
+$node_master->safe_psql('postgres', qq(
+IMPORT FOREIGN SCHEMA fs FROM SERVER fs1 INTO public;
+IMPORT FOREIGN SCHEMA fs FROM SERVER fs2 INTO public;
+CREATE TABLE l_table (c int);
+));
+
+# Switch to synchronous replication
+$node_master->safe_psql('postgres', qq(
+ALTER SYSTEM SET synchronous_standby_names ='*';
+));
+$node_master->reload;
+
+my $result;
+
+# Prepare two transactions involving multiple foreign servers and shutdown
+# the master node. Check if we can commit and rollback the foreign transactions
+# after the normal recovery.
+$node_master->safe_psql('postgres', qq(
+BEGIN;
+INSERT INTO t1 VALUES (1);
+INSERT INTO t2 VALUES (1);
+PREPARE TRANSACTION 'gxid1';
+BEGIN;
+INSERT INTO t1 VALUES (2);
+INSERT INTO t2 VALUES (2);
+PREPARE TRANSACTION 'gxid2';
+));
+
+$node_master->stop;
+$node_master->start;
+
+# Commit and rollback foreign transactions after the recovery.
+$result = $node_master->psql('postgres', qq(COMMIT PREPARED 'gxid1'));
+is($result, 0, 'Commit foreign transactions after recovery');
+$result = $node_master->psql('postgres', qq(ROLLBACK PREPARED 'gxid2'));
+is($result, 0, 'Rollback foreign transactions after recovery');
+
+#
+# Prepare two transactions involving multiple foreign servers and shutdown
+# the master node immediately. Check if we can commit and rollback the foreign
+# transactions after the crash recovery.
+#
+$node_master->safe_psql('postgres', qq(
+BEGIN;
+INSERT INTO t1 VALUES (3);
+INSERT INTO t2 VALUES (3);
+PREPARE TRANSACTION 'gxid1';
+BEGIN;
+INSERT INTO t1 VALUES (4);
+INSERT INTO t2 VALUES (4);
+PREPARE TRANSACTION 'gxid2';
+));
+
+$node_master->teardown_node;
+$node_master->start;
+
+# Commit and rollback foreign transactions after the crash recovery.
+$result = $node_master->psql('postgres', qq(COMMIT PREPARED 'gxid1'));
+is($result, 0, 'Commit foreign transactions after crash recovery');
+$result = $node_master->psql('postgres', qq(ROLLBACK PREPARED 'gxid2'));
+is($result, 0, 'Rollback foreign transactions after crash recovery');
+
+#
+# Commit transaction involving foreign servers and shutdown the master node
+# immediately before checkpoint. Check that WAL replay cleans up
+# its shared memory state and releases locks while replaying transaction commit.
+#
+$node_master->safe_psql('postgres', qq(
+BEGIN;
+INSERT INTO t1 VALUES (5);
+INSERT INTO t2 VALUES (5);
+COMMIT;
+));
+
+$node_master->teardown_node;
+$node_master->start;
+
+$result = $node_master->safe_psql('postgres', qq(
+SELECT count(*) FROM pg_foreign_xacts;
+));
+is($result, 0, "Cleanup of shared memory state for foreign transactions");
+
+#
+# Check if the standby node can process prepared foreign transaction
+# after promotion.
+#
+$node_master->safe_psql('postgres', qq(
+BEGIN;
+INSERT INTO t1 VALUES (6);
+INSERT INTO t2 VALUES (6);
+PREPARE TRANSACTION 'gxid1';
+BEGIN;
+INSERT INTO t1 VALUES (7);
+INSERT INTO t2 VALUES (7);
+PREPARE TRANSACTION 'gxid2';
+));
+
+$node_master->teardown_node;
+$node_standby->promote;
+
+$result = $node_standby->psql('postgres', qq(COMMIT PREPARED 'gxid1';));
+is($result, 0, 'Commit foreign transaction after promotion');
+$result = $node_standby->psql('postgres', qq(ROLLBACK PREPARED 'gxid2';));
+is($result, 0, 'Rollback foreign transaction after promotion');
diff --git a/src/test/regress/pg_regress.c b/src/test/regress/pg_regress.c
index 297b8fbd6f..82a1e7d541 100644
--- a/src/test/regress/pg_regress.c
+++ b/src/test/regress/pg_regress.c
@@ -2336,9 +2336,12 @@ regression_main(int argc, char *argv[], init_function ifunc, test_function tfunc
	 * Adjust the default postgresql.conf for regression testing. The user
	 * can specify a file to be appended; in any case we expand logging
	 * and set max_prepared_transactions to enable testing of prepared
-	 * xacts. (Note: to reduce the probability of unexpected shmmax
-	 * failures, don't set max_prepared_transactions any higher than
-	 * actually needed by the prepared_xacts regression test.)
+	 * xacts. We also set max_prepared_foreign_transactions and
+	 * max_foreign_transaction_resolvers to enable testing of transaction
+	 * involving multiple foreign servers. (Note: to reduce the probability
+	 * of unexpected shmmax failures, don't set max_prepared_transactions
+	 * any higher than actually needed by the prepared_xacts regression
+	 * test.)
	 */
	snprintf(buf, sizeof(buf), "%s/data/postgresql.conf", temp_instance);
	pg_conf = fopen(buf, "a");
@@ -2353,7 +2356,9 @@ regression_main(int argc, char *argv[], init_function ifunc, test_function tfunc
	fputs("log_line_prefix = '%m [%p] %q%a '\n", pg_conf);
	fputs("log_lock_waits = on\n", pg_conf);
	fputs("log_temp_files = 128kB\n", pg_conf);
-	fputs("max_prepared_transactions = 2\n", pg_conf);
+	fputs("max_prepared_transactions = 3\n", pg_conf);
+	fputs("max_prepared_foreign_transactions = 2\n", pg_conf);
+	fputs("max_foreign_transaction_resolvers = 2\n", pg_conf);

	for (sl = temp_configs; sl != NULL; sl = sl->next)
	{
-- 
2.23.0
On Fri, 6 Dec 2019 at 17:33, Kyotaro Horiguchi <horikyota.ntt@gmail.com> wrote:
>
> Hello.
>
> This is the rebased (and a bit fixed) version of the patch. This
> applies on the master HEAD and passes all provided tests.
>
> I took over this work from Sawada-san. I'll begin with reviewing the
> current patch.
>

The previous patch set no longer applies cleanly to the current HEAD.
I've updated and slightly modified the code.

This patch set has been marked as Waiting on Author for a long time,
but the correct status now is Needs Review. The patch was actually
updated to incorporate all review comments, but it was not rebased
actively.

The mail[1] I posted before would be helpful for understanding the
current patch design, and there are a README in the patch and a wiki
page[2].

I've marked this as Needs Review.

Regards,

[1] https://www.postgresql.org/message-id/CAD21AoDn98axH1bEoMnte%2BS7WWR%3DnsmOpjz1WGH-NvJi4aLu3Q%40mail.gmail.com
[2] https://wiki.postgresql.org/wiki/Atomic_Commit_of_Distributed_Transactions

--
Masahiko Sawada http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
I just had a quick look at the 0001 and 0002 patches; here are a few suggestions.
patch: v27-0001:
Typo: s/non-temprary/non-temporary
----
patch: v27-0002: (Note: the left-hand number is the line number in the v27-0002 patch):
138 +PostgreSQL's the global transaction manager (GTM), as a distributed transaction
139 +participant The registered foreign transactions are tracked until the end of
Full stop "." is missing after "participant"
174 +API Contract With Transaction Management Callback Functions
Can we just say "Transaction Management Callback Functions"? TBH, I am not sure that I understand this title.
203 +processing foreign transaction (i.g. preparing, committing or aborting) the
Do you mean "i.e." instead of "i.g."?
269 + * RollbackForeignTransactionAPI. Registered participant servers are identified
Add a space between RollbackForeignTransaction and API.
292 + * automatically so must be processed manually using by pg_resovle_fdwxact()
Do you mean pg_resolve_foreign_xact() here?
320 + * the foreign transaction is authorized to update the fields from its own
321 + * one.
322 +
323 + * Therefore, before doing PREPARE, COMMIT PREPARED or ROLLBACK PREPARED a
Please add asterisk '*' on line#322.
816 +static void
817 +FdwXactPrepareForeignTransactions(void)
818 +{
819 + ListCell *lcell;
Let's have this variable name as "lc" like elsewhere.
1036 + ereport(ERROR, (errmsg("could not insert a foreign transaction entry"),
1037 + errdetail("duplicate entry with transaction id %u, serverid %u, userid %u",
1038 + xid, serverid, userid)));
1039 + }
Incorrect formatting.
1166 +/*
1167 + * Return true and set FdwXactAtomicCommitReady to true if the current transaction
Do you mean ForeignTwophaseCommitIsRequired instead of FdwXactAtomicCommitReady?
3529 +
3530 +/*
3531 + * FdwXactLauncherRegister
3532 + * Register a background worker running the foreign transaction
3533 + * launcher.
3534 + */
This prolog style is not consistent with the other functions in the file.
And here are a few typos:
s/conssitent/consistent
s/consisnts/consist
s/Foriegn/Foreign
s/tranascation/transaction
s/itselft/itself
s/rolbacked/rollbacked
s/trasaction/transaction
s/transactio/transaction
s/automically/automatically
s/CommitForeignTransaciton/CommitForeignTransaction
s/Similary/Similarly
s/FDWACT_/FDWXACT_
s/dink/disk
s/requried/required
s/trasactions/transactions
s/prepread/prepared
s/preapred/prepared
s/beging/being
s/gxact/xact
s/in-dbout/in-doubt
s/respecitively/respectively
s/transction/transaction
s/idenetifier/identifier
s/identifer/identifier
s/checkpoint'S/checkpoint's
s/fo/of
s/transcation/transaction
s/trasanction/transaction
s/non-temprary/non-temporary
s/resovler_internal.h/resolver_internal.h
Re: [HACKERS] Transactions involving multiple postgres foreignservers, take 2
Hi Sawada San,

I have a couple of comments on "v27-0002-Support-atomic-commit-among-multiple-foreign-ser.patch":

1- As part of the XLogReadRecord refactoring commit, the signature of
XLogReadRecord was changed, so the call to XLogReadRecord() needs a small
adjustment, i.e. in the function
XlogReadFdwXactData(XLogRecPtr lsn, char **buf, int *len):
...
- record = XLogReadRecord(xlogreader, lsn, &errormsg);
+ XLogBeginRead(xlogreader, lsn);
+ record = XLogReadRecord(xlogreader, &errormsg);

2- In the register_fdwxact(..) function you are setting the
XACT_FLAGS_FDWNOPREPARE transaction flag when the register request comes in
for a foreign server that does not support two-phase commit, regardless of
the value of the 'bool modified' argument. And later, in
PreCommit_FdwXacts(), you error out when "foreign_twophase_commit" is set to
'required' just by looking at the XACT_FLAGS_FDWNOPREPARE flag, which I
think is not correct.

There is a possibility that the transaction might have only read from the
foreign servers that are not capable of handling transactions or two-phase
commit, while all other servers where we require an atomic commit are
capable of doing so.

If I am not missing something obvious here, then IMHO the
XACT_FLAGS_FDWNOPREPARE flag should only be set when the transaction
management/two-phase functionality is not available and the "modified"
argument is true in register_fdwxact().

Thanks

Best regards
Muhammad Usama
Highgo Software (Canada/China/Pakistan)

The new status of this patch is: Waiting on Author
On Tue, 11 Feb 2020 at 12:42, amul sul <sulamul@gmail.com> wrote:
>
> Hi Sawada san,
>
> I just had a quick look to 0001 and 0002 patch here is the few suggestions.
>
> [...]

Thank you for reviewing the patch! I've incorporated all comments in a
local branch.

Regards,

--
Masahiko Sawada http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
On Wed, 19 Feb 2020 at 07:55, Masahiko Sawada <masahiko.sawada@2ndquadrant.com> wrote:
>
> On Tue, 11 Feb 2020 at 12:42, amul sul <sulamul@gmail.com> wrote:
> >
> > [...]
>
> Thank you for reviewing the patch! I've incorporated all comments in
> local branch.

Attached the updated version patch sets that incorporated review
comments from Amul and Muhammad.

Regards,

--
Masahiko Sawada http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Re: [HACKERS] Transactions involving multiple postgres foreignservers, take 2
On Tue, 18 Feb 2020 at 00:40, Muhammad Usama <m.usama@gmail.com> wrote:
>
> Hi Sawada San,
>
> I have a couple of comments on "v27-0002-Support-atomic-commit-among-multiple-foreign-ser.patch"
>
> [...]

Thank you for reviewing this patch! Your comments are incorporated in
the latest patch set I recently sent[1].

[1] https://www.postgresql.org/message-id/CA%2Bfd4k5ZcDvoiY_5c-mF1oDACS5nUWS7ppoiOwjCOnM%2BgrJO-Q%40mail.gmail.com

Regards,

--
Masahiko Sawada http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Hi Sawada San,

I have been further reviewing and testing the transaction involving
multiple server patches. Overall the patches are working as expected,
bar a few important exceptions. So, as discussed over the call, I have
fixed the issues I found during the testing and also rebased the patches
with the current head of the master branch. Can you please have a look
at the attached updated patches?

Below is the list of changes I have made on top of the V18 patches.

1- In register_fdwxact(), as we are just storing the callback function
pointers from FdwRoutine in the fdw_part structure, I think we can avoid
calling GetFdwRoutineByServerId() in TopMemoryContext.
So I have moved the MemoryContextSwitch to TopMemoryContext after the
GetFdwRoutineByServerId() call.

2- If the PrepareForeignTransaction functionality is not present in some
FDW, then during the registration process we should only set the
XACT_FLAGS_FDWNOPREPARE transaction flag if the modified flag is also
set for that server, as for a server that has not done any data
modification within the transaction we do not do two-phase commit
anyway.

3- I have moved foreign_twophase_commit in the sample file after
max_foreign_transaction_resolvers, because the default value of
max_foreign_transaction_resolvers is 0 and enabling
foreign_twophase_commit produces an error with the default
configuration-parameter positioning in postgresql.conf.
Also, the foreign_twophase_commit configuration was missing the comments
about allowed values in the sample config file.

4- Setting ForeignTwophaseCommitIsRequired in
is_foreign_twophase_commit_required() function does not seem to be the
correct place.
6- In prefer mode, we commit the transaction in single-phase if the server does not support two-phase commit.
So I have modified the flow a little bit and instead of doing a one-phase commit right away
7- Added a pfree() and list_free_deep() in PreCommit_FdwXacts() to reclaim the
8- The function FdwXactWaitToBeResolved() was bailing out as soon as it finds
BEGIN
INSERT 0 1
INSERT 0 1
postgres=*# PREPARE TRANSACTION 'local_prepared';
PREPARE TRANSACTION
-------+-----+----------+--------+----------+----------+----------------------------
12929 | 515 | 16389 | 10 | prepared | f | fx_1339567411_515_16389_10
12929 | 515 | 16391 | 10 | prepared | f | fx_1963224020_515_16391_10
(2 rows)
-- Now commit the prepared transaction
postgres=# COMMIT PREPARED 'local_prepared';
COMMIT PREPARED
-- Foreign prepared transactions associated with 'local_prepared' are not resolved
postgres=#
-------+-----+----------+--------+----------+----------+----------------------------
12929 | 515 | 16389 | 10 | prepared | f | fx_1339567411_515_16389_10
12929 | 515 | 16391 | 10 | prepared | f | fx_1963224020_515_16391_10
(2 rows)
So to fix this in case of the two-phase transaction, the function checks the existence
9- In function XlogReadFdwXactData() XLogBeginRead call was missing before XLogReadRecord()
10- incorporated set_ps_display() signature change.
Attachment
On Fri, 27 Mar 2020 at 22:06, Muhammad Usama <m.usama@gmail.com> wrote:
>
> Hi Sawada San,
>
> I have been further reviewing and testing the transaction involving multiple server patches.
> Overall the patches are working as expected bar a few important exceptions.
> So as discussed over the call I have fixed the issues I found during the testing
> and also rebased the patches with the current head of the master branch.
> So can you please have a look at the attached updated patches.
Thank you for reviewing and updating the patch!
>
> Below is the list of changes I have made on top of V18 patches.
>
> 1- In register_fdwxact(), As we are just storing the callback function pointers from
> FdwRoutine in fdw_part structure, So I think we can avoid calling
> GetFdwRoutineByServerId() in TopMemoryContext.
> So I have moved the MemoryContextSwitch to TopMemoryContext after the
> GetFdwRoutineByServerId() call.
Agreed.
>
>
> 2- If PrepareForeignTransaction functionality is not present in some FDW then
> during the registration process we should only set the XACT_FLAGS_FDWNOPREPARE
> transaction flag if the modified flag is also set for that server. As for the server that has
> not done any data modification within the transaction we do not do two-phase commit anyway.
Agreed.
>
> 3- I have moved the foreign_twophase_commit in sample file after
> max_foreign_transaction_resolvers because the default value of max_foreign_transaction_resolvers
> is 0 and enabling the foreign_twophase_commit produces an error with default
> configuration parameter positioning in postgresql.conf
> Also, foreign_twophase_commit configuration was missing the comments
> about allowed values in the sample config file.
Sounds good. Agreed.
>
> 4- Setting ForeignTwophaseCommitIsRequired in is_foreign_twophase_commit_required()
> function does not seem to be the correct place. The reason being, even when
> is_foreign_twophase_commit_required() returns true after setting ForeignTwophaseCommitIsRequired
> to true, we could still end up not using the two-phase commit in the case when some server does
> not support two-phase commit and foreign_twophase_commit is set to FOREIGN_TWOPHASE_COMMIT_PREFER
> mode. So I have moved the ForeignTwophaseCommitIsRequired assignment to PreCommit_FdwXacts()
> function after doing the prepare transaction.
Agreed.
>
> 6- In prefer mode, we commit the transaction in single-phase if the server does not support
> the two-phase commit. But instead of doing the single-phase commit right away,
> IMHO the better way is to wait until all the two-phase transactions are successfully prepared
> on servers that support the two-phase. Since an error during a "PREPARE" stage would
> rollback the transaction and in that case, we would end up with committed transactions on
> the server that lacks the support of the two-phase commit.
When an error occurs before the local commit, a 2pc-unsupported
server could be rolled back or committed depending on the error
timing. On the other hand, all 2pc-supported servers are always rolled
back when an error occurs before the local commit. Therefore, even if
we change the order of COMMIT and PREPARE, it is still possible that
we will end up committing on the 2pc-unsupported servers while
rolling back the others, including 2pc-supported servers.
I guess the motivation of your change is that, since errors are likely
to happen while executing PREPARE on foreign servers, we can minimize
the possibility of rolling back 2pc-unsupported servers by deferring
their commit as much as possible. Is that right?
> So I have modified the flow a little bit and instead of doing a one-phase commit right away
> the servers that do not support a two-phase commit is added to another list and that list is
> processed after once we have successfully prepared all the transactions on two-phase supported
> foreign servers. Although this technique is also not bulletproof, still it is better than doing
> the one-phase commits before doing the PREPAREs.
Hmm, the current logic seems complex. Maybe we can just reverse the
order of COMMIT and PREPARE: do PREPARE on all 2pc-supported and
modified servers first, and then do COMMIT on the others?
>
> Also, I think we can improve on this one by throwing an error even in PREFER
> mode if there is more than one server that had data modified within the transaction
> and lacks the two-phase commit support.
>
IIUC the concept of PREFER mode is that the transaction uses 2pc only
for 2pc-supported servers. IOW, even if the transaction modifies data
on a 2pc-unsupported server we can proceed with the commit in PREFER
mode, which we cannot in REQUIRED mode. What is the motivation of your
above idea?
> 7- Added a pfree() and list_free_deep() in PreCommit_FdwXacts() to reclaim the
> memory if fdw_part is removed from the list
I think at the end of the transaction we free the entries of the
FdwXactParticipants list and set FdwXactParticipants to NIL. Why do we
need to do that in PreCommit_FdwXacts()?
>
> 8- The function FdwXactWaitToBeResolved() was bailing out as soon as it finds
> (FdwXactParticipants == NIL). The problem with that was in the case of
> "COMMIT/ROLLBACK PREPARED" we always get FdwXactParticipants = NIL and
> effectively the foreign prepared transactions(if any) associated with locally
> prepared transactions were never getting resolved automatically.
>
>
> postgres=# BEGIN;
> BEGIN
> INSERT INTO test_local VALUES ( 2, 'TWO');
> INSERT 0 1
> INSERT INTO test_foreign_s1 VALUES ( 2, 'TWO');
> INSERT 0 1
> INSERT INTO test_foreign_s2 VALUES ( 2, 'TWO');
> INSERT 0 1
> postgres=*# PREPARE TRANSACTION 'local_prepared';
> PREPARE TRANSACTION
>
> postgres=# select * from pg_foreign_xacts ;
> dbid | xid | serverid | userid | status | in_doubt | identifier
> -------+-----+----------+--------+----------+----------+----------------------------
> 12929 | 515 | 16389 | 10 | prepared | f | fx_1339567411_515_16389_10
> 12929 | 515 | 16391 | 10 | prepared | f | fx_1963224020_515_16391_10
> (2 rows)
>
> -- Now commit the prepared transaction
>
> postgres=# COMMIT PREPARED 'local_prepared';
>
> COMMIT PREPARED
>
> --Foreign prepared transactions associated with 'local_prepared' not resolved
>
> postgres=#
>
> postgres=# select * from pg_foreign_xacts ;
> dbid | xid | serverid | userid | status | in_doubt | identifier
> -------+-----+----------+--------+----------+----------+----------------------------
> 12929 | 515 | 16389 | 10 | prepared | f | fx_1339567411_515_16389_10
> 12929 | 515 | 16391 | 10 | prepared | f | fx_1963224020_515_16391_10
> (2 rows)
>
>
> So to fix this in case of the two-phase transaction, the function checks the existence
> of associated foreign prepared transactions before bailing out.
>
Good catch. But looking at your change, we should not accept the case
where FdwXactParticipants == NULL but TwoPhaseExists(wait_xid) ==
false.
if (FdwXactParticipants == NIL)
{
/*
* If we are here because of COMMIT/ROLLBACK PREPARED then the
* FdwXactParticipants list would be empty. So we need to
* see if there are any foreign prepared transactions exists
* for this prepared transaction
*/
if (TwoPhaseExists(wait_xid))
{
List *foreign_trans = NIL;
foreign_trans = get_fdwxacts(MyDatabaseId,
wait_xid, InvalidOid, InvalidOid,
false, false, true);
if (foreign_trans == NIL)
return;
list_free(foreign_trans);
}
}
> 9- In function XlogReadFdwXactData() XLogBeginRead call was missing before XLogReadRecord()
> that was causing the crash during recovery.
Agreed.
>
> 10- incorporated set_ps_display() signature change.
Thanks.
Regarding other changes you did in v19 patch, I have some comments:
1.
+ ereport(LOG,
+         (errmsg("trying to %s the foreign transaction associated with transaction %u on server %u",
+                 fdwxact->status == FDWXACT_STATUS_COMMITTING ? "COMMIT" : "ABORT",
+                 fdwxact->local_xid, fdwxact->serverid)));
+
Why do we need to emit LOG message in pg_resolve_foreign_xact() SQL function?
2.
diff --git a/src/bin/pg_waldump/fdwxactdesc.c b/src/bin/pg_waldump/fdwxactdesc.c
deleted file mode 120000
index ce8c21880c..0000000000
--- a/src/bin/pg_waldump/fdwxactdesc.c
+++ /dev/null
@@ -1 +0,0 @@
-../../../src/backend/access/rmgrdesc/fdwxactdesc.c
\ No newline at end of file
diff --git a/src/bin/pg_waldump/fdwxactdesc.c b/src/bin/pg_waldump/fdwxactdesc.c
new file mode 100644
index 0000000000..ce8c21880c
--- /dev/null
+++ b/src/bin/pg_waldump/fdwxactdesc.c
@@ -0,0 +1 @@
+../../../src/backend/access/rmgrdesc/fdwxactdesc.c
We need to remove src/bin/pg_waldump/fdwxactdesc.c from the patch.
3.
--- a/doc/src/sgml/monitoring.sgml
+++ b/doc/src/sgml/monitoring.sgml
@@ -1526,14 +1526,14 @@ postgres 27093 0.0 0.0 30096 2752 ?
Ss 11:34 0:00 postgres: ser
<entry><literal>SafeSnapshot</literal></entry>
<entry>Waiting for a snapshot for a <literal>READ ONLY
DEFERRABLE</literal> transaction.</entry>
</row>
- <row>
- <entry><literal>SyncRep</literal></entry>
- <entry>Waiting for confirmation from remote server during
synchronous replication.</entry>
- </row>
<row>
<entry><literal>FdwXactResolution</literal></entry>
<entry>Waiting for all foreign transaction participants to
be resolved during atomic commit among foreign servers.</entry>
</row>
+ <row>
+ <entry><literal>SyncRep</literal></entry>
+ <entry>Waiting for confirmation from remote server during
synchronous replication.</entry>
+ </row>
<row>
<entry morerows="4"><literal>Timeout</literal></entry>
<entry><literal>BaseBackupThrottle</literal></entry>
We need to move the entry of FdwXactResolution to right before
Hash/Batch/Allocating for alphabetical order.
I've incorporated the changes I agreed with into my local branch and
will incorporate the other changes after discussion. I'll also do more
testing and self-review and will submit the latest version of the patch.
Regards,
--
Masahiko Sawada http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

Meanwhile, I found a couple of more small issues: one is a missing break
statement in pgstat_get_wait_ipc(), and secondly
fdwxact_relaunch_resolvers() could return an uninitialized value.
I am attaching a small patch for these changes that can be applied on top
of the existing patches.
Regards,
Muhammad Usama
Highgo Software
URL : http://www.highgo.ca
Attachment
On Tue, 28 Apr 2020 at 19:37, Muhammad Usama <m.usama@gmail.com> wrote: > > > > On Wed, Apr 8, 2020 at 11:16 AM Masahiko Sawada <masahiko.sawada@2ndquadrant.com> wrote: >> >> On Fri, 27 Mar 2020 at 22:06, Muhammad Usama <m.usama@gmail.com> wrote: >> > >> > Hi Sawada San, >> > >> > I have been further reviewing and testing the transaction involving multiple server patches. >> > Overall the patches are working as expected bar a few important exceptions. >> > So as discussed over the call I have fixed the issues I found during the testing >> > and also rebased the patches with the current head of the master branch. >> > So can you please have a look at the attached updated patches. >> >> Thank you for reviewing and updating the patch! >> >> > >> > Below is the list of changes I have made on top of V18 patches. >> > >> > 1- In register_fdwxact(), As we are just storing the callback function pointers from >> > FdwRoutine in fdw_part structure, So I think we can avoid calling >> > GetFdwRoutineByServerId() in TopMemoryContext. >> > So I have moved the MemoryContextSwitch to TopMemoryContext after the >> > GetFdwRoutineByServerId() call. >> >> Agreed. >> >> > >> > >> > 2- If PrepareForeignTransaction functionality is not present in some FDW then >> > during the registration process we should only set the XACT_FLAGS_FDWNOPREPARE >> > transaction flag if the modified flag is also set for that server. As for the server that has >> > not done any data modification within the transaction we do not do two-phase commit anyway. >> >> Agreed. 
>> >> > >> > 3- I have moved the foreign_twophase_commit in sample file after >> > max_foreign_transaction_resolvers because the default value of max_foreign_transaction_resolvers >> > is 0 and enabling the foreign_twophase_commit produces an error with default >> > configuration parameter positioning in postgresql.conf >> > Also, foreign_twophase_commit configuration was missing the comments >> > about allowed values in the sample config file. >> >> Sounds good. Agreed. >> >> > >> > 4- Setting ForeignTwophaseCommitIsRequired in is_foreign_twophase_commit_required() >> > function does not seem to be the correct place. The reason being, even when >> > is_foreign_twophase_commit_required() returns true after setting ForeignTwophaseCommitIsRequired >> > to true, we could still end up not using the two-phase commit in the case when some server does >> > not support two-phase commit and foreign_twophase_commit is set to FOREIGN_TWOPHASE_COMMIT_PREFER >> > mode. So I have moved the ForeignTwophaseCommitIsRequired assignment to PreCommit_FdwXacts() >> > function after doing the prepare transaction. >> >> Agreed. >> >> > >> > 6- In prefer mode, we commit the transaction in single-phase if the server does not support >> > the two-phase commit. But instead of doing the single-phase commit right away, >> > IMHO the better way is to wait until all the two-phase transactions are successfully prepared >> > on servers that support the two-phase. Since an error during a "PREPARE" stage would >> > rollback the transaction and in that case, we would end up with committed transactions on >> > the server that lacks the support of the two-phase commit. >> >> When an error occurred before the local commit, a 2pc-unsupported >> server could be rolled back or committed depending on the error >> timing. On the other hand all 2pc-supported servers are always rolled >> back when an error occurred before the local commit. 
Therefore even if >> we change the order of COMMIT and PREPARE it is still possible that we >> will end up committing the part of 2pc-unsupported servers while >> rolling back others including 2pc-supported servers. >> >> I guess the motivation of your change is that since errors are likely >> to happen during executing PREPARE on foreign servers, we can minimize >> the possibility of rolling back 2pc-unsupported servers by deferring >> the commit of 2pc-unsupported server as much as possible. Is that >> right? > > > Yes, that is correct. The idea of doing the COMMIT on NON-2pc-supported servers > after all the PREPAREs are successful is to minimize the chances of partial commits. > And as you mentioned there will still be chances of getting a partial commit even with > this approach but the probability of that would be less than what it is with the > current sequence. > > >> >> >> > So I have modified the flow a little bit and instead of doing a one-phase commit right away >> > the servers that do not support a two-phase commit is added to another list and that list is >> > processed after once we have successfully prepared all the transactions on two-phase supported >> > foreign servers. Although this technique is also not bulletproof, still it is better than doing >> > the one-phase commits before doing the PREPAREs. >> >> Hmm the current logic seems complex. Maybe we can just reverse the >> order of COMMIT and PREPARE; do PREPARE on all 2pc-supported and >> modified servers first and then do COMMIT on others? > > > Agreed, seems reasonable. >> >> >> > >> > Also, I think we can improve on this one by throwing an error even in PREFER >> > mode if there is more than one server that had data modified within the transaction >> > and lacks the two-phase commit support. >> > >> >> IIUC the concept of PREFER mode is that the transaction uses 2pc only >> for 2pc-supported servers. 
IOW, even if the transaction modifies on a >> 2pc-unsupported server we can proceed with the commit if in PREFER >> mode, which cannot if in REQUIRED mode. What is the motivation of your >> above idea? > > > I was thinking that we could change the behavior of PREFER mode such that we only allow > to COMMIT the transaction if the transaction needs to do a single-phase commit on one > server only. That way we can ensure that we would never end up with partial commit. > I think it's good to avoid a partial commit by using your idea but if we want to avoid a partial commit we can use the 'required' mode, which requires all participant servers to support 2pc. We throw an error if participant servers include even one 2pc-unsupported server is modified within the transaction. Of course if the participant node is only one 2pc-unsupported server it can use 1pc even in the 'required' mode. > One Idea in this regards would be to switch the local transaction to commit using 2pc > if there is a total of only one foreign server that does not support the 2pc in the transaction, > ensuring that 1-pc commit servers should always be less than or equal to 1. and if there are more > than one foreign server requires 1-pc then we just throw an error. I might be missing your point but I suppose this idea is to do something like the following? 1. prepare the local transaction 2. commit the foreign transaction on 2pc-unsupported server 3. commit the prepared local transaction > > However having said that, I am not 100% sure if its a good or an acceptable Idea, and > I am okay with continuing with the current behavior of PREFER mode if we put it in the > document that this mode can cause a partial commit. There will three types of servers: (a) a server doesn't support any transaction API, (b) a server supports only commit and rollback API and (c) a server supports all APIs (commit, rollback and prepare). 
Currently postgres transaction manager manages only server-(b) and server-(c), adds them to FdwXactParticipants. I'm considering changing the code so that it adds also server-(a) to FdwXactParticipants, in order to track the number of server-(a) involved in the transaction. But it doesn't insert FdwXact entry for it, and manage transactions on these servers. The reason is this; if we want to have the 'required' mode strictly require all participant servers to support 2pc, we should use 2pc when (# of server-(a) + # of server-(b) + # of server-(c)) >= 2. But since currently we just track the modification on a server-(a) by a flag we cannot handle the case where two server-(a) are modified in the transaction. On the other hand, if we don't consider server-(a) the transaction could end up with a partial commit when a server-(a) participates in the transaction. Therefore I'm thinking of the above change so that the transaction manager can ensure that a partial commit doesn't happen in the 'required' mode. What do you think? > >> >> > 7- Added a pfree() and list_free_deep() in PreCommit_FdwXacts() to reclaim the >> > memory if fdw_part is removed from the list >> >> I think at the end of the transaction we free entries of >> FdwXactParticipants list and set FdwXactParticipants to NIL. Why do we >> need to do that in PreCommit_FdwXacts()? > > > Correct me if I am wrong, The fdw_part structures are created in TopMemoryContext > and if that fdw_part structure is removed from the list at pre_commit stage > (because we did 1-PC COMMIT on it) then it would leak memory. The fdw_part structures are created in TopTransactionContext so these are freed at the end of the transaction. > >> >> > >> > 8- The function FdwXactWaitToBeResolved() was bailing out as soon as it finds >> > (FdwXactParticipants == NIL). 
>> > The problem with that was in the case of
>> > "COMMIT/ROLLBACK PREPARED" we always get FdwXactParticipants = NIL and
>> > effectively the foreign prepared transactions (if any) associated with locally
>> > prepared transactions were never getting resolved automatically.
>> >
>> > postgres=# BEGIN;
>> > BEGIN
>> > INSERT INTO test_local VALUES ( 2, 'TWO');
>> > INSERT 0 1
>> > INSERT INTO test_foreign_s1 VALUES ( 2, 'TWO');
>> > INSERT 0 1
>> > INSERT INTO test_foreign_s2 VALUES ( 2, 'TWO');
>> > INSERT 0 1
>> > postgres=*# PREPARE TRANSACTION 'local_prepared';
>> > PREPARE TRANSACTION
>> >
>> > postgres=# select * from pg_foreign_xacts ;
>> >  dbid  | xid | serverid | userid |  status  | in_doubt |         identifier
>> > -------+-----+----------+--------+----------+----------+----------------------------
>> >  12929 | 515 |    16389 |     10 | prepared | f        | fx_1339567411_515_16389_10
>> >  12929 | 515 |    16391 |     10 | prepared | f        | fx_1963224020_515_16391_10
>> > (2 rows)
>> >
>> > -- Now commit the prepared transaction
>> >
>> > postgres=# COMMIT PREPARED 'local_prepared';
>> > COMMIT PREPARED
>> >
>> > -- Foreign prepared transactions associated with 'local_prepared' not resolved
>> >
>> > postgres=# select * from pg_foreign_xacts ;
>> >  dbid  | xid | serverid | userid |  status  | in_doubt |         identifier
>> > -------+-----+----------+--------+----------+----------+----------------------------
>> >  12929 | 515 |    16389 |     10 | prepared | f        | fx_1339567411_515_16389_10
>> >  12929 | 515 |    16391 |     10 | prepared | f        | fx_1963224020_515_16391_10
>> > (2 rows)
>> >
>> > So to fix this in case of the two-phase transaction, the function checks the existence
>> > of associated foreign prepared transactions before bailing out.
>> >
>>
>> Good catch. But looking at your change, we should not accept the case
>> where FdwXactParticipants == NULL but TwoPhaseExists(wait_xid) ==
>> false.
>>
>>     if (FdwXactParticipants == NIL)
>>     {
>>         /*
>>          * If we are here because of COMMIT/ROLLBACK PREPARED then the
>>          * FdwXactParticipants list would be empty. So we need to
>>          * see if there are any foreign prepared transactions exists
>>          * for this prepared transaction
>>          */
>>         if (TwoPhaseExists(wait_xid))
>>         {
>>             List *foreign_trans = NIL;
>>
>>             foreign_trans = get_fdwxacts(MyDatabaseId,
>>                                          wait_xid, InvalidOid, InvalidOid,
>>                                          false, false, true);
>>
>>             if (foreign_trans == NIL)
>>                 return;
>>             list_free(foreign_trans);
>>         }
>>     }
>>
>
> Sorry my bad, its a mistake on my part. we should just return from the function when
> FdwXactParticipants == NULL but TwoPhaseExists(wait_xid) == false.
>
>     if (TwoPhaseExists(wait_xid))
>     {
>         List *foreign_trans = NIL;
>
>         foreign_trans = get_fdwxacts(MyDatabaseId, wait_xid, InvalidOid, InvalidOid,
>                                      false, false, true);
>
>         if (foreign_trans == NIL)
>             return;
>         list_free(foreign_trans);
>     }
>     else
>         return;
>
>>
>> > 9- In function XlogReadFdwXactData() XLogBeginRead call was missing before XLogReadRecord()
>> > that was causing the crash during recovery.
>>
>> Agreed.
>>
>> >
>> > 10- incorporated set_ps_display() signature change.
>>
>> Thanks.
>>
>> Regarding other changes you did in v19 patch, I have some comments:
>>
>> 1.
>> +            ereport(LOG,
>> +                    (errmsg("trying to %s the foreign transaction associated with transaction %u on server %u",
>> +                            fdwxact->status == FDWXACT_STATUS_COMMITTING ? "COMMIT" : "ABORT",
>> +                            fdwxact->local_xid, fdwxact->serverid)));
>> +
>>
>> Why do we need to emit LOG message in pg_resolve_foreign_xact() SQL function?
>
>
> That change was not intended to get into the patch file. I had done it during testing to
> quickly get info on which way the transaction is going to be resolved.
>
>>
>> 2.
>> diff --git a/src/bin/pg_waldump/fdwxactdesc.c b/src/bin/pg_waldump/fdwxactdesc.c
>> deleted file mode 120000
>> index ce8c21880c..0000000000
>> --- a/src/bin/pg_waldump/fdwxactdesc.c
>> +++ /dev/null
>> @@ -1 +0,0 @@
>> -../../../src/backend/access/rmgrdesc/fdwxactdesc.c
>> \ No newline at end of file
>> diff --git a/src/bin/pg_waldump/fdwxactdesc.c b/src/bin/pg_waldump/fdwxactdesc.c
>> new file mode 100644
>> index 0000000000..ce8c21880c
>> --- /dev/null
>> +++ b/src/bin/pg_waldump/fdwxactdesc.c
>> @@ -0,0 +1 @@
>> +../../../src/backend/access/rmgrdesc/fdwxactdesc.c
>>
>> We need to remove src/bin/pg_waldump/fdwxactdesc.c from the patch.
>
>
> Again sorry! that was an oversight on my part.
>
>>
>> 3.
>> --- a/doc/src/sgml/monitoring.sgml
>> +++ b/doc/src/sgml/monitoring.sgml
>> @@ -1526,14 +1526,14 @@ postgres 27093 0.0 0.0 30096 2752 ? Ss 11:34 0:00 postgres: ser
>>          <entry><literal>SafeSnapshot</literal></entry>
>>          <entry>Waiting for a snapshot for a <literal>READ ONLY
>>          DEFERRABLE</literal> transaction.</entry>
>>         </row>
>> -       <row>
>> -        <entry><literal>SyncRep</literal></entry>
>> -        <entry>Waiting for confirmation from remote server during
>> synchronous replication.</entry>
>> -       </row>
>>         <row>
>>          <entry><literal>FdwXactResolution</literal></entry>
>>          <entry>Waiting for all foreign transaction participants to
>> be resolved during atomic commit among foreign servers.</entry>
>>         </row>
>> +       <row>
>> +        <entry><literal>SyncRep</literal></entry>
>> +        <entry>Waiting for confirmation from remote server during
>> synchronous replication.</entry>
>> +       </row>
>>         <row>
>>          <entry morerows="4"><literal>Timeout</literal></entry>
>>          <entry><literal>BaseBackupThrottle</literal></entry>
>>
>> We need to move the entry of FdwXactResolution to right before
>> Hash/Batch/Allocating for alphabetical order.
>
>
> Agreed!
>>
>>
>> I've incorporated your changes I agreed with to my local branch and
>> will incorporate other changes after discussion.
>> I'll also do more
>> test and self-review and will submit the latest version patch.
>>
>
> Meanwhile, I found a couple of more small issues. One is a missing break statement
> in pgstat_get_wait_ipc(), and secondly fdwxact_relaunch_resolvers()
> could return an uninitialized value.
> I am attaching a small patch for these changes that can be applied on top of the
> existing patches.

Thank you for the patch!

I'm updating the patches because the current behavior in the error case
would not be good. For example, when an error occurs in the prepare
phase, prepared transactions are left as in-doubt transactions, and
these transactions are not handled by the resolver process. That means
a user could need to resolve these transactions manually after every
abort, which is not good. In the abort case, I think that prepared
transactions can be resolved by the backend itself, rather than
leaving them for the resolver. I'll submit the updated patch.

Regards,

--
Masahiko Sawada            http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
On Thu, 30 Apr 2020 at 20:43, Masahiko Sawada
<masahiko.sawada@2ndquadrant.com> wrote:
> In abort case, I think that prepared
> transactions can be resolved by the backend itself, rather than
> leaving them for the resolver. I'll submit the updated patch.
>

I've attached the latest version patch set which includes some changes
from the previous version:

* I've added regression tests that cover all types of FDW
implementations. There are three types of FDW: an FDW that doesn't
support any transaction APIs, an FDW that supports only commit and
rollback APIs, and an FDW that supports all (prepare, commit and
rollback) APIs. src/test/module/test_fdwxact contains those FDW
implementations for tests, and tests some cases where a transaction
reads/writes data on various types of foreign servers.

* Also, test_fdwxact has TAP tests that check failure cases. The test
FDW implementation has the ability to inject an error or panic into
the prepare or commit phase. Using it, the TAP tests check whether
distributed transactions can be committed or rolled back even in
failure cases.

* When foreign_twophase_commit = 'required', the transaction commit
fails if the transaction modified data on even one server not
supporting the prepare API. Previously, we used to ignore servers that
don't support any transaction API, but now we check them too, to
strictly require all involved foreign servers to support all
transaction APIs.

* The transaction resolver process resolves in-doubt transactions
automatically.

* Incorporated comments from Muhammad Usama.

Regards,

--
Masahiko Sawada            http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
On Thu, 30 Apr 2020 at 20:43, Masahiko Sawada
<masahiko.sawada@2ndquadrant.com> wrote:
>
> On Tue, 28 Apr 2020 at 19:37, Muhammad Usama <m.usama@gmail.com> wrote:
> >
> >
> >
> > On Wed, Apr 8, 2020 at 11:16 AM Masahiko Sawada <masahiko.sawada@2ndquadrant.com> wrote:
> >>
> >> On Fri, 27 Mar 2020 at 22:06, Muhammad Usama <m.usama@gmail.com> wrote:
> >> >
> >> > Hi Sawada San,
> >> >
> >> > I have been further reviewing and testing the transaction involving multiple server patches.
> >> > Overall the patches are working as expected bar a few important exceptions.
> >> > So as discussed over the call I have fixed the issues I found during the testing
> >> > and also rebased the patches with the current head of the master branch.
> >> > So can you please have a look at the attached updated patches.
> >>
> >> Thank you for reviewing and updating the patch!
> >>
> >> >
> >> > Below is the list of changes I have made on top of V18 patches.
> >> >
> >> > 1- In register_fdwxact(), As we are just storing the callback function pointers from
> >> > FdwRoutine in fdw_part structure, So I think we can avoid calling
> >> > GetFdwRoutineByServerId() in TopMemoryContext.
> >> > So I have moved the MemoryContextSwitch to TopMemoryContext after the
> >> > GetFdwRoutineByServerId() call.
> >>
> >> Agreed.
> >>
> >> >
> >> >
> >> > 2- If PrepareForeignTransaction functionality is not present in some FDW then
> >> > during the registration process we should only set the XACT_FLAGS_FDWNOPREPARE
> >> > transaction flag if the modified flag is also set for that server. As for the server that has
> >> > not done any data modification within the transaction we do not do two-phase commit anyway.
> >>
> >> Agreed.
> >>
> >> >
> >> > 3- I have moved the foreign_twophase_commit in sample file after
> >> > max_foreign_transaction_resolvers because the default value of max_foreign_transaction_resolvers
> >> > is 0 and enabling the foreign_twophase_commit produces an error with default
> >> > configuration parameter positioning in postgresql.conf
> >> > Also, foreign_twophase_commit configuration was missing the comments
> >> > about allowed values in the sample config file.
> >>
> >> Sounds good. Agreed.
> >>
> >> >
> >> > 4- Setting ForeignTwophaseCommitIsRequired in is_foreign_twophase_commit_required()
> >> > function does not seem to be the correct place. The reason being, even when
> >> > is_foreign_twophase_commit_required() returns true after setting ForeignTwophaseCommitIsRequired
> >> > to true, we could still end up not using the two-phase commit in the case when some server does
> >> > not support two-phase commit and foreign_twophase_commit is set to FOREIGN_TWOPHASE_COMMIT_PREFER
> >> > mode. So I have moved the ForeignTwophaseCommitIsRequired assignment to PreCommit_FdwXacts()
> >> > function after doing the prepare transaction.
> >>
> >> Agreed.
> >>
> >> >
> >> > 6- In prefer mode, we commit the transaction in single-phase if the server does not support
> >> > the two-phase commit. But instead of doing the single-phase commit right away,
> >> > IMHO the better way is to wait until all the two-phase transactions are successfully prepared
> >> > on servers that support the two-phase. Since an error during a "PREPARE" stage would
> >> > rollback the transaction and in that case, we would end up with committed transactions on
> >> > the server that lacks the support of the two-phase commit.
> >>
> >> When an error occurred before the local commit, a 2pc-unsupported
> >> server could be rolled back or committed depending on the error
> >> timing. On the other hand all 2pc-supported servers are always rolled
> >> back when an error occurred before the local commit. Therefore even if
> >> we change the order of COMMIT and PREPARE it is still possible that we
> >> will end up committing the part of 2pc-unsupported servers while
> >> rolling back others including 2pc-supported servers.
> >>
> >> I guess the motivation of your change is that since errors are likely
> >> to happen during executing PREPARE on foreign servers, we can minimize
> >> the possibility of rolling back 2pc-unsupported servers by deferring
> >> the commit of 2pc-unsupported server as much as possible. Is that
> >> right?
> >
> >
> > Yes, that is correct. The idea of doing the COMMIT on NON-2pc-supported servers
> > after all the PREPAREs are successful is to minimize the chances of partial commits.
> > And as you mentioned there will still be chances of getting a partial commit even with
> > this approach but the probability of that would be less than what it is with the
> > current sequence.
> >
> >
> >>
> >>
> >> > So I have modified the flow a little bit and instead of doing a one-phase commit right away
> >> > the servers that do not support a two-phase commit is added to another list and that list is
> >> > processed after once we have successfully prepared all the transactions on two-phase supported
> >> > foreign servers. Although this technique is also not bulletproof, still it is better than doing
> >> > the one-phase commits before doing the PREPAREs.
> >>
> >> Hmm the current logic seems complex. Maybe we can just reverse the
> >> order of COMMIT and PREPARE; do PREPARE on all 2pc-supported and
> >> modified servers first and then do COMMIT on others?
> >
> >
> > Agreed, seems reasonable.
> >>
> >>
> >> >
> >> > Also, I think we can improve on this one by throwing an error even in PREFER
> >> > mode if there is more than one server that had data modified within the transaction
> >> > and lacks the two-phase commit support.
> >> >
> >>
> >> IIUC the concept of PREFER mode is that the transaction uses 2pc only
> >> for 2pc-supported servers. IOW, even if the transaction modifies on a
> >> 2pc-unsupported server we can proceed with the commit if in PREFER
> >> mode, which cannot if in REQUIRED mode. What is the motivation of your
> >> above idea?
> >
> >
> > I was thinking that we could change the behavior of PREFER mode such that we only allow
> > to COMMIT the transaction if the transaction needs to do a single-phase commit on one
> > server only. That way we can ensure that we would never end up with partial commit.
> >
>
> I think it's good to avoid a partial commit by using your idea but if
> we want to avoid a partial commit we can use the 'required' mode,
> which requires all participant servers to support 2pc. We throw an
> error if participant servers include even one 2pc-unsupported server
> is modified within the transaction. Of course if the participant node
> is only one 2pc-unsupported server it can use 1pc even in the
> 'required' mode.
>
> > One Idea in this regards would be to switch the local transaction to commit using 2pc
> > if there is a total of only one foreign server that does not support the 2pc in the transaction,
> > ensuring that 1-pc commit servers should always be less than or equal to 1. and if there are more
> > than one foreign server requires 1-pc then we just throw an error.
>
> I might be missing your point but I suppose this idea is to do
> something like the following?
>
> 1. prepare the local transaction
> 2. commit the foreign transaction on 2pc-unsupported server
> 3. commit the prepared local transaction
>
> >
> > However having said that, I am not 100% sure if its a good or an acceptable Idea, and
> > I am okay with continuing with the current behavior of PREFER mode if we put it in the
> > document that this mode can cause a partial commit.
>
> There will three types of servers: (a) a server doesn't support any
> transaction API, (b) a server supports only commit and rollback API
> and (c) a server supports all APIs (commit, rollback and prepare).
> Currently postgres transaction manager manages only server-(b) and
> server-(c), adds them to FdwXactParticipants. I'm considering changing
> the code so that it adds also server-(a) to FdwXactParticipants, in
> order to track the number of server-(a) involved in the transaction.
> But it doesn't insert FdwXact entry for it, and manage transactions on
> these servers.
>
> The reason is this; if we want to have the 'required' mode strictly
> require all participant servers to support 2pc, we should use 2pc when
> (# of server-(a) + # of server-(b) + # of server-(c)) >= 2. But since
> currently we just track the modification on a server-(a) by a flag we
> cannot handle the case where two server-(a) are modified in the
> transaction. On the other hand, if we don't consider server-(a) the
> transaction could end up with a partial commit when a server-(a)
> participates in the transaction. Therefore I'm thinking of the above
> change so that the transaction manager can ensure that a partial
> commit doesn't happen in the 'required' mode. What do you think?
>
> >
> >>
> >> > 7- Added a pfree() and list_free_deep() in PreCommit_FdwXacts() to reclaim the
> >> > memory if fdw_part is removed from the list
> >>
> >> I think at the end of the transaction we free entries of
> >> FdwXactParticipants list and set FdwXactParticipants to NIL. Why do we
> >> need to do that in PreCommit_FdwXacts()?
> >
> >
> > Correct me if I am wrong, The fdw_part structures are created in TopMemoryContext
> > and if that fdw_part structure is removed from the list at pre_commit stage
> > (because we did 1-PC COMMIT on it) then it would leak memory.
>
> The fdw_part structures are created in TopTransactionContext so these
> are freed at the end of the transaction.
>
> >
> >>
> >> >
> >> > 8- The function FdwXactWaitToBeResolved() was bailing out as soon as it finds
> >> > (FdwXactParticipants == NIL). The problem with that was in the case of
> >> > "COMMIT/ROLLBACK PREPARED" we always get FdwXactParticipants = NIL and
> >> > effectively the foreign prepared transactions(if any) associated with locally
> >> > prepared transactions were never getting resolved automatically.
> >> >
> >> >
> >> > postgres=# BEGIN;
> >> > BEGIN
> >> > INSERT INTO test_local VALUES ( 2, 'TWO');
> >> > INSERT 0 1
> >> > INSERT INTO test_foreign_s1 VALUES ( 2, 'TWO');
> >> > INSERT 0 1
> >> > INSERT INTO test_foreign_s2 VALUES ( 2, 'TWO');
> >> > INSERT 0 1
> >> > postgres=*# PREPARE TRANSACTION 'local_prepared';
> >> > PREPARE TRANSACTION
> >> >
> >> > postgres=# select * from pg_foreign_xacts ;
> >> > dbid | xid | serverid | userid | status | in_doubt | identifier
> >> > -------+-----+----------+--------+----------+----------+----------------------------
> >> > 12929 | 515 | 16389 | 10 | prepared | f | fx_1339567411_515_16389_10
> >> > 12929 | 515 | 16391 | 10 | prepared | f | fx_1963224020_515_16391_10
> >> > (2 rows)
> >> >
> >> > -- Now commit the prepared transaction
> >> >
> >> > postgres=# COMMIT PREPARED 'local_prepared';
> >> >
> >> > COMMIT PREPARED
> >> >
> >> > --Foreign prepared transactions associated with 'local_prepared' not resolved
> >> >
> >> > postgres=#
> >> >
> >> > postgres=# select * from pg_foreign_xacts ;
> >> > dbid | xid | serverid | userid | status | in_doubt | identifier
> >> > -------+-----+----------+--------+----------+----------+----------------------------
> >> > 12929 | 515 | 16389 | 10 | prepared | f | fx_1339567411_515_16389_10
> >> > 12929 | 515 | 16391 | 10 | prepared | f | fx_1963224020_515_16391_10
> >> > (2 rows)
> >> >
> >> >
> >> > So to fix this, in the case of a two-phase transaction the function checks for the existence
> >> > of associated foreign prepared transactions before bailing out.
> >> >
> >>
> >> Good catch. But looking at your change, we should not accept the case
> >> where FdwXactParticipants == NIL but TwoPhaseExists(wait_xid) ==
> >> false.
> >>
> >> if (FdwXactParticipants == NIL)
> >> {
> >>     /*
> >>      * If we are here because of COMMIT/ROLLBACK PREPARED then the
> >>      * FdwXactParticipants list would be empty, so we need to
> >>      * check whether any foreign prepared transactions exist
> >>      * for this prepared transaction.
> >>      */
> >>     if (TwoPhaseExists(wait_xid))
> >>     {
> >>         List *foreign_trans = NIL;
> >>
> >>         foreign_trans = get_fdwxacts(MyDatabaseId,
> >>                                      wait_xid, InvalidOid, InvalidOid,
> >>                                      false, false, true);
> >>
> >>         if (foreign_trans == NIL)
> >>             return;
> >>         list_free(foreign_trans);
> >>     }
> >> }
> >>
> >
> > Sorry, my bad; it was a mistake on my part. We should just return from the function when
> > FdwXactParticipants == NIL but TwoPhaseExists(wait_xid) == false.
> >
> > if (TwoPhaseExists(wait_xid))
> > {
> >     List *foreign_trans = NIL;
> >
> >     foreign_trans = get_fdwxacts(MyDatabaseId, wait_xid, InvalidOid, InvalidOid,
> >                                  false, false, true);
> >
> >     if (foreign_trans == NIL)
> >         return;
> >     list_free(foreign_trans);
> > }
> > else
> >     return;
> >
> >>
> >> > 9- In function XlogReadFdwXactData() XLogBeginRead call was missing before XLogReadRecord()
> >> > that was causing the crash during recovery.
> >>
> >> Agreed.
> >>
> >> >
> >> > 10- incorporated set_ps_display() signature change.
> >>
> >> Thanks.
> >>
> >> Regarding other changes you did in v19 patch, I have some comments:
> >>
> >> 1.
> >> + ereport(LOG,
> >> +         (errmsg("trying to %s the foreign transaction associated with transaction %u on server %u",
> >> +                 fdwxact->status == FDWXACT_STATUS_COMMITTING ? "COMMIT" : "ABORT",
> >> +                 fdwxact->local_xid, fdwxact->serverid)));
> >> +
> >>
> >> Why do we need to emit LOG message in pg_resolve_foreign_xact() SQL function?
> >
> >
> > That change was not intended to get into the patch file. I had done it during testing to
> > quickly get info on which way the transaction is going to be resolved.
> >
> >>
> >> 2.
> >> diff --git a/src/bin/pg_waldump/fdwxactdesc.c b/src/bin/pg_waldump/fdwxactdesc.c
> >> deleted file mode 120000
> >> index ce8c21880c..0000000000
> >> --- a/src/bin/pg_waldump/fdwxactdesc.c
> >> +++ /dev/null
> >> @@ -1 +0,0 @@
> >> -../../../src/backend/access/rmgrdesc/fdwxactdesc.c
> >> \ No newline at end of file
> >> diff --git a/src/bin/pg_waldump/fdwxactdesc.c b/src/bin/pg_waldump/fdwxactdesc.c
> >> new file mode 100644
> >> index 0000000000..ce8c21880c
> >> --- /dev/null
> >> +++ b/src/bin/pg_waldump/fdwxactdesc.c
> >> @@ -0,0 +1 @@
> >> +../../../src/backend/access/rmgrdesc/fdwxactdesc.c
> >>
> >> We need to remove src/bin/pg_waldump/fdwxactdesc.c from the patch.
> >
> >
> > Again sorry! that was an oversight on my part.
> >
> >>
> >> 3.
> >> --- a/doc/src/sgml/monitoring.sgml
> >> +++ b/doc/src/sgml/monitoring.sgml
> >> @@ -1526,14 +1526,14 @@ postgres 27093 0.0 0.0 30096 2752 ?
> >> Ss 11:34 0:00 postgres: ser
> >> <entry><literal>SafeSnapshot</literal></entry>
> >> <entry>Waiting for a snapshot for a <literal>READ ONLY
> >> DEFERRABLE</literal> transaction.</entry>
> >> </row>
> >> - <row>
> >> - <entry><literal>SyncRep</literal></entry>
> >> - <entry>Waiting for confirmation from remote server during
> >> synchronous replication.</entry>
> >> - </row>
> >> <row>
> >> <entry><literal>FdwXactResolution</literal></entry>
> >> <entry>Waiting for all foreign transaction participants to
> >> be resolved during atomic commit among foreign servers.</entry>
> >> </row>
> >> + <row>
> >> + <entry><literal>SyncRep</literal></entry>
> >> + <entry>Waiting for confirmation from remote server during
> >> synchronous replication.</entry>
> >> + </row>
> >> <row>
> >> <entry morerows="4"><literal>Timeout</literal></entry>
> >> <entry><literal>BaseBackupThrottle</literal></entry>
> >>
> >> We need to move the entry of FdwXactResolution to right before
> >> Hash/Batch/Allocating for alphabetical order.
> >
> >
> > Agreed!
> >>
> >>
> >> I've incorporated the changes of yours that I agreed with into my local branch and
> >> will incorporate the other changes after discussion. I'll also do more
> >> testing and self-review and will submit the latest version patch.
> >>
> >
> > Meanwhile, I found a couple more small issues. One is a missing break statement
> > in pgstat_get_wait_ipc(), and secondly fdwxact_relaunch_resolvers()
> > could return an uninitialized value.
> > I am attaching a small patch with these changes that can be applied on top of the existing
> > patches.
>
> Thank you for the patch!
>
> I'm updating the patches because the current behavior in the error case would
> not be good. For example, when an error occurs in the prepare phase,
> prepared transactions are left as in-doubt transactions, and these
> transactions are not handled by the resolver process. That means a
> user could need to resolve these transactions manually on every abort,
> which is not good. In the abort case, I think that prepared
> transactions can be resolved by the backend itself, rather than
> leaving them for the resolver. I'll submit the updated patch.
>
I've attached the latest version patch set which includes some changes
from the previous version:
* I've added regression tests that cover all types of FDW
implementations. There are three types of FDW: one that doesn't support
any transaction APIs, one that supports only the commit and rollback
APIs, and one that supports all APIs (prepare, commit and rollback).
src/test/module/test_fdwxact contains those FDW implementations for
the tests, and tests some cases where a transaction reads/writes data on
various types of foreign servers.
* Also, test_fdwxact has TAP tests that check failure cases. The test
FDW implementation has the ability to inject an error or panic into the
prepare or commit phase. Using it, the TAP tests check whether distributed
transactions can be committed or rolled back even in failure cases.
* When foreign_twophase_commit = 'required', the transaction commit
fails if the transaction modified data on even one server that doesn't
support the prepare API. Previously, we ignored servers that don't
support any transaction API, but now we check them in order to strictly
require all involved foreign servers to support all transaction APIs.
* The transaction resolver process resolves in-doubt transactions automatically.
* Incorporated comments from Muhammad Usama.
Regards,
--
Masahiko Sawada http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
I am attaching a patch that I have generated on top of your V20.
On Fri, 15 May 2020 at 03:08, Muhammad Usama <m.usama@gmail.com> wrote:
>
>
> Hi Sawada,
>
> I have just done some review and testing of the patches and have
> a couple of comments.
Thank you for reviewing!
>
> 1- IMHO the PREPARE TRANSACTION should always use 2PC even
> when the transaction has operated on a single foreign server regardless
> of foreign_twophase_commit setting, and throw an error otherwise when
> 2PC is not available on any of the data-modified servers.
>
> For example, consider the case
>
> BEGIN;
> INSERT INTO ft_2pc_1 VALUES(1);
> PREPARE TRANSACTION 'global_x1';
>
> Here since we are preparing the local transaction so we should also prepare
> the transaction on the foreign server even if the transaction has modified only
> one foreign table.
>
> What do you think?
Good catch, and I agree with you. The transaction should fail if it
opened a transaction on a server that doesn't support 2PC, regardless of
foreign_twophase_commit. And I think we should prepare the transaction
on a foreign server even if it didn't modify any data on it.
>
> Also without this change, the above test case produces an assertion failure
> with your patches.
>
> 2- when deciding if the two-phase commit is required or not in
> FOREIGN_TWOPHASE_COMMIT_PREFER mode we should use
> 2PC when we have at least one server capable of doing that.
>
> i.e
>
> For FOREIGN_TWOPHASE_COMMIT_PREFER case in
> checkForeignTwophaseCommitRequired() function I think
> the condition should be
>
> need_twophase_commit = (nserverstwophase >= 1);
> instead of
> need_twophase_commit = (nserverstwophase >= 2);
>
Hmm I might be missing your point but it seems to me that you want to
use two-phase commit even in the case where a transaction modified
data on only one server. Can't we commit distributed transaction
atomically even using one-phase commit in that case?
Regards,
--
Masahiko Sawada http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
On Fri, 15 May 2020 at 13:26, Muhammad Usama <m.usama@gmail.com> wrote:
>
>
>
> On Fri, May 15, 2020 at 7:20 AM Masahiko Sawada <masahiko.sawada@2ndquadrant.com> wrote:
>>
>> On Fri, 15 May 2020 at 03:08, Muhammad Usama <m.usama@gmail.com> wrote:
>> >
>> >
>> > Hi Sawada,
>> >
>> > I have just done some review and testing of the patches and have
>> > a couple of comments.
>>
>> Thank you for reviewing!
>>
>> >
>> > 1- IMHO the PREPARE TRANSACTION should always use 2PC even
>> > when the transaction has operated on a single foreign server regardless
>> > of foreign_twophase_commit setting, and throw an error otherwise when
>> > 2PC is not available on any of the data-modified servers.
>> >
>> > For example, consider the case
>> >
>> > BEGIN;
>> > INSERT INTO ft_2pc_1 VALUES(1);
>> > PREPARE TRANSACTION 'global_x1';
>> >
>> > Here since we are preparing the local transaction so we should also prepare
>> > the transaction on the foreign server even if the transaction has modified only
>> > one foreign table.
>> >
>> > What do you think?
>>
>> Good catch and I agree with you. The transaction should fail if it
>> opened a transaction on a 2pc-no-support server regardless of
>> foreign_twophase_commit. And I think we should prepare a transaction
>> on a foreign server even if it didn't modify any data on that.
>>
>> >
>> > Also without this change, the above test case produces an assertion failure
>> > with your patches.
>> >
>> > 2- when deciding if the two-phase commit is required or not in
>> > FOREIGN_TWOPHASE_COMMIT_PREFER mode we should use
>> > 2PC when we have at least one server capable of doing that.
>> >
>> > i.e
>> >
>> > For FOREIGN_TWOPHASE_COMMIT_PREFER case in
>> > checkForeignTwophaseCommitRequired() function I think
>> > the condition should be
>> >
>> > need_twophase_commit = (nserverstwophase >= 1);
>> > instead of
>> > need_twophase_commit = (nserverstwophase >= 2);
>> >
>>
>> Hmm I might be missing your point but it seems to me that you want to
>> use two-phase commit even in the case where a transaction modified
>> data on only one server. Can't we commit distributed transaction
>> atomically even using one-phase commit in that case?
>>
>
> I think you are confusing between nserverstwophase and nserverswritten.
>
> need_twophase_commit = (nserverstwophase >= 1) would mean
> use two-phase commit if at least one server exists in the list that is
> capable of doing 2PC
>
> For the case when the transaction modified data on only one server we
> already exits the function indicating no two-phase required
>
> if (nserverswritten <= 1)
> return false;
>
Thank you for your explanation. If the transaction modified two
servers that don't support 2PC and one server that supports 2PC, I
think we don't want to use 2PC even in the 'prefer' case. Because even
if we use 2PC in that case, it's still possible to hit the atomic
commit problem. For example, if we fail to commit a transaction after
committing other transactions on the servers that don't support 2PC,
we cannot roll back the already-committed transactions.
On the other hand, in the 'prefer' case, if the transaction also modified
the local data, we need to use 2PC even if it modified data on only
one foreign server that supports 2PC. But the current code doesn't
work correctly in that case for now. Probably we also need the following
change:
@@ -540,7 +540,10 @@ checkForeignTwophaseCommitRequired(void)

     /* Did we modify the local non-temporary data? */
     if ((MyXactFlags & XACT_FLAGS_WROTENONTEMPREL) != 0)
+    {
         nserverswritten++;
+        nserverstwophase++;
+    }
Regards,
--
Masahiko Sawada http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
On Fri, 15 May 2020 at 19:06, Muhammad Usama <m.usama@gmail.com> wrote:
>
>
>
> On Fri, May 15, 2020 at 9:59 AM Masahiko Sawada <masahiko.sawada@2ndquadrant.com> wrote:
>>
>> On Fri, 15 May 2020 at 13:26, Muhammad Usama <m.usama@gmail.com> wrote:
>> >
>> >
>> >
>> > On Fri, May 15, 2020 at 7:20 AM Masahiko Sawada <masahiko.sawada@2ndquadrant.com> wrote:
>> >>
>> >> On Fri, 15 May 2020 at 03:08, Muhammad Usama <m.usama@gmail.com> wrote:
>> >> >
>> >> >
>> >> > Hi Sawada,
>> >> >
>> >> > I have just done some review and testing of the patches and have
>> >> > a couple of comments.
>> >>
>> >> Thank you for reviewing!
>> >>
>> >> >
>> >> > 1- IMHO the PREPARE TRANSACTION should always use 2PC even
>> >> > when the transaction has operated on a single foreign server regardless
>> >> > of foreign_twophase_commit setting, and throw an error otherwise when
>> >> > 2PC is not available on any of the data-modified servers.
>> >> >
>> >> > For example, consider the case
>> >> >
>> >> > BEGIN;
>> >> > INSERT INTO ft_2pc_1 VALUES(1);
>> >> > PREPARE TRANSACTION 'global_x1';
>> >> >
>> >> > Here since we are preparing the local transaction so we should also prepare
>> >> > the transaction on the foreign server even if the transaction has modified only
>> >> > one foreign table.
>> >> >
>> >> > What do you think?
>> >>
>> >> Good catch and I agree with you. The transaction should fail if it
>> >> opened a transaction on a 2pc-no-support server regardless of
>> >> foreign_twophase_commit. And I think we should prepare a transaction
>> >> on a foreign server even if it didn't modify any data on that.
>> >>
>> >> >
>> >> > Also without this change, the above test case produces an assertion failure
>> >> > with your patches.
>> >> >
>> >> > 2- when deciding if the two-phase commit is required or not in
>> >> > FOREIGN_TWOPHASE_COMMIT_PREFER mode we should use
>> >> > 2PC when we have at least one server capable of doing that.
>> >> >
>> >> > i.e
>> >> >
>> >> > For FOREIGN_TWOPHASE_COMMIT_PREFER case in
>> >> > checkForeignTwophaseCommitRequired() function I think
>> >> > the condition should be
>> >> >
>> >> > need_twophase_commit = (nserverstwophase >= 1);
>> >> > instead of
>> >> > need_twophase_commit = (nserverstwophase >= 2);
>> >> >
>> >>
>> >> Hmm I might be missing your point but it seems to me that you want to
>> >> use two-phase commit even in the case where a transaction modified
>> >> data on only one server. Can't we commit distributed transaction
>> >> atomically even using one-phase commit in that case?
>> >>
>> >
>> > I think you are confusing between nserverstwophase and nserverswritten.
>> >
>> > need_twophase_commit = (nserverstwophase >= 1) would mean
>> > use two-phase commit if at least one server exists in the list that is
>> > capable of doing 2PC
>> >
>> > For the case when the transaction modified data on only one server we
>> > already exits the function indicating no two-phase required
>> >
>> > if (nserverswritten <= 1)
>> > return false;
>> >
>>
>> Thank you for your explanation. If the transaction modified two
>> servers that don't support 2pc and one server that supports 2pc I
>> think we don't want to use 2pc even in 'prefer' case. Because even if
>> we use 2pc in that case, it's still possible to have the atomic commit
>> problem. For example, if we failed to commit a transaction after
>> committing other transactions on the server that doesn't support 2pc
>> we cannot rollback the already-committed transaction.
>
>
> Yes, that is true, And I think the 'prefer' mode will always have a corner case
> no matter what. But the thing is we can reduce the probability of hitting
> an atomic commit problem by ensuring to use 2PC whenever possible.
>
> For instance as in your example scenario where a transaction modified
> two servers that don't support 2PC and one server that supports it. let us
> analyze both scenarios.
>
> If we use 2PC on the server that supports it, then the probability of hitting
> a problem would be 1/3 = 0.33, because there is only one corner-case
> scenario: failing to commit on the third server.
> As the first server (the 2PC-supporting one) would be using prepared
> transactions, there is no problem there. If the second server (non-2PC)
> fails to commit, there is still no problem, as we can roll back the prepared
> transaction on the first server. The only issue would happen when we fail
> to commit on the third server, because we have already committed
> on the second server and there is no way to undo that.
>
>
> Now consider the other possibility if we do not use the 2PC in that
> case (as you mentioned), then the probability of hitting the problem
> would be 2/3 = 0.66. because now commit failure on either second or
> third server will land us in an atomic-commit-problem.
>
> So, INMO using the 2PC whenever available with 'prefer' mode
> should be the way to go.
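
Usama's 1/3 vs. 2/3 figures can be reproduced with a toy enumeration. This is only a sketch under the assumptions stated in the thread: three written servers, only server 0 supports 2PC, exactly one commit fails, and a failed COMMIT PREPARED is retryable and therefore not counted as a lost transaction.

```c
#include <assert.h>
#include <stdbool.h>

/*
 * Returns true if the global outcome is atomic (all-or-nothing) when
 * server `failing` is the one whose commit fails.  Server 0 is the only
 * 2PC-capable server; servers 1 and 2 use plain one-phase commit.
 */
static bool
atomic_outcome(bool use_2pc, int failing)
{
    if (use_2pc)
    {
        /* PREPARE on server 0 first, then commit servers 1 and 2 in order */
        bool committed1 = false;
        for (int s = 1; s <= 2; s++)
        {
            if (s == failing)
            {
                /*
                 * Commit failed here: roll back the prepared transaction
                 * on server 0.  The outcome is atomic only if nothing
                 * was committed before this point.
                 */
                return !committed1;
            }
            if (s == 1)
                committed1 = true;
        }
        /* COMMIT PREPARED on server 0 is retryable; treat it as success */
        return true;
    }
    else
    {
        /* No 2PC anywhere: plain commits in order 0, 1, 2.  The outcome
         * is atomic only if the very first commit is the one that fails. */
        return failing == 0;
    }
}

/* Count non-atomic outcomes over the three possible failure positions */
static int
nonatomic_cases(bool use_2pc)
{
    int bad = 0;
    for (int failing = 0; failing < 3; failing++)
        if (!atomic_outcome(use_2pc, failing))
            bad++;
    return bad;
}
```

Under this model, using 2PC on the one capable server yields one bad case out of three, and skipping 2PC yields two, matching the probabilities argued above.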
My understanding of 'prefer' mode is that even if a distributed
transaction modified data on several types of servers, we can ensure
data consistency only among the local server and the foreign servers
that support 2pc. It doesn't ensure anything for other servers that
don't support 2pc. Therefore we use 2pc if the transaction modifies
data on two or more servers among the local node and the foreign
servers that support 2pc.
I understand your argument that using 2pc in that case decreases the
possibility of hitting a problem, but one point we need to consider is
that 2pc has a very high cost. I think most users basically don't want
to use 2pc unless it is necessary. Please note that it might not work
as the user expected, because users cannot specify the commit order and
particular servers might be unstable. I'm not sure that users want to
pay high costs under such conditions. If we want to decrease that
possibility by using 2pc as much as possible, I think it can be yet
another mode so that the user can choose the trade-off.
>
>>
>> On the other hand, in 'prefer' case, if the transaction also modified
>> the local data, we need to use 2pc even if it modified data on only
>> one foreign server that supports 2pc. But the current code doesn't
>> work fine in that case for now. Probably we also need the following
>> change:
>>
>> @@ -540,7 +540,10 @@ checkForeignTwophaseCommitRequired(void)
>>
>> /* Did we modify the local non-temporary data? */
>> if ((MyXactFlags & XACT_FLAGS_WROTENONTEMPREL) != 0)
>> + {
>> nserverswritten++;
>> + nserverstwophase++;
>> + }
>>
>
> I agree with the part that if the transaction also modifies the local data
> then 2PC should be used.
> Though the change you suggested [+ nserverstwophase++;]
> would serve the purpose and deliver the same results, I think a
> better way would be to change the need_twophase_commit condition for
> prefer mode.
>
>
> * In 'prefer' case, we prepare transactions on only servers that
> * capable of two-phase commit.
> */
> - need_twophase_commit = (nserverstwophase >= 2);
> + need_twophase_commit = (nserverstwophase >= 1);
> }
>
>
> The reason I am saying that is that currently we do not use 2PC on the local server
> in the case of distributed transactions, so we should also not count the local server
> as one of the servers that would be performing 2PC.
> Also I feel the change need_twophase_commit = (nserverstwophase >= 1)
> looks more in line with the definition of our 'prefer' mode algorithm.
>
> Do you see an issue with this change?
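
The two proposed tweaks can be compared with a small model of the decision function. The names mirror the patch, but the logic here is a sketch reconstructed from the diffs quoted in this thread, not the actual patch code.

```c
#include <assert.h>
#include <stdbool.h>

/*
 * Simplified model of checkForeignTwophaseCommitRequired() in 'prefer'
 * mode (reconstructed from this thread; an illustration only):
 *
 *   nserverswritten  - foreign servers the transaction wrote to
 *   nserverstwophase - written foreign servers that support 2PC
 *   local_write      - transaction modified local non-temporary data
 */
static bool
need_twophase_commit_prefer(int nserverswritten, int nserverstwophase,
                            bool local_write)
{
    if (local_write)
    {
        nserverswritten++;
        nserverstwophase++;     /* Sawada's fix: the local node can do 2PC */
    }

    /* A single written server commits atomically with one-phase commit */
    if (nserverswritten <= 1)
        return false;

    /*
     * Sawada's condition: require at least two 2PC-capable participants.
     * Usama's alternative would instead use (nserverstwophase >= 1),
     * triggering 2PC whenever any capable server is involved.
     */
    return nserverstwophase >= 2;
}
```

With this model, the disputed case (three written foreign servers, one of them 2PC-capable, no local write) uses 2PC under Usama's condition but not under Sawada's.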
I think that with my change we will use 2pc in the case where a
transaction modified data on the local node and one server that
supports 2pc. But with your change, we will use 2pc in more cases than
just the one where a transaction modifies the local node and one
2pc-capable server. This would fit the definition of 'prefer' you
described, but it's still unclear to me whether it's better to make
'prefer' mode behave that way, given that we have three values:
'required', 'prefer' and 'disabled'.
Regards,
--
Masahiko Sawada http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
On Sat, 16 May 2020 at 00:54, Muhammad Usama <m.usama@gmail.com> wrote: > > > > On Fri, May 15, 2020 at 7:52 PM Masahiko Sawada <masahiko.sawada@2ndquadrant.com> wrote: >> >> On Fri, 15 May 2020 at 19:06, Muhammad Usama <m.usama@gmail.com> wrote: >> > >> > >> > >> > On Fri, May 15, 2020 at 9:59 AM Masahiko Sawada <masahiko.sawada@2ndquadrant.com> wrote: >> >> >> >> On Fri, 15 May 2020 at 13:26, Muhammad Usama <m.usama@gmail.com> wrote: >> >> > >> >> > >> >> > >> >> > On Fri, May 15, 2020 at 7:20 AM Masahiko Sawada <masahiko.sawada@2ndquadrant.com> wrote: >> >> >> >> >> >> On Fri, 15 May 2020 at 03:08, Muhammad Usama <m.usama@gmail.com> wrote: >> >> >> > >> >> >> > >> >> >> > Hi Sawada, >> >> >> > >> >> >> > I have just done some review and testing of the patches and have >> >> >> > a couple of comments. >> >> >> >> >> >> Thank you for reviewing! >> >> >> >> >> >> > >> >> >> > 1- IMHO the PREPARE TRANSACTION should always use 2PC even >> >> >> > when the transaction has operated on a single foreign server regardless >> >> >> > of foreign_twophase_commit setting, and throw an error otherwise when >> >> >> > 2PC is not available on any of the data-modified servers. >> >> >> > >> >> >> > For example, consider the case >> >> >> > >> >> >> > BEGIN; >> >> >> > INSERT INTO ft_2pc_1 VALUES(1); >> >> >> > PREPARE TRANSACTION 'global_x1'; >> >> >> > >> >> >> > Here since we are preparing the local transaction so we should also prepare >> >> >> > the transaction on the foreign server even if the transaction has modified only >> >> >> > one foreign table. >> >> >> > >> >> >> > What do you think? >> >> >> >> >> >> Good catch and I agree with you. The transaction should fail if it >> >> >> opened a transaction on a 2pc-no-support server regardless of >> >> >> foreign_twophase_commit. And I think we should prepare a transaction >> >> >> on a foreign server even if it didn't modify any data on that. 
>> >> >> >> >> >> > >> >> >> > Also without this change, the above test case produces an assertion failure >> >> >> > with your patches. >> >> >> > >> >> >> > 2- when deciding if the two-phase commit is required or not in >> >> >> > FOREIGN_TWOPHASE_COMMIT_PREFER mode we should use >> >> >> > 2PC when we have at least one server capable of doing that. >> >> >> > >> >> >> > i.e >> >> >> > >> >> >> > For FOREIGN_TWOPHASE_COMMIT_PREFER case in >> >> >> > checkForeignTwophaseCommitRequired() function I think >> >> >> > the condition should be >> >> >> > >> >> >> > need_twophase_commit = (nserverstwophase >= 1); >> >> >> > instead of >> >> >> > need_twophase_commit = (nserverstwophase >= 2); >> >> >> > >> >> >> >> >> >> Hmm I might be missing your point but it seems to me that you want to >> >> >> use two-phase commit even in the case where a transaction modified >> >> >> data on only one server. Can't we commit distributed transaction >> >> >> atomically even using one-phase commit in that case? >> >> >> >> >> > >> >> > I think you are confusing between nserverstwophase and nserverswritten. >> >> > >> >> > need_twophase_commit = (nserverstwophase >= 1) would mean >> >> > use two-phase commit if at least one server exists in the list that is >> >> > capable of doing 2PC >> >> > >> >> > For the case when the transaction modified data on only one server we >> >> > already exits the function indicating no two-phase required >> >> > >> >> > if (nserverswritten <= 1) >> >> > return false; >> >> > >> >> >> >> Thank you for your explanation. If the transaction modified two >> >> servers that don't' support 2pc and one server that supports 2pc I >> >> think we don't want to use 2pc even in 'prefer' case. Because even if >> >> we use 2pc in that case, it's still possible to have the atomic commit >> >> problem. 
For example, if we failed to commit a transaction after >> >> committing other transactions on the server that doesn't support 2pc >> >> we cannot rollback the already-committed transaction. >> > >> > >> > Yes, that is true, And I think the 'prefer' mode will always have a corner case >> > no matter what. But the thing is we can reduce the probability of hitting >> > an atomic commit problem by ensuring to use 2PC whenever possible. >> > >> > For instance as in your example scenario where a transaction modified >> > two servers that don't support 2PC and one server that supports it. let us >> > analyze both scenarios. >> > >> > If we use 2PC on the server that supports it then the probability of hitting >> > a problem would be 1/3 = 0.33. because there is only one corner case >> > scenario in that case. which would be if we fail to commit the third server >> > As the first server (2PC supported one) would be using prepared >> > transactions so no problem there. The second server (NON-2PC support) >> > if failed to commit then, still no problem as we can rollback the prepared >> > transaction on the first server. The only issue would happen when we fail >> > to commit on the third server because we have already committed >> > on the second server and there is no way to undo that. >> > >> > >> > Now consider the other possibility if we do not use the 2PC in that >> > case (as you mentioned), then the probability of hitting the problem >> > would be 2/3 = 0.66. because now commit failure on either second or >> > third server will land us in an atomic-commit-problem. >> > >> > So, INMO using the 2PC whenever available with 'prefer' mode >> > should be the way to go. >> >> My understanding of 'prefer' mode is that even if a distributed >> transaction modified data on several types of server we can ensure to >> keep data consistent among only the local server and foreign servers >> that support 2pc. It doesn't ensure anything for other servers that >> don't support 2pc. 
Therefore we use 2pc if the transaction modifies >> data on two or more servers that either the local node or servers that >> support 2pc. >> >> I understand your argument that using 2pc in that case the possibility >> of hitting a problem can decrease but one point we need to consider is >> 2pc is very high cost. I think basically most users don’t want to use >> 2pc as much as possible. Please note that it might not work as the >> user expected because users cannot specify the commit order and >> particular servers might be unstable. I'm not sure that users want to >> pay high costs under such conditions. If we want to decrease that >> possibility by using 2pc as much as possible, I think it can be yet >> another mode so that the user can choose the trade-off. >> >> > >> >> >> >> On the other hand, in 'prefer' case, if the transaction also modified >> >> the local data, we need to use 2pc even if it modified data on only >> >> one foreign server that supports 2pc. But the current code doesn't >> >> work fine in that case for now. Probably we also need the following >> >> change: >> >> >> >> @@ -540,7 +540,10 @@ checkForeignTwophaseCommitRequired(void) >> >> >> >> /* Did we modify the local non-temporary data? */ >> >> if ((MyXactFlags & XACT_FLAGS_WROTENONTEMPREL) != 0) >> >> + { >> >> nserverswritten++; >> >> + nserverstwophase++; >> >> + } >> >> >> > >> > I agree with the part that if the transaction also modifies the local data >> > then the 2PC should be used. >> > Though the change you suggested [+ nserverstwophase++;] >> > would server the purpose and deliver the same results but I think a >> > better way would be to change need_twophase_commit condition for >> > prefer mode. >> > >> > >> > * In 'prefer' case, we prepare transactions on only servers that >> > * capable of two-phase commit. >> > */ >> > - need_twophase_commit = (nserverstwophase >= 2); >> > + need_twophase_commit = (nserverstwophase >= 1); >> > } >> > >> > >> > The reason I am saying that is. 
Currently, we do not use 2PC on the local server >> > in case of distributed transactions, so we should also not count the local server >> > as one (servers that would be performing the 2PC). >> > Also I feel the change need_twophase_commit = (nserverstwophase >= 1) >> > looks more in line with the definition of our 'prefer' mode algorithm. >> > >> > Do you see an issue with this change? >> >> I think that with my change we will use 2pc in the case where a >> transaction modified data on the local node and one server that >> supports 2pc. But with your change, we will use 2pc in more cases, in >> addition to the case where a transaction modifies the local and one >> 2pc-support server. This would fit the definition of 'prefer' you >> described but it's still unclear to me that it's better to make >> 'prefer' mode behave so if we have three values: 'required', 'prefer' >> and 'disabled'. >> > > Thanks for the detailed explanation, now I have a better understanding of the > reasons why we were going for a different solution to the problem. > You are right my understanding of 'prefer' mode is we must use 2PC as much > as possible, and reason for that was the world prefer as per my understanding > means "it's more desirable/better to use than another or others" > So the way I understood the FOREIGN_TWOPHASE_COMMIT_PREFER > was that we would use 2PC in the maximum possible of cases, and the user > would already have the expectation that 2PC is more expensive than 1PC. > I think that the current three values are useful for users. The ‘required’ mode is used when users want to ensure all writes involved with the transaction are committed atomically. That being said, as some FDW plugin might not support the prepare API we cannot force users to use this mode all the time when using atomic commit. Therefore ‘prefer’ mode would be useful for this case. Both modes use 2pc only when it's required for atomic commit. 
So what do you think of my idea of adding the behavior you proposed as
another new mode? As it's better to keep the first version as simple as
possible, it might not be added to the first version, but this behavior
might be useful in some cases.

I've attached a new version patch that incorporates some bug fixes
reported by Muhammad. Please review them.

Regards,

--
Masahiko Sawada http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Attachment
On Tue, May 19, 2020 at 12:33 PM Masahiko Sawada <masahiko.sawada@2ndquadrant.com> wrote:
>
> I think that the current three values are useful for users. The
> ‘required’ mode is used when users want to ensure all writes involved
> with the transaction are committed atomically. That being said, as
> some FDW plugin might not support the prepare API we cannot force
> users to use this mode all the time when using atomic commit.
> Therefore ‘prefer’ mode would be useful for this case. Both modes use
> 2pc only when it's required for atomic commit.
>
> So what do you think my idea that adding the behavior you proposed as
> another new mode? As it’s better to keep the first version simple as
> much as possible
>

If the intention is to keep the first version simple, then why do we
want to support any mode other than 'required'? I think it will limit
its usage for the cases where 2PC can be used only when all FDWs
involved support Prepare API but if that helps to keep the design and
patch simpler then why not just do that for the first version and then
extend it later. OTOH, if you think it will be really useful to keep
other modes, then also we could try to keep those in separate patches
to facilitate the review and discussion of the core feature.

--
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com
On Wed, 3 Jun 2020 at 14:50, Amit Kapila <amit.kapila16@gmail.com> wrote:
>
> On Tue, May 19, 2020 at 12:33 PM Masahiko Sawada
> <masahiko.sawada@2ndquadrant.com> wrote:
> >
> > I think that the current three values are useful for users. The
> > ‘required’ mode is used when users want to ensure all writes involved
> > with the transaction are committed atomically. That being said, as
> > some FDW plugin might not support the prepare API we cannot force
> > users to use this mode all the time when using atomic commit.
> > Therefore ‘prefer’ mode would be useful for this case. Both modes use
> > 2pc only when it's required for atomic commit.
> >
> > So what do you think my idea that adding the behavior you proposed as
> > another new mode? As it’s better to keep the first version simple as
> > much as possible
> >
> If the intention is to keep the first version simple, then why do we
> want to support any mode other than 'required'? I think it will limit
> its usage for the cases where 2PC can be used only when all FDWs
> involved support Prepare API but if that helps to keep the design and
> patch simpler then why not just do that for the first version and then
> extend it later. OTOH, if you think it will be really useful to keep
> other modes, then also we could try to keep those in separate patches
> to facilitate the review and discussion of the core feature.

‘disabled’ is the fundamental mode. We also need 'disabled' mode,
otherwise existing FDW won't work. I was concerned that many FDW
plugins don't implement FDW transaction APIs yet when users start
using this feature. But it seems to be a good idea to move 'prefer'
mode to a separate patch while leaving 'required'. I'll do that in the
next version patch.

Regards,

--
Masahiko Sawada http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
On Wed, Jun 3, 2020 at 12:02 PM Masahiko Sawada <masahiko.sawada@2ndquadrant.com> wrote: > > On Wed, 3 Jun 2020 at 14:50, Amit Kapila <amit.kapila16@gmail.com> wrote: > > > > > > If the intention is to keep the first version simple, then why do we > > want to support any mode other than 'required'? I think it will limit > > its usage for the cases where 2PC can be used only when all FDWs > > involved support Prepare API but if that helps to keep the design and > > patch simpler then why not just do that for the first version and then > > extend it later. OTOH, if you think it will be really useful to keep > > other modes, then also we could try to keep those in separate patches > > to facilitate the review and discussion of the core feature. > > ‘disabled’ is the fundamental mode. We also need 'disabled' mode, > otherwise existing FDW won't work. > IIUC, if foreign_twophase_commit is 'disabled', we don't use a two-phase protocol to commit distributed transactions, right? So, do we check this at the time of Prepare or Commit whether we need to use a two-phase protocol? I think this should be checked at prepare time. + <para> + This parameter can be changed at any time; the behavior for any one + transaction is determined by the setting in effect when it commits. + </para> This is written w.r.t foreign_twophase_commit. If one changes this between prepare and commit, will it have any impact? > I was concerned that many FDW > plugins don't implement FDW transaction APIs yet when users start > using this feature. But it seems to be a good idea to move 'prefer' > mode to a separate patch while leaving 'required'. I'll do that in the > next version patch. > Okay, thanks. Please, see if you can separate out the documentation for that as well. Few other comments on v21-0003-Documentation-update: ---------------------------------------------------- 1. 
+ <entry></entry> + <entry> + Numeric transaction identifier with that this foreign transaction + associates + </entry> /with that this/with which this 2. + <entry> + The OID of the foreign server on that the foreign transaction is prepared + </entry> /on that the/on which the 3. + <entry><structfield>status</structfield></entry> + <entry><type>text</type></entry> + <entry></entry> + <entry> + Status of foreign transaction. Possible values are: + <itemizedlist> + <listitem> + <para> + <literal>initial</literal> : Initial status. + </para> What exactly "Initial status" means? 4. + <entry><structfield>in_doubt</structfield></entry> + <entry><type>boolean</type></entry> + <entry></entry> + <entry> + If <literal>true</literal> this foreign transaction is in-doubt status and + needs to be resolved by calling <function>pg_resolve_fdwxact</function> + function. + </entry> It would be better if you can add an additional sentence to say when and or how can foreign transactions reach in-doubt state. 5. If <literal>N</literal> local transactions each + across <literal>K</literal> foreign server this value need to be set This part of the sentence can be improved by saying something like: "If a user expects N local transactions and each of those involves K foreign servers, this value..". -- With Regards, Amit Kapila. EnterpriseDB: http://www.enterprisedb.com
On Thu, 4 Jun 2020 at 12:46, Amit Kapila <amit.kapila16@gmail.com> wrote: > > On Wed, Jun 3, 2020 at 12:02 PM Masahiko Sawada > <masahiko.sawada@2ndquadrant.com> wrote: > > > > On Wed, 3 Jun 2020 at 14:50, Amit Kapila <amit.kapila16@gmail.com> wrote: > > > > > > > > > If the intention is to keep the first version simple, then why do we > > > want to support any mode other than 'required'? I think it will limit > > > its usage for the cases where 2PC can be used only when all FDWs > > > involved support Prepare API but if that helps to keep the design and > > > patch simpler then why not just do that for the first version and then > > > extend it later. OTOH, if you think it will be really useful to keep > > > other modes, then also we could try to keep those in separate patches > > > to facilitate the review and discussion of the core feature. > > > > ‘disabled’ is the fundamental mode. Oops, I wanted to say 'required' is the fundamental mode. > > We also need 'disabled' mode, > > otherwise existing FDW won't work. > > > > IIUC, if foreign_twophase_commit is 'disabled', we don't use a > two-phase protocol to commit distributed transactions, right? So, do > we check this at the time of Prepare or Commit whether we need to use > a two-phase protocol? I think this should be checked at prepare time. When a client executes COMMIT to a distributed transaction, 2pc is automatically, transparently used. In ‘required’ case, all involved (and modified) foreign server needs to support 2pc. So if a distributed transaction modifies data on a foreign server connected via an existing FDW which doesn’t support 2pc, the transaction cannot proceed commit, fails at pre-commit phase. So there should be two modes: ‘disabled’ and ‘required’, and should be ‘disabled’ by default. > > + <para> > + This parameter can be changed at any time; the behavior for any one > + transaction is determined by the setting in effect when it commits. 
> + </para> > > This is written w.r.t foreign_twophase_commit. If one changes this > between prepare and commit, will it have any impact? Since the distributed transaction commit automatically uses 2pc when executing COMMIT, it's not possible to change foreign_twophase_commit between prepare and commit. So I'd like to explain the case where a user executes PREPARE and then COMMIT PREPARED while changing foreign_twophase_commit. PREPARE can run only when foreign_twophase_commit is 'required' (or 'prefer') and all foreign servers involved with the transaction support 2pc. We prepare all foreign transactions no matter what the number of servers and modified or not. If either foreign_twophase_commit is 'disabled' or the transaction modifies data on a foreign server that doesn't support 2pc, it raises an error. At COMMIT (or ROLLBACK) PREPARED, similarly foreign_twophase_commit needs to be set to 'required'. It raises an error if the distributed transaction has a foreign transaction and foreign_twophase_commit is 'disabled'. > > > I was concerned that many FDW > > plugins don't implement FDW transaction APIs yet when users start > > using this feature. But it seems to be a good idea to move 'prefer' > > mode to a separate patch while leaving 'required'. I'll do that in the > > next version patch. > > > > Okay, thanks. Please, see if you can separate out the documentation > for that as well. > > Few other comments on v21-0003-Documentation-update: > ---------------------------------------------------- > 1. > + <entry></entry> > + <entry> > + Numeric transaction identifier with that this foreign transaction > + associates > + </entry> > > /with that this/with which this > > 2. > + <entry> > + The OID of the foreign server on that the foreign transaction > is prepared > + </entry> > > /on that the/on which the > > 3. > + <entry><structfield>status</structfield></entry> > + <entry><type>text</type></entry> > + <entry></entry> > + <entry> > + Status of foreign transaction. 
Possible values are: > + <itemizedlist> > + <listitem> > + <para> > + <literal>initial</literal> : Initial status. > + </para> > > What exactly "Initial status" means? This part is out-of-date. Fixed. > > 4. > + <entry><structfield>in_doubt</structfield></entry> > + <entry><type>boolean</type></entry> > + <entry></entry> > + <entry> > + If <literal>true</literal> this foreign transaction is > in-doubt status and > + needs to be resolved by calling <function>pg_resolve_fdwxact</function> > + function. > + </entry> > > It would be better if you can add an additional sentence to say when > and or how can foreign transactions reach in-doubt state. > > 5. > If <literal>N</literal> local transactions each > + across <literal>K</literal> foreign server this value need to be set > > This part of the sentence can be improved by saying something like: > "If a user expects N local transactions and each of those involves K > foreign servers, this value..". Thanks. I've incorporated all your comments. I've attached the new version patch set. 0006 is a separate patch which introduces 'prefer' mode to foreign_twophase_commit. Regards, -- Masahiko Sawada http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Attachment
- v22-0006-Add-prefer-mode-to-foreign_twophase_commit.patch
- v22-0005-Add-regression-tests-for-foreign-twophase-commit.patch
- v22-0004-postgres_fdw-supports-atomic-commit-APIs.patch
- v22-0002-Support-atomic-commit-among-multiple-foreign-ser.patch
- v22-0003-Documentation-update.patch
- v22-0001-Keep-track-of-writing-on-non-temporary-relation.patch
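
The PREPARE-time rule Sawada describes above (all foreign transactions are prepared regardless of how many servers were involved or modified, and an error is raised if the GUC disallows 2PC or any involved server lacks it) could be sketched like this. The enum values echo the thread's mode names; the function and parameter names are hypothetical, not the actual patch API.

```c
#include <assert.h>
#include <stdbool.h>

typedef enum
{
    FOREIGN_TWOPHASE_COMMIT_DISABLED,
    FOREIGN_TWOPHASE_COMMIT_PREFER,
    FOREIGN_TWOPHASE_COMMIT_REQUIRED
} ForeignTwophaseCommitLevel;

/*
 * Sketch of the check performed when the user runs PREPARE TRANSACTION
 * on a distributed transaction.  Returns false where the server would
 * raise an error: either foreign_twophase_commit is 'disabled', or a
 * 2PC-incapable foreign server is involved with the transaction.
 */
static bool
can_prepare_foreign_transaction(ForeignTwophaseCommitLevel level,
                                const bool *server_supports_2pc,
                                int nservers)
{
    if (level == FOREIGN_TWOPHASE_COMMIT_DISABLED)
        return false;

    /* every involved server must support 2PC, modified or not */
    for (int i = 0; i < nservers; i++)
        if (!server_supports_2pc[i])
            return false;

    return true;
}
```

The same check would apply symmetrically at COMMIT (or ROLLBACK) PREPARED, which is why the thread concludes that foreign_twophase_commit must stay enabled across both steps.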
On Fri, Jun 5, 2020 at 3:16 PM Masahiko Sawada <masahiko.sawada@2ndquadrant.com> wrote: > > On Thu, 4 Jun 2020 at 12:46, Amit Kapila <amit.kapila16@gmail.com> wrote: > > > > > > + <para> > > + This parameter can be changed at any time; the behavior for any one > > + transaction is determined by the setting in effect when it commits. > > + </para> > > > > This is written w.r.t foreign_twophase_commit. If one changes this > > between prepare and commit, will it have any impact? > > Since the distributed transaction commit automatically uses 2pc when > executing COMMIT, it's not possible to change foreign_twophase_commit > between prepare and commit. So I'd like to explain the case where a > user executes PREPARE and then COMMIT PREPARED while changing > foreign_twophase_commit. > > PREPARE can run only when foreign_twophase_commit is 'required' (or > 'prefer') and all foreign servers involved with the transaction > support 2pc. We prepare all foreign transactions no matter what the > number of servers and modified or not. If either > foreign_twophase_commit is 'disabled' or the transaction modifies data > on a foreign server that doesn't support 2pc, it raises an error. At > COMMIT (or ROLLBACK) PREPARED, similarly foreign_twophase_commit needs > to be set to 'required'. It raises an error if the distributed > transaction has a foreign transaction and foreign_twophase_commit is > 'disabled'. > So, IIUC, it will raise an error if foreign_twophase_commit is 'disabled' (or one of the foreign server involved doesn't support 2PC) and the error can be raised both when user issues PREPARE or COMMIT (or ROLLBACK) PREPARED. If so, isn't it strange that we raise such an error after PREPARE? What kind of use-case required this? > > > > > 4. 
> > + <entry><structfield>in_doubt</structfield></entry> > > + <entry><type>boolean</type></entry> > > + <entry></entry> > > + <entry> > > + If <literal>true</literal> this foreign transaction is > > in-doubt status and > > + needs to be resolved by calling <function>pg_resolve_fdwxact</function> > > + function. > > + </entry> > > > > It would be better if you can add an additional sentence to say when > > and or how can foreign transactions reach in-doubt state. > > + If <literal>true</literal> this foreign transaction is in-doubt status. + A foreign transaction becomes in-doubt status when user canceled the + query during transaction commit or the server crashed during transaction + commit. Can we reword the second sentence as: "A foreign transaction can have this status when the user has cancelled the statement or the server crashes during transaction commit."? I have another question about this field, why can't it be one of the status ('preparing', 'prepared', 'committing', 'aborting', 'in-doubt') rather than having a separate field? Also, isn't it more suitable to name 'status' field as 'state' because these appear to be more like different states of transaction? -- With Regards, Amit Kapila. EnterpriseDB: http://www.enterprisedb.com
On Thu, 11 Jun 2020 at 22:21, Amit Kapila <amit.kapila16@gmail.com> wrote: > > On Fri, Jun 5, 2020 at 3:16 PM Masahiko Sawada > <masahiko.sawada@2ndquadrant.com> wrote: > > > > On Thu, 4 Jun 2020 at 12:46, Amit Kapila <amit.kapila16@gmail.com> wrote: > > > > > > > > > + <para> > > > + This parameter can be changed at any time; the behavior for any one > > > + transaction is determined by the setting in effect when it commits. > > > + </para> > > > > > > This is written w.r.t foreign_twophase_commit. If one changes this > > > between prepare and commit, will it have any impact? > > > > Since the distributed transaction commit automatically uses 2pc when > > executing COMMIT, it's not possible to change foreign_twophase_commit > > between prepare and commit. So I'd like to explain the case where a > > user executes PREPARE and then COMMIT PREPARED while changing > > foreign_twophase_commit. > > > > PREPARE can run only when foreign_twophase_commit is 'required' (or > > 'prefer') and all foreign servers involved with the transaction > > support 2pc. We prepare all foreign transactions no matter what the > > number of servers and modified or not. If either > > foreign_twophase_commit is 'disabled' or the transaction modifies data > > on a foreign server that doesn't support 2pc, it raises an error. At > > COMMIT (or ROLLBACK) PREPARED, similarly foreign_twophase_commit needs > > to be set to 'required'. It raises an error if the distributed > > transaction has a foreign transaction and foreign_twophase_commit is > > 'disabled'. > > > > So, IIUC, it will raise an error if foreign_twophase_commit is > 'disabled' (or one of the foreign server involved doesn't support 2PC) > and the error can be raised both when user issues PREPARE or COMMIT > (or ROLLBACK) PREPARED. If so, isn't it strange that we raise such an > error after PREPARE? What kind of use-case required this? 
> I don’t concrete use-case but the reason why it raises an error when a user setting foreign_twophase_commit to 'disabled' executes COMMIT (or ROLLBACK) PREPARED within the transaction involving at least one foreign server is that I wanted to make it behaves in a similar way of COMMIT case. I mean, if a user executes just COMMIT, the distributed transaction is committed in two phases but the value of foreign_twophase_commit is not changed during these two phases. So I wanted to require user to set foreign_twophase_commit to ‘required’ both when executing PREPARE and executing COMMIT (or ROLLBACK) PREPARED. Implementation also can become simple because we can assume that foreign_twophase_commit is always enabled when a transaction requires foreign transaction preparation and resolution. > > > > > > > > 4. > > > + <entry><structfield>in_doubt</structfield></entry> > > > + <entry><type>boolean</type></entry> > > > + <entry></entry> > > > + <entry> > > > + If <literal>true</literal> this foreign transaction is > > > in-doubt status and > > > + needs to be resolved by calling <function>pg_resolve_fdwxact</function> > > > + function. > > > + </entry> > > > > > > It would be better if you can add an additional sentence to say when > > > and or how can foreign transactions reach in-doubt state. > > > > > + If <literal>true</literal> this foreign transaction is in-doubt status. > + A foreign transaction becomes in-doubt status when user canceled the > + query during transaction commit or the server crashed during transaction > + commit. > > Can we reword the second sentence as: "A foreign transaction can have > this status when the user has cancelled the statement or the server > crashes during transaction commit."? Agreed. Updated in my local branch. > I have another question about > this field, why can't it be one of the status ('preparing', > 'prepared', 'committing', 'aborting', 'in-doubt') rather than having a > separate field? 
Because I'm using the in-doubt field also for checking whether the foreign transaction entry can be resolved manually, e.g. by pg_resolve_foreign_xact(). For instance, a foreign transaction with status = 'prepared' and in-doubt = 'true' can be resolved either by a foreign transaction resolver or by pg_resolve_foreign_xact(). When a user executes pg_resolve_foreign_xact() against the foreign transaction, it sets status = 'committing' (or 'rollbacking') by checking the transaction status in clog. The user might cancel pg_resolve_foreign_xact() during resolution. In this case, the foreign transaction still has status = 'committing' and in-doubt = 'true'. Then if a foreign transaction resolver process processes the foreign transaction, it can commit it without looking at clog. > Also, isn't it more suitable to name 'status' field > as 'state' because these appear to be more like different states of > transaction? Agreed. Regards, -- Masahiko Sawada http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
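The state handling described above can be sketched as follows. This is a hypothetical illustration (the names `FdwXact`, `CLOG`, and the state strings are stand-ins, not the patch's actual code) of why a separate in-doubt flag lets a cancelled manual resolution be finished later without a second clog lookup:

```python
# Toy model of a foreign transaction entry: 'in_doubt' is kept separate from
# 'state', so once a direction ('committing'/'rollbacking') has been recorded
# from a clog lookup, a later resolver pass needs no clog access.

CLOG = {}  # stand-in for the local commit log: xid -> 'committed' | 'aborted'

class FdwXact:
    def __init__(self, xid):
        self.xid = xid
        self.state = 'prepared'
        self.in_doubt = True  # e.g. the server crashed during commit

    def start_resolution(self):
        # pg_resolve_foreign_xact() path: consult the clog to pick a direction.
        if self.state == 'prepared':
            outcome = CLOG[self.xid]
            self.state = 'committing' if outcome == 'committed' else 'rollbacking'

    def finish_resolution(self):
        # Resolver path: the direction is already recorded, no clog lookup needed.
        assert self.state in ('committing', 'rollbacking')
        self.state = 'committed' if self.state == 'committing' else 'rolled back'
        self.in_doubt = False
        return self.state

CLOG[42] = 'committed'
fx = FdwXact(42)
fx.start_resolution()          # user calls pg_resolve_foreign_xact()
# ... suppose the user cancels here: the entry is left mid-resolution ...
assert (fx.state, fx.in_doubt) == ('committing', True)
fx.finish_resolution()         # a resolver finishes without looking at clog
assert (fx.state, fx.in_doubt) == ('committed', False)
```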
On Fri, Jun 12, 2020 at 7:59 AM Masahiko Sawada <masahiko.sawada@2ndquadrant.com> wrote: > > On Thu, 11 Jun 2020 at 22:21, Amit Kapila <amit.kapila16@gmail.com> wrote: > > > > > I have another question about > > this field, why can't it be one of the status ('preparing', > > 'prepared', 'committing', 'aborting', 'in-doubt') rather than having a > > separate field? > > Because I'm using in-doubt field also for checking if the foreign > transaction entry can also be resolved manually, i.g. > pg_resolve_foreign_xact(). For instance, a foreign transaction which > status = 'prepared' and in-doubt = 'true' can be resolved either > foreign transaction resolver or pg_resolve_foreign_xact(). When a user > execute pg_resolve_foreign_xact() against the foreign transaction, it > sets status = 'committing' (or 'rollbacking') by checking transaction > status in clog. The user might cancel pg_resolve_foreign_xact() during > resolution. In this case, the foreign transaction is still status = > 'committing' and in-doubt = 'true'. Then if a foreign transaction > resolver process processes the foreign transaction, it can commit it > without clog looking. > I think this is a corner case and it is better to simplify the state recording of foreign transactions than to save a CLOG lookup. -- With Regards, Amit Kapila. EnterpriseDB: http://www.enterprisedb.com
On Fri, 12 Jun 2020 at 12:40, Amit Kapila <amit.kapila16@gmail.com> wrote: > > On Fri, Jun 12, 2020 at 7:59 AM Masahiko Sawada > <masahiko.sawada@2ndquadrant.com> wrote: > > > > On Thu, 11 Jun 2020 at 22:21, Amit Kapila <amit.kapila16@gmail.com> wrote: > > > > > > > > I have another question about > > > this field, why can't it be one of the status ('preparing', > > > 'prepared', 'committing', 'aborting', 'in-doubt') rather than having a > > > separate field? > > > > Because I'm using in-doubt field also for checking if the foreign > > transaction entry can also be resolved manually, i.g. > > pg_resolve_foreign_xact(). For instance, a foreign transaction which > > status = 'prepared' and in-doubt = 'true' can be resolved either > > foreign transaction resolver or pg_resolve_foreign_xact(). When a user > > execute pg_resolve_foreign_xact() against the foreign transaction, it > > sets status = 'committing' (or 'rollbacking') by checking transaction > > status in clog. The user might cancel pg_resolve_foreign_xact() during > > resolution. In this case, the foreign transaction is still status = > > 'committing' and in-doubt = 'true'. Then if a foreign transaction > > resolver process processes the foreign transaction, it can commit it > > without clog looking. > > > > I think this is a corner case and it is better to simplify the state > recording of foreign transactions then to save a CLOG lookup. > The main usage of the in-doubt flag is to distinguish between in-doubt transactions and other transactions that have a waiting backend (I call these on-line transactions). If one foreign server stays down for a long time after a crash during a distributed transaction commit, the foreign transaction resolver tries to resolve the foreign transaction but fails because the foreign server doesn’t respond. We’d like to avoid the situation where a resolver process always picks up that foreign transaction while other on-line transactions waiting to be resolved cannot move forward.
Therefore, a resolver process prioritizes on-line transactions. Once the shmem queue holding on-line transactions becomes empty, a resolver process looks at the array of foreign transaction states to find in-doubt transactions to resolve. I think we should not process in-doubt transactions and on-line transactions in the same way. Regards, -- Masahiko Sawada http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
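The prioritization described above could be sketched roughly like this. The names (`pick_next`, the dict-based entries) are illustrative stand-ins for the shmem queue and the foreign-transaction array, not the patch's actual structures:

```python
from collections import deque

# A resolver first drains the queue of on-line transactions (those with a
# backend waiting to be woken), and only when that queue is empty does it
# scan the array for in-doubt entries left over from crashes/cancellations.

def pick_next(online_queue, fdwxact_array):
    if online_queue:
        return online_queue.popleft()      # on-line work always wins
    for entry in fdwxact_array:
        if entry.get('in_doubt'):
            return entry                   # fall back to in-doubt entries
    return None

online = deque([{'xid': 10, 'in_doubt': False}])
array = [{'xid': 7, 'in_doubt': True}, {'xid': 10, 'in_doubt': False}]
assert pick_next(online, array)['xid'] == 10   # the on-line transaction first
assert pick_next(online, array)['xid'] == 7    # then the in-doubt one
```

This way a foreign server that is down for a long time (whose in-doubt entries keep failing to resolve) cannot starve transactions with waiting backends.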
On Fri, Jun 12, 2020 at 9:54 AM Masahiko Sawada <masahiko.sawada@2ndquadrant.com> wrote: > > On Fri, 12 Jun 2020 at 12:40, Amit Kapila <amit.kapila16@gmail.com> wrote: > > > > On Fri, Jun 12, 2020 at 7:59 AM Masahiko Sawada > > <masahiko.sawada@2ndquadrant.com> wrote: > > > > > > On Thu, 11 Jun 2020 at 22:21, Amit Kapila <amit.kapila16@gmail.com> wrote: > > > > > > > > > > > I have another question about > > > > this field, why can't it be one of the status ('preparing', > > > > 'prepared', 'committing', 'aborting', 'in-doubt') rather than having a > > > > separate field? > > > > > > Because I'm using in-doubt field also for checking if the foreign > > > transaction entry can also be resolved manually, i.g. > > > pg_resolve_foreign_xact(). For instance, a foreign transaction which > > > status = 'prepared' and in-doubt = 'true' can be resolved either > > > foreign transaction resolver or pg_resolve_foreign_xact(). When a user > > > execute pg_resolve_foreign_xact() against the foreign transaction, it > > > sets status = 'committing' (or 'rollbacking') by checking transaction > > > status in clog. The user might cancel pg_resolve_foreign_xact() during > > > resolution. In this case, the foreign transaction is still status = > > > 'committing' and in-doubt = 'true'. Then if a foreign transaction > > > resolver process processes the foreign transaction, it can commit it > > > without clog looking. > > > > > > > I think this is a corner case and it is better to simplify the state > > recording of foreign transactions then to save a CLOG lookup. > > > > The main usage of in-doubt flag is to distinguish between in-doubt > transactions and other transactions that have their waiter (I call > on-line transactions). > Which are these other online transactions? 
I had assumed that the foreign transaction resolver process is there to resolve in-doubt transactions, but it seems it is also used for some other purpose, which anyway was the next question I had while reviewing other sections of the docs; let's clarify it as it came up now. -- With Regards, Amit Kapila. EnterpriseDB: http://www.enterprisedb.com
On Fri, 12 Jun 2020 at 15:37, Amit Kapila <amit.kapila16@gmail.com> wrote: > > On Fri, Jun 12, 2020 at 9:54 AM Masahiko Sawada > <masahiko.sawada@2ndquadrant.com> wrote: > > > > On Fri, 12 Jun 2020 at 12:40, Amit Kapila <amit.kapila16@gmail.com> wrote: > > > > > > On Fri, Jun 12, 2020 at 7:59 AM Masahiko Sawada > > > <masahiko.sawada@2ndquadrant.com> wrote: > > > > > > > > On Thu, 11 Jun 2020 at 22:21, Amit Kapila <amit.kapila16@gmail.com> wrote: > > > > > > > > > > > > > > I have another question about > > > > > this field, why can't it be one of the status ('preparing', > > > > > 'prepared', 'committing', 'aborting', 'in-doubt') rather than having a > > > > > separate field? > > > > > > > > Because I'm using in-doubt field also for checking if the foreign > > > > transaction entry can also be resolved manually, i.g. > > > > pg_resolve_foreign_xact(). For instance, a foreign transaction which > > > > status = 'prepared' and in-doubt = 'true' can be resolved either > > > > foreign transaction resolver or pg_resolve_foreign_xact(). When a user > > > > execute pg_resolve_foreign_xact() against the foreign transaction, it > > > > sets status = 'committing' (or 'rollbacking') by checking transaction > > > > status in clog. The user might cancel pg_resolve_foreign_xact() during > > > > resolution. In this case, the foreign transaction is still status = > > > > 'committing' and in-doubt = 'true'. Then if a foreign transaction > > > > resolver process processes the foreign transaction, it can commit it > > > > without clog looking. > > > > > > > > > > I think this is a corner case and it is better to simplify the state > > > recording of foreign transactions then to save a CLOG lookup. > > > > > > > The main usage of in-doubt flag is to distinguish between in-doubt > > transactions and other transactions that have their waiter (I call > > on-line transactions). > > > > Which are these other online transactions? 
> I had assumed that foreign > transaction resolver process is to resolve in-doubt transactions but > it seems it is also used for some other purpose which anyway was the > next question I had while reviewing other sections of docs but let's > clarify as it came up now. When a distributed transaction is committed by the COMMIT command, the postgres backend process prepares all foreign transactions and commits the local transaction. Then the backend enqueues itself to the shmem queue, asks a resolver process to commit the prepared foreign transactions, and waits. That is, these prepared foreign transactions are committed by the resolver process, not the backend process. Once the resolver process has committed all prepared foreign transactions, it wakes the waiting backend process. This kind of transaction is what I meant by on-line transactions. This procedure is similar to what synchronous replication does. Regards, -- Masahiko Sawada http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
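That commit sequence can be sketched as follows. The names (`backend_commit`, `resolver_step`) and data structures are illustrative stand-ins, not the patch's actual functions:

```python
# Toy model of the described COMMIT flow: the backend prepares every foreign
# transaction, commits locally, then hands the prepared transactions to a
# resolver and waits, much like a synchronous-replication wait.

def backend_commit(foreign_servers, resolver_queue):
    for srv in foreign_servers:
        srv['state'] = 'prepared'           # PREPARE TRANSACTION on each server
    local_committed = True                  # commit the local transaction
    resolver_queue.append(foreign_servers)  # enqueue self; wait for the resolver
    return local_committed

def resolver_step(resolver_queue):
    foreign_servers = resolver_queue.pop(0)
    for srv in foreign_servers:
        srv['state'] = 'committed'          # COMMIT PREPARED, one server at a time
    return 'backend woken'                  # wake the waiting backend

queue, servers = [], [{'name': 's1'}, {'name': 's2'}]
assert backend_commit(servers, queue) is True
assert resolver_step(queue) == 'backend woken'
assert all(s['state'] == 'committed' for s in servers)
```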
On Fri, Jun 12, 2020 at 2:10 PM Masahiko Sawada <masahiko.sawada@2ndquadrant.com> wrote: > > On Fri, 12 Jun 2020 at 15:37, Amit Kapila <amit.kapila16@gmail.com> wrote: > > > > > > > > > > I think this is a corner case and it is better to simplify the state > > > > recording of foreign transactions then to save a CLOG lookup. > > > > > > > > > > The main usage of in-doubt flag is to distinguish between in-doubt > > > transactions and other transactions that have their waiter (I call > > > on-line transactions). > > > > > > > Which are these other online transactions? I had assumed that foreign > > transaction resolver process is to resolve in-doubt transactions but > > it seems it is also used for some other purpose which anyway was the > > next question I had while reviewing other sections of docs but let's > > clarify as it came up now. > > When a distributed transaction is committed by COMMIT command, the > postgres backend process prepare all foreign transaction and commit > the local transaction. > Does this mean that we will mark the xid as committed in CLOG of the local server? If so, why is this okay till we commit transactions in all the foreign servers, what if we fail to commit on one of the servers? Few more comments on v22-0003-Documentation-update -------------------------------------------------------------------------------------- 1. + When <literal>disabled</literal> there can be risk of database + consistency among all servers that involved in the distributed + transaction when some foreign server crashes during committing the + distributed transaction. Will it read better if rephrase above to something like: "When <literal>disabled</literal> there can be a risk of database consistency if one or more foreign servers crashes while committing the distributed transaction."? 2. 
+ <varlistentry id="guc-foreign-transaction-resolution-rety-interval" xreflabel="foreign_transaction_resolution_retry_interval"> + <term><varname>foreign_transaction_resolution_retry_interval</varname> (<type>integer</type>) + <indexterm> + <primary><varname>foreign_transaction_resolution_interval</varname> configuration parameter</primary> + </indexterm> + </term> + <listitem> + <para> + Specify how long the foreign transaction resolver should wait when the last resolution + fails before retrying to resolve foreign transaction. This parameter can only be set in the + <filename>postgresql.conf</filename> file or on the server command line. + </para> + <para> + The default value is 10 seconds. + </para> + </listitem> + </varlistentry> Typo. <varlistentry id="guc-foreign-transaction-resolution-rety-interval", spelling of retry is wrong. Do we really need such a guc parameter? I think we can come up with some simple algorithm to retry after a few seconds and then increase that interval of retry if we fail again or something like that. I don't know how users can come up with some non-default value for this variable. 3 + <varlistentry id="guc-foreign-transaction-resolver-timeout" xreflabel="foreign_transaction_resolver_timeout"> + <term><varname>foreign_transaction_resolver_timeout</varname> (<type>integer</type>) + <indexterm> + <primary><varname>foreign_transaction_resolver_timeout</varname> configuration parameter</primary> + </indexterm> + </term> + <listitem> + <para> + Terminate foreign transaction resolver processes that don't have any foreign + transactions to resolve longer than the specified number of milliseconds. + A value of zero disables the timeout mechanism, meaning it connects to one + database until stopping manually. Can we mention the function name using which one can stop the resolver process? 4. 
+ Using the <productname>PostgreSQL</productname>'s atomic commit ensures that + all changes on foreign servers end in either commit or rollback using the + transaction callback routines Can we slightly rephrase this "Using the PostgreSQL's atomic commit ensures that all the changes on foreign servers are either committed or rolled back using the transaction callback routines"? 5. + Prepare all transactions on foreign servers. + <productname>PostgreSQL</productname> distributed transaction manager + prepares all transaction on the foreign servers if two-phase commit is + required. Two-phase commit is required when the transaction modifies + data on two or more servers including the local server itself and + <xref linkend="guc-foreign-twophase-commit"/> is + <literal>required</literal>. /PostgreSQL/PostgreSQL's. If all preparations on foreign servers got + successful go to the next step. How about "If the prepare on all foreign servers is successful then go to the next step"? Any failure happens in this step, + the server changes to rollback, then rollback all transactions on both + local and foreign servers. Can we rephrase this line to something like: "If there is any failure in the prepare phase, the server will rollback all the transactions on both local and foreign servers."? What if the issued Rollback also failed, say due to network breakdown between local and one of foreign servers? Shouldn't such a transaction be in the 'in-doubt' state? 6. + <para> + Commit locally. The server commits transaction locally. Any failure happens + in this step the server changes to rollback, then rollback all transactions + on both local and foreign servers. + </para> + </listitem> + <listitem> + <para> + Resolve all prepared transaction on foreign servers. Pprepared transactions + are committed or rolled back according to the result of the local transaction. + This step is normally performed by a foreign transaction resolver process.
+ </para> When (in which step) do we commit on foreign servers? Do Resolver processes commit on foreign servers, if so, how can we commit locally without committing on foreign servers, what if the commit on one of the servers fails? It is not very clear to me from the steps mentioned here? Typo, /Pprepared/Prepared 7. However, foreign transactions + become <firstterm>in-doubt</firstterm> in three cases: where the foreign + server crashed or lost the connectibility to it during preparing foreign + transaction, where the local node crashed during either preparing or + resolving foreign transaction and where user canceled the query. Here the three cases are not very clear. You might want to use (a) ..., (b) .. ,(c).. Also, I think the state will be in-doubt even when we lost connection to server during commit or rollback. 8. + One foreign transaction resolver is responsible for transaction resolutions + on which one database connecting. Can we rephrase it to: "One foreign transaction resolver is responsible for transaction resolutions on the database to which it is connected."? 9. + Note that other <productname>PostgreSQL</productname> feature such as parallel + queries, logical replication, etc., also take worker slots from + <varname>max_worker_processes</varname>. /feature/features 10. + <para> + Atomic commit requires several configuration options to be set. + On the local node, <xref linkend="guc-max-prepared-foreign-transactions"/> and + <xref linkend="guc-max-foreign-transaction-resolvers"/> must be non-zero value. + Additionally the <varname>max_worker_processes</varname> may need to be adjusted to + accommodate for foreign transaction resolver workers, at least + (<varname>max_foreign_transaction_resolvers</varname> + <literal>1</literal>). + Note that other <productname>PostgreSQL</productname> feature such as parallel + queries, logical replication, etc., also take worker slots from + <varname>max_worker_processes</varname>. 
+ </para> Don't we need to mention foreign_twophase_commit GUC here? 11. + <sect2 id="fdw-callbacks-transaction-managements"> + <title>FDW Routines For Transaction Managements</title> Managements/Management? 12. + Transaction management callbacks are used for doing commit, rollback and + prepare the foreign transaction. Let's write the above sentence as: "Transaction management callbacks are used to commit, rollback and prepare the foreign transaction." 13. + <para> + Transaction management callbacks are used for doing commit, rollback and + prepare the foreign transaction. If an FDW wishes that its foreign + transaction is managed by <productname>PostgreSQL</productname>'s global + transaction manager it must provide both + <function>CommitForeignTransaction</function> and + <function>RollbackForeignTransaction</function>. In addition, if an FDW + wishes to support <firstterm>atomic commit</firstterm> (as described in + <xref linkend="fdw-transaction-managements"/>), it must provide + <function>PrepareForeignTransaction</function> as well and can provide + <function>GetPrepareId</function> callback optionally. + </para> What exact functionality can an FDW accomplish if it just supports CommitForeignTransaction and RollbackForeignTransaction? It seems it doesn't care for 2PC, if so, is there any special functionality we can achieve with this which we can't do without these APIs? 14. +PrepareForeignTransaction(FdwXactRslvState *frstate); +</programlisting> + Prepare the transaction on the foreign server. This function is called at the + pre-commit phase of the local transactions if foreign twophase commit is + required. This function is used only for distribute transaction management + (see <xref linkend="distributed-transaction"/>). + </para> /distribute/distributed 15.
+ <sect2 id="fdw-transaction-commit-rollback"> + <title>Commit And Rollback Single Foreign Transaction</title> + <para> + The FDW callback function <literal>CommitForeignTransaction</literal> + and <literal>RollbackForeignTransaction</literal> can be used to commit + and rollback the foreign transaction. During transaction commit, the core + transaction manager calls <literal>CommitForeignTransaction</literal> function + in the pre-commit phase and calls + <literal>RollbackForeignTransaction</literal> function in the post-rollback + phase. + </para> There is no reasoning mentioned as to why CommitForeignTransaction has to be called in pre-commit phase and RollbackForeignTransaction in post-rollback phase? Basically why one in pre phase and other in post phase? 16. + <entry> + <literal><function>pg_remove_foreign_xact(<parameter>transaction</parameter> <type>xid</type>, <parameter>serverid</parameter> <type>oid</type>, <parameter>userid</parameter> <type>oid</type>)</function></literal> + </entry> + <entry><type>void</type></entry> + <entry> + This function works the same as <function>pg_resolve_foreign_xact</function> + except that this removes the foreign transcation entry without resolution. + </entry> Can we write why and when such a function can be used? Typo, /trasnaction/transaction 17. + <row> + <entry><literal>FdwXactResolutionLock</literal></entry> + <entry>Waiting to read or update information of foreign trasnaction + resolution.</entry> + </row> /trasnaction/transaction -- With Regards, Amit Kapila. EnterpriseDB: http://www.enterprisedb.com
On Fri, 12 Jun 2020 at 19:24, Amit Kapila <amit.kapila16@gmail.com> wrote: > > On Fri, Jun 12, 2020 at 2:10 PM Masahiko Sawada > <masahiko.sawada@2ndquadrant.com> wrote: > > > > On Fri, 12 Jun 2020 at 15:37, Amit Kapila <amit.kapila16@gmail.com> wrote: > > > > > > > > > > > > > I think this is a corner case and it is better to simplify the state > > > > > recording of foreign transactions then to save a CLOG lookup. > > > > > > > > > > > > > The main usage of in-doubt flag is to distinguish between in-doubt > > > > transactions and other transactions that have their waiter (I call > > > > on-line transactions). > > > > > > > > > > Which are these other online transactions? I had assumed that foreign > > > transaction resolver process is to resolve in-doubt transactions but > > > it seems it is also used for some other purpose which anyway was the > > > next question I had while reviewing other sections of docs but let's > > > clarify as it came up now. > > > > When a distributed transaction is committed by COMMIT command, the > > postgres backend process prepare all foreign transaction and commit > > the local transaction. > > Thank you for your review comments! Let me answer your question first; I'll then go through the review comments. > > Does this mean that we will mark the xid as committed in CLOG of the > local server? Well, what I meant is that when the client executes the COMMIT command, the backend executes a PREPARE TRANSACTION command on all involved foreign servers and then marks the xid as committed in the clog on the local server. > If so, why is this okay till we commit transactions in > all the foreign servers, what if we fail to commit on one of the > servers? Once the local transaction is committed, the involved foreign transactions are never rolled back. Since the backend has already prepared all foreign transactions before the local commit, committing a prepared foreign transaction basically doesn't fail.
But even if it fails for whatever reason, we never roll back the prepared foreign transactions. A resolver retries committing the foreign transactions at certain intervals. Does that answer your question? Regards, -- Masahiko Sawada http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
On Fri, Jun 12, 2020 at 6:24 PM Masahiko Sawada <masahiko.sawada@2ndquadrant.com> wrote: > > On Fri, 12 Jun 2020 at 19:24, Amit Kapila <amit.kapila16@gmail.com> wrote: > > > > > > Which are these other online transactions? I had assumed that foreign > > > > transaction resolver process is to resolve in-doubt transactions but > > > > it seems it is also used for some other purpose which anyway was the > > > > next question I had while reviewing other sections of docs but let's > > > > clarify as it came up now. > > > > > > When a distributed transaction is committed by COMMIT command, the > > > postgres backend process prepare all foreign transaction and commit > > > the local transaction. > > > > > Thank you for your review comments! Let me answer your question first. > I'll see the review comments. > > > > > Does this mean that we will mark the xid as committed in CLOG of the > > local server? > > Well what I meant is that when the client executes COMMIT command, the > backend executes PREPARE TRANSACTION command on all involved foreign > servers and then marks the xid as committed in clog in the local > server. > Won't it create an inconsistency in viewing the data from the different servers? Say, such a transaction inserts one row into a local server and another into the foreign server. Now, if we follow the above protocol, the user will be able to see the row from the local server but not from the foreign server. -- With Regards, Amit Kapila. EnterpriseDB: http://www.enterprisedb.com
On Sat, 13 Jun 2020 at 14:02, Amit Kapila <amit.kapila16@gmail.com> wrote: > > On Fri, Jun 12, 2020 at 6:24 PM Masahiko Sawada > <masahiko.sawada@2ndquadrant.com> wrote: > > > > On Fri, 12 Jun 2020 at 19:24, Amit Kapila <amit.kapila16@gmail.com> wrote: > > > > > > > > Which are these other online transactions? I had assumed that foreign > > > > > transaction resolver process is to resolve in-doubt transactions but > > > > > it seems it is also used for some other purpose which anyway was the > > > > > next question I had while reviewing other sections of docs but let's > > > > > clarify as it came up now. > > > > > > > > When a distributed transaction is committed by COMMIT command, the > > > > postgres backend process prepare all foreign transaction and commit > > > > the local transaction. > > > > > > > > Thank you for your review comments! Let me answer your question first. > > I'll see the review comments. > > > > > > > > Does this mean that we will mark the xid as committed in CLOG of the > > > local server? > > > > Well what I meant is that when the client executes COMMIT command, the > > backend executes PREPARE TRANSACTION command on all involved foreign > > servers and then marks the xid as committed in clog in the local > > server. > > > > Won't it create an inconsistency in viewing the data from the > different servers? Say, such a transaction inserts one row into a > local server and another into the foreign server. Now, if we follow > the above protocol, the user will be able to see the row from the > local server but not from the foreign server. Yes, you're right. This atomic commit feature doesn't guarantee such consistent visibility, so-called atomic visibility. Even if the local server is not modified, since a resolver process commits prepared foreign transactions one by one, another user could see an inconsistent result. Providing globally consistent snapshots to transactions involving foreign servers is one of the solutions.
Regards, -- Masahiko Sawada http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
>> Won't it create an inconsistency in viewing the data from the >> different servers? Say, such a transaction inserts one row into a >> local server and another into the foreign server. Now, if we follow >> the above protocol, the user will be able to see the row from the >> local server but not from the foreign server. > > Yes, you're right. This atomic commit feature doesn't guarantee such > consistent visibility so-called atomic visibility. Even the local > server is not modified, since a resolver process commits prepared > foreign transactions one by one another user could see an inconsistent > result. Providing globally consistent snapshots to transactions > involving foreign servers is one of the solutions. Another approach to the atomic visibility problem is to control snapshot acquisition timing and commit timing (plus using REPEATABLE READ). In the REPEATABLE READ transaction isolation level, PostgreSQL assigns a snapshot at the time when the first command is executed in a transaction. If we could prevent any commit while any transaction is acquiring a snapshot, and prevent any snapshot acquisition while committing, the visibility inconsistency which Amit explained can be avoided. This approach was proposed in an academic paper [1]. A good point of the approach is that we don't need to modify PostgreSQL at all. A downside of the approach is that we need someone who controls the timings (in [1], a middleware called "Pangea" was proposed). Also, we need to limit the transaction isolation level to REPEATABLE READ. [1] http://www.vldb.org/pvldb/vol2/vldb09-694.pdf Best regards, -- Tatsuo Ishii SRA OSS, Inc. Japan English: http://www.sraoss.co.jp/index_en.php Japanese: http://www.sraoss.co.jp
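The mutual-exclusion rule can be modeled minimally as follows. This is a toy coordinator built only to show the Pangea-style idea (a single lock object, invented names); it is not Pangea's actual design:

```python
import threading

# Snapshot acquisition and global commit exclude each other: a REPEATABLE READ
# transaction takes its snapshot atomically with respect to the per-server
# commits of any other transaction, so it can never observe a distributed
# transaction as committed on one node but not yet on another.

class Coordinator:
    def __init__(self):
        self._lock = threading.Lock()  # serializes snapshots against commits

    def acquire_snapshot(self, servers):
        with self._lock:               # no commit can interleave here
            return {s['name']: list(s['committed_rows']) for s in servers}

    def commit_everywhere(self, servers, row):
        with self._lock:               # no snapshot can interleave here
            for s in servers:
                s['committed_rows'].append(row)

servers = [{'name': 'n1', 'committed_rows': []},
           {'name': 'n2', 'committed_rows': []}]
c = Coordinator()
c.commit_everywhere(servers, 1)
snap = c.acquire_snapshot(servers)
# The snapshot sees the row on both nodes or on neither, never on just one.
assert snap['n1'] == snap['n2'] == [1]
```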
On Sun, Jun 14, 2020 at 2:21 PM Tatsuo Ishii <ishii@sraoss.co.jp> wrote: > > >> Won't it create an inconsistency in viewing the data from the > >> different servers? Say, such a transaction inserts one row into a > >> local server and another into the foreign server. Now, if we follow > >> the above protocol, the user will be able to see the row from the > >> local server but not from the foreign server. > > > > Yes, you're right. This atomic commit feature doesn't guarantee such > > consistent visibility so-called atomic visibility. Okay, I understand that the purpose of this feature is to provide atomic commit which means the transaction on all servers involved will either commit or rollback. However, I think we should at least see at a high level how the visibility will work because it might influence the implementation of this feature. > > Even the local > > server is not modified, since a resolver process commits prepared > > foreign transactions one by one another user could see an inconsistent > > result. Providing globally consistent snapshots to transactions > > involving foreign servers is one of the solutions. How would it be able to do that? Say, when it decides to take a snapshot the transaction on the foreign server appears to be committed but the transaction on the local server won't appear to be committed, so the consistent data visibility problem as mentioned above could still arise. > > Another approach to the atomic visibility problem is to control > snapshot acquisition timing and commit timing (plus using REPEATABLE > READ). In the REPEATABLE READ transaction isolation level, PostgreSQL > assigns a snapshot at the time when the first command is executed in a > transaction. If we could prevent any commit while any transaction is > acquiring snapshot, and we could prevent any snapshot acquisition while > committing, visibility inconsistency which Amit explained can be > avoided. 
> I think the problem mentioned above can occur with this as well or if I am missing something then can you explain in further detail how it won't create problem in the scenario I have used above? -- With Regards, Amit Kapila. EnterpriseDB: http://www.enterprisedb.com
>> Another approach to the atomic visibility problem is to control >> snapshot acquisition timing and commit timing (plus using REPEATABLE >> READ). In the REPEATABLE READ transaction isolation level, PostgreSQL >> assigns a snapshot at the time when the first command is executed in a >> transaction. If we could prevent any commit while any transaction is >> acquiring snapshot, and we could prevent any snapshot acquisition while >> committing, visibility inconsistency which Amit explained can be >> avoided. >> > > I think the problem mentioned above can occur with this as well or if > I am missing something then can you explain in further detail how it > won't create problem in the scenario I have used above? So the problem you mentioned above is like this? (S1/S2 denote transactions (sessions); N1/N2 denote the PostgreSQL servers.) Since S1 has already committed on N1, S2 sees the row on N1. However, S2 does not see the row on N2 since S1 has not committed on N2 yet.

S1/N1: DROP TABLE t1;
DROP TABLE
S1/N1: CREATE TABLE t1(i int);
CREATE TABLE
S1/N2: DROP TABLE t1;
DROP TABLE
S1/N2: CREATE TABLE t1(i int);
CREATE TABLE
S1/N1: BEGIN TRANSACTION ISOLATION LEVEL REPEATABLE READ;
BEGIN
S1/N2: BEGIN TRANSACTION ISOLATION LEVEL REPEATABLE READ;
BEGIN
S2/N1: BEGIN TRANSACTION ISOLATION LEVEL REPEATABLE READ;
BEGIN
S1/N1: INSERT INTO t1 VALUES (1);
INSERT 0 1
S1/N2: INSERT INTO t1 VALUES (1);
INSERT 0 1
S1/N1: PREPARE TRANSACTION 's1n1';
PREPARE TRANSACTION
S1/N2: PREPARE TRANSACTION 's1n2';
PREPARE TRANSACTION
S2/N1: PREPARE TRANSACTION 's2n1';
PREPARE TRANSACTION
S1/N1: COMMIT PREPARED 's1n1';
COMMIT PREPARED
S2/N1: SELECT * FROM t1; -- sees the row
 i
---
 1
(1 row)
S2/N2: SELECT * FROM t1; -- doesn't see the row
 i
---
(0 rows)
S1/N2: COMMIT PREPARED 's1n2';
COMMIT PREPARED
S2/N1: COMMIT PREPARED 's2n1';
COMMIT PREPARED

Best regards, -- Tatsuo Ishii SRA OSS, Inc. Japan English: http://www.sraoss.co.jp/index_en.php Japanese: http://www.sraoss.co.jp
On Mon, Jun 15, 2020 at 12:30 PM Tatsuo Ishii <ishii@sraoss.co.jp> wrote: > > >> Another approach to the atomic visibility problem is to control > >> snapshot acquisition timing and commit timing (plus using REPEATABLE > >> READ). In the REPEATABLE READ transaction isolation level, PostgreSQL > >> assigns a snapshot at the time when the first command is executed in a > >> transaction. If we could prevent any commit while any transaction is > >> acquiring snapshot, and we could prevent any snapshot acquisition while > >> committing, visibility inconsistency which Amit explained can be > >> avoided. > >> > > > > I think the problem mentioned above can occur with this as well or if > > I am missing something then can you explain in further detail how it > > won't create problem in the scenario I have used above? > > So the problem you mentioned above is like this? (S1/S2 denotes > transactions (sessions), N1/N2 is the postgreSQL servers). Since S1 > already committed on N1, S2 sees the row on N1. However S2 does not > see the row on N2 since S1 has not committed on N2 yet. > Yeah, something on these lines but S2 can execute the query on N1 directly which should fetch the data from both N1 and N2. Even if there is a solution using REPEATABLE READ isolation level we might not prefer to use that as the only level for distributed transactions, it might be too costly but let us first see how does it solve the problem? -- With Regards, Amit Kapila. EnterpriseDB: http://www.enterprisedb.com
On Mon, 15 Jun 2020 at 15:20, Amit Kapila <amit.kapila16@gmail.com> wrote: > > On Sun, Jun 14, 2020 at 2:21 PM Tatsuo Ishii <ishii@sraoss.co.jp> wrote: > > > > >> Won't it create an inconsistency in viewing the data from the > > >> different servers? Say, such a transaction inserts one row into a > > >> local server and another into the foreign server. Now, if we follow > > >> the above protocol, the user will be able to see the row from the > > >> local server but not from the foreign server. > > > > > > Yes, you're right. This atomic commit feature doesn't guarantee such > > > consistent visibility so-called atomic visibility. > > Okay, I understand that the purpose of this feature is to provide > atomic commit which means the transaction on all servers involved will > either commit or rollback. However, I think we should at least see at > a high level how the visibility will work because it might influence > the implementation of this feature. > > > > Even the local > > > server is not modified, since a resolver process commits prepared > > > foreign transactions one by one another user could see an inconsistent > > > result. Providing globally consistent snapshots to transactions > > > involving foreign servers is one of the solutions. > > How would it be able to do that? Say, when it decides to take a > snapshot the transaction on the foreign server appears to be committed > but the transaction on the local server won't appear to be committed, > so the consistent data visibility problem as mentioned above could > still arise. There are many solutions. For instance, in Postgres-XC/X2 (and maybe XL), there is a GTM node that is responsible for providing global transaction IDs (GXID) and globally consistent snapshots. All transactions need to access GTM when checking the distributed transaction status as well as starting transactions and ending transactions. 
IIUC if a global transaction accesses a tuple whose GXID is included in its global snapshot it waits for that transaction to be committed or rolled back. Regards, -- Masahiko Sawada http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
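The GTM scheme described above could be sketched roughly as follows; this is a toy model with invented names, not the actual Postgres-XC/X2 API, and it shows only the visibility rule, not the wait-for-commit behavior:

```python
# Toy GTM: hands out global transaction IDs (GXIDs) and snapshots that
# record the GXIDs still in progress cluster-wide. A tuple written by
# GXID x is visible to a snapshot iff x has committed and x was not in
# progress when the snapshot was taken.

class GTM:
    def __init__(self):
        self.next_gxid = 1
        self.in_progress = set()
        self.committed = set()

    def begin(self):
        gxid = self.next_gxid
        self.next_gxid += 1
        self.in_progress.add(gxid)
        return gxid

    def snapshot(self):
        return set(self.in_progress)   # GXIDs invisible to this snapshot

    def commit(self, gxid):
        self.in_progress.discard(gxid)
        self.committed.add(gxid)

def visible(gtm, snap, tuple_gxid):
    return tuple_gxid in gtm.committed and tuple_gxid not in snap

gtm = GTM()
writer = gtm.begin()          # distributed write in progress
snap = gtm.snapshot()         # a reader takes a global snapshot
gtm.commit(writer)            # the writer commits on all nodes

# The reader's snapshot predates the commit, so on *every* node the
# row stays invisible: no per-node inconsistency.
print(visible(gtm, snap, writer))   # False
```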
On Mon, Jun 15, 2020 at 7:06 PM Masahiko Sawada <masahiko.sawada@2ndquadrant.com> wrote: > > On Mon, 15 Jun 2020 at 15:20, Amit Kapila <amit.kapila16@gmail.com> wrote: > > > > > > > > Even the local > > > > server is not modified, since a resolver process commits prepared > > > > foreign transactions one by one another user could see an inconsistent > > > > result. Providing globally consistent snapshots to transactions > > > > involving foreign servers is one of the solutions. > > > > How would it be able to do that? Say, when it decides to take a > > snapshot the transaction on the foreign server appears to be committed > > but the transaction on the local server won't appear to be committed, > > so the consistent data visibility problem as mentioned above could > > still arise. > > There are many solutions. For instance, in Postgres-XC/X2 (and maybe > XL), there is a GTM node that is responsible for providing global > transaction IDs (GXID) and globally consistent snapshots. All > transactions need to access GTM when checking the distributed > transaction status as well as starting transactions and ending > transactions. IIUC if a global transaction accesses a tuple whose GXID > is included in its global snapshot it waits for that transaction to be > committed or rolled back. > Is there some mapping between GXID and XIDs allocated for each node or will each node use the GXID as XID to modify the data? Are we fine with parking the work for global snapshots and atomic visibility to a separate patch and just proceed with the design proposed by this patch? I am asking because I thought there might be some impact on the design of this patch based on what we decide for that work. -- With Regards, Amit Kapila. EnterpriseDB: http://www.enterprisedb.com
On Tue, Jun 16, 2020 at 3:40 PM Amit Kapila <amit.kapila16@gmail.com> wrote: > > On Mon, Jun 15, 2020 at 7:06 PM Masahiko Sawada > <masahiko.sawada@2ndquadrant.com> wrote: > > > > On Mon, 15 Jun 2020 at 15:20, Amit Kapila <amit.kapila16@gmail.com> wrote: > > > > > > > > > > > Even the local > > > > > server is not modified, since a resolver process commits prepared > > > > > foreign transactions one by one another user could see an inconsistent > > > > > result. Providing globally consistent snapshots to transactions > > > > > involving foreign servers is one of the solutions. > > > > > > How would it be able to do that? Say, when it decides to take a > > > snapshot the transaction on the foreign server appears to be committed > > > but the transaction on the local server won't appear to be committed, > > > so the consistent data visibility problem as mentioned above could > > > still arise. > > > > There are many solutions. For instance, in Postgres-XC/X2 (and maybe > > XL), there is a GTM node that is responsible for providing global > > transaction IDs (GXID) and globally consistent snapshots. All > > transactions need to access GTM when checking the distributed > > transaction status as well as starting transactions and ending > > transactions. IIUC if a global transaction accesses a tuple whose GXID > > is included in its global snapshot it waits for that transaction to be > > committed or rolled back. > > > > Is there some mapping between GXID and XIDs allocated for each node or > will each node use the GXID as XID to modify the data? Are we fine > with parking the work for global snapshots and atomic visibility to a > separate patch and just proceed with the design proposed by this > patch? Distributed transaction involves, atomic commit, atomic visibility and global consistency. 2PC is the only practical solution for atomic commit. There are some improvements over 2PC but those are add ons to the basic 2PC, which is what this patch provides. 
Atomic visibility and global consistency, however, have alternative solutions, but all of those solutions require 2PC to be supported. Each of those is a large piece of work, and trying to get everything in at once may not work. Once we have basic 2PC in place, there will be ground to experiment with solutions for global consistency and atomic visibility. If we manage to do it right, we could make them pluggable as well. So I think we should concentrate on supporting basic 2PC now.

> I am asking because I thought there might be some impact on > the design of this patch based on what we decide for that work. >

Since 2PC is at the heart of any distributed transaction system, the impact will be low. Figuring all of that out without having basic 2PC will be very hard. -- Best Wishes, Ashutosh Bapat
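The basic 2PC sequence the patch provides (prepare on every foreign server, commit locally, then resolve the prepared foreign transactions) can be sketched as below. This is a minimal model with invented names, not the patch's code, and it ignores crashes between steps, which are the hard part the resolver process exists for:

```python
# Sketch of basic atomic commit via 2PC. Servers are modeled as dicts
# holding sets of prepared and committed transaction IDs.

def atomic_commit(local, foreign_servers, gid):
    prepared = []
    try:
        for srv in foreign_servers:           # step 1: prepare phase
            srv["prepared"].add(gid)
            prepared.append(srv)
    except Exception:
        for srv in prepared:                  # any failure -> roll back all
            srv["prepared"].discard(gid)
        raise
    local["committed"].add(gid)               # step 2: commit locally
    for srv in foreign_servers:               # step 3: resolve (commit)
        srv["prepared"].discard(gid)          #         the prepared xacts
        srv["committed"].add(gid)

local = {"committed": set()}
f1 = {"prepared": set(), "committed": set()}
f2 = {"prepared": set(), "committed": set()}
atomic_commit(local, [f1, f2], "gx1")
print(local["committed"], f1["committed"], f2["committed"])
# {'gx1'} {'gx1'} {'gx1'}
```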
On Fri, 12 Jun 2020 at 19:24, Amit Kapila <amit.kapila16@gmail.com> wrote: > Thank you for your reviews on 0003 patch. I've incorporated your comments. I'll submit the latest version patch later as the design or scope might change as a result of the discussion. > > Few more comments on v22-0003-Documentation-update > -------------------------------------------------------------------------------------- > 1. > + When <literal>disabled</literal> there can be risk of database > + consistency among all servers that involved in the distributed > + transaction when some foreign server crashes during committing the > + distributed transaction. > > Will it read better if rephrase above to something like: "When > <literal>disabled</literal> there can be a risk of database > consistency if one or more foreign servers crashes while committing > the distributed transaction."? Fixed. > > 2. > + <varlistentry > id="guc-foreign-transaction-resolution-rety-interval" > xreflabel="foreign_transaction_resolution_retry_interval"> > + <term><varname>foreign_transaction_resolution_retry_interval</varname> > (<type>integer</type>) > + <indexterm> > + <primary><varname>foreign_transaction_resolution_interval</varname> > configuration parameter</primary> > + </indexterm> > + </term> > + <listitem> > + <para> > + Specify how long the foreign transaction resolver should > wait when the last resolution > + fails before retrying to resolve foreign transaction. This > parameter can only be set in the > + <filename>postgresql.conf</filename> file or on the server > command line. > + </para> > + <para> > + The default value is 10 seconds. > + </para> > + </listitem> > + </varlistentry> > > Typo. <varlistentry > id="guc-foreign-transaction-resolution-rety-interval", spelling of > retry is wrong. Do we really need such a guc parameter? 
I think we > can come up with some simple algorithm to retry after a few seconds > and then increase that interval of retry if we fail again or something > like that. I don't know how users can come up with some non-default > value for this variable.

For example, in a low-reliability network environment, setting a lower value would help minimize the backend wait time in case of a lost connection. But I also agree with your point. In terms of implementation, having backends wait for a fixed time is simpler, but we could do such an incremental interval by remembering the retry count for each foreign transaction.

An open question regarding retrying foreign transaction resolution is how we handle the case where an involved foreign server is down for a very long time. If an online transaction is waiting to be resolved, there is no way to exit the wait loop other than the user sending a cancel request or the crashed server being restored. But if the foreign server will be down for a long time, I think it’s not practical to rely on the user sending a cancel request, because the client would need something like a timeout mechanism. So it might be better to provide a way to cancel the waiting without the user sending a cancel, for example a timeout or a limit on the retry count.

If an in-doubt transaction is waiting to be resolved, we keep trying to resolve the foreign transaction at an interval. But I wonder if the user might want to disable automatic in-doubt foreign transaction resolution in some cases, for example where the user knows the crashed server will not be restored for a long time. I’m thinking that we can provide a way to disable automatic foreign transaction resolution globally or for a particular foreign transaction.
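The incremental retry interval suggested above could look like the following sketch. The helper name is hypothetical, and the base interval, growth factor, and cap are made-up numbers, not values from the patch:

```python
import itertools

def resolution_retry_delays(base=10.0, factor=2.0, cap=300.0):
    """Yield successive wait times (seconds) before each retry of
    foreign transaction resolution: exponential growth with a cap,
    instead of a fixed foreign_transaction_resolution_retry_interval."""
    delay = base
    while True:
        yield delay
        delay = min(delay * factor, cap)

print(list(itertools.islice(resolution_retry_delays(), 6)))
# [10.0, 20.0, 40.0, 80.0, 160.0, 300.0]
```

A resolver loop would pull the next delay from the generator after each failed resolution attempt, and reset the generator once resolution succeeds.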
> > 3 > + <varlistentry id="guc-foreign-transaction-resolver-timeout" > xreflabel="foreign_transaction_resolver_timeout"> > + <term><varname>foreign_transaction_resolver_timeout</varname> > (<type>integer</type>) > + <indexterm> > + <primary><varname>foreign_transaction_resolver_timeout</varname> > configuration parameter</primary> > + </indexterm> > + </term> > + <listitem> > + <para> > + Terminate foreign transaction resolver processes that don't > have any foreign > + transactions to resolve longer than the specified number of > milliseconds. > + A value of zero disables the timeout mechanism, meaning it > connects to one > + database until stopping manually. > > Can we mention the function name using which one can stop the resolver process? Fixed. > > 4. > + Using the <productname>PostgreSQL</productname>'s atomic commit ensures that > + all changes on foreign servers end in either commit or rollback using the > + transaction callback routines > > Can we slightly rephase this "Using the PostgreSQL's atomic commit > ensures that all the changes on foreign servers are either committed > or rolled back using the transaction callback routines"? Fixed. > > 5. > + Prepare all transactions on foreign servers. > + <productname>PostgreSQL</productname> distributed transaction manager > + prepares all transaction on the foreign servers if two-phase commit is > + required. Two-phase commit is required when the transaction modifies > + data on two or more servers including the local server itself and > + <xref linkend="guc-foreign-twophase-commit"/> is > + <literal>required</literal>. > > /PostgreSQL/PostgreSQL's. Fixed. > > If all preparations on foreign servers got > + successful go to the next step. > > How about "If the prepare on all foreign servers is successful then go > to the next step"? Fixed. > > Any failure happens in this step, > + the server changes to rollback, then rollback all transactions on both > + local and foreign servers. 
> > Can we rephrase this line to something like: "If there is any failure > in the prepare phase, the server will rollback all the transactions on > both local and foreign servers."? Fixed. > > What if the issued Rollback also failed, say due to network breakdown > between local and one of foreign servers? Shouldn't such a > transaction be 'in-doubt' state?

The rollback API that rolls back a transaction in one phase can be called recursively, so FDWs have to tolerate recursive calls. In the current patch, all transaction operations are performed synchronously. That is, a foreign transaction never becomes in-doubt without an explicit cancel by the user or a local node crash. That way, subsequent transactions can assume that preceding distributed transactions are already resolved unless the user canceled. Let me explain the details:

If the transaction turns to rollback due to a failure before the local commit, we attempt both ROLLBACK and ROLLBACK PREPARED against foreign transactions whose status is PREPARING. That is, we end the foreign transactions by doing ROLLBACK, and since we're not sure the preparation has completed on the foreign server, the backend also asks the resolver process to do ROLLBACK PREPARED on the foreign servers. Therefore FDWs have to tolerate an OBJECT_NOT_FOUND error in the abort case. Since the backend process returns an acknowledgment to the client only after rolling back all foreign transactions, these foreign transactions don't remain in an in-doubt state.

If rolling back fails after the local commit (i.e., the client does ROLLBACK and the resolver fails to do ROLLBACK PREPARED), a resolver process will relaunch and retry the ROLLBACK PREPARED. The backend process waits until ROLLBACK PREPARED is successfully done or the user cancels. So the foreign transactions don't become in-doubt transactions. Synchronousness is also an open question.
If we want to support atomic commit in an asynchronous manner, it might be better to implement it first in terms of complexity: the backend returns an acknowledgment to the client immediately after asking the resolver process. This is known as the early acknowledgment technique. The downside is that a user who wants to see the result of a preceding transaction needs to make sure the preceding transaction has committed on all foreign servers. We will also need to think about how to control it with a GUC parameter when we have synchronous distributed transaction commit. Perhaps it’s better to control it independently of synchronous replication.

> > 6. > + <para> > + Commit locally. The server commits transaction locally. Any > failure happens > + in this step the server changes to rollback, then rollback all > transactions > + on both local and foreign servers. > + </para> > + </listitem> > + <listitem> > + <para> > + Resolve all prepared transaction on foreign servers. Pprepared > transactions > + are committed or rolled back according to the result of the > local transaction. > + This step is normally performed by a foreign transaction > resolver process. > + </para> > > When (in which step) do we commit on foreign servers? Do Resolver > processes commit on foreign servers, if so, how can we commit locally > without committing on foreign servers, what if the commit on one of > the servers fails? It is not very clear to me from the steps mentioned > here?

In case 2PC is required, we commit transactions on foreign servers at the final step, via the resolver process. If committing a prepared transaction on one of the servers fails, a resolver process relaunches after an interval and retries the commit. In case 2PC is not required, we commit transactions on foreign servers at the pre-commit phase, via the backend. > Typo, /Pprepared/Prepared Fixed. > > 7.
> However, foreign transactions > + become <firstterm>in-doubt</firstterm> in three cases: where the foreign > + server crashed or lost the connectibility to it during preparing foreign > + transaction, where the local node crashed during either preparing or > + resolving foreign transaction and where user canceled the query. > > Here the three cases are not very clear. You might want to use (a) > ..., (b) .. ,(c).. Fixed. I change it to itemizedlist. > Also, I think the state will be in-doubt even when > we lost connection to server during commit or rollback. Let me correct the cases of the foreign transactions remain as in-doubt state. There are two cases: * The local node crashed * The user canceled the transaction commit or rollback. Even when we lost connection to the server during commit or rollback prepared transaction, a backend doesn’t return an acknowledgment to the client until either transaction is successfully resolved, the user cancels the transaction, or the local node crashes. > > 8. > + One foreign transaction resolver is responsible for transaction resolutions > + on which one database connecting. > > Can we rephrase it to: "One foreign transaction resolver is > responsible for transaction resolutions on the database to which it is > connected."? Fixed. > > 9. > + Note that other <productname>PostgreSQL</productname> feature > such as parallel > + queries, logical replication, etc., also take worker slots from > + <varname>max_worker_processes</varname>. > > /feature/features Fixed. > > 10. > + <para> > + Atomic commit requires several configuration options to be set. > + On the local node, <xref > linkend="guc-max-prepared-foreign-transactions"/> and > + <xref linkend="guc-max-foreign-transaction-resolvers"/> must be > non-zero value. 
> + Additionally the <varname>max_worker_processes</varname> may need > to be adjusted to > + accommodate for foreign transaction resolver workers, at least > + (<varname>max_foreign_transaction_resolvers</varname> + > <literal>1</literal>). > + Note that other <productname>PostgreSQL</productname> feature > such as parallel > + queries, logical replication, etc., also take worker slots from > + <varname>max_worker_processes</varname>. > + </para> > > Don't we need to mention foreign_twophase_commit GUC here? Fixed. > > 11. > + <sect2 id="fdw-callbacks-transaction-managements"> > + <title>FDW Routines For Transaction Managements</title> > > Managements/Management? Fixed. > > 12. > + Transaction management callbacks are used for doing commit, rollback and > + prepare the foreign transaction. > > Lets write the above sentence as: "Transaction management callbacks > are used to commit, rollback and prepare the foreign transaction." Fixed. > > 13. > + <para> > + Transaction management callbacks are used for doing commit, rollback and > + prepare the foreign transaction. If an FDW wishes that its foreign > + transaction is managed by <productname>PostgreSQL</productname>'s global > + transaction manager it must provide both > + <function>CommitForeignTransaction</function> and > + <function>RollbackForeignTransaction</function>. In addition, if an FDW > + wishes to support <firstterm>atomic commit</firstterm> (as described in > + <xref linkend="fdw-transaction-managements"/>), it must provide > + <function>PrepareForeignTransaction</function> as well and can provide > + <function>GetPrepareId</function> callback optionally. > + </para> > > What exact functionality a FDW can accomplish if it just supports > CommitForeignTransaction and RollbackForeignTransaction? It seems it > doesn't care for 2PC, if so, is there any special functionality we can > achieve with this which we can't do without these APIs? 
There is no special functionality even if an FDW implements CommitForeignTransaction and RollbackForeignTransaction. Currently, since there is no transaction API among the FDW APIs, FDW developers have to use XactCallback to control transactions, but there is no documentation for that. The idea of allowing an FDW to support only CommitForeignTransaction and RollbackForeignTransaction is that FDW developers can implement transaction management easily. But in the first patch, we could also disallow it to keep the implementation simple.

> > 14. > +PrepareForeignTransaction(FdwXactRslvState *frstate); > +</programlisting> > + Prepare the transaction on the foreign server. This function is > called at the > + pre-commit phase of the local transactions if foreign twophase commit is > + required. This function is used only for distribute transaction management > + (see <xref linkend="distributed-transaction"/>). > + </para> > > /distribute/distributed Fixed. > > 15. > + <sect2 id="fdw-transaction-commit-rollback"> > + <title>Commit And Rollback Single Foreign Transaction</title> > + <para> > + The FDW callback function <literal>CommitForeignTransaction</literal> > + and <literal>RollbackForeignTransaction</literal> can be used to commit > + and rollback the foreign transaction. During transaction commit, the core > + transaction manager calls > <literal>CommitForeignTransaction</literal> function > + in the pre-commit phase and calls > + <literal>RollbackForeignTransaction</literal> function in the > post-rollback > + phase. > + </para> > > There is no reasoning mentioned as to why CommitForeignTransaction has > to be called in pre-commit phase and RollbackForeignTransaction in > post-rollback phase? Basically why one in pre phase and other in post > phase?

Good point. This behavior just follows what postgres_fdw does.
I'm not sure of the exact reason why postgres_fdw commits the transaction in the pre-commit phase, but I guess that since committing a foreign transaction is more likely to fail than the local commit, it might be better to do it first.

> > 16. > + <entry> > + <literal><function>pg_remove_foreign_xact(<parameter>transaction</parameter> <type>xid</type>, <parameter>serverid</parameter> <type>oid</type>, <parameter>userid</parameter> <type>oid</type>)</function></literal> > + </entry> > + <entry><type>void</type></entry> > + <entry> > + This function works the same as > <function>pg_resolve_foreign_xact</function> > + except that this removes the foreign transcation entry > without resolution. > + </entry> > > Can we write why and when such a function can be used? Typo, > /trasnaction/transaction Fixed. > > 17. > + <row> > + <entry><literal>FdwXactResolutionLock</literal></entry> > + <entry>Waiting to read or update information of foreign trasnaction > + resolution.</entry> > + </row> > > /trasnaction/transaction Fixed.

Regards, -- Masahiko Sawada http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
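The abort-path rule discussed earlier in this message (FDWs must tolerate recursive rollback calls and an OBJECT_NOT_FOUND error, because we may not know whether PREPARE completed on the foreign server) amounts to making rollback idempotent. A rough sketch with invented names, not the patch's FDW API:

```python
# "ObjectNotFound" stands in for the FDW error the text says must be
# tolerated; the server dict is a stand-in for a foreign node's
# prepared-transaction table.

class ObjectNotFound(Exception):
    pass

def rollback_prepared(server, gid):
    if gid not in server:
        raise ObjectNotFound(gid)
    del server[gid]

def resolve_abort(server, gid):
    """Roll back a possibly-prepared foreign transaction; treat a
    missing prepared transaction as already rolled back."""
    try:
        rollback_prepared(server, gid)
    except ObjectNotFound:
        pass    # PREPARE never completed there: nothing to do

server = {"s1n2": "prepared"}
resolve_abort(server, "s1n2")   # rolls back the prepared transaction
resolve_abort(server, "s1n2")   # safe to call again (recursive/retried)
print(server)                   # {}
```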
>> > I think the problem mentioned above can occur with this as well or if >> > I am missing something then can you explain in further detail how it >> > won't create problem in the scenario I have used above? >> >> So the problem you mentioned above is like this? (S1/S2 denotes >> transactions (sessions), N1/N2 is the postgreSQL servers). Since S1 >> already committed on N1, S2 sees the row on N1. However S2 does not >> see the row on N2 since S1 has not committed on N2 yet. >> > > Yeah, something on these lines but S2 can execute the query on N1 > directly which should fetch the data from both N1 and N2.

The algorithm assumes that any client accesses the database through middleware. Such direct access is prohibited.

> Even if > there is a solution using REPEATABLE READ isolation level we might not > prefer to use that as the only level for distributed transactions, it > might be too costly but let us first see how does it solve the > problem?

The paper extends Snapshot Isolation (SI, which is the same as our REPEATABLE READ isolation level) to "Global Snapshot Isolation" (GSI). I think GSI will solve the problem (atomic visibility) we are discussing. Unlike READ COMMITTED, REPEATABLE READ acquires its snapshot at the time the first command is executed in a transaction (READ COMMITTED acquires a snapshot at each command in a transaction). Pangea controls the timing of snapshot acquisition on each pair of transactions (S1/N1,N2 or S2/N1,N2) so that each pair acquires the same snapshot. To achieve this, while some transactions are trying to acquire a snapshot, any commit operation is postponed. Likewise, any snapshot acquisition waits until any in-progress commit operations are finished (see Algorithms I to III in the paper for more details). With this rule, the previous example now looks like this: you can see that SELECT on S2/N1 and S2/N2 gives the same result.
S1/N1: DROP TABLE t1;
DROP TABLE
S1/N1: CREATE TABLE t1(i int);
CREATE TABLE
S1/N2: DROP TABLE t1;
DROP TABLE
S1/N2: CREATE TABLE t1(i int);
CREATE TABLE
S1/N1: BEGIN;
BEGIN
S1/N2: BEGIN;
BEGIN
S2/N1: BEGIN;
BEGIN
S1/N1: SET transaction_isolation TO 'repeatable read';
SET
S1/N2: SET transaction_isolation TO 'repeatable read';
SET
S2/N1: SET transaction_isolation TO 'repeatable read';
SET
S1/N1: INSERT INTO t1 VALUES (1);
INSERT 0 1
S1/N2: INSERT INTO t1 VALUES (1);
INSERT 0 1
S2/N1: SELECT * FROM t1;
 i
---
(0 rows)
S2/N2: SELECT * FROM t1;
 i
---
(0 rows)
S1/N1: PREPARE TRANSACTION 's1n1';
PREPARE TRANSACTION
S1/N2: PREPARE TRANSACTION 's1n2';
PREPARE TRANSACTION
S2/N1: PREPARE TRANSACTION 's2n1';
PREPARE TRANSACTION
S1/N1: COMMIT PREPARED 's1n1';
COMMIT PREPARED
S1/N2: COMMIT PREPARED 's1n2';
COMMIT PREPARED
S2/N1: COMMIT PREPARED 's2n1';
COMMIT PREPARED

Best regards, -- Tatsuo Ishii SRA OSS, Inc. Japan English: http://www.sraoss.co.jp/index_en.php Japanese: http://www.sraoss.co.jp
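Pangea's rule described above (commits wait for in-flight snapshot acquisitions, and snapshot acquisitions wait for in-flight commits) can be modeled minimally as follows. This is a single-threaded sketch with invented names; the real algorithm blocks rather than erroring, and the paper's Algorithms I to III are the authoritative description:

```python
# A readers-writer style pair of counters stands in for the real
# synchronization: while any pair of transactions is acquiring its
# snapshots (S2 on N1 and N2), no commit may interleave, and vice
# versa, so every pair sees the same set of commits on all nodes.

class PangeaCoordinator:
    def __init__(self):
        self.snapshots_in_progress = 0
        self.commits_in_progress = 0

    def begin_snapshot(self):
        assert self.commits_in_progress == 0, "must wait for commits"
        self.snapshots_in_progress += 1

    def end_snapshot(self):
        self.snapshots_in_progress -= 1

    def begin_commit(self):
        assert self.snapshots_in_progress == 0, "must wait for snapshots"
        self.commits_in_progress += 1

    def end_commit(self):
        self.commits_in_progress -= 1

coord = PangeaCoordinator()
coord.begin_snapshot()   # S2 acquires its snapshot on N1 and N2 ...
coord.end_snapshot()     # ... atomically with respect to commits
coord.begin_commit()     # only now may S1 commit on N1 and N2
coord.end_commit()
print("ok")              # the bad interleaving (a commit landing between
                         # S2's two per-node snapshot acquisitions) is
                         # ruled out
```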
On Tue, Jun 16, 2020 at 06:42:52PM +0530, Ashutosh Bapat wrote: > > Is there some mapping between GXID and XIDs allocated for each node or > > will each node use the GXID as XID to modify the data? Are we fine > > with parking the work for global snapshots and atomic visibility to a > > separate patch and just proceed with the design proposed by this > > patch? > > Distributed transaction involves, atomic commit, atomic visibility > and global consistency. 2PC is the only practical solution for atomic > commit. There are some improvements over 2PC but those are add ons to > the basic 2PC, which is what this patch provides. Atomic visibility > and global consistency however have alternative solutions but all of > those solutions require 2PC to be supported. Each of those are large > pieces of work and trying to get everything in may not work. Once we > have basic 2PC in place, there will be a ground to experiment with > solutions for global consistency and atomic visibility. If we manage > to do it right, we could make it pluggable as well. So, I think we > should concentrate on supporting basic 2PC work now. Very good summary, thank you. -- Bruce Momjian <bruce@momjian.us> https://momjian.us EnterpriseDB https://enterprisedb.com The usefulness of a cup is in its emptiness, Bruce Lee
> I've attached the new version patch set. 0006 is a separate patch > which introduces 'prefer' mode to foreign_twophase_commit.

I hope we can use this feature. Thank you for the patches and discussions. I'm currently working through the logic and found some minor points to be fixed. I'm sorry if my understanding is wrong.

* The v22 patches need a rebase, as they don't apply to the current master.

* FdwXactAtomicCommitParticipants, mentioned in src/backend/access/fdwxact/README, is not implemented. Is FdwXactParticipants the right name?

* The following comment says that this code is for "One-phase", but the second argument of FdwXactParticipantEndTransaction() says this call is not "onephase".

AtEOXact_FdwXact() in fdwxact.c
    /* One-phase rollback foreign transaction */
    FdwXactParticipantEndTransaction(fdw_part, false, false);

static void
FdwXactParticipantEndTransaction(FdwXactParticipant *fdw_part, bool onephase,
                                 bool for_commit)

* The "two_phase_commit" option is mentioned in postgres-fdw.sgml, but I can't find the related code.

* A comment in resolver.c has a sentence containing two consecutive blanks. (Emergency Termination)

* There are some inconsistencies with the PostgreSQL wiki. https://wiki.postgresql.org/wiki/Atomic_Commit_of_Distributed_Transactions I understand it's difficult to keep them consistent; I think it's OK to fix these later, when the patches are almost ready to be committed.

  - I can't find the "two_phase_commit" option in the source code. But 2PC works if the remote server's "max_prepared_transactions" is set to a non-zero value. Is that the intended behavior?

  - Some parameters are renamed or added in the latest patches: max_prepared_foreign_transaction, max_prepared_transactions and so on.

  - typo: froeign_transaction_resolver_timeout

Regards, -- Masahiro Ikeda NTT DATA CORPORATION
On Tue, Jun 16, 2020 at 8:06 PM Tatsuo Ishii <ishii@sraoss.co.jp> wrote:
>
> >> > I think the problem mentioned above can occur with this as well or if
> >> > I am missing something then can you explain in further detail how it
> >> > won't create problem in the scenario I have used above?
> >>
> >> So the problem you mentioned above is like this? (S1/S2 denotes
> >> transactions (sessions), N1/N2 is the postgreSQL servers). Since S1
> >> already committed on N1, S2 sees the row on N1. However S2 does not
> >> see the row on N2 since S1 has not committed on N2 yet.
> >>
> >
> > Yeah, something on these lines but S2 can execute the query on N1
> > directly which should fetch the data from both N1 and N2.
>
> The algorithm assumes that any client should access the database through a
> middleware. Such direct access is prohibited.
>

okay, so it seems we need a few things which the middleware (Pangea) expects
if we have to follow the design of the paper.

> > Even if
> > there is a solution using REPEATABLE READ isolation level we might not
> > prefer to use that as the only level for distributed transactions, it
> > might be too costly but let us first see how does it solve the
> > problem?
>
> The paper extends Snapshot Isolation (SI, which is the same as our
> REPEATABLE READ isolation level) to "Global Snapshot Isolation" (GSI).
> I think GSI will solve the problem (atomic visibility) we are
> discussing.
>
> Unlike READ COMMITTED, REPEATABLE READ acquires its snapshot at the time
> when the first command is executed in a transaction (READ COMMITTED
> acquires a snapshot at each command in a transaction). Pangea controls
> the timing of the snapshot acquisition on pairs of transactions
> (S1/N1,N2 or S2/N1,N2) so that each pair acquires the same
> snapshot. To achieve this, while some transactions are trying to
> acquire a snapshot, any commit operation should be postponed. Likewise,
> any snapshot acquisition should wait until any in-progress commit
> operations are finished (see Algorithm I to III in the paper for more
> details).
>

I haven't read the paper completely but it sounds quite restrictive
(like both commits and snapshots need to wait). Another point is:
do we want some middleware involved in the solution? The main thing
I was looking into at this stage is: do we think that the current
implementation proposed by the patch for 2PC is generic enough that we
would later be able to integrate the solution for atomic visibility?

--
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com
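The Pangea constraint quoted above (snapshot acquisition and commits mutually exclude each other, so every transaction sees the same committed state on every node) can be sketched with a toy model. This is my own illustration, not the paper's actual Algorithms I to III, and all class and method names are invented:

```python
import threading

class PangeaSnapshotCoordinator:
    """Toy model: begin-snapshot and commit are mutually exclusive
    critical sections, so a snapshot can never observe a transaction
    that is committed on one node but not yet committed on another."""

    def __init__(self, num_nodes):
        self._lock = threading.Lock()          # serializes snapshots vs. commits
        self._last_commit = [0] * num_nodes    # last committed txid per node

    def acquire_snapshot(self):
        # Waits for any in-progress commit, then returns one snapshot
        # that is identical on all nodes.
        with self._lock:
            return tuple(self._last_commit)

    def commit(self, txid):
        # Waits for any in-progress snapshot acquisition, then applies
        # the commit to every node before releasing the lock.
        with self._lock:
            for n in range(len(self._last_commit)):
                self._last_commit[n] = txid
```

Even with a reader and a writer running concurrently, every snapshot comes out uniform across nodes, which is the atomic-visibility property under discussion, at the cost of commits and snapshot acquisitions waiting on each other.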
On Wed, 17 Jun 2020 at 09:01, Masahiro Ikeda <ikedamsh@oss.nttdata.com> wrote: > > > I've attached the new version patch set. 0006 is a separate patch > > which introduces 'prefer' mode to foreign_twophase_commit. > > I hope we can use this feature. Thank you for making patches and > discussions. > I'm currently understanding the logic and found some minor points to be > fixed. > > I'm sorry if my understanding is wrong. > > * The v22 patches need rebase as they can't apply to the current master. > > * FdwXactAtomicCommitParticipants said in > src/backend/access/fdwxact/README > is not implemented. Is FdwXactParticipants right? Right. > > * A following comment says that this code is for "One-phase", > but second argument of FdwXactParticipantEndTransaction() describes > this code is not "onephase". > > AtEOXact_FdwXact() in fdwxact.c > /* One-phase rollback foreign transaction */ > FdwXactParticipantEndTransaction(fdw_part, false, false); > > static void > FdwXactParticipantEndTransaction(FdwXactParticipant *fdw_part, bool > onephase, > bool for_commit) > > * "two_phase_commit" option is mentioned in postgres-fdw.sgml, > but I can't find related code. > > * resolver.c comments have the sentence > containing two blanks.(Emergency Termination) > > * There are some inconsistency with PostgreSQL wiki. > https://wiki.postgresql.org/wiki/Atomic_Commit_of_Distributed_Transactions > > I understand it's difficult to keep consistency, I think it's ok to > fix later > when these patches almost be able to be committed. > > - I can't find "two_phase_commit" option in the source code. > But 2PC is work if the remote server's "max_prepared_transactions" > is set > to non zero value. It is correct work, isn't it? Yes. I had removed two_phase_commit option from postgres_fdw. Currently, postgres_fdw uses 2pc when 2pc is required. Therefore, max_prepared_transactions needs to be set to more than one, as you mentioned. > > - some parameters are renamed or added in latest patches. 
> max_prepared_foreign_transaction, max_prepared_transactions and so > on. > > - typo: froeign_transaction_resolver_timeout > Thank you for your review! I've incorporated your comments on the local branch. I'll share the latest version patch. Also, I've updated the wiki page. I'll try to keep the wiki page up-to-date. Regards, -- Masahiko Sawada http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
On Tue, Jun 16, 2020 at 6:43 PM Ashutosh Bapat <ashutosh.bapat.oss@gmail.com> wrote: > > On Tue, Jun 16, 2020 at 3:40 PM Amit Kapila <amit.kapila16@gmail.com> wrote: > > > > > > Is there some mapping between GXID and XIDs allocated for each node or > > will each node use the GXID as XID to modify the data? Are we fine > > with parking the work for global snapshots and atomic visibility to a > > separate patch and just proceed with the design proposed by this > > patch? > > Distributed transaction involves, atomic commit, atomic visibility > and global consistency. 2PC is the only practical solution for atomic > commit. There are some improvements over 2PC but those are add ons to > the basic 2PC, which is what this patch provides. Atomic visibility > and global consistency however have alternative solutions but all of > those solutions require 2PC to be supported. Each of those are large > pieces of work and trying to get everything in may not work. Once we > have basic 2PC in place, there will be a ground to experiment with > solutions for global consistency and atomic visibility. If we manage > to do it right, we could make it pluggable as well. > I think it is easier said than done. If you want to make it pluggable or want alternative solutions to adapt the 2PC support provided by us we should have some idea how those alternative solutions look like. I am not telling we have to figure out each and every detail of those solutions but without paying any attention to the high-level picture we might end up doing something for 2PC here which either needs a lot of modifications or might need a design change which would be bad. Basically, if we later decide to use something like Global Xid to achieve other features then what we are doing here might not work. I think it is a good idea to complete the work in pieces where each piece is useful on its own but without having clarity on the overall solution that could be a recipe for disaster. 
It is possible that you have some idea in your mind where you can see clearly how this piece of work can fit in the bigger picture but it is not very apparent to others or doesn't seem to be documented anywhere. -- With Regards, Amit Kapila. EnterpriseDB: http://www.enterprisedb.com
> okay, so it seems we need few things which middleware (Pangea) expects > if we have to follow the design of paper. Yes. > I haven't read the paper completely but it sounds quite restrictive > (like both commits and snapshots need to wait). Maybe. There is a performance evaluation in the paper. You might want to take a look at it. > Another point is that > do we want some middleware involved in the solution? The main thing > I was looking into at this stage is do we think that the current > implementation proposed by the patch for 2PC is generic enough that we > would be later able to integrate the solution for atomic visibility? My concern is, FDW+2PC without atomic visibility could lead to data inconsistency among servers in some cases. If my understanding is correct, FDW+2PC (without atomic visibility) cannot prevent data inconsistency in the case below. Initially table t1 has only one row with i = 0 on both N1 and N2. By executing S1 and S2 concurrently, t1 now has different value of i, 0 and 1. 
S1/N1: DROP TABLE t1;
DROP TABLE
S1/N1: CREATE TABLE t1(i int);
CREATE TABLE
S1/N1: INSERT INTO t1 VALUES(0);
INSERT 0 1
S1/N2: DROP TABLE t1;
DROP TABLE
S1/N2: CREATE TABLE t1(i int);
CREATE TABLE
S1/N2: INSERT INTO t1 VALUES(0);
INSERT 0 1
S1/N1: BEGIN;
BEGIN
S1/N2: BEGIN;
BEGIN
S1/N1: UPDATE t1 SET i = i + 1;	-- i = 1
UPDATE 1
S1/N2: UPDATE t1 SET i = i + 1;	-- i = 1
UPDATE 1
S1/N1: PREPARE TRANSACTION 's1n1';
PREPARE TRANSACTION
S1/N1: COMMIT PREPARED 's1n1';
COMMIT PREPARED
S2/N1: BEGIN;
BEGIN
S2/N2: BEGIN;
BEGIN
S2/N2: DELETE FROM t1 WHERE i = 1;
DELETE 0
S2/N1: DELETE FROM t1 WHERE i = 1;
DELETE 1
S1/N2: PREPARE TRANSACTION 's1n2';
PREPARE TRANSACTION
S2/N1: PREPARE TRANSACTION 's2n1';
PREPARE TRANSACTION
S2/N2: PREPARE TRANSACTION 's2n2';
PREPARE TRANSACTION
S1/N2: COMMIT PREPARED 's1n2';
COMMIT PREPARED
S2/N1: COMMIT PREPARED 's2n1';
COMMIT PREPARED
S2/N2: COMMIT PREPARED 's2n2';
COMMIT PREPARED
S2/N1: SELECT * FROM t1;
 i
---
(0 rows)

S2/N2: SELECT * FROM t1;
 i
---
 1
(1 row)

Best regards,
--
Tatsuo Ishii
SRA OSS, Inc. Japan
English: http://www.sraoss.co.jp/index_en.php
Japanese:http://www.sraoss.co.jp
On Thu, 18 Jun 2020 at 08:31, Tatsuo Ishii <ishii@sraoss.co.jp> wrote: > > > okay, so it seems we need few things which middleware (Pangea) expects > > if we have to follow the design of paper. > > Yes. > > > I haven't read the paper completely but it sounds quite restrictive > > (like both commits and snapshots need to wait). > > Maybe. There is a performance evaluation in the paper. You might want > to take a look at it. > > > Another point is that > > do we want some middleware involved in the solution? The main thing > > I was looking into at this stage is do we think that the current > > implementation proposed by the patch for 2PC is generic enough that we > > would be later able to integrate the solution for atomic visibility? > > My concern is, FDW+2PC without atomic visibility could lead to data > inconsistency among servers in some cases. If my understanding is > correct, FDW+2PC (without atomic visibility) cannot prevent data > inconsistency in the case below. Initially table t1 has only one row > with i = 0 on both N1 and N2. By executing S1 and S2 concurrently, t1 > now has different value of i, 0 and 1. IIUC the following sequence won't happen because COMMIT PREPARED 's1n1' cannot be executed before PREPARE TRANSACTION 's1n2'. But as you mentioned, we cannot prevent data inconsistency even with FDW+2PC e.g., when S2 starts a transaction between COMMIT PREPARED on N1 and COMMIT PREPARED on N2 by S1. The point is this data inconsistency is lead by an inconsistent read but not by an inconsistent commit results. I think there are kinds of possibilities causing data inconsistency but atomic commit and atomic visibility eliminate different possibilities. We can eliminate all possibilities of data inconsistency only after we support 2PC and globally MVCC. 
>
> S1/N1: DROP TABLE t1;
> DROP TABLE
> S1/N1: CREATE TABLE t1(i int);
> CREATE TABLE
> S1/N1: INSERT INTO t1 VALUES(0);
> INSERT 0 1
> S1/N2: DROP TABLE t1;
> DROP TABLE
> S1/N2: CREATE TABLE t1(i int);
> CREATE TABLE
> S1/N2: INSERT INTO t1 VALUES(0);
> INSERT 0 1
> S1/N1: BEGIN;
> BEGIN
> S1/N2: BEGIN;
> BEGIN
> S1/N1: UPDATE t1 SET i = i + 1;	-- i = 1
> UPDATE 1
> S1/N2: UPDATE t1 SET i = i + 1;	-- i = 1
> UPDATE 1
> S1/N1: PREPARE TRANSACTION 's1n1';
> PREPARE TRANSACTION
> S1/N1: COMMIT PREPARED 's1n1';
> COMMIT PREPARED
> S2/N1: BEGIN;
> BEGIN
> S2/N2: BEGIN;
> BEGIN
> S2/N2: DELETE FROM t1 WHERE i = 1;
> DELETE 0
> S2/N1: DELETE FROM t1 WHERE i = 1;
> DELETE 1
> S1/N2: PREPARE TRANSACTION 's1n2';
> PREPARE TRANSACTION
> S2/N1: PREPARE TRANSACTION 's2n1';
> PREPARE TRANSACTION
> S2/N2: PREPARE TRANSACTION 's2n2';
> PREPARE TRANSACTION
> S1/N2: COMMIT PREPARED 's1n2';
> COMMIT PREPARED
> S2/N1: COMMIT PREPARED 's2n1';
> COMMIT PREPARED
> S2/N2: COMMIT PREPARED 's2n2';
> COMMIT PREPARED
> S2/N1: SELECT * FROM t1;
>  i
> ---
> (0 rows)
>
> S2/N2: SELECT * FROM t1;
>  i
> ---
>  1
> (1 row)
>

Regards,

--
Masahiko Sawada
http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
>> My concern is, FDW+2PC without atomic visibility could lead to data
>> inconsistency among servers in some cases. If my understanding is
>> correct, FDW+2PC (without atomic visibility) cannot prevent data
>> inconsistency in the case below. Initially table t1 has only one row
>> with i = 0 on both N1 and N2. By executing S1 and S2 concurrently, t1
>> now has different value of i, 0 and 1.
>
> IIUC the following sequence won't happen because COMMIT PREPARED
> 's1n1' cannot be executed before PREPARE TRANSACTION 's1n2'.

You are right.

> But as
> you mentioned, we cannot prevent data inconsistency even with FDW+2PC
> e.g., when S2 starts a transaction between COMMIT PREPARED on N1 and
> COMMIT PREPARED on N2 by S1.

Ok, example updated.

S1/N1: DROP TABLE t1;
DROP TABLE
S1/N1: CREATE TABLE t1(i int);
CREATE TABLE
S1/N1: INSERT INTO t1 VALUES(0);
INSERT 0 1
S1/N2: DROP TABLE t1;
DROP TABLE
S1/N2: CREATE TABLE t1(i int);
CREATE TABLE
S1/N2: INSERT INTO t1 VALUES(0);
INSERT 0 1
S1/N1: BEGIN;
BEGIN
S1/N2: BEGIN;
BEGIN
S1/N1: UPDATE t1 SET i = i + 1;	-- i = 1
UPDATE 1
S1/N2: UPDATE t1 SET i = i + 1;	-- i = 1
UPDATE 1
S2/N1: BEGIN;
BEGIN
S2/N2: BEGIN;
BEGIN
S1/N1: PREPARE TRANSACTION 's1n1';
PREPARE TRANSACTION
S1/N2: PREPARE TRANSACTION 's1n2';
PREPARE TRANSACTION
S2/N1: PREPARE TRANSACTION 's2n1';
PREPARE TRANSACTION
S2/N2: PREPARE TRANSACTION 's2n2';
PREPARE TRANSACTION
S1/N1: COMMIT PREPARED 's1n1';
COMMIT PREPARED
S2/N1: DELETE FROM t1 WHERE i = 1;
DELETE 1
S2/N2: DELETE FROM t1 WHERE i = 1;
DELETE 0
S1/N2: COMMIT PREPARED 's1n2';
COMMIT PREPARED
S2/N1: COMMIT PREPARED 's2n1';
COMMIT PREPARED
S2/N2: COMMIT PREPARED 's2n2';
COMMIT PREPARED
S2/N1: SELECT * FROM t1;
 i
---
(0 rows)

S2/N2: SELECT * FROM t1;
 i
---
 1
(1 row)

> The point is this data inconsistency is
> lead by an inconsistent read but not by an inconsistent commit
> results. I think there are kinds of possibilities causing data
> inconsistency but atomic commit and atomic visibility eliminate
> different possibilities. We can eliminate all possibilities of data
> inconsistency only after we support 2PC and globally MVCC.

IMO any permanent data inconsistency is a serious problem for users no
matter what the technical reasons are.

Best regards,
--
Tatsuo Ishii
SRA OSS, Inc. Japan
English: http://www.sraoss.co.jp/index_en.php
Japanese:http://www.sraoss.co.jp
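The anomaly in the updated example (atomic commit holds, yet S2 reads in the window between S1's two COMMIT PREPAREDs) can be replayed with a small model. This is my own simplified sketch, not anything from the patch; the `Node` class and the add/remove representation of writes are invented:

```python
class Node:
    """One database node: a set of committed (visible) rows, plus
    prepared-but-uncommitted write sets keyed by global transaction id."""

    def __init__(self, rows):
        self.rows = set(rows)
        self.prepared = {}   # gid -> (rows_to_add, rows_to_remove)

    def prepare(self, gid, add=(), remove=()):
        self.prepared[gid] = (set(add), set(remove))   # durable, not yet visible

    def commit_prepared(self, gid):
        add, remove = self.prepared.pop(gid)
        self.rows = (self.rows - remove) | add

n1, n2 = Node({0}), Node({0})      # t1 holds one row, i = 0, on both nodes

# S1: UPDATE t1 SET i = i + 1 on both nodes; both PREPAREs succeed,
# so atomic commit is guaranteed...
n1.prepare("s1n1", add={1}, remove={0})
n2.prepare("s1n2", add={1}, remove={0})

n1.commit_prepared("s1n1")         # ...but the COMMIT PREPAREDs are not simultaneous.

# S2: DELETE FROM t1 WHERE i = 1 runs between S1's two commits.  Each
# node evaluates the predicate against what is locally visible, so the
# two nodes delete different rows.
n1.prepare("s2n1", remove={r for r in n1.rows if r == 1})   # sees i = 1: DELETE 1
n2.prepare("s2n2", remove={r for r in n2.rows if r == 1})   # sees i = 0: DELETE 0

n2.commit_prepared("s1n2")
n1.commit_prepared("s2n1")
n2.commit_prepared("s2n2")
# Final state: N1 has no rows, N2 has the row i = 1 -- permanently divergent,
# even though every transaction committed atomically on all nodes.
```

This mirrors the transcript's outcome: the inconsistency comes from an inconsistent read, not from an inconsistent commit result.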
On Thu, Jun 18, 2020 at 5:01 AM Tatsuo Ishii <ishii@sraoss.co.jp> wrote: > > > Another point is that > > do we want some middleware involved in the solution? The main thing > > I was looking into at this stage is do we think that the current > > implementation proposed by the patch for 2PC is generic enough that we > > would be later able to integrate the solution for atomic visibility? > > My concern is, FDW+2PC without atomic visibility could lead to data > inconsistency among servers in some cases. If my understanding is > correct, FDW+2PC (without atomic visibility) cannot prevent data > inconsistency in the case below. > You are right and we are not going to claim that after this feature is committed. This feature has independent use cases like it can allow parallel copy when foreign tables are involved once we have parallel copy and surely there will be more. I think it is clear that we need atomic visibility (some way to ensure global consistency) to avoid the data inconsistency problems you and I are worried about and we can do that as a separate patch but at this stage, it would be good if we can have some high-level design of that as well so that if we need some adjustments in the design/implementation of this patch then we can do it now. I think there is some discussion on the other threads (like [1]) about the kind of stuff we are worried about which I need to follow up on to study the impact. Having said that, I don't think that is a reason to stop reviewing or working on this patch. [1] - https://www.postgresql.org/message-id/flat/21BC916B-80A1-43BF-8650-3363CCDAE09C%40postgrespro.ru -- With Regards, Amit Kapila. EnterpriseDB: http://www.enterprisedb.com
On Thu, Jun 18, 2020 at 04:09:56PM +0530, Amit Kapila wrote: > You are right and we are not going to claim that after this feature is > committed. This feature has independent use cases like it can allow > parallel copy when foreign tables are involved once we have parallel > copy and surely there will be more. I think it is clear that we need > atomic visibility (some way to ensure global consistency) to avoid the > data inconsistency problems you and I are worried about and we can do > that as a separate patch but at this stage, it would be good if we can > have some high-level design of that as well so that if we need some > adjustments in the design/implementation of this patch then we can do > it now. I think there is some discussion on the other threads (like > [1]) about the kind of stuff we are worried about which I need to > follow up on to study the impact. > > Having said that, I don't think that is a reason to stop reviewing or > working on this patch. I think our first step is to allow sharding to work on read-only databases, e.g. data warehousing. Read/write will require global snapshots. It is true that 2PC is limited usefulness without global snapshots, because, by definition, systems using 2PC are read-write systems. However, I can see cases where you are loading data into a data warehouse but want 2PC so the systems remain consistent even if there is a crash during loading. -- Bruce Momjian <bruce@momjian.us> https://momjian.us EnterpriseDB https://enterprisedb.com The usefulness of a cup is in its emptiness, Bruce Lee
On Thu, Jun 18, 2020 at 6:49 PM Bruce Momjian <bruce@momjian.us> wrote: > > On Thu, Jun 18, 2020 at 04:09:56PM +0530, Amit Kapila wrote: > > You are right and we are not going to claim that after this feature is > > committed. This feature has independent use cases like it can allow > > parallel copy when foreign tables are involved once we have parallel > > copy and surely there will be more. I think it is clear that we need > > atomic visibility (some way to ensure global consistency) to avoid the > > data inconsistency problems you and I are worried about and we can do > > that as a separate patch but at this stage, it would be good if we can > > have some high-level design of that as well so that if we need some > > adjustments in the design/implementation of this patch then we can do > > it now. I think there is some discussion on the other threads (like > > [1]) about the kind of stuff we are worried about which I need to > > follow up on to study the impact. > > > > Having said that, I don't think that is a reason to stop reviewing or > > working on this patch. > > I think our first step is to allow sharding to work on read-only > databases, e.g. data warehousing. Read/write will require global > snapshots. It is true that 2PC is limited usefulness without global > snapshots, because, by definition, systems using 2PC are read-write > systems. However, I can see cases where you are loading data into a > data warehouse but want 2PC so the systems remain consistent even if > there is a crash during loading. > For sharding, just implementing 2PC without global consistency provides limited functionality. But for general purpose federated databases 2PC serves an important functionality - atomic visibility. When PostgreSQL is used as one of the coordinators in a heterogeneous federated database system, it's not expected to have global consistency or even atomic visibility. But it needs a guarantee that once a transaction commit, all its legs are committed. 
2PC provides that guarantee as long as the other databases keep their promise that prepared transactions will always get committed when requested so. Subtle to this is HA requirement from these databases as well. So the functionality provided by this patch is important outside the sharding case as well. As you said, even for a data warehousing application, there is some write in the form of loading/merging data. If that write happens across multiple servers, we need atomic commit to be guaranteed. Some of these applications can work even if global consistency and atomic visibility is guaranteed eventually. -- Best Wishes, Ashutosh Bapat
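The atomic-commit guarantee described above (once all legs are prepared, every leg eventually commits; a failure during prepare aborts every leg) is the classic two-phase-commit shape. Below is a minimal sketch with in-memory stand-ins for foreign servers, not the patch's actual FdwXact API; all names are mine:

```python
class Participant:
    """In-memory stand-in for one foreign server.  A real FDW would send
    PREPARE TRANSACTION / COMMIT PREPARED / ROLLBACK PREPARED on the wire."""

    def __init__(self, name, fail_on_prepare=False):
        self.name = name
        self.fail_on_prepare = fail_on_prepare
        self.state = "active"

    def prepare(self, gid):
        if self.fail_on_prepare:
            self.state = "aborted"
            raise RuntimeError(f"{self.name}: PREPARE '{gid}' failed")
        self.state = "prepared"   # durable: survives a crash until resolved

    def commit_prepared(self, gid):
        assert self.state == "prepared"
        self.state = "committed"

    def rollback_prepared(self, gid):
        self.state = "aborted"


def atomic_commit(participants, gid):
    """Phase 1: PREPARE on every participant; only if all succeed does
    phase 2 (COMMIT PREPARED) run.  A phase-1 failure rolls back the
    already-prepared legs.  Once phase 1 fully succeeds, commit is the
    only allowed outcome, so in a real system phase-2 failures must be
    retried (e.g., by a resolver process), never turned into aborts."""
    prepared = []
    for p in participants:
        try:
            p.prepare(gid)
            prepared.append(p)
        except RuntimeError:
            for q in prepared:
                q.rollback_prepared(gid)
            return False
    for p in participants:
        p.commit_prepared(gid)
    return True
```

Either every prepared leg ends "committed" or every prepared leg ends "aborted"; the mixed outcome a plain COMMIT-per-server loop permits cannot occur.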
On Wed, 17 Jun 2020 at 14:07, Masahiko Sawada <masahiko.sawada@2ndquadrant.com> wrote: > > On Wed, 17 Jun 2020 at 09:01, Masahiro Ikeda <ikedamsh@oss.nttdata.com> wrote: > > > > > I've attached the new version patch set. 0006 is a separate patch > > > which introduces 'prefer' mode to foreign_twophase_commit. > > > > I hope we can use this feature. Thank you for making patches and > > discussions. > > I'm currently understanding the logic and found some minor points to be > > fixed. > > > > I'm sorry if my understanding is wrong. > > > > * The v22 patches need rebase as they can't apply to the current master. > > > > * FdwXactAtomicCommitParticipants said in > > src/backend/access/fdwxact/README > > is not implemented. Is FdwXactParticipants right? > > Right. > > > > > * A following comment says that this code is for "One-phase", > > but second argument of FdwXactParticipantEndTransaction() describes > > this code is not "onephase". > > > > AtEOXact_FdwXact() in fdwxact.c > > /* One-phase rollback foreign transaction */ > > FdwXactParticipantEndTransaction(fdw_part, false, false); > > > > static void > > FdwXactParticipantEndTransaction(FdwXactParticipant *fdw_part, bool > > onephase, > > bool for_commit) > > > > * "two_phase_commit" option is mentioned in postgres-fdw.sgml, > > but I can't find related code. > > > > * resolver.c comments have the sentence > > containing two blanks.(Emergency Termination) > > > > * There are some inconsistency with PostgreSQL wiki. > > https://wiki.postgresql.org/wiki/Atomic_Commit_of_Distributed_Transactions > > > > I understand it's difficult to keep consistency, I think it's ok to > > fix later > > when these patches almost be able to be committed. > > > > - I can't find "two_phase_commit" option in the source code. > > But 2PC is work if the remote server's "max_prepared_transactions" > > is set > > to non zero value. It is correct work, isn't it? > > Yes. I had removed two_phase_commit option from postgres_fdw. 
> Currently, postgres_fdw uses 2pc when 2pc is required. Therefore, > max_prepared_transactions needs to be set to more than one, as you > mentioned. > > > > > - some parameters are renamed or added in latest patches. > > max_prepared_foreign_transaction, max_prepared_transactions and so > > on. > > > > - typo: froeign_transaction_resolver_timeout > > > > Thank you for your review! I've incorporated your comments on the > local branch. I'll share the latest version patch. > > Also, I've updated the wiki page. I'll try to keep the wiki page up-to-date. > I've attached the latest version patches. I've incorporated the review comments I got so far and improved locking strategy. Please review it. Regards, -- Masahiko Sawada http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Attachment
- v23-0007-Add-prefer-mode-to-foreign_twophase_commit.patch
- v23-0006-Add-regression-tests-for-foreign-twophase-commit.patch
- v23-0005-postgres_fdw-supports-atomic-commit-APIs.patch
- v23-0004-Documentation-update.patch
- v23-0003-Support-atomic-commit-among-multiple-foreign-ser.patch
- v23-0002-Recreate-RemoveForeignServerById.patch
- v23-0001-Keep-track-of-writing-on-non-temporary-relation.patch
On Tue, Jun 23, 2020 at 9:03 AM Masahiko Sawada <masahiko.sawada@2ndquadrant.com> wrote: > > > I've attached the latest version patches. I've incorporated the review > comments I got so far and improved locking strategy. > Thanks for updating the patch. > Please review it. > I think at this stage it is important that we do some study of various approaches to achieve this work and come up with a comparison of the pros and cons of each approach (a) what this patch provides, (b) what is implemented in Global Snapshots patch [1], (c) if possible, what is implemented in Postgres-XL. I fear that if go too far in spending effort on this and later discovered that it can be better done via some other available patch/work (maybe due to a reasons like that approach can easily extended to provide atomic visibility or the design is more robust, etc.) then it can lead to a lot of rework. [1] - https://www.postgresql.org/message-id/20200622150636.GB28999%40momjian.us -- With Regards, Amit Kapila. EnterpriseDB: http://www.enterprisedb.com
On Tue, 23 Jun 2020 at 13:26, Amit Kapila <amit.kapila16@gmail.com> wrote: > > On Tue, Jun 23, 2020 at 9:03 AM Masahiko Sawada > <masahiko.sawada@2ndquadrant.com> wrote: > > > > > > I've attached the latest version patches. I've incorporated the review > > comments I got so far and improved locking strategy. > > > > Thanks for updating the patch. > > > Please review it. > > > > I think at this stage it is important that we do some study of various > approaches to achieve this work and come up with a comparison of the > pros and cons of each approach (a) what this patch provides, (b) what > is implemented in Global Snapshots patch [1], (c) if possible, what is > implemented in Postgres-XL. I fear that if go too far in spending > effort on this and later discovered that it can be better done via > some other available patch/work (maybe due to a reasons like that > approach can easily extended to provide atomic visibility or the > design is more robust, etc.) then it can lead to a lot of rework. Yeah, I have no objection to that plan but I think we also need to keep in mind that (b), (c), and whatever we are thinking about global consistency are talking about only PostgreSQL (and postgres_fdw). On the other hand, this patch needs to implement the feature that can resolve the atomic commit problem more generically, because the foreign server might be using oracle_fdw, mysql_fdw, or other FDWs connecting database systems supporting 2PC. Regards, -- Masahiko Sawada http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
On Fri, Jun 26, 2020 at 10:50 AM Masahiko Sawada <masahiko.sawada@2ndquadrant.com> wrote: > > On Tue, 23 Jun 2020 at 13:26, Amit Kapila <amit.kapila16@gmail.com> wrote: > > > > > > I think at this stage it is important that we do some study of various > > approaches to achieve this work and come up with a comparison of the > > pros and cons of each approach (a) what this patch provides, (b) what > > is implemented in Global Snapshots patch [1], (c) if possible, what is > > implemented in Postgres-XL. I fear that if go too far in spending > > effort on this and later discovered that it can be better done via > > some other available patch/work (maybe due to a reasons like that > > approach can easily extended to provide atomic visibility or the > > design is more robust, etc.) then it can lead to a lot of rework. > > Yeah, I have no objection to that plan but I think we also need to > keep in mind that (b), (c), and whatever we are thinking about global > consistency are talking about only PostgreSQL (and postgres_fdw). > I think we should explore if those approaches could be extended for FDWs and if not then that could be considered as a disadvantage of that approach. -- With Regards, Amit Kapila. EnterpriseDB: http://www.enterprisedb.com
>> The point is this data inconsistency is
>> lead by an inconsistent read but not by an inconsistent commit
>> results. I think there are kinds of possibilities causing data
>> inconsistency but atomic commit and atomic visibility eliminate
>> different possibilities. We can eliminate all possibilities of data
>> inconsistency only after we support 2PC and globally MVCC.
>
> IMO any permanent data inconsistency is a serious problem for users no
> matter what the technical reasons are.

I have incorporated the "Pangea" algorithm into Pgpool-II to implement
atomic visibility. In the test below I have two PostgreSQL servers (stock
v12), server0 (port 11002) and server1 (port 11003).
default_transaction_isolation was set to 'repeatable read' on both
PostgreSQL servers; this is required by Pangea. Pgpool-II replicates
write queries and sends them to both server0 and server1.

There are two tables, "t1" (having only 1 integer column "i") and "log"
(having only 1 integer column "i").

I have run the following script (inconsistency1.sql) via pgbench:

BEGIN;
UPDATE t1 SET i = i + 1;
END;

like:

pgbench -n -c 1 -T 30 -f inconsistency1.sql

At the same time I have run another session from pgbench concurrently:

BEGIN;
INSERT INTO log SELECT * FROM t1;
END;

pgbench -n -c 1 -T 30 -f inconsistency2.sql

After finishing those two pgbench runs, I ran the following COPY to see
if the contents of table "log" are identical on server0 and server1:

psql -p 11002 -c "\copy log to '11002.txt'"
psql -p 11003 -c "\copy log to '11003.txt'"
cmp 11002.txt 11003.txt

The new Pgpool-II incorporating Pangea showed that 11002.txt and
11003.txt are identical, as expected. This indicates that atomic
visibility is kept. On the other hand, Pgpool-II without Pangea showed
differences in those files.

Best regards,
--
Tatsuo Ishii
SRA OSS, Inc. Japan
English: http://www.sraoss.co.jp/index_en.php
Japanese:http://www.sraoss.co.jp
> I've attached the latest version patches. I've incorporated the review
> comments I got so far and improved locking strategy.

Thanks for updating the patch!
I have three questions about the v23 patches.


1. messages related to user canceling

In my understanding, there are two messages
which can be output when a user cancels the COMMIT command.

A. When prepare fails, the output shows that the transaction
committed locally but some error occurred.

```
postgres=*# COMMIT;
^CCancel request sent
WARNING:  canceling wait for resolving foreign transaction due to user request
DETAIL:  The transaction has already committed locally, but might not have been committed on the foreign server.
ERROR:  server closed the connection unexpectedly
	This probably means the server terminated abnormally
	before or while processing the request.
CONTEXT:  remote SQL command: PREPARE TRANSACTION 'fx_1020791818_519_16399_10'
```

B. When prepare succeeds, the output shows that the transaction
committed locally.

```
postgres=*# COMMIT;
^CCancel request sent
WARNING:  canceling wait for resolving foreign transaction due to user request
DETAIL:  The transaction has already committed locally, but might not have been committed on the foreign server.
COMMIT
```

In case A, I think the "committed locally" message can confuse users,
because although the message says "committed", the transaction is
"ABORTED". I think the "committed" message means that the "ABORT" was
committed locally, but is there a possibility of misunderstanding?
In case A, it's better to change the message to be more user friendly, isn't it?


2. typo

Is "trasnactions" in fdwxact.c a typo?


3. FdwXactGetWaiter in fdwxact.c returns an unused value

FdwXactGetWaiter is called in the FXRslvLoop function.
It returns *waitXid_p, but FXRslvLoop doesn't seem to
use *waitXid_p. Do we need to return it?

Regards,
--
Masahiro Ikeda
NTT DATA CORPORATION
On 2020/07/14 9:08, Masahiro Ikeda wrote:
>> I've attached the latest version patches. I've incorporated the review
>> comments I got so far and improved locking strategy.
>
> Thanks for updating the patch!

+1

I'm interested in these patches and now studying them. While checking
the behaviors of the patched PostgreSQL, I got three comments.

1. We can access the foreign table even during recovery in HEAD.
But in the patched version, when I did that, I got the following error.
Is this intentional?

    ERROR:  cannot assign TransactionIds during recovery

2. With the patch, when INSERT/UPDATE/DELETE are executed both in local
and remote servers, 2PC is executed at the commit phase. But when a write
SQL command other than INSERT/UPDATE/DELETE (e.g., TRUNCATE) is executed
locally and INSERT/UPDATE/DELETE are executed in the remote server,
2PC is NOT executed. Is this safe?

3. XACT_FLAGS_WROTENONTEMPREL is set when INSERT/UPDATE/DELETE are
executed. But it's not reset even when those queries are canceled by
ROLLBACK TO SAVEPOINT. This may cause unnecessary 2PC at the commit phase.

Regards,

--
Fujii Masao
Advanced Computing Technology Center
Research and Development Headquarters
NTT DATA CORPORATION
> I've attached the latest version patches. I've incorporated the review
> comments I got so far and improved locking strategy.

I want to ask a question about streaming replication with 2PC.
Are you going to support 2PC with streaming replication?

I tried streaming replication using the v23 patches, and confirmed that
2PC works with streaming replication where there are primary/standby
coordinators. But, in my understanding, the WAL for "PREPARE" and
"COMMIT/ABORT PREPARED" can't be replicated to the standby server in
sync. If this is right, an unresolved transaction can occur.

For example,

1. PREPARE is done
2. the primary crashes before the WAL related to PREPARE is replicated to the standby server
3. promote the standby server  // but it can't execute "ABORT PREPARED"

In the above case, the remote server has an unresolved transaction.
Can we solve this problem to support in-sync replication?

But, I think some users use async replication for performance.
Do we need to document the limitation or make another solution?

Regards,

--
Masahiro Ikeda
NTT DATA CORPORATION
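The failover scenario in this question can be modeled abstractly. This is my own sketch, not the patch's actual WAL layout or record names: a coordinator logs a record for each foreign transaction it prepares, a standby receives only a prefix of that WAL under async replication, and a promoted standby cannot resolve prepared foreign transactions it never heard about:

```python
class RemoteServer:
    """Foreign server holding prepared transactions awaiting resolution."""
    def __init__(self):
        self.prepared = set()

class Coordinator:
    """Coordinator with a WAL modeled as a list of records, shipped
    asynchronously to a standby (which may lag arbitrarily)."""
    def __init__(self):
        self.wal = []

    def prepare_foreign(self, remote, gid):
        remote.prepared.add(gid)           # PREPARE TRANSACTION on the remote...
        self.wal.append(("fdwxact", gid))  # ...and a local record of it

def promote_standby(primary, replicated_upto):
    """Promote a standby that received only the first `replicated_upto`
    WAL records before the primary crashed."""
    standby = Coordinator()
    standby.wal = list(primary.wal[:replicated_upto])
    return standby

remote = RemoteServer()
primary = Coordinator()
primary.prepare_foreign(remote, "fx_1")
# Primary crashes here.  With async replication, the record for the
# prepared foreign transaction may not have reached the standby yet.
standby = promote_standby(primary, replicated_upto=0)
known = {gid for _, gid in standby.wal}
orphaned = remote.prepared - known   # prepared on the remote, unknown to the new primary
```

With `replicated_upto=0`, `fx_1` is orphaned on the remote server and no one will ever issue COMMIT/ABORT PREPARED for it; had the record been replicated before the crash, the promoted standby's resolver could resolve it.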
On Tue, 14 Jul 2020 at 09:08, Masahiro Ikeda <ikedamsh@oss.nttdata.com> wrote: > > > I've attached the latest version patches. I've incorporated the review > > comments I got so far and improved locking strategy. > > Thanks for updating the patch! > I have three questions about the v23 patches. > > > 1. messages related to user canceling > > In my understanding, there are two messages > which can be output when a user cancels the COMMIT command. > > A. When prepare is failed, the output shows that > committed locally but some error is occurred. > > ``` > postgres=*# COMMIT; > ^CCancel request sent > WARNING: canceling wait for resolving foreign transaction due to user > request > DETAIL: The transaction has already committed locally, but might not > have been committed on the foreign server. > ERROR: server closed the connection unexpectedly > This probably means the server terminated abnormally > before or while processing the request. > CONTEXT: remote SQL command: PREPARE TRANSACTION > 'fx_1020791818_519_16399_10' > ``` > > B. When prepare is succeeded, > the output show that committed locally. > > ``` > postgres=*# COMMIT; > ^CCancel request sent > WARNING: canceling wait for resolving foreign transaction due to user > request > DETAIL: The transaction has already committed locally, but might not > have been committed on the foreign server. > COMMIT > ``` > > In case of A, I think that "committed locally" message can confuse user. > Because although messages show committed but the transaction is > "ABORTED". > > I think "committed" message means that "ABORT" is committed locally. > But is there a possibility of misunderstanding? No, you're right. I'll fix it in the next version patch. I think synchronous replication also has the same problem. It says "the transaction has already committed" but it's not true when executing ROLLBACK PREPARED. BTW how did you test the case (A)? 
It says canceling wait for foreign transaction resolution but the remote SQL command is PREPARE TRANSACTION. > > In case of A, it's better to change message for user friendly, isn't it? > > > 2. typo > > Is trasnactions in fdwxact.c typo? > Fixed. > > 3. FdwXactGetWaiter in fdwxact.c return unused value > > FdwXactGetWaiter is called in FXRslvLoop function. > It returns *waitXid_p, but FXRslvloop doesn't seem to > use *waitXid_p. Do we need to return it? Removed. I've incorporated your comments above in the local branch. I'll post the latest version patch after incorporating other comments soon. Regards, -- Masahiko Sawada http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
On 2020/07/15 15:06, Masahiko Sawada wrote: > On Tue, 14 Jul 2020 at 09:08, Masahiro Ikeda <ikedamsh@oss.nttdata.com> wrote: >> >>> I've attached the latest version patches. I've incorporated the review >>> comments I got so far and improved locking strategy. >> >> Thanks for updating the patch! >> I have three questions about the v23 patches. >> >> >> 1. messages related to user canceling >> >> In my understanding, there are two messages >> which can be output when a user cancels the COMMIT command. >> >> A. When prepare is failed, the output shows that >> committed locally but some error is occurred. >> >> ``` >> postgres=*# COMMIT; >> ^CCancel request sent >> WARNING: canceling wait for resolving foreign transaction due to user >> request >> DETAIL: The transaction has already committed locally, but might not >> have been committed on the foreign server. >> ERROR: server closed the connection unexpectedly >> This probably means the server terminated abnormally >> before or while processing the request. >> CONTEXT: remote SQL command: PREPARE TRANSACTION >> 'fx_1020791818_519_16399_10' >> ``` >> >> B. When prepare is succeeded, >> the output show that committed locally. >> >> ``` >> postgres=*# COMMIT; >> ^CCancel request sent >> WARNING: canceling wait for resolving foreign transaction due to user >> request >> DETAIL: The transaction has already committed locally, but might not >> have been committed on the foreign server. >> COMMIT >> ``` >> >> In case of A, I think that "committed locally" message can confuse user. >> Because although messages show committed but the transaction is >> "ABORTED". >> >> I think "committed" message means that "ABORT" is committed locally. >> But is there a possibility of misunderstanding? > > No, you're right. I'll fix it in the next version patch. > > I think synchronous replication also has the same problem. It says > "the transaction has already committed" but it's not true when > executing ROLLBACK PREPARED. Yes. 
Also the same message is logged when executing PREPARE TRANSACTION. Maybe it should be changed to "the transaction has already prepared". Regards, -- Fujii Masao Advanced Computing Technology Center Research and Development Headquarters NTT DATA CORPORATION
On 2020-07-15 15:06, Masahiko Sawada wrote: > On Tue, 14 Jul 2020 at 09:08, Masahiro Ikeda <ikedamsh@oss.nttdata.com> > wrote: >> >> > I've attached the latest version patches. I've incorporated the review >> > comments I got so far and improved locking strategy. >> >> Thanks for updating the patch! >> I have three questions about the v23 patches. >> >> >> 1. messages related to user canceling >> >> In my understanding, there are two messages >> which can be output when a user cancels the COMMIT command. >> >> A. When prepare is failed, the output shows that >> committed locally but some error is occurred. >> >> ``` >> postgres=*# COMMIT; >> ^CCancel request sent >> WARNING: canceling wait for resolving foreign transaction due to user >> request >> DETAIL: The transaction has already committed locally, but might not >> have been committed on the foreign server. >> ERROR: server closed the connection unexpectedly >> This probably means the server terminated abnormally >> before or while processing the request. >> CONTEXT: remote SQL command: PREPARE TRANSACTION >> 'fx_1020791818_519_16399_10' >> ``` >> >> B. When prepare is succeeded, >> the output show that committed locally. >> >> ``` >> postgres=*# COMMIT; >> ^CCancel request sent >> WARNING: canceling wait for resolving foreign transaction due to user >> request >> DETAIL: The transaction has already committed locally, but might not >> have been committed on the foreign server. >> COMMIT >> ``` >> >> In case of A, I think that "committed locally" message can confuse >> user. >> Because although messages show committed but the transaction is >> "ABORTED". >> >> I think "committed" message means that "ABORT" is committed locally. >> But is there a possibility of misunderstanding? > > No, you're right. I'll fix it in the next version patch. > > I think synchronous replication also has the same problem. It says > "the transaction has already committed" but it's not true when > executing ROLLBACK PREPARED. 
Thanks for replying and sharing the synchronous replication problem. > BTW how did you test the case (A)? It says canceling wait for foreign > transaction resolution but the remote SQL command is PREPARE > TRANSACTION. I think the timing of failure is important for 2PC testing. Since I don't have any good solution to simulate those flexibly, I used the GDB debugger. The message of the case (A) is sent after performing the following operations. 1. Attach the debugger to a backend process. 2. Set a breakpoint at PreCommit_FdwXact() in CommitTransaction(). // Before PREPARE. 3. Execute "BEGIN" and insert data into two remote foreign tables. 4. Issue a "COMMIT" command. 5. The backend process stops at the breakpoint. 6. Stop a remote foreign server. 7. Detach the debugger. // The backend continues and the prepare fails. The transaction resolver tries to abort all remote txs. // It's unnecessary to resolve remote txs whose prepare failed, isn't it? 8. Send a cancel request. BTW, I'm concerned about how to test the 2PC patches. There are many failure patterns, such as failure timing, server/network failures (and unexpected recovery), and those combinations... Though it's best to test those failure patterns automatically, I have no idea for now, so I manually check some patterns. > I've incorporated the above your comments in the local branch. I'll > post the latest version patch after incorporating other comments soon. OK, Thanks. Regards, -- Masahiro Ikeda NTT DATA CORPORATION
On Tue, 14 Jul 2020 at 17:24, Masahiro Ikeda <ikedamsh@oss.nttdata.com> wrote: > > > I've attached the latest version patches. I've incorporated the review > > comments I got so far and improved locking strategy. > > I want to ask a question about streaming replication with 2PC. > Are you going to support 2PC with streaming replication? > > I tried streaming replication using v23 patches. > I confirm that 2PC works with streaming replication, > which there are primary/standby coordinator. > > But, in my understanding, the WAL of "PREPARE" and > "COMMIT/ABORT PREPARED" can't be replicated to the standby server in > sync. > > If this is right, the unresolved transaction can be occurred. > > For example, > > 1. PREPARE is done > 2. crash primary before the WAL related to PREPARE is > replicated to the standby server > 3. promote standby server // but can't execute "ABORT PREPARED" > > In above case, the remote server has the unresolved transaction. > Can we solve this problem to support in-sync replication? > > But, I think some users use async replication for performance. > Do we need to document the limitation or make another solution? > IIUC with synchronous replication, we can guarantee that WAL records are written on both primary and replicas when the client got an acknowledgment of commit. We don't replicate each WAL records generated during transaction one by one in sync. In the case you described, the client will get an error due to the server crash. Therefore I think the user cannot expect WAL records generated so far has been replicated. The same issue could happen also when the user executes PREPARE TRANSACTION and the server crashes. To prevent this issue, I think we would need to send each WAL records in sync but I'm not sure it's reasonable behavior, and as long as we write WAL in the local and then send it to replicas we would need a smart mechanism to prevent this situation. 
Related to the point raised by Ikeda-san, I realized that with the current patch the backend waits for synchronous replication and then waits for foreign transaction resolution. But it should be reversed. Otherwise, it could lead to data loss even when the client got an acknowledgment of commit. Also, when the user is using both atomic commit and synchronous replication and wants to cancel waiting, he/she will need to press ctrl-c twice with the current patch, which also should be fixed. Regards, -- Masahiko Sawada http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
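The failure window discussed above (the primary crashing before the PREPARE record reaches the standby) can be sketched as a toy model. All names here are illustrative; this is not code from the patch.

```python
# Toy model of the orphaned-prepared-transaction window described
# above. All names are illustrative; this is not code from the patch.

class Remote:
    def __init__(self):
        self.prepared = set()   # foreign prepared transactions by id

class Coordinator:
    def __init__(self):
        self.wal = []           # locally flushed WAL records
        self.replicated = []    # records that reached the standby

    def prepare_remote(self, remote, fx_id):
        remote.prepared.add(fx_id)           # PREPARE TRANSACTION on the remote
        self.wal.append(("PREPARE", fx_id))  # local WAL, not yet replicated

    def replicate(self):
        self.replicated = list(self.wal)

remote = Remote()
primary = Coordinator()
primary.prepare_remote(remote, "fx_1")
# Crash happens here, before primary.replicate() runs: the standby
# never learns about fx_1 and, once promoted, cannot issue
# ABORT PREPARED 'fx_1' -- the remote is left with an orphan.
known_to_standby = {fx for _, fx in primary.replicated}
orphaned = remote.prepared - known_to_standby
print(orphaned)  # {'fx_1'}
```

Replicating each WAL record synchronously would close this window, at the performance cost both participants note above.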
RE: Transactions involving multiple postgres foreign servers, take 2
Hi Sawada san, I'm reviewing this patch series, and let me give some initial comments and questions. I'm looking at this with a hope that this will be useful purely as a FDW enhancement for our new use cases, regardless of whether the FDW will be used for Postgres scale-out. I don't think it's necessarily required to combine 2PC with the global visibility. X/Open XA specification only handles the atomic commit. The only part in the XA specification that refers to global visibility is the following: [Quote from XA specification] -------------------------------------------------- 2.3.2 Protocol Optimisations ・ Read-only An RM can respond to the TM’s prepare request by asserting that the RM was not asked to update shared resources in this transaction branch. This response concludes the RM’s involvement in the transaction; the Phase 2 dialogue between the TM and this RM does not occur. The TM need not stably record, in its list of participating RMs, an RM that asserts a read-only role in the global transaction. However, if the RM returns the read-only optimisation before all work on the global transaction is prepared, global serialisability1 cannot be guaranteed. This is because the RM may release transaction context, such as read locks, before all application activity for that global transaction is finished. 1. Serialisability is a property of a set of concurrent transactions. For a serialisable set of transactions, at least one serial sequence of the transactions exists that produces identical results, with respect to shared resources, as does concurrent execution of the transaction. -------------------------------------------------- (1) Do other popular DBMSs (Oracle, MySQL, etc.) provide concrete functions that can be used for the new FDW commit/rollback/prepare API? I'm asking this to confirm that we really need to provide these functions, not as the transaction callbacks for postgres_fdw. (2) How are data modifications tracked in local and remote transactions?
0001 seems to handle local INSERT/DELETE/UPDATE. Especially: * COPY FROM to local/remote tables/views. * User-defined function calls that modify data, e.g. SELECT func1() WHERE col = func2() (3) Does the 2PC processing always go through the background worker? Is the group commit effective on the remote server? That is, PREPARE and COMMIT PREPARED issued from multiple remote sessions are written to WAL in batch? Regards Takayuki Tsunakawa
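The read-only optimisation quoted from the XA specification above can be illustrated with a minimal sketch (names are illustrative, not from any real transaction manager):

```python
# Minimal sketch of the XA read-only optimisation quoted above: an RM
# that votes read-only at prepare time is excluded from phase 2.
# Names are illustrative, not from any real TM.

def commit_participants(votes):
    """votes maps RM name -> response to the TM's prepare request."""
    # Read-only RMs concluded their involvement at prepare; the TM
    # need not stably record them nor send them a phase-2 COMMIT.
    return [rm for rm, vote in votes.items() if vote == "ok"]

votes = {"rm_a": "ok", "rm_b": "read-only", "rm_c": "ok"}
print(commit_participants(votes))  # ['rm_a', 'rm_c']
```

As the quoted footnote warns, dropping a read-only participant early is only safe for serialisability once all work on the global transaction has been prepared.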
On Tue, 14 Jul 2020 at 11:19, Fujii Masao <masao.fujii@oss.nttdata.com> wrote: > > > > On 2020/07/14 9:08, Masahiro Ikeda wrote: > >> I've attached the latest version patches. I've incorporated the review > >> comments I got so far and improved locking strategy. > > > > Thanks for updating the patch! > > +1 > I'm interested in these patches and now studying them. While checking > the behaviors of the patched PostgreSQL, I got three comments. Thank you for testing this patch! > > 1. We can access to the foreign table even during recovery in the HEAD. > But in the patched version, when I did that, I got the following error. > Is this intentional? > > ERROR: cannot assign TransactionIds during recovery No, it should be fixed. I'm going to fix this by not collecting participants for atomic commit during recovery. > > 2. With the patch, when INSERT/UPDATE/DELETE are executed both in > local and remote servers, 2PC is executed at the commit phase. But > when write SQL (e.g., TRUNCATE) except INSERT/UPDATE/DELETE are > executed in local and INSERT/UPDATE/DELETE are executed in remote, > 2PC is NOT executed. Is this safe? Hmm, you're right. I think atomic commit must be used also when the user executes other write SQLs such as TRUNCATE, COPY, CLUSTER, and CREATE TABLE on the local node. > > 3. XACT_FLAGS_WROTENONTEMPREL is set when INSERT/UPDATE/DELETE > are executed. But it's not reset even when those queries are canceled by > ROLLBACK TO SAVEPOINT. This may cause unnecessary 2PC at the commit phase. Will fix. Regards, -- Masahiko Sawada http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
On 2020-07-16 13:16, Masahiko Sawada wrote: > On Tue, 14 Jul 2020 at 17:24, Masahiro Ikeda <ikedamsh@oss.nttdata.com> > wrote: >> >> > I've attached the latest version patches. I've incorporated the review >> > comments I got so far and improved locking strategy. >> >> I want to ask a question about streaming replication with 2PC. >> Are you going to support 2PC with streaming replication? >> >> I tried streaming replication using v23 patches. >> I confirm that 2PC works with streaming replication, >> which there are primary/standby coordinator. >> >> But, in my understanding, the WAL of "PREPARE" and >> "COMMIT/ABORT PREPARED" can't be replicated to the standby server in >> sync. >> >> If this is right, the unresolved transaction can be occurred. >> >> For example, >> >> 1. PREPARE is done >> 2. crash primary before the WAL related to PREPARE is >> replicated to the standby server >> 3. promote standby server // but can't execute "ABORT PREPARED" >> >> In above case, the remote server has the unresolved transaction. >> Can we solve this problem to support in-sync replication? >> >> But, I think some users use async replication for performance. >> Do we need to document the limitation or make another solution? >> > > IIUC with synchronous replication, we can guarantee that WAL records > are written on both primary and replicas when the client got an > acknowledgment of commit. We don't replicate each WAL records > generated during transaction one by one in sync. In the case you > described, the client will get an error due to the server crash. > Therefore I think the user cannot expect WAL records generated so far > has been replicated. The same issue could happen also when the user > executes PREPARE TRANSACTION and the server crashes. Thanks! I didn't noticed the behavior when a user executes PREPARE TRANSACTION is same. IIUC with 2PC, there is a different point between (1)PREPARE TRANSACTION and (2)2PC. 
The point is whether the client can know when the server crashed and its global tx id. If (1) PREPARE TRANSACTION fails, it's OK for the client to execute the same command again, because if the remote server has already prepared, the command will be ignored. But if (2) 2PC fails because of a coordinator crash, the client can't know what operations should be done. If the old coordinator already executed PREPARED, there are some transactions which should be ABORT PREPARED. But if the PREPARED WAL is not sent to the standby, the new coordinator can't execute ABORT PREPARED. And the client can't know which remote servers have PREPARED transactions which should be ABORTED either. Even if the client could know that, only the old coordinator knows its global transaction id. Only the database administrator can analyze the old coordinator's log and then execute the appropriate commands manually, right?
In my understanding, if the COMMIT WAL is replicated to the standby in sync, the standby server can resolve the transaction after crash recovery once promoted. If reversed, there are some situations which can't guarantee atomic commit. In the case that some foreign transaction resolutions succeed but others fail (and the COMMIT WAL is not replicated), the standby must ABORT PREPARED because the COMMIT WAL is not replicated. This means that some foreign transactions are COMMIT PREPARED executed by the primary coordinator, while other foreign transactions can be ABORT PREPARED executed by the secondary coordinator. Regards, -- Masahiro Ikeda NTT DATA CORPORATION
On Thu, 16 Jul 2020 at 13:53, tsunakawa.takay@fujitsu.com <tsunakawa.takay@fujitsu.com> wrote: > > Hi Sawada san, > > > I'm reviewing this patch series, and let me give some initial comments and questions. I'm looking at this with a hope that this will be useful purely as a FDW enhancement for our new use cases, regardless of whether the FDW will be used for Postgres scale-out. Thank you for reviewing this patch! Yes, this patch is trying to resolve the generic atomic commit problem w.r.t. FDW, and will be useful also for Postgres scale-out. > > I don't think it's necessarily required to combine 2PC with the global visibility. X/Open XA specification only handles the atomic commit. The only part in the XA specification that refers to global visibility is the following: > > > [Quote from XA specification] > -------------------------------------------------- > 2.3.2 Protocol Optimisations > ・ Read-only > An RM can respond to the TM’s prepare request by asserting that the RM was not > asked to update shared resources in this transaction branch. This response > concludes the RM’s involvement in the transaction; the Phase 2 dialogue between > the TM and this RM does not occur. The TM need not stably record, in its list of > participating RMs, an RM that asserts a read-only role in the global transaction. > > However, if the RM returns the read-only optimisation before all work on the global > transaction is prepared, global serialisability1 cannot be guaranteed. This is because > the RM may release transaction context, such as read locks, before all application > activity for that global transaction is finished. > > 1. > Serialisability is a property of a set of concurrent transactions. For a serialisable set of transactions, at least one > serial sequence of the transactions exists that produces identical results, with respect to shared resources, as does > concurrent execution of the transaction. > -------------------------------------------------- > Agreed.
> > (1) > Do other popular DBMSs (Oracle, MySQL, etc.) provide concrete functions that can be used for the new FDW commit/rollback/prepare API? I'm asking this to confirm that we really need to provide these functions, not as the transaction callbacks for postgres_fdw. > I have briefly checked only oracle_fdw, but in general I think that if an existing FDW supports transaction begin, commit, and rollback, these can be ported to the new FDW transaction APIs easily. Regarding the comparison between FDW transaction APIs and transaction callbacks, I think one of the benefits of providing FDW transaction APIs is that the core is able to manage the status of foreign transactions. We need to track the status of individual foreign transactions to support atomic commit. If we use transaction callbacks (XactCallback) that many FDWs are using, I think we will end up calling the transaction callback and leaving the transaction work to FDWs, meaning that the core is not able to know the return values of PREPARE TRANSACTION, for example. We can add more arguments passed to transaction callbacks to get the return value from FDWs but I don’t think it’s a good idea as transaction callbacks are used not only by FDW but also other external modules. > > (2) > How are data modifications tracked in local and remote transactions? 0001 seems to handle local INSERT/DELETE/UPDATE. Especially: > > * COPY FROM to local/remote tables/views. > > * User-defined function calls that modify data, e.g. SELECT func1() WHERE col = func2() > With the current version patch (v23), it supports only INSERT/DELETE/UPDATE. But I'm going to change the patch so that it supports other write SQLs as Fujii-san also pointed out. > > (3) > Does the 2PC processing always go through the background worker? > Is the group commit effective on the remote server? That is, PREPARE and COMMIT PREPARED issued from multiple remote sessions are written to WAL in batch?
No, in the current design, the backend who received a query from the client does PREPARE, and then the transaction resolver process, a background worker, does COMMIT PREPARED. Regards, -- Masahiko Sawada http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
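The division of labour described here — the backend issues PREPARE, a background resolver later issues COMMIT PREPARED — can be sketched as a toy model (hypothetical structure, not the patch's actual code):

```python
from collections import deque

# Toy model of the division of labour: the backend prepares foreign
# transactions and queues them; a background resolver commits them.
# Structure and names are illustrative, not taken from the patch.

resolver_queue = deque()
remote_state = {}  # fx id -> state on the remote server

def backend_commit(fx_ids):
    for fx in fx_ids:
        remote_state[fx] = "prepared"   # backend: PREPARE TRANSACTION <fx>
        resolver_queue.append(fx)       # hand off to the resolver

def resolver_loop():
    while resolver_queue:
        fx = resolver_queue.popleft()
        remote_state[fx] = "committed"  # resolver: COMMIT PREPARED <fx>

backend_commit(["fx_1", "fx_2"])
resolver_loop()
print(remote_state)  # {'fx_1': 'committed', 'fx_2': 'committed'}
```

The split matters for the group-commit question: whether the resolver drains its queue one transaction at a time or in batches determines how many WAL syncs the foreign server performs.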
RE: Transactions involving multiple postgres foreign servers, take 2
From: Masahiko Sawada <masahiko.sawada@2ndquadrant.com> I have briefly checked the only oracle_fdw but in general I think that > if an existing FDW supports transaction begin, commit, and rollback, > these can be ported to new FDW transaction APIs easily. Does oracle_fdw support begin, commit and rollback? And most importantly, do other major DBMSs, including Oracle, provide the API for preparing a transaction? In other words, will the FDWs other than postgres_fdw really be able to take advantage of the new FDW functions to join the 2PC processing? I think we need to confirm that there are concrete examples. What I'm worried about is that if only postgres_fdw can implement the prepare function, it's a sign that the FDW interface will be riddled with functions only for Postgres. That is, the FDW interface is getting away from its original purpose "access external data as a relation" and becoming complex. Tomas Vondra showed this concern as follows: Horizontal scalability/sharding https://www.postgresql.org/message-id/flat/CANP8%2BjK%3D%2B3zVYDFY0oMAQKQVJ%2BqReDHr1UPdyFEELO82yVfb9A%40mail.gmail.com#2c45f0ee97855449f1f7fedcef1d5e11 [Tomas Vondra's remarks] -------------------------------------------------- > This strikes me as a bit of a conflict of interest with FDW which > seems to want to hide the fact that it's foreign; the FDW > implementation makes it's own optimization decisions which might > make sense for single table queries but breaks down in the face of > joins. +1 to these concerns In my mind, FDW is a wonderful tool to integrate PostgreSQL with external data sources, and it's nicely shaped for this purpose, which implies the abstractions and assumptions in the code. The truth however is that many current uses of the FDW API are actually using it for different purposes because there's no other way to do that, not because FDWs are the "right way". And this includes the attempts to build sharding on FDW, I think.
Situations like this result in "improvements" of the API that seem to improve the API for the second group, but make the life harder for the original FDW API audience by making the API needlessly complex. And I say "seem to improve" because the second group eventually runs into the fundamental abstractions and assumptions the API is based on anyway. And based on the discussions at pgcon, I think this is the main reason why people cringe when they hear "FDW" and "sharding" in the same sentence. ... My other worry is that we'll eventually mess the FDW infrastructure, making it harder to use for the original purpose. Granted, most of the improvements proposed so far look sane and useful for FDWs in general, but sooner or later that ceases to be the case - there sill be changes needed merely for the sharding. Those will be tough decisions. -------------------------------------------------- > Regarding the comparison between FDW transaction APIs and transaction > callbacks, I think one of the benefits of providing FDW transaction > APIs is that the core is able to manage the status of foreign > transactions. We need to track the status of individual foreign > transactions to support atomic commit. If we use transaction callbacks > (XactCallback) that many FDWs are using, I think we will end up > calling the transaction callback and leave the transaction work to > FDWs, leading that the core is not able to know the return values of > PREPARE TRANSACTION for example. We can add more arguments passed to > transaction callbacks to get the return value from FDWs but I don’t > think it’s a good idea as transaction callbacks are used not only by > FDW but also other external modules. To track the foreign transaction status, we can add GetTransactionStatus() to the FDW interface as an alternative, can'twe? > With the current version patch (v23), it supports only > INSERT/DELETE/UPDATE. 
But I'm going to change the patch so that it > supports other writes SQLs as Fujii-san also pointed out. OK. I've just read that Fujii san already pointed out a similar thing. But I wonder if we can know that the UDF executed on the foreign server has updated data. Maybe we can know or guess it by calling txid_current_if_any() or checking the transaction status in FE/BE protocol, but can we deal with FDWs other than postgres_fdw? > No, in the current design, the backend who received a query from the > client does PREPARE, and then the transaction resolver process, a > background worker, does COMMIT PREPARED. This "No" means the current implementation cannot group commits from multiple transactions? Does the transaction resolver send COMMIT PREPARED and wait for its response for each transaction one by one? For example, [local server] Transaction T1 and T2 performs 2PC at the same time. Transaction resolver sends COMMIT PREPARED for T1 and then waits for the response. T1 writes COMMIT PREPARED record locally and syncs the WAL. Transaction resolver sends COMMIT PREPARED for T2 and then waits for the response. T2 writes COMMIT PREPARED record locally and syncs the WAL. [foreign server] T1 writes COMMIT PREPARED record locally and syncs the WAL. T2 writes COMMIT PREPARED record locally and syncs the WAL. If the WAL records of multiple concurrent transactions are written and synced separately, i.e. group commit doesn't take effect, then the OLTP transaction performance will be unacceptable. Regards Takayuki Tsunakawa
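The performance concern above can be made concrete with a toy flush-count model (purely illustrative; the batch size and function names are assumptions, not the patch's behaviour):

```python
import math

# Toy flush-count model for the group-commit concern above: resolving
# one transaction at a time forces one WAL sync per COMMIT PREPARED on
# the foreign server, while batching would share one sync per group.
# The batch size of 16 is an assumption, purely for illustration.

def serial_flushes(n_txns):
    return n_txns  # each COMMIT PREPARED written and synced alone

def grouped_flushes(n_txns, batch=16):
    return math.ceil(n_txns / batch)  # one sync covers a whole batch

print(serial_flushes(64), grouped_flushes(64))  # 64 4
```

Under this model, serial resolution costs 16x more WAL syncs than batching at the assumed group size, which is the gap behind the "unacceptable OLTP performance" worry.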
On Fri, 2020-07-17 at 05:21 +0000, tsunakawa.takay@fujitsu.com wrote: > From: Masahiko Sawada <masahiko.sawada@2ndquadrant.com> > I have briefly checked the only oracle_fdw but in general I think that > > if an existing FDW supports transaction begin, commit, and rollback, > > these can be ported to new FDW transaction APIs easily. > > Does oracle_fdw support begin, commit and rollback? Yes. > And most importantly, do other major DBMSs, including Oracle, provide the API for > preparing a transaction? In other words, will the FDWs other than postgres_fdw > really be able to take advantage of the new FDW functions to join the 2PC processing? > I think we need to confirm that there are concrete examples. I bet they do. There is even a standard for that. I am not looking forward to adapting oracle_fdw, and I didn't read the patch. But using distributed transactions is certainly a good thing if it is done right. The trade off is the need for a transaction manager, and implementing that correctly is a high price to pay. Yours, Laurenz Albe
On Fri, 17 Jul 2020 at 11:06, Masahiro Ikeda <ikedamsh@oss.nttdata.com> wrote: > > On 2020-07-16 13:16, Masahiko Sawada wrote: > > On Tue, 14 Jul 2020 at 17:24, Masahiro Ikeda <ikedamsh@oss.nttdata.com> > > wrote: > >> > >> > I've attached the latest version patches. I've incorporated the review > >> > comments I got so far and improved locking strategy. > >> > >> I want to ask a question about streaming replication with 2PC. > >> Are you going to support 2PC with streaming replication? > >> > >> I tried streaming replication using v23 patches. > >> I confirm that 2PC works with streaming replication, > >> which there are primary/standby coordinator. > >> > >> But, in my understanding, the WAL of "PREPARE" and > >> "COMMIT/ABORT PREPARED" can't be replicated to the standby server in > >> sync. > >> > >> If this is right, the unresolved transaction can be occurred. > >> > >> For example, > >> > >> 1. PREPARE is done > >> 2. crash primary before the WAL related to PREPARE is > >> replicated to the standby server > >> 3. promote standby server // but can't execute "ABORT PREPARED" > >> > >> In above case, the remote server has the unresolved transaction. > >> Can we solve this problem to support in-sync replication? > >> > >> But, I think some users use async replication for performance. > >> Do we need to document the limitation or make another solution? > >> > > > > IIUC with synchronous replication, we can guarantee that WAL records > > are written on both primary and replicas when the client got an > > acknowledgment of commit. We don't replicate each WAL records > > generated during transaction one by one in sync. In the case you > > described, the client will get an error due to the server crash. > > Therefore I think the user cannot expect WAL records generated so far > > has been replicated. The same issue could happen also when the user > > executes PREPARE TRANSACTION and the server crashes. > > Thanks! 
I didn't noticed the behavior when a user executes PREPARE > TRANSACTION is same. > > IIUC with 2PC, there is a different point between (1)PREPARE TRANSACTION > and (2)2PC. > The point is that whether the client can know when the server crashed > and it's global tx id. > > If (1)PREPARE TRANSACTION is failed, it's ok the client execute same > command > because if the remote server is already prepared the command will be > ignored. > > But, if (2)2PC is failed with coordinator crash, the client can't know > what operations should be done. > > If the old coordinator already executed PREPARED, there are some > transaction which should be ABORT PREPARED. > But if the PREPARED WAL is not sent to the standby, the new coordinator > can't execute ABORT PREPARED. > And the client can't know which remote servers have PREPARED > transactions which should be ABORTED either. > > Even if the client can know that, only the old coordinator knows its > global transaction id. > Only the database administrator can analyze the old coordinator's log > and then execute the appropriate commands manually, right? I think that's right. In the case of the coordinator crash, the user can look orphaned foreign prepared transactions by checking the 'identifier' column of pg_foreign_xacts on the new standby server and the prepared transactions on the remote servers. > > > > To prevent this > > issue, I think we would need to send each WAL records in sync but I'm > > not sure it's reasonable behavior, and as long as we write WAL in the > > local and then send it to replicas we would need a smart mechanism to > > prevent this situation. > > I agree. To send each 2PC WAL records in sync must be with a large > performance impact. > At least, we need to document the limitation and how to handle this > situation. Ok. I'll add it. 
> > > > Related to the pointing out by Ikeda-san, I realized that with the > > current patch the backend waits for synchronous replication and then > > waits for foreign transaction resolution. But it should be reversed. > > Otherwise, it could lead to data loss even when the client got an > > acknowledgment of commit. Also, when the user is using both atomic > > commit and synchronous replication and wants to cancel waiting, he/she > > will need to press ctl-c twice with the current patch, which also > > should be fixed. > > I'm sorry that I can't understood. > > In my understanding, if COMMIT WAL is replicated to the standby in sync, > the standby server can resolve the transaction after crash recovery in > promoted phase. > > If reversed, there are some situation which can't guarantee atomic > commit. > In case that some foreign transaction resolutions are succeed but others > are failed(and COMMIT WAL is not replicated), > the standby must ABORT PREPARED because the COMMIT WAL is not > replicated. > This means that some foreign transactions are COMMITE PREPARED executed > by primary coordinator, > other foreign transactions can be ABORT PREPARED executed by secondary > coordinator. You're right. Thank you for pointing out! If the coordinator crashes after the client gets acknowledgment of the successful commit of the transaction but before sending XLOG_FDWXACT_REMOVE record to the replicas, the FdwXact entries are left on the replicas even after failover. But since we require FDW to tolerate the error of undefined prepared transactions in COMMIT/ROLLBACK PREPARED it won’t be a critical problem. Regards, -- Masahiko Sawada http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
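The point about tolerating undefined prepared transactions can be illustrated with a small sketch. This is hypothetical code, not from the patch: a resolver that treats "prepared transaction does not exist" as already-resolved, which is what makes leftover FdwXact entries on a promoted standby harmless.

```python
# Hypothetical sketch (not the patch's code): idempotent resolution of a
# prepared foreign transaction. Names are illustrative only.

class RemoteServer:
    """Toy stand-in for a foreign server holding prepared transactions."""
    def __init__(self, prepared):
        self.prepared = set(prepared)

    def commit_prepared(self, gid):
        if gid not in self.prepared:
            # Corresponds to the "prepared transaction does not exist"
            # error that COMMIT PREPARED raises in PostgreSQL.
            raise LookupError(f'prepared transaction "{gid}" does not exist')
        self.prepared.remove(gid)

def resolve(server, gid):
    """Resolve one foreign transaction, tolerating an already-resolved one."""
    try:
        server.commit_prepared(gid)
        return "committed"
    except LookupError:
        # Another coordinator (e.g. the crashed primary) already resolved
        # it before the failover; treat this as success.
        return "already resolved"
```

Resolving the same identifier twice is then harmless: the second attempt simply reports that the transaction was already resolved, so a stale FdwXact entry on the new primary does not cause an error.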
RE: Transactions involving multiple postgres foreign servers, take 2
From: Laurenz Albe <laurenz.albe@cybertec.at> > On Fri, 2020-07-17 at 05:21 +0000, tsunakawa.takay@fujitsu.com wrote: > > And most importantly, do other major DBMSs, including Oracle, provide the > API for > > preparing a transaction? In other words, will the FDWs other than > postgres_fdw > > really be able to take advantage of the new FDW functions to join the 2PC > processing? > > I think we need to confirm that there are concrete examples. > > I bet they do. There is even a standard for that. If you're thinking of xa_prepare() defined in the X/Open XA specification, we need to be sure that other FDWs can really utilize this new 2PC mechanism. What I'm especially wondering is when the FDW can call xa_start(). Regards Takayuki Tsunakawa
On Fri, 17 Jul 2020 at 14:22, tsunakawa.takay@fujitsu.com <tsunakawa.takay@fujitsu.com> wrote: > > From: Masahiko Sawada <masahiko.sawada@2ndquadrant.com> > I have briefly checked the only oracle_fdw but in general I think that > > if an existing FDW supports transaction begin, commit, and rollback, > > these can be ported to new FDW transaction APIs easily. > > Does oracle_fdw support begin, commit and rollback? > > And most importantly, do other major DBMSs, including Oracle, provide the API for preparing a transaction? In other words, will the FDWs other than postgres_fdw really be able to take advantage of the new FDW functions to join the 2PC processing? I think we need to confirm that there are concrete examples. I also believe they do. But I'm concerned that some FDW needs to start a transaction differently when using 2PC. For instance, IIUC MySQL also supports 2PC but the transaction needs to be started with "XA START id” when the transaction needs to be prepared. The transaction started with XA START can be closed by XA END followed by XA PREPARE or XA COMMIT ONE PHASE. It means that when starting a new transaction, the FDW needs to prepare the transaction identifier and to know that 2PC might be used. It’s quite different from PostgreSQL. In PostgreSQL, we can start a transaction by BEGIN and end it by PREPARE TRANSACTION, COMMIT, or ROLLBACK. The transaction identifier is required when PREPARE TRANSACTION. With MySQL, I guess FDW needs a way to tell that the (next) transaction needs to be started with XA START so it can be prepared. It could be a custom GUC or an SQL function. Then when starting a new transaction on the MySQL server, the FDW can generate and store a transaction identifier somewhere alongside the connection. At the prepare phase, it passes the transaction identifier via GetPrepareId() API to the core. I haven’t tested the above yet and it’s just a desk plan. 
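The difference in command sequences described here can be sketched roughly as follows. This is only an illustration of the two protocols; the identifier format and function names are assumptions, not the patch's API:

```python
# Hypothetical sketch contrasting PostgreSQL's and MySQL's 2PC command
# sequences as discussed above. Names and id format are illustrative.

def pg_sequence(xact_id, two_phase):
    """PostgreSQL can decide at end of transaction: the identifier is
    only needed at PREPARE TRANSACTION time."""
    cmds = ["BEGIN"]
    cmds.append(f"PREPARE TRANSACTION '{xact_id}'" if two_phase else "COMMIT")
    return cmds

def mysql_sequence(xact_id, two_phase):
    """MySQL must know up front: a transaction that may later be prepared
    has to start with XA START, so the id is needed before any work."""
    cmds = [f"XA START '{xact_id}'", f"XA END '{xact_id}'"]
    cmds.append(f"XA PREPARE '{xact_id}'" if two_phase
                else f"XA COMMIT '{xact_id}' ONE PHASE")
    return cmds
```

The asymmetry is visible in where `xact_id` first appears: only in the last command for PostgreSQL, but in the very first command for MySQL, which is why an FDW for MySQL would need to know before BEGIN-time whether 2PC might be used.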
It's definitely a good idea to try integrating this 2PC feature into FDWs other than postgres_fdw to see if design and interfaces are implemented sophisticatedly. > > What I'm worried is that if only postgres_fdw can implement the prepare function, it's a sign that FDW interface will be riddled with functions only for Postgres. That is, the FDW interface is getting away from its original purpose "access external data as a relation" and complex. Tomas Vondra showed this concern as follows: > > Horizontal scalability/sharding > https://www.postgresql.org/message-id/flat/CANP8%2BjK%3D%2B3zVYDFY0oMAQKQVJ%2BqReDHr1UPdyFEELO82yVfb9A%40mail.gmail.com#2c45f0ee97855449f1f7fedcef1d5e11 > > > [Tomas Vondra's remarks] > -------------------------------------------------- > > This strikes me as a bit of a conflict of interest with FDW which > > seems to want to hide the fact that it's foreign; the FDW > > implementation makes its own optimization decisions which might > > make sense for single table queries but breaks down in the face of > > joins. > > +1 to these concerns > > In my mind, FDW is a wonderful tool to integrate PostgreSQL with > external data sources, and it's nicely shaped for this purpose, which > implies the abstractions and assumptions in the code. > > The truth however is that many current uses of the FDW API are actually > using it for different purposes because there's no other way to do that, > not because FDWs are the "right way". And this includes the attempts to > build sharding on FDW, I think. > > Situations like this result in "improvements" of the API that seem to > improve the API for the second group, but make the life harder for the > original FDW API audience by making the API needlessly complex. And I > say "seem to improve" because the second group eventually runs into the > fundamental abstractions and assumptions the API is based on anyway. 
> > And based on the discussions at pgcon, I think this is the main reason > why people cringe when they hear "FDW" and "sharding" in the same sentence. > > ... > My other worry is that we'll eventually mess the FDW infrastructure, > making it harder to use for the original purpose. Granted, most of the > improvements proposed so far look sane and useful for FDWs in general, > but sooner or later that ceases to be the case - there will be changes > needed merely for the sharding. Those will be tough decisions. > -------------------------------------------------- > > > > Regarding the comparison between FDW transaction APIs and transaction > > callbacks, I think one of the benefits of providing FDW transaction > > APIs is that the core is able to manage the status of foreign > > transactions. We need to track the status of individual foreign > > transactions to support atomic commit. If we use transaction callbacks > > (XactCallback) that many FDWs are using, I think we will end up > > calling the transaction callback and leave the transaction work to > > FDWs, leading that the core is not able to know the return values of > > PREPARE TRANSACTION for example. We can add more arguments passed to > > transaction callbacks to get the return value from FDWs but I don’t > > think it’s a good idea as transaction callbacks are used not only by > > FDW but also other external modules. > > To track the foreign transaction status, we can add GetTransactionStatus() to the FDW interface as an alternative, can't we? I haven't thought of such an interface but it sounds like the transaction status is managed on both the core and FDWs. Could you elaborate on that? > > > > With the current version patch (v23), it supports only > > INSERT/DELETE/UPDATE. But I'm going to change the patch so that it > > supports other writes SQLs as Fujii-san also pointed out. > > OK. I've just read that Fujii san already pointed out a similar thing. 
But I wonder if we can know that the UDF executed on the foreign server has updated data. Maybe we can know or guess it by calling txid_current_if_any() or checking the transaction status in FE/BE protocol, but can we deal with other FDWs other than postgres_fdw? Ah, my answer was not enough. It was only about tracking local writes. Regarding tracking of writes on the foreign server, I think there are restrictions. Currently, the executor registers a foreign server as a participant of 2PC before calling BeginForeignInsert(), BeginForeignModify(), and BeginForeignScan() etc with a flag indicating whether writes are going to happen on the foreign server. So even if a UDF in a SELECT statement that could update data were to be pushed down to the foreign server, the foreign server would be marked as *not* modified. I’ve not tested yet but I guess that since FDW is also allowed to register the foreign server along with that flag anytime before commit, FDW is able to forcibly change that flag if it knows the SELECT query is going to modify the data on the remote server. > > > > No, in the current design, the backend who received a query from the > > client does PREPARE, and then the transaction resolver process, a > > background worker, does COMMIT PREPARED. > > This "No" means the current implementation cannot group commits from multiple transactions? Yes. > Does the transaction resolver send COMMIT PREPARED and waits for its response for each transaction one by one? For example, > > [local server] > Transaction T1 and T2 performs 2PC at the same time. > Transaction resolver sends COMMIT PREPARED for T1 and then waits for the response. > T1 writes COMMIT PREPARED record locally and sync the WAL. > Transaction resolver sends COMMIT PREPARED for T2 and then waits for the response. > T2 writes COMMIT PREPARED record locally and sync the WAL. > > [foreign server] > T1 writes COMMIT PREPARED record locally and sync the WAL. 
> T2 writes COMMIT PREPARED record locally and sync the WAL. Just to be clear, the transaction resolver writes FDWXACT_REMOVE records instead of COMMIT PREPARED record to remove the foreign transaction entry. But, yes, the transaction resolver works as you explained above. > If the WAL records of multiple concurrent transactions are written and synced separately, i.e. group commit doesn't take effect, then the OLTP transaction performance will be unacceptable. I agree that it'll be a large performance penalty. I'd like to have it but I’m not sure we should have it in the first version from the perspective of complexity. Since the procedure of 2PC is originally high cost, in my opinion, the user should avoid using it as much as possible in terms of performance. Especially in OLTP, its cost will directly affect the latency. I’d suggest designing the database schema so a transaction touches only one foreign server, but do you have a concrete OLTP use case that normally requires 2PC, and how many servers are involved within a distributed transaction? Regards, -- Masahiko Sawada http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
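To make the group-commit concern above concrete, here is a toy model (not the patch's code; all names are illustrative) comparing a resolver that flushes WAL once per resolved transaction with one that amortizes the flush over a batch:

```python
# Hypothetical sketch of the resolver behavior discussed above.
# `flush` stands in for one WAL write + fsync of the removal records.

def resolve_serially(xacts, flush):
    """One flush per resolved foreign transaction (the behavior described
    in the thread: COMMIT PREPARED handled one by one)."""
    for fx in xacts:
        flush([fx])

def resolve_in_batches(xacts, flush, batch_size=16):
    """A group-commit-style resolver: one flush covers a whole batch of
    resolved transactions, amortizing the sync cost."""
    for i in range(0, len(xacts), batch_size):
        flush(xacts[i:i + batch_size])
```

With 100 concurrent transactions and a batch size of 16, the serial resolver performs 100 flushes while the batching one performs 7, which is the kind of difference that determines whether OLTP latency is acceptable.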
On 2020/07/17 20:04, Masahiko Sawada wrote: > On Fri, 17 Jul 2020 at 14:22, tsunakawa.takay@fujitsu.com > <tsunakawa.takay@fujitsu.com> wrote: >> >> From: Masahiko Sawada <masahiko.sawada@2ndquadrant.com> >> I have briefly checked the only oracle_fdw but in general I think that >>> if an existing FDW supports transaction begin, commit, and rollback, >>> these can be ported to new FDW transaction APIs easily. >> >> Does oracle_fdw support begin, commit and rollback? >> >> And most importantly, do other major DBMSs, including Oracle, provide the API for preparing a transaction? In other words, will the FDWs other than postgres_fdw really be able to take advantage of the new FDW functions to join the 2PC processing? I think we need to confirm that there are concrete examples. > > I also believe they do. But I'm concerned that some FDW needs to start > a transaction differently when using 2PC. For instance, IIUC MySQL > also supports 2PC but the transaction needs to be started with "XA > START id” when the transaction needs to be prepared. The transaction > started with XA START can be closed by XA END followed by XA PREPARE > or XA COMMIT ONE PHASE. Does this mean that FDW should also provide the API for xa_end()? Maybe we need to consider again which API we should provide in FDW, based on the XA specification? > It means that when starts a new transaction > the transaction needs to prepare the transaction identifier and to > know that 2PC might be used. It’s quite different from PostgreSQL. In > PostgreSQL, we can start a transaction by BEGIN and end it by PREPARE > TRANSACTION, COMMIT, or ROLLBACK. The transaction identifier is > required when PREPARE TRANSACTION. > > With MySQL, I guess FDW needs a way to tell the (next) transaction > needs to be started with XA START so it can be prepared. It could be a > custom GUC or an SQL function. 
Then when starts a new transaction on > MySQL server, FDW can generate and store a transaction identifier into > somewhere alongside the connection. At the prepare phase, it passes > the transaction identifier via GetPrepareId() API to the core. > > I haven’t tested the above yet and it’s just a desk plan. it's > definitely a good idea to try integrating this 2PC feature to FDWs > other than postgres_fdw to see if design and interfaces are > implemented sophisticatedly. With the current patch, we track whether write queries are executed in each server. Then, if the number of servers that execute write queries is less than two, 2PC is skipped. This "optimization" is not necessary (cannot be applied) when using mysql_fdw because the transaction starts with XA START. Right? If that's the "optimization" only for postgres_fdw, maybe it's better to get rid of that "optimization" from the first patch, to make the patch simpler. Regards, -- Fujii Masao Advanced Computing Technology Center Research and Development Headquarters NTT DATA CORPORATION
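The decision Fujii-san describes can be sketched like this. The names are illustrative only, not the patch's API; the real patch tracks participants and their modified flags in its FdwXact machinery:

```python
# Hypothetical sketch of the "skip 2PC when fewer than two servers wrote"
# optimization discussed above. Names are illustrative.

def choose_commit_protocol(participants):
    """participants: dict mapping server name -> did it execute writes?"""
    writers = [s for s, modified in participants.items() if modified]
    if len(writers) <= 1:
        # At most one server changed data: a plain COMMIT on each server
        # is already atomic enough, no PREPARE needed.
        return "one-phase"
    # Two or more writers: atomic commit requires prepare on all of them.
    return "two-phase"
```

As noted above, this late decision only works when the server lets you choose the protocol at commit time (as PostgreSQL does); with MySQL-style XA, where the transaction must already have been started with XA START, the choice cannot be deferred this way.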
On 2020/07/16 14:47, Masahiko Sawada wrote: > On Tue, 14 Jul 2020 at 11:19, Fujii Masao <masao.fujii@oss.nttdata.com> wrote: >> >> >> >> On 2020/07/14 9:08, Masahiro Ikeda wrote: >>>> I've attached the latest version patches. I've incorporated the review >>>> comments I got so far and improved locking strategy. >>> >>> Thanks for updating the patch! >> >> +1 >> I'm interested in these patches and now studying them. While checking >> the behaviors of the patched PostgreSQL, I got three comments. > > Thank you for testing this patch! > >> >> 1. We can access to the foreign table even during recovery in the HEAD. >> But in the patched version, when I did that, I got the following error. >> Is this intentional? >> >> ERROR: cannot assign TransactionIds during recovery > > No, it should be fixed. I'm going to fix this by not collecting > participants for atomic commit during recovery. Thanks for trying to fix the issues! I'd like to report one more issue. When I started a new transaction on the local server, executed an INSERT on the remote server via postgres_fdw and then quit psql, I got the following assertion failure.

TRAP: FailedAssertion("fdwxact", File: "fdwxact.c", Line: 1570)
0   postgres        0x000000010d52f3c0 ExceptionalCondition + 160
1   postgres        0x000000010cefbc49 ForgetAllFdwXactParticipants + 313
2   postgres        0x000000010cefff14 AtProcExit_FdwXact + 20
3   postgres        0x000000010d313fe3 shmem_exit + 179
4   postgres        0x000000010d313e7a proc_exit_prepare + 122
5   postgres        0x000000010d313da3 proc_exit + 19
6   postgres        0x000000010d35112f PostgresMain + 3711
7   postgres        0x000000010d27bb3a BackendRun + 570
8   postgres        0x000000010d27af6b BackendStartup + 475
9   postgres        0x000000010d279ed1 ServerLoop + 593
10  postgres        0x000000010d277940 PostmasterMain + 6016
11  postgres        0x000000010d1597b9 main + 761
12  libdyld.dylib   0x00007fff7161e3d5 start + 1
13  ???             0x0000000000000003 0x0 + 3

Regards, -- Fujii Masao Advanced Computing Technology Center Research and Development Headquarters NTT DATA CORPORATION
On 2020-07-17 15:55, Masahiko Sawada wrote: > On Fri, 17 Jul 2020 at 11:06, Masahiro Ikeda <ikedamsh@oss.nttdata.com> > wrote: >> >> On 2020-07-16 13:16, Masahiko Sawada wrote: >> > On Tue, 14 Jul 2020 at 17:24, Masahiro Ikeda <ikedamsh@oss.nttdata.com> >> > wrote: >> >> >> >> > I've attached the latest version patches. I've incorporated the review >> >> > comments I got so far and improved locking strategy. >> >> >> >> I want to ask a question about streaming replication with 2PC. >> >> Are you going to support 2PC with streaming replication? >> >> >> >> I tried streaming replication using v23 patches. >> >> I confirm that 2PC works with streaming replication, >> >> which there are primary/standby coordinator. >> >> >> >> But, in my understanding, the WAL of "PREPARE" and >> >> "COMMIT/ABORT PREPARED" can't be replicated to the standby server in >> >> sync. >> >> >> >> If this is right, the unresolved transaction can be occurred. >> >> >> >> For example, >> >> >> >> 1. PREPARE is done >> >> 2. crash primary before the WAL related to PREPARE is >> >> replicated to the standby server >> >> 3. promote standby server // but can't execute "ABORT PREPARED" >> >> >> >> In above case, the remote server has the unresolved transaction. >> >> Can we solve this problem to support in-sync replication? >> >> >> >> But, I think some users use async replication for performance. >> >> Do we need to document the limitation or make another solution? >> >> >> > >> > IIUC with synchronous replication, we can guarantee that WAL records >> > are written on both primary and replicas when the client got an >> > acknowledgment of commit. We don't replicate each WAL records >> > generated during transaction one by one in sync. In the case you >> > described, the client will get an error due to the server crash. >> > Therefore I think the user cannot expect WAL records generated so far >> > has been replicated. 
The same issue could happen also when the user >> > executes PREPARE TRANSACTION and the server crashes. >> >> Thanks! I didn't noticed the behavior when a user executes PREPARE >> TRANSACTION is same. >> >> IIUC with 2PC, there is a different point between (1)PREPARE >> TRANSACTION >> and (2)2PC. >> The point is that whether the client can know when the server crashed >> and it's global tx id. >> >> If (1)PREPARE TRANSACTION is failed, it's ok the client execute same >> command >> because if the remote server is already prepared the command will be >> ignored. >> >> But, if (2)2PC is failed with coordinator crash, the client can't know >> what operations should be done. >> >> If the old coordinator already executed PREPARED, there are some >> transaction which should be ABORT PREPARED. >> But if the PREPARED WAL is not sent to the standby, the new >> coordinator >> can't execute ABORT PREPARED. >> And the client can't know which remote servers have PREPARED >> transactions which should be ABORTED either. >> >> Even if the client can know that, only the old coordinator knows its >> global transaction id. >> Only the database administrator can analyze the old coordinator's log >> and then execute the appropriate commands manually, right? > > I think that's right. In the case of the coordinator crash, the user > can look orphaned foreign prepared transactions by checking the > 'identifier' column of pg_foreign_xacts on the new standby server and > the prepared transactions on the remote servers. I think there is a case we can't check orphaned foreign prepared transaction in pg_foreign_xacts view on the new standby server. It confuses users and database administrators. If the primary coordinator crashes after preparing foreign transaction, but before sending XLOG_FDWXACT_INSERT records to the standby server, the standby server can't restore their transaction status and pg_foreign_xacts view doesn't show the prepared foreign transactions. 
To send XLOG_FDWXACT_INSERT records asynchronously leads this problem. >> > To prevent this >> > issue, I think we would need to send each WAL records in sync but I'm >> > not sure it's reasonable behavior, and as long as we write WAL in the >> > local and then send it to replicas we would need a smart mechanism to >> > prevent this situation. >> >> I agree. To send each 2PC WAL records in sync must be with a large >> performance impact. >> At least, we need to document the limitation and how to handle this >> situation. > > Ok. I'll add it. Thanks a lot. >> > Related to the pointing out by Ikeda-san, I realized that with the >> > current patch the backend waits for synchronous replication and then >> > waits for foreign transaction resolution. But it should be reversed. >> > Otherwise, it could lead to data loss even when the client got an >> > acknowledgment of commit. Also, when the user is using both atomic >> > commit and synchronous replication and wants to cancel waiting, he/she >> > will need to press ctl-c twice with the current patch, which also >> > should be fixed. >> >> I'm sorry that I can't understood. >> >> In my understanding, if COMMIT WAL is replicated to the standby in >> sync, >> the standby server can resolve the transaction after crash recovery in >> promoted phase. >> >> If reversed, there are some situation which can't guarantee atomic >> commit. >> In case that some foreign transaction resolutions are succeed but >> others >> are failed(and COMMIT WAL is not replicated), >> the standby must ABORT PREPARED because the COMMIT WAL is not >> replicated. >> This means that some foreign transactions are COMMITE PREPARED >> executed >> by primary coordinator, >> other foreign transactions can be ABORT PREPARED executed by secondary >> coordinator. > > You're right. Thank you for pointing out! 
> > If the coordinator crashes after the client gets acknowledgment of the > successful commit of the transaction but before sending > XLOG_FDWXACT_REMOVE record to the replicas, the FdwXact entries are > left on the replicas even after failover. But since we require FDW to > tolerate the error of undefined prepared transactions in > COMMIT/ROLLBACK PREPARED it won’t be a critical problem. I agree. It's ok that the primary coordinator sends XLOG_FDWXACT_REMOVE records asynchronously. Regards, -- Masahiro Ikeda NTT DATA CORPORATION
On Fri, Jul 17, 2020 at 8:38 AM Masahiko Sawada <masahiko.sawada@2ndquadrant.com> wrote: > > On Thu, 16 Jul 2020 at 13:53, tsunakawa.takay@fujitsu.com > <tsunakawa.takay@fujitsu.com> wrote: > > > > Hi Sawada san, > > > > > > I'm reviewing this patch series, and let me give some initial comments and questions. I'm looking at this with a hope that this will be useful purely as a FDW enhancement for our new use cases, regardless of whether the FDW will be used for Postgres scale-out. > > Thank you for reviewing this patch! > > Yes, this patch is trying to resolve the generic atomic commit problem > w.r.t. FDW, and will be useful also for Postgres scale-out. > I think it is important to get a consensus on this point. If I understand correctly, Tsunakawa-San doesn't seem to be convinced that FDW can be used for postgres scale-out and we are trying to paint this feature as a step forward in the scale-out direction. As per my understanding, we don't have a very clear vision whether we will be able to achieve the other important aspects of scale-out feature like global visibility if we go in this direction and that is the reason I have insisted in this and the other related thread [1] to at least have a high-level idea of the same before going too far with this patch. It is quite possible that after spending months of effort to straighten out this patch/feature, we come to the conclusion that this needs to be re-designed or requires a lot of re-work to ensure that it can be extended for global visibility. It is better to spend some effort up front to see if the proposed patch is a stepping stone for achieving what we want w.r.t postgres scale-out. [1] - https://www.postgresql.org/message-id/07b2c899-4ed0-4c87-1327-23c750311248%40postgrespro.ru -- With Regards, Amit Kapila. EnterpriseDB: http://www.enterprisedb.com
RE: Transactions involving multiple postgres foreign servers, take 2
From: Masahiko Sawada <masahiko.sawada@2ndquadrant.com> > I also believe they do. But I'm concerned that some FDW needs to start > a transaction differently when using 2PC. For instance, IIUC MySQL > also supports 2PC but the transaction needs to be started with "XA > START id” when the transaction needs to be prepared. The transaction > started with XA START can be closed by XA END followed by XA PREPARE > or XA COMMIT ONE PHASE. It means that when starts a new transaction > the transaction needs to prepare the transaction identifier and to > know that 2PC might be used. It’s quite different from PostgreSQL. In > PostgreSQL, we can start a transaction by BEGIN and end it by PREPARE > TRANSACTION, COMMIT, or ROLLBACK. The transaction identifier is > required when PREPARE TRANSACTION. I guess Postgres is rather a minority in this regard. All I know is XA and its Java counterpart (Java Transaction API: JTA). In XA, the connection needs to be associated with an XID before its transaction work is performed. If some transaction work is already done before associating with XID, xa_start() returns an error like this: [XA specification] -------------------------------------------------- [XAER_OUTSIDE] The resource manager is doing work outside any global transaction on behalf of the application. -------------------------------------------------- [Java Transaction API (JTA)] -------------------------------------------------- void start(Xid xid, int flags) throws XAException This method starts work on behalf of a transaction branch. ... 3.4.7 Local and Global Transactions The resource adapter is encouraged to support the usage of both local and global transactions within the same transactional connection. Local transactions are transactions that are started and coordinated by the resource manager internally. The XAResource interface is not used for local transactions. When using the same connection to perform both local and global transactions, the following rules apply: . 
The local transaction must be committed (or rolled back) before starting a global transaction in the connection. . The global transaction must be disassociated from the connection before any local transaction is started. -------------------------------------------------- (FWIW, jdbc_fdw would expect to use JTA for this FDW 2PC?) > I haven’t tested the above yet and it’s just a desk plan. it's > definitely a good idea to try integrating this 2PC feature to FDWs > other than postgres_fdw to see if design and interfaces are > implemented sophisticatedly. Yes, if we address this 2PC feature as an FDW enhancement, we need to make sure that at least some well-known DBMSs should be able to implement the new interface. The following part may help devise the interface: [References from XA specification] -------------------------------------------------- The primary use of xa_start() is to register a new transaction branch with the RM. This marks the start of the branch. Subsequently, the AP, using the same thread of control, uses the RM’s native interface to do useful work. All requests for service made by the same thread are part of the same branch until the thread dissociates from the branch (see below). 3.3.1 Registration of Resource Managers Normally, a TM involves all associated RMs in a transaction branch. (The TM’s set of RM switches, described in Section 4.3 on page 21 tells the TM which RMs are associated with it.) The TM calls all these RMs with xa_start(), xa_end(), and xa_prepare (), although an RM that is not active in a branch need not participate further (see Section 2.3.2 on page 8). A technique to reduce overhead for infrequently-used RMs is discussed below. Dynamic Registration Certain RMs, especially those involved in relatively few global transactions, may ask the TM to assume they are not involved in a transaction. These RMs must register with the TM before they do application work, to see whether the work is part of a global transaction. 
The TM never calls these RMs with any form of xa_start(). An RM declares dynamic registration in its switch (see Section 4.3 on page 21). An RM can make this declaration only on its own behalf, and doing so does not change the TM’s behaviour with respect to other RMs. When an AP requests work from such an RM, before doing any work, the RM contacts the TM by calling ax_reg(). The RM must call ax_reg() from the same thread of control that the AP would use if it called ax_reg() directly. The TM returns to the RM the appropriate XID if the AP is in a global transaction. The implications of dynamically registering are as follows: when a thread of control begins working on behalf of a transaction branch, the transaction manager calls xa_start() for all resource managers known to the thread except those having TMREGISTER set in their xa_switch_t structure. Thus, those resource managers with this flag set must explicitly join a branch with ax_reg(). Secondly, when a thread of control is working on behalf of a branch, a transaction manager calls xa_end() for all resource managers known to the thread that either do not have TMREGISTER set in their xa_switch_t structure or have dynamically registered with ax_reg(). int xa_start(XID *xid, int rmid, long flags) DESCRIPTION A transaction manager calls xa_start() to inform a resource manager that an application may do work on behalf of a transaction branch. ... A transaction manager calls xa_start() only for those resource managers that do not have TMREGISTER set in the flags element of their xa_switch_t structure. Resource managers with TMREGISTER set must use ax_reg() to join a transaction branch (see ax_reg() for details). -------------------------------------------------- > > To track the foreign transaction status, we can add GetTransactionStatus() to > the FDW interface as an alternative, can't we? > > I haven't thought such an interface but it sounds like the transaction > status is managed on both the core and FDWs. 
Could you elaborate on > that? I don't have such a deep analysis. I just thought that the core could keep track of the local transaction status, and ask each participant FDW about its transaction status to determine an action. > > If the WAL records of multiple concurrent transactions are written and > synced separately, i.e. group commit doesn't take effect, then the OLTP > transaction performance will be unacceptable. > > I agree that it'll be a large performance penalty. I'd like to have it > but I’m not sure we should have it in the first version from the > perspective of complexity. I think at least we should have a rough image of how we can reach the goal. Otherwise, the current design/implementation may have to be overhauled with great effort in the near future. Apart from that, I feel it's unnatural that the commit processing is serialized at the transaction resolver while the DML processing of multiple foreign transactions can be performed in parallel. > Since the procedure of 2PC is originally > high cost, in my opinion, the user should not use as much as possible > in terms of performance. Especially in OLTP, its cost will directly > affect the latency. I’d suggest designing database schema so > transaction touches only one foreign server but do you have concrete > OLTP use case where normally requires 2PC, and how many servers > involved within a distributed transaction? I can't share the details, but some of our customers show interest in Postgres scale-out or FDW 2PC for the following use cases: * Multitenant OLTP where the data specific to one tenant is stored on one database server. On the other hand, some data are shared among all tenants, and they are stored on a separate server. The shared data and the tenant-specific data is updated in the same transaction (I don't know the frequency of such transactions.) * An IoT use case where each edge database server monitors and tracks the movement of objects in one area. 
Those edge database servers store the records of objects they manage. When an object gets out of one area and moves to another, the record for the object is moved between the two edge database servers using an atomic distributed transaction. (I wonder if TPC-C or TPC-E needs distributed transactions...) Regards Takayuki Tsunakawa
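The atomicity requirement behind the IoT hand-off use case above can be illustrated with a toy sketch (Python; all names are invented for this example and bear no relation to the patch's real code): the record either moves completely between the two edge servers, or neither server changes.

```python
# Toy model of moving an object's record between two edge databases with an
# atomic (two-phase) protocol. The dicts stand in for the edge servers'
# tables; fail_before_commit simulates a coordinator crash after prepare.

def move_object(obj, src, dst, fail_before_commit=False):
    """Two-phase move: stage the change on both sides, then apply both or neither."""
    if obj not in src:
        return False
    # phase 1 (prepare): both participants agree they can apply the change;
    # a crash here means a resolver would later roll back both prepared
    # branches, so nothing is half-applied
    if fail_before_commit:
        return False
    # phase 2 (commit): apply the staged delete and insert on both servers
    del src[obj]
    dst[obj] = True
    return True

area_a, area_b = {"obj1": True}, {}
assert move_object("obj1", area_a, area_b) is True
assert "obj1" not in area_a and "obj1" in area_b   # fully moved
area_c, area_d = {"obj2": True}, {}
assert move_object("obj2", area_c, area_d, fail_before_commit=True) is False
assert "obj2" in area_c and "obj2" not in area_d   # fully untouched
```

Without 2PC, a crash between the delete on one server and the insert on the other would either lose the record or duplicate it, which is exactly what this use case cannot tolerate.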
On Sat, 18 Jul 2020 at 01:45, Fujii Masao <masao.fujii@oss.nttdata.com> wrote: > > > > On 2020/07/17 20:04, Masahiko Sawada wrote: > > On Fri, 17 Jul 2020 at 14:22, tsunakawa.takay@fujitsu.com > > <tsunakawa.takay@fujitsu.com> wrote: > >> > >> From: Masahiko Sawada <masahiko.sawada@2ndquadrant.com> > >> I have briefly checked only oracle_fdw, but in general I think that > >>> if an existing FDW supports transaction begin, commit, and rollback, > >>> these can be ported to new FDW transaction APIs easily. > >> > >> Does oracle_fdw support begin, commit and rollback? > >> > >> And most importantly, do other major DBMSs, including Oracle, provide the API for preparing a transaction? In other words, will the FDWs other than postgres_fdw really be able to take advantage of the new FDW functions to join the 2PC processing? I think we need to confirm that there are concrete examples. > > > > I also believe they do. But I'm concerned that some FDW needs to start > > a transaction differently when using 2PC. For instance, IIUC MySQL > > also supports 2PC but the transaction needs to be started with "XA > > START id" when the transaction needs to be prepared. The transaction > > started with XA START can be closed by XA END followed by XA PREPARE > > or XA COMMIT ONE PHASE. > > This means that FDW should also provide the API for xa_end()? > Maybe we need to consider again which API we should provide in FDW, > based on the XA specification? I'm not sure that we really need the API for xa_end(). It's not necessary at least in MySQL's case. mysql_fdw can execute either XA END and XA PREPARE when the FDW prepare API is called, or XA END and XA COMMIT ONE PHASE when the FDW commit API is called with FDWXACT_FLAG_ONEPHASE. > > > > It means that when starting a new transaction > > the transaction needs to prepare the transaction identifier and to > > know that 2PC might be used. It's quite different from PostgreSQL.
In > > PostgreSQL, we can start a transaction by BEGIN and end it by PREPARE > > TRANSACTION, COMMIT, or ROLLBACK. The transaction identifier is > > required only at PREPARE TRANSACTION. > > > > With MySQL, I guess the FDW needs a way to tell that the (next) transaction > > needs to be started with XA START so it can be prepared. It could be a > > custom GUC or an SQL function. Then, when starting a new transaction on the > > MySQL server, the FDW can generate and store a transaction identifier > > somewhere alongside the connection. At the prepare phase, it passes > > the transaction identifier via the GetPrepareId() API to the core. > > > > I haven't tested the above yet and it's just a desk plan. It's > > definitely a good idea to try integrating this 2PC feature into FDWs > > other than postgres_fdw to see if the design and interfaces are > > implemented sophisticatedly. > > With the current patch, we track whether write queries are executed > in each server. Then, if the number of servers that execute write queries > is less than two, 2PC is skipped. This "optimization" is not necessary > (cannot be applied) when using mysql_fdw because the transaction starts > with XA START. Right? I think we can use XA COMMIT ONE PHASE in MySQL, which both prepares and commits the transaction. If the number of servers that executed write queries is less than two, the core transaction manager calls the CommitForeignTransaction API with the flag FDWXACT_FLAG_ONEPHASE. That way, mysql_fdw can execute XA COMMIT ONE PHASE instead of XA PREPARE, following XA END. On the other hand, when the number of such servers is greater than or equal to two, the core transaction manager calls the PrepareForeignTransaction API and then the CommitForeignTransaction API without that flag. In this case, mysql_fdw can execute XA END and XA PREPARE in the PrepareForeignTransaction API call, and then XA COMMIT in the CommitForeignTransaction API call.
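The one-phase/two-phase choice described above can be sketched as follows (Python, purely illustrative; the function name, the `FDWXACT_FLAG_ONEPHASE` usage and the exact XA statement text are stand-ins, not the patch's or mysql_fdw's actual code):

```python
# Sketch of the XA statement sequence a MySQL-style participant would run,
# driven by how many servers executed writes in the distributed transaction.

FDWXACT_FLAG_ONEPHASE = 0x1  # assumed flag value, for illustration only

def mysql_fdw_commands(xid, n_write_servers):
    """Return the XA statements one MySQL participant would execute."""
    if n_write_servers < 2:
        # core skips PrepareForeignTransaction and calls the commit API with
        # the one-phase flag: prepare and commit are folded into one step
        return [f"XA END '{xid}'", f"XA COMMIT '{xid}' ONE PHASE"]
    # full two-phase: prepare API call, then commit API call without the flag
    return [f"XA END '{xid}'", f"XA PREPARE '{xid}'", f"XA COMMIT '{xid}'"]

assert mysql_fdw_commands("fx1", 1) == ["XA END 'fx1'", "XA COMMIT 'fx1' ONE PHASE"]
assert mysql_fdw_commands("fx1", 2) == ["XA END 'fx1'", "XA PREPARE 'fx1'", "XA COMMIT 'fx1'"]
```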
Regards, -- Masahiko Sawada http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
On Sat, 18 Jul 2020 at 01:55, Fujii Masao <masao.fujii@oss.nttdata.com> wrote: > > > > On 2020/07/16 14:47, Masahiko Sawada wrote: > > On Tue, 14 Jul 2020 at 11:19, Fujii Masao <masao.fujii@oss.nttdata.com> wrote: > >> > >> > >> > >> On 2020/07/14 9:08, Masahiro Ikeda wrote: > >>>> I've attached the latest version patches. I've incorporated the review > >>>> comments I got so far and improved locking strategy. > >>> > >>> Thanks for updating the patch! > >> > >> +1 > >> I'm interested in these patches and now studying them. While checking > >> the behaviors of the patched PostgreSQL, I got three comments. > > > > Thank you for testing this patch! > > > >> > >> 1. We can access to the foreign table even during recovery in the HEAD. > >> But in the patched version, when I did that, I got the following error. > >> Is this intentional? > >> > >> ERROR: cannot assign TransactionIds during recovery > > > > No, it should be fixed. I'm going to fix this by not collecting > > participants for atomic commit during recovery. > > Thanks for trying to fix the issues! > > I'd like to report one more issue. When I started new transaction > in the local server, executed INSERT in the remote server via > postgres_fdw and then quit psql, I got the following assertion failure. 
> > TRAP: FailedAssertion("fdwxact", File: "fdwxact.c", Line: 1570) > 0 postgres 0x000000010d52f3c0 ExceptionalCondition + 160 > 1 postgres 0x000000010cefbc49 ForgetAllFdwXactParticipants + 313 > 2 postgres 0x000000010cefff14 AtProcExit_FdwXact + 20 > 3 postgres 0x000000010d313fe3 shmem_exit + 179 > 4 postgres 0x000000010d313e7a proc_exit_prepare + 122 > 5 postgres 0x000000010d313da3 proc_exit + 19 > 6 postgres 0x000000010d35112f PostgresMain + 3711 > 7 postgres 0x000000010d27bb3a BackendRun + 570 > 8 postgres 0x000000010d27af6b BackendStartup + 475 > 9 postgres 0x000000010d279ed1 ServerLoop + 593 > 10 postgres 0x000000010d277940 PostmasterMain + 6016 > 11 postgres 0x000000010d1597b9 main + 761 > 12 libdyld.dylib 0x00007fff7161e3d5 start + 1 > 13 ??? 0x0000000000000003 0x0 + 3 > Thank you for reporting the issue! I've attached the latest version patch that incorporated all comments I got so far. I've removed the patch adding the 'prefer' mode of foreign_twophase_commit to keep the patch set simple. Regards, -- Masahiko Sawada http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Attachment
On 2020/07/16 14:47, Masahiko Sawada wrote:
> On Tue, 14 Jul 2020 at 11:19, Fujii Masao <masao.fujii@oss.nttdata.com> wrote:
>>
>>
>>
>> On 2020/07/14 9:08, Masahiro Ikeda wrote:
>>>> I've attached the latest version patches. I've incorporated the review
>>>> comments I got so far and improved locking strategy.
>>>
>>> Thanks for updating the patch!
>>
>> +1
>> I'm interested in these patches and now studying them. While checking
>> the behaviors of the patched PostgreSQL, I got three comments.
>
> Thank you for testing this patch!
>
>>
>> 1. We can access to the foreign table even during recovery in the HEAD.
>> But in the patched version, when I did that, I got the following error.
>> Is this intentional?
>>
>> ERROR: cannot assign TransactionIds during recovery
>
> No, it should be fixed. I'm going to fix this by not collecting
> participants for atomic commit during recovery.
Thanks for trying to fix the issues!
I'd like to report one more issue. When I started new transaction
in the local server, executed INSERT in the remote server via
postgres_fdw and then quit psql, I got the following assertion failure.
TRAP: FailedAssertion("fdwxact", File: "fdwxact.c", Line: 1570)
0 postgres 0x000000010d52f3c0 ExceptionalCondition + 160
1 postgres 0x000000010cefbc49 ForgetAllFdwXactParticipants + 313
2 postgres 0x000000010cefff14 AtProcExit_FdwXact + 20
3 postgres 0x000000010d313fe3 shmem_exit + 179
4 postgres 0x000000010d313e7a proc_exit_prepare + 122
5 postgres 0x000000010d313da3 proc_exit + 19
6 postgres 0x000000010d35112f PostgresMain + 3711
7 postgres 0x000000010d27bb3a BackendRun + 570
8 postgres 0x000000010d27af6b BackendStartup + 475
9 postgres 0x000000010d279ed1 ServerLoop + 593
10 postgres 0x000000010d277940 PostmasterMain + 6016
11 postgres 0x000000010d1597b9 main + 761
12 libdyld.dylib 0x00007fff7161e3d5 start + 1
13 ??? 0x0000000000000003 0x0 + 3
Regards,
--
Fujii Masao
Advanced Computing Technology Center
Research and Development Headquarters
NTT DATA CORPORATION
On Thu, 23 Jul 2020 at 22:51, Muhammad Usama <m.usama@gmail.com> wrote: > > > > On Wed, Jul 22, 2020 at 12:42 PM Masahiko Sawada <masahiko.sawada@2ndquadrant.com> wrote: >> >> On Sat, 18 Jul 2020 at 01:55, Fujii Masao <masao.fujii@oss.nttdata.com> wrote: >> > >> > >> > >> > On 2020/07/16 14:47, Masahiko Sawada wrote: >> > > On Tue, 14 Jul 2020 at 11:19, Fujii Masao <masao.fujii@oss.nttdata.com> wrote: >> > >> >> > >> >> > >> >> > >> On 2020/07/14 9:08, Masahiro Ikeda wrote: >> > >>>> I've attached the latest version patches. I've incorporated the review >> > >>>> comments I got so far and improved locking strategy. >> > >>> >> > >>> Thanks for updating the patch! >> > >> >> > >> +1 >> > >> I'm interested in these patches and now studying them. While checking >> > >> the behaviors of the patched PostgreSQL, I got three comments. >> > > >> > > Thank you for testing this patch! >> > > >> > >> >> > >> 1. We can access to the foreign table even during recovery in the HEAD. >> > >> But in the patched version, when I did that, I got the following error. >> > >> Is this intentional? >> > >> >> > >> ERROR: cannot assign TransactionIds during recovery >> > > >> > > No, it should be fixed. I'm going to fix this by not collecting >> > > participants for atomic commit during recovery. >> > >> > Thanks for trying to fix the issues! >> > >> > I'd like to report one more issue. When I started new transaction >> > in the local server, executed INSERT in the remote server via >> > postgres_fdw and then quit psql, I got the following assertion failure. 
>> > >> > TRAP: FailedAssertion("fdwxact", File: "fdwxact.c", Line: 1570) >> > 0 postgres 0x000000010d52f3c0 ExceptionalCondition + 160 >> > 1 postgres 0x000000010cefbc49 ForgetAllFdwXactParticipants + 313 >> > 2 postgres 0x000000010cefff14 AtProcExit_FdwXact + 20 >> > 3 postgres 0x000000010d313fe3 shmem_exit + 179 >> > 4 postgres 0x000000010d313e7a proc_exit_prepare + 122 >> > 5 postgres 0x000000010d313da3 proc_exit + 19 >> > 6 postgres 0x000000010d35112f PostgresMain + 3711 >> > 7 postgres 0x000000010d27bb3a BackendRun + 570 >> > 8 postgres 0x000000010d27af6b BackendStartup + 475 >> > 9 postgres 0x000000010d279ed1 ServerLoop + 593 >> > 10 postgres 0x000000010d277940 PostmasterMain + 6016 >> > 11 postgres 0x000000010d1597b9 main + 761 >> > 12 libdyld.dylib 0x00007fff7161e3d5 start + 1 >> > 13 ??? 0x0000000000000003 0x0 + 3 >> > >> >> Thank you for reporting the issue! >> >> I've attached the latest version patch that incorporated all comments >> I got so far. I've removed the patch adding the 'prefer' mode of >> foreign_twophase_commit to keep the patch set simple. > > > I have started to review the patchset. Just a quick comment. > > Patch v24-0002-Support-atomic-commit-among-multiple-foreign-ser.patch > contains changes (adding fdwxact includes) for > src/backend/executor/nodeForeignscan.c, src/backend/executor/nodeModifyTable.c > and src/backend/executor/execPartition.c files that doesn't seem to be > required with the latest version. Thanks for your comment. Right. I've removed these changes on the local branch. Regards, -- Masahiko Sawada http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
On 2020/07/27 15:59, Masahiko Sawada wrote: > On Thu, 23 Jul 2020 at 22:51, Muhammad Usama <m.usama@gmail.com> wrote: >> >> >> >> On Wed, Jul 22, 2020 at 12:42 PM Masahiko Sawada <masahiko.sawada@2ndquadrant.com> wrote: >>> >>> On Sat, 18 Jul 2020 at 01:55, Fujii Masao <masao.fujii@oss.nttdata.com> wrote: >>>> >>>> >>>> >>>> On 2020/07/16 14:47, Masahiko Sawada wrote: >>>>> On Tue, 14 Jul 2020 at 11:19, Fujii Masao <masao.fujii@oss.nttdata.com> wrote: >>>>>> >>>>>> >>>>>> >>>>>> On 2020/07/14 9:08, Masahiro Ikeda wrote: >>>>>>>> I've attached the latest version patches. I've incorporated the review >>>>>>>> comments I got so far and improved locking strategy. >>>>>>> >>>>>>> Thanks for updating the patch! >>>>>> >>>>>> +1 >>>>>> I'm interested in these patches and now studying them. While checking >>>>>> the behaviors of the patched PostgreSQL, I got three comments. >>>>> >>>>> Thank you for testing this patch! >>>>> >>>>>> >>>>>> 1. We can access to the foreign table even during recovery in the HEAD. >>>>>> But in the patched version, when I did that, I got the following error. >>>>>> Is this intentional? >>>>>> >>>>>> ERROR: cannot assign TransactionIds during recovery >>>>> >>>>> No, it should be fixed. I'm going to fix this by not collecting >>>>> participants for atomic commit during recovery. >>>> >>>> Thanks for trying to fix the issues! >>>> >>>> I'd like to report one more issue. When I started new transaction >>>> in the local server, executed INSERT in the remote server via >>>> postgres_fdw and then quit psql, I got the following assertion failure. 
>>>> >>>> TRAP: FailedAssertion("fdwxact", File: "fdwxact.c", Line: 1570) >>>> 0 postgres 0x000000010d52f3c0 ExceptionalCondition + 160 >>>> 1 postgres 0x000000010cefbc49 ForgetAllFdwXactParticipants + 313 >>>> 2 postgres 0x000000010cefff14 AtProcExit_FdwXact + 20 >>>> 3 postgres 0x000000010d313fe3 shmem_exit + 179 >>>> 4 postgres 0x000000010d313e7a proc_exit_prepare + 122 >>>> 5 postgres 0x000000010d313da3 proc_exit + 19 >>>> 6 postgres 0x000000010d35112f PostgresMain + 3711 >>>> 7 postgres 0x000000010d27bb3a BackendRun + 570 >>>> 8 postgres 0x000000010d27af6b BackendStartup + 475 >>>> 9 postgres 0x000000010d279ed1 ServerLoop + 593 >>>> 10 postgres 0x000000010d277940 PostmasterMain + 6016 >>>> 11 postgres 0x000000010d1597b9 main + 761 >>>> 12 libdyld.dylib 0x00007fff7161e3d5 start + 1 >>>> 13 ??? 0x0000000000000003 0x0 + 3 >>>> >>> >>> Thank you for reporting the issue! >>> >>> I've attached the latest version patch that incorporated all comments >>> I got so far. I've removed the patch adding the 'prefer' mode of >>> foreign_twophase_commit to keep the patch set simple. >> >> >> I have started to review the patchset. Just a quick comment. >> >> Patch v24-0002-Support-atomic-commit-among-multiple-foreign-ser.patch >> contains changes (adding fdwxact includes) for >> src/backend/executor/nodeForeignscan.c, src/backend/executor/nodeModifyTable.c >> and src/backend/executor/execPartition.c files that doesn't seem to be >> required with the latest version. > > Thanks for your comment. > > Right. I've removed these changes on the local branch. The latest patches failed to be applied to the master branch. Could you rebase the patches? Regards, -- Fujii Masao Advanced Computing Technology Center Research and Development Headquarters NTT DATA CORPORATION
On Fri, 21 Aug 2020 at 00:36, Fujii Masao <masao.fujii@oss.nttdata.com> wrote: > > > > On 2020/07/27 15:59, Masahiko Sawada wrote: > > On Thu, 23 Jul 2020 at 22:51, Muhammad Usama <m.usama@gmail.com> wrote: > >> > >> > >> > >> On Wed, Jul 22, 2020 at 12:42 PM Masahiko Sawada <masahiko.sawada@2ndquadrant.com> wrote: > >>> > >>> On Sat, 18 Jul 2020 at 01:55, Fujii Masao <masao.fujii@oss.nttdata.com> wrote: > >>>> > >>>> > >>>> > >>>> On 2020/07/16 14:47, Masahiko Sawada wrote: > >>>>> On Tue, 14 Jul 2020 at 11:19, Fujii Masao <masao.fujii@oss.nttdata.com> wrote: > >>>>>> > >>>>>> > >>>>>> > >>>>>> On 2020/07/14 9:08, Masahiro Ikeda wrote: > >>>>>>>> I've attached the latest version patches. I've incorporated the review > >>>>>>>> comments I got so far and improved locking strategy. > >>>>>>> > >>>>>>> Thanks for updating the patch! > >>>>>> > >>>>>> +1 > >>>>>> I'm interested in these patches and now studying them. While checking > >>>>>> the behaviors of the patched PostgreSQL, I got three comments. > >>>>> > >>>>> Thank you for testing this patch! > >>>>> > >>>>>> > >>>>>> 1. We can access to the foreign table even during recovery in the HEAD. > >>>>>> But in the patched version, when I did that, I got the following error. > >>>>>> Is this intentional? > >>>>>> > >>>>>> ERROR: cannot assign TransactionIds during recovery > >>>>> > >>>>> No, it should be fixed. I'm going to fix this by not collecting > >>>>> participants for atomic commit during recovery. > >>>> > >>>> Thanks for trying to fix the issues! > >>>> > >>>> I'd like to report one more issue. When I started new transaction > >>>> in the local server, executed INSERT in the remote server via > >>>> postgres_fdw and then quit psql, I got the following assertion failure. 
> >>>> > >>>> TRAP: FailedAssertion("fdwxact", File: "fdwxact.c", Line: 1570) > >>>> 0 postgres 0x000000010d52f3c0 ExceptionalCondition + 160 > >>>> 1 postgres 0x000000010cefbc49 ForgetAllFdwXactParticipants + 313 > >>>> 2 postgres 0x000000010cefff14 AtProcExit_FdwXact + 20 > >>>> 3 postgres 0x000000010d313fe3 shmem_exit + 179 > >>>> 4 postgres 0x000000010d313e7a proc_exit_prepare + 122 > >>>> 5 postgres 0x000000010d313da3 proc_exit + 19 > >>>> 6 postgres 0x000000010d35112f PostgresMain + 3711 > >>>> 7 postgres 0x000000010d27bb3a BackendRun + 570 > >>>> 8 postgres 0x000000010d27af6b BackendStartup + 475 > >>>> 9 postgres 0x000000010d279ed1 ServerLoop + 593 > >>>> 10 postgres 0x000000010d277940 PostmasterMain + 6016 > >>>> 11 postgres 0x000000010d1597b9 main + 761 > >>>> 12 libdyld.dylib 0x00007fff7161e3d5 start + 1 > >>>> 13 ??? 0x0000000000000003 0x0 + 3 > >>>> > >>> > >>> Thank you for reporting the issue! > >>> > >>> I've attached the latest version patch that incorporated all comments > >>> I got so far. I've removed the patch adding the 'prefer' mode of > >>> foreign_twophase_commit to keep the patch set simple. > >> > >> > >> I have started to review the patchset. Just a quick comment. > >> > >> Patch v24-0002-Support-atomic-commit-among-multiple-foreign-ser.patch > >> contains changes (adding fdwxact includes) for > >> src/backend/executor/nodeForeignscan.c, src/backend/executor/nodeModifyTable.c > >> and src/backend/executor/execPartition.c files that doesn't seem to be > >> required with the latest version. > > > > Thanks for your comment. > > > > Right. I've removed these changes on the local branch. > > The latest patches failed to be applied to the master branch. Could you rebase the patches? > Thank you for letting me know. I've attached the latest version patch set. Regards, -- Masahiko Sawada http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Attachment
> On 2020-07-17 15:55, Masahiko Sawada wrote: >> On Fri, 17 Jul 2020 at 11:06, Masahiro Ikeda >> <ikedamsh(at)oss(dot)nttdata(dot)com> >> wrote: >>> >>> On 2020-07-16 13:16, Masahiko Sawada wrote: >>>> On Tue, 14 Jul 2020 at 17:24, Masahiro Ikeda >>>> <ikedamsh(at)oss(dot)nttdata(dot)com> >>>> wrote: >>>>> >>>>>> I've attached the latest version patches. I've incorporated the >>>>>> review >>>>>> comments I got so far and improved locking strategy. >>>>> >>>>> I want to ask a question about streaming replication with 2PC. >>>>> Are you going to support 2PC with streaming replication? >>>>> >>>>> I tried streaming replication using v23 patches. >>>>> I confirm that 2PC works with streaming replication, >>>>> which there are primary/standby coordinator. >>>>> >>>>> But, in my understanding, the WAL of "PREPARE" and >>>>> "COMMIT/ABORT PREPARED" can't be replicated to the standby server >>>>> in >>>>> sync. >>>>> >>>>> If this is right, the unresolved transaction can be occurred. >>>>> >>>>> For example, >>>>> >>>>> 1. PREPARE is done >>>>> 2. crash primary before the WAL related to PREPARE is >>>>> replicated to the standby server >>>>> 3. promote standby server // but can't execute "ABORT PREPARED" >>>>> >>>>> In above case, the remote server has the unresolved transaction. >>>>> Can we solve this problem to support in-sync replication? >>>>> >>>>> But, I think some users use async replication for performance. >>>>> Do we need to document the limitation or make another solution? >>>>> >>>> >>>> IIUC with synchronous replication, we can guarantee that WAL records >>>> are written on both primary and replicas when the client got an >>>> acknowledgment of commit. We don't replicate each WAL records >>>> generated during transaction one by one in sync. In the case you >>>> described, the client will get an error due to the server crash. >>>> Therefore I think the user cannot expect WAL records generated so >>>> far >>>> has been replicated. 
The same issue could happen also when the user >>>> executes PREPARE TRANSACTION and the server crashes. >>> >>> Thanks! I hadn't noticed that the behavior when a user executes PREPARE >>> TRANSACTION is the same. >>> >>> IIUC with 2PC, there is a different point between (1)PREPARE >>> TRANSACTION >>> and (2)2PC. >>> The point is whether the client can know when the server crashed >>> and its global tx id. >>> >>> If (1)PREPARE TRANSACTION fails, it's OK for the client to execute the same >>> command >>> because if the remote server has already prepared, the command will be >>> ignored. >>> >>> But, if (2)2PC fails with a coordinator crash, the client can't >>> know >>> what operations should be done. >>> >>> If the old coordinator already executed PREPARED, there are some >>> transactions which should be ABORT PREPARED. >>> But if the PREPARED WAL is not sent to the standby, the new >>> coordinator >>> can't execute ABORT PREPARED. >>> And the client can't know which remote servers have PREPARED >>> transactions which should be ABORTED either. >>> >>> Even if the client can know that, only the old coordinator knows its >>> global transaction id. >>> Only the database administrator can analyze the old coordinator's log >>> and then execute the appropriate commands manually, right? >> >> I think that's right. In the case of a coordinator crash, the user >> can look for orphaned foreign prepared transactions by checking the >> 'identifier' column of pg_foreign_xacts on the new standby server and >> the prepared transactions on the remote servers. >> > I think there is a case where we can't check orphaned foreign > prepared transactions in the pg_foreign_xacts view on the new standby > server. > It confuses users and database administrators.
> > If the primary coordinator crashes after preparing foreign transaction, > but before sending XLOG_FDWXACT_INSERT records to the standby server, > the standby server can't restore their transaction status and > pg_foreign_xacts view doesn't show the prepared foreign transactions. > > To send XLOG_FDWXACT_INSERT records asynchronously leads this problem. If the primary replicates XLOG_FDWXACT_INSERT to the standby asynchronously, some prepared transactions may remain unresolved forever. Since resolving this inconsistency manually is a hard operation, we need to support synchronous XLOG_FDWXACT_INSERT replication. I understand that this has a large impact on performance, but users can control the consistency/durability vs. performance trade-off with the synchronous_commit parameter. What do you think? > Thank you for letting me know. I've attached the latest version patch > set. Thanks for updating. But the latest patches failed to be applied to the master branch. Regards, -- Masahiro Ikeda NTT DATA CORPORATION
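The window Ikeda-san describes above can be shown with a toy event replay (Python; the event names and the single-record "WAL shipping" model are invented to illustrate the ordering problem, nothing more):

```python
# Toy timeline: with asynchronous replication, a coordinator crash between
# preparing the foreign transaction and shipping the XLOG_FDWXACT_INSERT
# record leaves the promoted standby with no trace of the prepared foreign
# transaction, so it can neither commit nor abort it.

def promoted_standby_view(events):
    """Replay coordinator events; return the fdwxact entries the standby
    has received by the time the coordinator crashes."""
    standby_fdwxacts = set()
    for ev, arg in events:
        if ev == "ship_fdwxact_wal":   # WAL record reached the standby
            standby_fdwxacts.add(arg)
        elif ev == "crash":            # nothing after the crash is shipped
            break
    return standby_fdwxacts

# async: crash before the WAL record is shipped -> standby sees nothing (orphan)
assert promoted_standby_view([("prepare_remote", "fx1"), ("crash", None),
                              ("ship_fdwxact_wal", "fx1")]) == set()
# synchronous shipping (WAL acknowledged before returning) closes the window
assert promoted_standby_view([("prepare_remote", "fx1"),
                              ("ship_fdwxact_wal", "fx1"),
                              ("crash", None)]) == {"fx1"}
```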
On Fri, 28 Aug 2020 at 17:50, Masahiro Ikeda <ikedamsh@oss.nttdata.com> wrote: > > > I think there is a case we can't check orphaned foreign > > prepared transaction in pg_foreign_xacts view on the new standby > > server. > > It confuses users and database administrators. > > > > If the primary coordinator crashes after preparing foreign transaction, > > but before sending XLOG_FDWXACT_INSERT records to the standby server, > > the standby server can't restore their transaction status and > > pg_foreign_xacts view doesn't show the prepared foreign transactions. > > > > To send XLOG_FDWXACT_INSERT records asynchronously leads this problem. > > If the primary replicates XLOG_FDWXACT_INSERT to the standby > asynchronously, > some prepared transaction may be unsolved forever. > > Since I think to solve this inconsistency manually is hard operation, > we need to support synchronous XLOG_FDWXACT_INSERT replication. > > I understood that there are a lot of impact to the performance, > but users can control the consistency/durability vs performance > with synchronous_commit parameter. > > What do you think? I think the user can check such prepared transactions by seeing transactions that exist in the foreign server's pg_prepared_xacts but not in the coordinator server's pg_foreign_xacts, no? To make checking such prepared transactions easy, perhaps we could include the timestamp in the prepared transaction id. But I'm concerned about duplication of transaction ids due to clock skew. If there is a way to identify such unresolved foreign transactions and it's not cumbersome, then given that the problem you're concerned about is unlikely, I guess a certain number of users would be able to accept it as a restriction. So I'd recommend not dealing with this problem in the first version patch; we will be able to improve this feature to deal with this problem as an additional feature. Thoughts? > > Thank you for letting me know. I've attached the latest version patch > > set.
> > Thanks for updating. > But, the latest patches failed to be applied to the master branch. I'll submit the updated version patch. Regards, -- Masahiko Sawada http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
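The manual check Sawada-san suggests above amounts to a set difference between the two views; here is a minimal sketch (Python; the inputs stand in for query results from the remote server's pg_prepared_xacts and the coordinator's pg_foreign_xacts, and the id strings are made up):

```python
# Prepared transactions that the remote server knows about but the (new)
# coordinator does not track are orphans that need manual resolution.

def orphaned_prepared_xacts(remote_prepared_gids, coordinator_fdwxact_ids):
    """Return prepared-transaction ids present on the remote server but
    missing from the coordinator's tracked foreign transactions."""
    return sorted(set(remote_prepared_gids) - set(coordinator_fdwxact_ids))

remote = ["fx_100_1_10", "fx_101_1_10"]   # e.g. gid column of pg_prepared_xacts
coordinator = ["fx_100_1_10"]             # e.g. identifier column of pg_foreign_xacts
assert orphaned_prepared_xacts(remote, coordinator) == ["fx_101_1_10"]
```

In practice the DBA would then decide per orphan whether to COMMIT PREPARED or ROLLBACK PREPARED it on the remote server, based on whether the coordinator's transaction committed.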
On 2020-09-03 23:08, Masahiko Sawada wrote: > On Fri, 28 Aug 2020 at 17:50, Masahiro Ikeda <ikedamsh@oss.nttdata.com> > wrote: >> >> > I think there is a case we can't check orphaned foreign >> > prepared transaction in pg_foreign_xacts view on the new standby >> > server. >> > It confuses users and database administrators. >> > >> > If the primary coordinator crashes after preparing foreign transaction, >> > but before sending XLOG_FDWXACT_INSERT records to the standby server, >> > the standby server can't restore their transaction status and >> > pg_foreign_xacts view doesn't show the prepared foreign transactions. >> > >> > To send XLOG_FDWXACT_INSERT records asynchronously leads this problem. >> >> If the primary replicates XLOG_FDWXACT_INSERT to the standby >> asynchronously, >> some prepared transaction may be unsolved forever. >> >> Since I think to solve this inconsistency manually is hard operation, >> we need to support synchronous XLOG_FDWXACT_INSERT replication. >> >> I understood that there are a lot of impact to the performance, >> but users can control the consistency/durability vs performance >> with synchronous_commit parameter. >> >> What do you think? > > I think the user can check such prepared transactions by seeing > transactions that exist on the foreign server's pg_prepared_xact but > not on the coordinator server's pg_foreign_xacts, no? To make checking > such prepared transactions easy, perhaps we could contain the > timestamp to prepared transaction id. But I'm concerned the > duplication of transaction id due to clock skew. Thanks for letting me know. I agreed that we can check pg_prepared_xacts and pg_foreign_xacts. We have to manually abort the transactions which exist in pg_prepared_xacts but not in pg_foreign_xacts, don't we? So users have to use a foreign database which can show prepared transaction status, like pg_foreign_xacts does. When would duplication of transaction ids occur?
I'm sorry, but I couldn't understand the clock skew concern. IIUC, since the prepared id may contain the coordinator's xid, there is no clock skew and we can determine the transaction id uniquely. If the FDW implements the GetPrepareId() API and generates a transaction id without the coordinator's xid, your concern would emerge. But I can't think of a case where a transaction id is generated without the coordinator's xid. > If there is a way to identify such unresolved foreign transactions and > it's not cumbersome, given that the likelihood of problem you're > concerned is unlikely high I guess a certain number of would be able > to accept it as a restriction. So I'd recommend not dealing with this > problem in the first version patch and we will be able to improve this > feature to deal with this problem as an additional feature. Thoughts? I agree. Thanks for your comments. >> > Thank you for letting me know. I've attached the latest version patch >> > set. >> >> Thanks for updating. >> But, the latest patches failed to be applied to the master branch. > > I'll submit the updated version patch. Thanks. Regards, -- Masahiro Ikeda NTT DATA CORPORATION
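Ikeda-san's point above can be made concrete with a small sketch (Python). The `fx_<xid>_<serverid>_<userid>` shape is an assumption made up for this example, not necessarily the patch's real identifier format:

```python
# If the prepared-transaction identifier embeds the coordinator's xid (plus
# server/user identifiers), uniqueness comes from the coordinator's xid
# counter; no wall-clock timestamp is involved, so clock skew cannot cause
# duplicate ids.

def make_fdwxact_id(local_xid, server_oid, user_oid):
    """Hypothetical identifier built only from coordinator-local counters."""
    return "fx_%u_%u_%u" % (local_xid, server_oid, user_oid)

a = make_fdwxact_id(100, 16394, 10)
b = make_fdwxact_id(101, 16394, 10)   # next local transaction, same server
assert a == "fx_100_16394_10"
assert a != b                          # distinct xids => distinct ids, clock-free
```

Clock skew only becomes a concern if an FDW's own GetPrepareId implementation derives the id from a timestamp instead of the coordinator's xid, which is the case Ikeda-san says he cannot see arising.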
On Fri, Aug 21, 2020 at 03:25:29PM +0900, Masahiko Sawada wrote: > Thank you for letting me know. I've attached the latest version patch set. This needs a rebase. Patch 0002 is conflicting with some of the recent changes done in syncrep.c and procarray.c, at least. -- Michael
Attachment
On 2020/08/21 15:25, Masahiko Sawada wrote: > On Fri, 21 Aug 2020 at 00:36, Fujii Masao <masao.fujii@oss.nttdata.com> wrote: >> >> >> >> On 2020/07/27 15:59, Masahiko Sawada wrote: >>> On Thu, 23 Jul 2020 at 22:51, Muhammad Usama <m.usama@gmail.com> wrote: >>>> >>>> >>>> >>>> On Wed, Jul 22, 2020 at 12:42 PM Masahiko Sawada <masahiko.sawada@2ndquadrant.com> wrote: >>>>> >>>>> On Sat, 18 Jul 2020 at 01:55, Fujii Masao <masao.fujii@oss.nttdata.com> wrote: >>>>>> >>>>>> >>>>>> >>>>>> On 2020/07/16 14:47, Masahiko Sawada wrote: >>>>>>> On Tue, 14 Jul 2020 at 11:19, Fujii Masao <masao.fujii@oss.nttdata.com> wrote: >>>>>>>> >>>>>>>> >>>>>>>> >>>>>>>> On 2020/07/14 9:08, Masahiro Ikeda wrote: >>>>>>>>>> I've attached the latest version patches. I've incorporated the review >>>>>>>>>> comments I got so far and improved locking strategy. >>>>>>>>> >>>>>>>>> Thanks for updating the patch! >>>>>>>> >>>>>>>> +1 >>>>>>>> I'm interested in these patches and now studying them. While checking >>>>>>>> the behaviors of the patched PostgreSQL, I got three comments. >>>>>>> >>>>>>> Thank you for testing this patch! >>>>>>> >>>>>>>> >>>>>>>> 1. We can access to the foreign table even during recovery in the HEAD. >>>>>>>> But in the patched version, when I did that, I got the following error. >>>>>>>> Is this intentional? >>>>>>>> >>>>>>>> ERROR: cannot assign TransactionIds during recovery >>>>>>> >>>>>>> No, it should be fixed. I'm going to fix this by not collecting >>>>>>> participants for atomic commit during recovery. >>>>>> >>>>>> Thanks for trying to fix the issues! >>>>>> >>>>>> I'd like to report one more issue. When I started new transaction >>>>>> in the local server, executed INSERT in the remote server via >>>>>> postgres_fdw and then quit psql, I got the following assertion failure. 
>>>>>> >>>>>> TRAP: FailedAssertion("fdwxact", File: "fdwxact.c", Line: 1570) >>>>>> 0 postgres 0x000000010d52f3c0 ExceptionalCondition + 160 >>>>>> 1 postgres 0x000000010cefbc49 ForgetAllFdwXactParticipants + 313 >>>>>> 2 postgres 0x000000010cefff14 AtProcExit_FdwXact + 20 >>>>>> 3 postgres 0x000000010d313fe3 shmem_exit + 179 >>>>>> 4 postgres 0x000000010d313e7a proc_exit_prepare + 122 >>>>>> 5 postgres 0x000000010d313da3 proc_exit + 19 >>>>>> 6 postgres 0x000000010d35112f PostgresMain + 3711 >>>>>> 7 postgres 0x000000010d27bb3a BackendRun + 570 >>>>>> 8 postgres 0x000000010d27af6b BackendStartup + 475 >>>>>> 9 postgres 0x000000010d279ed1 ServerLoop + 593 >>>>>> 10 postgres 0x000000010d277940 PostmasterMain + 6016 >>>>>> 11 postgres 0x000000010d1597b9 main + 761 >>>>>> 12 libdyld.dylib 0x00007fff7161e3d5 start + 1 >>>>>> 13 ??? 0x0000000000000003 0x0 + 3 >>>>>> >>>>> >>>>> Thank you for reporting the issue! >>>>> >>>>> I've attached the latest version patch that incorporated all comments >>>>> I got so far. I've removed the patch adding the 'prefer' mode of >>>>> foreign_twophase_commit to keep the patch set simple. >>>> >>>> >>>> I have started to review the patchset. Just a quick comment. >>>> >>>> Patch v24-0002-Support-atomic-commit-among-multiple-foreign-ser.patch >>>> contains changes (adding fdwxact includes) for >>>> src/backend/executor/nodeForeignscan.c, src/backend/executor/nodeModifyTable.c >>>> and src/backend/executor/execPartition.c files that doesn't seem to be >>>> required with the latest version. >>> >>> Thanks for your comment. >>> >>> Right. I've removed these changes on the local branch. >> >> The latest patches failed to be applied to the master branch. Could you rebase the patches? >> > > Thank you for letting me know. I've attached the latest version patch set. Thanks for updating the patch! IMO it's not easy to commit this 2PC patch at once because it's still large and complicated. 
So I'm thinking it's better to separate the feature into several parts and commit them gradually. What about separating the feature into the following parts?

#1
Originally the server just executed the xact callbacks that each FDW registered when the transaction was committed. The patch changes this so that the server manages the FDW participants in the transaction and triggers them to execute COMMIT or ROLLBACK. IMO this change can be applied without the 2PC feature. Thoughts?

Even if we commit this patch and add a new interface for FDWs, we would need to keep the old interface for FDWs that provide only the old one.

#2
Originally, when there was FDW access in the transaction, PREPARE TRANSACTION on that transaction failed with an error. The patch allows PREPARE TRANSACTION and COMMIT/ROLLBACK PREPARED even when FDW access occurs in the transaction. IMO this change can be applied without the *automatic* 2PC feature (i.e., the feature where PREPARE TRANSACTION and COMMIT/ROLLBACK PREPARED are automatically executed for each FDW inside the "top" COMMIT command). Thoughts?

I'm not sure yet whether automatic resolution of "unresolved" prepared transactions by the resolver process is necessary for this change or not. If it's not necessary, it's better to exclude the resolver process from this change, at this stage, to make the patch simpler.

#3
Finally, IMO we can provide the patch supporting "automatic" 2PC for each FDW, based on the #1 and #2 patches.

What's your opinion about this?

Regards,

-- 
Fujii Masao
Advanced Computing Technology Center
Research and Development Headquarters
NTT DATA CORPORATION
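[Editor's sketch] Part #1 above — the server tracking the FDW participants of a transaction and driving COMMIT/ROLLBACK on all of them — can be sketched roughly as follows. All names here (FdwParticipantRegistry, register, end_transaction) are hypothetical illustrations, not the patch's actual API.

```python
# Rough sketch: the core tracks FDW participants per transaction and triggers
# COMMIT/ROLLBACK on each of them, instead of each FDW acting independently
# through its own registered xact callback.

class FdwParticipantRegistry:
    def __init__(self):
        self.participants = {}  # server name -> {"commit": fn, "rollback": fn}

    def register(self, server, commit_cb, rollback_cb):
        # Called the first time a foreign server is touched in the transaction.
        self.participants[server] = {"commit": commit_cb, "rollback": rollback_cb}

    def end_transaction(self, committed):
        # At top-level COMMIT/ABORT, the core drives every participant.
        action = "commit" if committed else "rollback"
        results = {name: cbs[action]() for name, cbs in self.participants.items()}
        self.participants.clear()
        return results
```

With such a registry, a single point in the commit path decides whether every remote transaction commits or rolls back, which is the prerequisite for layering 2PC on top later.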
On 2020/09/07 17:59, Fujii Masao wrote: > > > On 2020/08/21 15:25, Masahiko Sawada wrote: >> On Fri, 21 Aug 2020 at 00:36, Fujii Masao <masao.fujii@oss.nttdata.com> wrote: >>> >>> >>> >>> On 2020/07/27 15:59, Masahiko Sawada wrote: >>>> On Thu, 23 Jul 2020 at 22:51, Muhammad Usama <m.usama@gmail.com> wrote: >>>>> >>>>> >>>>> >>>>> On Wed, Jul 22, 2020 at 12:42 PM Masahiko Sawada <masahiko.sawada@2ndquadrant.com> wrote: >>>>>> >>>>>> On Sat, 18 Jul 2020 at 01:55, Fujii Masao <masao.fujii@oss.nttdata.com> wrote: >>>>>>> >>>>>>> >>>>>>> >>>>>>> On 2020/07/16 14:47, Masahiko Sawada wrote: >>>>>>>> On Tue, 14 Jul 2020 at 11:19, Fujii Masao <masao.fujii@oss.nttdata.com> wrote: >>>>>>>>> >>>>>>>>> >>>>>>>>> >>>>>>>>> On 2020/07/14 9:08, Masahiro Ikeda wrote: >>>>>>>>>>> I've attached the latest version patches. I've incorporated the review >>>>>>>>>>> comments I got so far and improved locking strategy. >>>>>>>>>> >>>>>>>>>> Thanks for updating the patch! >>>>>>>>> >>>>>>>>> +1 >>>>>>>>> I'm interested in these patches and now studying them. While checking >>>>>>>>> the behaviors of the patched PostgreSQL, I got three comments. >>>>>>>> >>>>>>>> Thank you for testing this patch! >>>>>>>> >>>>>>>>> >>>>>>>>> 1. We can access to the foreign table even during recovery in the HEAD. >>>>>>>>> But in the patched version, when I did that, I got the following error. >>>>>>>>> Is this intentional? >>>>>>>>> >>>>>>>>> ERROR: cannot assign TransactionIds during recovery >>>>>>>> >>>>>>>> No, it should be fixed. I'm going to fix this by not collecting >>>>>>>> participants for atomic commit during recovery. >>>>>>> >>>>>>> Thanks for trying to fix the issues! >>>>>>> >>>>>>> I'd like to report one more issue. When I started new transaction >>>>>>> in the local server, executed INSERT in the remote server via >>>>>>> postgres_fdw and then quit psql, I got the following assertion failure. 
>>>>>>> >>>>>>> TRAP: FailedAssertion("fdwxact", File: "fdwxact.c", Line: 1570) >>>>>>> 0 postgres 0x000000010d52f3c0 ExceptionalCondition + 160 >>>>>>> 1 postgres 0x000000010cefbc49 ForgetAllFdwXactParticipants + 313 >>>>>>> 2 postgres 0x000000010cefff14 AtProcExit_FdwXact + 20 >>>>>>> 3 postgres 0x000000010d313fe3 shmem_exit + 179 >>>>>>> 4 postgres 0x000000010d313e7a proc_exit_prepare + 122 >>>>>>> 5 postgres 0x000000010d313da3 proc_exit + 19 >>>>>>> 6 postgres 0x000000010d35112f PostgresMain + 3711 >>>>>>> 7 postgres 0x000000010d27bb3a BackendRun + 570 >>>>>>> 8 postgres 0x000000010d27af6b BackendStartup + 475 >>>>>>> 9 postgres 0x000000010d279ed1 ServerLoop + 593 >>>>>>> 10 postgres 0x000000010d277940 PostmasterMain + 6016 >>>>>>> 11 postgres 0x000000010d1597b9 main + 761 >>>>>>> 12 libdyld.dylib 0x00007fff7161e3d5 start + 1 >>>>>>> 13 ??? 0x0000000000000003 0x0 + 3 >>>>>>> >>>>>> >>>>>> Thank you for reporting the issue! >>>>>> >>>>>> I've attached the latest version patch that incorporated all comments >>>>>> I got so far. I've removed the patch adding the 'prefer' mode of >>>>>> foreign_twophase_commit to keep the patch set simple. >>>>> >>>>> >>>>> I have started to review the patchset. Just a quick comment. >>>>> >>>>> Patch v24-0002-Support-atomic-commit-among-multiple-foreign-ser.patch >>>>> contains changes (adding fdwxact includes) for >>>>> src/backend/executor/nodeForeignscan.c, src/backend/executor/nodeModifyTable.c >>>>> and src/backend/executor/execPartition.c files that doesn't seem to be >>>>> required with the latest version. >>>> >>>> Thanks for your comment. >>>> >>>> Right. I've removed these changes on the local branch. >>> >>> The latest patches failed to be applied to the master branch. Could you rebase the patches? >>> >> >> Thank you for letting me know. I've attached the latest version patch set. > > Thanks for updating the patch! > > IMO it's not easy to commit this 2PC patch at once because it's still large > and complicated. 
So I'm thinking it's better to separate the feature into > several parts and commit them gradually. What about separating > the feature into the following parts? > > #1 > Originally the server just executed xact callback that each FDW registered > when the transaction was committed. The patch changes this so that > the server manages the participants of FDW in the transaction and triggers > them to execute COMMIT or ROLLBACK. IMO this change can be applied > without 2PC feature. Thought? > > Even if we commit this patch and add new interface for FDW, we would > need to keep the old interface, for the FDW providing only old interface. > > > #2 > Originally when there was the FDW access in the transaction, > PREPARE TRANSACTION on that transaction failed with an error. The patch > allows PREPARE TRANSACTION and COMMIT/ROLLBACK PREPARED > even when FDW access occurs in the transaction. IMO this change can be > applied without *automatic* 2PC feature (i.e., PREPARE TRANSACTION and > COMMIT/ROLLBACK PREPARED are automatically executed for each FDW > inside "top" COMMIT command). Thought? > > I'm not sure yet whether automatic resolution of "unresolved" prepared > transactions by the resolver process is necessary for this change or not. > If it's not necessary, it's better to exclude the resolver process from this > change, at this stage, to make the patch simpler. > > > #3 > Finally IMO we can provide the patch supporting "automatic" 2PC for each FDW, > based on the #1 and #2 patches. > > > What's your opinion about this? Also I'd like to report some typos in the patch. +#define ServerSupportTransactionCallack(fdw_part) \ "Callack" in this macro name should be "Callback"? +#define SeverSupportTwophaseCommit(fdw_part) \ "Sever" in this macro name should be "Server"? + proname => 'pg_stop_foreing_xact_resolver', provolatile => 'v', prorettype => 'bool', "foreing" should be "foreign"? 
+ * FdwXact entry we call get_preparedid callback to get a transaction

"get_preparedid" should be "get_prepareid"?

Regards,

-- 
Fujii Masao
Advanced Computing Technology Center
Research and Development Headquarters
NTT DATA CORPORATION
On Mon, Sep 7, 2020 at 2:29 PM Fujii Masao <masao.fujii@oss.nttdata.com> wrote:
>
> IMO it's not easy to commit this 2PC patch at once because it's still large
> and complicated. So I'm thinking it's better to separate the feature into
> several parts and commit them gradually.
>

Hmm, I don't see that we have a consensus on the design and/or interfaces of this patch, and without that, proceeding to commit doesn't seem advisable. Here are a few points I remember offhand that require more work.

1. There is a competing design proposed and being discussed in another thread [1] for this purpose. I think both approaches have pros and cons, but there doesn't seem to be any conclusion yet on which one is better.

2. In this thread, we have discussed trying to integrate this patch with some other FDWs (say MySQL, mongodb, etc.) to ensure that the APIs we are exposing are general enough that other FDWs can use them to implement 2PC. I could see some speculation about this, but no concrete work has been done.

3. In another thread [1], we have seen that the patch being discussed in this thread might need to be re-designed if we have to use some other design for global visibility than what is proposed in that thread. I think that is quite likely to happen, considering that no one has yet come up with a solution to the major design problems spotted in that patch.

It appears to me that even though these points were raised before in some form, we are just trying to bypass them to commit whatever we have in the current patch, which I find quite surprising.

[1] - https://www.postgresql.org/message-id/07b2c899-4ed0-4c87-1327-23c750311248%40postgrespro.ru

-- 
With Regards,
Amit Kapila.
On 2020/09/08 10:34, Amit Kapila wrote:
> On Mon, Sep 7, 2020 at 2:29 PM Fujii Masao <masao.fujii@oss.nttdata.com> wrote:
>>
>> IMO it's not easy to commit this 2PC patch at once because it's still large
>> and complicated. So I'm thinking it's better to separate the feature into
>> several parts and commit them gradually.
>>
>
> Hmm, I don't see that we have a consensus on the design and or
> interfaces of this patch and without that proceeding for commit
> doesn't seem advisable. Here are a few points which I remember offhand
> that require more work.

Thanks!

> 1. There is a competing design proposed and being discussed in another
> thread [1] for this purpose. I think both the approaches have pros and
> cons but there doesn't seem to be any conclusion yet on which one is
> better.

I was thinking that [1] was discussing the global snapshot feature for "atomic visibility" rather than a solution like 2PC for "atomic commit". But if another approach for "atomic commit" was also proposed at [1], that's good. I will check that.

> 2. In this thread, we have discussed to try integrating this patch
> with some other FDWs (say MySQL, mongodb, etc.) to ensure that the
> APIs we are exposing are general enough that other FDWs can use them
> to implement 2PC. I could see some speculations about the same but no
> concrete work on the same has been done.

Yes, you're right.

> 3. In another thread [1], we have seen that the patch being discussed
> in this thread might need to re-designed if we have to use some other
> design for global-visibility than what is proposed in that thread. I
> think it is quite likely that can happen considering no one is able to
> come up with the solution to major design problems spotted in that
> patch yet.

Do you mean that the global-visibility patch should come first, before the "2PC" patch?

Regards,

-- 
Fujii Masao
Advanced Computing Technology Center
Research and Development Headquarters
NTT DATA CORPORATION
On Tue, Sep 8, 2020 at 8:05 AM Fujii Masao <masao.fujii@oss.nttdata.com> wrote: > > On 2020/09/08 10:34, Amit Kapila wrote: > > On Mon, Sep 7, 2020 at 2:29 PM Fujii Masao <masao.fujii@oss.nttdata.com> wrote: > >> > >> IMO it's not easy to commit this 2PC patch at once because it's still large > >> and complicated. So I'm thinking it's better to separate the feature into > >> several parts and commit them gradually. > >> > > > > Hmm, I don't see that we have a consensus on the design and or > > interfaces of this patch and without that proceeding for commit > > doesn't seem advisable. Here are a few points which I remember offhand > > that require more work. > > Thanks! > > > 1. There is a competing design proposed and being discussed in another > > thread [1] for this purpose. I think both the approaches have pros and > > cons but there doesn't seem to be any conclusion yet on which one is > > better. > > I was thinking that [1] was discussing global snapshot feature for > "atomic visibility" rather than the solution like 2PC for "atomic commit". > But if another approach for "atomic commit" was also proposed at [1], > that's good. I will check that. > Okay, that makes sense. > > 2. In this thread, we have discussed to try integrating this patch > > with some other FDWs (say MySQL, mongodb, etc.) to ensure that the > > APIs we are exposing are general enough that other FDWs can use them > > to implement 2PC. I could see some speculations about the same but no > > concrete work on the same has been done. > > Yes, you're right. > > > 3. In another thread [1], we have seen that the patch being discussed > > in this thread might need to re-designed if we have to use some other > > design for global-visibility than what is proposed in that thread. I > > think it is quite likely that can happen considering no one is able to > > come up with the solution to major design problems spotted in that > > patch yet. 
> > You imply that global-visibility patch should be come first before "2PC" patch? > I intend to say that the global-visibility work can impact this in a major way and we have analyzed that to some extent during a discussion on the other thread. So, I think without having a complete design/solution that addresses both the 2PC and global-visibility, it is not apparent what is the right way to proceed. It seems to me that rather than working on individual (or smaller) parts one needs to come up with a bigger picture (or overall design) and then once we have figured that out correctly, it would be easier to decide which parts can go first. -- With Regards, Amit Kapila.
RE: Transactions involving multiple postgres foreign servers, take 2
From: Amit Kapila <amit.kapila16@gmail.com>
> I intend to say that the global-visibility work can impact this in a
> major way and we have analyzed that to some extent during a discussion
> on the other thread. So, I think without having a complete
> design/solution that addresses both the 2PC and global-visibility, it
> is not apparent what is the right way to proceed. It seems to me that
> rather than working on individual (or smaller) parts one needs to come
> up with a bigger picture (or overall design) and then once we have
> figured that out correctly, it would be easier to decide which parts
> can go first.

I'm really sorry I've been getting late and late and late to publish the revised scale-out design wiki to discuss the big picture! I don't know why I'm taking this long; I feel as if I were captive in a time prison (yes, nobody is holding me captive; I'm just late.) Please wait a few days.

But to proceed with the development, let me comment on the atomic commit and global visibility.

* We have to hear from Andrey about their check on the possibility that Clock-SI could be Microsoft's patent and whether we can avoid it.

* I have a feeling that we can adopt the algorithm used by Spanner, CockroachDB, and YugabyteDB. That is, 2PC for multi-node atomic commit, Paxos or Raft for replica synchronization (in the process of commit) to make 2PC more highly available, and timestamp-based global visibility. However, the timestamp-based approach makes the database instance shut down when the node's clock is distant from the other nodes'.

* Or, maybe we can use the following Commitment ordering, which doesn't require timestamps or any other information to be transferred among the cluster nodes. However, this seems to have to track the order of read and write operations among concurrent transactions to ensure the correct commit order, so I'm not sure about the performance.
The MVCO paper seems to present the information we need, but I haven't understood it well yet (it's difficult.) Could anybody kindly interpret this?

Commitment ordering (CO) - yoavraz2
https://sites.google.com/site/yoavraz2/the_principle_of_co

As for Sawada-san's 2PC patch, which I find interesting purely as an FDW enhancement, I raised the following issues to be addressed:

1. Make the FDW API implementable by FDWs other than postgres_fdw (this is what Amit-san kindly pointed out.) I think oracle_fdw and jdbc_fdw would be good examples to consider, while MySQL may not be good because it exposes the XA feature as SQL statements, not C functions as defined in the XA specification.

2. 2PC processing is queued and serialized in one background worker. That severely subdues transaction throughput. Each backend should perform 2PC.

3. postgres_fdw cannot detect remote updates when a UDF executed on a remote node updates data.

Regards
Takayuki Tsunakawa
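[Editor's sketch] Point 2 above — each backend performing 2PC itself, overlapping the per-server round trips instead of funneling everything through one background worker — could look roughly like this. `prepare_on` is a hypothetical stand-in for the "PREPARE TRANSACTION" network round trip to one server; none of these names come from the patch.

```python
# Sketch: issue PREPARE to all participants in parallel from the backend
# itself, rather than queueing the work to a single serialized worker.
from concurrent.futures import ThreadPoolExecutor

def prepare_all(servers, prepare_on):
    # One network round trip per server, overlapped rather than sequential.
    with ThreadPoolExecutor(max_workers=max(len(servers), 1)) as pool:
        results = dict(zip(servers, pool.map(prepare_on, servers)))
    # Standard 2PC rule: move to the commit phase only if every participant
    # prepared successfully; otherwise all participants must be rolled back.
    return all(results.values()), results
```

The total prepare latency then approaches the slowest single round trip instead of the sum of all round trips, which is what the "serialization point" concern is about.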
On Mon, 7 Sep 2020 at 17:59, Fujii Masao <masao.fujii@oss.nttdata.com> wrote: > > > > On 2020/08/21 15:25, Masahiko Sawada wrote: > > On Fri, 21 Aug 2020 at 00:36, Fujii Masao <masao.fujii@oss.nttdata.com> wrote: > >> > >> > >> > >> On 2020/07/27 15:59, Masahiko Sawada wrote: > >>> On Thu, 23 Jul 2020 at 22:51, Muhammad Usama <m.usama@gmail.com> wrote: > >>>> > >>>> > >>>> > >>>> On Wed, Jul 22, 2020 at 12:42 PM Masahiko Sawada <masahiko.sawada@2ndquadrant.com> wrote: > >>>>> > >>>>> On Sat, 18 Jul 2020 at 01:55, Fujii Masao <masao.fujii@oss.nttdata.com> wrote: > >>>>>> > >>>>>> > >>>>>> > >>>>>> On 2020/07/16 14:47, Masahiko Sawada wrote: > >>>>>>> On Tue, 14 Jul 2020 at 11:19, Fujii Masao <masao.fujii@oss.nttdata.com> wrote: > >>>>>>>> > >>>>>>>> > >>>>>>>> > >>>>>>>> On 2020/07/14 9:08, Masahiro Ikeda wrote: > >>>>>>>>>> I've attached the latest version patches. I've incorporated the review > >>>>>>>>>> comments I got so far and improved locking strategy. > >>>>>>>>> > >>>>>>>>> Thanks for updating the patch! > >>>>>>>> > >>>>>>>> +1 > >>>>>>>> I'm interested in these patches and now studying them. While checking > >>>>>>>> the behaviors of the patched PostgreSQL, I got three comments. > >>>>>>> > >>>>>>> Thank you for testing this patch! > >>>>>>> > >>>>>>>> > >>>>>>>> 1. We can access to the foreign table even during recovery in the HEAD. > >>>>>>>> But in the patched version, when I did that, I got the following error. > >>>>>>>> Is this intentional? > >>>>>>>> > >>>>>>>> ERROR: cannot assign TransactionIds during recovery > >>>>>>> > >>>>>>> No, it should be fixed. I'm going to fix this by not collecting > >>>>>>> participants for atomic commit during recovery. > >>>>>> > >>>>>> Thanks for trying to fix the issues! > >>>>>> > >>>>>> I'd like to report one more issue. When I started new transaction > >>>>>> in the local server, executed INSERT in the remote server via > >>>>>> postgres_fdw and then quit psql, I got the following assertion failure. 
> >>>>>> > >>>>>> TRAP: FailedAssertion("fdwxact", File: "fdwxact.c", Line: 1570) > >>>>>> 0 postgres 0x000000010d52f3c0 ExceptionalCondition + 160 > >>>>>> 1 postgres 0x000000010cefbc49 ForgetAllFdwXactParticipants + 313 > >>>>>> 2 postgres 0x000000010cefff14 AtProcExit_FdwXact + 20 > >>>>>> 3 postgres 0x000000010d313fe3 shmem_exit + 179 > >>>>>> 4 postgres 0x000000010d313e7a proc_exit_prepare + 122 > >>>>>> 5 postgres 0x000000010d313da3 proc_exit + 19 > >>>>>> 6 postgres 0x000000010d35112f PostgresMain + 3711 > >>>>>> 7 postgres 0x000000010d27bb3a BackendRun + 570 > >>>>>> 8 postgres 0x000000010d27af6b BackendStartup + 475 > >>>>>> 9 postgres 0x000000010d279ed1 ServerLoop + 593 > >>>>>> 10 postgres 0x000000010d277940 PostmasterMain + 6016 > >>>>>> 11 postgres 0x000000010d1597b9 main + 761 > >>>>>> 12 libdyld.dylib 0x00007fff7161e3d5 start + 1 > >>>>>> 13 ??? 0x0000000000000003 0x0 + 3 > >>>>>> > >>>>> > >>>>> Thank you for reporting the issue! > >>>>> > >>>>> I've attached the latest version patch that incorporated all comments > >>>>> I got so far. I've removed the patch adding the 'prefer' mode of > >>>>> foreign_twophase_commit to keep the patch set simple. > >>>> > >>>> > >>>> I have started to review the patchset. Just a quick comment. > >>>> > >>>> Patch v24-0002-Support-atomic-commit-among-multiple-foreign-ser.patch > >>>> contains changes (adding fdwxact includes) for > >>>> src/backend/executor/nodeForeignscan.c, src/backend/executor/nodeModifyTable.c > >>>> and src/backend/executor/execPartition.c files that doesn't seem to be > >>>> required with the latest version. > >>> > >>> Thanks for your comment. > >>> > >>> Right. I've removed these changes on the local branch. > >> > >> The latest patches failed to be applied to the master branch. Could you rebase the patches? > >> > > > > Thank you for letting me know. I've attached the latest version patch set. > > Thanks for updating the patch! 
> > IMO it's not easy to commit this 2PC patch at once because it's still large > and complicated. So I'm thinking it's better to separate the feature into > several parts and commit them gradually. What about separating > the feature into the following parts? > > #1 > Originally the server just executed xact callback that each FDW registered > when the transaction was committed. The patch changes this so that > the server manages the participants of FDW in the transaction and triggers > them to execute COMMIT or ROLLBACK. IMO this change can be applied > without 2PC feature. Thought? > > Even if we commit this patch and add new interface for FDW, we would > need to keep the old interface, for the FDW providing only old interface. > > > #2 > Originally when there was the FDW access in the transaction, > PREPARE TRANSACTION on that transaction failed with an error. The patch > allows PREPARE TRANSACTION and COMMIT/ROLLBACK PREPARED > even when FDW access occurs in the transaction. IMO this change can be > applied without *automatic* 2PC feature (i.e., PREPARE TRANSACTION and > COMMIT/ROLLBACK PREPARED are automatically executed for each FDW > inside "top" COMMIT command). Thought? > > I'm not sure yet whether automatic resolution of "unresolved" prepared > transactions by the resolver process is necessary for this change or not. > If it's not necessary, it's better to exclude the resolver process from this > change, at this stage, to make the patch simpler. > > > #3 > Finally IMO we can provide the patch supporting "automatic" 2PC for each FDW, > based on the #1 and #2 patches. > > > What's your opinion about this? Regardless of which approaches of 2PC implementation being selected splitting the patch into logical small patches is a good idea and the above suggestion makes sense to me. 
Regarding #2, I guess that we would need the resolver and launcher processes even if we supported only the manual PREPARE TRANSACTION and COMMIT/ROLLBACK PREPARED commands. On a COMMIT PREPARED command, I think we should commit the local prepared transaction first and then commit the foreign prepared transactions. Otherwise, atomic commit is violated if the local node fails to commit a foreign prepared transaction and the user then switches to ROLLBACK PREPARED. OTOH, once we have committed locally, we cannot switch to rollback. And attempting to commit foreign prepared transactions could lead to an error due to a connection failure, OOM caused by palloc, etc. Therefore we discussed using background processes, the resolver and the launcher, to take charge of committing foreign prepared transactions, so that the process that executed COMMIT PREPARED never errors out after the local commit.

So I think patch #2 will also include the part adding the resolver and launcher processes. And in patch #3 we will change the code to support automatic 2PC as you suggested. In addition, the automatic resolution of in-doubt transactions can also be a separate patch, which will be patch #4.

Regards,

-- 
Masahiko Sawada
http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
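[Editor's sketch] The commit ordering described above — local commit first, then the foreign commits, with failures handed off for later resolution instead of being raised to the user — might look roughly like this. `Coordinator`, `commit_prepared`, and the `unresolved` list are illustrative names only; in the patch the hand-off target is the resolver/launcher background processes.

```python
# Sketch of the commit ordering: local COMMIT PREPARED first, then foreign
# prepared transactions, queueing failures for a resolver to retry.
from dataclasses import dataclass, field

@dataclass
class Coordinator:
    unresolved: list = field(default_factory=list)  # (server, gid) pairs for the resolver
    log: list = field(default_factory=list)

    def commit_prepared(self, gid, foreign_commits):
        # 1. Commit the local prepared transaction first; after this point
        #    switching to rollback is no longer possible.
        self.log.append(f"local COMMIT PREPARED '{gid}'")
        # 2. Then try each foreign prepared transaction.  A failure must not
        #    be raised to the user after the local commit, so the gid is
        #    queued for background retry instead.
        for server, commit_fn in foreign_commits.items():
            try:
                commit_fn()
                self.log.append(f"{server}: committed")
            except Exception:
                self.unresolved.append((server, gid))
```

This shows why the background processes seem needed even for the manual commands: the only safe reaction to a post-local-commit failure is to remember the foreign transaction and retry, never to error out.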
On Mon, Sep 7, 2020 at 2:29 PM Fujii Masao <masao.fujii@oss.nttdata.com> wrote:
>
> #2
> Originally when there was the FDW access in the transaction,
> PREPARE TRANSACTION on that transaction failed with an error. The patch
> allows PREPARE TRANSACTION and COMMIT/ROLLBACK PREPARED
> even when FDW access occurs in the transaction. IMO this change can be
> applied without *automatic* 2PC feature (i.e., PREPARE TRANSACTION and
> COMMIT/ROLLBACK PREPARED are automatically executed for each FDW
> inside "top" COMMIT command). Thought?
>
> I'm not sure yet whether automatic resolution of "unresolved" prepared
> transactions by the resolver process is necessary for this change or not.
> If it's not necessary, it's better to exclude the resolver process from this
> change, at this stage, to make the patch simpler.

I agree with this. However, in the case of an explicit prepare, if we are not going to try automatic resolution, it might be better to provide a way to report the transactions prepared on the foreign servers that could not be resolved at commit time, so that the user can take it up and resolve them him/herself. This was an idea that Tom had suggested at the very beginning of the first take.

-- 
Best Wishes,
Ashutosh Bapat
On 2020/09/08 12:03, Amit Kapila wrote: > On Tue, Sep 8, 2020 at 8:05 AM Fujii Masao <masao.fujii@oss.nttdata.com> wrote: >> >> On 2020/09/08 10:34, Amit Kapila wrote: >>> On Mon, Sep 7, 2020 at 2:29 PM Fujii Masao <masao.fujii@oss.nttdata.com> wrote: >>>> >>>> IMO it's not easy to commit this 2PC patch at once because it's still large >>>> and complicated. So I'm thinking it's better to separate the feature into >>>> several parts and commit them gradually. >>>> >>> >>> Hmm, I don't see that we have a consensus on the design and or >>> interfaces of this patch and without that proceeding for commit >>> doesn't seem advisable. Here are a few points which I remember offhand >>> that require more work. >> >> Thanks! >> >>> 1. There is a competing design proposed and being discussed in another >>> thread [1] for this purpose. I think both the approaches have pros and >>> cons but there doesn't seem to be any conclusion yet on which one is >>> better. >> >> I was thinking that [1] was discussing global snapshot feature for >> "atomic visibility" rather than the solution like 2PC for "atomic commit". >> But if another approach for "atomic commit" was also proposed at [1], >> that's good. I will check that. >> > > Okay, that makes sense. I read Alexey's 2PC patch (0001-Add-postgres_fdw.use_twophase-GUC-to-use-2PC.patch) proposed at [1]. As Alexey told at that thread, there are two big differences between his patch and Sawada-san's; 1) whether there is the resolver process for foreign transactions, 2) 2PC logic is implemented only inside postgres_fdw or both FDW and PostgreSQL core. I think that 2) is the first decision point. Alexey's 2PC patch is very simple and all the 2PC logic is implemented only inside postgres_fdw. But this means that 2PC is not usable if multiple types of FDW (e.g., postgres_fdw and mysql_fdw) participate at the transaction. This may be ok if we implement 2PC feature only for PostgreSQL sharding using postgres_fdw. 
But if we implement 2PC as an improvement to FDWs, independently of PostgreSQL sharding, I think it's necessary to support other FDWs. And this is our direction, isn't it?

Sawada-san's patch supports that case by implementing some components for that in PostgreSQL core as well. For example, with the patch, all the remote transactions that participate in the transaction are managed by PostgreSQL core instead of the postgres_fdw layer.

Therefore, at least regarding difference 2), I think that Sawada-san's approach is better. Thoughts?

[1] https://postgr.es/m/3ef7877bfed0582019eab3d462a43275@postgrespro.ru

Regards,

-- 
Fujii Masao
Advanced Computing Technology Center
Research and Development Headquarters
NTT DATA CORPORATION
RE: Transactions involving multiple postgres foreign servers, take 2
Alexey-san, Sawada-san,
cc: Fujii-san,


From: Fujii Masao <masao.fujii@oss.nttdata.com>
>> But if we
>> implement 2PC as the improvement on FDW independently from PostgreSQL
>> sharding, I think that it's necessary to support other FDW. And this is our
>> direction, isn't it?

I understand the same way as Fujii-san. 2PC FDW is itself useful, so I think we should pursue a tidy FDW interface and good performance within the FDW framework. "Tidy" means that many other FDWs should be able to implement it. I guess XA/JTA is the only material we can use to consider whether the FDW interface is good.

> Sawada-san's patch supports that case by implememnting some conponents
> for that also in PostgreSQL core. For example, with the patch, all the remote
> transactions that participate at the transaction are managed by PostgreSQL
> core instead of postgres_fdw layer.
>
> Therefore, at least regarding the difference 2), I think that Sawada-san's
> approach is better. Thought?

I think so. Sawada-san's patch needs to address the design issues I posed before digging into the code for thorough review, though.

BTW, is there something Sawada-san can take from Alexey-san's patch? I'm concerned about the performance for practical use. Do you two have differences in these points, for instance? The first two items are often cited to evaluate an algorithm's performance, as you know.

* The number of round trips to remote nodes.
* The number of disk I/Os on each node and all nodes in total (WAL, two-phase file, pg_subtrans file, CLOG?).
* Are prepare and commit executed in parallel on remote nodes? (serious DBMSs do so)
* Is there any serialization point in the processing? (Sawada-san's has one)

I'm sorry to repeat myself, but I don't think we can compromise the 2PC performance. Of course, we recommend that users design a schema that co-locates the data each transaction accesses so as to avoid 2PC, but that's not always possible (e.g., when secondary indexes are used.)
Plus, as the following quote from the TPC-C specification shows, TPC-C requires 15% of (Payment?) transactions to do 2PC. (I learned this from Microsoft's, CockroachDB's, or Citus Data's site.)

--------------------------------------------------
Independent of the mode of selection, the customer resident warehouse is the home warehouse 85% of the time and is a randomly selected remote warehouse 15% of the time. This can be implemented by generating two random numbers x and y within [1 .. 100];

. If x <= 85 a customer is selected from the selected district number (C_D_ID = D_ID) and the home warehouse number (C_W_ID = W_ID). The customer is paying through his/her own warehouse.

. If x > 85 a customer is selected from a random district number (C_D_ID is randomly selected within [1 .. 10]), and a random remote warehouse number (C_W_ID is randomly selected within the range of active warehouses (see Clause 4.2.2), and C_W_ID ≠ W_ID). The customer is paying through a warehouse and a district other than his/her own.
--------------------------------------------------

Regards
Takayuki Tsunakawa
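[Editor's sketch] The quoted selection rule can be written out directly; this simplified illustration covers only the warehouse choice (ignoring the district number y), and the function name is ours, not from the specification.

```python
# TPC-C Payment warehouse selection, per the quoted clause:
# x in [1..100]; x <= 85 -> home warehouse, x > 85 -> random remote warehouse.
import random

def select_customer_warehouse(w_id, active_warehouses, rng=random):
    x = rng.randint(1, 100)
    remote_choices = [w for w in active_warehouses if w != w_id]
    if x <= 85 or not remote_choices:
        return w_id                        # home warehouse: single-node case
    return rng.choice(remote_choices)      # remote warehouse: would need 2PC
```

Run over many transactions, roughly 15% pick a remote warehouse, which is the fraction of Payment transactions that would hit the distributed-commit path.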
On 2020/09/10 10:13, tsunakawa.takay@fujitsu.com wrote: > Alexey-san, Sawada-san, > cc: Fujii-san, > > > From: Fujii Masao <masao.fujii@oss.nttdata.com> >> But if we >> implement 2PC as the improvement on FDW independently from PostgreSQL >> sharding, I think that it's necessary to support other FDW. And this is our >> direction, isn't it? > > I understand it the same way as Fujii-san. 2PC FDW is itself useful, so I think we should pursue a tidy FDW interface and good performance within the FDW framework. "tidy" means that many other FDWs should be able to implement it. I guess XA/JTA is the only material we can use to consider whether the FDW interface is good. Originally start(), commit() and rollback() are supported as FDW interfaces. With his patch, prepare() is supported. What other interfaces need to be supported per XA/JTA? As far as I and Sawada-san discussed this upthread, to support MySQL, another type of start() would be necessary to issue the "XA START id" command. end() might also be necessary to issue "XA END id", but that command can be issued via prepare() together with "XA PREPARE id". I'm not familiar with XA/JTA and XA transaction interfaces on other major DBMSs. So I'd like to know what other interfaces are necessary additionally? > > >> Sawada-san's patch supports that case by implementing some components >> for that also in PostgreSQL core. For example, with the patch, all the remote >> transactions that participate in the transaction are managed by PostgreSQL >> core instead of the postgres_fdw layer. >> >> Therefore, at least regarding the difference 2), I think that Sawada-san's >> approach is better. Thought? > > I think so. Sawada-san's patch needs to address the design issues I posed before digging into the code for a thorough review, though. > > BTW, is there something Sawada-san can take from Alexey-san's patch? I'm concerned about the performance for practical use. Do you two have differences in these points, for instance? 
IMO Sawada-san's version of 2PC is less performant, but that's because his patch provides more functionality. For example, with his patch, WAL is written to automatically complete the unresolved foreign transactions in the case of failure. OTOH, Alexey's patch introduces no new WAL for 2PC. Of course, generating more WAL would cause more overhead. But if we need the automatic resolution feature, it's inevitable to introduce new WAL whichever patch we choose. Regards, -- Fujii Masao Advanced Computing Technology Center Research and Development Headquarters NTT DATA CORPORATION
On Tue, 8 Sep 2020 at 13:00, tsunakawa.takay@fujitsu.com <tsunakawa.takay@fujitsu.com> wrote: > > From: Amit Kapila <amit.kapila16@gmail.com> > > I intend to say that the global-visibility work can impact this in a > > major way and we have analyzed that to some extent during a discussion > > on the other thread. So, I think without having a complete > > design/solution that addresses both the 2PC and global-visibility, it > > is not apparent what is the right way to proceed. It seems to me that > > rather than working on individual (or smaller) parts one needs to come > > up with a bigger picture (or overall design) and then once we have > > figured that out correctly, it would be easier to decide which parts > > can go first. > > I'm really sorry I've been getting late and late and late to publish the revised scale-out design wiki to discuss the big picture! I don't know why I'm taking this long; I feel as if I were captive in a time prison (yes, nobody is holding me captive; I'm just late.) Please wait a few days. > > But to proceed with the development, let me comment on the atomic commit and global visibility. > > * We have to hear from Andrey about their check on the possibility that Clock-SI could be Microsoft's patent and if we can avoid it. > > * I have a feeling that we can adopt the algorithm used by Spanner, CockroachDB, and YugabyteDB. That is, 2PC for multi-node atomic commit, Paxos or Raft for replica synchronization (in the process of commit) to make 2PC more highly available, and the timestamp-based global visibility. However, the timestamp-based approach makes the database instance shut down when the node's clock is distant from the other nodes. > > * Or, maybe we can use the following Commitment ordering, which doesn't require the timestamp or any other information to be transferred among the cluster nodes. 
However, this seems to have to track the order of read and write operations among concurrent transactions to ensure the correct commit order, so I'm not sure about the performance. The MVCO paper seems to present the information we need, but I haven't understood it well yet (it's difficult.) Could anybody kindly interpret this? > > Commitment ordering (CO) - yoavraz2 > https://sites.google.com/site/yoavraz2/the_principle_of_co > > > As for Sawada-san's 2PC patch, which I find interesting purely as an FDW enhancement, I raised the following issues to be addressed: > > 1. Make the FDW API implementable by other FDWs than postgres_fdw (this is what Amit-san kindly pointed out.) I think oracle_fdw and jdbc_fdw would be good examples to consider, while MySQL may not be good because it exposes the XA feature as SQL statements, not C functions as defined in the XA specification. I agree that we need to verify that the new FDW APIs will be suitable for other FDWs than postgres_fdw as well. > > 2. 2PC processing is queued and serialized in one background worker. That severely subdues transaction throughput. Each backend should perform 2PC. I'm not sure it's safe that each backend performs PREPARE and COMMIT PREPARED, since the current design is intended to avoid an inconsistency between the actual transaction result and the result the user sees. But in the future, I think we can have multiple background workers per database for better performance. > > 3. postgres_fdw cannot detect remote updates when a UDF executed on a remote node updates data. I assume that you mean pushing the UDF down to a foreign server. If so, I think we can do this by improving postgres_fdw. In the current patch, registering and unregistering a foreign server to a group of 2PC and marking a foreign server as updated is the FDW's responsibility. So perhaps if we had a way to tell postgres_fdw that the UDF might update the data on the foreign server, postgres_fdw could mark the foreign server as updated if the UDF is shippable. 
Regards, -- Masahiko Sawada http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
On 2020/09/11 0:37, Masahiko Sawada wrote: > On Tue, 8 Sep 2020 at 13:00, tsunakawa.takay@fujitsu.com > <tsunakawa.takay@fujitsu.com> wrote: >> >> From: Amit Kapila <amit.kapila16@gmail.com> >>> I intend to say that the global-visibility work can impact this in a >>> major way and we have analyzed that to some extent during a discussion >>> on the other thread. So, I think without having a complete >>> design/solution that addresses both the 2PC and global-visibility, it >>> is not apparent what is the right way to proceed. It seems to me that >>> rather than working on individual (or smaller) parts one needs to come >>> up with a bigger picture (or overall design) and then once we have >>> figured that out correctly, it would be easier to decide which parts >>> can go first. >> >> I'm really sorry I've been getting late and late and late to publish the revised scale-out design wiki to discuss the big picture! I don't know why I'm taking this long; I feel as if I were captive in a time prison (yes, nobody is holding me captive; I'm just late.) Please wait a few days. >> >> But to proceed with the development, let me comment on the atomic commit and global visibility. >> >> * We have to hear from Andrey about their check on the possibility that Clock-SI could be Microsoft's patent and if we can avoid it. >> >> * I have a feeling that we can adopt the algorithm used by Spanner, CockroachDB, and YugabyteDB. That is, 2PC for multi-node atomic commit, Paxos or Raft for replica synchronization (in the process of commit) to make 2PC more highly available, and the timestamp-based global visibility. However, the timestamp-based approach makes the database instance shut down when the node's clock is distant from the other nodes. >> >> * Or, maybe we can use the following Commitment ordering, which doesn't require the timestamp or any other information to be transferred among the cluster nodes. 
However, this seems to have to track the order of read and write operations among concurrent transactions to ensure the correct commit order, so I'm not sure about the performance. The MVCO paper seems to present the information we need, but I haven't understood it well yet (it's difficult.) Could anybody kindly interpret this? >> >> Commitment ordering (CO) - yoavraz2 >> https://sites.google.com/site/yoavraz2/the_principle_of_co >> >> >> As for Sawada-san's 2PC patch, which I find interesting purely as an FDW enhancement, I raised the following issues to be addressed: >> >> 1. Make the FDW API implementable by other FDWs than postgres_fdw (this is what Amit-san kindly pointed out.) I think oracle_fdw and jdbc_fdw would be good examples to consider, while MySQL may not be good because it exposes the XA feature as SQL statements, not C functions as defined in the XA specification. > > I agree that we need to verify that the new FDW APIs will be suitable for other > FDWs than postgres_fdw as well. > >> >> 2. 2PC processing is queued and serialized in one background worker. That severely subdues transaction throughput. Each backend should perform 2PC. > > Not sure it's safe that each backend performs PREPARE and COMMIT > PREPARED since the current design is intended to avoid an inconsistency > between the actual transaction result and the result the user sees. Can I check my understanding about why the resolver process is necessary? Firstly, you think that issuing the COMMIT PREPARED command to the foreign server can cause an error, for example, because of a connection error, OOM, etc. On the other hand, only waiting for another process to issue the command is less likely to cause an error. Right? If an error occurs in the backend process after the commit record is WAL-logged, the error would be reported to the client and it may misunderstand that the transaction failed even though the commit record was already flushed. 
So you think that each backend should not issue the COMMIT PREPARED command to avoid that inconsistency. To avoid that, it's better to make another process, the resolver, issue the command and just make each backend wait for that to complete. Right? Also using the resolver process has another merit; when there are unresolved foreign transactions but the corresponding backend exits, the resolver can try to resolve them. If something like this automatic resolution is necessary, a process like the resolver would be necessary. Right? To the contrary, if we don't need such automatic resolution (i.e., unresolved foreign transactions always need to be resolved manually) and we can prevent the code that issues the COMMIT PREPARED command from causing an error (not sure if that's possible, though...), probably we don't need the resolver process. Right? > But in the future, I think we can have multiple background workers per > database for better performance. Yes, that's an idea. Regards, -- Fujii Masao Advanced Computing Technology Center Research and Development Headquarters NTT DATA CORPORATION
RE: Transactions involving multiple postgres foreign servers, take 2
From: Fujii Masao <masao.fujii@oss.nttdata.com> > Originally start(), commit() and rollback() are supported as FDW interfaces. > As far as I and Sawada-san discussed this upthread, to support MySQL, > another type of start() would be necessary to issue the "XA START id" command. > end() might also be necessary to issue "XA END id", but that command can be > issued via prepare() together with "XA PREPARE id". Yeah, I think we can call xa_end and xa_prepare in the FDW's prepare function. The issue is when to call xa_start, which requires an XID as an argument. We don't want to call it in transactions that access only one node...? > With his patch, prepare() is supported. What other interfaces need to be > supported per XA/JTA? > > I'm not familiar with XA/JTA and XA transaction interfaces on other major > DBMSs. So I'd like to know what other interfaces are necessary additionally? I think xa_start, xa_end, xa_prepare, xa_commit, xa_rollback, and xa_recover are sufficient. The XA specification is here: https://pubs.opengroup.org/onlinepubs/009680699/toc.pdf You can see the function reference in Chapter 5, and the concepts in Chapter 3. Chapter 6 probably shows the state transitions (function call sequences.) > IMO Sawada-san's version of 2PC is less performant, but it's because his > patch provides more functionality. For example, with his patch, WAL is written > to automatically complete the unresolved foreign transactions in the case of > failure. OTOH, Alexey's patch introduces no new WAL for 2PC. > Of course, generating more WAL would cause more overhead. > But if we need the automatic resolution feature, it's inevitable to introduce new > WAL whichever patch we choose. Please do not get me wrong. I know Sawada-san is trying to ensure durability. I just wanted to know what each patch does and at how much cost in terms of disk and network I/Os, and whether one patch can take something from another at less cost. 
I'm simply guessing (without having read the code yet) that each transaction basically does: - two round trips (prepare, commit) to each remote node - two WAL writes (prepare, commit) on the local node and each remote node - one write for the two-phase state file on each remote node - one write to record participants on the local node It felt hard to think about the algorithm's efficiency from the source code. As you may have seen, DBMS textbooks and/or papers describe disk and network I/Os to evaluate algorithms. I thought such information would be useful before going deeper into the source code. Maybe such things can be written in the following Sawada-san's wiki or a README in the end. Atomic Commit of Distributed Transactions https://wiki.postgresql.org/wiki/Atomic_Commit_of_Distributed_Transactions Regards Takayuki Tsunakawa
RE: Transactions involving multiple postgres foreign servers, take 2
From: Masahiko Sawada <masahiko.sawada@2ndquadrant.com> > On Tue, 8 Sep 2020 at 13:00, tsunakawa.takay@fujitsu.com > <tsunakawa.takay@fujitsu.com> wrote: > > 2. 2PC processing is queued and serialized in one background worker. That > severely subdues transaction throughput. Each backend should perform > 2PC. > > Not sure it's safe that each backend performs PREPARE and COMMIT > PREPARED since the current design is intended to avoid an inconsistency > between the actual transaction result and the result the user sees. As Fujii-san is asking, I also would like to know what situation you think is not safe. Are you worried that the FDW's commit function might call ereport(ERROR | FATAL | PANIC)? If so, can't we stipulate that the FDW implementor should ensure that the commit function always returns control to the caller? > But in the future, I think we can have multiple background workers per > database for better performance. Does the database in "per database" mean the local database (that applications connect to), or the remote database accessed via FDW? I'm wondering how the FDW and background worker(s) can realize parallel prepare and parallel commit. That is, the coordinator transaction performs: 1. Issue prepare to all participant nodes, but doesn't wait for the reply for each issue. 2. Waits for replies from all participants. 3. Issue commit to all participant nodes, but doesn't wait for the reply for each issue. 4. Waits for replies from all participants. If we just consider PostgreSQL and don't think about FDW, we can use libpq async functions -- PQsendQuery, PQconsumeInput, and PQgetResult. pgbench uses them so that one thread can issue SQL statements on multiple connections in parallel. But when we consider the FDW interface, plus other DBMSs, how can we achieve the parallelism? > > 3. postgres_fdw cannot detect remote updates when the UDF executed on a > remote node updates data. > > I assume that you mean pushing the UDF down to a foreign server. 
> If so, I think we can do this by improving postgres_fdw. In the current patch, > registering and unregistering a foreign server to a group of 2PC and marking a > foreign server as updated is the FDW's responsibility. So perhaps if we had a way to > tell postgres_fdw that the UDF might update the data on the foreign server, > postgres_fdw could mark the foreign server as updated if the UDF is shippable. Maybe we can consider that VOLATILE functions update data. That may be an overreaction, though. Another idea is to add a new value to the ReadyForQuery message in the FE/BE protocol. Say, 'U' if in a transaction block that updated data. Here we consider "updated" as having allocated an XID. 52.7. Message Formats https://www.postgresql.org/docs/devel/protocol-message-formats.html -------------------------------------------------- ReadyForQuery (B) Byte1 Current backend transaction status indicator. Possible values are 'I' if idle (not in a transaction block); 'T' if in a transaction block; or 'E' if in a failed transaction block (queries will be rejected until block is ended). -------------------------------------------------- Regards Takayuki Tsunakawa
On Fri, 11 Sep 2020 at 11:58, Fujii Masao <masao.fujii@oss.nttdata.com> wrote: > > > > On 2020/09/11 0:37, Masahiko Sawada wrote: > > On Tue, 8 Sep 2020 at 13:00, tsunakawa.takay@fujitsu.com > > <tsunakawa.takay@fujitsu.com> wrote: > >> > >> From: Amit Kapila <amit.kapila16@gmail.com> > >>> I intend to say that the global-visibility work can impact this in a > >>> major way and we have analyzed that to some extent during a discussion > >>> on the other thread. So, I think without having a complete > >>> design/solution that addresses both the 2PC and global-visibility, it > >>> is not apparent what is the right way to proceed. It seems to me that > >>> rather than working on individual (or smaller) parts one needs to come > >>> up with a bigger picture (or overall design) and then once we have > >>> figured that out correctly, it would be easier to decide which parts > >>> can go first. > >> > >> I'm really sorry I've been getting late and late and late to publish the revised scale-out design wiki to discuss the big picture! I don't know why I'm taking this long; I feel as if I were captive in a time prison (yes, nobody is holding me captive; I'm just late.) Please wait a few days. > >> > >> But to proceed with the development, let me comment on the atomic commit and global visibility. > >> > >> * We have to hear from Andrey about their check on the possibility that Clock-SI could be Microsoft's patent and if we can avoid it. > >> > >> * I have a feeling that we can adopt the algorithm used by Spanner, CockroachDB, and YugabyteDB. That is, 2PC for multi-node atomic commit, Paxos or Raft for replica synchronization (in the process of commit) to make 2PC more highly available, and the timestamp-based global visibility. However, the timestamp-based approach makes the database instance shut down when the node's clock is distant from the other nodes. 
> >> > >> * Or, maybe we can use the following Commitment ordering, which doesn't require the timestamp or any other information to be transferred among the cluster nodes. However, this seems to have to track the order of read and write operations among concurrent transactions to ensure the correct commit order, so I'm not sure about the performance. The MVCO paper seems to present the information we need, but I haven't understood it well yet (it's difficult.) Could anybody kindly interpret this? > >> > >> Commitment ordering (CO) - yoavraz2 > >> https://sites.google.com/site/yoavraz2/the_principle_of_co > >> > >> > >> As for Sawada-san's 2PC patch, which I find interesting purely as an FDW enhancement, I raised the following issues to be addressed: > >> > >> 1. Make the FDW API implementable by other FDWs than postgres_fdw (this is what Amit-san kindly pointed out.) I think oracle_fdw and jdbc_fdw would be good examples to consider, while MySQL may not be good because it exposes the XA feature as SQL statements, not C functions as defined in the XA specification. > > > > I agree that we need to verify that the new FDW APIs will be suitable for other > > FDWs than postgres_fdw as well. > > > >> > >> 2. 2PC processing is queued and serialized in one background worker. That severely subdues transaction throughput.  Each backend should perform 2PC. > > > > Not sure it's safe that each backend performs PREPARE and COMMIT > > PREPARED since the current design is intended to avoid an inconsistency > > between the actual transaction result and the result the user sees. > > Can I check my understanding about why the resolver process is necessary? > > Firstly, you think that issuing the COMMIT PREPARED command to the foreign server can cause an error, for example, because of a connection error, OOM, etc. On the other hand, only waiting for another process to issue the command is less likely to cause an error. Right? 
> > If an error occurs in the backend process after the commit record is WAL-logged, the error would be reported to the client and it may misunderstand that the transaction failed even though the commit record was already flushed. So you think that each backend should not issue the COMMIT PREPARED command to avoid that inconsistency. To avoid that, it's better to make another process, the resolver, issue the command and just make each backend wait for that to complete. Right? > > Also using the resolver process has another merit; when there are unresolved foreign transactions but the corresponding backend exits, the resolver can try to resolve them. If something like this automatic resolution is necessary, a process like the resolver would be necessary. Right? > > To the contrary, if we don't need such automatic resolution (i.e., unresolved foreign transactions always need to be resolved manually) and we can prevent the code that issues the COMMIT PREPARED command from causing an error (not sure if that's possible, though...), probably we don't need the resolver process. Right? Yes, I'm on the same page about all the above explanations. The resolver process has two functionalities: resolving foreign transactions automatically when the user issues COMMIT (the case you described in the second paragraph), and resolving foreign transactions when the corresponding backend no longer exists or when the server crashes in the middle of 2PC (described in the third paragraph). Considering a design without the resolver process, I think we can easily replace the latter with manual resolution. OTOH, it's not easy for the former. I have no idea about a better design for now, although, as you described, if we could ensure that the process doesn't raise an error while resolving foreign transactions after committing the local transaction we would not need the resolver process. 
Or the second idea would be that the backend commits only the local transaction and then returns the acknowledgment of COMMIT to the user without resolving foreign transactions. Then the user manually resolves the foreign transactions by, for example, using the SQL function pg_resolve_foreign_xact() within a separate transaction. That way, even if an error occurred during resolving foreign transactions (e.g., executing COMMIT PREPARED), it’s okay as the user is already aware of the local transaction having been committed and can retry to resolve the unresolved foreign transaction. So we won't need the resolver process while avoiding such inconsistency. But a drawback would be that the transaction commit doesn't ensure that all foreign transactions are completed. The subsequent transactions would need to check if the previous distributed transaction is completed to see its results. I’m not sure it’s a good design in terms of usability. Regards, -- Masahiko Sawada http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
On Fri, 11 Sep 2020 at 18:24, tsunakawa.takay@fujitsu.com <tsunakawa.takay@fujitsu.com> wrote: > > From: Masahiko Sawada <masahiko.sawada@2ndquadrant.com> > > On Tue, 8 Sep 2020 at 13:00, tsunakawa.takay@fujitsu.com > > <tsunakawa.takay@fujitsu.com> wrote: > > > 2. 2PC processing is queued and serialized in one background worker. That > > severely subdues transaction throughput. Each backend should perform > > 2PC. > > > > Not sure it's safe that each backend performs PREPARE and COMMIT > > PREPARED since the current design is intended to avoid an inconsistency > > between the actual transaction result and the result the user sees. > > As Fujii-san is asking, I also would like to know what situation you think is not safe. Are you worried that the FDW's commit function might call ereport(ERROR | FATAL | PANIC)? Yes. > If so, can't we stipulate that the FDW implementor should ensure that the commit function always returns control to the caller? How can the FDW implementor ensure that? Since even palloc could call ereport(ERROR), I guess it's hard to require that of all FDW implementors. > > > > But in the future, I think we can have multiple background workers per > > database for better performance. > > Does the database in "per database" mean the local database (that applications connect to), or the remote database accessed via FDW? I meant the local database. In the current patch, we launch the resolver process per local database. My idea is to allow launching multiple resolver processes for one local database as long as the number of workers doesn't exceed the limit. > > I'm wondering how the FDW and background worker(s) can realize parallel prepare and parallel commit. That is, the coordinator transaction performs: > > 1. Issue prepare to all participant nodes, but doesn't wait for the reply for each issue. > 2. Waits for replies from all participants. > 3. Issue commit to all participant nodes, but doesn't wait for the reply for each issue. > 4. 
Waits for replies from all participants. > > If we just consider PostgreSQL and don't think about FDW, we can use libpq async functions -- PQsendQuery, PQconsumeInput, and PQgetResult. pgbench uses them so that one thread can issue SQL statements on multiple connections in parallel. > > But when we consider the FDW interface, plus other DBMSs, how can we achieve the parallelism? It's still a rough idea, but I think we can use the TMASYNC flag and xa_complete explained in the XA specification. The core transaction manager calls the prepare, commit, and rollback APIs with the flag, requiring the operation to be executed asynchronously and a handle (e.g., a socket taken by PQsocket in the postgres_fdw case) to be returned to the transaction manager. Then the transaction manager continues polling the handle until it becomes readable and testing the completion using xa_complete() with no wait, until all foreign servers return OK on the xa_complete check. > > > > > 3. postgres_fdw cannot detect remote updates when the UDF executed on a > > remote node updates data. > > > > I assume that you mean pushing the UDF down to a foreign server. > > If so, I think we can do this by improving postgres_fdw. In the current patch, > > registering and unregistering a foreign server to a group of 2PC and marking a > > foreign server as updated is the FDW's responsibility. So perhaps if we had a way to > > tell postgres_fdw that the UDF might update the data on the foreign server, > > postgres_fdw could mark the foreign server as updated if the UDF is shippable. > > Maybe we can consider that VOLATILE functions update data. That may be an overreaction, though. Sorry, I don't understand that. Volatile functions are not pushed down to the foreign servers in the first place, no? Regards, -- Masahiko Sawada http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
On Fri, Sep 11, 2020 at 4:37 PM Masahiko Sawada <masahiko.sawada@2ndquadrant.com> wrote: > > Considering the design without the resolver process, I think we can > easily replace the latter with the manual resolution. OTOH, it's not > easy for the former. I have no idea about better design for now, > although, as you described, if we could ensure that the process > doesn't raise an error during resolving foreign transactions after > committing the local transaction we would not need the resolver > process. My initial patch used the same backend to resolve foreign transactions. But in that case, even though the user receives COMMIT completed, the backend doesn't accept the next query while it is busy resolving the foreign transactions. That might be a usability issue again if attempting to resolve all foreign transactions takes noticeable time. If we go this route, we should try to resolve as many foreign transactions as possible, ignoring any errors while doing so, and somehow let the user know which transactions couldn't be resolved. The user can then take responsibility for resolving those. > > Or the second idea would be that the backend commits only the local > transaction then returns the acknowledgment of COMMIT to the user > without resolving foreign transactions. Then the user manually > resolves the foreign transactions by, for example, using the SQL > function pg_resolve_foreign_xact() within a separate transaction. That > way, even if an error occurred during resolving foreign transactions > (i.g., executing COMMIT PREPARED), it’s okay as the user is already > aware of the local transaction having been committed and can retry to > resolve the unresolved foreign transaction. So we won't need the > resolver process while avoiding such inconsistency. > > But a drawback would be that the transaction commit doesn't ensure > that all foreign transactions are completed. 
The subsequent > transactions would need to check if the previous distributed > transaction is completed to see its results. I’m not sure it’s a good > design in terms of usability. I agree, this won't be acceptable. In either case, I think a solution where the local server takes responsibility to resolve foreign transactions will be better even in the first cut. -- Best Wishes, Ashutosh Bapat
RE: Transactions involving multiple postgres foreign servers, take 2
From: Masahiko Sawada <masahiko.sawada@2ndquadrant.com> > > If so, can't we stipulate that the FDW implementor should ensure that the > commit function always returns control to the caller? > > How can the FDW implementor ensure that? Since even palloc could call > ereport(ERROR) I guess it's hard to require that to all FDW > implementors. I think what the FDW commit routine will do is just call xa_commit(), or PQexec("COMMIT PREPARED") in postgres_fdw. > It's still a rough idea but I think we can use TMASYNC flag and > xa_complete explained in the XA specification. The core transaction > manager call prepare, commit, rollback APIs with the flag, requiring > to execute the operation asynchronously and to return a handler (e.g., > a socket taken by PQsocket in postgres_fdw case) to the transaction > manager. Then the transaction manager continues polling the handler > until it becomes readable and testing the completion using by > xa_complete() with no wait, until all foreign servers return OK on > xa_complete check. Unfortunately, even Oracle and Db2 haven't supported XA asynchronous execution for years. Our DBMS Symfoware doesn't, either. I don't expect other DBMSs to support it. Hmm, I'm afraid this may be one of the FDW's intractable walls for a serious scale-out DBMS. If we define asynchronous FDW routines for 2PC, postgres_fdw would be able to implement them by using libpq asynchronous functions. But other DBMSs can't ... > > Maybe we can consider VOLATILE functions update data. That may be > overreaction, though. > > Sorry I don't understand that. The volatile functions are not pushed > down to the foreign servers in the first place, no? Ah, you're right. Then, the choices are twofold: (1) trust users in that their functions don't update data, or trust the user's claim (specification) about it, and (2) get notification through the FE/BE protocol that the remote transaction may have updated data. Regards Takayuki Tsunakawa
RE: Transactions involving multiple postgres foreign servers, take 2
From: Masahiko Sawada <masahiko.sawada@2ndquadrant.com>
> The resolver process has two functionalities: resolving foreign
> transactions automatically when the user issues COMMIT (the case you
> described in the second paragraph), and resolving foreign transactions
> when the corresponding backend no longer exists or when the server
> crashes in the middle of 2PC (described in the third paragraph).
>
> Considering the design without the resolver process, I think we can
> easily replace the latter with manual resolution. OTOH, it's not
> easy for the former. I have no idea about a better design for now,
> although, as you described, if we could ensure that the process
> doesn't raise an error during resolving foreign transactions after
> committing the local transaction we would not need the resolver
> process.

Yeah, the resolver background process -- someone independent of client sessions -- is necessary, because the client session disappears at some point. When the server that hosts the 2PC coordinator crashes, there are no client sessions. Our DBMS Symfoware also runs background threads that take care of resolution of in-doubt transactions due to a server or network failure.

Then, how does the resolver get involved in 2PC to enable parallel 2PC? Two ideas quickly come to mind:

(1) Each client backend issues prepare and commit to multiple remote nodes asynchronously. If the communication fails during commit, the client backend leaves the commit notification task to the resolver. That is, the resolver lends a hand during failure recovery, and doesn't interfere with transaction processing during normal operation.

(2) The resolver takes some responsibility in 2PC processing during normal operation (sends prepare and/or commit to remote nodes and gets the results). To avoid serial execution per transaction, the resolver bundles multiple requests, sends them in bulk, and waits for multiple replies at once.
This allows the coordinator to do its own prepare processing in parallel with those of the participants. However, in Postgres, this requires context switches between the client backend and the resolver. Our Symfoware takes (2). However, it doesn't suffer from the context switch, because the server is multi-threaded and further implements or uses entities more lightweight than threads.

> Or the second idea would be that the backend commits only the local
> transaction then returns the acknowledgment of COMMIT to the user
> without resolving foreign transactions. Then the user manually
> resolves the foreign transactions by, for example, using the SQL
> function pg_resolve_foreign_xact() within a separate transaction. That
> way, even if an error occurred during resolving foreign transactions
> (i.e., executing COMMIT PREPARED), it's okay as the user is already
> aware of the local transaction having been committed and can retry to
> resolve the unresolved foreign transaction. So we won't need the
> resolver process while avoiding such inconsistency.
>
> But a drawback would be that the transaction commit doesn't ensure
> that all foreign transactions are completed. The subsequent
> transactions would need to check if the previous distributed
> transaction is completed to see its results. I'm not sure it's a good
> design in terms of usability.

I don't think it's a good design either, as you are worried. I guess that's why Postgres-XL had to create a tool called pgxc_clean and ask the user to resolve transactions with it.

pgxc_clean
https://www.postgres-xl.org/documentation/pgxcclean.html
"pgxc_clean is a Postgres-XL utility to maintain transaction status after a crash. When a Postgres-XL node crashes and recovers or fails over, the commit status of the node may be inconsistent with other nodes. pgxc_clean checks transaction commit status and corrects them."

Regards
Takayuki Tsunakawa
On Fri, Aug 21, 2020 at 03:25:29PM +0900, Masahiko Sawada wrote: > Thank you for letting me know. I've attached the latest version patch set. A rebase is needed again as the CF bot is complaining. -- Michael
Attachment
On Thu, 17 Sep 2020 at 14:25, Michael Paquier <michael@paquier.xyz> wrote: > > On Fri, Aug 21, 2020 at 03:25:29PM +0900, Masahiko Sawada wrote: > > Thank you for letting me know. I've attached the latest version patch set. > > A rebase is needed again as the CF bot is complaining. Thank you for letting me know. I'm updating the patch and splitting into small pieces as Fujii-san suggested. I'll submit the latest patch set early next week. Regards, -- Masahiko Sawada http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
On Wed, 16 Sep 2020 at 13:20, tsunakawa.takay@fujitsu.com <tsunakawa.takay@fujitsu.com> wrote:
>
> From: Masahiko Sawada <masahiko.sawada@2ndquadrant.com>
> > > If so, can't we stipulate that the FDW implementor should ensure that the
> > > commit function always returns control to the caller?
> >
> > How can the FDW implementor ensure that? Since even palloc could call
> > ereport(ERROR) I guess it's hard to require that of all FDW
> > implementors.
>
> I think what the FDW commit routine will do is just call xa_commit(), or PQexec("COMMIT PREPARED") in postgres_fdw.

Yes, but it still seems hard to me to require all FDW implementations to commit/rollback prepared transactions without the possibility of ERROR.

> > It's still a rough idea but I think we can use the TMASYNC flag and
> > xa_complete explained in the XA specification. The core transaction
> > manager calls the prepare, commit, and rollback APIs with the flag,
> > requiring them to execute the operation asynchronously and to return a
> > handle (e.g., a socket taken by PQsocket in the postgres_fdw case) to
> > the transaction manager. Then the transaction manager continues polling
> > the handle until it becomes readable, testing the completion with
> > xa_complete() with no wait, until all foreign servers return OK on the
> > xa_complete check.
>
> Unfortunately, even Oracle and Db2 haven't supported XA asynchronous execution for years. Our DBMS Symfoware doesn't, either. I don't expect other DBMSs to support it.
>
> Hmm, I'm afraid this may be one of the FDW's intractable walls for a serious scale-out DBMS. If we define asynchronous FDW routines for 2PC, postgres_fdw would be able to implement them by using libpq asynchronous functions. But other DBMSs can't ...

I think it's not necessarily the case that all FDW implementations need to be able to support xa_complete(). We can support both synchronous and asynchronous executions of prepare/commit/rollback.

> > > Maybe we can consider that VOLATILE functions update data. That may be
> > > an overreaction, though.
> >
> > Sorry, I don't understand that. The volatile functions are not pushed
> > down to the foreign servers in the first place, no?
>
> Ah, you're right. Then the choices are twofold: (1) trust users that their functions don't update data, or trust the user's claim (specification) about it, and (2) get notification through the FE/BE protocol that the remote transaction may have updated data.

I'm confused about the point you're concerned about regarding the UDF. If you're concerned that executing a UDF by something like 'SELECT myfunc();' updates data on a foreign server, since the UDF should know which foreign server it modifies data on, it should be able to register the foreign server and mark it as modified. Or are you concerned that a UDF in a WHERE condition is pushed down and updates data (e.g., 'SELECT ... FROM foreign_tbl WHERE id = myfunc()')?

Regards,
--
Masahiko Sawada
http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
RE: Transactions involving multiple postgres foreign servers, take 2
From: Masahiko Sawada <masahiko.sawada@2ndquadrant.com>
> Yes, but it still seems hard to me to require all FDW
> implementations to commit/rollback prepared transactions without the
> possibility of ERROR.

Of course we can't eliminate the possibility of error, because remote servers require network communication. What I'm saying is to just require the FDW to return an error like xa_commit(), not to throw control away with ereport(ERROR). I don't think it's too strict.

> I think it's not necessarily that all FDW implementations need to be
> able to support xa_complete(). We can support both synchronous and
> asynchronous executions of prepare/commit/rollback.

Yes, I think parallel prepare and commit can be an option for FDWs. But I don't think it's an option for a serious scale-out DBMS. If we want to use FDW as part of PostgreSQL's scale-out infrastructure, we should design (even if not implemented in the first version) how the parallelism can be realized. That design is also necessary because it could affect the FDW API.

> If you're concerned that executing a UDF by something like 'SELECT
> myfunc();' updates data on a foreign server, since the UDF should know
> which foreign server it modifies data on, it should be able to register
> the foreign server and mark it as modified. Or are you concerned that a
> UDF in a WHERE condition is pushed down and updates data (e.g.,
> 'SELECT ... FROM foreign_tbl WHERE id = myfunc()')?

What I had in mind is "SELECT myfunc(...) FROM mytable WHERE col = ...;" Does the UDF call get pushed down to the foreign server in this case? If not now, could it be pushed down in the future? If it could be, it's worth considering how to detect the remote update now.

Regards
Takayuki Tsunakawa
On Tue, Sep 22, 2020 at 6:48 AM tsunakawa.takay@fujitsu.com <tsunakawa.takay@fujitsu.com> wrote:
>
> > I think it's not necessarily that all FDW implementations need to be
> > able to support xa_complete(). We can support both synchronous and
> > asynchronous executions of prepare/commit/rollback.
>
> Yes, I think parallel prepare and commit can be an option for FDWs. But I don't think it's an option for a serious scale-out DBMS. If we want to use FDW as part of PostgreSQL's scale-out infrastructure, we should design (even if not implemented in the first version) how the parallelism can be realized. That design is also necessary because it could affect the FDW API.

Parallelism here has both pros and cons. If one of the servers errors out while preparing a transaction, there is no point in preparing the transaction on the other servers. In parallel execution we will prepare on multiple servers before realising that one of them has failed to do so. On the other hand, preparing on multiple servers in parallel provides a speed-up.

But this can be an improvement on version 1. The current approach doesn't render such an improvement impossible. So if that's something hard to do, we should do it in the next version rather than complicating this patch.

--
Best Wishes,
Ashutosh Bapat
RE: Transactions involving multiple postgres foreign servers, take 2
From: Ashutosh Bapat <ashutosh.bapat.oss@gmail.com>
> parallelism here has both pros and cons. If one of the servers errors
> out while preparing for a transaction, there is no point in preparing
> the transaction on other servers. In parallel execution we will
> prepare on multiple servers before realising that one of them has
> failed to do so. On the other hand preparing on multiple servers in
> parallel provides a speed up.

And the pros are dominant in practice. If many transactions are erroring out (during prepare), the system is not functioning for the user. Such an application should be corrected before it is put into production.

> But this can be an improvement on version 1. The current approach
> doesn't render such an improvement impossible. So if that's something
> hard to do, we should do that in the next version rather than
> complicating this patch.

Could you share your idea on how the current approach could enable parallelism? This is an important point, because (1) the FDW may not lead us to a seriously competitive scale-out DBMS, and (2) a better FDW API and/or implementation could be considered for non-parallel interaction if we have the realization of parallelism in mind. I think that kind of consideration is the design (for the future).

Regards
Takayuki Tsunakawa
On Wed, Sep 23, 2020 at 2:13 AM tsunakawa.takay@fujitsu.com <tsunakawa.takay@fujitsu.com> wrote:
>
> From: Ashutosh Bapat <ashutosh.bapat.oss@gmail.com>
> > parallelism here has both pros and cons. If one of the servers errors
> > out while preparing for a transaction, there is no point in preparing
> > the transaction on other servers. In parallel execution we will
> > prepare on multiple servers before realising that one of them has
> > failed to do so. On the other hand preparing on multiple servers in
> > parallel provides a speed up.
>
> And the pros are dominant in practice. If many transactions are erroring out (during prepare), the system is not functioning for the user. Such an application should be corrected before it is put into production.
>
> > But this can be an improvement on version 1. The current approach
> > doesn't render such an improvement impossible. So if that's something
> > hard to do, we should do that in the next version rather than
> > complicating this patch.
>
> Could you share your idea on how the current approach could enable parallelism? This is an important point, because (1) the FDW may not lead us to a seriously competitive scale-out DBMS, and (2) a better FDW API and/or implementation could be considered for non-parallel interaction if we have the realization of parallelism in mind. I think that kind of consideration is the design (for the future).
>

The way I am looking at it is to put the parallelism in the resolution worker and not in the FDW. If we use multiple resolution workers, they can fire commit/abort on multiple foreign servers at a time. But if we want parallelism within a single resolution worker, we will need separate FDW APIs for firing an asynchronous commit/abort of a prepared txn and for fetching the results, respectively. But given the variety of FDWs, not all of them will support an asynchronous API, so we have to support a synchronous API anyway, which is what can be targeted in the first version.
Thinking more about it, the core may support an API which accepts a list of prepared transactions, their foreign servers, and user mappings, and let the FDW resolve all of those either in parallel or one by one. So parallelism is the responsibility of the FDW and not the core. But then we lose parallelism across FDWs, which may not be a common case. Given the complications around this, I think we should go ahead supporting a synchronous API first and introduce an optional asynchronous API in a second version.

--
Best Wishes,
Ashutosh Bapat
On Tue, 22 Sep 2020 at 10:17, tsunakawa.takay@fujitsu.com <tsunakawa.takay@fujitsu.com> wrote:
>
> From: Masahiko Sawada <masahiko.sawada@2ndquadrant.com>
> > Yes, but it still seems hard to me to require all FDW
> > implementations to commit/rollback prepared transactions without the
> > possibility of ERROR.
>
> Of course we can't eliminate the possibility of error, because remote servers require network communication. What I'm saying is to just require the FDW to return an error like xa_commit(), not to throw control away with ereport(ERROR). I don't think it's too strict.

So with your idea, I think we require FDW developers not to call ereport(ERROR) as much as possible. If they need to use a function including palloc, lappend, etc. that could call ereport(ERROR), they need to use PG_TRY() and PG_CATCH() and return control along with the error message to the transaction manager rather than raising an error. Then the transaction manager will emit the error message at an error level lower than ERROR (e.g., WARNING), and call the commit/rollback API again. But normally we do some cleanup on error, whereas in this case the retry of commit/rollback is performed without any cleanup. Is that right? I'm not sure it's safe, though.

> > I think it's not necessarily that all FDW implementations need to be
> > able to support xa_complete(). We can support both synchronous and
> > asynchronous executions of prepare/commit/rollback.
>
> Yes, I think parallel prepare and commit can be an option for FDWs. But I don't think it's an option for a serious scale-out DBMS. If we want to use FDW as part of PostgreSQL's scale-out infrastructure, we should design (even if not implemented in the first version) how the parallelism can be realized. That design is also necessary because it could affect the FDW API.
> > If you're concerned that executing a UDF by something like 'SELECT
> > myfunc();' updates data on a foreign server, since the UDF should know
> > which foreign server it modifies data on, it should be able to register
> > the foreign server and mark it as modified. Or are you concerned that a
> > UDF in a WHERE condition is pushed down and updates data (e.g.,
> > 'SELECT ... FROM foreign_tbl WHERE id = myfunc()')?
>
> What I had in mind is "SELECT myfunc(...) FROM mytable WHERE col = ...;" Does the UDF call get pushed down to the foreign server in this case? If not now, could it be pushed down in the future? If it could be, it's worth considering how to detect the remote update now.

IIUC aggregate functions can be pushed down to the foreign server, but I have no idea whether a normal UDF in the select list is pushed down. I wonder if it isn't.

Regards,
--
Masahiko Sawada
http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
RE: Transactions involving multiple postgres foreign servers, take 2
From: Masahiko Sawada <masahiko.sawada@2ndquadrant.com>
> So with your idea, I think we require FDW developers to not call
> ereport(ERROR) as much as possible. If they need to use a function
> including palloc, lappend etc that could call ereport(ERROR), they
> need to use PG_TRY() and PG_CATCH() and return the control along with
> the error message to the transaction manager rather than raising an
> error. Then the transaction manager will emit the error message at an
> error level lower than ERROR (e.g., WARNING), and call commit/rollback
> API again. But normally we do some cleanup on error but in this case
> the retrying commit/rollback is performed without any cleanup. Is that
> right? I'm not sure it's safe though.

Yes. It's legitimate to require the FDW commit routine to return control, because the prepare of 2PC is a promise to commit successfully. The second-phase commit should avoid doing anything that could fail. For example, if some memory is needed for commit, it should be allocated in prepare or before.

> IIUC aggregation functions can be pushed down to the foreign server
> but I have no idea whether a normal UDF in the select list is pushed down.
> I wonder if it isn't.

Oh, that's the current situation. Understood. I thought the UDF call was also pushed down, as I saw Greenplum does so. (Reading the manual, Greenplum disallows data updates in a UDF when it's executed on the remote segment server.) (Aren't we overlooking something else that updates data on the remote server while the local server is unaware?)

Regards
Takayuki Tsunakawa
On Fri, 18 Sep 2020 at 17:00, Masahiko Sawada <masahiko.sawada@2ndquadrant.com> wrote:
>
> On Thu, 17 Sep 2020 at 14:25, Michael Paquier <michael@paquier.xyz> wrote:
> >
> > On Fri, Aug 21, 2020 at 03:25:29PM +0900, Masahiko Sawada wrote:
> > > Thank you for letting me know. I've attached the latest version patch set.
> >
> > A rebase is needed again as the CF bot is complaining.
>
> Thank you for letting me know. I'm updating the patch and splitting
> into small pieces as Fujii-san suggested. I'll submit the latest patch
> set early next week.
>

I've rebased the patch set and split it into small pieces. Here are short descriptions of each change:

v26-0001-Recreate-RemoveForeignServerById.patch
This commit recreates RemoveForeignServerById, which was removed by b1d32d3e3. This is necessary because we need to check if there is a foreign transaction involved with the foreign server that is about to be removed.

v26-0002-Introduce-transaction-manager-for-foreign-transa.patch
This commit adds the basic foreign transaction manager and the CommitForeignTransaction and RollbackForeignTransaction APIs. These APIs support only one phase. With this change, an FDW is able to control its transactions using the foreign transaction manager instead of XactCallback.

v26-0003-postgres_fdw-supports-commit-and-rollback-APIs.patch
This commit implements both the CommitForeignTransaction and RollbackForeignTransaction APIs in postgres_fdw. Note that since PREPARE TRANSACTION is still not supported, there is nothing new the user is able to do yet.

v26-0004-Add-PrepareForeignTransaction-API.patch
This commit adds prepared foreign transaction support, including WAL logging and recovery, and the PrepareForeignTransaction API. With this change, the user is able to run the 'PREPARE TRANSACTION' and 'COMMIT/ROLLBACK PREPARED' commands on a transaction that involves foreign servers. But note that COMMIT/ROLLBACK PREPARED ends only the local transaction. It doesn't do anything for foreign transactions.
Therefore, the user needs to resolve foreign transactions manually by executing the pg_resolve_foreign_xacts() SQL function, which is also introduced by this commit.

v26-0005-postgres_fdw-supports-prepare-API-and-support-co.patch
This commit implements the PrepareForeignTransaction API and makes CommitForeignTransaction and RollbackForeignTransaction support two-phase commit.

v26-0006-Add-GetPrepareID-API.patch
This commit adds the GetPrepareID API.

v26-0007-Automatic-foreign-transaciton-resolution-on-COMM.patch
This commit adds automatic foreign transaction resolution on COMMIT/ROLLBACK PREPARED using the foreign transaction resolver and launcher processes. With this change, the user is able to commit/rollback a distributed transaction by COMMIT/ROLLBACK PREPARED without manual resolution. The involved foreign transactions are automatically resolved by a resolver process.

v26-0008-Automatic-foreign-transaciton-resolution-on-comm.patch
This commit adds automatic foreign transaction resolution on commit/rollback. With this change, the user is able to commit the foreign transactions automatically on commit, without executing PREPARE TRANSACTION, when foreign_twophase_commit is 'required'. IOW, we can guarantee that all foreign transactions have been resolved when the user gets an acknowledgment of COMMIT.

v26-0009-postgres_fdw-supports-automatically-resolution.patch
This commit makes postgres_fdw support the 0008 change.

v26-0010-Documentation-update.patch
v26-0011-Add-regression-tests-for-foreign-twophase-commit.patch
The above commits are documentation updates and regression tests.

Regards,
--
Masahiko Sawada
http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Attachment
- v26-0008-Automatic-foreign-transaciton-resolution-on-comm.patch
- v26-0011-Add-regression-tests-for-foreign-twophase-commit.patch
- v26-0009-postgres_fdw-supports-automatically-resolution.patch
- v26-0010-Documentation-update.patch
- v26-0007-Automatic-foreign-transaciton-resolution-on-COMM.patch
- v26-0006-Add-GetPrepareID-API.patch
- v26-0005-postgres_fdw-supports-prepare-API-and-support-co.patch
- v26-0003-postgres_fdw-supports-commit-and-rollback-APIs.patch
- v26-0004-Add-PrepareForeignTransaction-API.patch
- v26-0002-Introduce-transaction-manager-for-foreign-transa.patch
- v26-0001-Recreate-RemoveForeignServerById.patch
RE: Transactions involving multiple postgres foreign servers, take 2
From: Ashutosh Bapat <ashutosh.bapat.oss@gmail.com>
> The way I am looking at is to put the parallelism in the resolution
> worker and not in the FDW. If we use multiple resolution workers, they
> can fire commit/abort on multiple foreign servers at a time.

From a single session's view, yes. However, the requests from multiple sessions are processed one at a time within each resolver, because the resolver has to call the synchronous FDW prepare/commit routines and wait for the response from the remote server. That's too limiting.

> But if we want parallelism within a single resolution worker, we will
> need a separate FDW APIs for firing asynchronous commit/abort prepared
> txn and fetching their results resp. But given the variety of FDWs,
> not all of them will support asynchronous API, so we have to support
> synchronous API anyway, which is what can be targeted in the first
> version.

I agree in that most FDWs will be unlikely to have asynchronous prepare/commit functions, as demonstrated by the fact that even Oracle and Db2 don't implement the XA asynchronous API. That's one problem of using FDW for Postgres scale-out. When we enhance FDW, we have to take care of other DBMSs to make the FDW interface practical. OTOH, we want to make maximum use of Postgres features, such as the libpq asynchronous API, to make Postgres scale-out as performant as possible. But the scale-out design is bound by the FDW interface. I don't feel accepting such a less performant design is the attitude of this community, as people here are strict against even a 1 or 2 percent performance drop.

> Thinking more about it, the core may support an API which accepts a
> list of prepared transactions, their foreign servers and user mappings
> and let FDW resolve all those either in parallel or one by one. So
> parallelism is responsibility of FDW and not the core. But then we
> lose parallelism across FDWs, which may not be a common case.
Hmm, I understand the asynchronous FDW relation scan is being developed now, in the form of cooperation between the FDW and the executor. If we make just the FDW responsible for prepare/commit parallelism, the design becomes asymmetric. As you say, I'm not sure the parallelism is wanted among different types, say, Postgres and Oracle. In fact, major DBMSs don't implement the XA asynchronous API. But such a lack of parallelism may be one cause of the bad reputation that 2PC (of XA) is slow.

> Given the complications around this, I think we should go ahead
> supporting synchronous API first and in second version introduce
> optional asynchronous API.

How about the following?

* Add synchronous and asynchronous versions of the prepare/commit/abort routines, plus a routine to wait for completion of asynchronous execution, to FdwRoutine. They are optional. postgres_fdw can implement the asynchronous routines using libpq asynchronous functions. Other DBMSs can implement them with the XA asynchronous API in theory.

* The client backend uses the asynchronous FDW routines if available:

  /* Issue asynchronous prepare | commit | rollback to FDWs that support it */
  foreach (per each foreign server used in the transaction)
  {
      if (fdwroutine->{prepare | commit | rollback}_async_func)
          fdwroutine->{prepare | commit | rollback}_async_func(...);
  }

  /* Wait for completion of asynchronous prepare | commit | rollback */
  foreach (per each foreign server used in the transaction)
  {
      if (fdwroutine->{prepare | commit | rollback}_async_func)
          ret = fdwroutine->wait_for_completion(...);
  }

  /* Issue synchronous prepare | commit | rollback to FDWs that don't support it */
  foreach (per each foreign server used in the transaction)
  {
      if (fdwroutine->{prepare | commit | rollback}_async_func == NULL)
          ret = fdwroutine->{prepare | commit | rollback}_func(...);
  }

* The client backend asks the resolver to commit or rollback the remote transaction only when the remote transaction fails (due to the failure of the remote server or network).
That is, the resolver is not involved during normal operation. This will not be complex, and can be included in the first version, if we really want to use FDW for Postgres scale-out.

Regards
Takayuki Tsunakawa
On Thu, 24 Sep 2020 at 17:23, tsunakawa.takay@fujitsu.com <tsunakawa.takay@fujitsu.com> wrote:
>
> From: Masahiko Sawada <masahiko.sawada@2ndquadrant.com>
> > So with your idea, I think we require FDW developers to not call
> > ereport(ERROR) as much as possible. If they need to use a function
> > including palloc, lappend etc that could call ereport(ERROR), they
> > need to use PG_TRY() and PG_CATCH() and return the control along with
> > the error message to the transaction manager rather than raising an
> > error. Then the transaction manager will emit the error message at an
> > error level lower than ERROR (e.g., WARNING), and call commit/rollback
> > API again. But normally we do some cleanup on error but in this case
> > the retrying commit/rollback is performed without any cleanup. Is that
> > right? I'm not sure it's safe though.
>
> Yes. It's legitimate to require the FDW commit routine to return control, because the prepare of 2PC is a promise to commit successfully. The second-phase commit should avoid doing anything that could fail. For example, if some memory is needed for commit, it should be allocated in prepare or before.

I don't think it's always possible to avoid raising errors in advance. Considering how postgres_fdw could implement your idea, I think postgres_fdw would need PG_TRY() and PG_CATCH() for its connection management. It has a connection cache in local memory using HTAB. It needs to create an entry the first time it connects (e.g., when PREPARE TRANSACTION and COMMIT PREPARED are performed by different processes), and it needs to re-connect to the foreign server when the entry is invalidated. In both cases, ERROR could happen. I guess the same is true for other FDW implementations. Possibly other FDWs might need more work, for example cleanup or releasing resources.
I think the pros of your idea are to make the transaction manager simple, since we don't need the resolvers and launcher, but the cons are to bring the complexity to the FDW implementation code instead. Also, IMHO I don't think it's a safe way that the FDW neither re-throws an error nor aborts the transaction when an error occurs.

In terms of the performance you're concerned about, I wonder if we can somewhat eliminate the bottleneck if multiple resolvers are able to run on one database in the future. For example, if we could launch as many resolver processes as connections on the database, each backend process could have one resolver process. Since there would be contention and inter-process communication it still brings some overhead, but it might be negligible compared to the network round trip.

Perhaps we can hear more opinions on that from other hackers to decide the FDW transaction API design.

Regards,
--
Masahiko Sawada
http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
RE: Transactions involving multiple postgres foreign servers, take 2
From: Masahiko Sawada <masahiko.sawada@2ndquadrant.com>
> I don't think it's always possible to avoid raising errors in advance.
> Considering how postgres_fdw can implement your idea, I think
> postgres_fdw would need PG_TRY() and PG_CATCH() for its connection
> management. It has a connection cache in the local memory using HTAB.
> It needs to create an entry for the first time to connect (e.g., when
> prepare and commit prepared a transaction are performed by different
> processes) and it needs to re-connect the foreign server when the
> entry is invalidated. In both cases, ERROR could happen. I guess the
> same is true for other FDW implementations. Possibly other FDWs might
> need more work for example cleanup or releasing resources.

Why does the client backend have to create a new connection cache entry during PREPARE or COMMIT PREPARED? Doesn't the client backend naturally continue to use the connections that it has used in its current transaction?

> I think
> that the pros of your idea are to make the transaction manager simple
> since we don't need resolvers and launcher but the cons are to bring
> the complexity to FDW implementation codes instead. Also, IMHO I don't
> think it's safe way that FDW does neither re-throwing an error nor
> abort transaction when an error occurs.

No, I didn't say the resolver is unnecessary. The resolver takes care of terminating remote transactions when the client backend encountered an error during COMMIT/ROLLBACK PREPARED.

> In terms of performance you're concerned, I wonder if we can somewhat
> eliminate the bottleneck if multiple resolvers are able to run on one
> database in the future. For example, if we could launch resolver
> processes as many as connections on the database, individual backend
> processes could have one resolver process. Since there would be
> contention and inter-process communication it still brings some
> overhead but it might be negligible comparing to network round trip.
Do you mean that if concurrent 200 clients each update data on two foreign servers, there are 400 resolvers? ...That's overuse of resources. Regards Takayuki Tsunakawa
On Fri, 25 Sep 2020 at 18:21, tsunakawa.takay@fujitsu.com <tsunakawa.takay@fujitsu.com> wrote: > > From: Masahiko Sawada <masahiko.sawada@2ndquadrant.com> > > I don't think it's always possible to avoid raising errors in advance. > > Considering how postgres_fdw can implement your idea, I think > > postgres_fdw would need PG_TRY() and PG_CATCH() for its connection > > management. It has a connection cache in the local memory using HTAB. > > It needs to create an entry for the first time to connect (e.g., when > > prepare and commit prepared a transaction are performed by different > > processes) and it needs to re-connect the foreign server when the > > entry is invalidated. In both cases, ERROR could happen. I guess the > > same is true for other FDW implementations. Possibly other FDWs might > > need more work for example cleanup or releasing resources. I think > > Why does the client backend have to create a new connection cache entry during PREPARE or COMMIT PREPARED? Doesn't the client backend naturally continue to use connections that it has used in its current transaction? I think there are two cases: a process executes PREPARE TRANSACTION and another process executes COMMIT PREPARED later, and if the coordinator has cascaded foreign servers (i.e., a foreign server has its foreign server) and temporary connection problem happens in the intermediate node after PREPARE then another process on the intermediate node will execute COMMIT PREPARED on its foreign server. > > > > that the pros of your idea are to make the transaction manager simple > > since we don't need resolvers and launcher but the cons are to bring > > the complexity to FDW implementation codes instead. Also, IMHO I don't > > think it's safe way that FDW does neither re-throwing an error nor > > abort transaction when an error occurs. > > No, I didn't say the resolver is unnecessary.
The resolver takes care of terminating remote transactions when the client backend encountered an error during COMMIT/ROLLBACK PREPARED. Understood. With your idea, we can remove at least the code of making backend wait and inter-process communication between backends and resolvers. I think we need to consider whether it's really safe and what is needed to achieve your idea safely. > > > > In terms of performance you're concerned, I wonder if we can somewhat > > eliminate the bottleneck if multiple resolvers are able to run on one > > database in the future. For example, if we could launch resolver > > processes as many as connections on the database, individual backend > > processes could have one resolver process. Since there would be > > contention and inter-process communication it still brings some > > overhead but it might be negligible comparing to network round trip. > > Do you mean that if concurrent 200 clients each update data on two foreign servers, there are 400 resolvers? ...That's overuse of resources. I think we have 200 resolvers in this case since there is one resolver process per backend process. Or another idea is that all processes queue foreign transactions to resolve into the shared memory queue and resolver processes fetch and resolve them instead of assigning one distributed transaction to one resolver process. Using asynchronous execution, the resolver process can process a bunch of foreign transactions across distributed transactions and grouped by the foreign server at once. It might be more complex than the current approach but having multiple resolver processes on one database would increase throughput well especially by combining with asynchronous execution. Regards, -- Masahiko Sawada http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
RE: Transactions involving multiple postgres foreign servers, take 2
From: Masahiko Sawada <masahiko.sawada@2ndquadrant.com> > On Fri, 25 Sep 2020 at 18:21, tsunakawa.takay@fujitsu.com > <tsunakawa.takay@fujitsu.com> wrote: > > Why does the client backend have to create a new connection cache entry > during PREPARE or COMMIT PREPARED? Doesn't the client backend naturally > continue to use connections that it has used in its current transaction? > > I think there are two cases: a process executes PREPARE TRANSACTION > and another process executes COMMIT PREPARED later, and if the > coordinator has cascaded foreign servers (i.e., a foreign server has > its foreign server) and temporary connection problem happens in the > intermediate node after PREPARE then another process on the > intermediate node will execute COMMIT PREPARED on its foreign server. Aren't both the cases failure cases, and thus handled by the resolver? > > > In terms of performance you're concerned, I wonder if we can somewhat > > > eliminate the bottleneck if multiple resolvers are able to run on one > > > database in the future. For example, if we could launch resolver > > > processes as many as connections on the database, individual backend > > > processes could have one resolver process. Since there would be > > > contention and inter-process communication it still brings some > > > overhead but it might be negligible comparing to network round trip. > > > > Do you mean that if concurrent 200 clients each update data on two foreign > servers, there are 400 resolvers? ...That's overuse of resources. > > I think we have 200 resolvers in this case since one resolver process > per backend process. That does not parallelize prepare or commit for a single client, as each resolver can process only one prepare or commit synchronously at a time. Not to mention the resource usage is high.
> Or another idea is that all processes queue > foreign transactions to resolve into the shared memory queue and > resolver processes fetch and resolve them instead of assigning one > distributed transaction to one resolver process. Using asynchronous > execution, the resolver process can process a bunch of foreign > transactions across distributed transactions and grouped by the > foreign server at once. It might be more complex than the current > approach but having multiple resolver processes on one database would > increase throughput well especially by combining with asynchronous > execution. Yeah, that sounds complex. It's simpler and natural for each client backend to use the connections it has used in its current transaction and issue prepare and commit to the foreign servers, and the resolver just takes care of failed commits and aborts behind the scenes. That's like how the walwriter takes care of writing WAL on behalf of a client backend that commits asynchronously. Regards Takayuki Tsunakawa
On Mon, 28 Sep 2020 at 13:58, tsunakawa.takay@fujitsu.com <tsunakawa.takay@fujitsu.com> wrote: > > From: Masahiko Sawada <masahiko.sawada@2ndquadrant.com> > > On Fri, 25 Sep 2020 at 18:21, tsunakawa.takay@fujitsu.com > > <tsunakawa.takay@fujitsu.com> wrote: > > > Why does the client backend have to create a new connection cache entry > > during PREPARE or COMMIT PREPARED? Doesn't the client backend naturally > > continue to use connections that it has used in its current transaction? > > > > I think there are two cases: a process executes PREPARE TRANSACTION > > and another process executes COMMIT PREPARED later, and if the > > coordinator has cascaded foreign servers (i.e., a foreign server has > > its foreign server) and temporary connection problem happens in the > > intermediate node after PREPARE then another process on the > > intermediate node will execute COMMIT PREPARED on its foreign server. > > Aren't both the cases failure cases, and thus handled by the resolver? No. Please imagine a case where a user executes PREPARE TRANSACTION on the transaction that modified data on foreign servers. The backend process prepares both the local transaction and foreign transactions. But another client can execute COMMIT PREPARED on the prepared transaction. In this case, another backend newly connects foreign servers and commits prepared foreign transactions. Therefore, the new connection cache entry can be created during COMMIT PREPARED which could lead to an error but since the local prepared transaction is already committed the backend must not fail with an error. In the latter case, I assume that the backend continues to retry foreign transaction resolution until the user requests cancellation. Please imagine the case where the server-A connects a foreign server (say, server-B) and server-B connects another foreign server (say, server-C).
The transaction initiated on server-A modified the data on both local and server-B which further modified the data on server-C and executed COMMIT. The backend process on server-A (say, backend-A) sends PREPARE TRANSACTION to server-B then the backend process on server-B (say, backend-B) connected by backend-A prepares the local transaction and further sends PREPARE TRANSACTION to server-C. Let’s suppose a temporary connection failure happens between server-A and server-B before the backend-A sending COMMIT PREPARED (i.e., 2nd phase of 2PC). When the backend-A attempts to send COMMIT PREPARED to server-B it realizes that the connection to server-B was lost but since the user doesn’t request cancellation yet the backend-A retries to connect server-B and succeeds. Since now that the backend-A established a new connection to server-B, there is another backend process on server-B (say, backend-B’). Since the backend-B’ doesn’t have a connection to server-C yet, it creates new connection cache entry, which could lead to an error. IOW, on server-B different processes performed PREPARE TRANSACTION and COMMIT PREPARED and the latter process created a connection cache entry.
> > That does not parallelize prepare or commit for a single client, as each resolver can process only one prepare or commit synchronously at a time. Not to mention the resource usage is high. Well, I think we should discuss parallel (and/or asynchronous) execution of prepare and commit separated from the discussion on whether the resolver process is responsible for 2nd phase of 2PC. I've been suggesting that the first phase and the second phase of 2PC should be performed by different processes in terms of safety. And having multiple resolvers on one database is my suggestion in response to the concern you raised that one resolver process on one database can be bottleneck. Both parallel execution and asynchronous execution are slightly related to this topic but I think it should be discussed separately. Regarding parallel and asynchronous execution, I basically agree on supporting asynchronous execution as the XA specification also has, although I think it's better not to include it in the first version for simplicity. Overall, my suggestion for the first version is to support synchronous execution of prepare, commit, and rollback, have one resolver process per database, and have resolver take 2nd phase of 2PC. As the next step we can add APIs for asynchronous execution, have multiple resolvers on one database and so on. Regards, -- Masahiko Sawada http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
RE: Transactions involving multiple postgres foreign servers, take 2
From: Masahiko Sawada <masahiko.sawada@2ndquadrant.com> > No. Please imagine a case where a user executes PREPARE TRANSACTION on > the transaction that modified data on foreign servers. The backend > process prepares both the local transaction and foreign transactions. > But another client can execute COMMIT PREPARED on the prepared > transaction. In this case, another backend newly connects foreign > servers and commits prepared foreign transactions. Therefore, the new > connection cache entry can be created during COMMIT PREPARED which > could lead to an error but since the local prepared transaction is > already committed the backend must not fail with an error. > > In the latter case, I assume that the backend continues to retry > foreign transaction resolution until the user requests cancellation. > Please imagine the case where the server-A connects a foreign server > (say, server-B) and server-B connects another foreign server (say, > server-C). The transaction initiated on server-A modified the data on > both local and server-B which further modified the data on server-C > and executed COMMIT. The backend process on server-A (say, backend-A) > sends PREPARE TRANSACTION to server-B then the backend process on > server-B (say, backend-B) connected by backend-A prepares the local > transaction and further sends PREPARE TRANSACTION to server-C. Let’s > suppose a temporary connection failure happens between server-A and > server-B before the backend-A sending COMMIT PREPARED (i.e., 2nd phase > of 2PC). When the backend-A attempts to send COMMIT PREPARED to > server-B it realizes that the connection to server-B was lost but > since the user doesn’t request cancellation yet the backend-A retries > to connect server-B and succeeds. Since now that the backend-A > established a new connection to server-B, there is another backend > process on server-B (say, backend-B’).
Since the backend-B’ doesn’t > have a connection to server-C yet, it creates new connection cache > entry, which could lead to an error. IOW, on server-B different > processes performed PREPARE TRANSACTION and COMMIT PREPARED and > the > latter process created a connection cache entry. Thank you, I understood the situation. I don't think it's a good design to not address practical performance during normal operation by fearing the rare error case. The transaction manager (TM) or the FDW implementor can naturally do things like the following: * Use palloc_extended(MCXT_ALLOC_NO_OOM) and hash_search(HASH_ENTER_NULL) to return control to the caller. * Use PG_TRY(), as its overhead is relatively negligible compared to connection establishment. * If the commit fails, the TM asks the resolver to take care of committing the remote transaction, and returns success to the user. > Regarding parallel and asynchronous execution, I basically agree on > supporting asynchronous execution as the XA specification also has, > although I think it's better not to include it in the first version > for simplicity. > > Overall, my suggestion for the first version is to support synchronous > execution of prepare, commit, and rollback, have one resolver process > per database, and have resolver take 2nd phase of 2PC. As the next > step we can add APIs for asynchronous execution, have multiple > resolvers on one database and so on. We don't have to rush to commit a patch that is likely to exhibit non-practical performance, as we still have much time left for PG 14. The design needs to be thought through more for the ideal goal and refined. By making efforts to sort through the ideal design, we may be able to avoid rework and API inconsistency. As for the API, we haven't validated yet that the FDW implementor can use XA, have we? Regards Takayuki Tsunakawa
On Tue, 29 Sep 2020 at 11:37, tsunakawa.takay@fujitsu.com <tsunakawa.takay@fujitsu.com> wrote: > > From: Masahiko Sawada <masahiko.sawada@2ndquadrant.com> > > No. Please imagine a case where a user executes PREPARE TRANSACTION on > > the transaction that modified data on foreign servers. The backend > > process prepares both the local transaction and foreign transactions. > > But another client can execute COMMIT PREPARED on the prepared > > transaction. In this case, another backend newly connects foreign > > servers and commits prepared foreign transactions. Therefore, the new > > connection cache entry can be created during COMMIT PREPARED which > > could lead to an error but since the local prepared transaction is > > already committed the backend must not fail with an error. > > > > In the latter case, I assume that the backend continues to retry > > foreign transaction resolution until the user requests cancellation. > > Please imagine the case where the server-A connects a foreign server > > (say, server-B) and server-B connects another foreign server (say, > > server-C). The transaction initiated on server-A modified the data on > > both local and server-B which further modified the data on server-C > > and executed COMMIT. The backend process on server-A (say, backend-A) > > sends PREPARE TRANSACTION to server-B then the backend process on > > server-B (say, backend-B) connected by backend-A prepares the local > > transaction and further sends PREPARE TRANSACTION to server-C. Let’s > > suppose a temporary connection failure happens between server-A and > > server-B before the backend-A sending COMMIT PREPARED (i.e., 2nd phase > > of 2PC). When the backend-A attempts to send COMMIT PREPARED to > > server-B it realizes that the connection to server-B was lost but > > since the user doesn’t request cancellation yet the backend-A retries > > to connect server-B and succeeds.
Since now that the backend-A > > established a new connection to server-B, there is another backend > > process on server-B (say, backend-B’). Since the backend-B’ doesn’t > > have a connection to server-C yet, it creates new connection cache > > entry, which could lead to an error. IOW, on server-B different > > processes performed PREPARE TRANSACTION and COMMIT PREPARED and > > the > > latter process created a connection cache entry. > > Thank you, I understood the situation. I don't think it's a good design to not address practical performance during normal operation by fearing the rare error case. > > The transaction manager (TM) or the FDW implementor can naturally do things like the following: > > * Use palloc_extended(MCXT_ALLOC_NO_OOM) and hash_search(HASH_ENTER_NULL) to return control to the caller. > > * Use PG_TRY(), as its overhead is relatively negligible compared to connection establishment. I suppose you mean that the FDW implementor uses PG_TRY() to catch an error but not do PG_RE_THROW(). I'm concerned about whether it's safe to return control to the caller and continue trying to resolve foreign transactions without either rethrowing the error or aborting the transaction. IMHO, something like "high performance but doesn't work correctly in a rare failure case" is rather a bad design, especially for the transaction management feature. > > * If the commit fails, the TM asks the resolver to take care of committing the remote transaction, and returns success to the user. > > > > Regarding parallel and asynchronous execution, I basically agree on > > supporting asynchronous execution as the XA specification also has, > > although I think it's better not to include it in the first version > > for simplicity. > > > > Overall, my suggestion for the first version is to support synchronous > > execution of prepare, commit, and rollback, have one resolver process > > per database, and have resolver take 2nd phase of 2PC.
As the next > > step we can add APIs for asynchronous execution, have multiple > > resolvers on one database and so on. > > We don't have to rush to commit a patch that is likely to exhibit non-practical performance, as we still have much time left for PG 14. The design needs to be thought through more for the ideal goal and refined. By making efforts to sort through the ideal design, we may be able to avoid rework and API inconsistency. As for the API, we haven't validated yet that the FDW implementor can use XA, have we? Yes, we still need to check if FDW implementors other than postgres_fdw are able to support these APIs. I agree that we need more discussion on the design. My suggestion is to start with a small, simple feature as the first step and not try to include everything in the first version. Regards, -- Masahiko Sawada http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
On Tue, 29 Sep 2020 at 15:03, Masahiko Sawada <masahiko.sawada@2ndquadrant.com> wrote: > > On Tue, 29 Sep 2020 at 11:37, tsunakawa.takay@fujitsu.com > <tsunakawa.takay@fujitsu.com> wrote: > > > > From: Masahiko Sawada <masahiko.sawada@2ndquadrant.com> > > > No. Please imagine a case where a user executes PREPARE TRANSACTION on > > > the transaction that modified data on foreign servers. The backend > > > process prepares both the local transaction and foreign transactions. > > > But another client can execute COMMIT PREPARED on the prepared > > > transaction. In this case, another backend newly connects foreign > > > servers and commits prepared foreign transactions. Therefore, the new > > > connection cache entry can be created during COMMIT PREPARED which > > > could lead to an error but since the local prepared transaction is > > > already committed the backend must not fail with an error. > > > > > > In the latter case, I assume that the backend continues to retry > > > foreign transaction resolution until the user requests cancellation. > > > Please imagine the case where the server-A connects a foreign server > > > (say, server-B) and server-B connects another foreign server (say, > > > server-C). The transaction initiated on server-A modified the data on > > > both local and server-B which further modified the data on server-C > > > and executed COMMIT. The backend process on server-A (say, backend-A) > > > sends PREPARE TRANSACTION to server-B then the backend process on > > > server-B (say, backend-B) connected by backend-A prepares the local > > > transaction and further sends PREPARE TRANSACTION to server-C. Let’s > > > suppose a temporary connection failure happens between server-A and > > > server-B before the backend-A sending COMMIT PREPARED (i.e., 2nd phase > > > of 2PC).
When the backend-A attempts to send COMMIT PREPARED to > > > server-B it realizes that the connection to server-B was lost but > > > since the user doesn’t request cancellation yet the backend-A retries > > > to connect server-B and succeeds. Since now that the backend-A > > > established a new connection to server-B, there is another backend > > > process on server-B (say, backend-B’). Since the backend-B’ doesn’t > > > have a connection to server-C yet, it creates new connection cache > > > entry, which could lead to an error. IOW, on server-B different > > > processes performed PREPARE TRANSACTION and COMMIT PREPARED and > > > the > > > latter process created a connection cache entry. > > > > Thank you, I understood the situation. I don't think it's a good design to not address practical performance during normal operation by fearing the rare error case. > > > > The transaction manager (TM) or the FDW implementor can naturally do things like the following: > > > > * Use palloc_extended(MCXT_ALLOC_NO_OOM) and hash_search(HASH_ENTER_NULL) to return control to the caller. > > > > * Use PG_TRY(), as its overhead is relatively negligible compared to connection establishment. > > I suppose you mean that the FDW implementor uses PG_TRY() to catch an > error but not do PG_RE_THROW(). I'm concerned about whether it's safe to return > control to the caller and continue trying to resolve foreign > transactions without either rethrowing the error or aborting the > transaction. > > IMHO, something like "high performance but > doesn't work correctly in a rare failure case" is rather a bad design, especially for the > transaction management feature. To avoid misunderstanding, I didn't mean to disregard the performance. I mean especially for the transaction management feature it's essential to work fine even in failure cases.
So I hope we have a safe, robust, and probably simple design for the first version that might have low performance for now but has the potential for performance improvement, which we can pursue later. Regards, -- Masahiko Sawada http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
RE: Transactions involving multiple postgres foreign servers, take 2
From: Masahiko Sawada <masahiko.sawada@2ndquadrant.com> > To avoid misunderstanding, I didn't mean to disregard the performance. > I mean especially for the transaction management feature it's > essential to work fine even in failure cases. So I hope we have a > safe, robust, and probably simple design for the first version that > might have low performance for now but has the potential for > performance improvement, which we can > pursue later. Yes, correctness (safety?) is a basic premise. I understand that given the time left for PG 14, we haven't yet given up on a sound design that offers practical or normally expected performance. I don't think the design has been thought through well enough yet to tell whether it's simple or complex. At least, I don't believe doing the "send commit request, perform commit on a remote server, and wait for a reply" sequence one transaction at a time in turn is what this community (and other DBMSs) tolerate. A kid's tricycle is safe, but it's not safe to ride a tricycle on the road. Let's not rush to commit and do our best! Regards Takayuki Tsunakawa
On Wed, 30 Sep 2020 at 16:02, tsunakawa.takay@fujitsu.com <tsunakawa.takay@fujitsu.com> wrote: > > From: Masahiko Sawada <masahiko.sawada@2ndquadrant.com> > > To avoid misunderstanding, I didn't mean to disregard the performance. > > I mean especially for the transaction management feature it's > > essential to work fine even in failure cases. So I hope we have a > > safe, robust, and probably simple design for the first version that > > might have low performance for now but has the potential for > > performance improvement, which we can > > pursue later. > > Yes, correctness (safety?) is a basic premise. I understand that given the time left for PG 14, we haven't yet given up on a sound design that offers practical or normally expected performance. I don't think the design has been thought through well enough yet to tell whether it's simple or complex. At least, I don't believe doing the "send commit request, perform commit on a remote server, and wait for a reply" sequence one transaction at a time in turn is what this community (and other DBMSs) tolerate. A kid's tricycle is safe, but it's not safe to ride a tricycle on the road. Let's not rush to commit and do our best! Okay. I'd like to resolve my concern that I repeatedly mentioned and we haven't found a good solution for yet. That is, how we handle errors raised by FDW transaction callbacks during committing/rolling back prepared foreign transactions. Actually, this has already been discussed before[1] and we concluded at that time that using a background worker to commit/roll back foreign prepared transactions is the best way. Anyway, let me summarize the discussion on this issue so far. With your idea, after the local commit, the backend process directly calls the transaction FDW API to commit the foreign prepared transactions. However, an error (i.e., ereport(ERROR)) is likely to happen during that due to various reasons. It could be an OOM during memory allocation, a connection error, or whatever.
In case an error happens during committing prepared foreign transactions, the user will get the error but it's too late. The local transaction and possibly other foreign prepared transactions have already been committed. You proposed the first idea to avoid such a situation that FDW implementor can write the code while trying to reduce the possibility of errors happening as much as possible, for example by using palloc_extended(MCXT_ALLOC_NO_OOM) and hash_search(HASH_ENTER_NULL) but I think it's not a comprehensive solution. They might miss, not know it, or use other functions provided by the core that could lead to an error. Another idea is to use PG_TRY() and PG_CATCH(). IIUC with this idea, FDW implementor catches an error but ignores it rather than rethrowing by PG_RE_THROW() in order to return the control to the core after an error. I’m really not sure it’s a correct usage of those macros. In addition, after returning to the core, it will retry to resolve the same or other foreign transactions. That is, after ignoring an error, the core needs to continue working and possibly call transaction callbacks of other FDW implementations. Regards, [1] https://www.postgresql.org/message-id/CA%2BTgmoY%3DVkHrzXD%3Djw5DA%2BPp-ePW_6_v5n%2BTJk40s5Q9VXY-Pw%40mail.gmail.com -- Masahiko Sawada http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
RE: Transactions involving multiple postgres foreign servers, take 2
From: Masahiko Sawada <masahiko.sawada@2ndquadrant.com> > You proposed the first idea > to avoid such a situation that FDW implementor can write the code > while trying to reduce the possibility of errors happening as much as > possible, for example by using palloc_extended(MCXT_ALLOC_NO_OOM) and > hash_search(HASH_ENTER_NULL) but I think it's not a comprehensive > solution. They might miss, not know it, or use other functions > provided by the core that could lead to an error. We can give the guideline in the manual, can't we? It should not be especially difficult for the FDW implementor compared to other Postgres extensibility features that have their own rules -- table/index AM, user-defined C function, trigger function in C, user-defined data types, hooks, etc. And, the Postgres functions that the FDW implementor would use to implement their commit will be very limited, won't they? Because most of the commit processing is performed in the resource manager's library (e.g. Oracle and MySQL client library.) (Before that, the developer of server-side modules is not given any information on what functions (like palloc) are available in the manual, is he?) > Another idea is to use > PG_TRY() and PG_CATCH(). IIUC with this idea, FDW implementor catches > an error but ignores it rather than rethrowing by PG_RE_THROW() in > order to return the control to the core after an error. I’m really not > sure it’s a correct usage of those macros. In addition, after > returning to the core, it will retry to resolve the same or other > foreign transactions. That is, after ignoring an error, the core needs > to continue working and possibly call transaction callbacks of other > FDW implementations. No, not ignore the error. The FDW can emit a WARNING, LOG, or NOTICE message, and return an error code to TM. TM can also emit a message like: WARNING: failed to commit part of a transaction on the foreign server 'XXX' HINT: The server continues to try committing the remote transaction.
Then TM asks the resolver to take care of committing the remote transaction, and acknowledge the commit success to the client. The relevant return codes of xa_commit() are: -------------------------------------------------- [XAER_RMERR] An error occurred in committing the work performed on behalf of the transaction branch and the branch’s work has been rolled back. Note that returning this error signals a catastrophic event to a transaction manager since other resource managers may successfully commit their work on behalf of this branch. This error should be returned only when a resource manager concludes that it can never commit the branch and that it cannot hold the branch’s resources in a prepared state. Otherwise, [XA_RETRY] should be returned. [XAER_RMFAIL] An error occurred that makes the resource manager unavailable. -------------------------------------------------- Regards Takayuki Tsunakawa
On Fri, 2 Oct 2020 at 18:20, tsunakawa.takay@fujitsu.com <tsunakawa.takay@fujitsu.com> wrote: > > From: Masahiko Sawada <masahiko.sawada@2ndquadrant.com> > > You proposed the first idea > > to avoid such a situation that FDW implementor can write the code > > while trying to reduce the possibility of errors happening as much as > > possible, for example by using palloc_extended(MCXT_ALLOC_NO_OOM) and > > hash_search(HASH_ENTER_NULL) but I think it's not a comprehensive > > solution. They might miss, not know it, or use other functions > > provided by the core that could lead to an error. > > We can give the guideline in the manual, can't we? It should not be especially difficult for the FDW implementor compared to other Postgres's extensibility features that have their own rules -- table/index AM, user-defined C function, trigger function in C, user-defined data types, hooks, etc. And, the Postgres functions that the FDW implementor would use to implement their commit will be very limited, won't they? Because most of the commit processing is performed in the resource manager's library (e.g. the Oracle and MySQL client libraries.) Yeah, if we think FDW implementors properly implement these APIs while following the guideline, giving the guideline is a good idea. But I’m not sure all FDW implementors are able to do that and even if the user uses an FDW whose transaction APIs don’t follow the guideline, the user won’t realize it. IMO it’s better to design the feature while not depending on external programs for reliability (correctness?) of this feature, although I might be too worried. > > > > Another idea is to use > > PG_TRY() and PG_CATCH(). IIUC with this idea, FDW implementor catches > > an error but ignores it rather than rethrowing by PG_RE_THROW() in > > order to return the control to the core after an error. I’m really not > > sure it’s a correct usage of those macros.
> > In addition, after > > returning to the core, it will retry to resolve the same or other > > foreign transactions. That is, after ignoring an error, the core needs > > to continue working and possibly call transaction callbacks of other > > FDW implementations. > > No, not ignore the error. The FDW can emit a WARNING, LOG, or NOTICE message, and return an error code to TM. TM can also emit a message like: > > WARNING: failed to commit part of a transaction on the foreign server 'XXX' > HINT: The server continues to try committing the remote transaction. > > Then TM asks the resolver to take care of committing the remote transaction, and acknowledge the commit success to the client. It seems like if it failed to resolve, the backend would return an acknowledgment of COMMIT to the client and the resolver process resolves foreign prepared transactions in the background. So we can ensure that the distributed transaction is completed at the time when the client got an acknowledgment of COMMIT if the 2nd phase of 2PC is successfully completed in the first attempt. OTOH, if it failed for whatever reason, there is no such guarantee. From an optimistic perspective, i.e., the failures are unlikely to happen, it will work well but IMO it’s not uncommon to fail to resolve foreign transactions due to a network issue, especially in an unreliable network environment, for example a geo-distributed database. So I think it will end up requiring the client to check if preceding distributed transactions are completed or not in order to see the results of these transactions. We could retry the foreign transaction resolution before leaving it to the resolver process but the problem that the core continues trying to resolve foreign transactions without either aborting the transaction or rethrowing even after an error still remains. Regards, -- Masahiko Sawada http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
On Tue, Oct 6, 2020 at 7:22 PM Masahiko Sawada <masahiko.sawada@2ndquadrant.com> wrote: > > On Fri, 2 Oct 2020 at 18:20, tsunakawa.takay@fujitsu.com > <tsunakawa.takay@fujitsu.com> wrote: > > > > From: Masahiko Sawada <masahiko.sawada@2ndquadrant.com> > > > You proposed the first idea > > > to avoid such a situation that FDW implementor can write the code > > > while trying to reduce the possibility of errors happening as much as > > > possible, for example by using palloc_extended(MCXT_ALLOC_NO_OOM) and > > > hash_search(HASH_ENTER_NULL) but I think it's not a comprehensive > > > solution. They might miss, not know it, or use other functions > > > provided by the core that could lead to an error. > > > > We can give the guideline in the manual, can't we? It should not be especially difficult for the FDW implementor compared to other Postgres's extensibility features that have their own rules -- table/index AM, user-defined C function, trigger function in C, user-defined data types, hooks, etc. And, the Postgres functions that the FDW implementor would use to implement their commit will be very limited, won't they? Because most of the commit processing is performed in the resource manager's library (e.g. the Oracle and MySQL client libraries.) > > Yeah, if we think FDW implementors properly implement these APIs while > following the guideline, giving the guideline is a good idea. But I’m > not sure all FDW implementors are able to do that and even if the user > uses an FDW whose transaction APIs don’t follow the guideline, the > user won’t realize it. IMO it’s better to design the feature while not > depending on external programs for reliability (correctness?) of this > feature, although I might be too worried. > +1 for that. I don't think it's even in the hands of implementers to avoid throwing an error in all conditions. -- Best Wishes, Ashutosh Bapat
On Tue, Oct 6, 2020 at 10:52 PM Masahiko Sawada <masahiko.sawada@2ndquadrant.com> wrote: > > On Fri, 2 Oct 2020 at 18:20, tsunakawa.takay@fujitsu.com > <tsunakawa.takay@fujitsu.com> wrote: > > > > From: Masahiko Sawada <masahiko.sawada@2ndquadrant.com> > > > You proposed the first idea > > > to avoid such a situation that FDW implementor can write the code > > > while trying to reduce the possibility of errors happening as much as > > > possible, for example by using palloc_extended(MCXT_ALLOC_NO_OOM) and > > > hash_search(HASH_ENTER_NULL) but I think it's not a comprehensive > > > solution. They might miss, not know it, or use other functions > > > provided by the core that could lead to an error. > > > > We can give the guideline in the manual, can't we? It should not be especially difficult for the FDW implementor compared to other Postgres's extensibility features that have their own rules -- table/index AM, user-defined C function, trigger function in C, user-defined data types, hooks, etc. And, the Postgres functions that the FDW implementor would use to implement their commit will be very limited, won't they? Because most of the commit processing is performed in the resource manager's library (e.g. the Oracle and MySQL client libraries.) > > Yeah, if we think FDW implementors properly implement these APIs while > following the guideline, giving the guideline is a good idea. But I’m > not sure all FDW implementors are able to do that and even if the user > uses an FDW whose transaction APIs don’t follow the guideline, the > user won’t realize it. IMO it’s better to design the feature while not > depending on external programs for reliability (correctness?) of this > feature, although I might be too worried. > After more thoughts on Tsunakawa-san’s idea it seems to need the following conditions: * At least postgres_fdw is viable to implement these APIs while guaranteeing not to happen any error.
* A certain number of FDWs (or majority of FDWs) can do that in a similar way to postgres_fdw by using the guideline and probably postgres_fdw as a reference. These are necessary for FDW implementors to implement APIs while following the guideline and for the core to trust them. As far as postgres_fdw goes, what we need to do when committing a foreign transaction resolution is to get a connection from the connection cache or create and connect if not found, construct a SQL query (COMMIT/ROLLBACK PREPARED with identifier) using a fixed-size buffer, send the query, and get the result. The possible place to raise an error is limited. In case of failures such as connection error FDW can return false to the core along with a flag indicating to ask the core retry. Then the core will retry to resolve foreign transactions after some sleep. OTOH if FDW sized up that there is no hope of resolving the foreign transaction, it also could return false to the core along with another flag indicating to remove the entry and not to retry. Also, the transaction resolution by FDW needs to be cancellable (interruptible) but cannot use CHECK_FOR_INTERRUPTS(). Probably, as Tsunakawa-san also suggested, it’s not impossible to implement these APIs in postgres_fdw while guaranteeing not to happen any error, although not sure the code complexity. So I think the first condition may be true but not sure about the second assumption, particularly about the interruptible part. I thought we could support both ideas to get their pros; supporting Tsunakawa-san's idea and then my idea if necessary, and FDW can choose whether to ask the resolver process to perform 2nd phase of 2PC or not. But it's not a good idea in terms of complexity. Regards, -- Masahiko Sawada EnterpriseDB: https://www.enterprisedb.com/
Sorry to be late to respond. (My PC is behaving strangely after upgrading Win10 2004) From: Masahiko Sawada <sawada.mshk@gmail.com> > After more thoughts on Tsunakawa-san’s idea it seems to need the > following conditions: > > * At least postgres_fdw is viable to implement these APIs while > guaranteeing not to happen any error. > * A certain number of FDWs (or majority of FDWs) can do that in a > similar way to postgres_fdw by using the guideline and probably > postgres_fdw as a reference. > > These are necessary for FDW implementors to implement APIs while > following the guideline and for the core to trust them. > > As far as postgres_fdw goes, what we need to do when committing a > foreign transaction resolution is to get a connection from the > connection cache or create and connect if not found, construct a SQL > query (COMMIT/ROLLBACK PREPARED with identifier) using a fixed-size > buffer, send the query, and get the result. The possible place to > raise an error is limited. In case of failures such as connection > error FDW can return false to the core along with a flag indicating to > ask the core retry. Then the core will retry to resolve foreign > transactions after some sleep. OTOH if FDW sized up that there is no > hope of resolving the foreign transaction, it also could return false > to the core along with another flag indicating to remove the entry and > not to retry. Also, the transaction resolution by FDW needs to be > cancellable (interruptible) but cannot use CHECK_FOR_INTERRUPTS(). > > Probably, as Tsunakawa-san also suggested, it’s not impossible to > implement these APIs in postgres_fdw while guaranteeing not to happen > any error, although not sure the code complexity. So I think the first > condition may be true but not sure about the second assumption, > particularly about the interruptible part. Yeah, I expect the commit of the second phase should not be difficult for the FDW developer. 
As for the cancellation during commit retry, I don't think we necessarily have to make the TM responsible for retrying the commits. Many DBMSs have their own timeout functionality such as connection timeout, socket timeout, and statement timeout. Users can set those parameters in the foreign server options based on how long the end user can wait. That is, TM calls FDW's commit routine just once. If the TM makes efforts to retry commits, the duration would be from a few seconds to 30 seconds. Then, we can hold back the cancellation during that period. > I thought we could support both ideas to get their pros; supporting > Tsunakawa-san's idea and then my idea if necessary, and FDW can choose > whether to ask the resolver process to perform 2nd phase of 2PC or > not. But it's not a good idea in terms of complexity. I don't feel the need for leaving the commit to the resolver during normal operation. > It seems like if it failed to resolve, the backend would return an > acknowledgment of COMMIT to the client and the resolver process > resolves foreign prepared transactions in the background. So we can > ensure that the distributed transaction is completed at the time when > the client got an acknowledgment of COMMIT if the 2nd phase of 2PC is > successfully completed in the first attempt. OTOH, if it failed for > whatever reason, there is no such guarantee. From an optimistic > perspective, i.e., the failures are unlikely to happen, it will work > well but IMO it’s not uncommon to fail to resolve foreign transactions > due to a network issue, especially in an unreliable network environment, > for example a geo-distributed database. So I think it will end up > requiring the client to check if preceding distributed transactions > are completed or not in order to see the results of these > transactions. That issue exists with any method, doesn't it? Regards Takayuki Tsunakawa
On Thu, 8 Oct 2020 at 18:05, tsunakawa.takay@fujitsu.com <tsunakawa.takay@fujitsu.com> wrote: > > Sorry to be late to respond. (My PC is behaving strangely after upgrading Win10 2004) > > From: Masahiko Sawada <sawada.mshk@gmail.com> > > After more thoughts on Tsunakawa-san’s idea it seems to need the > > following conditions: > > > > * At least postgres_fdw is viable to implement these APIs while > > guaranteeing not to happen any error. > > * A certain number of FDWs (or majority of FDWs) can do that in a > > similar way to postgres_fdw by using the guideline and probably > > postgres_fdw as a reference. > > > > These are necessary for FDW implementors to implement APIs while > > following the guideline and for the core to trust them. > > > > As far as postgres_fdw goes, what we need to do when committing a > > foreign transaction resolution is to get a connection from the > > connection cache or create and connect if not found, construct a SQL > > query (COMMIT/ROLLBACK PREPARED with identifier) using a fixed-size > > buffer, send the query, and get the result. The possible place to > > raise an error is limited. In case of failures such as connection > > error FDW can return false to the core along with a flag indicating to > > ask the core retry. Then the core will retry to resolve foreign > > transactions after some sleep. OTOH if FDW sized up that there is no > > hope of resolving the foreign transaction, it also could return false > > to the core along with another flag indicating to remove the entry and > > not to retry. Also, the transaction resolution by FDW needs to be > > cancellable (interruptible) but cannot use CHECK_FOR_INTERRUPTS(). > > > > Probably, as Tsunakawa-san also suggested, it’s not impossible to > > implement these APIs in postgres_fdw while guaranteeing not to happen > > any error, although not sure the code complexity. 
> > So I think the first > > condition may be true but not sure about the second assumption, > > particularly about the interruptible part. > > Yeah, I expect the commit of the second phase should not be difficult for the FDW developer. > > As for the cancellation during commit retry, I don't think we necessarily have to make the TM responsible for retrying the commits. Many DBMSs have their own timeout functionality such as connection timeout, socket timeout, and statement timeout. > Users can set those parameters in the foreign server options based on how long the end user can wait. That is, TM calls FDW's commit routine just once. What about temporary network failures? I think there are users who don't want to give up resolving foreign transactions failed due to a temporary network failure. Or even they might want to wait for transaction completion until they send a cancel request. If we want to call the commit routine only once and therefore want FDW to retry connecting the foreign server within the call, it means we require all FDW implementors to write a retry loop code that is interruptible and ensures not to raise an error, which increases difficulty. Also, what if the user sets the statement timeout to 60 sec and they want to cancel the waits after 5 sec by pressing Ctrl-C? You mentioned that client libraries of other DBMSs don't have asynchronous execution functionality. If the SQL execution function is not interruptible, the user will end up waiting for 60 sec, which seems not good. > If the TM makes efforts to retry commits, the duration would be from a few seconds to 30 seconds. Then, we can hold back the cancellation during that period. > > > > I thought we could support both ideas to get their pros; supporting > > Tsunakawa-san's idea and then my idea if necessary, and FDW can choose > > whether to ask the resolver process to perform 2nd phase of 2PC or > > not. But it's not a good idea in terms of complexity.
> > I don't feel the need for leaving the commit to the resolver during normal operation. I meant it's for FDWs that cannot guarantee that no error happens during resolution. > > It seems like if it failed to resolve, the backend would return an > > acknowledgment of COMMIT to the client and the resolver process > > resolves foreign prepared transactions in the background. So we can > > ensure that the distributed transaction is completed at the time when > > the client got an acknowledgment of COMMIT if the 2nd phase of 2PC is > > successfully completed in the first attempt. OTOH, if it failed for > > whatever reason, there is no such guarantee. From an optimistic > > perspective, i.e., the failures are unlikely to happen, it will work > > well but IMO it’s not uncommon to fail to resolve foreign transactions > > due to a network issue, especially in an unreliable network environment > > for example a geo-distributed database. So I think it will end up > > requiring the client to check if preceding distributed transactions > > are completed or not in order to see the results of these > > transactions. > > That issue exists with any method, doesn't it? Yes, but if we don’t retry to resolve foreign transactions at all on an unreliable network environment, the user might end up requiring every transaction to check the status of foreign transactions of the previous distributed transaction before it starts. If we allow to do retry, I guess we ease that somewhat. Regards, -- Masahiko Sawada http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
From: Masahiko Sawada <masahiko.sawada@2ndquadrant.com> > What about temporary network failures? I think there are users who > don't want to give up resolving foreign transactions failed due to a > temporary network failure. Or even they might want to wait for > transaction completion until they send a cancel request. If we want to > call the commit routine only once and therefore want FDW to retry > connecting the foreign server within the call, it means we require all > FDW implementors to write a retry loop code that is interruptible and > ensures not to raise an error, which increases difficulty. > > Yes, but if we don’t retry to resolve foreign transactions at all on > an unreliable network environment, the user might end up requiring > every transaction to check the status of foreign transactions of the > previous distributed transaction before it starts. If we allow to do > retry, I guess we ease that somewhat. OK. As I said, I'm not against trying to cope with temporary network failure. I just don't think it's mandatory. If the network failure is really temporary and thus recovers soon, then the resolver will be able to commit the transaction soon, too. Then, we can have a commit retry timeout or retry count like the following WebLogic manual says. (I couldn't quickly find the English manual, so below is in Japanese. I quoted some text that got through machine translation, which appears a bit strange.) https://docs.oracle.com/cd/E92951_01/wls/WLJTA/trxcon.htm -------------------------------------------------- Abandon timeout Specifies the maximum time (in seconds) that the transaction manager attempts to complete the second phase of a two-phase commit transaction. In the second phase of a two-phase commit transaction, the transaction manager attempts to complete the transaction until all resource managers indicate that the transaction is complete. After the abandon transaction timer expires, no attempt is made to resolve the transaction.
If the transaction enters a ready state before it is destroyed, the transaction manager rolls back the transaction and releases the held lock on behalf of the destroyed transaction. -------------------------------------------------- > Also, what if the user sets the statement timeout to 60 sec and they > want to cancel the waits after 5 sec by pressing Ctrl-C? You mentioned > that client libraries of other DBMSs don't have asynchronous execution > functionality. If the SQL execution function is not interruptible, the > user will end up waiting for 60 sec, which seems not good. FDW functions can be uninterruptible in general, can't they? We experienced that odbc_fdw didn't allow cancellation of SQL execution. Regards Takayuki Tsunakawa
At Fri, 9 Oct 2020 02:33:37 +0000, "tsunakawa.takay@fujitsu.com" <tsunakawa.takay@fujitsu.com> wrote in > From: Masahiko Sawada <masahiko.sawada@2ndquadrant.com> > > What about temporary network failures? I think there are users who > > don't want to give up resolving foreign transactions failed due to a > > temporary network failure. Or even they might want to wait for > > transaction completion until they send a cancel request. If we want to > > call the commit routine only once and therefore want FDW to retry > > connecting the foreign server within the call, it means we require all > > FDW implementors to write a retry loop code that is interruptible and > > ensures not to raise an error, which increases difficulty. > > > > Yes, but if we don’t retry to resolve foreign transactions at all on > > an unreliable network environment, the user might end up requiring > > every transaction to check the status of foreign transactions of the > > previous distributed transaction before it starts. If we allow to do > > retry, I guess we ease that somewhat. > > OK. As I said, I'm not against trying to cope with temporary network failure. I just don't think it's mandatory. If the network failure is really temporary and thus recovers soon, then the resolver will be able to commit the transaction soon, too. I must be missing something, though... I don't understand why we hate ERRORs from fdw-2pc-commit routine so much. I think remote-commits should be performed before local commit passes the point-of-no-return and the v26-0002 actually places AtEOXact_FdwXact() before the critical section. (FWIW, I think remote commits should be performed by backends, not by another process, because backends should wait for all remote-commits to end anyway and it is simpler. If we want to run multiple remote-commits in parallel, we could do that by adding some async-waiting interface.) > Then, we can have a commit retry timeout or retry count like the following WebLogic manual says.
> (I couldn't quickly find the English manual, so below is in Japanese. I quoted some text that got through machine translation, which appears a bit strange.) > > https://docs.oracle.com/cd/E92951_01/wls/WLJTA/trxcon.htm > -------------------------------------------------- > Abandon timeout > Specifies the maximum time (in seconds) that the transaction manager attempts to complete the second phase of a two-phase commit transaction. > > In the second phase of a two-phase commit transaction, the transaction manager attempts to complete the transaction until all resource managers indicate that the transaction is complete. After the abandon transaction timer expires, no attempt is made to resolve the transaction. If the transaction enters a ready state before it is destroyed, the transaction manager rolls back the transaction and releases the held lock on behalf of the destroyed transaction. > -------------------------------------------------- That's not a retry timeout but a timeout for the total time of all 2nd-phase-commits. But I think it would be sufficient. Even if an fdw could retry 2pc-commit, it's a matter of that fdw and the core has nothing to do with it. > > Also, what if the user sets the statement timeout to 60 sec and they > > want to cancel the waits after 5 sec by pressing Ctrl-C? You mentioned > > that client libraries of other DBMSs don't have asynchronous execution > > functionality. If the SQL execution function is not interruptible, the > > user will end up waiting for 60 sec, which seems not good. I think fdw-2pc-commit can be interruptible safely as far as we run the remote commits before entering the critical section of local commit. > FDW functions can be uninterruptible in general, can't they? We experienced that odbc_fdw didn't allow cancellation of SQL execution. At least postgres_fdw is interruptible while waiting for the remote.
create view lt as select 1 as slp from (select pg_sleep(10)) t; create foreign table ft(slp int) server sv1 options (table_name 'lt'); select * from ft; ^CCancel request sent ERROR: canceling statement due to user request Regards. -- Kyotaro Horiguchi NTT Open Source Software Center
From: Kyotaro Horiguchi <horikyota.ntt@gmail.com> > I don't understand why we hate ERRORs from fdw-2pc-commit routine so > much. I think remote-commits should be performed before local commit > passes the point-of-no-return and the v26-0002 actually places > AtEOXact_FdwXact() before the critical section. I don't hate ERROR, but it would be simpler and understandable for the FDW commit routine to just return control to the caller (TM) and let TM do whatever is appropriate (asks the resolver to handle the failed commit, and continues to request the next FDW to commit.) > > https://docs.oracle.com/cd/E92951_01/wls/WLJTA/trxcon.htm > > -------------------------------------------------- > > Abandon timeout > > Specifies the maximum time (in seconds) that the transaction manager > attempts to complete the second phase of a two-phase commit transaction. > > > > In the second phase of a two-phase commit transaction, the transaction > manager attempts to complete the transaction until all resource managers > indicate that the transaction is complete. After the abandon transaction timer > expires, no attempt is made to resolve the transaction. If the transaction enters > a ready state before it is destroyed, the transaction manager rolls back the > transaction and releases the held lock on behalf of the destroyed transaction. > > -------------------------------------------------- > > That's not a retry timeout but a timeout for total time of all > 2nd-phase-commits. But I think it would be sufficient. Even if an > fdw could retry 2pc-commit, it's a matter of that fdw and the core has > nothing to do with. Yeah, the WebLogic documentation doesn't say whether it performs retries during the timeout period. I just cited it as an example that has a timeout parameter for the second phase of 2PC. > At least postgres_fdw is interruptible while waiting for the remote.
> > create view lt as select 1 as slp from (select pg_sleep(10)) t; > create foreign table ft(slp int) server sv1 options (table_name 'lt'); > select * from ft; > ^CCancel request sent > ERROR: canceling statement due to user request I'm afraid the cancellation doesn't work while postgres_fdw is trying to connect to a down server. Also, the Postgres manual doesn't say anything about cancellation, so we cannot expect FDWs to respond to a user's cancel request. Regards Takayuki Tsunakawa
On Fri, 9 Oct 2020 at 11:33, tsunakawa.takay@fujitsu.com <tsunakawa.takay@fujitsu.com> wrote: > > From: Masahiko Sawada <masahiko.sawada@2ndquadrant.com> > > What about temporary network failures? I think there are users who > > don't want to give up resolving foreign transactions failed due to a > > temporary network failure. Or even they might want to wait for > > transaction completion until they send a cancel request. If we want to > > call the commit routine only once and therefore want FDW to retry > > connecting the foreign server within the call, it means we require all > > FDW implementors to write a retry loop code that is interruptible and > > ensures not to raise an error, which increases difficulty. > > > > Yes, but if we don’t retry to resolve foreign transactions at all on > > an unreliable network environment, the user might end up requiring > > every transaction to check the status of foreign transactions of the > > previous distributed transaction before it starts. If we allow to do > > retry, I guess we ease that somewhat. > > OK. As I said, I'm not against trying to cope with temporary network failure. I just don't think it's mandatory. If the network failure is really temporary and thus recovers soon, then the resolver will be able to commit the transaction soon, too. Well, I agree that it's not mandatory. I think it's better if the user can choose. I also doubt how useful the per-foreign-server timeout setting you mentioned before is. For example, suppose the transaction involves three foreign servers that have different timeout settings: what if the backend failed to commit on the first one of the servers due to timeout? Does it attempt to commit on the other two servers? Or does it give up and return the control to the client? In the former case, what if the backend failed again on one of the other two servers due to timeout?
The backend might end up waiting for all timeouts and in practice the user is not aware of how many servers are involved in the transaction, for example in sharding. So it seems to be hard to predict the total timeout. In the latter case, the backend might succeed to commit on the other two nodes. Also, the timeout setting of the first foreign server virtually is used as the whole foreign transaction resolution timeout. However, the user cannot control the order of resolution. So again it seems to be hard for the user to predict the timeout. So if we have a timeout mechanism, I think it's better if the user can control the timeout for each transaction. Probably the same is true for the retry. > > Then, we can have a commit retry timeout or retry count like the following WebLogic manual says. (I couldn't quickly find the English manual, so below is in Japanese. I quoted some text that got through machine translation, which appears a bit strange.) > > https://docs.oracle.com/cd/E92951_01/wls/WLJTA/trxcon.htm > -------------------------------------------------- > Abandon timeout > Specifies the maximum time (in seconds) that the transaction manager attempts to complete the second phase of a two-phase commit transaction. > > In the second phase of a two-phase commit transaction, the transaction manager attempts to complete the transaction until all resource managers indicate that the transaction is complete. After the abandon transaction timer expires, no attempt is made to resolve the transaction. If the transaction enters a ready state before it is destroyed, the transaction manager rolls back the transaction and releases the held lock on behalf of the destroyed transaction. > -------------------------------------------------- Yeah, a per-transaction timeout for the 2nd phase of 2PC seems a good idea. > > > > > Also, what if the user sets the statement timeout to 60 sec and they > > want to cancel the waits after 5 sec by pressing Ctrl-C?
> > You mentioned > > that client libraries of other DBMSs don't have asynchronous execution > > functionality. If the SQL execution function is not interruptible, the > > user will end up waiting for 60 sec, which seems not good. > > FDW functions can be uninterruptible in general, can't they? We experienced that odbc_fdw didn't allow cancellation of SQL execution. For example in postgres_fdw, it executes SQL in an asynchronous manner using PQsendQuery(), PQconsumeInput(), PQgetResult() and so on (see do_sql_command() and pgfdw_get_result()). Therefore, if the user presses Ctrl-C, the remote query will be canceled and an ERROR raised. Regards, -- Masahiko Sawada http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
On Fri, 9 Oct 2020 at 14:55, Kyotaro Horiguchi <horikyota.ntt@gmail.com> wrote: > > At Fri, 9 Oct 2020 02:33:37 +0000, "tsunakawa.takay@fujitsu.com" <tsunakawa.takay@fujitsu.com> wrote in > > From: Masahiko Sawada <masahiko.sawada@2ndquadrant.com> > > > What about temporary network failures? I think there are users who > > > don't want to give up resolving foreign transactions failed due to a > > > temporary network failure. Or even they might want to wait for > > > transaction completion until they send a cancel request. If we want to > > > call the commit routine only once and therefore want FDW to retry > > > connecting the foreign server within the call, it means we require all > > > FDW implementors to write retry loop code that is interruptible and > > > ensures not to raise an error, which increases difficulty. > > > > > > Yes, but if we don't retry to resolve foreign transactions at all on > > > an unreliable network environment, the user might end up requiring > > > every transaction to check the status of foreign transactions of the > > > previous distributed transaction before it starts. If we allow > > > retries, I guess we ease that somewhat. > > > > OK. As I said, I'm not against trying to cope with temporary network failure. I just don't think it's mandatory. If the network failure is really temporary and thus recovers soon, then the resolver will be able to commit the transaction soon, too. > > I might be missing something, though... > > I don't understand why we hate ERRORs from the fdw-2pc-commit routine so > much. I think remote commits should be performed before local commit > passes the point-of-no-return, and the v26-0002 actually places > AtEOXact_FdwXact() before the critical section. > So you're thinking of the following sequence? 1. Prepare all foreign transactions. 2. Commit all the prepared foreign transactions. 3. Commit the local transaction.
Suppose we have the backend process call the commit routine: what if one of the FDWs raises an ERROR while committing its foreign transaction after other foreign transactions have already been committed? The transaction will end up aborted, but some foreign transactions are already committed. Also, what if the backend process fails to commit the local transaction? Since it has already committed all foreign transactions, it cannot ensure global atomicity in this case either. Therefore, I think we should commit distributed transactions in the following sequence: 1. Prepare all foreign transactions. 2. Commit the local transaction. 3. Commit all the prepared foreign transactions. But this is still not a perfect solution. If we have the backend process call the commit routine and an error happens while executing the commit routine of an FDW (i.e., at step 3), it's too late to report an error to the client because we have already committed the local transaction. So the current solution is to have a background process commit the foreign transactions so that the backend can just wait without the possibility of errors. Regards, -- Masahiko Sawada http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
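The difference between the two orderings can be made concrete with a toy failure model. This is only an illustration of the argument above, not the patch's API: `fail_at` names the step that raises an ERROR (0 = none), and the enum classifies what state the cluster is left in.

```c
#include <assert.h>

typedef enum
{
    ATOMIC_COMMIT,   /* every participant committed */
    ATOMIC_ABORT,    /* every participant rolled back */
    BROKEN,          /* some committed, some aborted: atomicity lost */
    IN_DOUBT         /* prepared xacts survive; resolvable by retry */
} result;

/* Ordering A: prepare foreign, commit foreign, commit local. */
static result commit_foreign_first(int fail_at)
{
    if (fail_at == 1) return ATOMIC_ABORT;  /* prepare failed: roll back all */
    if (fail_at == 2) return BROKEN;        /* ERROR after some foreign commits */
    if (fail_at == 3) return BROKEN;        /* foreign committed, local aborted */
    return ATOMIC_COMMIT;
}

/* Ordering B (the sequence above): prepare foreign, commit local,
 * commit foreign.  The local commit is the deciding vote. */
static result commit_local_first(int fail_at)
{
    if (fail_at == 1) return ATOMIC_ABORT;  /* prepare failed: roll back all */
    if (fail_at == 2) return ATOMIC_ABORT;  /* local aborted: roll back prepared */
    if (fail_at == 3) return IN_DOUBT;      /* prepared xacts remain; a resolver
                                               retries COMMIT PREPARED later */
    return ATOMIC_COMMIT;
}
```

Ordering B never reaches the BROKEN state; its remaining weakness is exactly the IN_DOUBT case at step 3, which motivates handing the work to a background resolver.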
RE: Transactions involving multiple postgres foreign servers, take 2
From: Masahiko Sawada <masahiko.sawada@2ndquadrant.com> > I also doubt how useful the per-foreign-server timeout setting you > mentioned before. For example, suppose the transaction involves > three foreign servers that have different timeout settings, what if the > backend fails to commit on the first one of the servers due to > timeout? Does it attempt to commit on the other two servers? Or does > it give up and return the control to the client? In the former case, > what if the backend fails again on one of the other two servers due > to timeout? The backend might end up waiting for all timeouts and in > practice the user is not aware of how many servers are involved with > the transaction, for example in a sharded setup. So it seems to be hard to > predict the total timeout. In the latter case, the backend might > succeed in committing on the other two nodes. Also, the timeout setting of > the first foreign server is virtually used as the whole foreign > transaction resolution timeout. However, the user cannot control the > order of resolution. So again it seems to be hard for the user to > predict the timeout. So if we have a timeout mechanism, I think it's > better if the user can control the timeout for each transaction. > Probably the same is true for the retry. I agree that the user can control the timeout per transaction, not per FDW. I was just not sure if the Postgres core can define the timeout parameter and the FDWs can follow its setting. However, JTA defines a transaction timeout API (not a commit timeout, though), and each RM can choose to implement them. So I think we can define the parameter and/or routines for the timeout in core likewise. -------------------------------------------------- public interface javax.transaction.xa.XAResource int getTransactionTimeout() throws XAException This method returns the transaction timeout value set for this XAResource instance. If XAResource.setTransactionTimeout was not used prior to invoking this method, the return value is the default timeout set for the resource manager; otherwise, the value used in the previous setTransactionTimeout call is returned. Throws: XAException An error has occurred. Possible exception values are: XAER_RMERR, XAER_RMFAIL. Returns: The transaction timeout value in seconds. boolean setTransactionTimeout(int seconds) throws XAException This method sets the transaction timeout value for this XAResource instance. Once set, this timeout value is effective until setTransactionTimeout is invoked again with a different value. To reset the timeout value to the default value used by the resource manager, set the value to zero. If the timeout operation is performed successfully, the method returns true; otherwise false. If a resource manager does not support the transaction timeout value being set explicitly, this method returns false. Parameters: seconds A positive integer specifying the timeout value in seconds. Zero resets the transaction timeout value to the default one used by the resource manager. A negative value results in an XAException being thrown with the XAER_INVAL error code. Returns: true if the transaction timeout value is set successfully; otherwise false. Throws: XAException An error has occurred. Possible exception values are: XAER_RMERR, XAER_RMFAIL, or XAER_INVAL. -------------------------------------------------- > For example, in postgres_fdw, it executes SQL in an asynchronous manner > using PQsendQuery(), PQconsumeInput() and PQgetResult() and so on > (see do_sql_command() and pgfdw_get_result()). Therefore, if the user > presses Ctrl-C, the remote query is canceled and an ERROR is raised. Yeah, as I replied to Horiguchi-san, postgres_fdw can cancel queries. But postgres_fdw is not ready to cancel connection establishment, is it?
At present, the user needs to set the connect_timeout parameter on the foreign server to a reasonably short time so that it can respond quickly to cancellation requests. Alternatively, we can modify postgres_fdw to use libpq's asynchronous connect functions. Another issue is that the Postgres manual does not stipulate anything about cancellation of FDW processing. That's why I said that the current FDW does not support cancellation in general. Of course, I think we can stipulate the ability to cancel processing in the FDW interface. Regards Takayuki Tsunakawa
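What cancellable connection establishment could look like with libpq's nonblocking API (PQconnectStart()/PQconnectPoll()) can be sketched roughly as below. This is a hedged model only: the libpq calls appear just in comments, and a step counter stands in for the connection state machine so the loop structure can be shown without a real server.

```c
#include <assert.h>
#include <stdbool.h>

typedef enum { CONN_OK, CONN_CANCELLED, CONN_TIMED_OUT } conn_result;

/* Drive the (simulated) connection state machine one step at a time,
 * checking for user cancellation and a deadline between steps, instead
 * of blocking inside a synchronous PQconnectdb() call. */
static conn_result connect_cancellable(int steps_needed,
                                       int deadline_steps,
                                       const bool *cancel_flag)
{
    /* conn = PQconnectStart(conninfo); */
    for (int step = 0; step < deadline_steps; step++)
    {
        if (*cancel_flag)
            return CONN_CANCELLED;  /* would PQfinish(conn) and bail out */

        /* status = PQconnectPoll(conn); then wait on PQsocket(conn)
         * with a short poll() timeout so Ctrl-C is noticed promptly. */
        if (step + 1 >= steps_needed)
            return CONN_OK;         /* PGRES_POLLING_OK */
    }
    return CONN_TIMED_OUT;          /* our own deadline, no connect_timeout
                                       cooperation from the server needed */
}
```

The design point is that the timeout and the cancellation check both live in the caller's loop, so neither depends on the remote side or on connect_timeout being configured.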
On Mon, 12 Oct 2020 at 11:08, tsunakawa.takay@fujitsu.com <tsunakawa.takay@fujitsu.com> wrote: > > From: Masahiko Sawada <masahiko.sawada@2ndquadrant.com> > > I also doubt how useful the per-foreign-server timeout setting you > > mentioned before. For example, suppose the transaction involves > > three foreign servers that have different timeout settings, what if the > > backend fails to commit on the first one of the servers due to > > timeout? Does it attempt to commit on the other two servers? Or does > > it give up and return the control to the client? In the former case, > > what if the backend fails again on one of the other two servers due > > to timeout? The backend might end up waiting for all timeouts and in > > practice the user is not aware of how many servers are involved with > > the transaction, for example in a sharded setup. So it seems to be hard to > > predict the total timeout. In the latter case, the backend might > > succeed in committing on the other two nodes. Also, the timeout setting of > > the first foreign server is virtually used as the whole foreign > > transaction resolution timeout. However, the user cannot control the > > order of resolution. So again it seems to be hard for the user to > > predict the timeout. So if we have a timeout mechanism, I think it's > > better if the user can control the timeout for each transaction. > > Probably the same is true for the retry. > > I agree that the user can control the timeout per transaction, not per FDW. I was just not sure if the Postgres core can define the timeout parameter and the FDWs can follow its setting. However, JTA defines a transaction timeout API (not a commit timeout, though), and each RM can choose to implement them. So I think we can define the parameter and/or routines for the timeout in core likewise. I was thinking of having a GUC timeout parameter like statement_timeout. The backend waits for that setting's value when resolving foreign transactions. But this idea seems different.
The FDW can set its timeout via a transaction timeout API, is that right? But even if the FDW can set the timeout using a transaction timeout API, the problem that client libraries for some DBMSs don't support interruptible functions still remains. The user can set the timeout to a short time, but that also leads to unnecessary timeouts. Thoughts? > > > -------------------------------------------------- > public interface javax.transaction.xa.XAResource > > int getTransactionTimeout() throws XAException > This method returns the transaction timeout value set for this XAResource instance. If > XAResource.setTransactionTimeout was not used prior to invoking this method, the return value is the > default timeout set for the resource manager; otherwise, the value used in the previous setTransactionTimeout call > is returned. > > Throws: XAException > An error has occurred. Possible exception values are: XAER_RMERR, XAER_RMFAIL. > > Returns: > The transaction timeout value in seconds. > > boolean setTransactionTimeout(int seconds) throws XAException > This method sets the transaction timeout value for this XAResource instance. Once set, this timeout value > is effective until setTransactionTimeout is invoked again with a different value. To reset the timeout > value to the default value used by the resource manager, set the value to zero. > > If the timeout operation is performed successfully, the method returns true; otherwise false. If a resource > manager does not support the transaction timeout value being set explicitly, this method returns false. > > Parameters: > > seconds > A positive integer specifying the timeout value in seconds. Zero resets the transaction timeout > value to the default one used by the resource manager. A negative value results in an XAException > being thrown with the XAER_INVAL error code. > > Returns: > true if the transaction timeout value is set successfully; otherwise false. > > Throws: XAException > An error has occurred.
Possible exception values are: XAER_RMERR, XAER_RMFAIL, or > XAER_INVAL. > -------------------------------------------------- > > > > > For example, in postgres_fdw, it executes SQL in an asynchronous manner > > using PQsendQuery(), PQconsumeInput() and PQgetResult() and so on > > (see do_sql_command() and pgfdw_get_result()). Therefore, if the user > > presses Ctrl-C, the remote query is canceled and an ERROR is raised. > > Yeah, as I replied to Horiguchi-san, postgres_fdw can cancel queries. But postgres_fdw is not ready to cancel connection establishment, is it? At present, the user needs to set the connect_timeout parameter on the foreign server to a reasonably short time so that it can respond quickly to cancellation requests. Alternatively, we can modify postgres_fdw to use libpq's asynchronous connect functions. Yes, I think using asynchronous connect functions seems a good idea. > Another issue is that the Postgres manual does not stipulate anything about cancellation of FDW processing. That's why I said that the current FDW does not support cancellation in general. Of course, I think we can stipulate the ability to cancel processing in the FDW interface. Yeah, it's the FDW developer's responsibility to write remote-SQL execution code that is interruptible. +1 for adding that to the doc. Regards, -- Masahiko Sawada http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
RE: Transactions involving multiple postgres foreign servers, take 2
From: Masahiko Sawada <masahiko.sawada@2ndquadrant.com> > I was thinking of having a GUC timeout parameter like statement_timeout. > The backend waits for the setting value when resolving foreign > transactions. Me too. > But this idea seems different. The FDW can set its timeout > via a transaction timeout API, is that right? I'm not perfectly sure about how the TM (application server) works, but probably not. The TM has a configuration parameter for the transaction timeout, and the TM calls XAResource.setTransactionTimeout() with that or a smaller value as the argument. > But even if the FDW can set > the timeout using a transaction timeout API, the problem that client > libraries for some DBMSs don't support interruptible functions still > remains. The user can set the timeout to a short time, but that also > leads to unnecessary timeouts. Thoughts? Unfortunately, I'm afraid we can do nothing about it. If the DBMS's client library doesn't support cancellation (e.g. doesn't respond to Ctrl+C or provide a function that cancels processing in progress), then the Postgres user just finds that he can't cancel queries (just like we experienced with odbc_fdw). Regards Takayuki Tsunakawa
At Fri, 9 Oct 2020 21:45:57 +0900, Masahiko Sawada <masahiko.sawada@2ndquadrant.com> wrote in > On Fri, 9 Oct 2020 at 14:55, Kyotaro Horiguchi <horikyota.ntt@gmail.com> wrote: > > > > At Fri, 9 Oct 2020 02:33:37 +0000, "tsunakawa.takay@fujitsu.com" <tsunakawa.takay@fujitsu.com> wrote in > > > From: Masahiko Sawada <masahiko.sawada@2ndquadrant.com> > > > > What about temporary network failures? I think there are users who > > > > don't want to give up resolving foreign transactions failed due to a > > > > temporary network failure. Or even they might want to wait for > > > > transaction completion until they send a cancel request. If we want to > > > > call the commit routine only once and therefore want FDW to retry > > > > connecting the foreign server within the call, it means we require all > > > > FDW implementors to write retry loop code that is interruptible and > > > > ensures not to raise an error, which increases difficulty. > > > > > > > > Yes, but if we don't retry to resolve foreign transactions at all on > > > > an unreliable network environment, the user might end up requiring > > > > every transaction to check the status of foreign transactions of the > > > > previous distributed transaction before it starts. If we allow > > > > retries, I guess we ease that somewhat. > > > > > > OK. As I said, I'm not against trying to cope with temporary network failure. I just don't think it's mandatory. If the network failure is really temporary and thus recovers soon, then the resolver will be able to commit the transaction soon, too. > > > > I might be missing something, though... > > > > I don't understand why we hate ERRORs from the fdw-2pc-commit routine so > > much. I think remote commits should be performed before local commit > > passes the point-of-no-return, and the v26-0002 actually places > > AtEOXact_FdwXact() before the critical section. > > > > So you're thinking of the following sequence? > > 1. Prepare all foreign transactions. > 2.
Commit all the prepared foreign transactions. > 3. Commit the local transaction. > > Suppose we have the backend process call the commit routine: what if > one of the FDWs raises an ERROR while committing its foreign transaction > after other foreign transactions have already been committed? The transaction will end > up aborted, but some foreign transactions are already committed. OK, I understand what you are aiming at. It is apparently outside the focus of the two-phase commit protocol. Each FDW server can try to keep the contract as far as its ability reaches, but in the end this kind of failure is inevitable. Even if we require FDW developers not to respond until a 2PC commit succeeds, that just leads the whole FDW cluster to freeze, even in cases that are not extremely bad. We have no choice other than shutting the server down (then the subsequent server start removes the garbage commits) or continuing to work while leaving some information in system storage (or reverting the garbage commits). What we can do in that case is provide an automated way to resolve the inconsistency. > Also, what if the backend process fails to commit the local > transaction? Since it has already committed all foreign transactions, it > cannot ensure global atomicity in this case either. Therefore, I > think we should commit the distributed transactions in the following > sequence: Ditto. It's out of the range of 2PC. Using 2PC for the local transaction could reduce that kind of failure, but I'm not sure. 3PC, 4PC, ... nPC could reduce the probability but can't eliminate failure cases. > 1. Prepare all foreign transactions. > 2. Commit the local transaction. > 3. Commit all the prepared foreign transactions. > > But this is still not a perfect solution. If we have the backend 2PC is not a perfect solution in the first place. Attaching a similar phase to it cannot make it "perfect".
> process call the commit routine and an error happens while executing > the commit routine of an FDW (i.e., at step 3), it's too late to report > an error to the client because we already committed the local > transaction. So the current solution is to have a background process > commit the foreign transactions so that the backend can just wait > without the possibility of errors. Whatever process tries to complete a transaction, the client must wait for the transaction to end, and anyway that's just a freeze from the client's view, unless you intend to respond with local commit before all participants complete. I don't think most client applications would wait for a frozen server forever. We have the same issue at the time the client decides to give up the transaction, or the leader session is killed. regards. -- Kyotaro Horiguchi NTT Open Source Software Center
On Tue, 13 Oct 2020 at 10:00, Kyotaro Horiguchi <horikyota.ntt@gmail.com> wrote: > > At Fri, 9 Oct 2020 21:45:57 +0900, Masahiko Sawada <masahiko.sawada@2ndquadrant.com> wrote in > > On Fri, 9 Oct 2020 at 14:55, Kyotaro Horiguchi <horikyota.ntt@gmail.com> wrote: > > > > > > At Fri, 9 Oct 2020 02:33:37 +0000, "tsunakawa.takay@fujitsu.com" <tsunakawa.takay@fujitsu.com> wrote in > > > > From: Masahiko Sawada <masahiko.sawada@2ndquadrant.com> > > > > > What about temporary network failures? I think there are users who > > > > > don't want to give up resolving foreign transactions failed due to a > > > > > temporary network failure. Or even they might want to wait for > > > > > transaction completion until they send a cancel request. If we want to > > > > > call the commit routine only once and therefore want FDW to retry > > > > > connecting the foreign server within the call, it means we require all > > > > > FDW implementors to write retry loop code that is interruptible and > > > > > ensures not to raise an error, which increases difficulty. > > > > > > > > > > Yes, but if we don't retry to resolve foreign transactions at all on > > > > > an unreliable network environment, the user might end up requiring > > > > > every transaction to check the status of foreign transactions of the > > > > > previous distributed transaction before it starts. If we allow > > > > > retries, I guess we ease that somewhat. > > > > > > > > OK. As I said, I'm not against trying to cope with temporary network failure. I just don't think it's mandatory. If the network failure is really temporary and thus recovers soon, then the resolver will be able to commit the transaction soon, too. > > > > > > I might be missing something, though... > > > > > > I don't understand why we hate ERRORs from the fdw-2pc-commit routine so > > > much.
I think remote commits should be performed before local commit > > > passes the point-of-no-return, and the v26-0002 actually places > > > AtEOXact_FdwXact() before the critical section. > > > > > > > So you're thinking of the following sequence? > > > > 1. Prepare all foreign transactions. > > 2. Commit all the prepared foreign transactions. > > 3. Commit the local transaction. > > > > Suppose we have the backend process call the commit routine: what if > > one of the FDWs raises an ERROR while committing its foreign transaction > > after other foreign transactions have already been committed? The transaction will end > > up aborted, but some foreign transactions are already committed. > > OK, I understand what you are aiming at. > > It is apparently outside the focus of the two-phase commit > protocol. Each FDW server can try to keep the contract as far as its > ability reaches, but in the end this kind of failure is > inevitable. Even if we require FDW developers not to respond until a > 2PC commit succeeds, that just leads the whole FDW cluster to freeze, > even in cases that are not extremely bad. > > We have no choice other than shutting the server down (then the > subsequent server start removes the garbage commits) or continuing > to work while leaving some information in system storage (or reverting the > garbage commits). What we can do in that case is provide an > automated way to resolve the inconsistency. > > > Also, what if the backend process fails to commit the local > > transaction? Since it has already committed all foreign transactions, it > > cannot ensure global atomicity in this case either. Therefore, I > > think we should commit the distributed transactions in the following > > sequence: > > Ditto. It's out of the range of 2PC. Using 2PC for the local transaction > could reduce that kind of failure, but I'm not sure. 3PC, 4PC, ... nPC > could reduce the probability but can't eliminate failure cases.
IMO the problems I mentioned arise from the fact that the above sequence doesn't really follow the 2PC protocol in the first place. We can think of committing the local transaction without preparation, while preparing the foreign transactions, as using 2PC with the last resource transaction optimization (or last agent optimization)[1]. That is, we prepare all foreign transactions first, and the local node is always the last resource to process. At this time, the outcome of the distributed transaction completely depends on the fate of the last resource (i.e., the local transaction). If it fails, the distributed transaction must be aborted by rolling back the prepared foreign transactions. OTOH, if it succeeds, all prepared foreign transactions must be committed. Therefore, we don't need to prepare the last resource and can simply commit it. In this way, if we want to commit the local transaction without preparation, the local transaction must be committed last. But since the above sequence doesn't follow this protocol, we will have such problems. I think if we follow 2PC properly, such basic failures don't happen. > > > 1. Prepare all foreign transactions. > > 2. Commit the local transaction. > > 3. Commit all the prepared foreign transactions. > > > > But this is still not a perfect solution. If we have the backend > > 2PC is not a perfect solution in the first place. Attaching a similar > phase to it cannot make it "perfect". > > > process call the commit routine and an error happens while executing > > the commit routine of an FDW (i.e., at step 3), it's too late to report > > an error to the client because we already committed the local > > transaction.
> > Whatever process tries to complete a transaction, the client must wait > for the transaction to end and anyway that's just a freeze in the > client's view, unless you intended to respond to local commit before > all participants complete. Yes, but the point of using a separate process is that even if the FDW code raises an error, the client waiting for transaction resolution doesn't receive it, and the wait is interruptible. [1] https://docs.oracle.com/cd/E13222_01/wls/docs91/jta/llr.html -- Masahiko Sawada http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
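The division of labor argued for above can be sketched as a tiny model. This is only an illustration, not the patch's code: the resolver (a separate background process in the real design) is modeled as a step function that absorbs transient FDW failures by retrying, while the backend merely polls the shared state and never runs the FDW commit routine itself.

```c
#include <assert.h>
#include <stdbool.h>

/* Shared state for one prepared foreign transaction (illustrative). */
typedef struct
{
    int  attempts;        /* how often the resolver has tried so far */
    int  fails_before_ok; /* injected transient failures */
    bool committed;
} fdw_xact;

/* One scheduling quantum of the resolver: try COMMIT PREPARED, and
 * swallow (retry later) anything that would have been an ERROR in
 * the backend.  Errors therefore never reach the waiting client. */
static void resolver_step(fdw_xact *x)
{
    if (x->committed)
        return;
    x->attempts++;
    if (x->attempts > x->fails_before_ok)
        x->committed = true;    /* COMMIT PREPARED succeeded */
}

/* The backend: wait until the resolver reports success.  A real
 * backend would sleep on a latch and check for query cancellation
 * on each wakeup rather than drive the resolver itself. */
static int backend_wait_for_resolution(fdw_xact *x)
{
    while (!x->committed)
        resolver_step(x);
    return x->attempts;
}
```

Because only the resolver touches the FDW, the backend's wait loop contains nothing that can raise, which is exactly the "wait without the possibility of errors" property described earlier in the thread.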
At Tue, 13 Oct 2020 11:56:51 +0900, Masahiko Sawada <masahiko.sawada@2ndquadrant.com> wrote in > On Tue, 13 Oct 2020 at 10:00, Kyotaro Horiguchi <horikyota.ntt@gmail.com> wrote: > > > > At Fri, 9 Oct 2020 21:45:57 +0900, Masahiko Sawada <masahiko.sawada@2ndquadrant.com> wrote in > > > On Fri, 9 Oct 2020 at 14:55, Kyotaro Horiguchi <horikyota.ntt@gmail.com> wrote: > > > > > > > > At Fri, 9 Oct 2020 02:33:37 +0000, "tsunakawa.takay@fujitsu.com" <tsunakawa.takay@fujitsu.com> wrote in > > > > > From: Masahiko Sawada <masahiko.sawada@2ndquadrant.com> > > > > > > What about temporary network failures? I think there are users who > > > > > > don't want to give up resolving foreign transactions failed due to a > > > > > > temporary network failure. Or even they might want to wait for > > > > > > transaction completion until they send a cancel request. If we want to > > > > > > call the commit routine only once and therefore want FDW to retry > > > > > > connecting the foreign server within the call, it means we require all > > > > > > FDW implementors to write retry loop code that is interruptible and > > > > > > ensures not to raise an error, which increases difficulty. > > > > > > > > > > > > Yes, but if we don't retry to resolve foreign transactions at all on > > > > > > an unreliable network environment, the user might end up requiring > > > > > > every transaction to check the status of foreign transactions of the > > > > > > previous distributed transaction before it starts. If we allow > > > > > > retries, I guess we ease that somewhat. > > > > > > > > > > OK. As I said, I'm not against trying to cope with temporary network failure. I just don't think it's mandatory. If the network failure is really temporary and thus recovers soon, then the resolver will be able to commit the transaction soon, too. > > > > > > > > I might be missing something, though... > > > > > > > > I don't understand why we hate ERRORs from the fdw-2pc-commit routine so > > > > much.
I think remote commits should be performed before local commit > > > > passes the point-of-no-return, and the v26-0002 actually places > > > > AtEOXact_FdwXact() before the critical section. > > > > > > > > > > So you're thinking of the following sequence? > > > > > > 1. Prepare all foreign transactions. > > > 2. Commit all the prepared foreign transactions. > > > 3. Commit the local transaction. > > > > > > Suppose we have the backend process call the commit routine: what if > > > one of the FDWs raises an ERROR while committing its foreign transaction > > > after other foreign transactions have already been committed? The transaction will end > > > up aborted, but some foreign transactions are already committed. > > > > Ok, I understand what you are aiming at. > > > > It is apparently outside the focus of the two-phase commit > > protocol. Each FDW server can try to keep the contract as far as its > > ability reaches, but in the end this kind of failure is > > inevitable. Even if we require FDW developers not to respond until a > > 2PC commit succeeds, that just leads the whole FDW cluster to freeze, > > even in cases that are not extremely bad. > > > > We have no choice other than shutting the server down (then the > > subsequent server start removes the garbage commits) or continuing > > to work while leaving some information in system storage (or reverting the > > garbage commits). What we can do in that case is provide an > > automated way to resolve the inconsistency. > > > > > Also, what if the backend process fails to commit the local > > > transaction? Since it has already committed all foreign transactions, it > > > cannot ensure global atomicity in this case either. Therefore, I > > > think we should commit the distributed transactions in the following > > > sequence: > > > > Ditto. It's out of the range of 2PC. Using 2PC for the local transaction > > could reduce that kind of failure, but I'm not sure. 3PC, 4PC, ... nPC > > could reduce the probability but can't eliminate failure cases.
> > IMO the problems I mentioned arise from the fact that the above > sequence doesn't really follow the 2PC protocol in the first place. > > We can think of committing the local transaction without > preparation, while preparing the foreign transactions, as using > 2PC with the last resource transaction optimization (or last agent > optimization)[1]. That is, we prepare all foreign transactions first > and the local node is always the last resource to process. At this > time, the outcome of the distributed transaction completely depends on > the fate of the last resource (i.e., the local transaction). If it > fails, the distributed transaction must be aborted by rolling back the > prepared foreign transactions. OTOH, if it succeeds, all prepared > foreign transactions must be committed. Therefore, we don't need to > prepare the last resource and can commit it. In this way, if we want There are cases of local-transaction commit failure caused by too many notifications or by serialization failure. > to commit the local transaction without preparation, the local > transaction must be committed last. But since the above sequence > doesn't follow this protocol, we will have such problems. I think if > we follow 2PC properly, such basic failures don't happen. True. But I haven't suggested that sequence. > > > 1. Prepare all foreign transactions. > > > 2. Commit the local transaction. > > > 3. Commit all the prepared foreign transactions. > > > > > > But this is still not a perfect solution. If we have the backend > > > > 2PC is not a perfect solution in the first place. Attaching a similar > > phase to it cannot make it "perfect". > > > > > process call the commit routine and an error happens while executing > > > the commit routine of an FDW (i.e., at step 3), it's too late to report > > > an error to the client because we already committed the local > > > transaction.
So the current solution is to have a background process > > > commit the foreign transactions so that the backend can just wait > > > without the possibility of errors. > > > > Whatever process tries to complete a transaction, the client must wait > > for the transaction to end and anyway that's just a freeze in the > > client's view, unless you intended to respond to local commit before > > all participants complete. > > Yes, but the point of using a separate process is that even if the FDW > code raises an error, the client waiting for transaction resolution > doesn't receive it, and the wait is interruptible. > > [1] https://docs.oracle.com/cd/E13222_01/wls/docs91/jta/llr.html I don't get the point. If FDW-commit is called in the same process, an error from FDW-commit outright leads to the failure of the current commit. Isn't "the client waiting for transaction resolution" the client of the leader process of the 2PC commit in the same-process model? I might be missing something, but postgres_fdw allows query cancellation at commit time. (But I think it depends on timing whether the remote commit is completed or aborted.) Perhaps the feature was introduced after the project started? > commit ae9bfc5d65123aaa0d1cca9988037489760bdeae > Author: Robert Haas <rhaas@postgresql.org> > Date: Wed Jun 7 15:14:55 2017 -0400 > > postgres_fdw: Allow cancellation of transaction control commands. I thought we were discussing FDW errors during the 2PC commit phase. regards. -- Kyotaro Horiguchi NTT Open Source Software Center
On Wed, 14 Oct 2020 at 10:16, Kyotaro Horiguchi <horikyota.ntt@gmail.com> wrote: > > At Tue, 13 Oct 2020 11:56:51 +0900, Masahiko Sawada <masahiko.sawada@2ndquadrant.com> wrote in > > On Tue, 13 Oct 2020 at 10:00, Kyotaro Horiguchi <horikyota.ntt@gmail.com> wrote: > > > > > > At Fri, 9 Oct 2020 21:45:57 +0900, Masahiko Sawada <masahiko.sawada@2ndquadrant.com> wrote in > > > > On Fri, 9 Oct 2020 at 14:55, Kyotaro Horiguchi <horikyota.ntt@gmail.com> wrote: > > > > > > > > > > At Fri, 9 Oct 2020 02:33:37 +0000, "tsunakawa.takay@fujitsu.com" <tsunakawa.takay@fujitsu.com> wrote in > > > > > > From: Masahiko Sawada <masahiko.sawada@2ndquadrant.com> > > > > > > > What about temporary network failures? I think there are users who > > > > > > > don't want to give up resolving foreign transactions failed due to a > > > > > > > temporary network failure. Or even they might want to wait for > > > > > > > transaction completion until they send a cancel request. If we want to > > > > > > > call the commit routine only once and therefore want FDW to retry > > > > > > > connecting the foreign server within the call, it means we require all > > > > > > > FDW implementors to write a retry loop code that is interruptible and > > > > > > > ensures not to raise an error, which increases difficulty. > > > > > > > > > > > > > > Yes, but if we don’t retry to resolve foreign transactions at all on > > > > > > > an unreliable network environment, the user might end up requiring > > > > > > > every transaction to check the status of foreign transactions of the > > > > > > > previous distributed transaction before starts. If we allow to do > > > > > > > retry, I guess we ease that somewhat. > > > > > > > > > > > > OK. As I said, I'm not against trying to cope with temporary network failure. I just don't think it's mandatory. If the network failure is really temporary and thus recovers soon, then the resolver will be able to commit the transaction soon, too.
> > > > > > > > > > I should missing something, though... > > > > > > > > > > I don't understand why we hate ERRORs from fdw-2pc-commit routine so > > > > > much. I think remote-commits should be performed before local commit > > > > > passes the point-of-no-return and the v26-0002 actually places > > > > > AtEOXact_FdwXact() before the critical section. > > > > > > > > > > > > > So you're thinking the following sequence? > > > > > > > > 1. Prepare all foreign transactions. > > > > 2. Commit the all prepared foreign transactions. > > > > 3. Commit the local transaction. > > > > > > > > Suppose we have the backend process call the commit routine, what if > > > > one of FDW raises an ERROR during committing the foreign transaction > > > > after committing other foreign transactions? The transaction will end > > > > up with an abort but some foreign transactions are already committed. > > > > > > Ok, I understand what you are aiming. > > > > > > It is apparently out of the focus of the two-phase commit > > > protocol. Each FDW server can try to keep the contract as far as its > > > ability reaches, but in the end such kind of failure is > > > inevitable. Even if we require FDW developers not to respond until a > > > 2pc-commit succeeds, that just leads the whole FDW-cluster to freeze > > > even not in an extremely bad case. > > > > > > We have no other choices than shutting the server down (then the > > > succeeding server start removes the garbage commits) or continueing > > > working leaving some information in a system storage (or reverting the > > > garbage commits). What we can do in that case is to provide a > > > automated way to resolve the inconsistency. > > > > > > > Also, what if the backend process failed to commit the local > > > > transaction? Since it already committed all foreign transactions it > > > > cannot ensure the global atomicity in this case too. 
Therefore, I > > > > think we should commit the distributed transactions in the following > > > > sequence: > > > > > > Ditto. It's out of the range of 2pc. Using p2c for local transaction > > > could reduce that kind of failure but I'm not sure. 3pc, 4pc ...npc > > > could reduce the probability but can't elimite failure cases. > > > > IMO the problems I mentioned arise from the fact that the above > > sequence doesn't really follow the 2pc protocol in the first place. > > > > We can think of the fact that we commit the local transaction without > > preparation while preparing foreign transactions as that we’re using > > the 2pc with last resource transaction optimization (or last agent > > optimization)[1]. That is, we prepare all foreign transactions first > > and the local node is always the last resource to process. At this > > time, the outcome of the distributed transaction completely depends on > > the fate of the last resource (i.g., the local transaction). If it > > fails, the distributed transaction must be abort by rolling back > > prepared foreign transactions. OTOH, if it succeeds, all prepared > > foreign transaction must be committed. Therefore, we don’t need to > > prepare the last resource and can commit it. In this way, if we want > > There are cases of commit-failure of a local transaction caused by > too-many notifications or by serialization failure. Yes, even if that happens we are still able to rollback all foreign transactions. > > > to commit the local transaction without preparation, the local > > transaction must be committed at last. But since the above sequence > > doesn’t follow this protocol, we will have such problems. I think if > > we follow the 2pc properly, such basic failures don't happen. > > True. But I haven't suggested that sequence. Okay, I might have missed your point. Could you elaborate on the idea you mentioned before, "I think remote-commits should be performed before local commit passes the point-of-no-return"? 
> > > > > 1. Prepare all foreign transactions. > > > > 2. Commit the local transaction. > > > > 3. Commit the all prepared foreign transactions. > > > > > > > > But this is still not a perfect solution. If we have the backend > > > > > > 2pc is not a perfect solution in the first place. Attaching a similar > > > phase to it cannot make it "perfect". > > > > > > > process call the commit routine and an error happens during executing > > > > the commit routine of an FDW (i.g., at step 3) it's too late to report > > > > an error to the client because we already committed the local > > > > transaction. So the current solution is to have a background process > > > > commit the foreign transactions so that the backend can just wait > > > > without the possibility of errors. > > > > > > Whatever process tries to complete a transaction, the client must wait > > > for the transaction to end and anyway that's just a freeze in the > > > client's view, unless you intended to respond to local commit before > > > all participant complete. > > > > Yes, but the point of using a separate process is that even if FDW > > code raises an error, the client wanting for transaction resolution > > doesn't get it and it's interruptible. > > > > [1] https://docs.oracle.com/cd/E13222_01/wls/docs91/jta/llr.html > > I don't get the point. If FDW-commit is called on the same process, an > error from FDW-commit outright leads to the failure of the current > commit. Isn't "the client wanting for transaction resolution" the > client of the leader process of the 2pc-commit in the same-process > model? > > I should missing something, but postgres_fdw allows query cancelation > at commit time. (But I think it is depends on timing whether the > remote commit is completed or aborted.). Perhaps the feature was > introduced after the project started? 
> > commit ae9bfc5d65123aaa0d1cca9988037489760bdeae > > Author: Robert Haas <rhaas@postgresql.org> > > Date: Wed Jun 7 15:14:55 2017 -0400 > > > > postgres_fdw: Allow cancellation of transaction control commands. > > I thought that we are discussing on fdw-errors during the 2pc-commit > phase. > Yes, I'm also discussing fdw-errors during the 2pc-commit phase that happens after committing the local transaction. Even if FDW-commit raises an error due to the user's cancel request or whatever reason during committing the prepared foreign transactions, it's too late. The client will get an error like "ERROR: canceling statement due to user request" and would think the transaction is aborted, but that's not true: the local transaction is already committed. Regards, -- Masahiko Sawada http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
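The ordering questioned in this exchange (prepare all foreign transactions, commit them all, then commit locally) can be exercised in a toy run. This is hypothetical Python, not patch code; `Participant` and `commit_foreign_first` are invented names. It shows the failure mode Sawada-san describes: once one participant's COMMIT PREPARED has succeeded, an error from another FDW leaves the cluster in mixed states that cannot be rolled back:

```python
# Toy run of "prepare all -> commit all prepared foreign txns -> commit
# local".  If one FDW errors in the middle of the commit-prepared loop,
# the earlier participants are already committed and cannot be rolled
# back, so the participants end up in inconsistent states.

class Participant:
    def __init__(self, name, fail_on_commit=False):
        self.name = name
        self.fail_on_commit = fail_on_commit
        self.state = "active"

    def prepare(self):
        self.state = "prepared"

    def commit_prepared(self):
        if self.fail_on_commit:
            raise ConnectionError(f"{self.name}: lost connection")
        self.state = "committed"

def commit_foreign_first(participants):
    for p in participants:
        p.prepare()
    try:
        for p in participants:
            p.commit_prepared()
    except ConnectionError:
        pass  # too late: fs1 is committed, the rest are only prepared
    return [p.state for p in participants]

ps = [Participant("fs1"),
      Participant("fs2", fail_on_commit=True),
      Participant("fs3")]
print(commit_foreign_first(ps))   # -> ['committed', 'prepared', 'prepared']
```

As Horiguchi-san notes, 2PC itself cannot repair this; the leftover prepared transactions have to be resolved by some out-of-band mechanism.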
At Wed, 14 Oct 2020 12:09:34 +0900, Masahiko Sawada <masahiko.sawada@2ndquadrant.com> wrote in > On Wed, 14 Oct 2020 at 10:16, Kyotaro Horiguchi <horikyota.ntt@gmail.com> wrote: > > There are cases of commit-failure of a local transaction caused by > > too-many notifications or by serialization failure. > > Yes, even if that happens we are still able to rollback all foreign > transactions. Mmm. I'm confused. If this is about the 2pc-commit-request (or prepare) phase, we can roll back the remote transactions. But I think we're focusing on the 2pc-commit phase; remote transactions that have already been 2pc-committed can no longer be rolled back. > > > to commit the local transaction without preparation, the local > > > transaction must be committed at last. But since the above sequence > > > doesn’t follow this protocol, we will have such problems. I think if > > > we follow the 2pc properly, such basic failures don't happen. > > > > True. But I haven't suggested that sequence. > > Okay, I might have missed your point. Could you elaborate on the idea > you mentioned before, "I think remote-commits should be performed > before local commit passes the point-of-no-return"? It is simply the condition that we can ERROR-out from CommitTransaction. I thought that when you said "we cannot ERROR-out" you meant "since that is raised to FATAL", but it seems to me that both of you are looking at another aspect. If the aspect is "how to complete the all-prepared 2pc transaction at all costs", I'd say "there's a fundamental limitation". Although I'm not sure what you mean exactly by prohibiting errors from fdw routines, if that meant "the API can fail, but must not raise an exception", that policy is enforced by setting a critical section. However, if it were "the API mustn't fail", that cannot be realized, I believe. > > I thought that we are discussing on fdw-errors during the 2pc-commit > phase.
> > > > Yes, I'm also discussing on fdw-errors during the 2pc-commit phase > that happens after committing the local transaction. > > Even if FDW-commit raises an error due to the user's cancel request or > whatever reason during committing the prepared foreign transactions, > it's too late. The client will get an error like "ERROR: canceling > statement due to user request" and would think the transaction is > aborted but it's not true, the local transaction is already committed. By the way, I found that I misread the patch: in v26-0002, AtEOXact_FdwXact() is actually called after the point-of-no-return. What is the reason for that placement? We can error-out before changing the state to TRANS_COMMIT. And if any of the remotes ended with a 2pc-commit (not prepare phase) failure, consistency of the commit is no longer guaranteed, so we have no choice other than shutting down the server or continuing to run while allowing the inconsistency. What do we want in that case? regards. -- Kyotaro Horiguchi NTT Open Source Software Center
On Wed, 14 Oct 2020 at 13:19, Kyotaro Horiguchi <horikyota.ntt@gmail.com> wrote: > > At Wed, 14 Oct 2020 12:09:34 +0900, Masahiko Sawada <masahiko.sawada@2ndquadrant.com> wrote in > > On Wed, 14 Oct 2020 at 10:16, Kyotaro Horiguchi <horikyota.ntt@gmail.com> wrot> > There are cases of commit-failure ofa local transaction caused by > > > too-many notifications or by serialization failure. > > > > Yes, even if that happens we are still able to rollback all foreign > > transactions. > > Mmm. I'm confused. If this is about 2pc-commit-request(or prepare) > phase, we can rollback the remote transactions. But I think we're > focusing 2pc-commit phase. remote transaction that has already > 2pc-committed, they can be no longer rollback'ed. Did you mention a failure of local commit, right? With the current approach, we prepare all foreign transactions first and then commit the local transaction. After committing the local transaction we commit the prepared foreign transactions. So suppose a serialization failure happens during committing the local transaction, we still are able to roll back foreign transactions. The check of serialization failure of the foreign transactions has already been done at the prepare phase. > > > > > to commit the local transaction without preparation, the local > > > > transaction must be committed at last. But since the above sequence > > > > doesn’t follow this protocol, we will have such problems. I think if > > > > we follow the 2pc properly, such basic failures don't happen. > > > > > > True. But I haven't suggested that sequence. > > > > Okay, I might have missed your point. Could you elaborate on the idea > > you mentioned before, "I think remote-commits should be performed > > before local commit passes the point-of-no-return"? > > It is simply the condition that we can ERROR-out from > CommitTransaction. 
I thought that when you say like "we cannot > ERROR-out" you meant "since that is raised to FATAL", but it seems to > me that both of you are looking another aspect. > > If the aspect is "what to do complete the all-prepared p2c transaction > at all costs", I'd say "there's a fundamental limitaion". Although > I'm not sure what you mean exactly by prohibiting errors from fdw > routines , if that meant "the API can fail, but must not raise an > exception", that policy is enforced by setting a critical > section. However, if it were "the API mustn't fail", that cannot be > realized, I believe. When I say "we cannot error-out" it means it's too late. What I'd like to prevent is that the backend process returns an error to the client after committing the local transaction. Because it will mislead the user. > > > > I thought that we are discussing on fdw-errors during the 2pc-commit > > > phase. > > > > > > > Yes, I'm also discussing on fdw-errors during the 2pc-commit phase > > that happens after committing the local transaction. > > > > Even if FDW-commit raises an error due to the user's cancel request or > > whatever reason during committing the prepared foreign transactions, > > it's too late. The client will get an error like "ERROR: canceling > > statement due to user request" and would think the transaction is > > aborted but it's not true, the local transaction is already committed. > > By the way I found that I misread the patch. in v26-0002, > AtEOXact_FdwXact() is actually called after the > point-of-no-return. What is the reason for the place? We can > error-out before changing the state to TRANS_COMMIT. > Are you referring to v26-0002-Introduce-transaction-manager-for-foreign-transa.patch? If so, the patch doesn't implement 2pc. I think we can commit the foreign transaction before changing the state to TRANS_COMMIT but in any case it cannot ensure atomic commit. 
It just adds both commit and rollback transaction APIs so that FDW can control transactions by using these APIs, not by XactCallback. > And if any of the remotes ended with 2pc-commit (not prepare phase) > failure, consistency of the commit is no longer guaranteed so we have > no choice other than shutting down the server, or continuing running > allowing the incosistency. What do we want in that case? I think it depends on the failure. If 2pc-commit failed due to a network connection failure or a server crash, we would need to try again later. We normally expect the prepared transaction to be committed with no issue, but in case it could not be, I think we can leave the choice to the user: resolve it manually after recovery, give up, etc. Regards, -- Masahiko Sawada http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
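The retry-later behavior described here, together with the earlier separate-process idea, can be modeled roughly as follows. This is a control-flow sketch only (invented names; the real patch uses a background worker, not a Python thread): the backend queues prepared foreign transactions for a resolver, so an FDW failure becomes a retry inside the resolver rather than an ERROR reported on the backend's already-committed transaction:

```python
# Rough model of handing 2pc-commit off to a resolver process: the
# resolver consumes commit jobs from a queue; a failure is recorded for
# retry instead of being raised to the committing backend.
import queue
import threading

def resolver(work, done):
    while True:
        xact = work.get()
        if xact is None:          # shutdown sentinel
            break
        try:
            xact()                # COMMIT PREPARED on the remote
            done.put("resolved")
        except Exception as e:    # logged and retried later, never raised
            done.put(f"retry-later: {e}")

work, done = queue.Queue(), queue.Queue()
t = threading.Thread(target=resolver, args=(work, done))
t.start()

def unreachable_remote():
    raise IOError("remote down")

work.put(lambda: None)            # a commit that succeeds
work.put(unreachable_remote)      # a commit that fails
print(done.get())                 # -> resolved
print(done.get())                 # -> retry-later: remote down
work.put(None)
t.join()
```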
(v26 fails on the current master) At Wed, 14 Oct 2020 13:52:49 +0900, Masahiko Sawada <masahiko.sawada@2ndquadrant.com> wrote in > On Wed, 14 Oct 2020 at 13:19, Kyotaro Horiguchi <horikyota.ntt@gmail.com> wrote: > > > > At Wed, 14 Oct 2020 12:09:34 +0900, Masahiko Sawada <masahiko.sawada@2ndquadrant.com> wrote in > > > On Wed, 14 Oct 2020 at 10:16, Kyotaro Horiguchi <horikyota.ntt@gmail.com> wrote: > > > > There are cases of commit-failure of a local transaction caused by > > > > too-many notifications or by serialization failure. > > > > > > Yes, even if that happens we are still able to rollback all foreign > > > transactions. > > > > Mmm. I'm confused. If this is about 2pc-commit-request(or prepare) > > phase, we can rollback the remote transactions. But I think we're > > focusing 2pc-commit phase. remote transaction that has already > > 2pc-committed, they can be no longer rollback'ed. > > Did you mention a failure of local commit, right? With the current > approach, we prepare all foreign transactions first and then commit > the local transaction. After committing the local transaction we > commit the prepared foreign transactions. So suppose a serialization > failure happens during committing the local transaction, we still are > able to roll back foreign transactions. The check of serialization > failure of the foreign transactions has already been done at the > prepare phase. Understood. > > > > > to commit the local transaction without preparation, the local > > > > > transaction must be committed at last. But since the above sequence > > > > > doesn’t follow this protocol, we will have such problems. I think if > > > > > we follow the 2pc properly, such basic failures don't happen. > > > > > > > > True. But I haven't suggested that sequence. > > > > > > Okay, I might have missed your point. Could you elaborate on the idea > > > you mentioned before, "I think remote-commits should be performed > > > before local commit passes the point-of-no-return"?
> > > > It is simply the condition that we can ERROR-out from > > CommitTransaction. I thought that when you say like "we cannot > > ERROR-out" you meant "since that is raised to FATAL", but it seems to > > me that both of you are looking another aspect. > > > > If the aspect is "what to do complete the all-prepared p2c transaction > > at all costs", I'd say "there's a fundamental limitaion". Although > > I'm not sure what you mean exactly by prohibiting errors from fdw > > routines , if that meant "the API can fail, but must not raise an > > exception", that policy is enforced by setting a critical > > section. However, if it were "the API mustn't fail", that cannot be > > realized, I believe. > > When I say "we cannot error-out" it means it's too late. What I'd like > to prevent is that the backend process returns an error to the client > after committing the local transaction. Because it will mislead the > user. Anyway we don't do anything that can fail after changing state to TRANS_COMMIT. So we cannot run fdw-2pc-commit after that since it cannot be failure-proof. if we do them before the point we cannot ERROR-out after local commit completes. > > > > I thought that we are discussing on fdw-errors during the 2pc-commit > > > > phase. > > > > > > > > > > Yes, I'm also discussing on fdw-errors during the 2pc-commit phase > > > that happens after committing the local transaction. > > > > > > Even if FDW-commit raises an error due to the user's cancel request or > > > whatever reason during committing the prepared foreign transactions, > > > it's too late. The client will get an error like "ERROR: canceling > > > statement due to user request" and would think the transaction is > > > aborted but it's not true, the local transaction is already committed. > > > > By the way I found that I misread the patch. in v26-0002, > > AtEOXact_FdwXact() is actually called after the > > point-of-no-return. What is the reason for the place? 
We can > > error-out before changing the state to TRANS_COMMIT. > > > > Are you referring to > v26-0002-Introduce-transaction-manager-for-foreign-transa.patch? If > so, the patch doesn't implement 2pc. I think we can commit the foreign Ah, I guessed that the trigger points of PREPARE and COMMIT that are inserted by 0002 won't be moved by the following patches, so the direction of my discussion isn't changed by that fact. > transaction before changing the state to TRANS_COMMIT but in any case > it cannot ensure atomic commit. It just adds both commit and rollback I guess that you have the local-commit-failure case in mind? Couldn't we internally prepare the local transaction and then follow the correct 2pc protocol involving the local transaction? (I'm looking at v26-0008) > transaction APIs so that FDW can control transactions by using these > API, not by XactCallback. > > And if any of the remotes ended with 2pc-commit (not prepare phase) > > failure, consistency of the commit is no longer guaranteed so we have > > no choice other than shutting down the server, or continuing running > > allowing the incosistency. What do we want in that case? > > I think it depends on the failure. If 2pc-commit failed due to network > connection failure or the server crash, we would need to try again > later. We normally expect the prepared transaction is able to be > committed with no issue but in case it could not, I think we can leave > the choice for the user: resolve it manually after recovered, give up > etc. Understood. regards. -- Kyotaro Horiguchi NTT Open Source Software Center
On Wed, 14 Oct 2020 at 17:11, Kyotaro Horiguchi <horikyota.ntt@gmail.com> wrote: > > (v26 fails on the current master) Thanks, I'll update the patch. > > At Wed, 14 Oct 2020 13:52:49 +0900, Masahiko Sawada <masahiko.sawada@2ndquadrant.com> wrote in > > On Wed, 14 Oct 2020 at 13:19, Kyotaro Horiguchi <horikyota.ntt@gmail.com> wrote: > > > > > > At Wed, 14 Oct 2020 12:09:34 +0900, Masahiko Sawada <masahiko.sawada@2ndquadrant.com> wrote in > > > > On Wed, 14 Oct 2020 at 10:16, Kyotaro Horiguchi <horikyota.ntt@gmail.com> wrote: > > > > > There are cases of commit-failure of a local transaction caused by > > > > > too-many notifications or by serialization failure. > > > > > > > > Yes, even if that happens we are still able to rollback all foreign > > > > transactions. > > > > > > Mmm. I'm confused. If this is about 2pc-commit-request(or prepare) > > > phase, we can rollback the remote transactions. But I think we're > > > focusing 2pc-commit phase. remote transaction that has already > > > 2pc-committed, they can be no longer rollback'ed. > > > > Did you mention a failure of local commit, right? With the current > > approach, we prepare all foreign transactions first and then commit > > the local transaction. After committing the local transaction we > > commit the prepared foreign transactions. So suppose a serialization > > failure happens during committing the local transaction, we still are > > able to roll back foreign transactions. The check of serialization > > failure of the foreign transactions has already been done at the > > prepare phase. > > Understood. > > > > > > > to commit the local transaction without preparation, the local > > > > > > transaction must be committed at last. But since the above sequence > > > > > > doesn’t follow this protocol, we will have such problems. I think if > > > > > > we follow the 2pc properly, such basic failures don't happen. > > > > > > > > > > True. But I haven't suggested that sequence.
> > > > > > > > Okay, I might have missed your point. Could you elaborate on the idea > > > > you mentioned before, "I think remote-commits should be performed > > > > before local commit passes the point-of-no-return"? > > > > > > It is simply the condition that we can ERROR-out from > > > CommitTransaction. I thought that when you say like "we cannot > > > ERROR-out" you meant "since that is raised to FATAL", but it seems to > > > me that both of you are looking another aspect. > > > > > > If the aspect is "what to do complete the all-prepared p2c transaction > > > at all costs", I'd say "there's a fundamental limitaion". Although > > > I'm not sure what you mean exactly by prohibiting errors from fdw > > > routines , if that meant "the API can fail, but must not raise an > > > exception", that policy is enforced by setting a critical > > > section. However, if it were "the API mustn't fail", that cannot be > > > realized, I believe. > > > > When I say "we cannot error-out" it means it's too late. What I'd like > > to prevent is that the backend process returns an error to the client > > after committing the local transaction. Because it will mislead the > > user. > > Anyway we don't do anything that can fail after changing state to > TRANS_COMMIT. So we cannot run fdw-2pc-commit after that since it > cannot be failure-proof. if we do them before the point we cannot > ERROR-out after local commit completes. > > > > > > I thought that we are discussing on fdw-errors during the 2pc-commit > > > > > phase. > > > > > > > > > > > > > Yes, I'm also discussing on fdw-errors during the 2pc-commit phase > > > > that happens after committing the local transaction. > > > > > > > > Even if FDW-commit raises an error due to the user's cancel request or > > > > whatever reason during committing the prepared foreign transactions, > > > > it's too late. 
The client will get an error like "ERROR: canceling > > > > statement due to user request" and would think the transaction is > > > > aborted but it's not true, the local transaction is already committed. > > > > > > By the way I found that I misread the patch. in v26-0002, > > > AtEOXact_FdwXact() is actually called after the > > > point-of-no-return. What is the reason for the place? We can > > > error-out before changing the state to TRANS_COMMIT. > > > > > > > Are you referring to > > v26-0002-Introduce-transaction-manager-for-foreign-transa.patch? If > > so, the patch doesn't implement 2pc. I think we can commit the foreign > > Ah, I guessed that the trigger points of PREPARE and COMMIT that are > inserted by 0002 won't be moved by the following patches. So the > direction of my discussion doesn't change by the fact. > > > transaction before changing the state to TRANS_COMMIT but in any case > > it cannot ensure atomic commit. It just adds both commit and rollback > > I guess that you have the local-commit-failure case in mind? Couldn't > we internally prepare the local transaction then following the correct > p2c protocol involving the local transaction? (I'm looking v26-0008) Yes, we could. But as I mentioned before if we always commit the local transaction last, we don't necessarily need to prepare the local transaction. If we prepared the local transaction, I think we would be able to allow FDW's commit routine to raise an error even during 2pc-commit, but only for the first time. Once we committed any one of the involved transactions including the local transaction and foreign transactions, the commit routine must not raise an error during 2pc-commit for the same reason; it's too late. Regards, -- Masahiko Sawada http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
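Sawada-san's last point above (an error during 2pc-commit can be reported as an abort only until the first participant, local or foreign, has been committed) can be sketched as a toy decision rule. This is illustrative Python with invented names, not patch code:

```python
# Toy sketch: with every participant (including the local transaction)
# prepared, a commit-phase error can be turned into a clean abort only
# while nothing has been committed yet; after the first COMMIT PREPARED
# succeeds, the transaction is in doubt and must be resolved later.

class P:
    def __init__(self, name, fail=False):
        self.name, self.fail, self.state = name, fail, "active"

    def prepare(self):
        self.state = "prepared"

    def commit(self):
        if self.fail:
            raise IOError(self.name + " unreachable")
        self.state = "committed"

    def rollback(self):
        self.state = "aborted"

def commit_all_prepared(parts):
    for p in parts:
        p.prepare()
    committed = 0
    for p in parts:
        try:
            p.commit()
            committed += 1
        except IOError:
            if committed == 0:          # nothing committed yet: safe abort
                for q in parts:
                    q.rollback()
                return "aborted"
            return "in-doubt"           # too late to report an abort
    return "committed"

print(commit_all_prepared([P("local", fail=True), P("fs1")]))  # -> aborted
print(commit_all_prepared([P("local"), P("fs1", fail=True)]))  # -> in-doubt
```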
On Mon, 12 Oct 2020 at 17:19, tsunakawa.takay@fujitsu.com <tsunakawa.takay@fujitsu.com> wrote: > > From: Masahiko Sawada <masahiko.sawada@2ndquadrant.com> > > I was thinking to have a GUC timeout parameter like statement_timeout. > > The backend waits for the setting value when resolving foreign > > transactions. > > Me too. > > > > But this idea seems different. FDW can set its timeout > > via a transaction timeout API, is that right? > > I'm not perfectly sure about how the TM (application server) works, but probably no. The TM has a configuration parameter for transaction timeout, and the TM calls XAResource.setTransactionTimeout() with that or a smaller value for the argument. > > > > But even if FDW can set > > the timeout using a transaction timeout API, the problem that client > > libraries for some DBMS don't support interruptible functions still > > remains. The user can set a short time to the timeout but it also > > leads to unnecessary timeouts. Thoughts? > > Unfortunately, I'm afraid we can do nothing about it. If the DBMS's client library doesn't support cancellation (e.g. doesn't respond to Ctrl+C or provide a function that cancels in-progress processing), then the Postgres user just finds that he can't cancel queries (just like we experienced with odbc_fdw.) So the idea of using another process to commit prepared foreign transactions seems better also in terms of this point. Even if a DBMS client library doesn't support query cancellation, the transaction commit can return control to the client when the user presses Ctrl-C, as the backend process is just sleeping in WaitLatch() (it's similar to synchronous replication). Regards, -- Masahiko Sawada http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
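The "backend just sleeps, so cancellation works" argument can be illustrated with a small toy (plain Python; `wait_for_resolution` and the events are invented stand-ins for the backend's latch wait and interrupt check, not PostgreSQL code):

```python
# Toy version of an interruptible wait for foreign transaction
# resolution: the backend waits on an event with a short timeout
# (cf. WaitLatch) and checks for a cancel request between wakeups
# (cf. CHECK_FOR_INTERRUPTS), so Ctrl-C returns control to the client
# even though the remote client library itself is not interruptible.
import threading

def wait_for_resolution(resolved, cancel, poll_s=0.01):
    """Wait until the resolver signals completion or the user cancels."""
    while True:
        if resolved.wait(poll_s):     # resolver finished the remote commit
            return "resolved"
        if cancel.is_set():           # user pressed Ctrl-C
            return "cancelled-wait"   # the resolver keeps working

resolved, cancel = threading.Event(), threading.Event()
cancel.set()                          # simulate Ctrl-C from the client
print(wait_for_resolution(resolved, cancel))   # -> cancelled-wait
```

Note that, as discussed upthread, cancelling the wait only abandons the wait; the foreign commit itself continues in the resolver, so the client must not interpret the cancellation as an abort.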
RE: Transactions involving multiple postgres foreign servers, take 2
From: Masahiko Sawada <masahiko.sawada@2ndquadrant.com> > > Unfortunately, I'm afraid we can do nothing about it. If the DBMS's client > library doesn't support cancellation (e.g. doesn't respond to Ctrl+C or provide a > function that cancel processing in pgorogss), then the Postgres user just finds > that he can't cancel queries (just like we experienced with odbc_fdw.) > > So the idea of using another process to commit prepared foreign > transactions seems better also in terms of this point. Even if a DBMS > client library doesn’t support query cancellation, the transaction > commit can return the control to the client when the user press ctl-c > as the backend process is just sleeping using WaitLatch() (it’s > similar to synchronous replication) I have to say that's nitpicking. I believe almost nobody does, or cares about, canceling commits, at the expense of impractical performance due to non-parallelism, serial execution in each resolver, and context switches. Also, FDW is not cancellable in general. It makes no sense to care only about commit. (Fortunately, postgres_fdw is cancellable in any way.) Regards Takayuki Tsunakawa
On Mon, 19 Oct 2020 at 14:39, tsunakawa.takay@fujitsu.com <tsunakawa.takay@fujitsu.com> wrote: > > From: Masahiko Sawada <masahiko.sawada@2ndquadrant.com> > > > Unfortunately, I'm afraid we can do nothing about it. If the DBMS's client > > library doesn't support cancellation (e.g. doesn't respond to Ctrl+C or provide a > > function that cancel processing in pgorogss), then the Postgres user just finds > > that he can't cancel queries (just like we experienced with odbc_fdw.) > > > > So the idea of using another process to commit prepared foreign > > transactions seems better also in terms of this point. Even if a DBMS > > client library doesn’t support query cancellation, the transaction > > commit can return the control to the client when the user press ctl-c > > as the backend process is just sleeping using WaitLatch() (it’s > > similar to synchronous replication) > > I have to say that's nitpicking. I believe almost nobody does, or cares about, canceling commits, Really? I don't think so. I think it's terrible that the query gets stuck for a long time and we cannot do anything other than wait until a crashed foreign server is restored. We can have a timeout, but I don't think every user wants to use the timeout, or the user might want to set the timeout to a relatively large value out of concern about misdetection. I guess synchronous replication had similar concerns, so it has a similar mechanism. > at the expense of impractical performance due to non-parallelism, serial execution in each resolver, and context switches. I have never said that we're going to live with serial execution in each resolver and non-parallelism. I've been repeatedly saying that it would be possible to improve this feature over the releases to get good performance even if we use a separate background process. Using a background process to commit is the only option to support interruptible foreign transaction resolution for now, whereas there are some ideas for performance improvements.
I think we don't have enough discussion on how we can improve the idea of using a separate process, how much performance will improve, and how feasible it is. It's not too late to reject that idea after the discussion. Regards, -- Masahiko Sawada http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
RE: Transactions involving multiple postgres foreign servers, take 2
From: Masahiko Sawada <masahiko.sawada@2ndquadrant.com> > On Mon, 19 Oct 2020 at 14:39, tsunakawa.takay@fujitsu.com > <tsunakawa.takay@fujitsu.com> wrote: > > I have to say that's nitpicking. I believe almost nobody does, or cares about, > canceling commits, > > Really? I don't think so. I think it's terrible that the query gets > stuck for a long time and we cannot do anything other than wait until a > crashed foreign server is restored. We can have a timeout but I don't > think every user wants to use the timeout, or the user might want to > set the timeout to a relatively large value out of concern about > misdetection. I guess synchronous replication had similar concerns so > it has a similar mechanism. Really. I thought we were talking about canceling commits with Ctrl + C as you referred to, right? I couldn't imagine, in production environments where many sessions are running transactions concurrently, how the user (DBA) would want to or could cancel each stuck session during commit one by one with Ctrl + C by hand. I haven't seen such a feature exist or be considered crucial that enables the user (administrator) to cancel running processing with Ctrl + C from the side. Rather, setting appropriate timeouts is the current sound system design, isn't it? It spans many areas - TCP/IP, heartbeats of load balancers and clustering software, request and response to application servers and database servers, etc. I sympathize with your concern that users may not be confident about their settings. But that's the current practice, unfortunately. > > at the expense of impractical performance due to non-parallelism, serial > execution in each resolver, and context switches. > > I have never said that we're going to live with serial execution in > each resolver and non-parallelism. I've been repeatedly saying that it > would be possible that we improve this feature over the releases to > get a good performance even if we use a separate background process.
IIRC, I haven't seen a reasonable design based on a separate process that handles commits during normal operation. What I heard is to launch as many resolvers as the client sessions, but that consumes too much resource, as I said. > Using a background process to commit is the only option to support > interruptible foreign transaction resolution for now whereas there are > some ideas for performance improvements. A practical solution is the timeout for FDWs in general, as in application servers. postgres_fdw can benefit from Ctrl + C as well. > I think we don't have enough > discussion on how we can improve the idea of using a separate process > and how much performance will improve and how possible it is. It's not > late to reject that idea after the discussion. Yeah, I agree that the discussion is not enough yet. In other words, the design has not reached the quality for the first release yet. We should try to avoid using "Hopefully, we should be able to improve it in the next release (I haven't seen the design brought to light, though)" as an excuse for getting a half-baked patch committed that does not offer practical quality. I saw many developers' patches rejected because of insufficient performance, e.g. even a 0.8% performance impact. (I'm one of those developers, actually...) I have been feeling this community is rigorous about performance. We have to be sincere. Regards Takayuki Tsunakawa
On Mon, Oct 19, 2020 at 2:37 PM tsunakawa.takay@fujitsu.com <tsunakawa.takay@fujitsu.com> wrote: > > Really. I thought we were talking about canceling commits with Ctrl + C as you referred to, right? I couldn't imagine, in production environments where many sessions are running transactions concurrently, how the user (DBA) would want to or could cancel each stuck session during commit one by one with Ctrl + C by hand. I haven't seen such a feature exist or be considered crucial that enables the user (administrator) to cancel running processing with Ctrl + C from the side. Using pg_cancel_backend() and pg_terminate_backend(), a DBA can cancel a running query in any backend or terminate a backend. For either to work, the backend needs to be interruptible. IIRC, Robert had made an effort to make postgres_fdw interruptible a few years back. -- Best Wishes, Ashutosh Bapat
On Mon, 19 Oct 2020 at 20:37, Ashutosh Bapat <ashutosh.bapat.oss@gmail.com> wrote: > > On Mon, Oct 19, 2020 at 2:37 PM tsunakawa.takay@fujitsu.com > <tsunakawa.takay@fujitsu.com> wrote: > > > > Really. I thought we were talking about canceling commits with Ctrl + C as you referred to, right? I couldn't imagine, in production environments where many sessions are running transactions concurrently, how the user (DBA) would want to or could cancel each stuck session during commit one by one with Ctrl + C by hand. I haven't seen such a feature exist or be considered crucial that enables the user (administrator) to cancel running processing with Ctrl + C from the side. > > Using pg_cancel_backend() and pg_terminate_backend(), a DBA can cancel > a running query in any backend or terminate a backend. For either to > work, the backend needs to be interruptible. IIRC, Robert had made an > effort to make postgres_fdw interruptible a few years back. Right. Also, we discussed having a timeout on the core side, but I'm concerned that the timeout also might not work if the wait is not interruptible. While using a timeout is a good idea, I have to think there is a certain number of users who won't use this timeout, just as there is a certain number of users who don't use timeouts such as statement_timeout. We must not ignore such users, and it might not be advisable to design a feature that ignores them. Regards, -- Masahiko Sawada http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
RE: Transactions involving multiple postgres foreign servers, take 2
From: Ashutosh Bapat <ashutosh.bapat.oss@gmail.com> > Using pg_cancel_backend() and pg_terminate_backend(), a DBA can cancel > a running query in any backend or terminate a backend. For either to > work, the backend needs to be interruptible. IIRC, Robert had made an > effort to make postgres_fdw interruptible a few years back. Yeah, I know those functions. Sawada-san was talking about Ctrl + C, so I responded accordingly. Also, how can the DBA find sessions to run those functions against? Can he tell if a session is connected to, or running SQL against, a given foreign server? Can he terminate or cancel, with one SQL command, all sessions that are stuck accessing a particular foreign server? Furthermore, FDW is not cancellable in general. So, I don't see a point in trying hard to make only commit be cancelable. Regards Takayuki Tsunakawa
At Tue, 20 Oct 2020 02:44:09 +0000, "tsunakawa.takay@fujitsu.com" <tsunakawa.takay@fujitsu.com> wrote in > From: Ashutosh Bapat <ashutosh.bapat.oss@gmail.com> > > Using pg_cancel_backend() and pg_terminate_backend(), a DBA can cancel > > a running query in any backend or terminate a backend. For either to > > work, the backend needs to be interruptible. IIRC, Robert had made an > > effort to make postgres_fdw interruptible a few years back. > > Yeah, I know those functions. Sawada-san was talking about Ctrl + C, so I responded accordingly. > > Also, how can the DBA find sessions to run those functions against? Can he tell if a session is connected to, or running SQL against, a given foreign server? Can he terminate or cancel, with one SQL command, all sessions that are stuck accessing a particular foreign server? I don't think the inability to cancel all sessions at once can be a reason not to allow operators to cancel a stuck session. > Furthermore, FDW is not cancellable in general. So, I don't see a point in trying hard to make only commit be cancelable. I think that it is quite important that operators can cancel any process that has been stuck for a long time. Furthermore, postgres_fdw is more likely to get stuck since the network is involved, so the usefulness of that feature would be higher. regards. -- Kyotaro Horiguchi NTT Open Source Software Center
RE: Transactions involving multiple postgres foreign servers, take 2
From: Kyotaro Horiguchi <horikyota.ntt@gmail.com> > I don't think the inability to cancel all sessions at once can be a > reason not to allow operators to cancel a stuck session. Yeah, I didn't mean to discount the ability to cancel queries. I just want to confirm how the user can use the cancellation in practice. I didn't see how the user can use the cancellation in the FDW framework, so I asked about it. We have to think about the user's context if we regard canceling commits as important. > > Furthermore, FDW is not cancellable in general. So, I don't see a point in > trying hard to make only commit be cancelable. > > I think that it is quite important that operators can cancel any > process that has been stuck for a long time. Furthermore, postgres_fdw > is more likely to get stuck since the network is involved, so the usefulness > of that feature would be higher. But lower than practical performance during normal operation. BTW, speaking of the network, how can postgres_fdw respond quickly to a cancel request when libpq is waiting for a reply from a down foreign server? Can the user continue to use that session after cancellation? Regards Takayuki Tsunakawa
On Tue, 20 Oct 2020 at 13:23, tsunakawa.takay@fujitsu.com <tsunakawa.takay@fujitsu.com> wrote: > > From: Kyotaro Horiguchi <horikyota.ntt@gmail.com> > > I don't think the inability to cancel all sessions at once can be a > > reason not to allow operators to cancel a stuck session. > > Yeah, I didn't mean to discount the ability to cancel queries. I just want to confirm how the user can use the cancellation in practice. I didn't see how the user can use the cancellation in the FDW framework, so I asked about it. We have to think about the user's context if we regard canceling commits as important. > I think it doesn't matter whether it's in the FDW framework or not. The user normally doesn't care which backend processes are connecting to foreign servers. They will attempt to cancel the query as always if they realize that a backend got stuck. There are surely plenty of users who use query cancellation. Regards, -- Masahiko Sawada http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
At Tue, 20 Oct 2020 15:53:29 +0900, Masahiko Sawada <masahiko.sawada@2ndquadrant.com> wrote in > On Tue, 20 Oct 2020 at 13:23, tsunakawa.takay@fujitsu.com > <tsunakawa.takay@fujitsu.com> wrote: > > > > From: Kyotaro Horiguchi <horikyota.ntt@gmail.com> > > > I don't think the inability to cancel all sessions at once can be a > > > reason not to allow operators to cancel a stuck session. > > > > Yeah, I didn't mean to discount the ability to cancel queries. I just want to confirm how the user can use the cancellation in practice. I didn't see how the user can use the cancellation in the FDW framework, so I asked about it. We have to think about the user's context if we regard canceling commits as important. > > > > I think it doesn't matter whether it's in the FDW framework or not. The user > normally doesn't care which backend processes are connecting to foreign > servers. They will attempt to cancel the query as always if they > realize that a backend got stuck. There are surely plenty of users > who use query cancellation. The most serious impact of the inability to cancel a query on a certain session is that a server restart is required to end such a session. regards. -- Kyotaro Horiguchi NTT Open Source Software Center
At Tue, 20 Oct 2020 04:23:12 +0000, "tsunakawa.takay@fujitsu.com" <tsunakawa.takay@fujitsu.com> wrote in > From: Kyotaro Horiguchi <horikyota.ntt@gmail.com> > > > Furthermore, FDW is not cancellable in general. So, I don't see a point in > > trying hard to make only commit be cancelable. > > > > I think that it is quite important that operators can cancel any > > process that has been stuck for a long time. Furthermore, postgres_fdw > > is more likely to get stuck since the network is involved, so the usefulness > > of that feature would be higher. > > But lower than practical performance during normal operation. > > BTW, speaking of the network, how can postgres_fdw respond quickly to a cancel request when libpq is waiting for a reply from a down foreign server? Can the user continue to use that session after cancellation? It seems to respond to a statement-cancel signal immediately while waiting for an incoming byte. However, it seems to wait forever while waiting for space in the send buffer. (Does that mean the session will get stuck if it sends a large chunk of bytes while the network is down?) After receiving a signal, it closes the problematic connection. So the local session is usable after that, but the failed remote sessions are closed and new ones are created at the next use. regards. -- Kyotaro Horiguchi NTT Open Source Software Center
RE: Transactions involving multiple postgres foreign servers, take 2
From: Kyotaro Horiguchi <horikyota.ntt@gmail.com> > At Tue, 20 Oct 2020 15:53:29 +0900, Masahiko Sawada > <masahiko.sawada@2ndquadrant.com> wrote in > > I think it doesn't matter whether it's in the FDW framework or not. The user > > normally doesn't care which backend processes are connecting to foreign > > servers. They will attempt to cancel the query as always if they > > realize that a backend got stuck. There are surely plenty of users > > who use query cancellation. > > The most serious impact of the inability to cancel a query on a > certain session is that a server restart is required to end such a > session. OK, as I may be repeating, I didn't deny the need for cancellation. Let's organize the argument. * FDW in general My understanding is that the FDW feature does not stipulate anything about cancellation. In fact, odbc_fdw was uncancelable. What do we do about this? * postgres_fdw Fortunately, it is (should be?) cancelable whatever method we choose for 2PC. So no problem. But is it really cancellable now? What if the libpq call is waiting for a response when the foreign server or network is down? "Inability to cancel requires a database server restart" feels a bit exaggerated, as libpq has the tcp_keepalive* and tcp_user_timeout connection parameters, and even without setting them, the TCP timeout works. Regards Takayuki Tsunakawa
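Since postgres_fdw passes most libpq connection options through to the remote side, the keepalive parameters mentioned above can be set per foreign server in its definition. A hedged example follows; the server name, host, and values are purely illustrative, and tcp_user_timeout requires a libpq/server combination recent enough to support it (PostgreSQL 12 or later):

```sql
CREATE SERVER fs1 FOREIGN DATA WRAPPER postgres_fdw
    OPTIONS (host 'remote.example', dbname 'app',
             keepalives 'on',            -- enable TCP keepalive probes
             keepalives_idle '30',       -- seconds idle before the first probe
             keepalives_interval '10',   -- seconds between probes
             keepalives_count '3',       -- failed probes before the link is declared dead
             tcp_user_timeout '30000');  -- ms unacked data may linger (Linux only)
```

With settings like these, a libpq call stuck talking to a dead foreign server fails after a bounded time instead of waiting on the default, much longer TCP timeouts.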
RE: Transactions involving multiple postgres foreign servers, take 2
From: Kyotaro Horiguchi <horikyota.ntt@gmail.com> > It seems to respond to a statement-cancel signal immediately while > waiting for an incoming byte. However, it seems to wait forever while > waiting for space in the send buffer. (Does that mean the session will get > stuck if it sends a large chunk of bytes while the network is down?) What part makes you worried about that? libpq's send processing? I've just examined pgfdw_cancel_query(), too. As below, it uses a hidden 30 second timeout. After all, postgres_fdw also relies on a timeout already. /* * If it takes too long to cancel the query and discard the result, assume * the connection is dead. */ endtime = TimestampTzPlusMilliseconds(GetCurrentTimestamp(), 30000); > After receiving a signal, it closes the problematic connection. So the > local session is usable after that, but the failed remote sessions are > closed and new ones are created at the next use. I couldn't see that the problematic connection is closed when the cancellation fails... Am I looking at the wrong place? /* * If connection is already unsalvageable, don't touch it * further. */ if (entry->changing_xact_state) break; Regards Takayuki Tsunakawa
On Tue, 20 Oct 2020 at 16:54, tsunakawa.takay@fujitsu.com <tsunakawa.takay@fujitsu.com> wrote: > > From: Kyotaro Horiguchi <horikyota.ntt@gmail.com> > > At Tue, 20 Oct 2020 15:53:29 +0900, Masahiko Sawada > > <masahiko.sawada@2ndquadrant.com> wrote in > > > I think it doesn't matter whether it's in the FDW framework or not. The user > > > normally doesn't care which backend processes are connecting to foreign > > > servers. They will attempt to cancel the query as always if they > > > realize that a backend got stuck. There are surely plenty of users > > > who use query cancellation. > > > > The most serious impact of the inability to cancel a query on a > > certain session is that a server restart is required to end such a > > session. > > OK, as I may be repeating, I didn't deny the need for cancellation. So what's your opinion? > Let's organize the argument. > > * FDW in general > My understanding is that the FDW feature does not stipulate anything about cancellation. In fact, odbc_fdw was uncancelable. What do we do about this? > > * postgres_fdw > Fortunately, it is (should be?) cancelable whatever method we choose for 2PC. So no problem. > But is it really cancellable now? What if the libpq call is waiting for a response when the foreign server or network is down? I don't think we need to stipulate query cancellation. Anyway, I guess neither the fact that we don't stipulate anything about query cancellation now nor the fact that postgres_fdw might not be cancellable in some situations is a reason for not supporting query cancellation. If it's desirable behavior and users want it, we need to put in the effort to support it as much as possible, as we've done in postgres_fdw. Some FDWs unfortunately might not be able to support it with their functionality alone, but it would be good if we can achieve it by a combination of PostgreSQL and FDW plugins. Regards, -- Masahiko Sawada http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
On Tue, 20 Oct 2020 at 17:56, tsunakawa.takay@fujitsu.com <tsunakawa.takay@fujitsu.com> wrote: > > From: Kyotaro Horiguchi <horikyota.ntt@gmail.com> > > It seems to respond to a statement-cancel signal immediately while > > waiting for an incoming byte. However, it seems to wait forever while > > waiting for space in the send buffer. (Does that mean the session will get > > stuck if it sends a large chunk of bytes while the network is down?) > > What part makes you worried about that? libpq's send processing? > > I've just examined pgfdw_cancel_query(), too. As below, it uses a hidden 30 second timeout. After all, postgres_fdw also relies on a timeout already. It uses the timeout, but it's also cancellable before the timeout. See how we call CHECK_FOR_INTERRUPTS() in pgfdw_get_cleanup_result(). > > > > After receiving a signal, it closes the problematic connection. So the > > local session is usable after that, but the failed remote sessions are > > closed and new ones are created at the next use. > > I couldn't see that the problematic connection is closed when the cancellation fails... Am I looking at the wrong place? > > /* > * If connection is already unsalvageable, don't touch it > * further. > */ > if (entry->changing_xact_state) > break; > I guess Horiguchi-san referred to the following code in pgfdw_xact_callback(): /* * If the connection isn't in a good idle state, discard it to * recover. Next GetConnection will open a new connection. */ if (PQstatus(entry->conn) != CONNECTION_OK || PQtransactionStatus(entry->conn) != PQTRANS_IDLE || entry->changing_xact_state) { elog(DEBUG3, "discarding connection %p", entry->conn); disconnect_pg_server(entry); } Regards, -- Masahiko Sawada http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
At Tue, 20 Oct 2020 21:22:31 +0900, Masahiko Sawada <masahiko.sawada@2ndquadrant.com> wrote in > On Tue, 20 Oct 2020 at 17:56, tsunakawa.takay@fujitsu.com > <tsunakawa.takay@fujitsu.com> wrote: > > > > From: Kyotaro Horiguchi <horikyota.ntt@gmail.com> > > > It seems to respond to a statement-cancel signal immediately while > > > waiting for an incoming byte. However, it seems to wait forever while > > > waiting for space in the send buffer. (Does that mean the session will get > > > stuck if it sends a large chunk of bytes while the network is down?) > > > > What part makes you worried about that? libpq's send processing? > > > > I've just examined pgfdw_cancel_query(), too. As below, it uses a hidden 30 second timeout. After all, postgres_fdw also relies on a timeout already. > > It uses the timeout, but it's also cancellable before the timeout. See > how we call CHECK_FOR_INTERRUPTS() in pgfdw_get_cleanup_result(). Yes. And as Sawada-san mentioned, it's not a matter of whether a specific FDW module accepts cancellation or not. It's sufficient that we have one example. Other FDWs will follow postgres_fdw if needed. > > > After receiving a signal, it closes the problematic connection. So the > > > local session is usable after that, but the failed remote sessions are > > > closed and new ones are created at the next use. > > > > I couldn't see that the problematic connection is closed when the cancellation fails... Am I looking at the wrong place? ... > > I guess Horiguchi-san referred to the following code in pgfdw_xact_callback(): > > /* > * If the connection isn't in a good idle state, discard it to > * recover. Next GetConnection will open a new connection. > */ > if (PQstatus(entry->conn) != CONNECTION_OK || > PQtransactionStatus(entry->conn) != PQTRANS_IDLE || > entry->changing_xact_state) > { > elog(DEBUG3, "discarding connection %p", entry->conn); > disconnect_pg_server(entry); > } Right.
Although it's not directly relevant to this discussion, precisely speaking, that part is not visited just after the remote "COMMIT TRANSACTION" fails. If that commit fails or is canceled, an exception is raised while entry->changing_xact_state = true. Then the function is called again within AbortCurrentTransaction() and reaches the above code. regards. -- Kyotaro Horiguchi NTT Open Source Software Center
RE: Transactions involving multiple postgres foreign servers, take 2
From: Kyotaro Horiguchi <horikyota.ntt@gmail.com> > > if (PQstatus(entry->conn) != CONNECTION_OK || > > PQtransactionStatus(entry->conn) != PQTRANS_IDLE || > > entry->changing_xact_state) > > { > > elog(DEBUG3, "discarding connection %p", entry->conn); > > disconnect_pg_server(entry); > > } > > Right. Although it's not directly relevant to this discussion, > precisely, that part is not visited just after the remote "COMMIT > TRANSACTION" failed. If that commit fails or is canceled, an exception > is raised while entry->changing_xact_state = true. Then the function > is called again within AbortCurrentTransaction() and reaches the above > code. Ah, then the connection to the foreign server is closed after failing to cancel the query. Thanks. Regards Takayuki Tsunakawa
RE: Transactions involving multiple postgres foreign servers, take 2
From: Masahiko Sawada <masahiko.sawada@2ndquadrant.com> > So what's your opinion? My opinion is simple and has not changed. Let's clarify and refine the design first in the following areas (others may have pointed out something else too, but I don't remember), before going deeper into the code review. * FDW interface New functions so that other FDWs can really implement them. Currently, XA seems to be the only model we can rely on to validate the FDW interface. What FDW function would call what XA function(s)? What should be the arguments for the FDW functions? * Performance Parallel prepares and commits on the client backend. The current implementation is intolerable and should not be the first-release quality. I proposed the idea. (If you insist you don't want to do anything about this, I have to think you're just rushing for the patch commit. I want to keep Postgres's reputation.) As part of this, I'd like to see the 2PC's message flow and disk writes (via email and/or on the following wiki.) That helps evaluate the 2PC performance, because it's hard to figure it out in the code of a large patch set. I'm simply imagining what is typically written in database textbooks and research papers. I'm asking this because I saw some discussion in this thread that some new WAL records are added. I was worried that transactions have to write WAL records other than prepare and commit, unlike textbook implementations. Atomic Commit of Distributed Transactions https://wiki.postgresql.org/wiki/Atomic_Commit_of_Distributed_Transactions * Query cancellation As you showed, there's no problem with postgres_fdw? The cancelability of FDW in general remains a problem, but that can be a separate undertaking. * Global visibility This is what Amit-san suggested several times -- "design it before reviewing the current patch." I'm a bit optimistic about this and think this FDW 2PC can be implemented separately as a pure enhancement of FDW. But I also understand his concern. If your (our?)
aim is to use this FDW 2PC for sharding, we may have to design the combination of 2PC and visibility first. > I don't think we need to stipulate query cancellation. Anyway, I > guess neither the fact that we don't stipulate anything about query > cancellation now nor the fact that postgres_fdw might not be cancellable in > some situations is a reason for not supporting query > cancellation. If it's desirable behavior and users want it, we need > to put in the effort to support it as much as possible, as we've done in > postgres_fdw. Some FDWs unfortunately might not be able to support it > with their functionality alone, but it would be good if we can achieve > it by a combination of PostgreSQL and FDW plugins. Let me comment on this a bit; this is a somewhat dangerous idea, I'm afraid. We need to pay attention to the FDW interface and its documentation so that FDW developers can implement what we consider important -- query cancellation in your discussion. "postgres_fdw is OK, so the interface is good" can create interfaces that other FDW developers can't use. That's what Tomas Vondra pointed out several years ago. Regards Takayuki Tsunakawa
On Wed, Oct 21, 2020 at 3:03 PM tsunakawa.takay@fujitsu.com <tsunakawa.takay@fujitsu.com> wrote: > > From: Masahiko Sawada <masahiko.sawada@2ndquadrant.com> > > So what's your opinion? > > * Global visibility > This is what Amit-san suggested several times -- "design it before reviewing the current patch." I'm a bit optimistic about this and think this FDW 2PC can be implemented separately as a pure enhancement of FDW. But I also understand his concern. If your (our?) aim is to use this FDW 2PC for sharding, > As far as I understand, that is the goal for which this is a step. For example, see the wiki [1]. I understand that the wiki is not the final thing, but I have seen other places as well where there is a mention of FDW-based sharding, and I feel this is the reason why many people are trying to improve this area. That is why I suggested having an upfront design of global visibility and a deadlock detector along with this work. [1] - https://wiki.postgresql.org/wiki/WIP_PostgreSQL_Sharding -- With Regards, Amit Kapila.
On Wed, 21 Oct 2020 at 18:33, tsunakawa.takay@fujitsu.com <tsunakawa.takay@fujitsu.com> wrote: > > From: Masahiko Sawada <masahiko.sawada@2ndquadrant.com> > > So what's your opinion? > > My opinion is simple and has not changed. Let's clarify and refine the design first in the following areas (others may have pointed out something else too, but I don't remember), before going deeper into the code review. > > * FDW interface > New functions so that other FDWs can really implement them. Currently, XA seems to be the only model we can rely on to validate the FDW interface. > What FDW function would call what XA function(s)? What should be the arguments for the FDW functions? I guess since FDW interfaces may be affected by the feature architecture, we can discuss them later. > * Performance > Parallel prepares and commits on the client backend. The current implementation is intolerable and should not be the first-release quality. I proposed the idea. > (If you insist you don't want to do anything about this, I have to think you're just rushing for the patch commit. I want to keep Postgres's reputation.) What do you have in mind regarding the implementation of parallel prepare and commit? Given that some FDW plugins don't support asynchronous execution, I guess we need to use parallel workers or something. That is, the backend process launches parallel workers to prepare/commit/rollback foreign transactions in parallel. I don't deny this approach, but it'll definitely make the feature complex and need more code. My point is a small start, keeping the first version simple. Even if we need one or more years for this feature, I think that introducing the simple and minimum functionality as the first version to the core still has benefits.
In this sense, the patch having the backend return without waiting for resolution after the local commit would be a good start as the first version (i.e., up to applying the v26-0006 patch). Anyway, the architecture should be extensible enough for future improvements. For the performance improvements, we will be able to support asynchronous and/or parallel prepare/commit/rollback. Moreover, having multiple resolver processes on one database would also help get better throughput. For the user who needs much better throughput, the user can also choose not to wait for resolution after the local commit, like synchronous_commit = 'local' in replication. > As part of this, I'd like to see the 2PC's message flow and disk writes (via email and/or on the following wiki.) That helps evaluate the 2PC performance, because it's hard to figure it out in the code of a large patch set. I'm simply imagining what is typically written in database textbooks and research papers. I'm asking this because I saw some discussion in this thread that some new WAL records are added. I was worried that transactions have to write WAL records other than prepare and commit, unlike textbook implementations. > > Atomic Commit of Distributed Transactions > https://wiki.postgresql.org/wiki/Atomic_Commit_of_Distributed_Transactions Understood. I'll add an explanation about the message flow and disk writes to the wiki page. We need to consider the point of error handling during the resolution of foreign transactions too. > > > I don't think we need to stipulate query cancellation. Anyway I > > guess neither the fact that we don't stipulate anything about query > > cancellation now nor the fact that postgres_fdw might not be cancellable in > > some situations is a reason for not supporting query > > cancellation. If it's desirable behavior and users want it, we need > > to put in the effort to support it as much as possible, as we've done in > > postgres_fdw.
Some FDWs unfortunately might not be able to support it > > with their functionality alone, but it would be good if we can achieve > > it by a combination of PostgreSQL and FDW plugins. > > Let me comment on this a bit; this is a somewhat dangerous idea, I'm afraid. We need to pay attention to the FDW interface and its documentation so that FDW developers can implement what we consider important -- query cancellation in your discussion. "postgres_fdw is OK, so the interface is good" can create interfaces that other FDW developers can't use. That's what Tomas Vondra pointed out several years ago. I suspect the story is somewhat different. libpq fortunately supports asynchronous execution, but when it comes to canceling the foreign transaction resolution, I think basically all FDW plugins are in the same situation at this time. We can choose whether to make it cancellable or not. According to the discussion so far, it completely depends on the architecture of this feature. So my point is whether this functionality is worth having for users and whether users want it, not whether postgres_fdw is OK. Regards, -- Masahiko Sawada http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
On Thu, Oct 22, 2020 at 10:39 AM Masahiko Sawada <masahiko.sawada@2ndquadrant.com> wrote: > > On Wed, 21 Oct 2020 at 18:33, tsunakawa.takay@fujitsu.com > <tsunakawa.takay@fujitsu.com> wrote: > > > > From: Masahiko Sawada <masahiko.sawada@2ndquadrant.com> > > > So what's your opinion? > > > > My opinion is simple and has not changed. Let's clarify and refine the design first in the following areas (others may have pointed out something else too, but I don't remember), before going deeper into the code review. > > > > * FDW interface > > New functions that other FDWs can really implement. Currently, XA seems to be the only model we can rely on to validate the FDW interface. > > What FDW function would call what XA function(s)? What should be the arguments of the FDW functions? > > Since the FDW interface may be affected by the feature's > architecture, I guess we can discuss it later. > > > * Performance > > Parallel prepares and commits on the client backend. The current implementation is intolerable and is not of first-release quality. I proposed the idea. > > (If you insist you don't want to do anything about this, I have to think you're just rushing the patch commit. I want to keep Postgres's reputation.) > > What do you have in mind regarding the implementation of parallel prepare > and commit? Given that some FDW plugins don't support asynchronous > execution, I guess we need to use parallel workers or something. That > is, the backend process launches parallel workers to > prepare/commit/rollback foreign transactions in parallel. I don't deny > this approach, but it'll definitely make the feature complex and need > more code. > > My point is to start small and keep the first version simple.
Even > if we need one or more years for this feature, I think that > introducing simple, minimal functionality into core as the first version > still has benefits. We will be able to have the > opportunity to get real feedback from users and to fix bugs in the > main infrastructure before making it complex. In this sense, the patch > having the backend return without waiting for resolution after the local > commit would be a good start for the first version (i.e., up to > applying the v26-0006 patch). Anyway, the architecture should be > extensible enough for future improvements. > > For performance improvements, we will be able to support > asynchronous and/or parallel prepare/commit/rollback. Moreover, having multiple > resolver processes on one database would also help improve > throughput. A user who needs much better throughput can > also choose not to wait for resolution after the local commit, > like synchronous_commit = ‘local’ in replication. > > > As part of this, I'd like to see the 2PC's message flow and disk writes (via email and/or on the following wiki). That helps evaluate the 2PC performance, because it's hard to figure it out from the code of a large patch set. I'm simply imagining what is typically written in database textbooks and research papers. I'm asking this because I saw some discussion in this thread that some new WAL records are added. I was worried that transactions have to write WAL records other than prepare and commit, unlike textbook implementations. > > > > Atomic Commit of Distributed Transactions > > https://wiki.postgresql.org/wiki/Atomic_Commit_of_Distributed_Transactions > > Understood. I'll add an explanation of the message flow and disk > writes to the wiki page. > > Done. > > We also need to consider error handling during foreign > transaction resolution. > > > > > > I don’t think we need to stipulate query cancellation. Anyway, I > > > guess the facts that we don’t stipulate anything about query > > > cancellation now and that postgres_fdw might not be cancellable in > > > some situations are not a reason for not supporting query > > > cancellation.
If it's desirable behavior and users want it, we need > > > to put in the effort to support it as much as possible, as we’ve done in > > > postgres_fdw. Some FDWs unfortunately might not be able to support it > > > with their own functionality alone, but it would be good if we could achieve > > > it through a combination of PostgreSQL and FDW plugins. > > > > Let me comment on this a bit; this is a somewhat dangerous idea, I'm afraid. We need to pay attention to the FDW interface and its documentation so that FDW developers can implement what we consider important -- query cancellation, in your discussion. "postgres_fdw is OK, so the interface is good" can create interfaces that other FDW developers can't use. That's what Tomas Vondra pointed out several years ago. > > I suspect the story is somewhat different. libpq fortunately supports > asynchronous execution, but when it comes to canceling foreign > transaction resolution, I think basically all FDW plugins are in the > same situation at this time. We can choose whether to make it > cancellable or not. According to the discussion so far, it completely > depends on the architecture of this feature. So my point is whether > it's worth having this functionality and whether users want > it, not whether postgres_fdw is OK. > I've thought again about the idea that once the backend fails to resolve a foreign transaction, it leaves it to a resolver process. With this idea, the backend process performs the second phase of 2PC only once. If an error happens during resolution, the backend hands the transaction off to a resolver process and returns an error to the client. We used this idea in previous patches and it has been discussed from time to time. First of all, this idea doesn’t solve the error-handling problem that the transaction could return an error to the client even though the local transaction has been committed.
There is an argument that this behavior could also happen even in a single-server environment, but I think the situation is slightly different. Basically, what the transaction does after the commit is cleanup. An error could happen during cleanup, but if it does it's likely due to a bug or something wrong inside PostgreSQL or the OS. On the other hand, during and after resolution the transaction does major work such as connecting to a foreign server, sending SQL, getting the result, and writing WAL to remove the entry. These steps are much more likely to fail. Also, with this idea, the client needs to check whether an error it got from the server is genuine, because the local transaction might have been committed anyway. Although this could happen even in a single-server environment, how many users check that in practice? If a server crashes, subsequent transactions end up failing due to a network connection error, and it seems hard to distinguish such a real error from a spurious one. Moreover, it's questionable in terms of extensibility. We would not be able to support keeping waiting for distributed transactions to complete even if an error happens, as synchronous replication does. The user might want to wait in cases where the failure is temporary, such as a transient network disconnection. Trying resolution only once seems to combine the downsides of both asynchronous and synchronous resolution. So I'm thinking that with this idea the user would need to change their application to check whether the error they got is genuine, which is cumbersome. Also, it seems to me we need to carefully discuss whether this idea could weaken extensibility. Anyway, according to the discussion, it seems to me that we have reached a consensus so far that the backend process prepares all foreign transactions, and that a resolver process is necessary to resolve in-doubt transactions in the background. So I've changed the patch set as follows.
Applying all these patches, we can support asynchronous foreign transaction resolution. That is, at transaction commit the backend process prepares all foreign transactions and then commits the local transaction. After that, it returns a commit acknowledgment to the client while leaving the prepared foreign transactions to a resolver process. A resolver process fetches the foreign transactions to resolve and resolves them in the background. Since the second phase of 2PC is performed asynchronously, a transaction that wants to see a previous transaction's result needs to check its status.

Here is a brief explanation of each patch:

v27-0001-Introduce-transaction-manager-for-foreign-transa.patch

This commit adds the basic foreign transaction manager and the CommitForeignTransaction and RollbackForeignTransaction APIs. These APIs support only one-phase commit. With this change, an FDW is able to control its transactions using the foreign transaction manager rather than XactCallback.

v27-0002-postgres_fdw-supports-commit-and-rollback-APIs.patch

This commit implements both the CommitForeignTransaction and RollbackForeignTransaction APIs in postgres_fdw. Note that since PREPARE TRANSACTION is still not supported, there is nothing new the user can do yet.

v27-0003-Recreate-RemoveForeignServerById.patch

This commit recreates RemoveForeignServerById, which was removed by b1d32d3e3. This is necessary because we need to check whether any foreign transaction involves the foreign server that is about to be removed.

v27-0004-Add-PrepareForeignTransaction-API.patch

This commit adds prepared foreign transaction support, including WAL logging and recovery, and the PrepareForeignTransaction API. With this change, the user is able to run 'PREPARE TRANSACTION' and 'COMMIT/ROLLBACK PREPARED' commands on a transaction that involves foreign servers. But note that COMMIT/ROLLBACK PREPARED ends only the local transaction; it doesn't do anything for foreign transactions.
Therefore, the user needs to resolve foreign transactions manually by executing the pg_resolve_foreign_xacts() SQL function, which is also introduced by this commit.

v27-0005-postgres_fdw-supports-prepare-API.patch

This commit implements the PrepareForeignTransaction API and makes CommitForeignTransaction and RollbackForeignTransaction support two-phase commit.

v27-0006-Add-GetPrepareId-API.patch

This commit adds the GetPrepareId API.

v27-0007-Introduce-foreign-transaction-launcher-and-resol.patch

This commit introduces the foreign transaction resolver and launcher processes. With this change, the user doesn't need to manually execute the pg_resolve_foreign_xacts() function to resolve foreign transactions prepared by PREPARE TRANSACTION and left behind by COMMIT/ROLLBACK PREPARED. Instead, a resolver process automatically resolves them in the background.

v27-0008-Prepare-foreign-transactions-at-commit-time.patch

With this commit, the transaction prepares the foreign transactions marked as modified at commit time if foreign_twophase_commit is ‘required’. Previously the user needed to run PREPARE TRANSACTION and COMMIT/ROLLBACK PREPARED to use 2PC, but this enables 2PC transparently to the user. Note that the transaction returns a commit acknowledgment to the client after committing the local transaction and notifying the resolver process, without waiting. Foreign transactions are resolved asynchronously by the resolver process.

v27-0009-postgres_fdw-marks-foreign-transaction-as-modifi.patch

With this commit, transactions started via postgres_fdw are marked as modified, which is necessary to use 2PC.

v27-0010-Documentation-update.patch
v27-0011-Add-regression-tests-for-foreign-twophase-commit.patch

Documentation update and regression tests.

The missing piece compared to the previous patch version is synchronous transaction resolution. In the previous patch, foreign transactions are synchronously resolved by a resolver process.
But since it's still under discussion whether this is a good approach, and I'm considering optimizing the logic, it's not included in the current patch set. Regards, -- Masahiko Sawada EnterpriseDB: https://www.enterprisedb.com/
Attachment
- v27-0011-Add-regression-tests-for-foreign-twophase-commit.patch
- v27-0009-postgres_fdw-marks-foreign-transaction-as-modifi.patch
- v27-0006-Add-GetPrepareId-API.patch
- v27-0005-postgres_fdw-supports-prepare-API.patch
- v27-0010-Documentation-update.patch
- v27-0004-Add-PrepareForeignTransaction-API.patch
- v27-0007-Introduce-foreign-transaction-launcher-and-resol.patch
- v27-0008-Prepare-foreign-transactions-at-commit-time.patch
- v27-0002-postgres_fdw-supports-commit-and-rollback-APIs.patch
- v27-0003-Recreate-RemoveForeignServerById.patch
- v27-0001-Introduce-transaction-manager-for-foreign-transa.patch
On Thu, Nov 5, 2020 at 12:15 PM Masahiko Sawada <sawada.mshk@gmail.com> wrote:
> [...]

Cfbot reported an error. I've attached the updated version patch set to make cfbot happy.

Regards,

--
Masahiko Sawada
EnterpriseDB: https://www.enterprisedb.com/
Attachment
- v28-0011-Add-regression-tests-for-foreign-twophase-commit.patch
- v28-0009-postgres_fdw-marks-foreign-transaction-as-modifi.patch
- v28-0010-Documentation-update.patch
- v28-0006-Add-GetPrepareId-API.patch
- v28-0008-Prepare-foreign-transactions-at-commit-time.patch
- v28-0007-Introduce-foreign-transaction-launcher-and-resol.patch
- v28-0003-Recreate-RemoveForeignServerById.patch
- v28-0005-postgres_fdw-supports-prepare-API.patch
- v28-0002-postgres_fdw-supports-commit-and-rollback-APIs.patch
- v28-0004-Add-PrepareForeignTransaction-API.patch
- v28-0001-Introduce-transaction-manager-for-foreign-transa.patch
On Sun, Nov 8, 2020 at 2:11 PM Masahiko Sawada <sawada.mshk@gmail.com> wrote: > > On Thu, Nov 5, 2020 at 12:15 PM Masahiko Sawada <sawada.mshk@gmail.com> wrote: > > > > On Thu, Oct 22, 2020 at 10:39 AM Masahiko Sawada > > <masahiko.sawada@2ndquadrant.com> wrote: > > > > > > On Wed, 21 Oct 2020 at 18:33, tsunakawa.takay@fujitsu.com > > > <tsunakawa.takay@fujitsu.com> wrote: > > > > > > > > From: Masahiko Sawada <masahiko.sawada@2ndquadrant.com> > > > > > So what's your opinion? > > > > > > > > My opinion is simple and has not changed. Let's clarify and refine the design first in the following areas (others may have pointed out something else too, but I don't remember), before going deeper into the code review. > > > > > > > > * FDW interface > > > > New functions so that other FDWs can really implement. Currently, XA seems to be the only model we can rely on to validate the FDW interface. > > > > What FDW function would call what XA function(s)? What should be the arguments for the FDW functions? > > > > > > I guess since FDW interfaces may be affected by the feature > > > architecture we can discuss later. > > > > > > > * Performance > > > > Parallel prepare and commits on the client backend. The current implementation is intolerable and should not be the first release quality. I proposed the idea. > > > > (If you insist you don't want to do anything about this, I have to think you're just rushing for the patch commit. I want to keep Postgres's reputation.) > > > > > > What is in your mind regarding the implementation of parallel prepare > > > and commit? Given that some FDW plugins don't support asynchronous > > > execution I guess we need to use parallel workers or something. That > > > is, the backend process launches parallel workers to > > > prepare/commit/rollback foreign transactions in parallel. I don't deny > > > this approach but it'll definitely make the feature complex and need > > > more code.
> > > > > > My point is to start small and keep the first version simple. Even > > > if we need one or more years for this feature, I think that > > > introducing the simple and minimum functionality as the first version > > > to the core still has benefits. We will be able to have the > > > opportunity to get real feedback from users and to fix bugs in the > > > main infrastructure before making it complex. In this sense, the patch > > > having the backend return without waiting for resolution after the local > > > commit would be a good start as the first version (i.e., up to > > > applying v26-0006 patch). Anyway, the architecture should be > > > extensible enough for future improvements. > > > > > > For the performance improvements, we will be able to support > > > asynchronous and/or parallel prepare/commit/rollback. Moreover, having multiple > > > resolver processes on one database would also help get better > > > throughput. For the user who needs much better throughput, the user > > > also can select not to wait for resolution after the local commit, > > > like synchronous_commit = ‘local’ in replication. > > > > > > > As part of this, I'd like to see the 2PC's message flow and disk writes (via email and/or on the following wiki). That helps evaluate the 2PC performance, because it's hard to figure it out in the code of a large patch set. I'm simply imagining what is typically written in database textbooks and research papers. I'm asking this because I saw some discussion in this thread that some new WAL records are added. I was worried that transactions have to write WAL records other than prepare and commit unlike textbook implementations. > > > > > > > > Atomic Commit of Distributed Transactions > > > > https://wiki.postgresql.org/wiki/Atomic_Commit_of_Distributed_Transactions > > > > > > Understood. I'll add an explanation about the message flow and disk > > > writes to the wiki page. > > > > Done.
> > > > > > > > We need to consider the point of error handling during resolving > > > foreign transactions too. > > > > > > > > > > > > I don’t think we need to stipulate the query cancellation. Anyway I > > > > > guess the facts neither that we don’t stipulate anything about query > > > > > cancellation now nor that postgres_fdw might not be cancellable in > > > > > some situations now are not a reason for not supporting query > > > > > cancellation. If it's a desirable behavior and users want it, we need > > > > > to put an effort to support it as much as possible like we’ve done in > > > > > postgres_fdw. Some FDWs unfortunately might not be able to support it > > > > > only by their functionality but it would be good if we can achieve > > > > > that by combination of PostgreSQL and FDW plugins. > > > > > > > > Let me comment on this a bit; this is a bit dangerous idea, I'm afraid. We need to pay attention to the FDW interface and its documentation so that FDW developers can implement what we consider important -- query cancellation in your discussion. "postgres_fdw is OK, so the interface is good" can create interfaces that other FDW developers can't use. That's what Tomas Vondra pointed out several years ago. > > > > > > I suspect the story is somewhat different. libpq fortunately supports > > > asynchronous execution, but when it comes to canceling the foreign > > > transaction resolution I think basically all FDW plugins are in the > > > same situation at this time. We can choose whether to make it > > > cancellable or not. According to the discussion so far, it completely > > > depends on the architecture of this feature. So my point is whether > > > it's worth to have this functionality for users and whether users want > > > it, not whether postgres_fdw is ok. > > > > > > > I've thought again about the idea that once the backend failed to > > resolve a foreign transaction it leaves to a resolver process.
With > > this idea, the backend process perform the 2nd phase of 2PC only once. > > If an error happens during resolution it leaves to a resolver process > > and returns an error to the client. We used to use this idea in the > > previous patches and it’s discussed sometimes. > > > > First of all, this idea doesn’t resolve the problem of error handling > > that the transaction could return an error to the client in spite of > > having been committed the local transaction. There is an argument that > > this behavior could also happen even in a single server environment > > but I guess the situation is slightly different. Basically what the > > transaction does after the commit is cleanup. An error could happen > > during cleanup but if it happens it’s likely due to a bug of > > something wrong inside PostgreSQL or OS. On the other hand, during and > > after resolution the transaction does major works such as connecting a > > foreign server, sending an SQL, getting the result, and writing a WAL > > to remove the entry. These are more likely to happen an error. > > > > Also, with this idea, the client needs to check if the error got from > > the server is really true because the local transaction might have > > been committed. Although this could happen even in a single server > > environment how many users check that in practice? If a server > > crashes, subsequent transactions end up failing due to a network > > connection error but it seems hard to distinguish between such a real > > error and the fake error. > > > > Moreover, it’s questionable in terms of extensibility. We would not > > able to support keeping waiting for distributed transactions to > > complete even if an error happens, like synchronous replication. The > > user might want to wait in case where the failure is temporary such as > > temporary network disconnection. Trying resolution only once seems to > > have cons of both asynchronous and synchronous resolutions. 
> > > > So I’m thinking that with this idea the user will need to change their > > application so that it checks if the error they got is really true, > > which is cumbersome for users. Also, it seems to me we need to > > circumspectly discuss whether this idea could weaken extensibility. > > > > > > Anyway, according to the discussion, it seems to me that we got a > > consensus so far that the backend process prepares all foreign > > transactions and a resolver process is necessary to resolve in-doubt > > transaction in background. So I’ve changed the patch set as follows. > > Applying these all patches, we can support asynchronous foreign > > transaction resolution. That is, at transaction commit the backend > > process prepares all foreign transactions, and then commit the local > > transaction. After that, it returns OK of commit to the client while > > leaving the prepared foreign transaction to a resolver process. A > > resolver process fetches the foreign transactions to resolve and > > resolves them in background. Since the 2nd phase of 2PC is performed > > asynchronously a transaction that wants to see the previous > > transaction result needs to check its status. > > > > Here is brief explaination for each patches: > > > > v27-0001-Introduce-transaction-manager-for-foreign-transa.patch > > > > This commit adds the basic foreign transaction manager, > > CommitForeignTransaction, and RollbackForeignTransaction API. These > > APIs support only one-phase. With this change, FDW is able to control > > its transaction using the foreign transaction manager, not using > > XactCallback. > > > > v27-0002-postgres_fdw-supports-commit-and-rollback-APIs.patch > > > > This commit implements both CommitForeignTransaction and > > RollbackForeignTransaction APIs in postgres_fdw. Note that since > > PREPARE TRANSACTION is still not supported there is nothing the user > > newly is able to do. 
> > > > v27-0003-Recreate-RemoveForeignServerById.patch > > > > This commit recreates RemoveForeignServerById that was removed by > > b1d32d3e3. This is necessary because we need to check if there is a > > foreign transaction involved with the foreign server that is about to > > be removed. > > > > v27-0004-Add-PrepareForeignTransaction-API.patch > > > > This commit adds prepared foreign transaction support including WAL > > logging and recovery, and PrepareForeignTransaction API. With this > > change, the user is able to do 'PREPARE TRANSACTION’ and > > 'COMMIT/ROLLBACK PREPARED' commands on the transaction that involves > > foreign servers. But note that COMMIT/ROLLBACK PREPARED ends only the > > local transaction. It doesn't do anything for foreign transactions. > > Therefore, the user needs to resolve foreign transactions manually by > > executing the pg_resolve_foreign_xacts() SQL function which is also > > introduced by this commit. > > > > v27-0005-postgres_fdw-supports-prepare-API.patch > > > > This commit implements PrepareForeignTransaction API and makes > > CommitForeignTransaction and RollbackForeignTransaction supports > > two-phase commit. > > > > v27-0006-Add-GetPrepareId-API.patch > > > > This commit adds GetPrepareID API. > > > > v27-0007-Introduce-foreign-transaction-launcher-and-resol.patch > > > > This commit introduces foreign transaction resolver and launcher > > processes. With this change, the user doesn’t need to manually execute > > pg_resolve_foreign_xacts() function to resolve foreign transactions > > prepared by PREPARE TRANSACTION and left by COMMIT/ROLLBACK PREPARED. > > Instead, a resolver process automatically resolves them in background. > > > > v27-0008-Prepare-foreign-transactions-at-commit-time.patch > > > > With this commit, the transaction prepares foreign transactions marked > > as modified at transaction commit if foreign_twophase_commit is > > ‘required’. 
Previously the user needs to do PREPARE TRANSACTION and > > COMMIT/ROLLBACK PREPARED to use 2PC but it enables us to use 2PC > > transparently to the user. But the transaction returns OK of commit to > > the client after committing the local transaction and notifying the > > resolver process, without waits. Foreign transactions are > > asynchronously resolved by the resolver process. > > > > v27-0009-postgres_fdw-marks-foreign-transaction-as-modifi.patch > > > > With this commit, the transactions started via postgres_fdw are marked > > as modified, which is necessary to use 2PC. > > > > v27-0010-Documentation-update.patch > > v27-0011-Add-regression-tests-for-foreign-twophase-commit.patch > > > > Documentation update and regression tests. > > > > The missing piece from the previous version patch is synchronously > > transaction resolution. In the previous patch, foreign transactions > > are synchronously resolved by a resolver process. But since it's under > > discussion whether this is a good approach and I'm considering > > optimizing the logic it’s not included in the current patch set. > > > > > > Cfbot reported an error. I've attached the updated version patch set > to make cfbot happy. Since the previous version conflicts with the current HEAD I've attached the rebased version patch set. Regards, -- Masahiko Sawada EnterpriseDB: https://www.enterprisedb.com/
Attachment
- v29-0011-Add-regression-tests-for-foreign-twophase-commit.patch
- v29-0009-postgres_fdw-marks-foreign-transaction-as-modifi.patch
- v29-0006-Add-GetPrepareId-API.patch
- v29-0008-Prepare-foreign-transactions-at-commit-time.patch
- v29-0010-Documentation-update.patch
- v29-0007-Introduce-foreign-transaction-launcher-and-resol.patch
- v29-0005-postgres_fdw-supports-prepare-API.patch
- v29-0003-Recreate-RemoveForeignServerById.patch
- v29-0004-Add-PrepareForeignTransaction-API.patch
- v29-0002-postgres_fdw-supports-commit-and-rollback-APIs.patch
- v29-0001-Introduce-transaction-manager-for-foreign-transa.patch
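To make the commit sequence described in the message above concrete, here is a minimal, self-contained C sketch of the flow: prepare every foreign transaction, commit the local transaction, then return OK to the client while the prepared entries are merely marked in-doubt for a background resolver. All names here (fx_state, commit_distributed, and so on) are illustrative stand-ins, not the patch's actual symbols, and the error handling is deliberately simplified.

```c
#include <assert.h>
#include <stdbool.h>

/* Illustrative states for one foreign-server participant. */
typedef enum
{
    FX_STARTED,   /* transaction open on the foreign server */
    FX_PREPARED,  /* PREPARE TRANSACTION succeeded */
    FX_ABORTED,   /* rolled back before the local commit */
    FX_IN_DOUBT   /* left for the resolver process to finish */
} fx_state;

/* Phase 1: prepare every participant; on failure roll everything back. */
static bool
prepare_all(fx_state *parts, int n, int fail_at)
{
    for (int i = 0; i < n; i++)
    {
        if (i == fail_at)       /* simulated PREPARE failure */
        {
            for (int j = 0; j < n; j++)
                parts[j] = FX_ABORTED;
            return false;
        }
        parts[i] = FX_PREPARED;
    }
    return true;
}

/*
 * Commit a distributed transaction.  Returns true (an "OK" to the
 * client) right after the local commit; the prepared foreign
 * transactions are only marked in-doubt here and are resolved later,
 * asynchronously, by a resolver process.
 */
static bool
commit_distributed(fx_state *parts, int n, bool *local_committed, int fail_at)
{
    if (!prepare_all(parts, n, fail_at))
        return false;           /* client sees an error; nothing committed */

    *local_committed = true;    /* point of no return: local COMMIT */

    for (int i = 0; i < n; i++)
        parts[i] = FX_IN_DOUBT; /* handed off to the resolver process */

    return true;                /* client gets OK without waiting */
}
```

The key property the sketch demonstrates is that a PREPARE failure surfaces before the local commit, while after the local commit the client's OK no longer implies the foreign transactions are finished.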
On Wed, Nov 25, 2020 at 9:50 PM Masahiko Sawada <sawada.mshk@gmail.com> wrote:
>
> Since the previous version conflicts with the current HEAD I've
> attached the rebased version patch set.

Rebased the patch set again to the current HEAD.

The discussion of this patch is very long, so here is a short summary of the current state:

It's still under discussion which approach is the best for the distributed transaction commit as a building block of built-in sharding using foreign data wrappers.

Since we're considering using this feature for built-in sharding, the design depends on the architecture of built-in sharding. For example, with the current patch, the PostgreSQL node that received a COMMIT from the client works as a coordinator, and it commits the transactions using 2PC on all foreign servers involved with the transaction. This approach would be good with a decentralized sharding architecture, but not with a centralized architecture like the GTM node of Postgres-XC and Postgres-XL, which is a dedicated component responsible for transaction management. Since we haven't reached a consensus on the built-in sharding architecture yet, it's still an open question whether this patch's approach is really good as a building block of built-in sharding.

On the other hand, this feature is not necessarily dedicated to built-in sharding. For example, the distributed transaction commit through FDWs is important also when atomically moving data between two servers via FDWs. Using a dedicated process or server like GTM could be overkill. Having the node that received a COMMIT work as a coordinator would be better and more straightforward.

There is no noticeable TODO in the functionality so far covered by this patch set. This patch set adds new FDW APIs to support 2PC, introduces the global transaction manager, and implements those FDW APIs in postgres_fdw. Also, it has regression tests and documentation.

Transactions on foreign servers involved with the distributed transaction are committed using 2PC. Committing using 2PC is performed asynchronously and transparently to the user. Therefore, it doesn't guarantee that transactions on the foreign servers are also committed when the client gets an acknowledgment of COMMIT. Synchronous foreign transaction commit via 2PC is not covered by this patch, as we still need a discussion on the design.

Regards,

--
Masahiko Sawada
EnterpriseDB: https://www.enterprisedb.com/
Attachment
- v30-0011-Add-regression-tests-for-foreign-twophase-commit.patch
- v30-0008-Prepare-foreign-transactions-at-commit-time.patch
- v30-0009-postgres_fdw-marks-foreign-transaction-as-modifi.patch
- v30-0010-Documentation-update.patch
- v30-0006-Add-GetPrepareId-API.patch
- v30-0003-Recreate-RemoveForeignServerById.patch
- v30-0005-postgres_fdw-supports-prepare-API.patch
- v30-0007-Introduce-foreign-transaction-launcher-and-resol.patch
- v30-0004-Add-PrepareForeignTransaction-API.patch
- v30-0002-postgres_fdw-supports-commit-and-rollback-APIs.patch
- v30-0001-Introduce-transaction-manager-for-foreign-transa.patch
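The asynchronous side of the design above can be pictured with a small sketch of a resolver cycle: a background loop that keeps retrying in-doubt foreign transactions until the foreign server acknowledges the second phase. The struct, function names, and the per-cycle attempt cap are assumptions for illustration only, not the patch's actual resolver code.

```c
#include <assert.h>
#include <stdbool.h>

/* One in-doubt foreign transaction; fails_left simulates transient errors. */
typedef struct
{
    int  fails_left;   /* times "COMMIT PREPARED" will still fail */
    bool resolved;
} fdw_xact_sim;

/* Simulated COMMIT PREPARED round trip to the foreign server. */
static bool
try_resolve(fdw_xact_sim *fx)
{
    if (fx->fails_left > 0)
    {
        fx->fails_left--;      /* e.g. temporary network disconnection */
        return false;
    }
    fx->resolved = true;
    return true;
}

/*
 * One resolver cycle: retry each unresolved entry, spending at most
 * max_attempts in total.  Entries that still fail simply stay in doubt
 * and are picked up again on the next cycle, so a transient failure
 * never turns a committed distributed transaction into a lost one.
 */
static int
resolve_in_doubt(fdw_xact_sim *xacts, int n, int max_attempts)
{
    int attempts = 0;

    for (int i = 0; i < n; i++)
    {
        while (!xacts[i].resolved && attempts < max_attempts)
        {
            attempts++;
            if (try_resolve(&xacts[i]))
                break;
        }
    }
    return attempts;
}
```

This is also why the client's COMMIT acknowledgment cannot promise foreign-side durability yet: resolution may still be mid-retry when the OK is returned.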
Attachment
- v31-0011-Add-regression-tests-for-foreign-twophase-commit.patch
- v31-0005-postgres_fdw-supports-prepare-API.patch
- v31-0008-Prepare-foreign-transactions-at-commit-time.patch
- v31-0010-Documentation-update.patch
- v31-0006-Add-GetPrepareId-API.patch
- v31-0009-postgres_fdw-marks-foreign-transaction-as-modifi.patch
- v31-0007-Introduce-foreign-transaction-launcher-and-resol.patch
- v31-0002-postgres_fdw-supports-commit-and-rollback-APIs.patch
- v31-0003-Recreate-RemoveForeignServerById.patch
- v31-0004-Add-PrepareForeignTransaction-API.patch
- v31-0001-Introduce-transaction-manager-for-foreign-transa.patch
On Mon, Dec 28, 2020 at 11:24 PM Masahiko Sawada <sawada.mshk@gmail.com> wrote:
>
> On Wed, Nov 25, 2020 at 9:50 PM Masahiko Sawada <sawada.mshk@gmail.com> wrote:
> >
> > Since the previous version conflicts with the current HEAD I've
> > attached the rebased version patch set.
>
> Rebased the patch set again to the current HEAD.
>
> The discussion of this patch is very long so here is a short summary
> of the current state:
>
> It’s still under discussion which approach is the best for the
> distributed transaction commit as a building block of built-in sharding
> using foreign data wrappers.
>
> Since we’re considering that we use this feature for built-in
> sharding, the design depends on the architecture of built-in sharding.
> For example, with the current patch, the PostgreSQL node that received
> a COMMIT from the client works as a coordinator and it commits the
> transactions using 2PC on all foreign servers involved with the
> transaction. This approach would be good with the de-centralized
> sharding architecture but not with centralized architecture like the
> GTM node of Postgres-XC and Postgres-XL that is a dedicated component
> that is responsible for transaction management. Since we haven't
> reached a consensus on the built-in sharding architecture yet, it's
> still an open question whether this patch's approach is really good as
> a building block of the built-in sharding.
>
> On the other hand, this feature is not necessarily dedicated to the
> built-in sharding. For example, the distributed transaction commit
> through FDW is important also when atomically moving data between two
> servers via FDWs. Using a dedicated process or server like GTM could
> be overkill. Having the node that received a COMMIT work as a
> coordinator would be better and more straightforward.
>
> There is no noticeable TODO in the functionality so far covered by
> this patch set. This patch set adds new FDW APIs to support 2PC,
> introduces the global transaction manager, and implements those FDW
> APIs in postgres_fdw. Also, it has regression tests and documentation.
> Transactions on foreign servers involved with the distributed
> transaction are committed using 2PC. Committing using 2PC is performed
> asynchronously and transparently to the user. Therefore, it doesn’t
> guarantee that transactions on the foreign server are also committed
> when the client gets an acknowledgment of COMMIT. Synchronous foreign
> transaction commit via 2PC is not covered by this patch, as we still
> need a discussion on the design.
>
I've attached the rebased patches to make cfbot happy.
Regards,
--
Masahiko Sawada
EnterpriseDB: https://www.enterprisedb.com/
Attachment
- v32-0010-Documentation-update.patch
- v32-0004-Add-PrepareForeignTransaction-API.patch
- v32-0009-postgres_fdw-marks-foreign-transaction-as-modifi.patch
- v32-0008-Prepare-foreign-transactions-at-commit-time.patch
- v32-0011-Add-regression-tests-for-foreign-twophase-commit.patch
- v32-0003-Recreate-RemoveForeignServerById.patch
- v32-0006-Add-GetPrepareId-API.patch
- v32-0002-postgres_fdw-supports-commit-and-rollback-APIs.patch
- v32-0005-postgres_fdw-supports-prepare-API.patch
- v32-0007-Introduce-foreign-transaction-launcher-and-resol.patch
- v32-0001-Introduce-transaction-manager-for-foreign-transa.patch
On Thu, Jan 7, 2021 at 11:44 AM Zhihong Yu <zyu@yugabyte.com> wrote:
>
> Hi,
Thank you for reviewing the patch!
> For pg-foreign/v31-0004-Add-PrepareForeignTransaction-API.patch :
>
> However these functions are not neither committed nor aborted at
>
> I think the double negation was not intentional. Should be 'are neither ...'
Fixed.
>
> For FdwXactShmemSize(), is another MAXALIGN(size) needed prior to the return statement ?
Hmm, you mean that we need MAXALIGN(size) after adding the size of
FdwXactData structs?
Size
FdwXactShmemSize(void)
{
    Size    size;

    /* Size for foreign transaction information array */
    size = offsetof(FdwXactCtlData, fdwxacts);
    size = add_size(size, mul_size(max_prepared_foreign_xacts,
                                   sizeof(FdwXact)));
    size = MAXALIGN(size);
    size = add_size(size, mul_size(max_prepared_foreign_xacts,
                                   sizeof(FdwXactData)));

    return size;
}
I don't think we need to do that. Other similar code, such as
TwoPhaseShmemSize(), doesn't do that. Why do you think we need that?
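For what it's worth, the alignment question can be checked with a tiny stand-alone sketch. The macros below mimic MAXALIGN from PostgreSQL's c.h (assuming 8-byte maximum alignment), and shmem_size() mirrors the shape of the computation above with caller-supplied sizes; it is not the patch's code, just an illustration of why one MAXALIGN between the two arrays suffices.

```c
#include <assert.h>
#include <stddef.h>

#define MAXIMUM_ALIGNOF 8
#define TYPEALIGN(ALIGNVAL, LEN) \
    (((size_t) (LEN) + ((ALIGNVAL) - 1)) & ~((size_t) ((ALIGNVAL) - 1)))
#define MAXALIGN(LEN) TYPEALIGN(MAXIMUM_ALIGNOF, (LEN))

/*
 * Shape of FdwXactShmemSize(): header plus pointer array, then one
 * MAXALIGN so the record array starts on an aligned boundary, then the
 * records themselves.  A trailing MAXALIGN would be pointless because
 * the shmem allocator aligns the start of each allocation, not its end.
 */
static size_t
shmem_size(size_t header, size_t nxacts, size_t ptr_sz, size_t rec_sz)
{
    size_t size = header;

    size += nxacts * ptr_sz;   /* array of FdwXact pointers */
    size = MAXALIGN(size);     /* align the start of the record array */
    size += nxacts * rec_sz;   /* the FdwXactData records */
    return size;
}
```

With, say, a 20-byte header, 3 transactions, 8-byte pointers, and 40-byte records, only the boundary between the pointer array and the record array needs rounding up.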
>
> + fdwxact = FdwXactInsertFdwXactEntry(xid, fdw_part);
>
> For the function name, Fdw and Xact appear twice, each. Maybe one of them can be dropped ?
Agreed. Changed to FdwXactInsertEntry().
>
> + * we don't need to anything for this participant because all foreign
>
> 'need to' -> 'need to do'
Fixed.
>
> + else if (TransactionIdDidAbort(xid))
> + return FDWXACT_STATUS_ABORTING;
> +
> the 'else' can be omitted since the preceding if would return.
Fixed.
>
> + if (max_prepared_foreign_xacts <= 0)
>
> I wonder when the value for max_prepared_foreign_xacts would be negative (and whether that should be considered an error).
>
Fixed to (max_prepared_foreign_xacts == 0)
Attached the updated version patch set.
Regards,
--
Masahiko Sawada
EnterpriseDB: https://www.enterprisedb.com/
Hi,
For v32-0008-Prepare-foreign-transactions-at-commit-time.patch :

+ bool have_notwophase = false;

Maybe name the variable have_no_twophase so that it is easier to read.

+ * Two-phase commit is not required if the number of servers performed

performed -> performing

+ errmsg("cannot process a distributed transaction that has operated on a foreign server that does not support two-phase commit protocol"),
+ errdetail("foreign_twophase_commit is \'required\' but the transaction has some foreign servers which are not capable of two-phase commit")));

The lines are really long. Please wrap into more lines.

On Wed, Jan 13, 2021 at 9:50 PM Masahiko Sawada <sawada.mshk@gmail.com> wrote:
On Thu, Jan 7, 2021 at 11:44 AM Zhihong Yu <zyu@yugabyte.com> wrote:
>
> Hi,
Thank you for reviewing the patch!
> For pg-foreign/v31-0004-Add-PrepareForeignTransaction-API.patch :
>
> However these functions are not neither committed nor aborted at
>
> I think the double negation was not intentional. Should be 'are neither ...'
Fixed.
>
> For FdwXactShmemSize(), is another MAXALIGN(size) needed prior to the return statement ?
Hmm, you mean that we need MAXALIGN(size) after adding the size of
FdwXactData structs?
Size
FdwXactShmemSize(void)
{
Size size;
/* Size for foreign transaction information array */
size = offsetof(FdwXactCtlData, fdwxacts);
size = add_size(size, mul_size(max_prepared_foreign_xacts,
sizeof(FdwXact)));
size = MAXALIGN(size);
size = add_size(size, mul_size(max_prepared_foreign_xacts,
sizeof(FdwXactData)));
return size;
}
I don't think we need to do that. Looking at other similar code such
as TwoPhaseShmemSize() doesn't do that. Why do you think we need that?
>
> + fdwxact = FdwXactInsertFdwXactEntry(xid, fdw_part);
>
> For the function name, Fdw and Xact appear twice, each. Maybe one of them can be dropped ?
Agreed. Changed to FdwXactInsertEntry().
>
> + * we don't need to anything for this participant because all foreign
>
> 'need to' -> 'need to do'
Fixed.
>
> + else if (TransactionIdDidAbort(xid))
> + return FDWXACT_STATUS_ABORTING;
> +
> the 'else' can be omitted since the preceding if would return.
Fixed.
>
> + if (max_prepared_foreign_xacts <= 0)
>
> I wonder when the value for max_prepared_foreign_xacts would be negative (and whether that should be considered an error).
>
Fixed to (max_prepared_foreign_xacts == 0)
Attached the updated version patch set.
Regards,
--
Masahiko Sawada
EnterpriseDB: https://www.enterprisedb.com/
On Fri, Jan 15, 2021 at 4:03 AM Zhihong Yu <zyu@yugabyte.com> wrote:
>
> Hi,
> For v32-0008-Prepare-foreign-transactions-at-commit-time.patch :
Thank you for reviewing the patch!
>
> + bool have_notwophase = false;
>
> Maybe name the variable have_no_twophase so that it is easier to read.
Fixed.
>
> + * Two-phase commit is not required if the number of servers performed
>
> performed -> performing
Fixed.
>
> + errmsg("cannot process a distributed transaction that has operated on a foreign server that does not support two-phase commit protocol"),
> + errdetail("foreign_twophase_commit is \'required\' but the transaction has some foreign servers which are not capable of two-phase commit")));
>
> The lines are really long. Please wrap into more lines.
Hmm, we could do that, but it would make grepping for the error
message hard. Please refer to the documentation about the formatting
guidelines[1]:
Limit line lengths so that the code is readable in an 80-column
window. (This doesn't mean that you must never go past 80 columns. For
instance, breaking a long error message string in arbitrary places
just to keep the code within 80 columns is probably not a net gain in
readability.)
These changes have been made in the local branch. I'll post the
updated patch set after incorporating all the comments.
Regards,
[1] https://www.postgresql.org/docs/devel/source-format.html
--
Masahiko Sawada
EnterpriseDB: https://www.enterprisedb.com/
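The check behind the error message discussed above can be pictured with a small sketch: count the participants the transaction modified and note whether any of them cannot prepare, then fail when foreign_twophase_commit is 'required' but 2PC is impossible. The enum, struct, and function names below are invented for illustration; they are not the patch's actual identifiers.

```c
#include <assert.h>
#include <stdbool.h>

typedef enum { FTC_DISABLED, FTC_REQUIRED } ftc_setting;

typedef struct
{
    bool modified;      /* did the transaction write to this server? */
    bool twophase_ok;   /* does the server support PREPARE TRANSACTION? */
} participant;

/* Returns true if the commit may proceed (using 2PC when required). */
static bool
check_twophase(ftc_setting setting, const participant *parts, int n)
{
    int  nmodified = 0;
    bool have_no_twophase = false;

    for (int i = 0; i < n; i++)
    {
        if (!parts[i].modified)
            continue;          /* read-only participants don't need 2PC */
        nmodified++;
        if (!parts[i].twophase_ok)
            have_no_twophase = true;
    }

    /* Two-phase commit is not needed for zero or one modified server. */
    if (nmodified <= 1)
        return true;

    if (setting == FTC_REQUIRED && have_no_twophase)
        return false;          /* would raise the "not capable" error */

    return true;
}
```

The two-argument shape makes the review point easy to see: only servers that were actually modified count toward the 2PC requirement.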
On Fri, Jan 15, 2021 at 7:45 AM Zhihong Yu <zyu@yugabyte.com> wrote:
>
> For v32-0002-postgres_fdw-supports-commit-and-rollback-APIs.patch :
>
> + entry->changing_xact_state = true;
> ...
> + entry->changing_xact_state = abort_cleanup_failure;
>
> I don't see return statement in between the two assignments. I wonder why entry->changing_xact_state is set to true, and later being assigned again.
Because postgresRollbackForeignTransaction() can get called again in
the case where an error occurred while aborting and cleaning up the
transaction. For example, if an error occurred when executing ABORT
TRANSACTION (pgfdw_get_cleanup_result() could emit an ERROR),
postgresRollbackForeignTransaction() will get called again while
entry->changing_xact_state is still true. Then the entry will be
caught by the following condition and cleaned up:
/*
 * If connection is before starting transaction or is already unsalvageable,
 * do only the cleanup and don't touch it further.
 */
if (entry->changing_xact_state)
{
    pgfdw_cleanup_after_transaction(entry);
    return;
}
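The re-entry behavior described here can be sketched as a little state machine: the first rollback attempt sets the flag before touching the connection, and a later call that finds the flag still set does only the cleanup. This is a simplified stand-in for illustration, not postgres_fdw's actual code (send_abort and the struct fields are invented).

```c
#include <assert.h>
#include <stdbool.h>

typedef struct
{
    bool changing_xact_state;
    bool cleaned_up;
} conn_entry;

/* Simulated "ABORT TRANSACTION" that may error out (return false). */
static bool
send_abort(bool fail)
{
    return !fail;
}

/*
 * Rollback handler.  If a previous attempt died mid-way (flag still
 * set), do only the cleanup; otherwise set the flag, try the abort,
 * and clear the flag only once the abort fully succeeded.
 */
static void
rollback_foreign_xact(conn_entry *entry, bool abort_fails)
{
    if (entry->changing_xact_state)
    {
        entry->cleaned_up = true;   /* unsalvageable: cleanup only */
        return;
    }

    entry->changing_xact_state = true;
    if (send_abort(abort_fails))
        entry->changing_xact_state = false;  /* clean abort */
    /* on failure the flag stays true, so a retry falls into cleanup */
}
```

That is why the flag is assigned twice with no return in between: the second assignment only happens on the success path, while an ERROR thrown in between leaves the flag set for the re-entrant call.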
>
> For v32-0007-Introduce-foreign-transaction-launcher-and-resol.patch :
>
> bq. This commits introduces to new background processes: foreign
>
> commits introduces to new -> commit introduces two new
Fixed.
>
> +FdwXactExistsXid(TransactionId xid)
>
> Since Xid is the parameter to this method, I think the Xid suffix can be dropped from the method name.
But there is already a function named FdwXactExists()?
bool
FdwXactExists(Oid dbid, Oid serverid, Oid userid)
As far as I read other code, we already have such functions that have
the same functionality but have different arguments. For instance,
SearchSysCacheExists() and SearchSysCacheExistsAttName(). So I think
we can leave as it is but is it better to have like
FdwXactCheckExistence() and FdwXactCheckExistenceByXid()?
>
> + * Portions Copyright (c) 2020, PostgreSQL Global Development Group
>
> Please correct year in the next patch set.
Fixed.
>
> +FdwXactLauncherRequestToLaunch(void)
>
> Since the launcher's job is to 'launch', I think the Launcher can be omitted from the method name.
Agreed. How about FdwXactRequestToLaunchResolver()?
>
> +/* Report shared memory space needed by FdwXactRsoverShmemInit */
> +Size
> +FdwXactRslvShmemSize(void)
>
> Are both Rsover and Rslv referring to resolver ? It would be better to use whole word which reduces confusion.
> Plus, FdwXactRsoverShmemInit should be FdwXactRslvShmemInit (or FdwXactResolveShmemInit)
Agreed. I realized that these functions are the launcher's functions,
not the resolver's. So I'd change them to FdwXactLauncherShmemSize()
and FdwXactLauncherShmemInit() respectively.
>
> +fdwxact_launch_resolver(Oid dbid)
>
> The above method is not in camel case. It would be better if method names are consistent (in casing).
Fixed.
>
> + errmsg("out of foreign transaction resolver slots"),
> + errhint("You might need to increase max_foreign_transaction_resolvers.")));
>
> It would be nice to include the value of max_foreign_xact_resolvers
I agree it would be nice, but looking at other code we don't include
the value in this kind of message.
>
> For fdwxact_resolver_onexit():
>
> + LWLockAcquire(FdwXactLock, LW_EXCLUSIVE);
> + fdwxact->locking_backend = InvalidBackendId;
> + LWLockRelease(FdwXactLock);
>
> There is no time-consuming method call inside the for loop. I wonder if the lock can be obtained prior to the for loop and released after coming out of the for loop.
Agreed.
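The suggestion above, taking FdwXactLock once around the whole loop rather than per entry, can be sketched with a standalone toy model. All names and structures here are illustrative, not the actual PostgreSQL code, and a plain counter stands in for the LWLock so the sketch is self-contained:

```c
#include <assert.h>

/* Illustrative stand-ins for the shared FdwXact slots and FdwXactLock. */
#define MAX_FDWXACTS 8
#define InvalidBackendId (-1)

static int fdwxact_locking_backend[MAX_FDWXACTS];
static int lock_acquisitions;        /* counts how often the lock is taken */

static void lock_acquire(void) { lock_acquisitions++; }
static void lock_release(void) { }

/*
 * Clear every slot held by this backend under a single lock acquisition.
 * Holding the lock across the loop is safe precisely because the loop
 * body does no slow work. Returns the number of slots released.
 */
int release_my_slots(int my_backend_id)
{
    int released = 0;

    lock_acquire();                  /* once, before the loop */
    for (int i = 0; i < MAX_FDWXACTS; i++)
    {
        if (fdwxact_locking_backend[i] == my_backend_id)
        {
            fdwxact_locking_backend[i] = InvalidBackendId;
            released++;
        }
    }
    lock_release();                  /* once, after the loop */
    return released;
}
```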
>
> +FXRslvLoop(void)
>
> Please use Resolver instead of Rslv
Fixed.
>
> + FdwXactResolveFdwXacts(held_fdwxacts, nheld);
>
> Fdw and Xact are repeated twice each in the method name. Probably the method name can be made shorter.
Fixed.
Regards,
--
Masahiko Sawada
EnterpriseDB: https://www.enterprisedb.com/
On 2021/01/18 14:54, Masahiko Sawada wrote:
> On Fri, Jan 15, 2021 at 7:45 AM Zhihong Yu <zyu@yugabyte.com> wrote:
>>
>> For v32-0002-postgres_fdw-supports-commit-and-rollback-APIs.patch :
>>
>> + entry->changing_xact_state = true;
>> ...
>> + entry->changing_xact_state = abort_cleanup_failure;
>>
>> I don't see a return statement in between the two assignments. I wonder why entry->changing_xact_state is set to true, and later being assigned again.
>
> Because postgresRollbackForeignTransaction() can get called again in
> case where an error occurred during aborting and cleaning up the
> transaction. For example, if an error occurred when executing ABORT
> TRANSACTION (pgfdw_get_cleanup_result() could emit an ERROR),
> postgresRollbackForeignTransaction() will get called again while
> entry->changing_xact_state is still true. Then the entry will be
> caught by the following condition and cleaned up:
>
> /*
> * If connection is before starting transaction or is already unsalvageable,
> * do only the cleanup and don't touch it further.
> */
> if (entry->changing_xact_state)
> {
> pgfdw_cleanup_after_transaction(entry);
> return;
> }
>
>> For v32-0007-Introduce-foreign-transaction-launcher-and-resol.patch :
>>
>> bq. This commits introduces to new background processes: foreign
>>
>> commits introduces to new -> commit introduces two new
>
> Fixed.
>
>> +FdwXactExistsXid(TransactionId xid)
>>
>> Since Xid is the parameter to this method, I think the Xid suffix can be dropped from the method name.
>
> But there is already a function named FdwXactExists()?
>
> bool
> FdwXactExists(Oid dbid, Oid serverid, Oid userid)
>
> As far as I read other code, we already have such functions that have
> the same functionality but have different arguments. For instance,
> SearchSysCacheExists() and SearchSysCacheExistsAttName(). So I think
> we can leave it as it is, but would it be better to have something like
> FdwXactCheckExistence() and FdwXactCheckExistenceByXid()?
>
>> + * Portions Copyright (c) 2020, PostgreSQL Global Development Group
>>
>> Please correct the year in the next patch set.
>
> Fixed.
>
>> +FdwXactLauncherRequestToLaunch(void)
>>
>> Since the launcher's job is to 'launch', I think the Launcher can be omitted from the method name.
>
> Agreed. How about FdwXactRequestToLaunchResolver()?
>
>> +/* Report shared memory space needed by FdwXactRsoverShmemInit */
>> +Size
>> +FdwXactRslvShmemSize(void)
>>
>> Are both Rsover and Rslv referring to resolver? It would be better to use the whole word, which reduces confusion.
>> Plus, FdwXactRsoverShmemInit should be FdwXactRslvShmemInit (or FdwXactResolveShmemInit)
>
> Agreed. I realized that these functions are the launcher's functions,
> not the resolver's. So I'd change them to FdwXactLauncherShmemSize() and
> FdwXactLauncherShmemInit() respectively.
>
>> +fdwxact_launch_resolver(Oid dbid)
>>
>> The above method is not in camel case. It would be better if method names are consistent (in casing).
>
> Fixed.
>
>> + errmsg("out of foreign transaction resolver slots"),
>> + errhint("You might need to increase max_foreign_transaction_resolvers.")));
>>
>> It would be nice to include the value of max_foreign_xact_resolvers
>
> I agree it would be nice, but looking at other code, we don't include
> the value in this kind of message.
>
>> For fdwxact_resolver_onexit():
>>
>> + LWLockAcquire(FdwXactLock, LW_EXCLUSIVE);
>> + fdwxact->locking_backend = InvalidBackendId;
>> + LWLockRelease(FdwXactLock);
>>
>> There is no time-consuming method call inside the for loop. I wonder if the lock can be obtained prior to the for loop and released after coming out of the for loop.
>
> Agreed.
>
>> +FXRslvLoop(void)
>>
>> Please use Resolver instead of Rslv
>
> Fixed.
>
>> + FdwXactResolveFdwXacts(held_fdwxacts, nheld);
>>
>> Fdw and Xact are repeated twice each in the method name. Probably the method name can be made shorter.
>
> Fixed.

You fixed some issues. But maybe you forgot to attach the latest patches?

I'm reading the 0001 and 0002 patches to pick up the changes for postgres_fdw that are worth applying independently from the 2PC feature. If there are such changes, IMO we can apply them in advance, which would make the patches simpler.

+ if (PQresultStatus(res) != PGRES_COMMAND_OK)
+ ereport(ERROR, (errmsg("could not commit transaction on server %s",
+ frstate->server->servername)));

You changed the code this way because you want to include the server name in the error message? I agree that it's helpful to report also the server name that caused an error. OTOH, since this change gets rid of the call to pgfdw_report_error() for the returned PGresult, the reported error message contains less information. If this understanding is right, I don't think that this change is an improvement.

Instead, if the server name should be included in the error message, pgfdw_report_error() should be changed so that it also reports the server name? If we do that, the server name is reported not only when COMMIT fails but also when other commands fail.

Of course, if this change is not essential, we can skip doing this in the first version.

- /*
- * Regardless of the event type, we can now mark ourselves as out of the
- * transaction. (Note: if we are here during PRE_COMMIT or PRE_PREPARE,
- * this saves a useless scan of the hashtable during COMMIT or PREPARE.)
- */
- xact_got_connection = false;

With this change, xact_got_connection seems to never be set to false. Doesn't this break pgfdw_subxact_callback(), which uses xact_got_connection?

+ /* Also reset cursor numbering for next transaction */
+ cursor_number = 0;

Originally this variable is reset to 0 once per transaction end. But with the patch, it's reset to 0 every time a foreign transaction ends at each connection. This change would fortunately be harmless in practice, but seems not right theoretically.

This makes me wonder whether the new FDW API is not good at handling the case where some operations need to be performed once per transaction end.

Regards,

--
Fujii Masao
Advanced Computing Technology Center
Research and Development Headquarters
NTT DATA CORPORATION
On Wed, Jan 27, 2021 at 10:29 AM Fujii Masao <masao.fujii@oss.nttdata.com> wrote:
>
> You fixed some issues. But maybe you forgot to attach the latest patches?

Yes, I've attached the updated patches.

> I'm reading the 0001 and 0002 patches to pick up the changes for postgres_fdw that are worth applying independently from the 2PC feature. If there are such changes, IMO we can apply them in advance, which would make the patches simpler.

Thank you for reviewing the patches!

> + if (PQresultStatus(res) != PGRES_COMMAND_OK)
> + ereport(ERROR, (errmsg("could not commit transaction on server %s",
> + frstate->server->servername)));
>
> You changed the code this way because you want to include the server name in the error message? I agree that it's helpful to report also the server name that caused an error. OTOH, since this change gets rid of the call to pgfdw_report_error() for the returned PGresult, the reported error message contains less information. If this understanding is right, I don't think that this change is an improvement.

Right. It's better to use do_sql_command() instead.

> Instead, if the server name should be included in the error message, pgfdw_report_error() should be changed so that it also reports the server name? If we do that, the server name is reported not only when COMMIT fails but also when other commands fail.
>
> Of course, if this change is not essential, we can skip doing this in the first version.

Yes, I think it's not essential for now. We can improve it later if we want.

> - /*
> - * Regardless of the event type, we can now mark ourselves as out of the
> - * transaction. (Note: if we are here during PRE_COMMIT or PRE_PREPARE,
> - * this saves a useless scan of the hashtable during COMMIT or PREPARE.)
> - */
> - xact_got_connection = false;
>
> With this change, xact_got_connection seems to never be set to false. Doesn't this break pgfdw_subxact_callback(), which uses xact_got_connection?

I think xact_got_connection is set to false in pgfdw_cleanup_after_transaction(), which is called at the end of each foreign transaction (i.e., in postgresCommitForeignTransaction() and postgresRollbackForeignTransaction()).

But as you're concerned below, it's reset at each foreign transaction's end rather than at the parent transaction's end.

> + /* Also reset cursor numbering for next transaction */
> + cursor_number = 0;
>
> Originally this variable is reset to 0 once per transaction end. But with the patch, it's reset to 0 every time a foreign transaction ends at each connection. This change would fortunately be harmless in practice, but seems not right theoretically.
>
> This makes me wonder whether the new FDW API is not good at handling the case where some operations need to be performed once per transaction end.

I think that the problem comes from the fact that the FDW needs to use both SubXactCallback and the new FDW API.

If we want to perform some operations at the end of the top transaction per FDW, not per foreign transaction, we will either still need to use XactCallback or need to rethink the FDW API design. But given that we call the commit and rollback FDW APIs only for foreign servers that actually started a transaction, I'm not sure such operations exist in practice. IIUC there are none, at least from the normal (non-sub) transaction termination perspective.

IIUC xact_got_connection is used to skip iterating over all cached connections to find open remote (sub)transactions. This is not necessary anymore, at least from the normal transaction termination perspective. So maybe we can improve it so that it tracks whether any of the cached connections opened a subtransaction. That is, we set it to true when we create a savepoint on any connection, and set it to false at the end of pgfdw_subxact_callback() if we see that the xact_depth of every cached entry is less than or equal to 1 after iterating over all entries.

Regarding cursor_number, it essentially needs to be unique at least within a transaction, so we can manage it per transaction or per connection. But the current postgres_fdw rather ensures uniqueness across all connections. So it seems to me that this can be fixed by making each connection have its own cursor_number and resetting it in pgfdw_cleanup_after_transaction(). I think this can be done in a separate patch.

Alternatively, terminating subtransactions via an FDW API could also solve this problem, but I don't think that's a good idea.

What do you think?

Regards,

--
Masahiko Sawada
EDB: https://www.enterprisedb.com/
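The per-connection cursor_number idea described above could look roughly like this. This is a minimal standalone model; the struct and function names are hypothetical, not postgres_fdw's actual code:

```c
#include <assert.h>

/*
 * Sketch: give each connection cache entry its own cursor_number instead
 * of one counter global to all connections, and reset it when the foreign
 * transaction on that connection ends.
 */
typedef struct ConnCacheEntry
{
    unsigned int cursor_number;  /* per-connection cursor numbering */
} ConnCacheEntry;

/* Hand out the next cursor number for this connection. */
unsigned int get_cursor_number(ConnCacheEntry *entry)
{
    return ++entry->cursor_number;
}

/* Called at foreign-transaction end for this connection only. */
void cleanup_after_transaction(ConnCacheEntry *entry)
{
    entry->cursor_number = 0;
}
```

Resetting in the per-connection cleanup keeps the numbering unique within each connection's transaction, which is all the uniqueness the discussion says is required.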
Attachment
- v34-0011-Add-regression-tests-for-foreign-twophase-commit.patch
- v34-0005-postgres_fdw-supports-prepare-API.patch
- v34-0009-postgres_fdw-marks-foreign-transaction-as-modifi.patch
- v34-0008-Prepare-foreign-transactions-at-commit-time.patch
- v34-0010-Documentation-update.patch
- v34-0007-Introduce-foreign-transaction-launcher-and-resol.patch
- v34-0006-Add-GetPrepareId-API.patch
- v34-0002-postgres_fdw-supports-commit-and-rollback-APIs.patch
- v34-0003-Recreate-RemoveForeignServerById.patch
- v34-0004-Add-PrepareForeignTransaction-API.patch
- v34-0001-Introduce-transaction-manager-for-foreign-transa.patch
On Sat, Jan 16, 2021 at 1:39 AM Zhihong Yu <zyu@yugabyte.com> wrote:
>
> Hi,

Thank you for reviewing the patch!

> For v32-0004-Add-PrepareForeignTransaction-API.patch :
>
> + * Whenever a foreign transaction is processed, the corresponding FdwXact
> + * entry is update. To avoid holding the lock during transaction processing
> + * which may take an unpredicatable time the in-memory data of foreign
>
> entry is update -> entry is updated
>
> unpredicatable -> unpredictable

Fixed.

> + int nlefts = 0;
>
> nlefts -> nremaining
>
> + elog(DEBUG1, "left %u foreign transactions", nlefts);
>
> The message can be phrased as "%u foreign transactions remaining"

Fixed.

> +FdwXactResolveFdwXacts(int *fdwxact_idxs, int nfdwxacts)
>
> Fdw and Xact are repeated. Seems one should suffice. How about naming the method FdwXactResolveTransactions() ?
> Similar comment for FdwXactResolveOneFdwXact(FdwXact fdwxact)

Agreed. I changed them to ResolveFdwXacts() and ResolveOneFdwXact() respectively to avoid a long function name.

> For get_fdwxact():
>
> + /* This entry matches the condition */
> + found = true;
> + break;
>
> Instead of breaking and returning, you can return within the loop directly.

Fixed.

Those changes are incorporated into the latest version patches[1] I submitted today.

Regards,

[1] https://www.postgresql.org/message-id/CAD21AoBYyA5O%2BFPN4Cs9YWiKjq319BvF5fYmKNsFTZfwTcWjQw%40mail.gmail.com

--
Masahiko Sawada
EDB: https://www.enterprisedb.com/
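The get_fdwxact() suggestion above, returning from inside the loop instead of setting a found flag and breaking, looks like this in a generic form. The entry type and match condition are illustrative, not the patch's actual structures:

```c
#include <assert.h>
#include <stddef.h>

/* Illustrative entry type; the real code matches on FdwXact fields. */
typedef struct Entry
{
    int dbid;
    int serverid;
} Entry;

/* Return the matching entry directly from inside the loop; NULL if none. */
const Entry *find_entry(const Entry *entries, int n, int dbid, int serverid)
{
    for (int i = 0; i < n; i++)
    {
        if (entries[i].dbid == dbid && entries[i].serverid == serverid)
            return &entries[i];   /* no flag + break needed */
    }
    return NULL;                  /* no entry matched */
}
```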
On 2021/01/27 14:08, Masahiko Sawada wrote:
> On Wed, Jan 27, 2021 at 10:29 AM Fujii Masao
> <masao.fujii@oss.nttdata.com> wrote:
>>
>> You fixed some issues. But maybe you forgot to attach the latest patches?
>
> Yes, I've attached the updated patches.

Thanks for updating the patch! I tried to review 0001 and 0002 as a self-contained change.

+ * An FDW that implements both commit and rollback APIs can request to register
+ * the foreign transaction by FdwXactRegisterXact() to participate it to a
+ * group of distributed tranasction. The registered foreign transactions are
+ * identified by OIDs of server and user.

I'm afraid that the combination of OIDs of server and user is not unique. IOW, more than one foreign transaction can have the same combination of OIDs of server and user. For example, the following two SELECT queries start different foreign transactions, but their user OID is the same. Shouldn't the OID of the user mapping be used instead of the OID of the user?

CREATE SERVER loopback FOREIGN DATA WRAPPER postgres_fdw;
CREATE USER MAPPING FOR postgres SERVER loopback OPTIONS (user 'postgres');
CREATE USER MAPPING FOR public SERVER loopback OPTIONS (user 'postgres');
CREATE TABLE t(i int);
CREATE FOREIGN TABLE ft(i int) SERVER loopback OPTIONS (table_name 't');
BEGIN;
SELECT * FROM ft;
DROP USER MAPPING FOR postgres SERVER loopback ;
SELECT * FROM ft;
COMMIT;

+ /* Commit foreign transactions if any */
+ AtEOXact_FdwXact(true);

Don't we need to pass the XACT_EVENT_PARALLEL_PRE_COMMIT or XACT_EVENT_PRE_COMMIT flag? Probably we don't need to do this if postgres_fdw is the only user of this new API. But if we make this new API a generic one, such flags seem necessary so that some foreign data wrappers might have different behaviors for those flags.

Because of the same reason as above, AtEOXact_FdwXact() should also be called after CallXactCallbacks(is_parallel_worker ? XACT_EVENT_PARALLEL_COMMIT : XACT_EVENT_COMMIT)?

+ /*
+ * Abort foreign transactions if any. This needs to be done before marking
+ * this transaction as not running since FDW's transaction callbacks might
+ * assume this transaction is still in progress.
+ */
+ AtEOXact_FdwXact(false);

Same as above.

+/*
+ * This function is called at PREPARE TRANSACTION. Since we don't support
+ * preparing foreign transactions yet, raise an error if the local transaction
+ * has any foreign transaction.
+ */
+void
+AtPrepare_FdwXact(void)
+{
+ if (FdwXactParticipants != NIL)
+ ereport(ERROR,
+ (errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
+ errmsg("cannot PREPARE a transaction that has operated on foreign tables")));
+}

This means that some foreign data wrappers supporting prepared transactions (though I'm not sure whether such wrappers actually exist or not) cannot use the new API? If we want to allow those wrappers to use the new API, AtPrepare_FdwXact() should call the prepare callback, and each wrapper should emit an error within the callback if necessary.

+ foreach(lc, FdwXactParticipants)
+ {
+ FdwXactParticipant *fdw_part = (FdwXactParticipant *) lfirst(lc);
+
+ if (fdw_part->server->serverid == serverid &&
+ fdw_part->usermapping->userid == userid)

Isn't this inefficient when starting lots of foreign transactions, because we need to scan all the entries in the list every time?

+static ConnCacheEntry *
+GetConnectionCacheEntry(Oid umid)
+{
+ bool found;
+ ConnCacheEntry *entry;
+ ConnCacheKey key;
+
+ /* First time through, initialize connection cache hashtable */
+ if (ConnectionHash == NULL)
+ {
+ HASHCTL ctl;
+
+ ctl.keysize = sizeof(ConnCacheKey);
+ ctl.entrysize = sizeof(ConnCacheEntry);
+ ConnectionHash = hash_create("postgres_fdw connections", 8,
+ &ctl,
+ HASH_ELEM | HASH_BLOBS);

Currently ConnectionHash is created under TopMemoryContext. With the patch, since GetConnectionCacheEntry() can be called in other places, ConnectionHash may be created under a memory context other than TopMemoryContext? If so, is that safe?

- if (PQstatus(entry->conn) != CONNECTION_OK ||
- PQtransactionStatus(entry->conn) != PQTRANS_IDLE ||
- entry->changing_xact_state ||
- entry->invalidated)
...
+ if (PQstatus(entry->conn) != CONNECTION_OK ||
+ PQtransactionStatus(entry->conn) != PQTRANS_IDLE ||
+ entry->changing_xact_state)

Why did you get rid of the condition "entry->invalidated"?

>
>> I'm reading the 0001 and 0002 patches to pick up the changes for postgres_fdw that are worth applying independently from the 2PC feature. If there are such changes, IMO we can apply them in advance, which would make the patches simpler.
>
> Thank you for reviewing the patches!
>
>> + if (PQresultStatus(res) != PGRES_COMMAND_OK)
>> + ereport(ERROR, (errmsg("could not commit transaction on server %s",
>> + frstate->server->servername)));
>>
>> You changed the code this way because you want to include the server name in the error message? I agree that it's helpful to report also the server name that caused an error. OTOH, since this change gets rid of the call to pgfdw_report_error() for the returned PGresult, the reported error message contains less information. If this understanding is right, I don't think that this change is an improvement.
>
> Right. It's better to use do_sql_command() instead.
>
>> Instead, if the server name should be included in the error message, pgfdw_report_error() should be changed so that it also reports the server name? If we do that, the server name is reported not only when COMMIT fails but also when other commands fail.
>>
>> Of course, if this change is not essential, we can skip doing this in the first version.
>
> Yes, I think it's not essential for now. We can improve it later if we want.
>
>> - /*
>> - * Regardless of the event type, we can now mark ourselves as out of the
>> - * transaction. (Note: if we are here during PRE_COMMIT or PRE_PREPARE,
>> - * this saves a useless scan of the hashtable during COMMIT or PREPARE.)
>> - */
>> - xact_got_connection = false;
>>
>> With this change, xact_got_connection seems to never be set to false. Doesn't this break pgfdw_subxact_callback(), which uses xact_got_connection?
>
> I think xact_got_connection is set to false in
> pgfdw_cleanup_after_transaction(), which is called at the end of each
> foreign transaction (i.e., in postgresCommitForeignTransaction() and
> postgresRollbackForeignTransaction()).
>
> But as you're concerned below, it's reset at each foreign transaction's
> end rather than at the parent transaction's end.
>
>> + /* Also reset cursor numbering for next transaction */
>> + cursor_number = 0;
>>
>> Originally this variable is reset to 0 once per transaction end. But with the patch, it's reset to 0 every time a foreign transaction ends at each connection. This change would fortunately be harmless in practice, but seems not right theoretically.
>>
>> This makes me wonder whether the new FDW API is not good at handling the case where some operations need to be performed once per transaction end.
>
> I think that the problem comes from the fact that the FDW needs to use
> both SubXactCallback and the new FDW API.
>
> If we want to perform some operations at the end of the top
> transaction per FDW, not per foreign transaction, we will either still
> need to use XactCallback or need to rethink the FDW API design. But
> given that we call the commit and rollback FDW APIs only for foreign
> servers that actually started a transaction, I'm not sure such
> operations exist in practice. IIUC there are none, at least from the
> normal (non-sub) transaction termination perspective.

One feature in my mind that may not match this new API is performing transaction commits on multiple servers in parallel. That's something like the following. As far as I can recall, another proposed version of the 2PC-on-postgres_fdw patch included that feature. If we want to implement this to increase the performance of transaction commit in the future, I'm afraid that the new API will prevent that.

foreach(foreign transactions)
    send commit command

foreach(foreign transactions)
    wait for reply of commit

On second thought, the new per-transaction commit/rollback callback is essential when users or the resolver process want to resolve the specified foreign transaction, but not essential when backends commit/rollback foreign transactions. That is, even if we add the new per-transaction API for users and the resolver process, backends can still use CallXactCallbacks() when they commit/rollback foreign transactions. Is this understanding right?

> IIUC xact_got_connection is used to skip iterating over all cached
> connections to find open remote (sub)transactions. This is not
> necessary anymore, at least from the normal transaction termination
> perspective. So maybe we can improve it so that it tracks whether any
> of the cached connections opened a subtransaction. That is, we set it
> to true when we create a savepoint on any connection, and set it to
> false at the end of pgfdw_subxact_callback() if we see that the
> xact_depth of every cached entry is less than or equal to 1 after
> iterating over all entries.

OK.

> Regarding cursor_number, it essentially needs to be unique at least
> within a transaction, so we can manage it per transaction or per
> connection. But the current postgres_fdw rather ensures uniqueness
> across all connections. So it seems to me that this can be fixed by
> making each connection have its own cursor_number and resetting it in
> pgfdw_cleanup_after_transaction(). I think this can be done in a
> separate patch.

Maybe, so let's work on this later, at least after we confirm that this change is really necessary.

Regards,

--
Fujii Masao
Advanced Computing Technology Center
Research and Development Headquarters
NTT DATA CORPORATION
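The parallel-commit pattern in the pseudocode above (send every commit first, then collect the replies) can be modeled as a tiny standalone simulation. The connection states are invented for illustration; in real code the send would be an asynchronous COMMIT and the wait would read the reply from each socket:

```c
#include <assert.h>

/* Toy model of "send all commits first, then wait for all replies". */
typedef enum { CONN_IDLE, CONN_COMMIT_SENT, CONN_COMMITTED } ConnState;

static void send_commit(ConnState *c) { *c = CONN_COMMIT_SENT; }
static void wait_commit(ConnState *c) { if (*c == CONN_COMMIT_SENT) *c = CONN_COMMITTED; }

/*
 * Commit n foreign transactions in parallel: two passes over the
 * connections instead of a serial send+wait per connection, so all
 * commits are in flight at once. Returns how many committed.
 */
int parallel_commit(ConnState *conns, int n)
{
    int done = 0;

    for (int i = 0; i < n; i++)
        send_commit(&conns[i]);      /* pass 1: fire off every commit */

    for (int i = 0; i < n; i++)
    {
        wait_commit(&conns[i]);      /* pass 2: collect the replies */
        if (conns[i] == CONN_COMMITTED)
            done++;
    }
    return done;
}
```

The point of the two-pass structure is that the waits overlap: the total latency approaches the slowest server's commit time rather than the sum over all servers.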
On Tue, Feb 2, 2021 at 5:18 PM Fujii Masao <masao.fujii@oss.nttdata.com> wrote: > > > > On 2021/01/27 14:08, Masahiko Sawada wrote: > > On Wed, Jan 27, 2021 at 10:29 AM Fujii Masao > > <masao.fujii@oss.nttdata.com> wrote: > >> > >> > >> You fixed some issues. But maybe you forgot to attach the latest patches? > > > > Yes, I've attached the updated patches. > > Thanks for updating the patch! I tried to review 0001 and 0002 as the self-contained change. > > + * An FDW that implements both commit and rollback APIs can request to register > + * the foreign transaction by FdwXactRegisterXact() to participate it to a > + * group of distributed tranasction. The registered foreign transactions are > + * identified by OIDs of server and user. > > I'm afraid that the combination of OIDs of server and user is not unique. IOW, more than one foreign transactions can havethe same combination of OIDs of server and user. For example, the following two SELECT queries start the different foreigntransactions but their user OID is the same. OID of user mapping should be used instead of OID of user? > > CREATE SERVER loopback FOREIGN DATA WRAPPER postgres_fdw; > CREATE USER MAPPING FOR postgres SERVER loopback OPTIONS (user 'postgres'); > CREATE USER MAPPING FOR public SERVER loopback OPTIONS (user 'postgres'); > CREATE TABLE t(i int); > CREATE FOREIGN TABLE ft(i int) SERVER loopback OPTIONS (table_name 't'); > BEGIN; > SELECT * FROM ft; > DROP USER MAPPING FOR postgres SERVER loopback ; > SELECT * FROM ft; > COMMIT; Good catch. I've considered using user mapping OID or a pair of user mapping OID and server OID as a key of foreign transactions but I think it also has a problem if an FDW caches the connection by pair of server OID and user OID whereas the core identifies them by user mapping OID. For instance, mysql_fdw manages connections by pair of server OID and user OID. 
For example, let's consider the following execution: BEGIN; SET ROLE user_A; INSERT INTO ft1 VALUES (1); SET ROLE user_B; INSERT INTO ft1 VALUES (1); COMMIT; Suppose that an FDW identifies the connections by {server OID, user OID} and the core GTM identifies the transactions by user mapping OID, and user_A and user_B use the public user mapping to connect server_X. In the FDW, there are two connections identified by {user_A, sever_X} and {user_B, server_X} respectively, and therefore opens two transactions on each connection, while GTM has only one FdwXact entry because the two connections refer to the same user mapping OID. As a result, at the end of the transaction, GTM ends only one foreign transaction, leaving another one. Using user mapping OID seems natural to me but I'm concerned that changing role in the middle of transaction is likely to happen than dropping the public user mapping but not sure. We would need to find more better way. > > + /* Commit foreign transactions if any */ > + AtEOXact_FdwXact(true); > > Don't we need to pass XACT_EVENT_PARALLEL_PRE_COMMIT or XACT_EVENT_PRE_COMMIT flag? Probably we don't need to do this ifpostgres_fdw is only user of this new API. But if we make this new API generic one, such flags seem necessary so that someforeign data wrappers might have different behaviors for those flags. > > Because of the same reason as above, AtEOXact_FdwXact() should also be called after CallXactCallbacks(is_parallel_worker? XACT_EVENT_PARALLEL_COMMIT : XACT_EVENT_COMMIT)? Agreed. In AtEOXact_FdwXact() we call either CommitForeignTransaction() or RollbackForeignTransaction() with FDWXACT_FLAG_ONEPHASE flag for each foreign transaction. So for example in commit case, we will call new FDW APIs in the following order: 1. Call CommitForeignTransaction() with XACT_EVENT_PARALLEL_PRE_COMMIT flag and FDWXACT_FLAG_ONEPHASE flag for each foreign transaction. 2. Commit locally. 3. 
Call CommitForeignTransaction() with XACT_EVENT_PARALLEL_COMMIT flag and FDWXACT_FLAG_ONEPHASE flag for each foreign transaction. In the future when we have a new FDW API to prepare foreign transaction, the sequence will be: 1. Call PrepareForeignTransaction() for each foreign transaction. 2. Call CommitForeignTransaction() with XACT_EVENT_PARALLEL_PRE_COMMIT flag for each foreign transaction. 3. Commit locally. 4. Call CommitForeignTransaction() with XACT_EVENT_PARALLEL_COMMIT flag for each foreign transaction. So we expect FDW that wants to support 2PC not to commit foreign transaction if CommitForeignTransaction() is called with XACT_EVENT_PARALLEL_PRE_COMMIT flag and no FDWXACT_FLAG_ONEPHASE flag. > > + /* > + * Abort foreign transactions if any. This needs to be done before marking > + * this transaction as not running since FDW's transaction callbacks might > + * assume this transaction is still in progress. > + */ > + AtEOXact_FdwXact(false); > > Same as above. > > +/* > + * This function is called at PREPARE TRANSACTION. Since we don't support > + * preparing foreign transactions yet, raise an error if the local transaction > + * has any foreign transaction. > + */ > +void > +AtPrepare_FdwXact(void) > +{ > + if (FdwXactParticipants != NIL) > + ereport(ERROR, > + (errcode(ERRCODE_FEATURE_NOT_SUPPORTED), > + errmsg("cannot PREPARE a transaction that has operated on foreign tables"))); > +} > > This means that some foreign data wrappers suppporting the prepare transaction (though I'm not sure if such wappers actuallyexist or not) cannot use the new API? If we want to allow those wrappers to use new API, AtPrepare_FdwXact() shouldcall the prepare callback and each wrapper should emit an error within the callback if necessary. I think if we support the prepare callback and allow FDWs to prepare foreign transactions, we have to call CommitForeignTransaction() on COMMIT PREPARED for foreign transactions that are associated with the local prepared transaction. 
But how can we know which foreign transactions are? Even a client who didn’t do PREPARE TRANSACTION could do COMMIT PREPARED. We would need to store the information of which foreign transactions are associated with the local transaction somewhere. The 0004 patch introduces WAL logging along with prepare API and we store that information to a WAL record. I think it’s better at this time to disallow PREPARE TRANSACTION when at least one foreign transaction is registered via FDW API. > > + foreach(lc, FdwXactParticipants) > + { > + FdwXactParticipant *fdw_part = (FdwXactParticipant *) lfirst(lc); > + > + if (fdw_part->server->serverid == serverid && > + fdw_part->usermapping->userid == userid) > > Isn't this ineffecient when starting lots of foreign transactions because we need to scan all the entries in the list everytime? Agreed. I'll change it to a hash map. > > +static ConnCacheEntry * > +GetConnectionCacheEntry(Oid umid) > +{ > + bool found; > + ConnCacheEntry *entry; > + ConnCacheKey key; > + > + /* First time through, initialize connection cache hashtable */ > + if (ConnectionHash == NULL) > + { > + HASHCTL ctl; > + > + ctl.keysize = sizeof(ConnCacheKey); > + ctl.entrysize = sizeof(ConnCacheEntry); > + ConnectionHash = hash_create("postgres_fdw connections", 8, > + &ctl, > + HASH_ELEM | HASH_BLOBS); > > Currently ConnectionHash is created under TopMemoryContext. With the patch, since GetConnectionCacheEntry() can be calledin other places, ConnectionHash may be created under the memory context other than TopMemoryContext? If so, that'ssafe? hash_create() creates a hash map under TopMemoryContext unless HASH_CONTEXT is specified. So I think ConnectionHash is still created in the same memory context. > > - if (PQstatus(entry->conn) != CONNECTION_OK || > - PQtransactionStatus(entry->conn) != PQTRANS_IDLE || > - entry->changing_xact_state || > - entry->invalidated) > ... 
> + if (PQstatus(entry->conn) != CONNECTION_OK || > + PQtransactionStatus(entry->conn) != PQTRANS_IDLE || > + entry->changing_xact_state) > > Why did you get rid of the condition "entry->invalidated"? My bad. I'll fix it. > > > > If we want to perform some operations at the end of the top > > transaction per FDW, not per foreign transaction, we will either still > > need to use XactCallback or need to rethink the FDW API design. But > > given that we call commit and rollback FDW API for only foreign > > servers that actually started a transaction, I’m not sure if there are > > such operations in practice. IIUC there is not at least from the > > normal (not-sub) transaction termination perspective. > > One feature in my mind that may not match with this new API is to perform transaction commits on multiple servers in parallel.That's something like the following. As far as I can recall, another proposed version of 2pc on postgres_fdw patchincluded that feature. If we want to implement this to increase the performance of transaction commit in the future,I'm afraid that new API will prevent that. > > foreach(foreign transactions) > send commit command > > foreach(foreign transactions) > wait for reply of commit What I'm thinking is to pass a flag, say FDWXACT_ASYNC, to Commit/RollbackForeignTransaction() and add a new API to wait for the operation to complete, say CompleteForeignTransaction(). If commit/rollback callback in an FDW is called with FDWXACT_ASYNC flag, it should send the command and immediately return the handler (e.g., PQsocket() in postgres_fdw). The GTM gathers the handlers and poll events on them. To complete the command, the GTM calls CompleteForeignTransaction() to wait for the command to complete. Please refer to XA specification for details (especially xa_complete() and TMASYNC flag). A pseudo-code is something like the followings: foreach (foreign transactions) call CommitForeignTransaction(FDWXACT_ASYNC); append the returned fd to the array. 
while (true)
{
    poll for events on fds;
    call CompleteForeignTransaction() for the fd's owner;
    if (success)
        remove fd from the array;
    if (array is empty)
        break;
}

> On second thought, the new per-transaction commit/rollback callback is essential when users or the resolver process want to resolve the specified foreign transaction, but not essential when backends commit/rollback foreign transactions. That is, even if we add the new per-transaction API for users and the resolver process, backends can still use CallXactCallbacks() when they commit/rollback foreign transactions. Is this understanding right?

I haven't tried that, but I think that's possible if we can know whether the commit/rollback callback (e.g., postgresCommitForeignTransaction() etc. in postgres_fdw) is called via SQL function (the pg_resolve_foreign_xact() SQL function) or called by the resolver process. That is, we register the foreign transaction via FdwXactRegisterXact(), do nothing in postgresCommit/RollbackForeignTransaction() if these are called by the backend, and perform COMMIT/ROLLBACK in pgfdw_xact_callback() in an asynchronous manner. On the other hand, if postgresCommit/RollbackForeignTransaction() is called via SQL function or by the resolver, these functions commit/rollback the transaction.

> > Regarding cursor_number, it essentially needs to be unique at least
> > within a transaction, so we can manage it per transaction or per
> > connection. But the current postgres_fdw rather ensures uniqueness
> > across all connections. So it seems to me that this can be fixed by
> > making each connection have its own cursor_number and resetting it in
> > pgfdw_cleanup_after_transaction(). I think this can be in a separate
> > patch.
>
> Maybe, so let's work on this later, at least after we confirm that
> this change is really necessary.

Okay.

Regards,

--
Masahiko Sawada
EDB: https://www.enterprisedb.com/
On Fri, Feb 5, 2021 at 2:45 PM Masahiko Sawada <sawada.mshk@gmail.com> wrote:
>
> On Tue, Feb 2, 2021 at 5:18 PM Fujii Masao <masao.fujii@oss.nttdata.com> wrote:
> >
> > On 2021/01/27 14:08, Masahiko Sawada wrote:
> > > On Wed, Jan 27, 2021 at 10:29 AM Fujii Masao
> > > <masao.fujii@oss.nttdata.com> wrote:
> > >>
> > >> You fixed some issues. But maybe you forgot to attach the latest patches?
> > >
> > > Yes, I've attached the updated patches.
> >
> > Thanks for updating the patch! I tried to review 0001 and 0002 as a self-contained change.
> >
> > + * An FDW that implements both commit and rollback APIs can request to register
> > + * the foreign transaction by FdwXactRegisterXact() to participate it to a
> > + * group of distributed tranasction. The registered foreign transactions are
> > + * identified by OIDs of server and user.
> >
> > I'm afraid that the combination of OIDs of server and user is not unique. IOW, more than one foreign transaction can have the same combination of OIDs of server and user. For example, the following two SELECT queries start different foreign transactions, but their user OID is the same. Should the OID of the user mapping be used instead of the OID of the user?
> >
> > CREATE SERVER loopback FOREIGN DATA WRAPPER postgres_fdw;
> > CREATE USER MAPPING FOR postgres SERVER loopback OPTIONS (user 'postgres');
> > CREATE USER MAPPING FOR public SERVER loopback OPTIONS (user 'postgres');
> > CREATE TABLE t(i int);
> > CREATE FOREIGN TABLE ft(i int) SERVER loopback OPTIONS (table_name 't');
> > BEGIN;
> > SELECT * FROM ft;
> > DROP USER MAPPING FOR postgres SERVER loopback;
> > SELECT * FROM ft;
> > COMMIT;
>
> Good catch. I've considered using the user mapping OID, or a pair of
> user mapping OID and server OID, as the key for foreign transactions,
> but I think that also has a problem if an FDW caches the connection by
> the pair of server OID and user OID whereas the core identifies them
> by user mapping OID.
> For instance, mysql_fdw manages connections by the pair of server OID
> and user OID.
>
> For example, let's consider the following execution:
>
> BEGIN;
> SET ROLE user_A;
> INSERT INTO ft1 VALUES (1);
> SET ROLE user_B;
> INSERT INTO ft1 VALUES (1);
> COMMIT;
>
> Suppose that an FDW identifies the connections by {server OID, user
> OID}, the core GTM identifies the transactions by user mapping OID,
> and user_A and user_B use the public user mapping to connect to
> server_X. In the FDW, there are two connections identified by
> {user_A, server_X} and {user_B, server_X} respectively, and it
> therefore opens two transactions, one on each connection, while the
> GTM has only one FdwXact entry because the two connections refer to
> the same user mapping OID. As a result, at the end of the transaction,
> the GTM ends only one foreign transaction, leaving the other one.
>
> Using the user mapping OID seems natural to me, but I'm concerned that
> changing the role in the middle of a transaction is more likely to
> happen than dropping the public user mapping, though I'm not sure. We
> would need to find a better way.

After more thought, I'm inclined to think it's better to identify foreign transactions by user mapping OID. The main reason is that I think FDWs that manage connection caches by the pair of user OID and server OID potentially have a problem with the scenario Fujii-san mentioned. If an FDW has to use another user mapping (i.e., connection information) because the currently used user mapping was removed, it would have to disconnect the previous connection because it has to use the same connection cache. But at that time it doesn't know whether the transaction will be committed or aborted. Also, such an FDW has the same problem that postgres_fdw used to have: a backend establishes multiple connections with the same connection information if multiple local users use the public user mapping.
Even from the perspective of foreign transaction management, it makes more sense that foreign transactions correspond to the connections to foreign servers, not to the local connection information.

I can see that some FDW implementations such as mysql_fdw and firebird_fdw identify connections by the pair of server OID and user OID, but I think this is because they followed the old postgres_fdw code. I suspect there is no use case where an FDW needs to identify connections in that way. If the core GTM identifies them by user mapping OID, we would force those FDWs to change their approach, but I think that change would be the right improvement.

Regards,

--
Masahiko Sawada
EDB: https://www.enterprisedb.com/
On 2021/03/17 12:03, Masahiko Sawada wrote:
> I've attached the updated version patch set.

Thanks for updating the patches! I'm now restarting my review of 2PC because I'd like to use this feature in PG15.

I think the following logic, by which the transaction resolver resolves and removes the fdwxact entries, needs to be fixed.

1. Check if pending fdwxact entries exist.

HoldInDoubtFdwXacts() checks if there are entries whose condition is InvalidBackendId and so on. After that it collects the indexes of those entries in the fdwxacts array. fdwXactLock is released at the end of this phase.

2. Resolve and remove the entries held in the 1st phase.

ResolveFdwXacts() resolves the status of each fdwxact entry using the indexes. At the end of resolving, the transaction resolver removes the entry from the fdwxacts array via remove_fdwxact().

The entry is removed as follows. Since the array is managed by index, the indexes collected in the 1st phase are meaningless afterwards:

/* Remove the entry from active array */
FdwXactCtl->num_fdwxacts--;
FdwXactCtl->fdwxacts[i] = FdwXactCtl->fdwxacts[FdwXactCtl->num_fdwxacts];

This seems to lead to resolving unexpected fdwxacts, and it can trigger the following assertion failure. That's how I noticed. For example, there is the case where a backend inserts a new fdwxact entry into the free slot from which the resolver removed an entry right before, and the resolver, because it uses the indexes collected in the 1st phase, accesses the new entry even though it doesn't need to be resolved yet.

Assert(fdwxact->locking_backend == MyBackendId);

The simple solution is to hold fdwXactLock exclusively the whole time, from the beginning of the 1st phase to the end of the 2nd phase. But I worry that the performance impact would be too big...

I came up with two solutions, although there may be better ones.

A. Remove resolved entries at once, after resolution for all held entries is finished.

If so, we don't need to take the exclusive lock for a long time.
But this has other problems: pg_remove_foreign_xact() can still remove entries, and we need to handle resolution failures.

I wondered whether we can solve the first problem by introducing a new lock, like a "removing lock", so that only processes which hold the lock can remove the entries. The performance impact is limited since insertion of fdwxact entries is not blocked by this lock. And the second problem can be solved using a try-catch block.

B. Merge the 1st and 2nd phases.

Now, the resolver resolves the entries together. That's the reason why it's difficult to remove the entries. So it seems to solve the problem to execute checking, resolving, and removing per entry. I think this is better since it is simpler than A.

If I'm missing something, please let me know.

Regards,

--
Masahiro Ikeda
NTT DATA CORPORATION
On Tue, Apr 27, 2021 at 10:03 AM Masahiro Ikeda <ikedamsh@oss.nttdata.com> wrote:
>
> On 2021/03/17 12:03, Masahiko Sawada wrote:
> > I've attached the updated version patch set.
>
> Thanks for updating the patches! I'm now restarting my review of 2PC
> because I'd like to use this feature in PG15.

Thank you for reviewing the patch! Much appreciated.

> I think the following logic of resolving and removing the fdwxact
> entries by the transaction resolver needs to be fixed.
>
> 1. Check if pending fdwxact entries exist.
>
> HoldInDoubtFdwXacts() checks if there are entries whose condition is
> InvalidBackendId and so on. After that it collects the indexes of those
> entries in the fdwxacts array. fdwXactLock is released at the end of
> this phase.
>
> 2. Resolve and remove the entries held in the 1st phase.
>
> ResolveFdwXacts() resolves the status of each fdwxact entry using the
> indexes. At the end of resolving, the transaction resolver removes the
> entry from the fdwxacts array via remove_fdwxact().
>
> The entry is removed as follows. Since the array is managed by index,
> the indexes collected in the 1st phase are meaningless afterwards:
>
> /* Remove the entry from active array */
> FdwXactCtl->num_fdwxacts--;
> FdwXactCtl->fdwxacts[i] = FdwXactCtl->fdwxacts[FdwXactCtl->num_fdwxacts];
>
> This seems to lead to resolving unexpected fdwxacts, and it can trigger
> the following assertion failure. That's how I noticed. For example,
> there is the case where a backend inserts a new fdwxact entry into the
> free slot from which the resolver removed an entry right before, and
> the resolver accesses the new entry which doesn't need to be resolved
> yet, because it uses the indexes collected in the 1st phase.
>
> Assert(fdwxact->locking_backend == MyBackendId);

Good point. I agree with your analysis.

> The simple solution is to hold fdwXactLock exclusively the whole time,
> from the beginning of the 1st phase to the end of the 2nd phase.
> But I worry that the performance impact would be too big...
>
> I came up with two solutions, although there may be better ones.
>
> A. Remove resolved entries at once, after resolution for all held
> entries is finished.
>
> If so, we don't need to take the exclusive lock for a long time. But
> this has other problems: pg_remove_foreign_xact() can still remove
> entries, and we need to handle resolution failures.
>
> I wondered whether we can solve the first problem by introducing a new
> lock, like a "removing lock", so that only processes which hold the
> lock can remove the entries. The performance impact is limited since
> insertion of fdwxact entries is not blocked by this lock. And the
> second problem can be solved using a try-catch block.
>
> B. Merge the 1st and 2nd phases.
>
> Now, the resolver resolves the entries together. That's the reason why
> it's difficult to remove the entries. So it seems to solve the problem
> to execute checking, resolving, and removing per entry. I think this is
> better since it is simpler than A. If I'm missing something, please let
> me know.

It seems to me that solution B would be simpler and better. I'll try to fix this issue by using solution B and rebase the patch.

Regards,

--
Masahiko Sawada
EDB: https://www.enterprisedb.com/
On Wed, Mar 17, 2021 at 6:03 PM Zhihong Yu <zyu@yugabyte.com> wrote:
>
> Hi,
> For v35-0007-Prepare-foreign-transactions-at-commit-time.patch :
Thank you for reviewing the patch!
>
> With this commit, the foreign server modified within the transaction marked as 'modified'.
>
> transaction marked -> transaction is marked
Will fix.
>
> +#define IsForeignTwophaseCommitRequested() \
> + (foreign_twophase_commit > FOREIGN_TWOPHASE_COMMIT_DISABLED)
>
> Since the other enum is FOREIGN_TWOPHASE_COMMIT_REQUIRED, I think the macro should be named: IsForeignTwophaseCommitRequired.
But even if foreign_twophase_commit is
FOREIGN_TWOPHASE_COMMIT_REQUIRED, the two-phase commit is not used if
there is only one modified server, right? It seems the name
IsForeignTwophaseCommitRequested is fine.
>
> +static bool
> +checkForeignTwophaseCommitRequired(bool local_modified)
>
> + if (!ServerSupportTwophaseCommit(fdw_part))
> + have_no_twophase = true;
> ...
> + if (have_no_twophase)
> + ereport(ERROR,
>
> It seems the error case should be reported within the loop. This way, we don't need to iterate the other participant(s).
> Accordingly, nserverswritten should be incremented for local server prior to the loop. The condition in the loop would become if (!ServerSupportTwophaseCommit(fdw_part) && nserverswritten > 1).
> have_no_twophase is no longer needed.
Hmm, I think if we process one 2pc-non-capable server first and then
process another 2pc-capable server, we should raise an error but
cannot detect that.
Regards,
--
Masahiko Sawada
EDB: https://www.enterprisedb.com/
On Sun, May 2, 2021 at 1:23 AM Zhihong Yu <zyu@yugabyte.com> wrote:
>
>
>
> On Fri, Apr 30, 2021 at 9:09 PM Masahiko Sawada <sawada.mshk@gmail.com> wrote:
>>
>> On Wed, Mar 17, 2021 at 6:03 PM Zhihong Yu <zyu@yugabyte.com> wrote:
>> >
>> > Hi,
>> > For v35-0007-Prepare-foreign-transactions-at-commit-time.patch :
>>
>> Thank you for reviewing the patch!
>>
>> >
>> > With this commit, the foreign server modified within the transaction marked as 'modified'.
>> >
>> > transaction marked -> transaction is marked
>>
>> Will fix.
>>
>> >
>> > +#define IsForeignTwophaseCommitRequested() \
>> > + (foreign_twophase_commit > FOREIGN_TWOPHASE_COMMIT_DISABLED)
>> >
>> > Since the other enum is FOREIGN_TWOPHASE_COMMIT_REQUIRED, I think the macro should be named: IsForeignTwophaseCommitRequired.
>>
>> But even if foreign_twophase_commit is
>> FOREIGN_TWOPHASE_COMMIT_REQUIRED, the two-phase commit is not used if
>> there is only one modified server, right? It seems the name
>> IsForeignTwophaseCommitRequested is fine.
>>
>> >
>> > +static bool
>> > +checkForeignTwophaseCommitRequired(bool local_modified)
>> >
>> > + if (!ServerSupportTwophaseCommit(fdw_part))
>> > + have_no_twophase = true;
>> > ...
>> > + if (have_no_twophase)
>> > + ereport(ERROR,
>> >
>> > It seems the error case should be reported within the loop. This way, we don't need to iterate the other participant(s).
>> > Accordingly, nserverswritten should be incremented for local server prior to the loop. The condition in the loop would become if (!ServerSupportTwophaseCommit(fdw_part) && nserverswritten > 1).
>> > have_no_twophase is no longer needed.
>>
>> Hmm, I think if we process one 2pc-non-capable server first and then
>> process another 2pc-capable server, we should raise an error but
>> cannot detect that.
>
>
> Then the check would stay as what you have in the patch:
>
> if (!ServerSupportTwophaseCommit(fdw_part))
>
> When the non-2pc-capable server is encountered, we would report the error in place (following the ServerSupportTwophaseCommit check) and come out of the loop.
> have_no_twophase can be dropped.
But if we processed only one non-2pc-capable server, we would raise an
error but should not in that case.
On second thought, I think we can track how many servers are modified
or not capable of 2PC during registration and unregistration. Then we
can check both whether 2PC is required and whether a non-2pc-capable
server is involved, without looking through all participants. Thoughts?
Regards,
--
Masahiko Sawada
EDB: https://www.enterprisedb.com/
On Mon, May 3, 2021 at 11:11 PM Zhihong Yu <zyu@yugabyte.com> wrote:
>
> On Mon, May 3, 2021 at 5:25 AM Masahiko Sawada <sawada.mshk@gmail.com> wrote:
>>
>> On Sun, May 2, 2021 at 1:23 AM Zhihong Yu <zyu@yugabyte.com> wrote:
>> >
>> > On Fri, Apr 30, 2021 at 9:09 PM Masahiko Sawada <sawada.mshk@gmail.com> wrote:
>> >>
>> >> On Wed, Mar 17, 2021 at 6:03 PM Zhihong Yu <zyu@yugabyte.com> wrote:
>> >> >
>> >> > Hi,
>> >> > For v35-0007-Prepare-foreign-transactions-at-commit-time.patch :
>> >>
>> >> Thank you for reviewing the patch!
>> >>
>> >> > With this commit, the foreign server modified within the transaction marked as 'modified'.
>> >> >
>> >> > transaction marked -> transaction is marked
>> >>
>> >> Will fix.
>> >>
>> >> > +#define IsForeignTwophaseCommitRequested() \
>> >> > + (foreign_twophase_commit > FOREIGN_TWOPHASE_COMMIT_DISABLED)
>> >> >
>> >> > Since the other enum is FOREIGN_TWOPHASE_COMMIT_REQUIRED, I think the macro should be named: IsForeignTwophaseCommitRequired.
>> >>
>> >> But even if foreign_twophase_commit is
>> >> FOREIGN_TWOPHASE_COMMIT_REQUIRED, the two-phase commit is not used if
>> >> there is only one modified server, right? It seems the name
>> >> IsForeignTwophaseCommitRequested is fine.
>> >>
>> >> > +static bool
>> >> > +checkForeignTwophaseCommitRequired(bool local_modified)
>> >> >
>> >> > + if (!ServerSupportTwophaseCommit(fdw_part))
>> >> > + have_no_twophase = true;
>> >> > ...
>> >> > + if (have_no_twophase)
>> >> > + ereport(ERROR,
>> >> >
>> >> > It seems the error case should be reported within the loop. This way, we don't need to iterate the other participant(s).
>> >> > Accordingly, nserverswritten should be incremented for the local server prior to the loop. The condition in the loop would become if (!ServerSupportTwophaseCommit(fdw_part) && nserverswritten > 1).
>> >> > have_no_twophase is no longer needed.
>> >>
>> >> Hmm, I think if we process one 2pc-non-capable server first and then
>> >> process another 2pc-capable server, we should raise an error but
>> >> cannot detect that.
>> >
>> > Then the check would stay as what you have in the patch:
>> >
>> > if (!ServerSupportTwophaseCommit(fdw_part))
>> >
>> > When the non-2pc-capable server is encountered, we would report the error in place (following the ServerSupportTwophaseCommit check) and come out of the loop.
>> > have_no_twophase can be dropped.
>>
>> But if we processed only one non-2pc-capable server, we would raise an
>> error but should not in that case.
>>
>> On second thought, I think we can track how many servers are modified
>> or not capable of 2PC during registration and unregistration. Then we
>> can check both whether 2PC is required and whether a non-2pc-capable
>> server is involved, without looking through all participants. Thoughts?
>
> That is something worth trying.

I've attached the updated patches that incorporated comments from
Zhihong and Ikeda-san.

Regards,

--
Masahiko Sawada
EDB: https://www.enterprisedb.com/
Attachment
- v36-0009-Add-regression-tests-for-foreign-twophase-commit.patch
- v36-0007-Add-GetPrepareId-API.patch
- v36-0008-Documentation-update.patch
- v36-0006-postgres_fdw-marks-foreign-transaction-as-modifi.patch
- v36-0005-Prepare-foreign-transactions-at-commit-time.patch
- v36-0004-postgres_fdw-supports-prepare-API.patch
- v36-0002-postgres_fdw-supports-commit-and-rollback-APIs.patch
- v36-0003-Support-two-phase-commit-for-foreign-transaction.patch
- v36-0001-Introduce-transaction-manager-for-foreign-transa.patch
On 2021/05/11 13:37, Masahiko Sawada wrote:
> I've attached the updated patches that incorporated comments from
> Zhihong and Ikeda-san.

Thanks for updating the patches!

I have other comments, including trivial things.


a. about the "foreign_transaction_resolver_timeout" parameter

Now, the default value of "foreign_transaction_resolver_timeout" is 60 secs. Is there any reason? Although the following is a minor case, it may confuse some users.

An example case:

1. a client executes a transaction with 2PC while the resolver is processing FdwXactResolverProcessInDoubtXacts().

2. the resolution of the 1st transaction must wait until the other 2PC transactions are executed, or until the timeout.

3. if the client checks the 1st result value, it should wait until resolution is finished for atomic visibility (although this depends on the way atomic visibility is realized). The clients may be kept waiting up to "foreign_transaction_resolver_timeout". Users may think the transaction is stale.

A situation like this can be observed after testing with pgbench: some unresolved transactions remain after benchmarking.

I assume that this default value follows wal_sender, archiver, and so on. But I think this parameter is more like "commit_delay". If so, 60 seconds seems to be a big value.


b. about a performance bottleneck (just sharing my simple benchmark results)

The resolver process can easily become a performance bottleneck, although I think some users want this feature even if the performance is not so good.

I tested with a very simple workload on my laptop. The test conditions are

* two remote foreign partitions, and one transaction inserts an entry into each partition.
* local connections only. If NW latency became higher, the performance became worse.
* pgbench with 8 clients.

The test results are the following. The performance with 2PC is only 10% of the performance without 2PC.
* with foreign_twophase_commit = required
-> If loaded with more than 10 TPS, the number of unresolved foreign transactions keeps increasing and the run stops with the warning "Increase max_prepared_foreign_transactions".

* with foreign_twophase_commit = disabled
-> 122 TPS in my environment.


c. v36-0001-Introduce-transaction-manager-for-foreign-transa.patch

* typo: s/tranasction/transaction/

* Is it better to move AtEOXact_FdwXact() in AbortTransaction() to before "if (IsInParallelMode())", to make the calls run in the same order as in CommitTransaction()?

* function names in fdwxact.c

Although this is just my feeling, "xact" means transaction. If you feel the same, the function names FdwXactRegisterXact and so on seem odd to me. Would FdwXactRegisterEntry or FdwXactRegisterParticipant be better?

* Are the following better?

- s/to register the foreign transaction by/to register the foreign transaction participant by/

- s/The registered foreign transactions/The registered participants/

- s/given foreign transaction/given foreign transaction participant/

- s/Foreign transactions involved in the current transaction/Foreign transaction participants involved in the current transaction/

Regards,

--
Masahiro Ikeda
NTT DATA CORPORATION
On Thu, May 20, 2021 at 1:26 PM Masahiro Ikeda <ikedamsh@oss.nttdata.com> wrote:
>
> On 2021/05/11 13:37, Masahiko Sawada wrote:
> > I've attached the updated patches that incorporated comments from
> > Zhihong and Ikeda-san.
>
> Thanks for updating the patches!
>
> I have other comments, including trivial things.
>
> a. about the "foreign_transaction_resolver_timeout" parameter
>
> Now, the default value of "foreign_transaction_resolver_timeout" is 60
> secs. Is there any reason? Although the following is a minor case, it
> may confuse some users.
>
> An example case:
>
> 1. a client executes a transaction with 2PC while the resolver is
> processing FdwXactResolverProcessInDoubtXacts().
>
> 2. the resolution of the 1st transaction must wait until the other 2PC
> transactions are executed, or until the timeout.
>
> 3. if the client checks the 1st result value, it should wait until
> resolution is finished for atomic visibility (although this depends on
> the way atomic visibility is realized). The clients may be kept waiting
> up to "foreign_transaction_resolver_timeout". Users may think the
> transaction is stale.
>
> A situation like this can be observed after testing with pgbench. Some
> unresolved transactions remain after benchmarking.
>
> I assume that this default value follows wal_sender, archiver, and so
> on. But I think this parameter is more like "commit_delay". If so, 60
> seconds seems to be a big value.

IIUC this situation sounds like the foreign transaction resolution is the bottleneck and doesn't catch up with incoming resolution requests. But how does foreign_transaction_resolver_timeout relate to this situation? foreign_transaction_resolver_timeout controls when to terminate a resolver process that doesn't have any foreign transactions to resolve. So if we set it to several milliseconds, resolver processes are terminated immediately after each resolution, imposing the cost of launching resolver processes on the next resolution.

> b.
about performance bottleneck (just share my simple benchmark results) > > The resolver process can be performance bottleneck easily although I think > some users want this feature even if the performance is not so good. > > I tested with very simple workload in my laptop. > > The test condition is > * two remote foreign partitions and one transaction inserts an entry in each > partitions. > * local connection only. If NW latency became higher, the performance became > worse. > * pgbench with 8 clients. > > The test results is the following. The performance of 2PC is only 10% > performance of the one of without 2PC. > > * with foreign_twophase_commit = requried > -> If load with more than 10TPS, the number of unresolved foreign transactions > is increasing and stop with the warning "Increase > max_prepared_foreign_transactions". What was the value of max_prepared_foreign_transactions? To speed up the foreign transaction resolution, some ideas have been discussed. As another idea, how about launching resolvers for each foreign server? That way, we resolve foreign transactions on each foreign server in parallel. If foreign transactions are concentrated on the particular server, we can have multiple resolvers for the one foreign server. It doesn’t change the fact that all foreign transaction resolutions are processed by resolver processes. Apart from that, we also might want to improve foreign transaction management so that transaction doesn’t end up with an error if the foreign transaction resolution doesn’t catch up with incoming transactions that require 2PC. Maybe we can evict and serialize a state file when FdwXactCtl->xacts[] is full. I’d like to leave it as a future improvement. > * with foreign_twophase_commit = disabled > -> 122TPS in my environments. How much is the performance without those 2PC patches and with the same workload? i.e., how fast is the current postgres_fdw that uses XactCallback? > > > c. 
v36-0001-Introduce-transaction-manager-for-foreign-transa.patch > > * typo: s/tranasction/transaction/ > > * Is it better to move AtEOXact_FdwXact() in AbortTransaction() to before "if > (IsInParallelMode())" because make them in the same order as CommitTransaction()? I'd prefer to move AtEOXact_FdwXact() in CommitTransaction after "if (IsInParallelMode())" since other pre-commit works are done after cleaning parallel contexts. What do you think? > > * functions name of fdwxact.c > > Although this depends on my feeling, xact means transaction. If this feeling > same as you, the function names of FdwXactRegisterXact and so on are odd to > me. FdwXactRegisterEntry or FdwXactRegisterParticipant is better? > FdwXactRegisterEntry sounds good to me. Thanks. > * Are the following better? > > - s/to register the foreign transaction by/to register the foreign transaction > participant by/ > > - s/The registered foreign transactions/The registered participants/ > > - s/given foreign transaction/given foreign transaction participant/ > > - s/Foreign transactions involved in the current transaction/Foreign > transaction participants involved in the current transaction/ Agreed with the above suggestions. Regards, -- Masahiko Sawada EDB: https://www.enterprisedb.com/
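[To make the flow under discussion concrete for readers of the thread archive: on each 2PC-capable foreign server, the patch set replaces the single remote COMMIT with the standard PostgreSQL two-phase commands, roughly as follows. The transaction identifier format here is illustrative only, not the patch's actual naming scheme.

```sql
-- Phase 1: issued by the backend during local pre-commit,
-- once per participating foreign server.
PREPARE TRANSACTION 'fx_1_100_200';

-- Phase 2: issued later, asynchronously, by a resolver process
-- (or ROLLBACK PREPARED if the local transaction aborted).
COMMIT PREPARED 'fx_1_100_200';
```

This is why each distributed commit costs at least one extra round trip and one extra WAL flush per foreign server, and why the remote side must have max_prepared_transactions set above zero.]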
On 2021/05/21 10:39, Masahiko Sawada wrote: > On Thu, May 20, 2021 at 1:26 PM Masahiro Ikeda <ikedamsh@oss.nttdata.com> wrote: >> >> >> On 2021/05/11 13:37, Masahiko Sawada wrote: >>> I've attached the updated patches that incorporated comments from >>> Zhihong and Ikeda-san. >> >> Thanks for updating the patches! >> >> >> I have other comments including trivial things. >> >> >> a. about "foreign_transaction_resolver_timeout" parameter >> >> Now, the default value of "foreign_transaction_resolver_timeout" is 60 secs. >> Is there any reason? Although the following is minor case, it may confuse some >> users. >> >> Example case is that >> >> 1. a client executes transaction with 2PC when the resolver is processing >> FdwXactResolverProcessInDoubtXacts(). >> >> 2. the resolution of 1st transaction must be waited until the other >> transactions for 2pc are executed or timeout. >> >> 3. if the client check the 1st result value, it should wait until resolution >> is finished for atomic visibility (although it depends on the way how to >> realize atomic visibility.) The clients may be waited >> foreign_transaction_resolver_timeout". Users may think it's stale. >> >> Like this situation can be observed after testing with pgbench. Some >> unresolved transaction remains after benchmarking. >> >> I assume that this default value refers to wal_sender, archiver, and so on. >> But, I think this parameter is more like "commit_delay". If so, 60 seconds >> seems to be big value. > > IIUC this situation seems like the foreign transaction resolution is > bottle-neck and doesn’t catch up to incoming resolution requests. But > how foreignt_transaction_resolver_timeout relates to this situation? > foreign_transaction_resolver_timeout controls when to terminate the > resolver process that doesn't have any foreign transactions to > resolve. 
So if we set it several milliseconds, resolver processes are > terminated immediately after each resolution, imposing the cost of > launching resolver processes on the next resolution. Thanks for your comments! No, this situation is not related to whether foreign transaction resolution is the bottleneck. This issue may happen when the workload has very few foreign transactions. If a new foreign transaction comes in while the transaction resolver is processing resolutions via FdwXactResolverProcessInDoubtXacts(), that foreign transaction has to wait for the next resolution cycle to start. If no further foreign transaction comes in, it must wait for resolution until the timeout expires. That is the situation I mentioned. Thanks for letting me know the side effect of setting the resolution timeout to several milliseconds. I agree. But why is termination needed? Is there a possibility of going stale, like a walsender? >> >> >> b. about performance bottleneck (just share my simple benchmark results) >> >> The resolver process can be performance bottleneck easily although I think >> some users want this feature even if the performance is not so good. >> >> I tested with very simple workload in my laptop. >> >> The test condition is >> * two remote foreign partitions and one transaction inserts an entry in each >> partitions. >> * local connection only. If NW latency became higher, the performance became >> worse. >> * pgbench with 8 clients. >> >> The test results is the following. The performance of 2PC is only 10% >> performance of the one of without 2PC. >> >> * with foreign_twophase_commit = requried >> -> If load with more than 10TPS, the number of unresolved foreign transactions >> is increasing and stop with the warning "Increase >> max_prepared_foreign_transactions". > > What was the value of max_prepared_foreign_transactions? I tested with 200. If each resolution finishes very quickly, I thought that would be enough because 8 clients x 2 partitions = 16, though... 
But it's difficult to know what a stable value is. > To speed up the foreign transaction resolution, some ideas have been > discussed. As another idea, how about launching resolvers for each > foreign server? That way, we resolve foreign transactions on each > foreign server in parallel. If foreign transactions are concentrated > on the particular server, we can have multiple resolvers for the one > foreign server. It doesn’t change the fact that all foreign > transaction resolutions are processed by resolver processes. Awesome! There seems to be another advantage: even if a foreign server is temporarily busy or stopped due to failover, other foreign servers' transactions can still be resolved. > Apart from that, we also might want to improve foreign transaction > management so that transaction doesn’t end up with an error if the > foreign transaction resolution doesn’t catch up with incoming > transactions that require 2PC. Maybe we can evict and serialize a > state file when FdwXactCtl->xacts[] is full. I’d like to leave it as a > future improvement. Oh, great! I hadn't come up with that idea. Although I thought this feature would make it difficult to know whether foreign transactions are being resolved steadily, DBAs can check the "pg_foreign_xacts" view now, and it would be enough to log it when foreign transactions are spilled out. >> * with foreign_twophase_commit = disabled >> -> 122TPS in my environments. > > How much is the performance without those 2PC patches and with the > same workload? i.e., how fast is the current postgres_fdw that uses > XactCallback? OK, I'll test. >> c. 
> > I'd prefer to move AtEOXact_FdwXact() in CommitTransaction after "if > (IsInParallelMode())" since other pre-commit works are done after > cleaning parallel contexts. What do you think? OK, I agree. Regards, -- Masahiro Ikeda NTT DATA CORPORATION
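[For reference while reading the thread, the knobs discussed above would be configured on the coordinator roughly like this. The parameter names are the ones used in this thread; the values shown are the defaults or test values mentioned above, not recommendations.

```
foreign_twophase_commit = disabled           # 'required' turns on 2PC for FDW writes
max_prepared_foreign_transactions = 200      # value used in Ikeda-san's benchmark
foreign_transaction_resolver_timeout = 60s   # terminate resolvers idle this long
```

The remote servers additionally need max_prepared_transactions set above zero, since PREPARE TRANSACTION is rejected otherwise.]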
On Fri, May 21, 2021 at 12:45 PM Masahiro Ikeda <ikedamsh@oss.nttdata.com> wrote: > > > > On 2021/05/21 10:39, Masahiko Sawada wrote: > > On Thu, May 20, 2021 at 1:26 PM Masahiro Ikeda <ikedamsh@oss.nttdata.com> wrote: > >> > >> > >> On 2021/05/11 13:37, Masahiko Sawada wrote: > >>> I've attached the updated patches that incorporated comments from > >>> Zhihong and Ikeda-san. > >> > >> Thanks for updating the patches! > >> > >> > >> I have other comments including trivial things. > >> > >> > >> a. about "foreign_transaction_resolver_timeout" parameter > >> > >> Now, the default value of "foreign_transaction_resolver_timeout" is 60 secs. > >> Is there any reason? Although the following is minor case, it may confuse some > >> users. > >> > >> Example case is that > >> > >> 1. a client executes transaction with 2PC when the resolver is processing > >> FdwXactResolverProcessInDoubtXacts(). > >> > >> 2. the resolution of 1st transaction must be waited until the other > >> transactions for 2pc are executed or timeout. > >> > >> 3. if the client check the 1st result value, it should wait until resolution > >> is finished for atomic visibility (although it depends on the way how to > >> realize atomic visibility.) The clients may be waited > >> foreign_transaction_resolver_timeout". Users may think it's stale. > >> > >> Like this situation can be observed after testing with pgbench. Some > >> unresolved transaction remains after benchmarking. > >> > >> I assume that this default value refers to wal_sender, archiver, and so on. > >> But, I think this parameter is more like "commit_delay". If so, 60 seconds > >> seems to be big value. > > > > IIUC this situation seems like the foreign transaction resolution is > > bottle-neck and doesn’t catch up to incoming resolution requests. But > > how foreignt_transaction_resolver_timeout relates to this situation? 
> > foreign_transaction_resolver_timeout controls when to terminate the > > resolver process that doesn't have any foreign transactions to > > resolve. So if we set it several milliseconds, resolver processes are > > terminated immediately after each resolution, imposing the cost of > > launching resolver processes on the next resolution. > > Thanks for your comments! > > No, this situation is not related to the foreign transaction resolution is > bottle-neck or not. This issue may happen when the workload has very few > foreign transactions. > > If new foreign transaction comes while the transaction resolver is processing > resolutions via FdwXactResolverProcessInDoubtXacts(), the foreign transaction > waits until starting next transaction resolution. If next foreign transaction > doesn't come, the foreign transaction must wait starting resolution until > timeout. I mentioned this situation. Thanks for your explanation. I think that in this case we should set the latch of the resolver after preparing all foreign transactions so that the resolver processes those transactions without sleeping. > > Thanks for letting me know the side effect if setting resolution timeout to > several milliseconds. I agree. But, why termination is needed? Is there a > possibility to stale like walsender? The purpose of this timeout is to terminate resolvers that are idle for a long time. The resolver processes don't necessarily need to keep running all the time for every database. On the other hand, launching a resolver process per commit would be costly. So we keep resolver processes running for at least foreign_transaction_resolver_timeout. > > > >> > >> > >> b. about performance bottleneck (just share my simple benchmark results) > >> > >> The resolver process can be performance bottleneck easily although I think > >> some users want this feature even if the performance is not so good. > >> > >> I tested with very simple workload in my laptop. 
> >> > >> The test condition is > >> * two remote foreign partitions and one transaction inserts an entry in each > >> partitions. > >> * local connection only. If NW latency became higher, the performance became > >> worse. > >> * pgbench with 8 clients. > >> > >> The test results is the following. The performance of 2PC is only 10% > >> performance of the one of without 2PC. > >> > >> * with foreign_twophase_commit = requried > >> -> If load with more than 10TPS, the number of unresolved foreign transactions > >> is increasing and stop with the warning "Increase > >> max_prepared_foreign_transactions". > > > > What was the value of max_prepared_foreign_transactions? > > Now, I tested with 200. > > If each resolution is finished very soon, I thought it's enough because > 8clients x 2partitions = 16, though... But, it's difficult how to know the > stable values. To resolve one distributed transaction, the resolver needs both one round trip and an fsync of a WAL record for each foreign transaction. Since the client doesn’t wait for the distributed transaction to be resolved, the resolver process can easily become the bottleneck given there are 8 clients. If foreign transactions were resolved synchronously, 16 would suffice. > > > > To speed up the foreign transaction resolution, some ideas have been > > discussed. As another idea, how about launching resolvers for each > > foreign server? That way, we resolve foreign transactions on each > > foreign server in parallel. If foreign transactions are concentrated > > on the particular server, we can have multiple resolvers for the one > > foreign server. > > Awesome! There seems to be another pros that even if a foreign server is > temporarily busy or stopped due to fail over, other foreign server's > transactions can be resolved. Yes. 
We also might need to be careful about the order of foreign transaction resolution. I think we need to resolve foreign transactions in arrival order at least within a foreign server. Regards, -- Masahiko Sawada EDB: https://www.enterprisedb.com/
On 2021/05/21 13:45, Masahiko Sawada wrote: > On Fri, May 21, 2021 at 12:45 PM Masahiro Ikeda > <ikedamsh@oss.nttdata.com> wrote: >> >> >> >> On 2021/05/21 10:39, Masahiko Sawada wrote: >>> On Thu, May 20, 2021 at 1:26 PM Masahiro Ikeda <ikedamsh@oss.nttdata.com> wrote: >>>> >>>> >>>> On 2021/05/11 13:37, Masahiko Sawada wrote: >>>>> I've attached the updated patches that incorporated comments from >>>>> Zhihong and Ikeda-san. >>>> >>>> Thanks for updating the patches! >>>> >>>> >>>> I have other comments including trivial things. >>>> >>>> >>>> a. about "foreign_transaction_resolver_timeout" parameter >>>> >>>> Now, the default value of "foreign_transaction_resolver_timeout" is 60 secs. >>>> Is there any reason? Although the following is minor case, it may confuse some >>>> users. >>>> >>>> Example case is that >>>> >>>> 1. a client executes transaction with 2PC when the resolver is processing >>>> FdwXactResolverProcessInDoubtXacts(). >>>> >>>> 2. the resolution of 1st transaction must be waited until the other >>>> transactions for 2pc are executed or timeout. >>>> >>>> 3. if the client check the 1st result value, it should wait until resolution >>>> is finished for atomic visibility (although it depends on the way how to >>>> realize atomic visibility.) The clients may be waited >>>> foreign_transaction_resolver_timeout". Users may think it's stale. >>>> >>>> Like this situation can be observed after testing with pgbench. Some >>>> unresolved transaction remains after benchmarking. >>>> >>>> I assume that this default value refers to wal_sender, archiver, and so on. >>>> But, I think this parameter is more like "commit_delay". If so, 60 seconds >>>> seems to be big value. >>> >>> IIUC this situation seems like the foreign transaction resolution is >>> bottle-neck and doesn’t catch up to incoming resolution requests. But >>> how foreignt_transaction_resolver_timeout relates to this situation? 
>>> foreign_transaction_resolver_timeout controls when to terminate the >>> resolver process that doesn't have any foreign transactions to >>> resolve. So if we set it several milliseconds, resolver processes are >>> terminated immediately after each resolution, imposing the cost of >>> launching resolver processes on the next resolution. >> >> Thanks for your comments! >> >> No, this situation is not related to the foreign transaction resolution is >> bottle-neck or not. This issue may happen when the workload has very few >> foreign transactions. >> >> If new foreign transaction comes while the transaction resolver is processing >> resolutions via FdwXactResolverProcessInDoubtXacts(), the foreign transaction >> waits until starting next transaction resolution. If next foreign transaction >> doesn't come, the foreign transaction must wait starting resolution until >> timeout. I mentioned this situation. > > Thanks for your explanation. I think that in this case we should set > the latch of the resolver after preparing all foreign transactions so > that the resolver process those transactions without sleep. Yes, your idea is much better. Thanks! >> >> Thanks for letting me know the side effect if setting resolution timeout to >> several milliseconds. I agree. But, why termination is needed? Is there a >> possibility to stale like walsender? > > The purpose of this timeout is to terminate resolvers that are idle > for a long time. The resolver processes don't necessarily need to keep > running all the time for every database. On the other hand, launching > a resolver process per commit would be a high cost. So we have > resolver processes keep running at least for > foreign_transaction_resolver_timeout. Understood. I think it's reasonable. >>>> >>>> >>>> b. 
about performance bottleneck (just share my simple benchmark results) >>>> >>>> The resolver process can be performance bottleneck easily although I think >>>> some users want this feature even if the performance is not so good. >>>> >>>> I tested with very simple workload in my laptop. >>>> >>>> The test condition is >>>> * two remote foreign partitions and one transaction inserts an entry in each >>>> partitions. >>>> * local connection only. If NW latency became higher, the performance became >>>> worse. >>>> * pgbench with 8 clients. >>>> >>>> The test results is the following. The performance of 2PC is only 10% >>>> performance of the one of without 2PC. >>>> >>>> * with foreign_twophase_commit = requried >>>> -> If load with more than 10TPS, the number of unresolved foreign transactions >>>> is increasing and stop with the warning "Increase >>>> max_prepared_foreign_transactions". >>> >>> What was the value of max_prepared_foreign_transactions? >> >> Now, I tested with 200. >> >> If each resolution is finished very soon, I thought it's enough because >> 8clients x 2partitions = 16, though... But, it's difficult how to know the >> stable values. > > During resolving one distributed transaction, the resolver needs both > one round trip and fsync-ing WAL record for each foreign transaction. > Since the client doesn’t wait for the distributed transaction to be > resolved, the resolver process can be easily bottle-neck given there > are 8 clients. > > If foreign transaction resolution was resolved synchronously, 16 would suffice. OK, thanks. >> >> >>> To speed up the foreign transaction resolution, some ideas have been >>> discussed. As another idea, how about launching resolvers for each >>> foreign server? That way, we resolve foreign transactions on each >>> foreign server in parallel. If foreign transactions are concentrated >>> on the particular server, we can have multiple resolvers for the one >>> foreign server. 
It doesn’t change the fact that all foreign >>> transaction resolutions are processed by resolver processes. >> >> Awesome! There seems to be another pros that even if a foreign server is >> temporarily busy or stopped due to fail over, other foreign server's >> transactions can be resolved. > > Yes. We also might need to be careful about the order of foreign > transaction resolution. I think we need to resolve foreign > transactions in arrival order at least within a foreign server. I agree that's better. (Though this is just my curiosity...) Is it necessary? This idea seems to be aimed at atomic visibility, but 2PC can't realize that, as you know. So I wondered about it. Regards, -- Masahiro Ikeda NTT DATA CORPORATION
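[As a concrete example of the monitoring point above: with the patch applied, a DBA could check for lingering unresolved entries with something like the following. The view name pg_foreign_xacts comes from this thread; the exact column list is not shown here because it depends on the patch version.

```sql
-- How many foreign transaction entries are currently outstanding?
SELECT count(*) FROM pg_foreign_xacts;
```
]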
On Fri, May 21, 2021 at 5:48 PM Masahiro Ikeda <ikedamsh@oss.nttdata.com> wrote: > > > > On 2021/05/21 13:45, Masahiko Sawada wrote: > > > > Yes. We also might need to be careful about the order of foreign > > transaction resolution. I think we need to resolve foreign > > transactions in arrival order at least within a foreign server. > > I agree it's better. > > (Although this is my interest...) > Is it necessary? Although this idea seems to be for atomic visibility, > 2PC can't realize that as you know. So, I wondered that. I think it's for fairness. If a foreign transaction that arrived earlier often gets put off in favor of foreign transactions that arrived later, due to its index in FdwXactCtl->xacts, that is neither understandable for users nor fair. I think it’s better to handle foreign transactions in a FIFO manner (although this problem exists even in the current code). Regards, -- Masahiko Sawada EDB: https://www.enterprisedb.com/
On 2021/05/25 21:59, Masahiko Sawada wrote: > On Fri, May 21, 2021 at 5:48 PM Masahiro Ikeda <ikedamsh@oss.nttdata.com> wrote: >> >> On 2021/05/21 13:45, Masahiko Sawada wrote: >>> >>> Yes. We also might need to be careful about the order of foreign >>> transaction resolution. I think we need to resolve foreign >>> transactions in arrival order at least within a foreign server. >> >> I agree it's better. >> >> (Although this is my interest...) >> Is it necessary? Although this idea seems to be for atomic visibility, >> 2PC can't realize that as you know. So, I wondered that. > > I think it's for fairness. If a foreign transaction arrived earlier > gets put off so often for other foreign transactions arrived later due > to its index in FdwXactCtl->xacts, it’s not understandable for users > and not fair. I think it’s better to handle foreign transactions in > FIFO manner (although this problem exists even in the current code). OK, thanks. On 2021/05/21 12:45, Masahiro Ikeda wrote: > On 2021/05/21 10:39, Masahiko Sawada wrote: >> On Thu, May 20, 2021 at 1:26 PM Masahiro Ikeda <ikedamsh@oss.nttdata.com> wrote: >> How much is the performance without those 2PC patches and with the >> same workload? i.e., how fast is the current postgres_fdw that uses >> XactCallback? > > OK, I'll test. The test results are as follows. But I couldn't confirm any performance difference from the 2PC patches, though I may need to change the test conditions. [condition] * 1 coordinator and 3 foreign servers * There are two custom scripts, each of which accesses two different foreign servers per transaction ``` fxact_select.pgbench BEGIN; SELECT * FROM part:p1 WHERE id = :id; SELECT * FROM part:p2 WHERE id = :id; COMMIT; ``` ``` fxact_update.pgbench BEGIN; UPDATE part:p1 SET md5 = md5(clock_timestamp()::text) WHERE id = :id; UPDATE part:p2 SET md5 = md5(clock_timestamp()::text) WHERE id = :id; COMMIT; ``` [results] I have tested three times. Performance difference seems to be within the range of errors. 
# 6d0eb38557 with 2PC patches (v36) and foreign_twophase_commit = disabled - fxact_update.pgbench 72.3, 74.9, 77.5 TPS => avg 74.9 TPS 110.5, 106.8, 103.2 ms => avg 106.8 ms - fxact_select.pgbench 1767.6, 1737.1, 1717.4 TPS => avg 1740.7 TPS 4.5, 4.6, 4.7 ms => avg 4.6 ms # 6d0eb38557 without 2PC patches - fxact_update.pgbench 76.5, 70.6, 69.5 TPS => avg 72.2 TPS 104.534, 113.244, 115.097 ms => avg 111.0 ms - fxact_select.pgbench 1810.2, 1748.3, 1737.2 TPS => avg 1765.2 TPS 4.2, 4.6, 4.6 ms => avg 4.5 ms # About the bottleneck of the resolver process I investigated the performance bottleneck of the resolver process using perf. The main bottlenecks are the following functions. 1st. 42.8% routine->CommitForeignTransaction() 2nd. 31.5% remove_fdwxact() 3rd. 10.16% CommitTransaction() The 1st and 3rd problems can be solved by parallelizing resolver processes per remote server. But I wondered whether the idea that backends also issue "COMMIT/ABORT PREPARED" themselves, with the resolver process only taking charge of resolving in-doubt foreign transactions, is better. In many cases, I think that the number of connections is much greater than the number of remote servers. If so, the parallelization is not enough. So I think the idea that backends execute "COMMIT PREPARED" synchronously is better. Citus has a 2PC feature and its backends send "COMMIT PREPARED" in the extension, so this idea is not bad. Although resolving asynchronously has a performance benefit, we can't take advantage of it because the resolver process can easily become the bottleneck now. For the 2nd, remove_fdwxact() syncs the WAL record which indicates that the foreign transaction entry is removed. Is it necessary to sync immediately? Removing the sync means the recovery phase may take longer because "COMMIT/ABORT PREPARED" needs to be issued again for some fdwxact entries, but I think the effect is limited. # About other trivial comments. * Is it better to call pgstat_send_wal() in the resolver process? 
* Is it better to specify that only one resolver process can be launched in one database in the description of "max_foreign_transaction_resolvers"? * Is it intentional that blank lines are removed and inserted in foreigncmds.c? * Is it better that "max_prepared_foreign_transactions=%d" is after "max_prepared_xacts=%d" in xlogdesc.c? * Is "fdwxact_queue" unnecessary now? * Is the following " + sizeof(FdwXactResolver)" unnecessary? #define SizeOfFdwXactResolverCtlData \ (offsetof(FdwXactResolverCtlData, resolvers) + sizeof(FdwXactResolver)) Although MultiXactStateData treats the backendIds as 1-indexed, the resolvers are 0-indexed. Sorry if my understanding is wrong. * s/transaciton/transaction/ * s/foreign_xact_resolution_retry_interval since last resolver/foreign_xact_resolution_retry_interval since last resolver was/ * Don't we need a debug log in the following code in postgres.c, like for the logical replication launcher shutdown? else if (IsFdwXactLauncher()) { /* * The foreign transaction launcher can be stopped at any time. * Use exit status 1 so the background worker is restarted. */ proc_exit(1); } * Is pg_stop_foreign_xact_resolver(PG_FUNCTION_ARGS) not documented? * Is it better to change "when arrived a requested by backend process." to "when a request from a backend process arrives."? Regards, -- Masahiro Ikeda NTT DATA CORPORATION
On Thu, Jun 3, 2021 at 1:56 PM Masahiro Ikeda <ikedamsh@oss.nttdata.com> wrote: > > > > On 2021/05/25 21:59, Masahiko Sawada wrote: > > On Fri, May 21, 2021 at 5:48 PM Masahiro Ikeda <ikedamsh@oss.nttdata.com> wrote: > >> > >> On 2021/05/21 13:45, Masahiko Sawada wrote: > >>> > >>> Yes. We also might need to be careful about the order of foreign > >>> transaction resolution. I think we need to resolve foreign> transactions in arrival order at least within a foreignserver. > >> > >> I agree it's better. > >> > >> (Although this is my interest...) > >> Is it necessary? Although this idea seems to be for atomic visibility, > >> 2PC can't realize that as you know. So, I wondered that. > > > > I think it's for fairness. If a foreign transaction arrived earlier > > gets put off so often for other foreign transactions arrived later due > > to its index in FdwXactCtl->xacts, it’s not understandable for users > > and not fair. I think it’s better to handle foreign transactions in > > FIFO manner (although this problem exists even in the current code). > > OK, thanks. > > > On 2021/05/21 12:45, Masahiro Ikeda wrote: > > On 2021/05/21 10:39, Masahiko Sawada wrote: > >> On Thu, May 20, 2021 at 1:26 PM Masahiro Ikeda <ikedamsh@oss.nttdata.com> > wrote: > >> How much is the performance without those 2PC patches and with the > >> same workload? i.e., how fast is the current postgres_fdw that uses > >> XactCallback? > > > > OK, I'll test. > > The test results are followings. But, I couldn't confirm the performance > improvements of 2PC patches though I may need to be changed the test condition. 
> > [condition] > * 1 coordinator and 3 foreign servers > * There are two custom scripts which access different two foreign servers per > transaction > > ``` fxact_select.pgbench > BEGIN; > SELECT * FROM part:p1 WHERE id = :id; > SELECT * FROM part:p2 WHERE id = :id; > COMMIT; > ``` > > ``` fxact_update.pgbench > BEGIN; > UPDATE part:p1 SET md5 = md5(clock_timestamp()::text) WHERE id = :id; > UPDATE part:p2 SET md5 = md5(clock_timestamp()::text) WHERE id = :id; > COMMIT; > ``` > > [results] > > I have tested three times. > Performance difference seems to be within the range of errors. > > # 6d0eb38557 with 2pc patches(v36) and foreign_twophase_commit = disable > - fxact_update.pgbench > 72.3, 74.9, 77.5 TPS => avg 74.9 TPS > 110.5, 106.8, 103.2 ms => avg 106.8 ms > > - fxact_select.pgbench > 1767.6, 1737.1, 1717.4 TPS => avg 1740.7 TPS > 4.5, 4.6, 4.7 ms => avg 4.6ms > > # 6d0eb38557 without 2pc patches > - fxact_update.pgbench > 76.5, 70.6, 69.5 TPS => avg 72.2 TPS > 104.534 + 113.244 + 115.097 => avg 111.0 ms > > -fxact_select.pgbench > 1810.2, 1748.3, 1737.2 TPS => avg 1765.2 TPS > 4.2, 4.6, 4.6 ms=> 4.5 ms > Thank you for testing! I think the result shows that managing foreign transactions on the core side would not be a problem in terms of performance. > > > > > # About the bottleneck of the resolver process > > I investigated the performance bottleneck of the resolver process using perf. > The main bottleneck is the following functions. > > 1st. 42.8% routine->CommitForeignTransaction() > 2nd. 31.5% remove_fdwxact() > 3rd. 10.16% CommitTransaction() > > 1st and 3rd problems can be solved by parallelizing resolver processes per > remote servers. But, I wondered that the idea, which backends call also > "COMMIT/ABORT PREPARED" and the resolver process only takes changes of > resolving in-doubt foreign transactions, is better. In many cases, I think > that the number of connections is much greater than the number of remote > servers. 
If so, the parallelization is not enough. > > So, I think the idea which backends execute "PREPARED COMMIT" synchronously is > better. The citus has the 2PC feature and backends send "PREPARED COMMIT" in > the extension. So, this idea is not bad. Thank you for pointing it out. This idea has been proposed several times and discussed. I'd like to summarize the proposed ideas and their pros and cons before replying to your other comments. There are 3 ideas. After the backend prepares all foreign transactions and commits the local transaction, 1. the backend continues attempting to commit all prepared foreign transactions until all of them are committed. 2. the backend attempts to commit all prepared foreign transactions once. If an error happens, leave them for the resolver. 3. the backend asks the resolver launched per foreign server to commit the prepared foreign transactions (and the backend waits or doesn't wait for the commit completion depending on the setting). With ideas 1 and 2, since the backend itself commits all foreign transactions, the resolver process cannot be a bottleneck, and probably the code can get simpler as backends don't need to communicate with resolver processes. However, those have two problems we need to deal with: First, users could get an error if something goes wrong while the backend is committing prepared foreign transactions, even though the local transaction is already committed and some foreign transactions could also already be committed, which is confusing. There were two opinions on this problem: FDW developers should be responsible for writing FDW code such that any error doesn't happen during committing foreign transactions, and users can accept that confusion since an error could happen after writing the commit WAL even today without this 2PC feature. 
For the former point, I'm not sure it's always doable since even palloc() could raise an error and it seems hard to require all FDW developers to understand all possible paths of raising an error. And for the latter point, that's true but I think those cases are should-not-happen cases (i.g., rare cases) whereas the likelihood of an error during committing prepared transactions is not low (e.g., by network connectivity problem). I think we need to assume that that is not a rare case. The second problem is whether we can cancel committing foreign transactions by pg_cancel_backend() (or pressing Ctl-c). If the backend process commits prepared foreign transactions, it's FDW developers' responsibility to write code that is interruptible. I’m not sure it’s feasible for drivers for other databases. Idea 3 is proposed to deal with those problems. By having separate processes, resolver processes, committing prepared foreign transactions, we and FDW developers don't need to worry about those two problems. However as Ikeda-san shared the performance results, idea 3 is likely to have a performance problem since resolver processes can easily be bottle-neck. Moreover, with the current patch, since we asynchronously commit foreign prepared transactions, if many concurrent clients use 2PC, reaching max_foreign_prepared_transactions, transactions end up with an error. Through the long discussion on this thread, I've been thought we got a consensus on idea 3 but sometimes ideas 1 and 2 are proposed again for dealing with the performance problem. Idea 1 and 2 are also good and attractive, but I think we need to deal with the two problems first if we go with one of those ideas. To be honest, I'm really not sure it's good if we make those things FDW developers responsibility. As long as we commit foreign prepared transactions asynchronously and there is max_foreign_prepared_transactions limit, it's possible that committing those transactions could not keep up. 
Maybe the same is true for a case where the client heavily uses 2PC and asynchronously commits prepared transactions. If committing prepared transactions doesn't keep up with preparing transactions, the system reaches max_prepared_transactions. With the current patch, we commit prepared foreign transactions asynchronously. But maybe we need to compare the performance of ideas 1 (and 2) to idea 3 with synchronous foreign transaction resolution. Regards, -- Masahiko Sawada EDB: https://www.enterprisedb.com/
Re: Transactions involving multiple postgres foreign servers, take 2
Mail from Masahiko Sawada <sawada.mshk@gmail.com>, 2021/06/04 12:28:
On Thu, Jun 3, 2021 at 1:56 PM Masahiro Ikeda <ikedamsh@oss.nttdata.com> wrote:
On 2021/05/25 21:59, Masahiko Sawada wrote:
On Fri, May 21, 2021 at 5:48 PM Masahiro Ikeda <ikedamsh@oss.nttdata.com> wrote:
On 2021/05/21 13:45, Masahiko Sawada wrote:
Yes. We also might need to be careful about the order of foreign
transaction resolution. I think we need to resolve foreign
transactions in arrival order, at least within a foreign server.
I agree it's better.
(This is just my curiosity...)
Is it necessary? This idea seems to be for atomic visibility, but 2PC
can't realize that, as you know. So I wondered about that.
I think it's for fairness. If a foreign transaction that arrived
earlier keeps getting put off in favor of foreign transactions that
arrived later, due to its index in FdwXactCtl->xacts, that is hard for
users to understand and not fair. I think it's better to handle foreign
transactions in a FIFO manner (although this problem exists even in the
current code).
OK, thanks.
On 2021/05/21 12:45, Masahiro Ikeda wrote:
On 2021/05/21 10:39, Masahiko Sawada wrote:
On Thu, May 20, 2021 at 1:26 PM Masahiro Ikeda <ikedamsh@oss.nttdata.com> wrote:
How much is the performance without those 2PC patches and with the
same workload? i.e., how fast is the current postgres_fdw that uses
XactCallback?
OK, I'll test.
The test results are as follows. But I couldn't confirm a performance
difference with the 2PC patches, though I may need to change the test
conditions.
[condition]
* 1 coordinator and 3 foreign servers
* There are two custom scripts which access different two foreign servers per
transaction
``` fxact_select.pgbench
BEGIN;
SELECT * FROM part:p1 WHERE id = :id;
SELECT * FROM part:p2 WHERE id = :id;
COMMIT;
```
``` fxact_update.pgbench
BEGIN;
UPDATE part:p1 SET md5 = md5(clock_timestamp()::text) WHERE id = :id;
UPDATE part:p2 SET md5 = md5(clock_timestamp()::text) WHERE id = :id;
COMMIT;
```
[results]
I tested three times.
The performance differences seem to be within the margin of error.
# 6d0eb38557 with 2pc patches(v36) and foreign_twophase_commit = disable
- fxact_update.pgbench
72.3, 74.9, 77.5 TPS => avg 74.9 TPS
110.5, 106.8, 103.2 ms => avg 106.8 ms
- fxact_select.pgbench
1767.6, 1737.1, 1717.4 TPS => avg 1740.7 TPS
4.5, 4.6, 4.7 ms => avg 4.6 ms
# 6d0eb38557 without 2pc patches
- fxact_update.pgbench
76.5, 70.6, 69.5 TPS => avg 72.2 TPS
104.534, 113.244, 115.097 ms => avg 111.0 ms
- fxact_select.pgbench
1810.2, 1748.3, 1737.2 TPS => avg 1765.2 TPS
4.2, 4.6, 4.6 ms => avg 4.5 ms
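As a quick sanity check (a throwaway Python snippet, not part of the patch set), the averaged figures quoted above are consistent with the raw runs:

```python
# Verify the averaged TPS/latency figures quoted in the results above.
def avg(xs):
    return sum(xs) / len(xs)

assert round(avg([72.3, 74.9, 77.5]), 1) == 74.9           # update TPS, with 2PC patches
assert round(avg([1767.6, 1737.1, 1717.4]), 1) == 1740.7   # select TPS, with 2PC patches
assert round(avg([76.5, 70.6, 69.5]), 1) == 72.2           # update TPS, without patches
assert round(avg([104.534, 113.244, 115.097]), 1) == 111.0 # update latency (ms), without
```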
Thank you for testing!
I think the result shows that managing foreign transactions on the
core side would not be a problem in terms of performance.
# About the bottleneck of the resolver process
I investigated the performance bottleneck of the resolver process using perf.
The main bottlenecks are the following functions.
1st. 42.8% routine->CommitForeignTransaction()
2nd. 31.5% remove_fdwxact()
3rd. 10.16% CommitTransaction()
The 1st and 3rd bottlenecks can be addressed by parallelizing resolver
processes per remote server. But I wondered whether the idea in which
backends also call "COMMIT/ABORT PREPARED" themselves, while the
resolver process only takes charge of resolving in-doubt foreign
transactions, is better. In many cases, I think the number of
connections is much greater than the number of remote servers. If so,
parallelization is not enough.
So, I think the idea in which backends execute "COMMIT PREPARED"
synchronously is better. Citus has a 2PC feature, and its backends send
"COMMIT PREPARED" from the extension, so this idea is not bad.
Thank you for pointing it out. This idea has been proposed several
times and there have been discussions about it. I'd like to summarize
the proposed ideas and their pros and cons before replying to your
other comments.
There are 3 ideas. After the backend prepares all foreign transactions
and commits the local transaction,
1. the backend continues attempting to commit all prepared foreign
transactions until all of them are committed.
2. the backend attempts to commit all prepared foreign transactions
once. If an error happens, leave them for the resolver.
3. the backend asks a resolver process, launched per foreign server, to
commit the prepared foreign transactions (and the backend waits or
doesn't wait for the commit completion, depending on the setting).
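To make the three options concrete, here is a minimal Python sketch (illustrative only; the function names and the resolver-queue abstraction are invented, not the patch's actual API). `server_commit(xact)` stands in for sending COMMIT PREPARED to a foreign server:

```python
# Sketch of the three proposed strategies for committing foreign
# transactions that have already been prepared.

def idea1_retry_until_done(server_commit, prepared):
    """Idea 1: the backend keeps retrying until every commit succeeds."""
    for xact in prepared:
        while True:
            try:
                server_commit(xact)
                break
            except Exception:
                continue  # retry; note this loop must stay interruptible

def idea2_try_once_then_handoff(server_commit, prepared, resolver_queue):
    """Idea 2: the backend tries each commit once; failures go to the resolver."""
    for xact in prepared:
        try:
            server_commit(xact)
        except Exception:
            resolver_queue.append(xact)  # left in-doubt for background resolution

def idea3_handoff_to_resolver(prepared, resolver_queue):
    """Idea 3: the backend only enqueues; per-server resolvers do the commits."""
    resolver_queue.extend(prepared)
```

With ideas 1 and 2 the backend itself sends the commits, so the resolver never sits on the critical path; with idea 3 every commit funnels through the resolver queue.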
With ideas 1 and 2, since the backend itself commits all foreign
transactions, the resolver process cannot be a bottleneck, and the code
can probably be simpler since backends don't need to communicate with
resolver processes.
However, those have two problems we need to deal with:
First, users could get an error if an error happens while the backend
is committing a prepared foreign transaction, even though the local
transaction is already committed and some foreign transactions could
also be committed, which confuses users. There were two opinions on
this problem: FDW developers should be responsible for writing FDW code
such that no error happens while committing foreign transactions, and
users can accept that confusion since an error could happen after
writing the commit WAL even today, without this 2PC feature. On the
former point, I'm not sure it's always doable, since even palloc()
could raise an error, and it seems hard to require all FDW developers
to understand all possible paths that raise an error. On the latter
point, that's true, but I think those cases are should-not-happen cases
(i.e., rare cases), whereas the likelihood of an error while committing
prepared transactions is not low (e.g., due to a network connectivity
problem). I think we need to assume that it is not a rare case.
The second problem is whether we can cancel committing foreign
transactions with pg_cancel_backend() (or by pressing Ctrl-C). If the
backend process commits prepared foreign transactions, it's the FDW
developers' responsibility to write code that is interruptible. I'm
not sure that's feasible for drivers for other databases.
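A sketch of what "interruptible" means here (a Python stand-in, with invented names; `cancel_requested` plays roughly the role of PostgreSQL's CHECK_FOR_INTERRUPTS(), and this is not the actual backend code):

```python
class Cancelled(Exception):
    """Raised when the user cancels while commits are being retried."""

def interruptible_commit(server_commit, xact, cancel_requested):
    """Retry COMMIT PREPARED for one foreign transaction, but check for a
    pending cancel request between attempts instead of looping blindly."""
    while True:
        if cancel_requested():
            # Leave the prepared transaction in-doubt; a resolver (or
            # manual resolution) must finish it later.
            raise Cancelled(f"commit of {xact} interrupted; left in-doubt")
        try:
            server_commit(xact)
            return
        except ConnectionError:
            continue  # transient failure: retry on the next iteration
```

An FDW whose driver blocks inside a single opaque call has nowhere to put that cancel check, which is the concern raised above.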
Idea 3 is proposed to deal with those problems. By having separate
processes (resolver processes) commit prepared foreign transactions,
neither we nor FDW developers need to worry about those two problems.
However, as Ikeda-san's performance results showed, idea 3 is likely
to have a performance problem, since resolver processes can easily
become a bottleneck. Moreover, with the current patch, since we
asynchronously commit foreign prepared transactions, if many concurrent
clients use 2PC and max_foreign_prepared_transactions is reached,
transactions end up with an error.
Through the long discussion on this thread, I had thought we reached a
consensus on idea 3, but ideas 1 and 2 are sometimes proposed again to
deal with the performance problem. Ideas 1 and 2 are also good and
attractive, but I think we need to deal with the two problems above
first if we go with one of them. To be honest, I'm really not sure it's
good to make those things the FDW developers' responsibility.
As long as we commit foreign prepared transactions asynchronously and
there is a max_foreign_prepared_transactions limit, it's possible that
committing those transactions cannot keep up. The same may be true for
a case where a client heavily uses 2PC and asynchronously commits
prepared transactions: if committing prepared transactions doesn't keep
up with preparing them, the system reaches max_prepared_transactions.
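A back-of-the-envelope model of that backlog (the numbers are assumed, purely for illustration):

```python
def ticks_until_limit(prepare_rate, resolve_rate, limit):
    """If transactions are prepared faster than they are resolved, the
    number of pending prepared transactions grows by the rate difference
    each tick; return how many ticks until `limit` is hit, or None if
    resolution keeps up."""
    if resolve_rate >= prepare_rate:
        return None
    growth = prepare_rate - resolve_rate
    return -(-limit // growth)  # ceiling division

# e.g. 100 prepares/sec against 80 resolutions/sec, with room for 200
# pending entries, exhausts the slots in 10 seconds.
```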
With the current patch, we commit prepared foreign transactions
asynchronously. But maybe we need to compare the performance of ideas
1 (and 2) to idea 3 with synchronous foreign transaction resolution.
Masahiro Ikeda
NTT DATA CORPORATION
RE: Transactions involving multiple postgres foreign servers, take 2
From: Masahiko Sawada <sawada.mshk@gmail.com> > 1. the backend continues attempting to commit all prepared foreign > transactions until all of them are committed. > 2. the backend attempts to commit all prepared foreign transactions > once. If an error happens, leave them for the resolver. > 3. the backend asks the resolver that launched per foreign server to > commit the prepared foreign transactions (and backend waits or doesn't > wait for the commit completion depending on the setting). > > With ideas 1 and 2, since the backend itself commits all foreign > transactions the resolver process cannot be a bottleneck, and probably > the code can get more simple as backends don't need to communicate > with resolver processes. > > However, those have two problems we need to deal with: > > First, users could get an error if an error happens during the backend > committing prepared foreign transaction but the local transaction is > already committed and some foreign transactions could also be > committed, confusing users. There were two opinions to this problem: > FDW developers should be responsible for writing FDW code such that > any error doesn't happen during committing foreign transactions, and > users can accept that confusion since an error could happen after > writing the commit WAL even today without this 2PC feature. Why does the user have to get an error? Once the local transaction has been prepared, which means all remote ones also have been prepared, the whole transaction is determined to commit. So, the user doesn't have to receive an error as long as the local node is alive. > For the > former point, I'm not sure it's always doable since even palloc() > could raise an error and it seems hard to require all FDW developers > to understand all possible paths of raising an error. No, this is a matter of discipline to ensure consistency, just in case we really have to return an error to the user.
> And for the > latter point, that's true but I think those cases are > should-not-happen cases (i.g., rare cases) whereas the likelihood of > an error during committing prepared transactions is not low (e.g., by > network connectivity problem). I think we need to assume that that is > not a rare case. How do non-2PC and 2PC cases differ in the rarity of the error? > The second problem is whether we can cancel committing foreign > transactions by pg_cancel_backend() (or pressing Ctl-c). If the > backend process commits prepared foreign transactions, it's FDW > developers' responsibility to write code that is interruptible. I’m > not sure it’s feasible for drivers for other databases. That's true not only for prepare and commit but also for other queries. Why do we have to treat prepare and commit specially? > Through the long discussion on this thread, I've been thought we got a > consensus on idea 3 but sometimes ideas 1 and 2 are proposed again for I don't remember seeing any consensus yet? > With the current patch, we commit prepared foreign transactions > asynchronously. But maybe we need to compare the performance of ideas > 1 (and 2) to idea 3 with synchronous foreign transaction resolution. +1 Regards Takayuki Tsunakawa
On Fri, Jun 4, 2021 at 3:58 PM ikedamsh@oss.nttdata.com <ikedamsh@oss.nttdata.com> wrote: > > > > 2021/06/04 12:28、Masahiko Sawada <sawada.mshk@gmail.com>のメール: > > > Thank you for pointing it out. This idea has been proposed several > times and there were discussions. I'd like to summarize the proposed > ideas and those pros and cons before replying to your other comments. > > There are 3 ideas. After backend both prepares all foreign transaction > and commit the local transaction, > > 1. the backend continues attempting to commit all prepared foreign > transactions until all of them are committed. > 2. the backend attempts to commit all prepared foreign transactions > once. If an error happens, leave them for the resolver. > 3. the backend asks the resolver that launched per foreign server to > commit the prepared foreign transactions (and backend waits or doesn't > wait for the commit completion depending on the setting). > > With ideas 1 and 2, since the backend itself commits all foreign > transactions the resolver process cannot be a bottleneck, and probably > the code can get more simple as backends don't need to communicate > with resolver processes. > > However, those have two problems we need to deal with: > > > Thanks for sharing the summarize. I understood there are problems related to > FDW implementation. > > First, users could get an error if an error happens during the backend > committing prepared foreign transaction but the local transaction is > already committed and some foreign transactions could also be > committed, confusing users. There were two opinions to this problem: > FDW developers should be responsible for writing FDW code such that > any error doesn't happen during committing foreign transactions, and > users can accept that confusion since an error could happen after > writing the commit WAL even today without this 2PC feature. 
For the > former point, I'm not sure it's always doable since even palloc() > could raise an error and it seems hard to require all FDW developers > to understand all possible paths of raising an error. And for the > latter point, that's true but I think those cases are > should-not-happen cases (i.g., rare cases) whereas the likelihood of > an error during committing prepared transactions is not low (e.g., by > network connectivity problem). I think we need to assume that that is > not a rare case. > > > Hmm… Sorry, I don’t have any good ideas now. > > If anything, I’m on second side which users accept the confusion though > let users know a error happens before local commit is done or not is necessary > because if the former case, users will execute the same query again. Yeah, users will need to remember the XID of the last executed transaction and check if it has been committed by pg_xact_status(). > > > The second problem is whether we can cancel committing foreign > transactions by pg_cancel_backend() (or pressing Ctl-c). If the > backend process commits prepared foreign transactions, it's FDW > developers' responsibility to write code that is interruptible. I’m > not sure it’s feasible for drivers for other databases. > > > Sorry, my understanding is not clear. > > After all prepares are done, the foreign transactions will be committed. > So, does this mean that FDW must leave the unresolved transaction to the transaction > resolver and show some messages like “Since the transaction is already committed, > the transaction will be resolved in background" ? I think this would happen after the backend cancels COMMIT PREPARED. To be able to cancel an in-progress query the backend needs to accept the interruption and send the cancel request. postgres_fdw can do that since libpq supports sending a query and waiting for the result but I’m not sure about other drivers. Regards, -- Masahiko Sawada EDB: https://www.enterprisedb.com/
On Fri, Jun 4, 2021 at 5:04 PM tsunakawa.takay@fujitsu.com <tsunakawa.takay@fujitsu.com> wrote: > > From: Masahiko Sawada <sawada.mshk@gmail.com> > 1. the backend continues attempting to commit all prepared foreign > > transactions until all of them are committed. > > 2. the backend attempts to commit all prepared foreign transactions > > once. If an error happens, leave them for the resolver. > > 3. the backend asks the resolver that launched per foreign server to > > commit the prepared foreign transactions (and backend waits or doesn't > > wait for the commit completion depending on the setting). > > > > With ideas 1 and 2, since the backend itself commits all foreign > > transactions the resolver process cannot be a bottleneck, and probably > > the code can get more simple as backends don't need to communicate > > with resolver processes. > > > > However, those have two problems we need to deal with: > > > > > First, users could get an error if an error happens during the backend > > committing prepared foreign transaction but the local transaction is > > already committed and some foreign transactions could also be > > committed, confusing users. There were two opinions to this problem: > > FDW developers should be responsible for writing FDW code such that > > any error doesn't happen during committing foreign transactions, and > > users can accept that confusion since an error could happen after > > writing the commit WAL even today without this 2PC feature. > > Why does the user have to get an error? Once the local transaction has been prepared, which means all remote ones alsohave been prepared, the whole transaction is determined to commit. So, the user doesn't have to receive an error aslong as the local node is alive. I think we should neither ignore the error thrown by FDW code nor lower the error level (e.g., ERROR to WARNING). 
> > > And for the > > latter point, that's true but I think those cases are > > should-not-happen cases (i.g., rare cases) whereas the likelihood of > > an error during committing prepared transactions is not low (e.g., by > > network connectivity problem). I think we need to assume that that is > > not a rare case. > > How do non-2PC and 2PC cases differ in the rarity of the error? I think the main difference would be that in 2PC case there will be network communications possibly with multiple servers after the local commit. > > > > The second problem is whether we can cancel committing foreign > > transactions by pg_cancel_backend() (or pressing Ctl-c). If the > > backend process commits prepared foreign transactions, it's FDW > > developers' responsibility to write code that is interruptible. I’m > > not sure it’s feasible for drivers for other databases. > > That's true not only for prepare and commit but also for other queries. Why do we have to treat prepare and commit specially? Good point. This would not be a blocker for ideas 1 and 2 but is a side benefit of idea 3. Regards, -- Masahiko Sawada EDB: https://www.enterprisedb.com/
RE: Transactions involving multiple postgres foreign servers, take 2
From: Masahiko Sawada <sawada.mshk@gmail.com> > On Fri, Jun 4, 2021 at 5:04 PM tsunakawa.takay@fujitsu.com > <tsunakawa.takay@fujitsu.com> wrote: > > Why does the user have to get an error? Once the local transaction has been > prepared, which means all remote ones also have been prepared, the whole > transaction is determined to commit. So, the user doesn't have to receive an > error as long as the local node is alive. > > I think we should neither ignore the error thrown by FDW code nor > lower the error level (e.g., ERROR to WARNING). Why? (Forgive me for asking relentlessly... by imagining me as a cute 7-year-old boy/girl asking "Why Dad?") > > How do non-2PC and 2PC cases differ in the rarity of the error? > > I think the main difference would be that in 2PC case there will be > network communications possibly with multiple servers after the local > commit. Then, it's the same failure mode. That is, the same failure could occur for both cases. That doesn't require us to differentiate between them. Let's ignore this point from now on. Regards Takayuki Tsunakawa
On Fri, Jun 4, 2021 at 5:59 PM tsunakawa.takay@fujitsu.com <tsunakawa.takay@fujitsu.com> wrote: > > From: Masahiko Sawada <sawada.mshk@gmail.com> > > On Fri, Jun 4, 2021 at 5:04 PM tsunakawa.takay@fujitsu.com > > <tsunakawa.takay@fujitsu.com> wrote: > > > Why does the user have to get an error? Once the local transaction has been > > prepared, which means all remote ones also have been prepared, the whole > > transaction is determined to commit. So, the user doesn't have to receive an > > error as long as the local node is alive. > > > > I think we should neither ignore the error thrown by FDW code nor > > lower the error level (e.g., ERROR to WARNING). > > Why? (Forgive me for asking relentlessly... by imagining me as a cute 7-year-old boy/girl asking "Why Dad?") I think we should not reinterpret the severity of the error and lower it. Especially, in this case, any kind of error can be thrown. It could be such a serious error that the FDW developer wants to report it to the client. Do we lower even PANIC to a lower severity such as WARNING? That's definitely a bad idea. If we don't lower PANIC while lowering ERROR (and FATAL) to WARNING, why would we regard only the latter as non-errors? Regards, -- Masahiko Sawada EDB: https://www.enterprisedb.com/
On Fri, Jun 4, 2021 at 5:16 PM Masahiko Sawada <sawada.mshk@gmail.com> wrote: > > On Fri, Jun 4, 2021 at 3:58 PM ikedamsh@oss.nttdata.com > <ikedamsh@oss.nttdata.com> wrote: > > > > > > > > 2021/06/04 12:28、Masahiko Sawada <sawada.mshk@gmail.com>のメール: > > > > > > Thank you for pointing it out. This idea has been proposed several > > times and there were discussions. I'd like to summarize the proposed > > ideas and those pros and cons before replying to your other comments. > > > > There are 3 ideas. After backend both prepares all foreign transaction > > and commit the local transaction, > > > > 1. the backend continues attempting to commit all prepared foreign > > transactions until all of them are committed. > > 2. the backend attempts to commit all prepared foreign transactions > > once. If an error happens, leave them for the resolver. > > 3. the backend asks the resolver that launched per foreign server to > > commit the prepared foreign transactions (and backend waits or doesn't > > wait for the commit completion depending on the setting). > > > > With ideas 1 and 2, since the backend itself commits all foreign > > transactions the resolver process cannot be a bottleneck, and probably > > the code can get more simple as backends don't need to communicate > > with resolver processes. > > > > However, those have two problems we need to deal with: > > > > > > Thanks for sharing the summarize. I understood there are problems related to > > FDW implementation. > > > > First, users could get an error if an error happens during the backend > > committing prepared foreign transaction but the local transaction is > > already committed and some foreign transactions could also be > > committed, confusing users. 
There were two opinions to this problem: > > FDW developers should be responsible for writing FDW code such that > > any error doesn't happen during committing foreign transactions, and > > users can accept that confusion since an error could happen after > > writing the commit WAL even today without this 2PC feature. For the > > former point, I'm not sure it's always doable since even palloc() > > could raise an error and it seems hard to require all FDW developers > > to understand all possible paths of raising an error. And for the > > latter point, that's true but I think those cases are > > should-not-happen cases (i.g., rare cases) whereas the likelihood of > > an error during committing prepared transactions is not low (e.g., by > > network connectivity problem). I think we need to assume that that is > > not a rare case. > > > > > > Hmm… Sorry, I don’t have any good ideas now. > > > > If anything, I’m on second side which users accept the confusion though > > let users know a error happens before local commit is done or not is necessary > > because if the former case, users will execute the same query again. > > Yeah, users will need to remember the XID of the last executed > transaction and check if it has been committed by pg_xact_status(). As the second idea, can we send something like a hint along with the error (or send a new type of error) that indicates the error happened after the transaction commit so that the client can decide whether or not to ignore the error? That way, we can deal with the confusion led by an error raised after the local commit by the existing post-commit cleanup routines (and post-commit xact callbacks) as well as by FDW’s commit prepared routine. Regards, -- Masahiko Sawada EDB: https://www.enterprisedb.com/
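The "new type of error" idea could look roughly like this on the client side (a hypothetical shape; PostgreSQL has no such error field today, and the class and function names are invented for illustration):

```python
class PostCommitError(Exception):
    """An error reported after the local transaction already committed."""
    def __init__(self, message, committed_locally):
        super().__init__(message)
        self.committed_locally = committed_locally

def client_should_retry(err):
    """With the flag, a client can tell a pre-commit failure (safe to
    retry the statement) from a post-commit one (the work is durable and
    the foreign transactions will be resolved in the background)."""
    if isinstance(err, PostCommitError) and err.committed_locally:
        return False
    return True
```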
Re: Transactions involving multiple postgres foreign servers, take 2
> 2021/06/04 17:16、Masahiko Sawada <sawada.mshk@gmail.com>のメール: > > On Fri, Jun 4, 2021 at 3:58 PM ikedamsh@oss.nttdata.com > <ikedamsh@oss.nttdata.com> wrote: >> >> >> >> 2021/06/04 12:28、Masahiko Sawada <sawada.mshk@gmail.com>のメール: >> >> >> Thank you for pointing it out. This idea has been proposed several >> times and there were discussions. I'd like to summarize the proposed >> ideas and those pros and cons before replying to your other comments. >> >> There are 3 ideas. After backend both prepares all foreign transaction >> and commit the local transaction, >> >> 1. the backend continues attempting to commit all prepared foreign >> transactions until all of them are committed. >> 2. the backend attempts to commit all prepared foreign transactions >> once. If an error happens, leave them for the resolver. >> 3. the backend asks the resolver that launched per foreign server to >> commit the prepared foreign transactions (and backend waits or doesn't >> wait for the commit completion depending on the setting). >> >> With ideas 1 and 2, since the backend itself commits all foreign >> transactions the resolver process cannot be a bottleneck, and probably >> the code can get more simple as backends don't need to communicate >> with resolver processes. >> >> However, those have two problems we need to deal with: >> >> >> Thanks for sharing the summarize. I understood there are problems related to >> FDW implementation. >> >> First, users could get an error if an error happens during the backend >> committing prepared foreign transaction but the local transaction is >> already committed and some foreign transactions could also be >> committed, confusing users. 
There were two opinions to this problem: >> FDW developers should be responsible for writing FDW code such that >> any error doesn't happen during committing foreign transactions, and >> users can accept that confusion since an error could happen after >> writing the commit WAL even today without this 2PC feature. For the >> former point, I'm not sure it's always doable since even palloc() >> could raise an error and it seems hard to require all FDW developers >> to understand all possible paths of raising an error. And for the >> latter point, that's true but I think those cases are >> should-not-happen cases (i.g., rare cases) whereas the likelihood of >> an error during committing prepared transactions is not low (e.g., by >> network connectivity problem). I think we need to assume that that is >> not a rare case. >> >> >> Hmm… Sorry, I don’t have any good ideas now. >> >> If anything, I’m on second side which users accept the confusion though >> let users know a error happens before local commit is done or not is necessary >> because if the former case, users will execute the same query again. > > Yeah, users will need to remember the XID of the last executed > transaction and check if it has been committed by pg_xact_status(). > >> >> >> The second problem is whether we can cancel committing foreign >> transactions by pg_cancel_backend() (or pressing Ctl-c). If the >> backend process commits prepared foreign transactions, it's FDW >> developers' responsibility to write code that is interruptible. I’m >> not sure it’s feasible for drivers for other databases. >> >> >> Sorry, my understanding is not clear. >> >> After all prepares are done, the foreign transactions will be committed. >> So, does this mean that FDW must leave the unresolved transaction to the transaction >> resolver and show some messages like “Since the transaction is already committed, >> the transaction will be resolved in background" ? 
> > I think this would happen after the backend cancels COMMIT PREPARED. > To be able to cancel an in-progress query the backend needs to accept > the interruption and send the cancel request. postgres_fdw can do that > since libpq supports sending a query and waiting for the result but > I'm not sure about other drivers. Thanks, I understood that handling this issue is not in the scope of the 2PC feature, as you and Tsunakawa-san said. Regards, -- Masahiro Ikeda NTT DATA CORPORATION
Re: Transactions involving multiple postgres foreign servers, take 2
> 2021/06/04 21:38、Masahiko Sawada <sawada.mshk@gmail.com>のメール: > > On Fri, Jun 4, 2021 at 5:16 PM Masahiko Sawada <sawada.mshk@gmail.com> wrote: >> >> On Fri, Jun 4, 2021 at 3:58 PM ikedamsh@oss.nttdata.com >> <ikedamsh@oss.nttdata.com> wrote: >>> >>> >>> >>> 2021/06/04 12:28、Masahiko Sawada <sawada.mshk@gmail.com>のメール: >>> >>> >>> Thank you for pointing it out. This idea has been proposed several >>> times and there were discussions. I'd like to summarize the proposed >>> ideas and those pros and cons before replying to your other comments. >>> >>> There are 3 ideas. After backend both prepares all foreign transaction >>> and commit the local transaction, >>> >>> 1. the backend continues attempting to commit all prepared foreign >>> transactions until all of them are committed. >>> 2. the backend attempts to commit all prepared foreign transactions >>> once. If an error happens, leave them for the resolver. >>> 3. the backend asks the resolver that launched per foreign server to >>> commit the prepared foreign transactions (and backend waits or doesn't >>> wait for the commit completion depending on the setting). >>> >>> With ideas 1 and 2, since the backend itself commits all foreign >>> transactions the resolver process cannot be a bottleneck, and probably >>> the code can get more simple as backends don't need to communicate >>> with resolver processes. >>> >>> However, those have two problems we need to deal with: >>> >>> >>> Thanks for sharing the summarize. I understood there are problems related to >>> FDW implementation. >>> >>> First, users could get an error if an error happens during the backend >>> committing prepared foreign transaction but the local transaction is >>> already committed and some foreign transactions could also be >>> committed, confusing users. 
There were two opinions on this problem: >>> FDW developers should be responsible for writing FDW code such that >>> no error happens during committing foreign transactions, and >>> users can accept that confusion since an error could happen after >>> writing the commit WAL even today without this 2PC feature. For the >>> former point, I'm not sure it's always doable since even palloc() >>> could raise an error and it seems hard to require all FDW developers >>> to understand all possible paths of raising an error. And for the >>> latter point, that's true but I think those cases are >>> should-not-happen cases (i.e., rare cases) whereas the likelihood of >>> an error during committing prepared transactions is not low (e.g., a >>> network connectivity problem). I think we need to assume that it is >>> not a rare case. >>> >>> >>> Hmm… Sorry, I don’t have any good ideas now. >>> >>> If anything, I’m on the second side, where users accept the confusion, though >>> it is necessary to let users know whether the error happened before the local >>> commit was done or not, because in the former case users will execute the same query again. >> >> Yeah, users will need to remember the XID of the last executed >> transaction and check if it has been committed by pg_xact_status(). > > As the second idea, can we send something like a hint along with the > error (or send a new type of error) that indicates the error happened > after the transaction commit so that the client can decide whether or > not to ignore the error? That way, we can deal with the confusion led > by an error raised after the local commit by the existing post-commit > cleanup routines (and post-commit xact callbacks) as well as by FDW’s > commit prepared routine. I think your second idea is better because it’s easier for users to know what error happened and there is nothing users should do. Since the focus of a "hint” is how to fix the problem, is it appropriate to use "context”? 
FWIW, I took a quick look at elog.c and found there is “error_context_stack”. So, why don’t you add a context message like "the transaction’s fate has been decided as COMMIT (or ROLLBACK), so even if an error happens, the transaction will be resolved in the background” after the local commit? Regards, -- Masahiro Ikeda NTT DATA CORPORATION
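As a side note for readers following along, idea 2 from the summary upthread (the backend attempts each prepared foreign commit once and leaves failures for the resolver) can be sketched roughly as follows. This is Python pseudocode with invented names, not the actual patch; the real logic would live in the core transaction manager.

```python
# Sketch of idea 2: after the local commit record is written, the global
# outcome is fixed, so a failed remote COMMIT PREPARED is merely deferred
# to a background resolver rather than rolled back.

from queue import Queue

resolver_queue = Queue()  # stands in for the shared queue a resolver worker drains

def commit_prepared(server, xact_id):
    """Stand-in for the FDW's COMMIT PREPARED call; may raise on network errors."""
    if server.get("down"):
        raise ConnectionError(f"cannot reach {server['name']}")

def end_distributed_transaction(prepared_xacts):
    """Called after the local commit: try each foreign commit exactly once."""
    for server, xact_id in prepared_xacts:
        try:
            commit_prepared(server, xact_id)               # idea 2: one attempt only
        except ConnectionError:
            resolver_queue.put((server["name"], xact_id))  # resolver retries later

servers = [({"name": "fdw1", "down": False}, "fx_1"),
           ({"name": "fdw2", "down": True},  "fx_2")]
end_distributed_transaction(servers)
print(list(resolver_queue.queue))  # → [('fdw2', 'fx_2')]
```

The open question in the thread is what, if anything, the client should be told when the queue is non-empty.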
RE: Transactions involving multiple postgres foreign servers, take 2
From: Masahiko Sawada <sawada.mshk@gmail.com> > I think we should not reinterpret the severity of the error and lower > it. Especially, in this case, any kind of errors can be thrown. It > could be such a serious error that FDW developer wants to report to > the client. Do we lower even PANIC to a lower severity such as > WARNING? That's definitely a bad idea. If we don’t lower PANIC whereas > lowering ERROR (and FATAL) to WARNING, why do we regard only them as > non-error? Why does the client have to know the error on a remote server, when the global transaction itself is destined to commit? FYI, the tx_commit() in the X/Open TX interface and the UserTransaction.commit() in JTA don't return such an error, IIRC. Do TX_FAIL and SystemException serve such a purpose? I don't think they do. [Tuxedo manual (Japanese)] https://docs.oracle.com/cd/F25597_01/document/products/tuxedo/tux80j/atmi/rf3c91.htm [JTA] public interface javax.transaction.UserTransaction public void commit() throws RollbackException, HeuristicMixedException, HeuristicRollbackException, SecurityException, IllegalStateException, SystemException Throws: RollbackException Thrown to indicate that the transaction has been rolled back rather than committed. Throws: HeuristicMixedException Thrown to indicate that a heuristic decision was made and that some relevant updates have been committed while others have been rolled back. Throws: HeuristicRollbackException Thrown to indicate that a heuristic decision was made and that all relevant updates have been rolled back. Throws: SecurityException Thrown to indicate that the thread is not allowed to commit the transaction. Throws: IllegalStateException Thrown if the current thread is not associated with a transaction. Throws: SystemException Thrown if the transaction manager encounters an unexpected error condition. Regards Takayuki Tsunakawa
On Tue, Jun 8, 2021 at 9:47 AM tsunakawa.takay@fujitsu.com <tsunakawa.takay@fujitsu.com> wrote: > > From: Masahiko Sawada <sawada.mshk@gmail.com> > > I think we should not reinterpret the severity of the error and lower > > it. Especially, in this case, any kind of errors can be thrown. It > > could be such a serious error that FDW developer wants to report to > > the client. Do we lower even PANIC to a lower severity such as > > WARNING? That's definitely a bad idea. If we don’t lower PANIC whereas > > lowering ERROR (and FATAL) to WARNING, why do we regard only them as > > non-error? > > Why does the client have to know the error on a remote server, whereas the global transaction itself is destined to commit? It's not necessarily on a remote server. It could be a problem with the local server. Regards, -- Masahiko Sawada EDB: https://www.enterprisedb.com/
(I have caught up here. Sorry in advance for possibly pointless discussion by me..) At Tue, 8 Jun 2021 00:47:08 +0000, "tsunakawa.takay@fujitsu.com" <tsunakawa.takay@fujitsu.com> wrote in > From: Masahiko Sawada <sawada.mshk@gmail.com> > > I think we should not reinterpret the severity of the error and lower > > it. Especially, in this case, any kind of errors can be thrown. It > > could be such a serious error that FDW developer wants to report to > > the client. Do we lower even PANIC to a lower severity such as > > WARNING? That's definitely a bad idea. If we don’t lower PANIC whereas > > lowering ERROR (and FATAL) to WARNING, why do we regard only them as > > non-error? > > Why does the client have to know the error on a remote server, whereas the global transaction itself is destined to commit? I think the discussion is based on the behavior that any process that is responsible for finishing the 2pc-commit continues retrying remote commits until all of the remote commits succeed. Maybe in most cases the errors during remote-prepared-commit could be retryable, but as Sawada-san says I'm also not sure it's always the case. On the other hand, it could be said that we have no other way than retrying the remote commits if we want to get over, say, instant network failures automatically. It is somewhat similar to WAL restoration, which keeps complaining about restore_command failures without exiting. > FYI, the tx_commit() in the X/Open TX interface and the UserTransaction.commit() in JTA don't return such an error, IIRC. Do TX_FAIL and SystemException serve such a purpose? I don't feel like that. I'm not sure about how JTA works in detail, but doesn't UserTransaction.commit() throw HeuristicMixedException when some of the relevant updates have been committed but others have not? Isn't it the same state as the case where some of the remote servers failed on remote-commit while others succeeded? 
(I guess that UserTransaction.commit() would throw RollbackException if remote-prepare has failed for any of the remotes.) > [Tuxedo manual (Japanese)] > https://docs.oracle.com/cd/F25597_01/document/products/tuxedo/tux80j/atmi/rf3c91.htm > > > [JTA] > public interface javax.transaction.UserTransaction > public void commit() > throws RollbackException, HeuristicMixedException, > HeuristicRollbackException, SecurityException, > IllegalStateException, SystemException > > Throws: RollbackException > Thrown to indicate that the transaction has been rolled back rather than committed. > > Throws: HeuristicMixedException > Thrown to indicate that a heuristic decision was made and that some relevant updates have been > committed while others have been rolled back. > > Throws: HeuristicRollbackException > Thrown to indicate that a heuristic decision was made and that all relevant updates have been rolled > back. > > Throws: SecurityException > Thrown to indicate that the thread is not allowed to commit the transaction. > > Throws: IllegalStateException > Thrown if the current thread is not associated with a transaction. > > Throws: SystemException > Thrown if the transaction manager encounters an unexpected error condition. > > > Regards > Takayuki Tsunakawa -- Kyotaro Horiguchi NTT Open Source Software Center
At Tue, 8 Jun 2021 16:32:14 +0900, Masahiko Sawada <sawada.mshk@gmail.com> wrote in > On Tue, Jun 8, 2021 at 9:47 AM tsunakawa.takay@fujitsu.com > <tsunakawa.takay@fujitsu.com> wrote: > > > > From: Masahiko Sawada <sawada.mshk@gmail.com> > > > I think we should not reinterpret the severity of the error and lower > > > it. Especially, in this case, any kind of errors can be thrown. It > > > could be such a serious error that FDW developer wants to report to > > > the client. Do we lower even PANIC to a lower severity such as > > > WARNING? That's definitely a bad idea. If we don’t lower PANIC whereas > > > lowering ERROR (and FATAL) to WARNING, why do we regard only them as > > > non-error? > > > > Why does the client have to know the error on a remote server, whereas the global transaction itself is destined to commit? > > It's not necessarily on a remote server. It could be a problem with > the local server. Isn't it a discussion about the errors from postgres_fdw? regards. -- Kyotaro Horiguchi NTT Open Source Software Center
RE: Transactions involving multiple postgres foreign servers, take 2
From: Masahiko Sawada <sawada.mshk@gmail.com> > On Tue, Jun 8, 2021 at 9:47 AM tsunakawa.takay@fujitsu.com > <tsunakawa.takay@fujitsu.com> wrote: > > Why does the client have to know the error on a remote server, whereas the > global transaction itself is destined to commit? > > It's not necessarily on a remote server. It could be a problem with > the local server. Then, in what kind of scenario are we talking about the difficulty, and how is it difficult to handle, when we adopt either method 1 or 2? (I'd just like to have the same clear picture.) For example, 1. All FDWs prepared successfully. 2. The local transaction prepared successfully, too. 3. Some FDWs committed successfully. 4. One FDW failed to send the commit request because the remote server went down. Regards Takayuki Tsunakawa
RE: Transactions involving multiple postgres foreign servers, take 2
From: Kyotaro Horiguchi <horikyota.ntt@gmail.com> > I think the discussion is based on the behavior that any process that is > responsible for finishing the 2pc-commit continues retrying remote > commits until all of the remote-commits succeed. Thank you for coming back. We're talking about the first attempt to prepare and commit in each transaction, not the retry case. > > Throws: HeuristicMixedException > > Thrown to indicate that a heuristic decision was made and that some > relevant updates have been > > committed while others have been rolled back. > I'm not sure about how JTA works in detail, but doesn't > UserTransaction.commit() throw HeuristicMixedException when some of the > relevant updates have been committed but others have not? Isn't it the same > state as the case where some of the remote servers failed on > remote-commit while others succeeded? No. Taking the description literally and considering the relevant XA specification, it's not about the remote commit failure. The remote server is not allowed to fail the commit once it has reported successful prepare, which is the contract of 2PC. HeuristicMixedException is about the manual resolution, typically by the DBA, using the DBMS-specific tool or the standard commit()/rollback() API. > (I guess that > UserTransaction.commit() would throw RollbackException if > remote-prepare has failed for any of the remotes.) Correct. Regards Takayuki Tsunakawa
On Tue, Jun 8, 2021 at 5:28 PM tsunakawa.takay@fujitsu.com <tsunakawa.takay@fujitsu.com> wrote: > > From: Masahiko Sawada <sawada.mshk@gmail.com> > > On Tue, Jun 8, 2021 at 9:47 AM tsunakawa.takay@fujitsu.com > > <tsunakawa.takay@fujitsu.com> wrote: > > > Why does the client have to know the error on a remote server, whereas the > > global transaction itself is destined to commit? > > > > It's not necessarily on a remote server. It could be a problem with > > the local server. > > Then, in what kind of scenario are we talking about the difficulty, and how is it difficult to handle, when we adopt either method 1 or 2? (I'd just like to have the same clear picture.) IMO, even though FDW's commit/rollback transaction code could be simple in some cases, I think we need to think that any kind of errors (or even FATAL or PANIC) could be thrown from the FDW code. It could be an error due to a temporary network problem, remote server down, driver’s unexpected error, or out of memory etc. Errors that happen after the local transaction commit don't affect the global transaction decision, as you mentioned. But the process or system could be in a bad state. Also, users might expect the process to exit on error by setting exit_on_error = on. Your idea sounds like we have to ignore any errors happening after the local commit if they don’t affect the transaction outcome. It’s too scary to me and I think that it's a bad idea to blindly ignore all possible errors under such conditions. That could make things worse and will likely be a foot-gun. It would be good if we could prove that it’s safe to ignore those errors, but I'm not sure how we can, at least for me. This situation is true even today; an error could happen after committing the transaction. But I personally don’t want to add code that increases the likelihood. Just to be clear, with your idea, will we ignore only ERROR, or also FATAL and PANIC? 
And if an error happens during committing one of the prepared transactions on the foreign server, will we proceed with committing other transactions or return OK to the client? Regards, -- Masahiko Sawada EDB: https://www.enterprisedb.com/
RE: Transactions involving multiple postgres foreign servers, take 2
From: Masahiko Sawada <sawada.mshk@gmail.com> > On Tue, Jun 8, 2021 at 5:28 PM tsunakawa.takay@fujitsu.com > <tsunakawa.takay@fujitsu.com> wrote: > > Then, in what kind of scenario are we talking about the difficulty, and how is > it difficult to handle, when we adopt either method 1 or 2? (I'd just like to > have the same clear picture.) > > IMO, even though FDW's commit/rollback transaction code could be > simple in some cases, I think we need to think that any kind of errors > (or even FATAL or PANIC) could be thrown from the FDW code. It could > be an error due to a temporary network problem, remote server down, > driver’s unexpected error, or out of memory etc. Errors that happened > after the local transaction commit doesn't affect the global > transaction decision, as you mentioned. But the process or system > could be in a bad state. Also, users might expect the process to exit > on error by setting exit_on_error = on. Your idea sounds like that we > have to ignore any errors happening after the local commit if they > don’t affect the transaction outcome. It’s too scary to me and I think > that it's a bad idea to blindly ignore all possible errors under such > conditions. That could make the thing worse and will likely be > foot-gun. It would be good if we can prove that it’s safe to ignore > those errors but not sure how we can at least for me. > > This situation is true even today; an error could happen after > committing the transaction. But I personally don’t want to add the > code that increases the likelihood. I'm not talking about the code simplicity here (actually, I haven't reviewed the code around prepare and commit in the patch yet...) Also, I don't understand well what you're trying to insist and what realistic situations you have in mind by citing exit_on_error, FATAL, PANIC and so on. I just asked (in a different part) why the client has to know the error. 
Just to be clear, I'm not saying that we should hide the error completely behind the scenes. For example, you can allow the FDW to emit a WARNING if the DBMS-specific client driver returns an error when committing. Further, if you want to allow the FDW to throw an ERROR when committing, the transaction manager in core can catch it by PG_TRY(), so that it can report back the successful commit of the global transaction to the client while it leaves the handling of the failed commit of the FDW to the resolver. (I don't think we like to use PG_TRY() during transaction commit for performance reasons, though.) Granting a hundred steps for the sake of argument, let's say we want to report the error of the committing FDW to the client. If that's the case, we can use SQLSTATE 02xxx (Warning) and attach the error message. > Just to be clear, with your idea, will we ignore only ERROR or also > FATAL and PANIC? And if an error happens during committing one of the > prepared transactions on the foreign server, will we proceed with > committing other transactions or return OK to the client? Neither FATAL nor PANIC can be ignored. When FATAL, which means the termination of a particular session, the committing of the remote transaction should be taken over by the resolver. Not to mention PANIC; we can't do anything. Otherwise, we proceed with committing other FDWs, hand off the task of committing the failed FDW to the resolver, and report success to the client. If you're not convinced, I'd like to ask you to investigate the code of some Java EE app server, say GlassFish, and share with us how it handles an error during commit. Regards Takayuki Tsunakawa
On Wed, Jun 9, 2021 at 4:10 PM tsunakawa.takay@fujitsu.com <tsunakawa.takay@fujitsu.com> wrote: > > From: Masahiko Sawada <sawada.mshk@gmail.com> > > On Tue, Jun 8, 2021 at 5:28 PM tsunakawa.takay@fujitsu.com > > <tsunakawa.takay@fujitsu.com> wrote: > > > Then, in what kind of scenario are we talking about the difficulty, and how is > > it difficult to handle, when we adopt either method 1 or 2? (I'd just like to > > have the same clear picture.) > > > > IMO, even though FDW's commit/rollback transaction code could be > > simple in some cases, I think we need to think that any kind of errors > > (or even FATAL or PANIC) could be thrown from the FDW code. It could > > be an error due to a temporary network problem, remote server down, > > driver’s unexpected error, or out of memory etc. Errors that happened > > after the local transaction commit doesn't affect the global > > transaction decision, as you mentioned. But the process or system > > could be in a bad state. Also, users might expect the process to exit > > on error by setting exit_on_error = on. Your idea sounds like that we > > have to ignore any errors happening after the local commit if they > > don’t affect the transaction outcome. It’s too scary to me and I think > > that it's a bad idea to blindly ignore all possible errors under such > > conditions. That could make the thing worse and will likely be > > foot-gun. It would be good if we can prove that it’s safe to ignore > > those errors but not sure how we can at least for me. > > > > This situation is true even today; an error could happen after > > committing the transaction. But I personally don’t want to add the > > code that increases the likelihood. > > I'm not talking about the code simplicity here (actually, I haven't reviewed the code around prepare and commit in the patch yet...) Also, I don't understand well what you're trying to insist and what realistic situations you have in mind by citing exit_on_error, FATAL, PANIC and so on. 
> I just asked (in a different part) why the client has to know the error. > > Just to be clear, I'm not saying that we should hide the error completely behind the scenes. For example, you can allow the FDW to emit a WARNING if the DBMS-specific client driver returns an error when committing. Further, if you want to allow the FDW to throw an ERROR when committing, the transaction manager in core can catch it by PG_TRY(), so that it can report back the successful commit of the global transaction to the client while it leaves the handling of the failed commit of the FDW to the resolver. (I don't think we like to use PG_TRY() during transaction commit for performance reasons, though.) > > Let's give it a hundred steps and let's say we want to report the error of the committing FDW to the client. If that's the case, we can use SQLSTATE 02xxx (Warning) and attach the error message. > Maybe it's better to start a new thread to discuss this topic. If your idea is good, we can lower all errors that happen after writing the commit record to warnings, reducing the cases where the client gets confused by receiving an error after the commit. Regards, -- Masahiko Sawada EDB: https://www.enterprisedb.com/
RE: Transactions involving multiple postgres foreign servers, take 2
From: Masahiko Sawada <sawada.mshk@gmail.com> > Maybe it's better to start a new thread to discuss this topic. If your > idea is good, we can lower all error that happened after writing the > commit record to warning, reducing the cases where the client gets > confusion by receiving an error after the commit. No. It's an important part because it determines the 2PC behavior and performance. This discussion had started from the concern about performance before Ikeda-san reported pathological results. Don't rush forward, hoping someone will commit the current patch. I'm afraid you just don't want to change your design and code. Let's face the real issue. As I said before, and as Ikeda-san's performance benchmark results show, I have to say the design isn't done sufficiently. I talked with Fujii-san the other day about this patch. The patch is already huge and it's difficult to decode how the patch works, e.g., what kind of new WAL it emits, how many disk writes it adds, how errors are handled, whether/how it's different from the textbook or other existing designs, etc. What happened to my request to add such a design description to the following page, so that reviewers can consider the design before spending much time on looking at the code? What's the situation of the new FDW API that should naturally accommodate other FDW implementations? Atomic Commit of Distributed Transactions https://wiki.postgresql.org/wiki/Atomic_Commit_of_Distributed_Transactions Design should come first. I don't think it's a sincere attitude to require reviewers to spend a long time to read the design from huge code. Regards Takayuki Tsunakawa
At Tue, 8 Jun 2021 08:45:24 +0000, "tsunakawa.takay@fujitsu.com" <tsunakawa.takay@fujitsu.com> wrote in > From: Kyotaro Horiguchi <horikyota.ntt@gmail.com> > > I think the discussion is based on the behavior that any process that is > > responsible for finishing the 2pc-commit continues retrying remote > > commits until all of the remote commits succeed. > > Thank you for coming back. We're talking about the first attempt to prepare and commit in each transaction, not the retry case. If we accept that each elementary commit (via an FDW connection) may fail, there's no way the parent (root) 2pc-commit can succeed. How can we ignore the fdw-error in that case? > > > Throws: HeuristicMixedException > > > Thrown to indicate that a heuristic decision was made and that some > relevant updates have been > > > committed while others have been rolled back. > > > I'm not sure about how JTA works in detail, but doesn't > > UserTransaction.commit() throw HeuristicMixedException when some of the > > relevant updates have been committed but others have not? Isn't it the same > > state as the case where some of the remote servers failed on > > remote-commit while others succeeded? > > No. Taking the description literally and considering the relevant XA specification, it's not about the remote commit failure. The remote server is not allowed to fail the commit once it has reported successful prepare, which is the contract of 2PC. HeuristicMixedException is about the manual resolution, typically by the DBA, using the DBMS-specific tool or the standard commit()/rollback() API. Mmm. The above seems as if it is saying that the 2pc-commit does not interact with remotes. The interface contract does not cover everything that happens in the real world. If a remote commit fails, that is just an issue outside of the 2PC world. In reality a remote commit may fail for all sorts of reasons. 
https://www.ibm.com/docs/ja/db2-for-zos/11?topic=support-example-distributed-transaction-that-uses-jta-methods > } catch (javax.transaction.xa.XAException xae) > { // Distributed transaction failed, so roll it back. > // Report XAException on prepare/commit. This suggests that both XAResource.prepare() and commit() can throw an exception. > > (I guess that > > UserTransaction.commit() would throw RollbackException if > > remote-prepare has failed for any of the remotes.) > > Correct. So UserTransaction.commit() does not throw the same exception if a remote commit fails. Isn't HeuristicMixedException the exception thrown in that case? regards. -- Kyotaro Horiguchi NTT Open Source Software Center
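To make the JTA semantics being debated here concrete, the following is a toy model (Python, invented names, not the actual JTA API): a prepare failure surfaces as RollbackException, while divergent branch outcomes after the commit decision surface as HeuristicMixedException.

```python
# Toy model of UserTransaction.commit() outcome reporting:
# - any prepare failure => everything rolled back => RollbackException
# - after the COMMIT decision, a branch may have decided heuristically on
#   its own; mixed final outcomes => HeuristicMixedException

class RollbackException(Exception): pass
class HeuristicMixedException(Exception): pass

def commit(branches):
    # Phase 1: prepare everywhere; one failure dooms the whole transaction.
    if not all(b["prepare_ok"] for b in branches):
        raise RollbackException("a branch failed to prepare; all rolled back")
    # Phase 2: the decision is COMMIT, but a branch may report a heuristic
    # outcome it already took unilaterally (e.g. a DBA rolled it back).
    outcomes = {b["heuristic"] or "committed" for b in branches}
    if len(outcomes) > 1:
        raise HeuristicMixedException(f"mixed outcomes: {sorted(outcomes)}")

mixed = [{"prepare_ok": True, "heuristic": None},
         {"prepare_ok": True, "heuristic": "rolled_back"}]  # branch decided alone
try:
    commit(mixed)
except HeuristicMixedException as e:
    print(e)  # → mixed outcomes: ['committed', 'rolled_back']
```

Note the distinction Tsunakawa-san draws upthread: in this model the mixed outcome comes from a heuristic decision on the branch, not from the branch simply failing to execute the commit request.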
RE: Transactions involving multiple postgres foreign servers, take 2
From: Kyotaro Horiguchi <horikyota.ntt@gmail.com> > If we accept each elementary-commit (via FDW connection) to fail, the > parent(?) there's no way the root 2pc-commit can succeed. How can we > ignore the fdw-error in that case? No, we don't ignore the error during FDW commit. As mentioned at the end of this mail, the question is how the FDW reports the error to the caller (the transaction manager in Postgres core), and how we should handle it. As below, Glassfish catches the resource manager's error during commit, retries the commit if the error is transient or a communication failure, and hands off the processing of the failed commit to the recovery manager. (I used all of my energy today; I'd be grateful if someone could figure out whether Glassfish reports the error to the application.)

[XATerminatorImpl.java]
public void commit(Xid xid, boolean onePhase) throws XAException {
    ...
    } else {
        coord.commit();
    }

[TopCoordinator.java]
    // Commit all participants. If a fatal error occurs during
    // this method, then the process must be ended with a fatal error.
    ...
    try {
        participants.distributeCommit();
    } catch (Throwable exc) {

[RegisteredResources.java]
void distributeCommit() throws HeuristicMixed, HeuristicHazard, NotPrepared {
    ...
    // Browse through the participants, committing them. The following is
    // intended to be done asynchronously as a group of operations.
    ...
    // Tell the resource to commit.
    // Catch any exceptions here; keep going until
    // no exception is left.
    ...
    // If the exception is neither TRANSIENT or
    // COMM_FAILURE, it is unexpected, so display a
    // message and give up with this Resource.
    ...
    // For TRANSIENT or COMM_FAILURE, wait
    // for a while, then retry the commit.
    ...
    // If the retry limit has been exceeded,
    // end the process with a fatal error.
    ...
    if (!transactionCompleted) {
        if (coord != null)
            RecoveryManager.addToIncompleTx(coord, true);

> > No. 
Taking the description literally and considering the relevant XA specification, it's not about the remote commit failure. The remote server is not allowed to fail the commit once it has reported successful prepare, which is the contract of 2PC. HeuristicMixedException is about the manual resolution, typically by the DBA, using the DBMS-specific tool or the standard commit()/rollback() API. > > Mmm. The above seems as if saying that 2pc-commit does not interact > with remotes. The interface contract does not cover everything that > happens in the real world. If remote-commit fails, that is just an > issue outside of the 2pc world. In reality remote-commit may fail for > all reasons. The following part of the XA specification is relevant. We're considering modeling the FDW 2PC interface on XA, because it seems like the only standard interface and thus other FDWs would naturally take advantage of it, aren't we? Then, we need to take care of such things as this. The interface design is not easy. So, a proper design and its review should come first, before going deeper into the huge code patch. 2.3.3 Heuristic Branch Completion -------------------------------------------------- Some RMs may employ heuristic decision-making: an RM that has prepared to commit a transaction branch may decide to commit or roll back its work independently of the TM. It could then unlock shared resources. This may leave them in an inconsistent state. When the TM ultimately directs an RM to complete the branch, the RM may respond that it has already done so. The RM reports whether it committed the branch, rolled it back, or completed it with mixed results (committed some work and rolled back other work). An RM that reports heuristic completion to the TM must not discard its knowledge of the transaction branch. The TM calls the RM once more to authorise it to forget the branch. 
This requirement means that the RM must notify the TM of all heuristic decisions, even those that match the decision the TM requested. The referenced OSI DTP specifications (model) and (service) define heuristics more precisely. -------------------------------------------------- > https://www.ibm.com/docs/ja/db2-for-zos/11?topic=support-example-distributed-transaction-that-uses-jta-methods > This suggests that both XAResource.prepare() and commit() can throw an > exception. Yes, XAResource.commit() can throw an exception: void commit(Xid xid, boolean onePhase) throws XAException Throws: XAException An error has occurred. Possible XAExceptions are XA_HEURHAZ, XA_HEURCOM, XA_HEURRB, XA_HEURMIX, XAER_RMERR, XAER_RMFAIL, XAER_NOTA, XAER_INVAL, or XAER_PROTO. This is equivalent to xa_commit() in the XA specification. xa_commit() can return error codes that have the same names as above. The question we're trying to answer here is: * How should such an error be handled? Glassfish (and possibly other Java EE servers) catches the error, continues to commit the rest of the participants, and handles the failed resource manager's commit in the background. In Postgres, if we allow FDWs to do ereport(ERROR), how can we do similar things? * Should we report the error to the client? If yes, should it be reported as a failure of commit, or as an informational message (WARNING) of a successful commit? Why does the client want to know the error, when the global transaction's commit has been promised? Regards Takayuki Tsunakawa
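The Glassfish distributeCommit() strategy quoted above boils down to the following loop. This is a condensed Python sketch with invented names (not the actual Glassfish code, which is Java and CORBA-based): retry a branch's commit on TRANSIENT/COMM_FAILURE up to a limit, give up immediately on any other error, and hand every still-uncommitted branch to the recovery manager.

```python
# Condensed model of RegisteredResources.distributeCommit(): the commit
# decision is final, so errors only control retries and recovery handoff.

RETRY_LIMIT = 3
RETRYABLE = {"TRANSIENT", "COMM_FAILURE"}

def distribute_commit(resources, recovery_manager):
    for res in resources:
        for attempt in range(RETRY_LIMIT):
            code = res["commit"]()     # None on success, else an error code
            if code is None:
                break                  # this branch is committed
            if code not in RETRYABLE:
                break                  # unexpected error: stop retrying this branch
        else:
            code = "RETRY_EXHAUSTED"   # retryable failures used up the limit
        if code is not None:
            recovery_manager.append(res["name"])  # recovery finishes it later

# A resource that fails transiently twice, then succeeds, next to one
# that fails with a non-retryable (XA-style) code:
attempts = iter(["COMM_FAILURE", "COMM_FAILURE", None])
flaky = {"name": "rm1", "commit": lambda: next(attempts)}
dead = {"name": "rm2", "commit": lambda: "XAER_RMFAIL"}
incomplete = []
distribute_commit([flaky, dead], incomplete)
print(incomplete)  # → ['rm2']
```

This mirrors the open question in the mail above: the caller completes the global commit regardless, and only the recovery list records which branches still need attention.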
On Fri, Jun 4, 2021 at 4:04 AM tsunakawa.takay@fujitsu.com <tsunakawa.takay@fujitsu.com> wrote: > Why does the user have to get an error? Once the local transaction has been prepared, which means all remote ones also have been prepared, the whole transaction is determined to commit. So, the user doesn't have to receive an error as long as the local node is alive. That is completely unrealistic. As Sawada-san has pointed out repeatedly, there are tons of things that can go wrong even after the remote side has prepared the transaction. Preparing a transaction only promises that the remote side will let you commit the transaction upon request. It doesn't guarantee that you'll be able to make the request. Like Sawada-san says, network problems, out of memory issues, or many other things could stop that from happening. Someone could come along in another session and run "ROLLBACK PREPARED" on the remote side, and now the "COMMIT PREPARED" will never succeed no matter how many times you try it. At least, not unless someone goes and creates a new prepared transaction with the same 2PC identifier, but then you won't be committing the correct transaction anyway. Or someone could take the remote server and drop it in a volcano. How do you propose that we avoid giving the user an error after the remote server has been dropped into a volcano, even though the local node is still alive? Also, leaving aside theoretical arguments, I think it's not realistically possible for an FDW author to write code to commit a prepared transaction that will be safe in the context of running late in PrepareTransaction(), after we've already done RecordTransactionCommit(). Such code can't avoid throwing errors because it can't avoid performing operations and allocating memory. It's already been mentioned that, if an ERROR is thrown, it would be reported to the user in place of the COMMIT acknowledgement that they are expecting. 
Now, it has also been suggested that we could downgrade the ERROR to a WARNING and still report the COMMIT. That doesn't sound easy to do, because when the ERROR happens, control is going to jump to AbortTransaction(). But even if you could hack it so it works like that, it doesn't really solve the problem. What about all of the other servers where the prepared transaction also needs to be committed? In the design of PostgreSQL, in all circumstances, the way you recover from an error is to abort the transaction. That is what brings the system back to a clean state. You can't simply ignore the requirement to abort the transaction and keep doing more work. It will never be reliable, and Tom will instantaneously demand that any code that works like that be reverted -- and for good reason. I am not sure that it's 100% impossible to find a way to solve this problem without just having the resolver do all the work, but I think it's going to be extremely difficult. We tried to figure out some vaguely similar things while working on undo, and it really didn't go very well. The later stages of CommitTransaction() and AbortTransaction() are places where very few kinds of code are safe to execute, and finding a way to patch around that problem is not simple either. If the resolver performance is poor, perhaps we could try to find a way to improve it. I don't know. But I don't think it does any good to say, well, no errors can occur after the remote transaction is prepared. That's clearly incorrect. -- Robert Haas EDB: http://www.enterprisedb.com
RE: Transactions involving multiple postgres foreign servers, take 2
From: Robert Haas <robertmhaas@gmail.com> > That is completely unrealistic. As Sawada-san has pointed out > repeatedly, there are tons of things that can go wrong even after the > remote side has prepared the transaction. Preparing a transaction only > promises that the remote side will let you commit the transaction upon > request. It doesn't guarantee that you'll be able to make the request. > Like Sawada-san says, network problems, out of memory issues, or many > other things could stop that from happening. Someone could come along > in another session and run "ROLLBACK PREPARED" on the remote side, and > now the "COMMIT PREPARED" will never succeed no matter how many times > you try it. At least, not unless someone goes and creates a new > prepared transaction with the same 2PC identifier, but then you won't > be committing the correct transaction anyway. Or someone could take > the remote server and drop it in a volcano. How do you propose that we > avoid giving the user an error after the remote server has been > dropped into a volcano, even though the local node is still alive? I understand that. As I cited yesterday and possibly before, that's why xa_commit() returns various return codes. So, I have never suggested that FDWs should not report an error and always report success for the commit request. They should be allowed to report an error. The question I have been asking is how. With that said, we should only have two options; one is the return value of the FDW commit routine, and the other is via ereport(ERROR). I suggested the possibility of the former, because if the FDW does ereport(ERROR), Postgres core (transaction manager) may have difficulty in handling the rest of the participants. 
> Also, leaving aside theoretical arguments, I think it's not > realistically possible for an FDW author to write code to commit a > prepared transaction that will be safe in the context of running late > in PrepareTransaction(), after we've already done > RecordTransactionCommit(). Such code can't avoid throwing errors > because it can't avoid performing operations and allocating memory. I'm not completely sure about this. I thought (and said) that the only thing the FDW does would be to send a commit request through an existing connection. So, I think it's not a severe restriction to require FDWs not to do ereport(ERROR) during commits (of the second phase of 2PC.) > It's already been mentioned that, if an ERROR is thrown, it would be > reported to the user in place of the COMMIT acknowledgement that they > are expecting. Now, it has also been suggested that we could downgrade > the ERROR to a WARNING and still report the COMMIT. That doesn't sound > easy to do, because when the ERROR happens, control is going to jump > to AbortTransaction(). But even if you could hack it so it works like > that, it doesn't really solve the problem. What about all of the other > servers where the prepared transaction also needs to be committed? In > the design of PostgreSQL, in all circumstances, the way you recover > from an error is to abort the transaction. That is what brings the > system back to a clean state. You can't simply ignore the requirement > to abort the transaction and keep doing more work. It will never be > reliable, and Tom will instantaneously demand that any code that works > like that be reverted -- and for good reason. (I took "abort" as the same as "rollback" here.) Once we've sent commit requests to some participants, we can't abort the transaction. If one FDW returned an error halfway, we need to send commit requests to the rest of participants. 
It's a design question, as I repeatedly said, whether and how we should report the error of some participants to the client. For instance, how should we report the errors of multiple participants? Concatenate those error messages? Anyway, we should design the interface first, giving much thought and respecting the ideas of predecessors (TX/XA, MS DTC, JTA/JTS). Otherwise, we may end up like "We implemented like this, so the interface is like this and it can only behave like this, although you may find it strange..." That might be a situation similar to what your comment "the design of PostgreSQL, in all circumstances, the way you recover from an error is to abort the transaction" suggests -- Postgres doesn't have statement-level rollback. > I am not sure that it's 100% impossible to find a way to solve this > problem without just having the resolver do all the work, but I think > it's going to be extremely difficult. We tried to figure out some > vaguely similar things while working on undo, and it really didn't go > very well. The later stages of CommitTransaction() and > AbortTransaction() are places where very few kinds of code are safe to > execute, and finding a way to patch around that problem is not simple > either. If the resolver performance is poor, perhaps we could try to > find a way to improve it. I don't know. But I don't think it does any > good to say, well, no errors can occur after the remote transaction is > prepared. That's clearly incorrect. I don't think the resolver-based approach would bring us far enough. It's fundamentally a bottleneck. Such a background process should only handle commits whose requests failed to be sent due to server down. My requests are only twofold and haven't changed for long: design the FDW interface that implementors can naturally follow, and design to ensure performance. Regards Takayuki Tsunakawa
On Thu, Jun 10, 2021 at 9:58 PM tsunakawa.takay@fujitsu.com <tsunakawa.takay@fujitsu.com> wrote: > I understand that. As I cited yesterday and possibly before, that's why xa_commit() returns various return codes. So, I have never suggested that FDWs should not report an error and always report success for the commit request. They should be allowed to report an error. In the text to which I was responding it seemed like you were saying the opposite. Perhaps I misunderstood. > The question I have been asking is how. With that said, we should only have two options; one is the return value of the FDW commit routine, and the other is via ereport(ERROR). I suggested the possibility of the former, because if the FDW does ereport(ERROR), Postgres core (transaction manager) may have difficulty in handling the rest of the participants. I don't think that is going to work. It is very difficult to write code that doesn't ever ERROR in PostgreSQL. It is not impossible if the operation is trivial enough, but I think you're greatly underestimating the complexity of committing the remote transaction. If somebody had designed PostgreSQL so that every function returns a return code and every time you call some other function you check that return code and pass any error up to your own caller, then there would be no problem here. But in fact the design was that at the first sign of trouble you throw an ERROR. It's not easy to depart from that programming model in just one place. > > Also, leaving aside theoretical arguments, I think it's not > > realistically possible for an FDW author to write code to commit a > > prepared transaction that will be safe in the context of running late > > in PrepareTransaction(), after we've already done > > RecordTransactionCommit(). Such code can't avoid throwing errors > > because it can't avoid performing operations and allocating memory. > > I'm not completely sure about this. 
I thought (and said) that the only thing the FDW does would be to send a commit request through an existing connection. So, I think it's not a severe restriction to require FDWs not to do ereport(ERROR) during commits (of the second phase of 2PC.) To send a commit request through an existing connection, you have to send some bytes over the network using a send() or write() system call. That can fail. Then you have to read the response back over the network using recv() or read(). That can also fail. You also need to parse the result that you get from the remote side, which can also fail, because you could get back garbage for some reason. And depending on the details, you might first need to construct the message you're going to send, which might be able to fail too. Also, the data might be encrypted using SSL, so you might have to decrypt it, which can also fail, and you might need to encrypt data before sending it, which can fail. In fact, if you're using OpenSSL, trying to call SSL_read() or SSL_write() can both read and write data from the socket, even multiple times, so you have extra opportunities to fail. > (I took "abort" as the same as "rollback" here.) Once we've sent commit requests to some participants, we can't abort the transaction. If one FDW returned an error halfway, we need to send commit requests to the rest of participants. I understand that it's not possible to abort the local transaction after it's been committed, but that doesn't mean that we're going to be able to send the commit requests to the rest of the participants. We want to be able to do that, certainly, but there's no guarantee that it's actually possible. Again, the remote servers may be dropped into a volcano, or less seriously, we may not be able to access them. Also, someone may kill off our session. > It's a design question, as I repeatedly said, whether and how we should report the error of some participants to the client. 
For instance, how should we report the errors of multiple participants? Concatenate those error messages? Sure, I agree that there are some questions about how to report errors. > Anyway, we should design the interface first, giving much thought and respecting the ideas of predecessors (TX/XA, MS DTC, JTA/JTS). Otherwise, we may end up like "We implemented like this, so the interface is like this and it can only behave like this, although you may find it strange..." That might be a situation similar to what your comment "the design of PostgreSQL, in all circumstances, the way you recover from an error is to abort the transaction" suggests -- Postgres doesn't have statement-level rollback. I think that's a valid concern, but we also have to have a plan that is realistic. Some things are indeed not possible in PostgreSQL's design. Also, some of these problems are things everyone has to somehow confront. There's no database doing 2PC that can't have a situation where one of the machines disappears unexpectedly due to some natural disaster or administrator interference. It might be the case that our inability to do certain things safely during transaction commit puts us out of compliance with the spec, but it can't be the case that some other system has no possible failures during transaction commit. The problem of the network potentially being disconnected between one packet and the next exists in every system. > I don't think the resolver-based approach would bring us far enough. It's fundamentally a bottleneck. Such a background process should only handle commits whose requests failed to be sent due to server down. Why is it fundamentally a bottleneck? It seems to me in some cases it could scale better than any other approach. If we have to commit on 100 shards in only one process we can only do those commits one at a time. If we can use resolver processes we could do all 100 at once if the user can afford to run that many resolvers, which should be way faster. 
It is true that if the resolver does not have a connection open and must open one, that might be slow, but presumably after that it can keep the connection open and reuse it for subsequent distributed transactions. I don't really see why that should be particularly slow. -- Robert Haas EDB: http://www.enterprisedb.com
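Robert's enumeration of failure points when "just sending a commit request" (construct the message, send, receive, parse) can be made concrete with a toy model. MockConn, CommitStep, and the fail_at injection are hypothetical stand-ins for a real socket, meant only to show that every stage has its own failure mode an FDW commit routine would have to handle:

```c
#include <string.h>

/* The stages of sending "COMMIT PREPARED" over an existing connection. */
typedef enum CommitStep
{
    STEP_BUILD = 0,             /* construct the message (can fail: no memory,
                                 * buffer too small) */
    STEP_SEND,                  /* send()/write() (can fail: network down, EPIPE) */
    STEP_RECV,                  /* recv()/read() (can fail: timeout, reset) */
    STEP_PARSE,                 /* parse the reply (can fail: garbage response) */
    STEP_DONE                   /* everything succeeded */
} CommitStep;

/* Mock connection: fail_at injects a failure at the given step. */
typedef struct MockConn
{
    CommitStep  fail_at;        /* STEP_DONE means "no injected failure" */
    const char *response;       /* what the "server" answers */
} MockConn;

/* Returns STEP_DONE on success, or the step that failed. */
static CommitStep
send_commit_prepared(MockConn *conn, const char *gid)
{
    char msg[64];

    /* 1. Construct the message: checked, because it can fail. */
    if (conn->fail_at == STEP_BUILD || strlen(gid) >= 32)
        return STEP_BUILD;
    strcpy(msg, "COMMIT PREPARED ");
    strcat(msg, gid);

    /* 2. send()/write() stand-in. */
    if (conn->fail_at == STEP_SEND)
        return STEP_SEND;

    /* 3. recv()/read() stand-in. */
    if (conn->fail_at == STEP_RECV)
        return STEP_RECV;

    /* 4. Parse the response: even a "successful" read can yield garbage. */
    if (conn->fail_at == STEP_PARSE || strcmp(conn->response, "COMMIT") != 0)
        return STEP_PARSE;

    return STEP_DONE;
}
```

With SSL in the picture each send/recv step gains further sub-steps (encrypt, decrypt, renegotiate), which is the point: there is no single place where a commit request "just succeeds."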
On 2021/05/11 13:37, Masahiko Sawada wrote: > I've attached the updated patches that incorporated comments from > Zhihong and Ikeda-san. Thanks for updating the patches! I'm still reading these patches, but I'd like to share some review comments that I found so far.

(1)
+/* Remove the foreign transaction from FdwXactParticipants */
+void
+FdwXactUnregisterXact(UserMapping *usermapping)
+{
+	Assert(IsTransactionState());
+	RemoveFdwXactEntry(usermapping->umid);
+}

Currently there is no user of FdwXactUnregisterXact(). This function should be removed?

(2) When I ran the regression test, I got the following failure.

========= Contents of ./src/test/modules/test_fdwxact/regression.diffs
diff -U3 /home/runner/work/postgresql/postgresql/src/test/modules/test_fdwxact/expected/test_fdwxact.out /home/runner/work/postgresql/postgresql/src/test/modules/test_fdwxact/results/test_fdwxact.out
--- /home/runner/work/postgresql/postgresql/src/test/modules/test_fdwxact/expected/test_fdwxact.out	2021-06-10 02:19:43.808622747+0000
+++ /home/runner/work/postgresql/postgresql/src/test/modules/test_fdwxact/results/test_fdwxact.out	2021-06-10 02:29:53.452410462+0000
@@ -174,7 +174,7 @@
 SELECT count(*) FROM pg_foreign_xacts;
  count
 -------
-     1
+     4
 (1 row)

(3)
+				errmsg("could not read foreign transaction state from xlog at %X/%X",
+					   (uint32) (lsn >> 32),
+					   (uint32) lsn)));

LSN_FORMAT_ARGS() should be used?

(4)
+extern void RecreateFdwXactFile(TransactionId xid, Oid umid, void *content,
+								int len);

Since RecreateFdwXactFile() is used only in fdwxact.c, the above "extern" is not necessary?

(5)
+2. Pre-Commit phase (1st phase of two-phase commit)
+we record the corresponding WAL indicating that the foreign server is involved
+with the current transaction before doing PREPARE all foreign transactions.
+Thus, in case we lose connectivity to the foreign server or crash ourselves,
+we will remember that we might have prepared transaction on the foreign
+server, and try to resolve it when connectivity is restored or after crash
+recovery.

So currently FdwXactInsertEntry() calls XLogInsert() and XLogFlush() for the XLOG_FDWXACT_INSERT WAL record. Additionally we should also wait there for the WAL record to be replicated to the standby if sync replication is enabled? Otherwise, when a failover happens, the new primary (past standby) might not have enough XLOG_FDWXACT_INSERT WAL records and might fail to find some in-doubt foreign transactions.

(6) XLogFlush() is called for each foreign transaction. So if there are many foreign transactions, XLogFlush() is called too frequently, which might cause unnecessary performance overhead. Instead, for example, we should call XLogFlush() only once in FdwXactPrepareForeignTransactions() after inserting all WAL records for all foreign transactions?

(7)
 /* Open connection; report that we'll create a prepared statement. */
 fmstate->conn = GetConnection(user, true, &fmstate->conn_state);
+ MarkConnectionModified(user);

MarkConnectionModified() should be called also when TRUNCATE on a foreign table is executed?

Regards, -- Fujii Masao Advanced Computing Technology Center Research and Development Headquarters NTT DATA CORPORATION
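Point (6) above -- one WAL flush per foreign transaction versus one flush after inserting all records -- can be sketched with a toy counter. flush_count, xlog_insert_record(), and xlog_flush() are illustrative stand-ins here, not the real PostgreSQL WAL APIs:

```c
/* Counter standing in for the number of (synchronous, fsync-like)
 * flush calls actually issued. */
static int flush_count;

static void
xlog_insert_record(void)
{
    /* append a WAL record to the in-memory buffer; no flush yet */
}

static void
xlog_flush(void)
{
    flush_count++;              /* each call models one fsync-equivalent */
}

/* Current behavior per point (6): flush after every participant's record. */
static int
prepare_flush_each(int nparticipants)
{
    flush_count = 0;
    for (int i = 0; i < nparticipants; i++)
    {
        xlog_insert_record();
        xlog_flush();
    }
    return flush_count;
}

/* Suggested behavior: insert all records first, then flush once. */
static int
prepare_flush_once(int nparticipants)
{
    flush_count = 0;
    for (int i = 0; i < nparticipants; i++)
        xlog_insert_record();
    xlog_flush();
    return flush_count;
}
```

Since the flush is the expensive, latency-bound step, batching turns O(n) fsync-equivalents per distributed transaction into one, at the cost of the records becoming durable slightly later.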
RE: Transactions involving multiple postgres foreign servers, take 2
From: Robert Haas <robertmhaas@gmail.com> > On Thu, Jun 10, 2021 at 9:58 PM tsunakawa.takay@fujitsu.com > <tsunakawa.takay@fujitsu.com> wrote: > > The question I have been asking is how. With that said, we should only have > two options; one is the return value of the FDW commit routine, and the other is > via ereport(ERROR). I suggested the possibility of the former, because if the > FDW does ereport(ERROR), Postgres core (transaction manager) may have > difficulty in handling the rest of the participants. > > I don't think that is going to work. It is very difficult to write > code that doesn't ever ERROR in PostgreSQL. It is not impossible if > the operation is trivial enough, but I think you're greatly > underestimating the complexity of committing the remote transaction. > If somebody had designed PostgreSQL so that every function returns a > return code and every time you call some other function you check that > return code and pass any error up to your own caller, then there would > be no problem here. But in fact the design was that at the first sign > of trouble you throw an ERROR. It's not easy to depart from that > programming model in just one place. > > I'm not completely sure about this. I thought (and said) that the only thing > the FDW does would be to send a commit request through an existing > connection. So, I think it's not a severe restriction to require FDWs not to do > ereport(ERROR) during commits (of the second phase of 2PC.) > > To send a commit request through an existing connection, you have to > send some bytes over the network using a send() or write() system > call. That can fail. Then you have to read the response back over the > network using recv() or read(). That can also fail. You also need to > parse the result that you get from the remote side, which can also > fail, because you could get back garbage for some reason. 
And > depending on the details, you might first need to construct the > message you're going to send, which might be able to fail too. Also, > the data might be encrypted using SSL, so you might have to decrypt > it, which can also fail, and you might need to encrypt data before > sending it, which can fail. In fact, if you're using OpenSSL, > trying to call SSL_read() or SSL_write() can both read and write data > from the socket, even multiple times, so you have extra opportunities > to fail. I know sending a commit request may get an error from various underlying functions, but we're talking about the client side, not Postgres's server side that could unexpectedly ereport(ERROR) somewhere. So, the new FDW commit routine won't lose control and can return an error code as its return value. For instance, the FDW commit routine for DBMS-X would typically be:

int
DBMSXCommit(...)
{
    int ret;

    /* extract info from the argument to pass to xa_commit() */

    ret = DBMSX_xa_commit(...);
    /* This is the actual commit function which is exposed to the app server
     * (e.g. Tuxedo) through the xa_commit() interface */

    /* map xa_commit() return values to the corresponding return values
     * of the FDW commit routine */
    switch (ret)
    {
        case XA_RMERR:
            ret = ...;
            break;
        ...
    }

    return ret;
}

> I think that's a valid concern, but we also have to have a plan that > is realistic. Some things are indeed not possible in PostgreSQL's > design. Also, some of these problems are things everyone has to > somehow confront. There's no database doing 2PC that can't have a > situation where one of the machines disappears unexpectedly due to > some natural disaster or administrator interference. It might be the > case that our inability to do certain things safely during transaction > commit puts us out of compliance with the spec, but it can't be the > case that some other system has no possible failures during > transaction commit. 
The problem of the network potentially being > disconnected between one packet and the next exists in every system. So, we need to design how commit behaves from the user's perspective. That's the functional design. We should figure out what's the desirable response of commit first, and then see if we can implement it or have to compromise in some way. I think we can reference the X/Open TX standard and/or JTS (Java Transaction Service) specification (I haven't had a chance to read them yet, though.) Just in case we can't find the requested commit behavior in the volcano case from those specifications, ... (I'm hesitant to say this because it may be hard,) it's desirable to follow representative products such as Tuxedo and GlassFish (the reference implementation of Java EE specs.) > > I don't think the resolver-based approach would bring us far enough. It's > fundamentally a bottleneck. Such a background process should only handle > commits whose requests failed to be sent due to server down. > > Why is it fundamentally a bottleneck? It seems to me in some cases it > could scale better than any other approach. If we have to commit on > 100 shards in only one process we can only do those commits one at a > time. If we can use resolver processes we could do all 100 at once if > the user can afford to run that many resolvers, which should be way > faster. It is true that if the resolver does not have a connection > open and must open one, that might be slow, but presumably after that > it can keep the connection open and reuse it for subsequent > distributed transactions. I don't really see why that should be > particularly slow. Concurrent transactions are serialized at the resolver. 
I heard that the current patch handles 2PC like this: the TM (transaction manager in Postgres core) requests prepare to the resolver, the resolver sends prepare to the remote server and waits for the reply, the TM gets back control from the resolver, the TM requests commit to the resolver, the resolver sends commit to the remote server and waits for the reply, and the TM gets back control. The resolver handles one transaction at a time. In regard to the case where one session has to commit on multiple remote servers, we're talking about the asynchronous interface just like what the XA standard provides. Regards Takayuki Tsunakawa
On Sun, Jun 13, 2021 at 10:04 PM tsunakawa.takay@fujitsu.com <tsunakawa.takay@fujitsu.com> wrote: > I know sending a commit request may get an error from various underlying functions, but we're talking about the client side, not Postgres's server side that could unexpectedly ereport(ERROR) somewhere. So, the new FDW commit routine won't lose control and can return an error code as its return value. For instance, the FDW commit routine for DBMS-X would typically be:
>
> int
> DBMSXCommit(...)
> {
>     int ret;
>
>     /* extract info from the argument to pass to xa_commit() */
>
>     ret = DBMSX_xa_commit(...);
>     /* This is the actual commit function which is exposed to the app server (e.g. Tuxedo) through the xa_commit() interface */
>
>     /* map xa_commit() return values to the corresponding return values of the FDW commit routine */
>     switch (ret)
>     {
>         case XA_RMERR:
>             ret = ...;
>             break;
>         ...
>     }
>
>     return ret;
> }

Well, we're talking about running this commit routine from within CommitTransaction(), right? So I think it is in fact running in the server. And if that's so, then you have to worry about how to make it respond to interrupts. You can't just call some function DBMSX_xa_commit() and wait for infinite time for it to return. Look at pgfdw_get_result() for an example of what real code to do this looks like. > So, we need to design how commit behaves from the user's perspective. That's the functional design. We should figure out what's the desirable response of commit first, and then see if we can implement it or have to compromise in some way. I think we can reference the X/Open TX standard and/or JTS (Java Transaction Service) specification (I haven't had a chance to read them yet, though.) Just in case we can't find the requested commit behavior in the volcano case from those specifications, ... 
(I'm hesitant to say this because it may be hard,) it's desirable to follow representative products such as Tuxedo and GlassFish (the reference implementation of Java EE specs.) Honestly, I am not quite sure what any specification has to say about this. We're talking about what happens when a user does something with a foreign table and then types COMMIT. That's all about providing a set of behaviors that are consistent with how PostgreSQL works in other situations. You can't negotiate away the requirement to handle errors in a way that works with PostgreSQL's infrastructure, or the requirement that any lengthy operation handle interrupts properly, by appealing to a specification. > Concurrent transactions are serialized at the resolver. I heard that the current patch handles 2PC like this: the TM (transaction manager in Postgres core) requests prepare to the resolver, the resolver sends prepare to the remote server and waits for the reply, the TM gets back control from the resolver, the TM requests commit to the resolver, the resolver sends commit to the remote server and waits for the reply, and the TM gets back control. The resolver handles one transaction at a time. That sounds more like a limitation of the present implementation than a fundamental problem. We shouldn't reject the idea of having a resolver process handle this just because the initial implementation might be slow. If there's no fundamental problem with the idea, parallelism and concurrency can be improved in separate patches at a later time. It's much more important at this stage to reject ideas that are not theoretically sound. -- Robert Haas EDB: http://www.enterprisedb.com
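The interruptible-wait pattern of pgfdw_get_result() that Robert points to can be sketched as a loop that re-checks a cancel condition on every iteration instead of blocking indefinitely on the remote side. WaitCtx and the iteration counters are hypothetical stand-ins: cancel_after models CHECK_FOR_INTERRUPTS() firing, and ready_after models a socket-readiness check such as WaitLatchOrSocket() with a timeout:

```c
typedef enum WaitResult
{
    WAIT_OK,                    /* reply arrived */
    WAIT_CANCELED,              /* interrupt/cancel honored promptly */
    WAIT_TIMEOUT                /* gave up after max_iter polls */
} WaitResult;

typedef struct WaitCtx
{
    int ready_after;            /* iteration at which the reply "arrives" */
    int cancel_after;           /* iteration at which cancel is requested;
                                 * -1 means never */
} WaitCtx;

/* Poll for the commit reply, checking for cancellation each iteration
 * rather than blocking forever in a single recv(). */
static WaitResult
wait_for_commit_reply(WaitCtx *ctx, int max_iter)
{
    for (int i = 0; i < max_iter; i++)
    {
        /* Stand-in for CHECK_FOR_INTERRUPTS(): bail out promptly. */
        if (ctx->cancel_after >= 0 && i >= ctx->cancel_after)
            return WAIT_CANCELED;

        /* Stand-in for WaitLatchOrSocket() with a timeout. */
        if (i >= ctx->ready_after)
            return WAIT_OK;
    }
    return WAIT_TIMEOUT;
}
```

The design point is that cancellability comes from the structure of the wait loop, not from the underlying driver, which is why a blocking xa_commit()-style call (as in jdbc_fdw or the ODBC drivers discussed below) cannot be made interruptible from the outside.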
RE: Transactions involving multiple postgres foreign servers, take 2
From: Robert Haas <robertmhaas@gmail.com> > Well, we're talking about running this commit routine from within > CommitTransaction(), right? So I think it is in fact running in the > server. And if that's so, then you have to worry about how to make it > respond to interrupts. You can't just call some functions > DBMSX_xa_commit() and wait for infinite time for it to return. Look at > pgfdw_get_result() for an example of what real code to do this looks > like. Postgres can do that, but other implementations cannot necessarily do it, I'm afraid. But before that, the FDW interface documentation doesn't describe anything about how to handle interrupts. Actually, odbc_fdw and possibly other FDWs don't respond to interrupts. > Honestly, I am not quite sure what any specification has to say about > this. We're talking about what happens when a user does something with > a foreign table and then type COMMIT. That's all about providing a set > of behaviors that are consistent with how PostgreSQL works in other > situations. You can't negotiate away the requirement to handle errors > in a way that works with PostgreSQL's infrastructure, or the > requirement that any length operation handle interrupts properly, by > appealing to a specification. What we're talking about here is mainly whether commit should return success or failure when some participants failed to commit in the second phase of 2PC. That's new to Postgres, isn't it? Anyway, we should respect existing relevant specifications and (well-known) implementations before we conclude that we have to devise our own behavior. > That sounds more like a limitation of the present implementation than > a fundamental problem. We shouldn't reject the idea of having a > resolver process handle this just because the initial implementation > might be slow. If there's no fundamental problem with the idea, > parallelism and concurrency can be improved in separate patches at a > later time. 
It's much more important at this stage to reject ideas > that are not theoretically sound. We talked about that, and unfortunately, I haven't seen a good and feasible idea to enhance the current approach that involves the resolver from the beginning of 2PC processing. Honestly, I don't understand why such a "one prepare, one commit in turn" serialization approach can be allowed in PostgreSQL, where developers pursue best performance and even try to refrain from adding an if statement in a hot path. As I showed and Ikeda-san said, other implementations have each client session send prepare and commit requests. That's a natural way to achieve reasonable concurrency and performance. Regards Takayuki Tsunakawa
On Tue, Jun 15, 2021 at 5:51 AM tsunakawa.takay@fujitsu.com <tsunakawa.takay@fujitsu.com> wrote: > Postgres can do that, but other implementations cannot necessarily do it, I'm afraid. But before that, the FDW interface documentation doesn't describe anything about how to handle interrupts. Actually, odbc_fdw and possibly other FDWs don't respond to interrupts. Well, I'd consider that a bug. > What we're talking about here is mainly whether commit should return success or failure when some participants failed to commit in the second phase of 2PC. That's new to Postgres, isn't it? Anyway, we should respect existing relevant specifications and (well-known) implementations before we conclude that we have to devise our own behavior. Sure ... but we can only decide to do things that the implementation can support, and running code that might fail after we've committed locally isn't one of them. > We talked about that, and unfortunately, I haven't seen a good and feasible idea to enhance the current approach that involves the resolver from the beginning of 2PC processing. Honestly, I don't understand why such a "one prepare, one commit in turn" serialization approach can be allowed in PostgreSQL, where developers pursue best performance and even try to refrain from adding an if statement in a hot path. As I showed and Ikeda-san said, other implementations have each client session send prepare and commit requests. That's a natural way to achieve reasonable concurrency and performance. I think your comparison here is quite unfair. We work hard to add overhead in hot paths where it might cost, but the FDW case involves a network round-trip anyway, so the cost of an if-statement would surely be insignificant. I feel like you want to assume without any evidence that a local resolver can never be quick enough, even though the cost of IPC between local processes shouldn't be that high compared to a network round trip. 
But you also want to suppose that we can run code that might fail late in the commit process even though there is lots of evidence that this will cause problems, starting with the code comments that clearly say so. -- Robert Haas EDB: http://www.enterprisedb.com
RE: Transactions involving multiple postgres foreign servers, take 2
From: Robert Haas <robertmhaas@gmail.com> > On Tue, Jun 15, 2021 at 5:51 AM tsunakawa.takay@fujitsu.com > <tsunakawa.takay@fujitsu.com> wrote: > > Postgres can do that, but other implementations cannot necessarily do it, I'm > afraid. But before that, the FDW interface documentation doesn't describe > anything about how to handle interrupts. Actually, odbc_fdw and possibly > other FDWs don't respond to interrupts. > > Well, I'd consider that a bug. I kind of hesitate to call it a bug... Unlike libpq, JDBC (for jdbc_fdw) doesn't have an asynchronous interface, and the Oracle and PostgreSQL ODBC drivers don't support an asynchronous interface. Even with libpq, COMMIT (and other SQL commands) is not always cancellable, e.g., when the (NFS) storage server gets hung while writing WAL. > > What we're talking about here is mainly whether commit should return success or > failure when some participants failed to commit in the second phase of 2PC. > That's new to Postgres, isn't it? Anyway, we should respect existing relevant > specifications and (well-known) implementations before we conclude that we > have to devise our own behavior. > > Sure ... but we can only decide to do things that the implementation > can support, and running code that might fail after we've committed > locally isn't one of them. Yes, I understand that Postgres may not be able to conform to specifications or well-known implementations in all aspects. I'm just suggesting to take the stance "We carefully considered established industry specifications that we can base on, did our best to design the desirable behavior learned from them, but couldn't implement a few parts", rather than "We did what we like and can do." > I think your comparison here is quite unfair. We work hard to add > overhead in hot paths where it might cost, but the FDW case involves a > network round-trip anyway, so the cost of an if-statement would surely > be insignificant. 
I feel like you want to assume without any evidence > that a local resolver can never be quick enough, even though the cost > of IPC between local processes shouldn't be that high compared to a > network round trip. But you also want to suppose that we can run code > that might fail late in the commit process even though there is lots > of evidence that this will cause problems, starting with the code > comments that clearly say so. There may be better examples. What I wanted to say is just that I believe it's not PG developers' standard to allow serial prepare and commit. Let's make it clear what makes it difficult to do 2PC from each client session in normal operation without going through the resolver. Regards Takayuki Tsunakawa
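The serial-versus-parallel trade-off being debated above can be made concrete with a toy latency model. This is only an illustrative sketch; the function, the cost breakdown (one round trip plus one WAL fsync per foreign server, plus a local commit fsync), and the numbers are assumptions for illustration, not measurements of the patch:

```python
# Toy 2PC latency model: serial vs. parallel PREPARE across N foreign
# servers. All costs are illustrative assumptions, not measurements.

def commit_latency(n_servers, rtt, wal_fsync, parallel):
    """Rough per-transaction latency of the first phase of 2PC plus the
    local commit: each foreign server costs one network round trip and
    one remote WAL fsync; the local commit record costs one more fsync."""
    per_server = rtt + wal_fsync
    phase1 = per_server if parallel else n_servers * per_server
    return phase1 + wal_fsync  # local commit record

# Three foreign servers, 1ms round trip, 2ms fsync (made-up numbers):
serial = commit_latency(3, rtt=1.0, wal_fsync=2.0, parallel=False)
overlapped = commit_latency(3, rtt=1.0, wal_fsync=2.0, parallel=True)
assert serial == 11.0 and overlapped == 5.0
```

Under this toy model the serial form grows linearly with the number of participants while the overlapped form stays flat, which is the shape of the argument both sides are making; the real constants depend on network latency and disk sync cost.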
RE: Transactions involving multiple postgres foreign servers, take 2
Hi Sawada-san, I also tried to play a bit with the latest patches similar to Ikeda-san, and with the foreign 2PC parameter enabled/required. > > >> b. about performance bottleneck (just share my simple benchmark > > >> results) > > >> > > >> The resolver process can be performance bottleneck easily although > > >> I think some users want this feature even if the performance is not so > good. > > >> > > >> I tested with very simple workload in my laptop. > > >> > > >> The test condition is > > >> * two remote foreign partitions and one transaction inserts an > > >> entry in each partitions. > > >> * local connection only. If NW latency became higher, the > > >> performance became worse. > > >> * pgbench with 8 clients. > > >> > > >> The test results is the following. The performance of 2PC is only > > >> 10% performance of the one of without 2PC. > > >> > > >> * with foreign_twophase_commit = required > > >> -> If load with more than 10 TPS, the number of unresolved foreign > > >> -> transactions > > >> is increasing and stop with the warning "Increase > > >> max_prepared_foreign_transactions". > > > > > > What was the value of max_prepared_foreign_transactions? > > > > Now, I tested with 200. > > > > If each resolution is finished very soon, I thought it's enough > > because 8 clients x 2 partitions = 16, though... But, it's difficult > > to know the stable values. > > During resolving one distributed transaction, the resolver needs both one > round trip and fsync-ing WAL record for each foreign transaction. > Since the client doesn’t wait for the distributed transaction to be resolved, > the resolver process can easily become the bottleneck given there are 8 clients. > > If foreign transaction resolution was resolved synchronously, 16 would > suffice. I tested the V36 patches on my 16-core machine. I set up two foreign servers (F1, F2). F1 has the addressbook table. F2 has the pgbench tables (scale factor = 1). 
There is also 1 coordinator (coor) server where I created user mappings to access the foreign servers. I executed the benchmark measurement on the coordinator. My custom scripts are set up in a way that queries from the coordinator would have to access the two foreign servers.

Coordinator:
max_prepared_foreign_transactions = 200
max_foreign_transaction_resolvers = 1
foreign_twophase_commit = required

Other external servers 1 & 2 (F1 & F2):
max_prepared_transactions = 100

[select.sql]
\set int random(1, 100000)
BEGIN;
SELECT ad.name, ad.age, ac.abalance
FROM addressbook ad, pgbench_accounts ac
WHERE ad.id = :int AND ad.id = ac.aid;
COMMIT;

I then executed:
pgbench -r -c 2 -j 2 -T 60 -f select.sql coor

While there were no problems with 1-2 clients, I started having problems when running the benchmark with more than 3 clients.

pgbench -r -c 4 -j 4 -T 60 -f select.sql coor

I got the following error on the coordinator:

[95396] ERROR: could not prepare transaction on server F2 with ID fx_151455979_1216200_16422
[95396] STATEMENT: COMMIT;
WARNING: there is no transaction in progress
pgbench: error: client 1 script 0 aborted in command 3 query 0: ERROR: could not prepare transaction on server F2 with ID fx_151455979_1216200_16422

Here's the log on foreign server 2 <F2> matching the above error:
<F2> LOG: statement: PREPARE TRANSACTION 'fx_151455979_1216200_16422'
<F2> ERROR: maximum number of prepared transactions reached
<F2> HINT: Increase max_prepared_transactions (currently 100).
<F2> STATEMENT: PREPARE TRANSACTION 'fx_151455979_1216200_16422'

So I increased the max_prepared_transactions of <F1> and <F2> from 100 to 200. 
It seems that we always run out of memory in FdwXactState insert_fdwxact with multiple concurrent connections during PREPARE TRANSACTION. I only encountered this for the SELECT benchmark. Although I've got no problems with multiple connections for my custom scripts for UPDATE and INSERT benchmarks when I tested up to 30 clients. Would the following possibly solve this bottleneck problem? > > > To speed up the foreign transaction resolution, some ideas have been > > > discussed. As another idea, how about launching resolvers for each > > > foreign server? That way, we resolve foreign transactions on each > > > foreign server in parallel. If foreign transactions are concentrated > > > on the particular server, we can have multiple resolvers for the one > > > foreign server. It doesn’t change the fact that all foreign > > > transaction resolutions are processed by resolver processes. > > > > Awesome! There seems to be another pro: that even if a foreign server > > is temporarily busy or stopped due to fail over, other foreign > > server's transactions can be resolved. > > Yes. We also might need to be careful about the order of foreign transaction > resolution. I think we need to resolve foreign transactions in arrival order at > least within a foreign server. Regards, Kirk Jamison
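The tuning difficulty described above has a simple queueing explanation: with the second phase of 2PC performed asynchronously, backends enqueue prepared foreign transactions faster than a single resolver can drain them, so no fixed limit is ever "right". A toy model (all rates and names are made-up assumptions, not measurements of the patch):

```python
# Toy queue model of the asynchronous second phase: backends add
# prepared foreign transactions at arrival_rate per tick, the resolver
# removes resolve_rate per tick. If arrival_rate > resolve_rate, any
# fixed max_prepared_foreign_transactions limit is eventually hit.

def steps_until_limit(arrival_rate, resolve_rate, limit, max_steps=100_000):
    """Tick at which the pending queue reaches the limit, or None if
    the resolver keeps up within max_steps ticks."""
    pending = 0.0
    for step in range(1, max_steps + 1):
        pending = max(0.0, pending + arrival_rate - resolve_rate)
        if pending >= limit:
            return step
    return None

# 16 prepares per tick (e.g. 8 clients x 2 servers) vs. 10 resolutions:
assert steps_until_limit(16, 10, 200) == 34   # limit reached
assert steps_until_limit(16, 10, 300) == 50   # a larger limit only delays it
assert steps_until_limit(8, 10, 200) is None  # resolver keeps up
```

This matches the observed behavior: raising max_prepared_foreign_transactions (or max_prepared_transactions on the foreign servers) only postpones the error while the resolver's throughput stays below the prepare rate.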
On Sat, Jun 12, 2021 at 1:25 AM Fujii Masao <masao.fujii@oss.nttdata.com> wrote: > > > > On 2021/05/11 13:37, Masahiko Sawada wrote: > > I've attached the updated patches that incorporated comments from > > Zhihong and Ikeda-san. > > Thanks for updating the patches! > > I'm still reading these patches, but I'd like to share some review comments > that I found so far. Thank you for the comments! > > (1) > +/* Remove the foreign transaction from FdwXactParticipants */ > +void > +FdwXactUnregisterXact(UserMapping *usermapping) > +{ > + Assert(IsTransactionState()); > + RemoveFdwXactEntry(usermapping->umid); > +} > > Currently there is no user of FdwXactUnregisterXact(). > This function should be removed? I think that this function can be used by other FDW implementations to unregister a foreign transaction entry, although there is no use case in postgres_fdw. This function corresponds to xa_unreg in the XA specification. > > > (2) > When I ran the regression test, I got the following failure. > > ========= Contents of ./src/test/modules/test_fdwxact/regression.diffs > diff -U3 /home/runner/work/postgresql/postgresql/src/test/modules/test_fdwxact/expected/test_fdwxact.out /home/runner/work/postgresql/postgresql/src/test/modules/test_fdwxact/results/test_fdwxact.out > --- /home/runner/work/postgresql/postgresql/src/test/modules/test_fdwxact/expected/test_fdwxact.out 2021-06-10 02:19:43.808622747+0000 > +++ /home/runner/work/postgresql/postgresql/src/test/modules/test_fdwxact/results/test_fdwxact.out 2021-06-10 02:29:53.452410462+0000 > @@ -174,7 +174,7 @@ > SELECT count(*) FROM pg_foreign_xacts; > count > ------- > - 1 > + 4 > (1 row) Will fix. > > > (3) > + errmsg("could not read foreign transaction state from xlog at %X/%X", > + (uint32) (lsn >> 32), > + (uint32) lsn))); > > LSN_FORMAT_ARGS() should be used? Agreed. 
> > > (4) > +extern void RecreateFdwXactFile(TransactionId xid, Oid umid, void *content, > + int len); > > Since RecreateFdwXactFile() is used only in fdwxact.c, > the above "extern" is not necessary? Right. > > > (5) > +2. Pre-Commit phase (1st phase of two-phase commit) > +we record the corresponding WAL indicating that the foreign server is involved > +with the current transaction before doing PREPARE all foreign transactions. > +Thus, in case we loose connectivity to the foreign server or crash ourselves, > +we will remember that we might have prepared tranascation on the foreign > +server, and try to resolve it when connectivity is restored or after crash > +recovery. > > So currently FdwXactInsertEntry() calls XLogInsert() and XLogFlush() for > XLOG_FDWXACT_INSERT WAL record. Additionally we should also wait there > for WAL record to be replicated to the standby if sync replication is enabled? > Otherwise, when the failover happens, new primary (past-standby) > might not have enough XLOG_FDWXACT_INSERT WAL records and > might fail to find some in-doubt foreign transactions. But even if we wait for the record to be replicated, this problem isn't completely resolved, right? If the server crashes before the standby receives the record and the failover happens, then the new master doesn't have the record. I wonder if we need to have another FDW API in order to get the list of prepared transactions from the foreign server (FDW). For example, in the postgres_fdw case, it gets the list of prepared transactions on the foreign server by executing a query. It seems to me that this corresponds to xa_recover in the XA specification. > (6) > XLogFlush() is called for each foreign transaction. So if there are many > foreign transactions, XLogFlush() is called too frequently. Which might > cause unnecessary performance overhead? 
Instead, for example, > we should call XLogFlush() only at once in FdwXactPrepareForeignTransactions() > after inserting all WAL records for all foreign transactions? Agreed. > > > (7) > /* Open connection; report that we'll create a prepared statement. */ > fmstate->conn = GetConnection(user, true, &fmstate->conn_state); > + MarkConnectionModified(user); > > MarkConnectionModified() should be called also when TRUNCATE on > a foreign table is executed? Good catch. Will fix. Regards, -- Masahiko Sawada EDB: https://www.enterprisedb.com/
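The batching suggestion agreed to in point (6) above can be sketched in miniature. The Wal class and function names here are stand-ins for illustration, not the real xlog API; the point is only the flush count:

```python
# Sketch of the point-(6) suggestion: instead of flushing WAL once per
# foreign-transaction record, insert all records first and flush once.
# Wal is a toy stand-in for the real WAL machinery, not the actual API.

class Wal:
    def __init__(self):
        self.flushes = 0
        self.records = []
    def insert(self, rec):
        self.records.append(rec)
    def flush(self):
        self.flushes += 1  # each flush models one fsync

def prepare_per_record_flush(wal, fdwxacts):
    for x in fdwxacts:
        wal.insert(("FDWXACT_INSERT", x))
        wal.flush()                # one fsync per participant

def prepare_batched_flush(wal, fdwxacts):
    for x in fdwxacts:
        wal.insert(("FDWXACT_INSERT", x))
    wal.flush()                    # single fsync for the whole transaction

a, b = Wal(), Wal()
prepare_per_record_flush(a, ["srv1", "srv2", "srv3"])
prepare_batched_flush(b, ["srv1", "srv2", "srv3"])
assert (a.flushes, b.flushes) == (3, 1)
assert a.records == b.records      # same WAL contents, fewer fsyncs
```

With N participating foreign servers the per-record form pays N fsyncs where the batched form pays one, while the durability point (everything flushed before any PREPARE is sent) is unchanged.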
On Thu, Jun 24, 2021 at 9:46 PM k.jamison@fujitsu.com <k.jamison@fujitsu.com> wrote: > > Hi Sawada-san, > > I also tried to play a bit with the latest patches similar to Ikeda-san, > and with foreign 2PC parameter enabled/required. Thank you for testing the patch! > > > > >> b. about performance bottleneck (just share my simple benchmark > > > >> results) > > > >> > > > >> The resolver process can be performance bottleneck easily although > > > >> I think some users want this feature even if the performance is not so > > good. > > > >> > > > >> I tested with very simple workload in my laptop. > > > >> > > > >> The test condition is > > > >> * two remote foreign partitions and one transaction inserts an > > > >> entry in each partitions. > > > >> * local connection only. If NW latency became higher, the > > > >> performance became worse. > > > >> * pgbench with 8 clients. > > > >> > > > >> The test results is the following. The performance of 2PC is only > > > >> 10% performance of the one of without 2PC. > > > >> > > > >> * with foreign_twophase_commit = requried > > > >> -> If load with more than 10TPS, the number of unresolved foreign > > > >> -> transactions > > > >> is increasing and stop with the warning "Increase > > > >> max_prepared_foreign_transactions". > > > > > > > > What was the value of max_prepared_foreign_transactions? > > > > > > Now, I tested with 200. > > > > > > If each resolution is finished very soon, I thought it's enough > > > because 8clients x 2partitions = 16, though... But, it's difficult how > > > to know the stable values. > > > > During resolving one distributed transaction, the resolver needs both one > > round trip and fsync-ing WAL record for each foreign transaction. > > Since the client doesn’t wait for the distributed transaction to be resolved, > > the resolver process can be easily bottle-neck given there are 8 clients. > > > > If foreign transaction resolution was resolved synchronously, 16 would > > suffice. 
> > > I tested the V36 patches on my 16-core machines. > I setup two foreign servers (F1, F2) . > F1 has addressbook table. > F2 has pgbench tables (scale factor = 1). > There is also 1 coordinator (coor) server where I created user mapping to access the foreign servers. > I executed the benchmark measurement on coordinator. > My custom scripts are setup in a way that queries from coordinator > would have to access the two foreign servers. > > Coordinator: > max_prepared_foreign_transactions = 200 > max_foreign_transaction_resolvers = 1 > foreign_twophase_commit = required > > Other external servers 1 & 2 (F1 & F2): > max_prepared_transactions = 100 > > > [select.sql] > \set int random(1, 100000) > BEGIN; > SELECT ad.name, ad.age, ac.abalance > FROM addressbook ad, pgbench_accounts ac > WHERE ad.id = :int AND ad.id = ac.aid; > COMMIT; > > I then executed: > pgbench -r -c 2 -j 2 -T 60 -f select.sql coor > > While there were no problems with 1-2 clients, I started having problems > when running the benchmark with more than 3 clients. > > pgbench -r -c 4 -j 4 -T 60 -f select.sql coor > > I got the following error on coordinator: > > [95396] ERROR: could not prepare transaction on server F2 with ID fx_151455979_1216200_16422 > [95396] STATEMENT: COMMIT; > WARNING: there is no transaction in progress > pgbench: error: client 1 script 0 aborted in command 3 query 0: ERROR: could not prepare transaction on server F2 withID fx_151455979_1216200_16422 > > Here's the log on foreign server 2 <F2> matching the above error: > <F2> LOG: statement: PREPARE TRANSACTION 'fx_151455979_1216200_16422' > <F2> ERROR: maximum number of prepared transactions reached > <F2> HINT: Increase max_prepared_transactions (currently 100). > <F2> STATEMENT: PREPARE TRANSACTION 'fx_151455979_1216200_16422' > > So I increased the max_prepared_transactions of <F1> and <F2> from 100 to 200. 
> Then I got the error: > > [146926] ERROR: maximum number of foreign transactions reached > [146926] HINT: Increase max_prepared_foreign_transactions: "200". > > So I increased the max_prepared_foreign_transactions to "300", > and got the same error of need to increase the max_prepared_transactions of foreign servers. > > I just can't find the right tuning values for this. > It seems that we always run out of memory in FdwXactState insert_fdwxact > with multiple concurrent connections during PREPARE TRANSACTION. > This one I only encountered for SELECT benchmark. > Although I've got no problems with multiple connections for my custom scripts for > UPDATE and INSERT benchmarks when I tested up to 30 clients. > > Would the following possibly solve this bottleneck problem? With the following idea, the performance will get better but the problem will not be completely solved. That's because those results shared by you and Ikeda-san come from the fact that with the patch we asynchronously commit the foreign prepared transactions (i.e., asynchronously perform the second phase of 2PC), not from the architecture itself. As I mentioned before, I intentionally removed the part that synchronously commits foreign prepared transactions from the patch set since we still need to have a discussion of that part. Therefore, with this version of the patch, the backend returns OK to the client right after the local transaction commits, with neither committing foreign prepared transactions by itself nor waiting for those to be committed by the resolver process. As long as the backend doesn’t wait for foreign prepared transactions to be committed and there is a limit on the number of foreign prepared transactions to be held, it could reach the upper bound if committing foreign prepared transactions cannot keep up. Regards, -- Masahiko Sawada EDB: https://www.enterprisedb.com/
On Thu, Jun 24, 2021 at 10:11 PM Masahiko Sawada <sawada.mshk@gmail.com> wrote: > > On Sat, Jun 12, 2021 at 1:25 AM Fujii Masao <masao.fujii@oss.nttdata.com> wrote: > > > > > > > > (5) > > +2. Pre-Commit phase (1st phase of two-phase commit) > > +we record the corresponding WAL indicating that the foreign server is involved > > +with the current transaction before doing PREPARE all foreign transactions. > > +Thus, in case we loose connectivity to the foreign server or crash ourselves, > > +we will remember that we might have prepared tranascation on the foreign > > +server, and try to resolve it when connectivity is restored or after crash > > +recovery. > > > > So currently FdwXactInsertEntry() calls XLogInsert() and XLogFlush() for > > XLOG_FDWXACT_INSERT WAL record. Additionally we should also wait there > > for WAL record to be replicated to the standby if sync replication is enabled? > > Otherwise, when the failover happens, new primary (past-standby) > > might not have enough XLOG_FDWXACT_INSERT WAL records and > > might fail to find some in-doubt foreign transactions. > > But even if we wait for the record to be replicated, this problem > isn't completely resolved, right? Ah, I misunderstood the order of writing WAL records and preparing foreign transactions. You're right. Combining your suggestion below, perhaps we need to write all WAL records, call XLogFlush(), wait for those records to be replicated, and prepare all foreign transactions. Even in cases where the server crashes during preparing a foreign transaction and the failover happens, the new master has all foreign transaction entries. Some of them might not actually be prepared on the foreign servers but it should not be a problem. > > (6) > > XLogFlush() is called for each foreign transaction. So if there are many > > foreign transactions, XLogFlush() is called too frequently. Which might > > cause unnecessary performance overhead? 
Instead, for example, > > we should call XLogFlush() only at once in FdwXactPrepareForeignTransactions() > > after inserting all WAL records for all foreign transactions? Regards, -- Masahiko Sawada EDB: https://www.enterprisedb.com/
On 2021/06/24 22:27, Masahiko Sawada wrote: > On Thu, Jun 24, 2021 at 9:46 PM k.jamison@fujitsu.com > <k.jamison@fujitsu.com> wrote: >> >> Hi Sawada-san, >> >> I also tried to play a bit with the latest patches similar to Ikeda-san, >> and with foreign 2PC parameter enabled/required. > > Thank you for testing the patch! > >> >>>>>> b. about performance bottleneck (just share my simple benchmark >>>>>> results) >>>>>> >>>>>> The resolver process can be performance bottleneck easily although >>>>>> I think some users want this feature even if the performance is not so >>> good. >>>>>> >>>>>> I tested with very simple workload in my laptop. >>>>>> >>>>>> The test condition is >>>>>> * two remote foreign partitions and one transaction inserts an >>>>>> entry in each partitions. >>>>>> * local connection only. If NW latency became higher, the >>>>>> performance became worse. >>>>>> * pgbench with 8 clients. >>>>>> >>>>>> The test results is the following. The performance of 2PC is only >>>>>> 10% performance of the one of without 2PC. >>>>>> >>>>>> * with foreign_twophase_commit = requried >>>>>> -> If load with more than 10TPS, the number of unresolved foreign >>>>>> -> transactions >>>>>> is increasing and stop with the warning "Increase >>>>>> max_prepared_foreign_transactions". >>>>> >>>>> What was the value of max_prepared_foreign_transactions? >>>> >>>> Now, I tested with 200. >>>> >>>> If each resolution is finished very soon, I thought it's enough >>>> because 8clients x 2partitions = 16, though... But, it's difficult how >>>> to know the stable values. >>> >>> During resolving one distributed transaction, the resolver needs both one >>> round trip and fsync-ing WAL record for each foreign transaction. >>> Since the client doesn’t wait for the distributed transaction to be resolved, >>> the resolver process can be easily bottle-neck given there are 8 clients. >>> >>> If foreign transaction resolution was resolved synchronously, 16 would >>> suffice. 
>> >> >> I tested the V36 patches on my 16-core machines. >> I setup two foreign servers (F1, F2) . >> F1 has addressbook table. >> F2 has pgbench tables (scale factor = 1). >> There is also 1 coordinator (coor) server where I created user mapping to access the foreign servers. >> I executed the benchmark measurement on coordinator. >> My custom scripts are setup in a way that queries from coordinator >> would have to access the two foreign servers. >> >> Coordinator: >> max_prepared_foreign_transactions = 200 >> max_foreign_transaction_resolvers = 1 >> foreign_twophase_commit = required >> >> Other external servers 1 & 2 (F1 & F2): >> max_prepared_transactions = 100 >> >> >> [select.sql] >> \set int random(1, 100000) >> BEGIN; >> SELECT ad.name, ad.age, ac.abalance >> FROM addressbook ad, pgbench_accounts ac >> WHERE ad.id = :int AND ad.id = ac.aid; >> COMMIT; >> >> I then executed: >> pgbench -r -c 2 -j 2 -T 60 -f select.sql coor >> >> While there were no problems with 1-2 clients, I started having problems >> when running the benchmark with more than 3 clients. >> >> pgbench -r -c 4 -j 4 -T 60 -f select.sql coor >> >> I got the following error on coordinator: >> >> [95396] ERROR: could not prepare transaction on server F2 with ID fx_151455979_1216200_16422 >> [95396] STATEMENT: COMMIT; >> WARNING: there is no transaction in progress >> pgbench: error: client 1 script 0 aborted in command 3 query 0: ERROR: could not prepare transaction on server F2 withID fx_151455979_1216200_16422 >> >> Here's the log on foreign server 2 <F2> matching the above error: >> <F2> LOG: statement: PREPARE TRANSACTION 'fx_151455979_1216200_16422' >> <F2> ERROR: maximum number of prepared transactions reached >> <F2> HINT: Increase max_prepared_transactions (currently 100). >> <F2> STATEMENT: PREPARE TRANSACTION 'fx_151455979_1216200_16422' >> >> So I increased the max_prepared_transactions of <F1> and <F2> from 100 to 200. 
>> Then I got the error: >> >> [146926] ERROR: maximum number of foreign transactions reached >> [146926] HINT: Increase max_prepared_foreign_transactions: "200". >> >> So I increased the max_prepared_foreign_transactions to "300", >> and got the same error of need to increase the max_prepared_transactions of foreign servers. >> >> I just can't find the right tuning values for this. >> It seems that we always run out of memory in FdwXactState insert_fdwxact >> with multiple concurrent connections during PREPARE TRANSACTION. >> This one I only encountered for SELECT benchmark. >> Although I've got no problems with multiple connections for my custom scripts for >> UPDATE and INSERT benchmarks when I tested up to 30 clients. >> >> Would the following possibly solve this bottleneck problem? > > With the following idea, the performance will get better but will not > be completely solved. Because those results shared by you and > Ikeda-san come from the fact that with the patch we asynchronously > commit the foreign prepared transaction (i.g., asynchronously > performing the second phase of 2PC), but not the architecture. As I > mentioned before, I intentionally removed the synchronous committing > foreign prepared transaction part from the patch set since we still > need to have a discussion of that part. Therefore, with this version > patch, the backend returns OK to the client right after the local > transaction commits with neither committing foreign prepared > transactions by itself nor waiting for those to be committed by the > resolver process. As long as the backend doesn’t wait for foreign > prepared transactions to be committed and there is a limit of the > number of foreign prepared transactions to be held, it could reach the > upper bound if committing foreign prepared transactions cannot keep > up. Hi Jamison-san, sawada-san, Thanks for testing! 
FWIW, I tested using pgbench with the "--rate=" option to check whether the server can execute transactions with stable throughput. As Sawada-san said, the latest patch resolves the second phase of 2PC asynchronously, so it's difficult to keep a stable throughput without the "--rate=" option. I also wondered what I should do when the error happens, because increasing "max_prepared_foreign_transactions" doesn't help. Since overloading can also cause the error, is it better to add that case to the HINT message? BTW, if Sawada-san has already developed the code to run the resolver processes in parallel, why not measure the performance improvement? Although Robert-san, Tsunakawa-san and others are discussing what architecture is best, one discussion point is that there is a performance risk in adopting the asynchronous approach. If we have promising solutions, I think we can move the discussion forward. In my understanding, there are three improvement ideas. The first is to make the resolver processes run in parallel. The second is to send "COMMIT/ABORT PREPARED" to remote servers in bulk. The third is to stop syncing the WAL in remove_fdwxact() after resolving is done, which I addressed in the mail sent on June 3rd, 13:56. Since the third idea has not yet been discussed, I may be misunderstanding something. -- Masahiro Ikeda NTT DATA CORPORATION
On 2021/06/24 22:11, Masahiko Sawada wrote: > On Sat, Jun 12, 2021 at 1:25 AM Fujii Masao <masao.fujii@oss.nttdata.com> wrote: >> On 2021/05/11 13:37, Masahiko Sawada wrote: >> So currently FdwXactInsertEntry() calls XLogInsert() and XLogFlush() for >> XLOG_FDWXACT_INSERT WAL record. Additionally we should also wait there >> for WAL record to be replicated to the standby if sync replication is enabled? >> Otherwise, when the failover happens, new primary (past-standby) >> might not have enough XLOG_FDWXACT_INSERT WAL records and >> might fail to find some in-doubt foreign transactions. > > But even if we wait for the record to be replicated, this problem > isn't completely resolved, right? If the server crashes before the > standby receives the record and the failover happens then the new > master doesn't have the record. I wonder if we need to have another > FDW API in order to get the list of prepared transactions from the > foreign server (FDW). For example in postgres_fdw case, it gets the > list of prepared transactions on the foreign server by executing a > query. It seems to me that this corresponds to xa_recover in the XA > specification. FWIW, Citus implements it the way Sawada-san said above [1]. Since each WAL record for PREPARE is flushed in the latest patch, the latency becomes too high, especially under synchronous replication. For example, a transaction involving three foreign servers must wait to sync "three" WAL records for PREPARE and "one" WAL record for the local commit, one by one, sequentially. So, I think that Sawada-san's idea is good for improving the latency, although the FDW developer's work increases. [1] SIGMOD 2021 525 Citus: Distributed PostgreSQL for Data Intensive Applications. From 12:27, it explains how to solve unresolved prepared xacts. https://www.youtube.com/watch?v=AlF4C60FdlQ&list=PL3xUNnH4TdbsfndCMn02BqAAgGB0z7cwq Regards, -- Masahiro Ikeda NTT DATA CORPORATION
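The xa_recover-style recovery step discussed above (and used by Citus) can be sketched as follows. This is a hedged illustration, not the patch's code: it assumes the coordinator can fetch the remote GID list (postgres_fdw could query pg_prepared_xacts) and that GIDs this coordinator created are recognizable by the "fx_" prefix seen in the IDs earlier in this thread:

```python
# Sketch of an xa_recover-style step after a crash: compare the GIDs
# prepared on a foreign server against the coordinator's local FdwXact
# state. GIDs we created remotely but have no local entry for are the
# in-doubt ones to resolve (commit or roll back) during recovery.
# The "fx_" prefix convention is an assumption for illustration.

def in_doubt_gids(remote_gids, local_fdwxact_gids):
    """Remote prepared GIDs of ours that lack a local FdwXact entry."""
    ours = [g for g in remote_gids if g.startswith("fx_")]
    return sorted(set(ours) - set(local_fdwxact_gids))

# GIDs as reported by the foreign server; one belongs to another app:
remote = ["fx_151455979_1216200_16422", "app_gid_1",
          "fx_151455979_1216201_16422"]
local = ["fx_151455979_1216200_16422"]  # the coordinator's surviving state
assert in_doubt_gids(remote, local) == ["fx_151455979_1216201_16422"]
```

The prefix filter matters because other applications may also hold prepared transactions on the same foreign server, and recovery must not touch those.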
On Fri, Jun 25, 2021 at 9:53 AM Masahiro Ikeda <ikedamsh@oss.nttdata.com> wrote: > > Hi Jamison-san, sawada-san, > > Thanks for testing! > > FWIW, I tested using pgbench with "--rate=" option to know the server > can execute transactions with stable throughput. As sawada-san said, > the latest patch resolved second phase of 2PC asynchronously. So, > it's difficult to control the stable throughput without "--rate=" option. > > I also worried what I should do when the error happened because to increase > "max_prepared_foreign_transaction" doesn't work. Since too overloading may > show the error, is it better to add the case to the HINT message? > > BTW, if sawada-san already develop to run the resolver processes in parallel, > why don't you measure performance improvement? Although Robert-san, > Tsunakawa-san and so on are discussing what architecture is best, one > discussion point is that there is a performance risk if adopting asynchronous > approach. If we have promising solutions, I think we can make the discussion > forward. Yeah, if we can asynchronously resolve the distributed transactions without worrying about the max_prepared_foreign_transactions error, it would be good. But we will need synchronous resolution at some point. I think we at least need to discuss it at this point. I've attached the new version patch that incorporates the comments from Fujii-san and Ikeda-san I got so far. We launch a resolver process per foreign server, committing prepared foreign transactions on foreign servers in parallel. To get better performance based on the current architecture, we can have multiple resolver processes per foreign server, but it seems not easy to tune in practice. Perhaps would it be better to simply have a pool of resolver processes and assign a resolver process to the resolution of one distributed transaction at a time? That way, we would need to launch as many resolver processes as there are concurrent backends using 2PC. 
> In my understanding, there are three improvement idea. First is that to make > the resolver processes run in parallel. Second is that to send "COMMIT/ABORT > PREPARED" remote servers in bulk. Third is to stop syncing the WAL > remove_fdwxact() after resolving is done, which I addressed in the mail sent > at June 3rd, 13:56. Since third idea is not yet discussed, there may > be my misunderstanding. Yes, those optimizations are promising. On the other hand, they could introduce complexity to the code and APIs. I'd like to keep the first version simple. I think we need to discuss them at this stage but can leave the implementation of both parallel execution and batch execution as future improvements. For the third idea, I think the implementation was wrong; it removes the state file then flushes the WAL record. I think these should be performed in the reverse order. Otherwise, FdwXactState entry could be left on the standby if the server crashes between them. I might be missing something though. Regards, -- Masahiko Sawada EDB: https://www.enterprisedb.com/
Attachment
- v37-0008-Documentation-update.patch
- v37-0006-postgres_fdw-marks-foreign-transaction-as-modifi.patch
- v37-0007-Add-GetPrepareId-API.patch
- v37-0009-Add-regression-tests-for-foreign-twophase-commit.patch
- v37-0005-Prepare-foreign-transactions-at-commit-time.patch
- v37-0004-postgres_fdw-supports-prepare-API.patch
- v37-0002-postgres_fdw-supports-commit-and-rollback-APIs.patch
- v37-0001-Introduce-transaction-manager-for-foreign-transa.patch
- v37-0003-Support-two-phase-commit-for-foreign-transaction.patch
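The per-foreign-server resolver design described for the v37 patches above can be sketched as a simple partitioning of the pending work. This is an illustrative model only (the names and data shapes are assumptions, not the patch's structures): each server gets its own queue, which preserves per-server arrival order while letting different servers' queues drain in parallel.

```python
# Minimal sketch of "one resolver per foreign server": pending
# resolutions are partitioned by server, so each server's resolver
# drains its own queue in arrival order while servers proceed in
# parallel. Names are illustrative, not from the patch.

from collections import defaultdict

def partition_by_server(pending):
    """pending: list of (server, gid) pairs in global arrival order."""
    queues = defaultdict(list)
    for server, gid in pending:
        queues[server].append(gid)  # per-server arrival order preserved
    return dict(queues)

pending = [("F1", "g1"), ("F2", "g2"), ("F1", "g3"), ("F2", "g4")]
queues = partition_by_server(pending)
assert queues == {"F1": ["g1", "g3"], "F2": ["g2", "g4"]}
```

This also captures the property noted earlier in the thread: a busy or failed-over server only stalls its own queue, not the resolutions destined for other servers.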
On 2021/06/30 10:05, Masahiko Sawada wrote: > On Fri, Jun 25, 2021 at 9:53 AM Masahiro Ikeda <ikedamsh@oss.nttdata.com> wrote: >> >> Hi Jamison-san, sawada-san, >> >> Thanks for testing! >> >> FWIF, I tested using pgbench with "--rate=" option to know the server >> can execute transactions with stable throughput. As sawada-san said, >> the latest patch resolved second phase of 2PC asynchronously. So, >> it's difficult to control the stable throughput without "--rate=" option. >> >> I also worried what I should do when the error happened because to increase >> "max_prepared_foreign_transaction" doesn't work. Since too overloading may >> show the error, is it better to add the case to the HINT message? >> >> BTW, if sawada-san already develop to run the resolver processes in parallel, >> why don't you measure performance improvement? Although Robert-san, >> Tunakawa-san and so on are discussing what architecture is best, one >> discussion point is that there is a performance risk if adopting asynchronous >> approach. If we have promising solutions, I think we can make the discussion >> forward. > > Yeah, if we can asynchronously resolve the distributed transactions > without worrying about max_prepared_foreign_transaction error, it > would be good. But we will need synchronous resolution at some point. > I think we at least need to discuss it at this point. > > I've attached the new version patch that incorporates the comments > from Fujii-san and Ikeda-san I got so far. We launch a resolver > process per foreign server, committing prepared foreign transactions > on foreign servers in parallel. To get a better performance based on > the current architecture, we can have multiple resolver processes per > foreign server but it seems not easy to tune it in practice. Perhaps > is it better if we simply have a pool of resolver processes and we > assign a resolver process to the resolution of one distributed > transaction one by one? 
> That way, we need to launch as many resolver processes
> as there are concurrent backends using 2PC.

Thanks for updating the patches.

I have tested on my local laptop, and the summary is the following.

(1) The latest patch (v37) improves throughput by 1.5x compared to v36.

Although I expected a 2.0x improvement because the workload is one transaction
accessing two remote servers... I think the reason is that the disk is the
bottleneck and I couldn't prepare a separate disk for each PostgreSQL server.
If I could, I think the performance could improve by 2.0x.

(2) With the latest patch (v37), throughput with foreign_twophase_commit =
required is about 36% of the foreign_twophase_commit = disabled case.

Although the throughput is improved, the absolute performance is not good. It
may be the fate of 2PC. I think the reason is that the number of WAL writes
increases greatly, and the disk writes on my laptop are the bottleneck. I
would like to see results from richer environments if someone can test there.

(3) The latest patch (v37) has no overhead if foreign_twophase_commit =
disabled. On the contrary, performance improved by 3%, which may be within the
margin of error.

The test details are the following.

# condition

* 1 coordinator and 3 foreign servers
* 4 instances share one SSD disk
* one transaction queries two different foreign servers

``` fxact_update.pgbench
\set id random(1, 1000000)

\set partnum 3
\set p1 random(1, :partnum)
\set p2 ((:p1 + 1) % :partnum) + 1

BEGIN;
UPDATE part:p1 SET md5 = md5(clock_timestamp()::text) WHERE id = :id;
UPDATE part:p2 SET md5 = md5(clock_timestamp()::text) WHERE id = :id;
COMMIT;
```

* pgbench generates the load. I increased ${RATE} little by little until the
"maximum number of foreign transactions reached" error happened.

```
pgbench -f fxact_update.pgbench -R ${RATE} -c 8 -j 8 -T 180
```

* parameters
max_prepared_transactions = 100
max_prepared_foreign_transactions = 200
max_foreign_transaction_resolvers = 4

# test source code patterns

1.
2pc patches (v36) based on 6d0eb385 (foreign_twophase_commit = required).
2. 2pc patches (v37) based on 2595e039 (foreign_twophase_commit = required).
3. 2pc patches (v37) based on 2595e039 (foreign_twophase_commit = disabled).
4. 2595e039 without the 2pc patches (v37).

# results

1. tps = 241.8000
latency average = 10.413 ms

2. tps = 359.017519 (1.5x compared to 1; 36% of 3)
latency average = 15.427 ms

3. tps = 987.372220 (1.03x compared to 4)
latency average = 8.102 ms

4. tps = 955.984574
latency average = 8.368 ms

The disk is the bottleneck in my environment because disk utilization is
almost 100% in every pattern. If a separate disk could be prepared for each
instance, I think we could expect more performance improvement.

>> In my understanding, there are three improvement ideas. The first is to make
>> the resolver processes run in parallel. The second is to send "COMMIT/ABORT
>> PREPARED" to remote servers in bulk. The third is to stop syncing the WAL in
>> remove_fdwxact() after resolution is done, which I addressed in the mail sent
>> on June 3rd, 13:56. Since the third idea has not yet been discussed, there
>> may be some misunderstanding on my part.
>
> Yes, those optimizations are promising. On the other hand, they could
> introduce complexity to the code and APIs. I'd like to keep the first
> version simple. I think we need to discuss them at this stage but can
> leave the implementation of both parallel execution and batch
> execution as future improvements.

OK, I agree.

> For the third idea, I think the implementation was wrong; it removes
> the state file then flushes the WAL record. I think these should be
> performed in the reverse order. Otherwise, the FdwXactState entry could be
> left on the standby if the server crashes between them. I might be
> missing something though.

Oh, I see. I think you're right, though what you wanted to say is that it
flushes the WAL records then removes the state file.
If "COMMIT/ABORT PREPARED" statements are executed in bulk, it seems enough to
sync the WAL only once, then remove all related state files.

BTW, I tested building the binary with -O2, and I got the following warning.
It needs to be fixed.

```
fdwxact.c: In function 'PrepareAllFdwXacts':
fdwxact.c:897:13: warning: 'flush_lsn' may be used uninitialized in this
function [-Wmaybe-uninitialized]
  897 |         canceled = SyncRepWaitForLSN(flush_lsn, false);
      |                    ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
```

Regards,
--
Masahiro Ikeda
NTT DATA CORPORATION
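As a quick sanity check of the workload above (an illustrative snippet, not
part of the patch), the partition-selection arithmetic in fxact_update.pgbench
always picks two distinct partitions, so every transaction really does touch
two different foreign servers:

```python
# Mirrors the pgbench variables: p1 = random(1, partnum),
# p2 = ((p1 + 1) % partnum) + 1, with partnum = 3.
def second_partition(p1, partnum=3):
    return ((p1 + 1) % partnum) + 1

for p1 in range(1, 4):
    p2 = second_partition(p1)
    # p2 is always in 1..3 and never equal to p1
    assert 1 <= p2 <= 3 and p2 != p1
    print(p1, p2)  # prints: 1 3, then 2 1, then 3 2
```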
RE: Transactions involving multiple postgres foreign servers, take 2
Hi,

I'm interested in this patch and I also ran the same test with Ikeda-san's
fxact_update.pgbench. In my environment (a poor-spec VM), the results are the
following.

* foreign_twophase_commit = disabled
363 tps

* foreign_twophase_commit = required (it is necessary to set -R ${RATE} as Ikeda-san said)
13 tps

I analyzed the bottleneck using pstack and strace. I noticed that the open()
during the "COMMIT PREPARED" command is very slow.

In my environment the latency of "COMMIT PREPARED" is 16 ms. (On the other
hand, the latency of "COMMIT" and "PREPARE TRANSACTION" is 1 ms.) In the
"COMMIT PREPARED" command, open() for the WAL segment file takes 14 ms.
Therefore, open() is the bottleneck of "COMMIT PREPARED". Furthermore, I
noticed that the backend process almost always opens the same WAL segment
file.

In the current patch, the backend process on the foreign server which is
associated with the connection from the resolver process always runs the
"COMMIT PREPARED" command. Therefore, the WAL segment file of the current
"COMMIT PREPARED" command is probably the same as that of the previous
"COMMIT PREPARED" command.

In order to improve the performance of the resolver process, I think it would
be useful to skip closing the WAL segment file during "COMMIT PREPARED" and
reuse the file descriptor. Is it possible?

Regards,
Ryohei Takahashi
RE: Transactions involving multiple postgres foreign servers, take 2
On Wed, June 30, 2021 10:06 (GMT+9), Masahiko Sawada wrote:
> I've attached the new version patch that incorporates the comments from
> Fujii-san and Ikeda-san I got so far. We launch a resolver process per foreign
> server, committing prepared foreign transactions on foreign servers in parallel.

Hi Sawada-san,

Thank you for the latest set of patches. I've noticed from cfbot that the
regression test failed, and I also could not compile it.

============== running regression test queries ==============
test test_fdwxact ... FAILED 21 ms
============== shutting down postmaster ==============

======================
1 of 1 tests failed.
======================

> To get a better performance based on the current architecture, we can have
> multiple resolver processes per foreign server but it seems not easy to tune it
> in practice. Perhaps is it better if we simply have a pool of resolver processes
> and we assign a resolver process to the resolution of one distributed
> transaction one by one? That way, we need to launch resolver processes as
> many as the concurrent backends using 2PC.

Yes, finding the right values to tune for max_prepared_foreign_transactions
and max_prepared_transactions seems difficult. If we set the number of
resolver processes to the number of concurrent backends using 2PC, how do we
determine the value of max_foreign_transaction_resolvers? It might be good to
collect some statistics to judge the value; then we can compare the
performance against the v37 version.

Also, this is a bit of a side topic. I know we've been discussing how to
improve/fix the resolver process bottlenecks, and Takahashi-san provided the
details in the thread above where v37 has problems. (I am joining the testing
too.) I am not sure if this has been brought up before because of the years of
thread, but I think we need to prevent the resolver process from entering an
infinite wait loop while resolving a prepared foreign transaction.
Currently, when a crashed foreign server is recovered during resolution
retries, the information is recovered from WAL and files, and the resolver
process resumes the foreign transaction resolution. However, what if we cannot
(or intentionally do not want to) recover the crashed server for a long time?

An idea is to make the resolver process stop automatically after some maximum
number of retries. We could call the parameter
foreign_transaction_resolution_max_retry_count. There may be a better name,
but I followed the pattern from your patch.

The server downtime can be estimated using the proposed parameter
foreign_transaction_resolution_retry_interval (default 10s) from the patch
set. In addition, according to the docs, "a foreign server using the
postgres_fdw foreign data wrapper can have the same options that libpq accepts
in connection strings", so the connect_timeout set during CREATE SERVER can
also affect it.

Example:
CREATE SERVER's connect_timeout setting = 5s
foreign_transaction_resolution_retry_interval = 10s
foreign_transaction_resolution_max_retry_count = 3

Estimated total time before resolver stops:
= (5s) * (3 + 1) + (10s) * (3) = 50s

00s: 1st connect start
05s: 1st connect timeout (retry interval)
15s: 2nd connect start (1st retry)
20s: 2nd connect timeout (retry interval)
30s: 3rd connect start (2nd retry)
35s: 3rd connect timeout (retry interval)
45s: 4th connect start (3rd retry)
50s: 4th connect timeout (resolver process stops)

Then the resolver process will not wait indefinitely and will stop after some
time, depending on the settings of the above parameters. This could be an
automatic counterpart of pg_stop_foreign_xact_resolver. Assuming the resolver
has stopped and the crashed server is later restored, the user can then
execute pg_resolve_foreign_xact().

Do you think the idea is feasible and we can add it as part of the patch sets?

Regards,
Kirk Jamison
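The estimate above follows a simple formula: there are (retries + 1)
connection attempts, each waiting connect_timeout, and each retry is preceded
by the retry interval. A small sketch (illustrative only; the parameter names
come from the proposal, the function itself is hypothetical):

```python
def resolver_stop_estimate(connect_timeout, retry_interval, max_retry_count):
    """Seconds until the resolver gives up: (max_retry_count + 1)
    connection attempts of connect_timeout each, separated by
    retry_interval between attempts."""
    attempts = max_retry_count + 1
    return connect_timeout * attempts + retry_interval * max_retry_count

# connect_timeout = 5s, retry interval = 10s, 3 retries -> 50s, as above
print(resolver_stop_estimate(5, 10, 3))  # prints 50
```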
On 2021/06/30 10:05, Masahiko Sawada wrote:
> I've attached the new version patch that incorporates the comments
> from Fujii-san and Ikeda-san I got so far.

Thanks for updating the patches!

I'm now reading the 0001 and 0002 patches and wondering if we can commit them
first, because they just provide an independent basic mechanism for foreign
transaction management.

One question regarding them: why did we add the new API only for "top" foreign
transactions? Even with those patches, the old API (CallSubXactCallbacks) is
still being used for foreign subtransactions, and xact_depth is still being
managed in the postgres_fdw layer (not PostgreSQL core). Is this intentional?
Sorry if this was already discussed before.

As far as I read the code, keeping the old API for foreign subtransactions
doesn't cause any actual bug. But it's just strange and half-baked to manage
top and sub transactions in different layers and to use the old and new APIs
for them. OTOH, I'm afraid that adding a new (non-essential) API for foreign
subtransactions might increase the code complexity unnecessarily. Thoughts?

Regards,
--
Fujii Masao
Advanced Computing Technology Center
Research and Development Headquarters
NTT DATA CORPORATION
Sorry for the late reply.

On Mon, Jul 5, 2021 at 3:29 PM Masahiro Ikeda <ikedamsh@oss.nttdata.com> wrote:
>
> On 2021/06/30 10:05, Masahiko Sawada wrote:
> > On Fri, Jun 25, 2021 at 9:53 AM Masahiro Ikeda <ikedamsh@oss.nttdata.com> wrote:
> >>
> >> Hi Jamison-san, sawada-san,
> >>
> >> Thanks for testing!
> >>
> >> FWIW, I tested using pgbench with the "--rate=" option to check that the server
> >> can execute transactions with stable throughput. As sawada-san said,
> >> the latest patch resolves the second phase of 2PC asynchronously. So,
> >> it's difficult to control the stable throughput without the "--rate=" option.
> >>
> >> I also wondered what I should do when the error happened, because increasing
> >> "max_prepared_foreign_transactions" doesn't work. Since overloading may
> >> trigger the error, is it better to add the case to the HINT message?
> >>
> >> BTW, if sawada-san has already developed running the resolver processes in parallel,
> >> why don't you measure the performance improvement? Although Robert-san,
> >> Tsunakawa-san and others are discussing which architecture is best, one
> >> discussion point is that there is a performance risk in adopting the
> >> asynchronous approach. If we have promising solutions, I think we can
> >> move the discussion forward.
> >
> > Yeah, if we can asynchronously resolve the distributed transactions
> > without worrying about the max_prepared_foreign_transactions error, it
> > would be good. But we will need synchronous resolution at some point.
> > I think we at least need to discuss it at this point.
> >
> > I've attached the new version patch that incorporates the comments
> > from Fujii-san and Ikeda-san I got so far. We launch a resolver
> > process per foreign server, committing prepared foreign transactions
> > on foreign servers in parallel. To get a better performance based on
> > the current architecture, we can have multiple resolver processes per
> > foreign server but it seems not easy to tune it in practice.
> > Perhaps
> > is it better if we simply have a pool of resolver processes and we
> > assign a resolver process to the resolution of one distributed
> > transaction one by one? That way, we need to launch resolver processes
> > as many as the concurrent backends using 2PC.
>
> Thanks for updating the patches.
>
> I have tested on my local laptop and the summary is the following.

Thank you for testing!

> (1) The latest patch (v37) can improve throughput by 1.5 times compared to v36.
>
> Although I expected it to improve by 2.0 times because the workload is that one
> transaction accesses two remote servers... I think the reason is that the disk
> is the bottleneck and I couldn't prepare disks for each postgresql server. If I
> could, I think the performance could be improved by 2.0 times.
>
> (2) The latest patch (v37) throughput with foreign_twophase_commit = required
> is about 36% of the case with foreign_twophase_commit = disabled.
>
> Although the throughput is improved, the absolute performance is not good. It
> may be the fate of 2PC. I think the reason is that the number of WAL writes
> increases greatly and the disk writes on my laptop are the bottleneck. I want
> to know the results of testing in richer environments if someone can do so.
>
> (3) The latest patch (v37) has no overhead if foreign_twophase_commit =
> disabled. On the contrary, the performance improved by 3%. It may be within
> the margin of error.
>
> The test details are the following.
>
> # condition
>
> * 1 coordinator and 3 foreign servers
> * 4 instances share one ssd disk.
> * one transaction queries two different foreign servers.
>
> ``` fxact_update.pgbench
> \set id random(1, 1000000)
>
> \set partnum 3
> \set p1 random(1, :partnum)
> \set p2 ((:p1 + 1) % :partnum) + 1
>
> BEGIN;
> UPDATE part:p1 SET md5 = md5(clock_timestamp()::text) WHERE id = :id;
> UPDATE part:p2 SET md5 = md5(clock_timestamp()::text) WHERE id = :id;
> COMMIT;
> ```
>
> * pgbench generates the load.
> I increased ${RATE} little by little until the "maximum
> number of foreign transactions reached" error happened.
>
> ```
> pgbench -f fxact_update.pgbench -R ${RATE} -c 8 -j 8 -T 180
> ```
>
> * parameters
> max_prepared_transactions = 100
> max_prepared_foreign_transactions = 200
> max_foreign_transaction_resolvers = 4
>
> # test source code patterns
>
> 1. 2pc patches (v36) based on 6d0eb385 (foreign_twophase_commit = required).
> 2. 2pc patches (v37) based on 2595e039 (foreign_twophase_commit = required).
> 3. 2pc patches (v37) based on 2595e039 (foreign_twophase_commit = disabled).
> 4. 2595e039 without the 2pc patches (v37).
>
> # results
>
> 1. tps = 241.8000
> latency average = 10.413 ms
>
> 2. tps = 359.017519 (1.5x compared to 1; 36% of 3)
> latency average = 15.427 ms
>
> 3. tps = 987.372220 (1.03x compared to 4)
> latency average = 8.102 ms
>
> 4. tps = 955.984574
> latency average = 8.368 ms
>
> The disk is the bottleneck in my environment because disk utilization is
> almost 100% in every pattern. If disks for each instance could be prepared,
> I think we could expect more performance improvement.

It still seems not good performance. I'll also test using your script.

> >> In my understanding, there are three improvement ideas. The first is to make
> >> the resolver processes run in parallel. The second is to send "COMMIT/ABORT
> >> PREPARED" to remote servers in bulk. The third is to stop syncing the WAL in
> >> remove_fdwxact() after resolution is done, which I addressed in the mail sent
> >> on June 3rd, 13:56. Since the third idea has not yet been discussed, there
> >> may be some misunderstanding on my part.
> >
> > Yes, those optimizations are promising. On the other hand, they could
> > introduce complexity to the code and APIs. I'd like to keep the first
> > version simple. I think we need to discuss them at this stage but can
> > leave the implementation of both parallel execution and batch
> > execution as future improvements.
>
> OK, I agree.
> > For the third idea, I think the implementation was wrong; it removes
> > the state file then flushes the WAL record. I think these should be
> > performed in the reverse order. Otherwise, the FdwXactState entry could be
> > left on the standby if the server crashes between them. I might be
> > missing something though.
>
> Oh, I see. I think you're right, though what you wanted to say is that it
> flushes the WAL records then removes the state file. If "COMMIT/ABORT
> PREPARED" statements are executed in bulk, it seems enough to sync the WAL
> only once, then remove all related state files.
>
> BTW, I tested building the binary with -O2, and I got the following warning.
> It needs to be fixed.
>
> ```
> fdwxact.c: In function 'PrepareAllFdwXacts':
> fdwxact.c:897:13: warning: 'flush_lsn' may be used uninitialized in this
> function [-Wmaybe-uninitialized]
>   897 |         canceled = SyncRepWaitForLSN(flush_lsn, false);
>       |                    ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
> ```

Thank you for the report. I'll fix it in the next version patch.

Regards,
--
Masahiko Sawada
EDB: https://www.enterprisedb.com/
Sorry for the late reply.

On Tue, Jul 6, 2021 at 3:15 PM r.takahashi_2@fujitsu.com
<r.takahashi_2@fujitsu.com> wrote:
>
> Hi,
>
> I'm interested in this patch and I also ran the same test with Ikeda-san's fxact_update.pgbench.

Thank you for testing!

> In my environment (a poor-spec VM), the results are the following.
>
> * foreign_twophase_commit = disabled
> 363 tps
>
> * foreign_twophase_commit = required (it is necessary to set -R ${RATE} as Ikeda-san said)
> 13 tps
>
> I analyzed the bottleneck using pstack and strace.
> I noticed that the open() during the "COMMIT PREPARED" command is very slow.
>
> In my environment the latency of "COMMIT PREPARED" is 16 ms.
> (On the other hand, the latency of "COMMIT" and "PREPARE TRANSACTION" is 1 ms.)
> In the "COMMIT PREPARED" command, open() for the WAL segment file takes 14 ms.
> Therefore, open() is the bottleneck of "COMMIT PREPARED".
> Furthermore, I noticed that the backend process almost always opens the same
> WAL segment file.
>
> In the current patch, the backend process on the foreign server which is
> associated with the connection from the resolver process always runs the
> "COMMIT PREPARED" command. Therefore, the WAL segment file of the current
> "COMMIT PREPARED" command is probably the same as that of the previous
> "COMMIT PREPARED" command.
>
> In order to improve the performance of the resolver process, I think it would
> be useful to skip closing the WAL segment file during "COMMIT PREPARED" and
> reuse the file descriptor. Is it possible?

Not sure, but it might be possible to keep holding an xlogreader for reading
PREPARE WAL records even after the transaction commits. But I wonder how much
open() for the WAL segment file accounts for the total execution time of 2PC.
2PC requires two network round trips for each participant. For example, if it
took 500 ms in total, we would not get much benefit from the point of view of
2PC performance even if we improved it from 14 ms to 1 ms.

Regards,
--
Masahiko Sawada
EDB: https://www.enterprisedb.com/
On Fri, Jul 9, 2021 at 3:26 PM Fujii Masao <masao.fujii@oss.nttdata.com> wrote:
>
> On 2021/06/30 10:05, Masahiko Sawada wrote:
> > I've attached the new version patch that incorporates the comments
> > from Fujii-san and Ikeda-san I got so far.
>
> Thanks for updating the patches!
>
> I'm now reading 0001 and 0002 patches and wondering if we can commit them
> at first because they just provide independent basic mechanism for
> foreign transaction management.
>
> One question regarding them is; Why did we add new API only for "top" foreign
> transaction? Even with those patches, old API (CallSubXactCallbacks) is still
> being used for foreign subtransaction and xact_depth is still being managed
> in postgres_fdw layer (not PostgreSQL core). Is this intentional?

Yes. It's not needed for 2PC support, and I was also concerned about adding
complexity to the core by introducing a new API for subtransactions that is
not necessary for 2PC.

> As far as I read the code, keep using old API for foreign subtransaction doesn't
> cause any actual bug. But it's just strange and half-baked to manage top and
> sub transaction in the different layer and to use old and new API for them.

That's a valid concern. I'm really not sure what we should do here, but I
guess that even if we want to support subtransactions, we would have another
API dedicated to subtransaction commit and rollback.

Regards,
--
Masahiko Sawada
EDB: https://www.enterprisedb.com/
RE: Transactions involving multiple postgres foreign servers, take 2
Hi Sawada-san,

Thank you for your reply.

> Not sure but it might be possible to keep holding an xlogreader for
> reading PREPARE WAL records even after the transaction commit. But I
> wonder how much open() for wal segment file accounts for the total
> execution time of 2PC. 2PC requires 2 network round trips for each
> participant. For example, if it took 500ms in total, we would not get
> benefits much from the point of view of 2PC performance even if we
> improved it from 14ms to 1ms.

I made the patch based on your advice and re-ran the test on the new machine.
(The attached patch is just for test purposes.)

* foreign_twophase_commit = disabled
2686 tps

* foreign_twophase_commit = required (it is necessary to set -R ${RATE} as Ikeda-san said)
311 tps

* foreign_twophase_commit = required with attached patch (it is not necessary to set -R ${RATE})
2057 tps

This indicates that if we can reduce the number of open() calls on the WAL
segment file during "COMMIT PREPARED", the performance can be improved.

This patch can skip closing the WAL segment file, but I don't know when we
should close it. One idea is to close it when the WAL segment file is
recycled, but it seems difficult for the backend process to do so.

BTW, in a previous discussion, "send COMMIT PREPARED to remote servers in
bulk" was proposed. I imagined a new SQL interface like "COMMIT PREPARED
'prep_1', 'prep_2', ... 'prep_n'". If we can keep the WAL segment file open
during a bulk COMMIT PREPARED, we can not only reduce the number of
communications, but also reduce the number of open() calls on the WAL segment
file.

Regards,
Ryohei Takahashi
> Hi Sawada-san,
>
> Thank you for your reply.
>
> > Not sure but it might be possible to keep holding an xlogreader for
> > reading PREPARE WAL records even after the transaction commit. But I
> > wonder how much open() for wal segment file accounts for the total
> > execution time of 2PC. 2PC requires 2 network round trips for each
> > participant. For example, if it took 500ms in total, we would not get
> > benefits much from the point of view of 2PC performance even if we
> > improved it from 14ms to 1ms.
>
> I made the patch based on your advice and re-run the test on the new machine.
> (The attached patch is just for test purpose.)

Wouldn't it be better to explicitly initialize the pointer with NULL? I think
it's common in Postgres.

> * foreign_twophase_commit = disabled
> 2686tps
>
> * foreign_twophase_commit = required (It is necessary to set -R ${RATE} as Ikeda-san said)
> 311tps
>
> * foreign_twophase_commit = required with attached patch (It is not necessary to set -R ${RATE})
> 2057tps
RE: Transactions involving multiple postgres foreign servers, take 2
Hi,

> Wouldn't it be better to explicitly initialize the pointer with NULL?

Thank you for your advice. You are correct. Anyway, I fixed it and re-ran the
performance test; it of course does not affect tps.

Regards,
Ryohei Takahashi
On Tue, Jul 13, 2021 at 1:14 PM r.takahashi_2@fujitsu.com
<r.takahashi_2@fujitsu.com> wrote:
>
> Hi Sawada-san,
>
> Thank you for your reply.
>
> > Not sure but it might be possible to keep holding an xlogreader for
> > reading PREPARE WAL records even after the transaction commit. But I
> > wonder how much open() for wal segment file accounts for the total
> > execution time of 2PC. 2PC requires 2 network round trips for each
> > participant. For example, if it took 500ms in total, we would not get
> > benefits much from the point of view of 2PC performance even if we
> > improved it from 14ms to 1ms.
>
> I made the patch based on your advice and re-ran the test on the new machine.
> (The attached patch is just for test purposes.)

Thank you for testing!

> * foreign_twophase_commit = disabled
> 2686 tps
>
> * foreign_twophase_commit = required (it is necessary to set -R ${RATE} as Ikeda-san said)
> 311 tps
>
> * foreign_twophase_commit = required with attached patch (it is not necessary to set -R ${RATE})
> 2057 tps

Nice improvement! BTW did you test locally? That is, are the foreign servers
located on the same machine?

> This indicates that if we can reduce the number of open() calls on the WAL
> segment file during "COMMIT PREPARED", the performance can be improved.
>
> This patch can skip closing the WAL segment file, but I don't know when we
> should close it. One idea is to close it when the WAL segment file is
> recycled, but it seems difficult for the backend process to do so.

I guess it would be better to start a new thread for this improvement. This
idea helps not only the 2PC case but also improves COMMIT/ROLLBACK PREPARED
performance itself. Rather than thinking of it as tied to this patch, I think
it's good if we can discuss it separately and it gets committed alone.

> BTW, in a previous discussion, "send COMMIT PREPARED to remote servers in
> bulk" was proposed. I imagined a new SQL interface like "COMMIT PREPARED
> 'prep_1', 'prep_2', ... 'prep_n'".
> If we can keep the WAL segment file open during a bulk COMMIT PREPARED, we
> can not only reduce the number of communications, but also reduce the number
> of open() calls on the WAL segment file.

What if we successfully committed 'prep_1' but an error happened while
committing another one for some reason (e.g., corrupted 2PC state file, OOM,
etc.)? We might return an error to the client but have already committed
'prep_1'.

Regards,
--
Masahiko Sawada
EDB: https://www.enterprisedb.com/
RE: Transactions involving multiple postgres foreign servers, take 2
Hi Sawada-san,

Thank you for your reply.

> BTW did you test locally? That is, are the foreign servers
> located on the same machine?

Yes, I tested locally, since I cannot prepare a good network environment now.

> I guess it would be better to start a new thread for this improvement.

Thank you for your advice. I started a new thread [1].

> What if we successfully committed 'prep_1' but an error happened
> while committing another one for some reason (e.g., corrupted 2PC
> state file, OOM, etc.)? We might return an error to the client but have
> already committed 'prep_1'.

Sorry, I don't have a good idea now. I imagine the command could return the
list of the transaction ids that ended with an error.

[1] https://www.postgresql.org/message-id/OS0PR01MB56828019B25CD5190AB6093282129%40OS0PR01MB5682.jpnprd01.prod.outlook.com

Regards,
Ryohei Takahashi
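Takahashi-san's suggestion, returning the list of transactions that failed to
commit, could behave roughly like this (a hypothetical sketch of the semantics
only, not the proposed server implementation; the function names are
invented):

```python
def bulk_commit_prepared(gids, commit_one):
    """Attempt COMMIT PREPARED for each gid; keep going on error and
    report which ones failed, since earlier ones may already be durable
    and cannot be rolled back."""
    failed = []
    for gid in gids:
        try:
            commit_one(gid)
        except Exception:
            failed.append(gid)
    return failed

# Example: suppose 'prep_2' has a corrupted state file and raises.
def commit_one(gid):
    if gid == "prep_2":
        raise RuntimeError("corrupted 2PC state file")

print(bulk_commit_prepared(["prep_1", "prep_2", "prep_3"], commit_one))
# prints ['prep_2']; 'prep_1' and 'prep_3' are already committed
```

This sidesteps the all-or-nothing error reporting Sawada-san worried about,
at the cost of pushing partial-failure handling onto the caller.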
On 2021/07/09 22:44, Masahiko Sawada wrote:
> On Fri, Jul 9, 2021 at 3:26 PM Fujii Masao <masao.fujii@oss.nttdata.com> wrote:
>> As far as I read the code, keep using old API for foreign subtransaction doesn't
>> cause any actual bug. But it's just strange and half-baked to manage top and
>> sub transaction in the different layer and to use old and new API for them.
>
> That's a valid concern. I'm really not sure what we should do here but
> I guess that even if we want to support subtransactions we would have another
> API dedicated to subtransaction commit and rollback.

Ok, so if possible I will write a POC patch for a new API for foreign
subtransactions and consider whether it's simple enough that we can commit it
into core or not.

+#define FDWXACT_FLAG_PARALLEL_WORKER 0x02 /* is parallel worker? */

This implies that parallel workers may execute PREPARE TRANSACTION and
COMMIT/ROLLBACK PREPARED on the foreign server for atomic commit? If so, what
happens if the PREPARE TRANSACTION that one of the parallel workers issues
fails? In that case, not only that parallel worker but also the other parallel
workers and the leader should roll back the transaction entirely. That is,
they should issue ROLLBACK PREPARED to the foreign servers. Was this issue
already handled and addressed in the patches?

This seems not to be an actual issue if only postgres_fdw is used, because
postgres_fdw doesn't have the IsForeignScanParallelSafe API. Right? But what
about other FDWs?

Regards,
--
Fujii Masao
Advanced Computing Technology Center
Research and Development Headquarters
NTT DATA CORPORATION
RE: Transactions involving multiple postgres foreign servers, take 2
Hi Sawada-san,

I noticed that this thread and its set of patches have been marked "Returned
with Feedback" by yourself. I find the feature (atomic commit for foreign
transactions) very useful, and it would pave the road for having distributed
transaction management in Postgres. Although we have not arrived at a
consensus on which approach is best, there were significant reviews and major
patch changes in the past two years. By any chance, do you have any plans to
continue this from where you left off?

Regards,
Kirk Jamison
Hi,

On Tue, Oct 5, 2021 at 9:56 AM k.jamison@fujitsu.com
<k.jamison@fujitsu.com> wrote:
>
> Hi Sawada-san,
>
> I noticed that this thread and its set of patches have been marked with
> "Returned with Feedback" by yourself. I find the feature (atomic commit for
> foreign transactions) very useful and it will pave the road for having a
> distributed transaction management in Postgres. Although we have not arrived
> at consensus at which approach is best, there were significant reviews and
> major patch changes in the past 2 years. By any chance, do you have any plans
> to continue this from where you left off?

As I could not reply to the review comments from Fujii-san for almost three
months, I don't have enough time to move this project forward, at least for
now. That's why I marked this patch as RWF. I'd like to continue working on
this project in my spare time, but I know this is not a project that can be
completed using only my spare time. If someone wants to work on this project,
I'd appreciate it and am happy to help.

Regards,
--
Masahiko Sawada
EDB: https://www.enterprisedb.com/
On 2021/10/05 10:38, Masahiko Sawada wrote:
> Hi,
>
> On Tue, Oct 5, 2021 at 9:56 AM k.jamison@fujitsu.com
> <k.jamison@fujitsu.com> wrote:
>>
>> Hi Sawada-san,
>>
>> I noticed that this thread and its set of patches have been marked with
>> "Returned with Feedback" by yourself. I find the feature (atomic commit for
>> foreign transactions) very useful and it will pave the road for having a
>> distributed transaction management in Postgres. Although we have not arrived
>> at consensus at which approach is best, there were significant reviews and
>> major patch changes in the past 2 years. By any chance, do you have any
>> plans to continue this from where you left off?
>
> As I could not reply to the review comments from Fujii-san for almost
> three months, I don't have enough time to move this project forward at
> least for now. That's why I marked this patch as RWF. I'd like to
> continue working on this project in my spare time but I know this is
> not a project that can be completed by using only my spare time. If
> someone wants to work on this project, I'd appreciate it and am happy
> to help.

Probably it's time to rethink the approach. The patch introduces a foreign
transaction manager into PostgreSQL core, but as far as I review the patch,
its changes look like overkill and too complicated. This seems to be one of
the reasons why we have not yet been able to commit the feature even after
several years.

Another concern about the approach of the patch is that it needs to change a
backend so that it additionally waits for replication during the commit phase
before executing PREPARE TRANSACTION on the foreign servers, which would
decrease the performance during the commit phase even further.

So I wonder if it's worth revisiting the original approach, i.e., adding the
atomic commit into postgres_fdw. One disadvantage of this is that it supports
atomic commit only between foreign PostgreSQL servers, not other data
resources like MySQL.
But I'm not sure if we really want to do atomic commit between various FDWs. Maybe supporting only postgres_fdw is enough for most users. Thought? Regards, -- Fujii Masao Advanced Computing Technology Center Research and Development Headquarters NTT DATA CORPORATION
Hi Fujii-san and Sawada-san,

Thank you very much for your replies.

> >> I noticed that this thread and its set of patches have been marked with "Returned with Feedback" by yourself.
> >> I find the feature (atomic commit for foreign transactions) very
> >> useful and it will pave the road for having a distributed transaction management in Postgres.
> >> Although we have not arrived at consensus at which approach is best,
> >> there were significant reviews and major patch changes in the past 2 years.
> >> By any chance, do you have any plans to continue this from where you left off?
> >
> > As I could not reply to the review comments from Fujii-san for almost
> > three months, I don't have enough time to move this project forward at
> > least for now. That's why I marked this patch as RWF. I'd like to
> > continue working on this project in my spare time but I know this is
> > not a project that can be completed by using only my spare time. If
> > someone wants to work on this project, I'd appreciate it and am happy
> > to help.
>
> Probably it's time to rethink the approach. The patch introduces foreign
> transaction manager into PostgreSQL core, but as far as I review the patch, its
> changes look overkill and too complicated.
> This seems one of reasons why we could not have yet committed the feature even
> after several years.
>
> Another concern about the approach of the patch is that it needs to change a
> backend so that it additionally waits for replication during commit phase before
> executing PREPARE TRANSACTION to foreign servers. Which would decrease the
> performance during commit phase furthermore.
>
> So I wonder if it's worth revisiting the original approach, i.e., add the atomic
> commit into postgres_fdw. One disadvantage of this is that it supports atomic
> commit only between foreign PostgreSQL servers, not other various data
> resources like MySQL.
> But I'm not sure if we really want to do atomic commit between various FDWs.
> Maybe supporting only postgres_fdw is enough for most users. Thought?

The intention of Sawada-san's patch is ambitious, although it would be very helpful because it accommodates possible future support of atomic commit for various types of FDWs. However, it has been difficult to reach agreement on it, as other reviewers have also pointed out the commit performance. Another point is how it should work when we also implement atomic visibility (which is another topic for distributed transactions, but worth considering).

That said, if we're going to initially support it in postgres_fdw, which is simpler than the latest patches, we need to ensure that abnormalities and errors are properly handled, and prove that commit performance can be improved, e.g., by committing not serially but in parallel. And, although not necessary for the first step, it may put the other reviewers at ease if we can also sketch how atomic visibility could be implemented on top of postgres_fdw.

Thoughts?

Regards,
Kirk Jamison
Hi,

On Thu, Oct 7, 2021 at 1:29 PM k.jamison@fujitsu.com <k.jamison@fujitsu.com> wrote:
> That said, if we're going to initially support it on postgres_fdw, which is simpler
> than the latest patches, we need to ensure that abnormalities and errors
> are properly handled and prove that commit performance can be improved,
> e.g. if we can commit not in serial but also possible in parallel.

If it's OK with you, I'd like to work on the performance issue. What I have in mind is to commit all remote transactions in parallel instead of sequentially in the postgres_fdw transaction callback, as mentioned above; I think that would improve performance even for the one-phase commit that we already have. Maybe I'm missing something, though.

Best regards,
Etsuro Fujita
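[Editorial note] The parallel-commit idea above can be sketched as follows. This is only an illustration of the concurrency pattern, not postgres_fdw code: the real change would live in the C transaction callback, and the `FakeConn`-style handle with a blocking `commit()` method is invented for the example.

```python
from concurrent.futures import ThreadPoolExecutor

def commit_remote_transactions(connections):
    """Issue COMMIT on every remote connection concurrently rather than
    one after another, so total commit latency approaches that of the
    slowest single server instead of the sum over all servers.

    `connections` is a list of hypothetical handles exposing a blocking
    commit() method.  Failures are collected instead of raised
    immediately so that every server is attempted.
    """
    failures = []
    with ThreadPoolExecutor(max_workers=max(len(connections), 1)) as pool:
        futures = {pool.submit(conn.commit): conn for conn in connections}
        for future, conn in futures.items():
            try:
                future.result()  # wait for this server's COMMIT to finish
            except Exception as exc:
                failures.append((conn, exc))
    return failures
```

Note that the same pattern would help even without two-phase commit: the existing one-phase commits in the callback are sequential today, which is exactly the point Fujita-san makes above.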
On 2021/10/07 19:47, Etsuro Fujita wrote:
> Hi,
>
> On Thu, Oct 7, 2021 at 1:29 PM k.jamison@fujitsu.com
> <k.jamison@fujitsu.com> wrote:
>> That said, if we're going to initially support it on postgres_fdw, which is simpler
>> than the latest patches, we need to ensure that abnormalities and errors
>> are properly handled

Yes. One idea for this is to include the information required to resolve outstanding prepared transactions in the transaction identifier that the PREPARE TRANSACTION command uses. For example, we can use the XID of the local transaction and the cluster ID of the local server (e.g., a cluster_name that users specify uniquely) as that information. If the cluster_name of the local server is "server1" and its XID is now 9999, postgres_fdw issues "PREPARE TRANSACTION 'server1_9999'" and "COMMIT PREPARED 'server1_9999'" to the foreign servers, to end those foreign transactions in a two-phase way.

If some trouble happens, the prepared transaction with "server1_9999" may remain unexpectedly on one foreign server. In this case we can determine whether to commit or roll back that outstanding transaction by checking whether the past transaction with XID 9999 was committed or rolled back on the server "server1". If it was committed, the prepared transaction should also be committed, so we should execute "COMMIT PREPARED 'server1_9999'". If it was rolled back, the prepared transaction should also be rolled back. If it's still in progress, we should do nothing for that transaction.

pg_xact_status() can be used to check whether the transaction with the specified XID was committed or rolled back. But pg_xact_status() can return an invalid result if the CLOG data for the specified XID has been truncated by VACUUM FREEZE. To handle this case, we might need a special table tracking the transaction status.

A DBA can use the above procedure to manually resolve the outstanding prepared transactions on foreign servers. We can probably also implement a function that performs the procedure. If so, it might be a good idea to have a background worker or cron execute the function periodically.

>> and prove that commit performance can be improved,
>> e.g. if we can commit not in serial but also possible in parallel.
>
> If it's ok with you, I'd like to work on the performance issue. What
> I have in mind is commit all remote transactions in parallel instead
> of sequentially in the postgres_fdw transaction callback, as mentioned
> above, but I think that would improve the performance even for
> one-phase commit that we already have.

+100

Regards,

--
Fujii Masao
Advanced Computing Technology Center
Research and Development Headquarters
NTT DATA CORPORATION
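[Editorial note] The naming scheme and the resolution rule Fujii-san describes above can be sketched as follows. The GID format ("server1_9999") and the three possible statuses mirror the description and pg_xact_status()'s return values ('committed', 'aborted', 'in progress'); the function names and the `local_xact_status` callback are invented for this illustration and are not part of any patch.

```python
def make_gid(cluster_name, xid):
    """Build the identifier for PREPARE TRANSACTION / COMMIT PREPARED,
    encoding the originating cluster name and local XID so a leftover
    prepared transaction can later be traced back to its origin."""
    return f"{cluster_name}_{xid}"

def resolve_outstanding(gid, local_xact_status):
    """Decide what to do with a leftover prepared transaction.

    `local_xact_status(xid)` stands in for running pg_xact_status() on
    the originating server; it returns 'committed', 'aborted', or
    'in progress'.  Returns the SQL to run on the foreign server, or
    None if the local transaction is still in progress and the prepared
    transaction must be left alone.
    """
    # The cluster name tells us *which* server to ask; the XID tells us
    # *which* local transaction's outcome is authoritative.
    cluster_name, xid_str = gid.rsplit("_", 1)
    status = local_xact_status(int(xid_str))
    if status == "committed":
        return f"COMMIT PREPARED '{gid}'"
    if status == "aborted":
        return f"ROLLBACK PREPARED '{gid}'"
    return None
```

As noted above, this only works while the CLOG for the encoded XID survives; once VACUUM FREEZE truncates it, the lookup would have to fall back to a dedicated status-tracking table.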
Fujii-san,

On Thu, Oct 7, 2021 at 11:37 PM Fujii Masao <masao.fujii@oss.nttdata.com> wrote:
> On 2021/10/07 19:47, Etsuro Fujita wrote:
> > On Thu, Oct 7, 2021 at 1:29 PM k.jamison@fujitsu.com
> > <k.jamison@fujitsu.com> wrote:
> >> and prove that commit performance can be improved,
> >> e.g. if we can commit not in serial but also possible in parallel.
> >
> > If it's ok with you, I'd like to work on the performance issue. What
> > I have in mind is commit all remote transactions in parallel instead
> > of sequentially in the postgres_fdw transaction callback, as mentioned
> > above, but I think that would improve the performance even for
> > one-phase commit that we already have.
>
> +100

I've started working on this. Once I have a (POC) patch, I'll post it in a new thread, as I think it can be discussed separately. Thanks!

Best regards,
Etsuro Fujita