Re: Transactions involving multiple postgres foreign servers, take 2 - Mailing list pgsql-hackers
From | Masahiko Sawada |
---|---|
Subject | Re: Transactions involving multiple postgres foreign servers, take 2 |
Date | |
Msg-id | CAD21AoCKYaJNmJJxo=M04kFYUkcFaHf1-5jPGRnbhsu=0rb9-Q@mail.gmail.com Whole thread Raw |
In response to | Re: Transactions involving multiple postgres foreign servers, take 2 (Masahiko Sawada <sawada.mshk@gmail.com>) |
Responses |
Re: Transactions involving multiple postgres foreign servers, take 2
|
List | pgsql-hackers |
On Sun, Nov 8, 2020 at 2:11 PM Masahiko Sawada <sawada.mshk@gmail.com> wrote: > > On Thu, Nov 5, 2020 at 12:15 PM Masahiko Sawada <sawada.mshk@gmail.com> wrote: > > > > On Thu, Oct 22, 2020 at 10:39 AM Masahiko Sawada > > <masahiko.sawada@2ndquadrant.com> wrote: > > > > > > On Wed, 21 Oct 2020 at 18:33, tsunakawa.takay@fujitsu.com > > > <tsunakawa.takay@fujitsu.com> wrote: > > > > > > > > From: Masahiko Sawada <masahiko.sawada@2ndquadrant.com> > > > > > So what's your opinion? > > > > > > > > My opinion is simple and has not changed. Let's clarify and refine the design first in the following areas (othersmay have pointed out something else too, but I don't remember), before going deeper into the code review. > > > > > > > > * FDW interface > > > > New functions so that other FDWs can really implement. Currently, XA seems to be the only model we can rely on tovalidate the FDW interface. > > > > What FDW function would call what XA function(s)? What should be the arguments for the FEW functions? > > > > > > I guess since FDW interfaces may be affected by the feature > > > architecture we can discuss later. > > > > > > > * Performance > > > > Parallel prepare and commits on the client backend. The current implementation is untolerable and should not bethe first release quality. I proposed the idea. > > > > (If you insist you don't want to anything about this, I have to think you're just rushing for the patch commit. I want to keep Postgres's reputation.) > > > > > > What is in your mind regarding the implementation of parallel prepare > > > and commit? Given that some FDW plugins don't support asynchronous > > > execution I guess we need to use parallel workers or something. That > > > is, the backend process launches parallel workers to > > > prepare/commit/rollback foreign transactions in parallel. I don't deny > > > this approach but it'll definitely make the feature complex and needs > > > more codes. > > > > > > My point is a small start and keeping simple the first version. Even > > > if we need one or more years for this feature, I think that > > > introducing the simple and minimum functionality as the first version > > > to the core still has benefits. We will be able to have the > > > opportunity to get real feedback from users and to fix bugs in the > > > main infrastructure before making it complex. In this sense, the patch > > > having the backend return without waits for resolution after the local > > > commit would be a good start as the first version (i.g., up to > > > applying v26-0006 patch). Anyway, the architecture should be > > > extensible enough for future improvements. > > > > > > For the performance improvements, we will be able to support > > > asynchronous and/or prepare/commit/rollback. Moreover, having multiple > > > resolver processes on one database would also help get better > > > through-put. For the user who needs much better through-put, the user > > > also can select not to wait for resolution after the local commit, > > > like synchronous_commit = ‘local’ in replication. > > > > > > > As part of this, I'd like to see the 2PC's message flow and disk writes (via email and/or on the following wiki.) That helps evaluate the 2PC performance, because it's hard to figure it out in the code of a large patch set. I'msimply imagining what is typically written in database textbooks and research papers. I'm asking this because I saw somediscussion in this thread that some new WAL records are added. I was worried that transactions have to write WAL recordsother than prepare and commit unlike textbook implementations. > > > > > > > > Atomic Commit of Distributed Transactions > > > > https://wiki.postgresql.org/wiki/Atomic_Commit_of_Distributed_Transactions > > > > > > Understood. I'll add an explanation about the message flow and disk > > > writes to the wiki page. > > > > Done. > > > > > > > > We need to consider the point of error handling during resolving > > > foreign transactions too. > > > > > > > > > > > > I don’t think we need to stipulate the query cancellation. Anyway I > > > > > guess the facts neither that we don’t stipulate anything about query > > > > > cancellation now nor that postgres_fdw might not be cancellable in > > > > > some situations now are not a reason for not supporting query > > > > > cancellation. If it's a desirable behavior and users want it, we need > > > > > to put an effort to support it as much as possible like we’ve done in > > > > > postgres_fdw. Some FDWs unfortunately might not be able to support it > > > > > only by their functionality but it would be good if we can achieve > > > > > that by combination of PostgreSQL and FDW plugins. > > > > > > > > Let me comment on this a bit; this is a bit dangerous idea, I'm afraid. We need to pay attention to the FDW interfaceand its documentation so that FDW developers can implement what we consider important -- query cancellation in yourdiscussion. "postgres_fdw is OK, so the interface is good" can create interfaces that other FDW developers can't use. That's what Tomas Vondra pointed out several years ago. > > > > > > I suspect the story is somewhat different. libpq fortunately supports > > > asynchronous execution, but when it comes to canceling the foreign > > > transaction resolution I think basically all FDW plugins are in the > > > same situation at this time. We can choose whether to make it > > > cancellable or not. According to the discussion so far, it completely > > > depends on the architecture of this feature. So my point is whether > > > it's worth to have this functionality for users and whether users want > > > it, not whether postgres_fdw is ok. > > > > > > > I've thought again about the idea that once the backend failed to > > resolve a foreign transaction it leaves to a resolver process. With > > this idea, the backend process perform the 2nd phase of 2PC only once. > > If an error happens during resolution it leaves to a resolver process > > and returns an error to the client. We used to use this idea in the > > previous patches and it’s discussed sometimes. > > > > First of all, this idea doesn’t resolve the problem of error handling > > that the transaction could return an error to the client in spite of > > having been committed the local transaction. There is an argument that > > this behavior could also happen even in a single server environment > > but I guess the situation is slightly different. Basically what the > > transaction does after the commit is cleanup. An error could happen > > during cleanup but if it happens it’s likely due to a bug of > > something wrong inside PostgreSQL or OS. On the other hand, during and > > after resolution the transaction does major works such as connecting a > > foreign server, sending an SQL, getting the result, and writing a WAL > > to remove the entry. These are more likely to happen an error. > > > > Also, with this idea, the client needs to check if the error got from > > the server is really true because the local transaction might have > > been committed. Although this could happen even in a single server > > environment how many users check that in practice? If a server > > crashes, subsequent transactions end up failing due to a network > > connection error but it seems hard to distinguish between such a real > > error and the fake error. > > > > Moreover, it’s questionable in terms of extensibility. We would not > > able to support keeping waiting for distributed transactions to > > complete even if an error happens, like synchronous replication. The > > user might want to wait in case where the failure is temporary such as > > temporary network disconnection. Trying resolution only once seems to > > have cons of both asynchronous and synchronous resolutions. > > > > So I’m thinking that with this idea the user will need to change their > > application so that it checks if the error they got is really true, > > which is cumbersome for users. Also, it seems to me we need to > > circumspectly discuss whether this idea could weaken extensibility. > > > > > > Anyway, according to the discussion, it seems to me that we got a > > consensus so far that the backend process prepares all foreign > > transactions and a resolver process is necessary to resolve in-doubt > > transaction in background. So I’ve changed the patch set as follows. > > Applying these all patches, we can support asynchronous foreign > > transaction resolution. That is, at transaction commit the backend > > process prepares all foreign transactions, and then commit the local > > transaction. After that, it returns OK of commit to the client while > > leaving the prepared foreign transaction to a resolver process. A > > resolver process fetches the foreign transactions to resolve and > > resolves them in background. Since the 2nd phase of 2PC is performed > > asynchronously a transaction that wants to see the previous > > transaction result needs to check its status. > > > > Here is brief explaination for each patches: > > > > v27-0001-Introduce-transaction-manager-for-foreign-transa.patch > > > > This commit adds the basic foreign transaction manager, > > CommitForeignTransaction, and RollbackForeignTransaction API. These > > APIs support only one-phase. With this change, FDW is able to control > > its transaction using the foreign transaction manager, not using > > XactCallback. > > > > v27-0002-postgres_fdw-supports-commit-and-rollback-APIs.patch > > > > This commit implements both CommitForeignTransaction and > > RollbackForeignTransaction APIs in postgres_fdw. Note that since > > PREPARE TRANSACTION is still not supported there is nothing the user > > newly is able to do. > > > > v27-0003-Recreate-RemoveForeignServerById.patch > > > > This commit recreates RemoveForeignServerById that was removed by > > b1d32d3e3. This is necessary because we need to check if there is a > > foreign transaction involved with the foreign server that is about to > > be removed. > > > > v27-0004-Add-PrepareForeignTransaction-API.patch > > > > This commit adds prepared foreign transaction support including WAL > > logging and recovery, and PrepareForeignTransaction API. With this > > change, the user is able to do 'PREPARE TRANSACTION’ and > > 'COMMIT/ROLLBACK PREPARED' commands on the transaction that involves > > foreign servers. But note that COMMIT/ROLLBACK PREPARED ends only the > > local transaction. It doesn't do anything for foreign transactions. > > Therefore, the user needs to resolve foreign transactions manually by > > executing the pg_resolve_foreign_xacts() SQL function which is also > > introduced by this commit. > > > > v27-0005-postgres_fdw-supports-prepare-API.patch > > > > This commit implements PrepareForeignTransaction API and makes > > CommitForeignTransaction and RollbackForeignTransaction supports > > two-phase commit. > > > > v27-0006-Add-GetPrepareId-API.patch > > > > This commit adds GetPrepareID API. > > > > v27-0007-Introduce-foreign-transaction-launcher-and-resol.patch > > > > This commit introduces foreign transaction resolver and launcher > > processes. With this change, the user doesn’t need to manually execute > > pg_resolve_foreign_xacts() function to resolve foreign transactions > > prepared by PREPARE TRANSACTION and left by COMMIT/ROLLBACK PREPARED. > > Instead, a resolver process automatically resolves them in background. > > > > v27-0008-Prepare-foreign-transactions-at-commit-time.patch > > > > With this commit, the transaction prepares foreign transactions marked > > as modified at transaction commit if foreign_twophase_commit is > > ‘required’. Previously the user needs to do PREPARE TRANSACTION and > > COMMIT/ROLLBACK PREPARED to use 2PC but it enables us to use 2PC > > transparently to the user. But the transaction returns OK of commit to > > the client after committing the local transaction and notifying the > > resolver process, without waits. Foreign transactions are > > asynchronously resolved by the resolver process. > > > > v27-0009-postgres_fdw-marks-foreign-transaction-as-modifi.patch > > > > With this commit, the transactions started via postgres_fdw are marked > > as modified, which is necessary to use 2PC. > > > > v27-0010-Documentation-update.patch > > v27-0011-Add-regression-tests-for-foreign-twophase-commit.patch > > > > Documentation update and regression tests. > > > > The missing piece from the previous version patch is synchronously > > transaction resolution. In the previous patch, foreign transactions > > are synchronously resolved by a resolver process. But since it's under > > discussion whether this is a good approach and I'm considering > > optimizing the logic it’s not included in the current patch set. > > > > > > Cfbot reported an error. I've attached the updated version patch set > to make cfbot happy. Since the previous version conflicts with the current HEAD I've attached the rebased version patch set. Regards, -- Masahiko Sawada EnterpriseDB: https://www.enterprisedb.com/
Attachment
- v29-0011-Add-regression-tests-for-foreign-twophase-commit.patch
- v29-0009-postgres_fdw-marks-foreign-transaction-as-modifi.patch
- v29-0006-Add-GetPrepareId-API.patch
- v29-0008-Prepare-foreign-transactions-at-commit-time.patch
- v29-0010-Documentation-update.patch
- v29-0007-Introduce-foreign-transaction-launcher-and-resol.patch
- v29-0005-postgres_fdw-supports-prepare-API.patch
- v29-0003-Recreate-RemoveForeignServerById.patch
- v29-0004-Add-PrepareForeignTransaction-API.patch
- v29-0002-postgres_fdw-supports-commit-and-rollback-APIs.patch
- v29-0001-Introduce-transaction-manager-for-foreign-transa.patch
pgsql-hackers by date: