Re: Transactions involving multiple postgres foreign servers, take 2 - Mailing list pgsql-hackers

From Masahiko Sawada
Subject Re: Transactions involving multiple postgres foreign servers, take 2
Date
Msg-id CAD21AoBxZ_6vViP7tDaPOBK8kopc0NoKuRspOx1QCM+6t3A+MQ@mail.gmail.com
Whole thread Raw
In response to Re: Transactions involving multiple postgres foreign servers, take 2  (Masahiko Sawada <sawada.mshk@gmail.com>)
Responses Re: Transactions involving multiple postgres foreign servers, take 2  (Masahiko Sawada <sawada.mshk@gmail.com>)
List pgsql-hackers
On Thu, Nov 5, 2020 at 12:15 PM Masahiko Sawada <sawada.mshk@gmail.com> wrote:
>
> On Thu, Oct 22, 2020 at 10:39 AM Masahiko Sawada
> <masahiko.sawada@2ndquadrant.com> wrote:
> >
> > On Wed, 21 Oct 2020 at 18:33, tsunakawa.takay@fujitsu.com
> > <tsunakawa.takay@fujitsu.com> wrote:
> > >
> > > From: Masahiko Sawada <masahiko.sawada@2ndquadrant.com>
> > > > So what's your opinion?
> > >
> > > My opinion is simple and has not changed.  Let's clarify and refine the design first in the following areas
(othersmay have pointed out something else too, but I don't remember), before going deeper into the code review. 
> > >
> > > * FDW interface
> > > New functions so that other FDWs can really implement.  Currently, XA seems to be the only model we can rely on
tovalidate the FDW interface. 
> > > What FDW function would call what XA function(s)?  What should be the arguments for the FEW functions?
> >
> > I guess since FDW interfaces may be affected by the feature
> > architecture we can discuss later.
> >
> > > * Performance
> > > Parallel prepare and commits on the client backend.  The current implementation is untolerable and should not be
thefirst release quality.  I proposed the idea. 
> > > (If you insist you don't want to anything about this, I have to think you're just rushing for the patch commit.
Iwant to keep Postgres's reputation.) 
> >
> > What is in your mind regarding the implementation of parallel prepare
> > and commit? Given that some FDW plugins don't support asynchronous
> > execution I guess we need to use parallel workers or something. That
> > is, the backend process launches parallel workers to
> > prepare/commit/rollback foreign transactions in parallel. I don't deny
> > this approach but it'll definitely make the feature complex and needs
> > more codes.
> >
> > My point is a small start and keeping simple the first version. Even
> > if we need one or more years for this feature, I think that
> > introducing the simple and minimum functionality as the first version
> > to the core still has benefits. We will be able to have the
> > opportunity to get real feedback from users and to fix bugs in the
> > main infrastructure before making it complex. In this sense, the patch
> > having the backend return without waits for resolution after the local
> > commit would be a good start as the first version (i.g., up to
> > applying v26-0006 patch). Anyway, the architecture should be
> > extensible enough for future improvements.
> >
> > For the performance improvements, we will be able to support
> > asynchronous and/or prepare/commit/rollback. Moreover, having multiple
> > resolver processes on one database would also help get better
> > through-put. For the user who needs much better through-put, the user
> > also can select not to wait for resolution after the local commit,
> > like synchronous_commit = ‘local’ in replication.
> >
> > > As part of this, I'd like to see the 2PC's message flow and disk writes (via email and/or on the following wiki.)
That helps evaluate the 2PC performance, because it's hard to figure it out in the code of a large patch set.  I'm
simplyimagining what is typically written in database textbooks and research papers.  I'm asking this because I saw
somediscussion in this thread that some new WAL records are added.  I was worried that transactions have to write WAL
recordsother than prepare and commit unlike textbook implementations. 
> > >
> > > Atomic Commit of Distributed Transactions
> > > https://wiki.postgresql.org/wiki/Atomic_Commit_of_Distributed_Transactions
> >
> > Understood. I'll add an explanation about the message flow and disk
> > writes to the wiki page.
>
> Done.
>
> >
> > We need to consider the point of error handling during resolving
> > foreign transactions too.
> >
> > >
> > > > I don’t think we need to stipulate the query cancellation. Anyway I
> > > > guess the facts neither that we don’t stipulate anything about query
> > > > cancellation now nor that postgres_fdw might not be cancellable in
> > > > some situations now are not a reason for not supporting query
> > > > cancellation. If it's a desirable behavior and users want it, we need
> > > > to put an effort to support it as much as possible like we’ve done in
> > > > postgres_fdw.  Some FDWs unfortunately might not be able to support it
> > > > only by their functionality but it would be good if we can achieve
> > > > that by combination of PostgreSQL and FDW plugins.
> > >
> > > Let me comment on this a bit; this is a bit dangerous idea, I'm afraid.  We need to pay attention to the FDW
interfaceand its documentation so that FDW developers can implement what we consider important -- query cancellation in
yourdiscussion.  "postgres_fdw is OK, so the interface is good" can create interfaces that other FDW developers can't
use. That's what Tomas Vondra pointed out several years ago. 
> >
> > I suspect the story is somewhat different. libpq fortunately supports
> > asynchronous execution, but when it comes to canceling the foreign
> > transaction resolution I think basically all FDW plugins are in the
> > same situation at this time. We can choose whether to make it
> > cancellable or not. According to the discussion so far, it completely
> > depends on the architecture of this feature. So my point is whether
> > it's worth to have this functionality for users and whether users want
> > it, not whether postgres_fdw is ok.
> >
>
> I've thought again about the idea that once the backend failed to
> resolve a foreign transaction it leaves to a resolver process. With
> this idea, the backend process perform the 2nd phase of 2PC only once.
> If an error happens during resolution it leaves to a resolver process
> and returns an error to the client. We used to use this idea in the
> previous patches and it’s discussed sometimes.
>
> First of all, this idea doesn’t resolve the problem of error handling
> that the transaction could return an error to the client in spite of
> having been committed the local transaction. There is an argument that
> this behavior could also happen even in a single server environment
> but I guess the situation is slightly different. Basically what the
> transaction does after the commit is cleanup. An error could happen
> during cleanup but if it happens it’s likely due to a  bug of
> something wrong inside PostgreSQL or OS. On the other hand, during and
> after resolution the transaction does major works such as connecting a
> foreign server, sending an SQL, getting the result, and writing a WAL
> to remove the entry. These are more likely to happen an error.
>
> Also, with this idea, the client needs to check if the error got from
> the server is really true because the local transaction might have
> been committed. Although this could happen even in a single server
> environment how many users check that in practice? If a server
> crashes, subsequent transactions end up failing due to a network
> connection error but it seems hard to distinguish between such a real
> error and the fake error.
>
> Moreover, it’s questionable in terms of extensibility. We would not
> able to support keeping waiting for distributed transactions to
> complete even if an error happens, like synchronous replication. The
> user might want to wait in case where the failure is temporary such as
> temporary network disconnection. Trying resolution only once seems to
> have cons of both asynchronous and synchronous resolutions.
>
> So I’m thinking that with this idea the user will need to change their
> application so that it checks if the error they got is really true,
> which is cumbersome for users. Also, it seems to me we need to
> circumspectly discuss whether this idea could weaken extensibility.
>
>
> Anyway, according to the discussion, it seems to me that we got a
> consensus so far that the backend process prepares all foreign
> transactions and a resolver process is necessary to resolve in-doubt
> transaction in background. So I’ve changed the patch set as follows.
> Applying these all patches, we can support asynchronous foreign
> transaction resolution. That is, at transaction commit the backend
> process prepares all foreign transactions, and then commit the local
> transaction. After that, it returns OK of commit to the client while
> leaving the prepared foreign transaction to a resolver process. A
> resolver process fetches the foreign transactions to resolve and
> resolves them in background. Since the 2nd phase of 2PC is performed
> asynchronously a transaction that wants to see the previous
> transaction result needs to check its status.
>
> Here is brief explaination for each patches:
>
> v27-0001-Introduce-transaction-manager-for-foreign-transa.patch
>
> This commit adds the basic foreign transaction manager,
> CommitForeignTransaction, and RollbackForeignTransaction API. These
> APIs support only one-phase. With this change, FDW is able to control
> its transaction using the foreign transaction manager, not using
> XactCallback.
>
> v27-0002-postgres_fdw-supports-commit-and-rollback-APIs.patch
>
> This commit implements both CommitForeignTransaction and
> RollbackForeignTransaction APIs in postgres_fdw. Note that since
> PREPARE TRANSACTION is still not supported there is nothing the user
> newly is able to do.
>
> v27-0003-Recreate-RemoveForeignServerById.patch
>
> This commit recreates RemoveForeignServerById that was removed by
> b1d32d3e3. This is necessary because we need to check if there is a
> foreign transaction involved with the foreign server that is about to
> be removed.
>
> v27-0004-Add-PrepareForeignTransaction-API.patch
>
> This commit adds prepared foreign transaction support including WAL
> logging and recovery, and PrepareForeignTransaction API. With this
> change, the user is able to do 'PREPARE TRANSACTION’ and
> 'COMMIT/ROLLBACK PREPARED' commands on the transaction that involves
> foreign servers. But note that COMMIT/ROLLBACK PREPARED ends only the
> local transaction. It doesn't do anything for foreign transactions.
> Therefore, the user needs to resolve foreign transactions manually by
> executing the pg_resolve_foreign_xacts() SQL function which is also
> introduced by this commit.
>
> v27-0005-postgres_fdw-supports-prepare-API.patch
>
> This commit implements PrepareForeignTransaction API and makes
> CommitForeignTransaction and RollbackForeignTransaction supports
> two-phase commit.
>
> v27-0006-Add-GetPrepareId-API.patch
>
> This commit adds GetPrepareID API.
>
> v27-0007-Introduce-foreign-transaction-launcher-and-resol.patch
>
> This commit introduces foreign transaction resolver and launcher
> processes. With this change, the user doesn’t need to manually execute
> pg_resolve_foreign_xacts() function to resolve foreign transactions
> prepared by PREPARE TRANSACTION and left by COMMIT/ROLLBACK PREPARED.
> Instead, a resolver process automatically resolves them in background.
>
> v27-0008-Prepare-foreign-transactions-at-commit-time.patch
>
> With this commit, the transaction prepares foreign transactions marked
> as modified at transaction commit if foreign_twophase_commit is
> ‘required’. Previously the user needs to do PREPARE TRANSACTION and
> COMMIT/ROLLBACK PREPARED to use 2PC but it enables us to use 2PC
> transparently to the user. But the transaction returns OK of commit to
> the client after committing the local transaction and notifying the
> resolver process, without waits. Foreign transactions are
> asynchronously resolved by the resolver process.
>
> v27-0009-postgres_fdw-marks-foreign-transaction-as-modifi.patch
>
> With this commit, the transactions started via postgres_fdw are marked
> as modified, which is necessary to use 2PC.
>
> v27-0010-Documentation-update.patch
> v27-0011-Add-regression-tests-for-foreign-twophase-commit.patch
>
> Documentation update and regression tests.
>
> The missing piece from the previous version patch is synchronously
> transaction resolution. In the previous patch, foreign transactions
> are synchronously resolved by a resolver process. But since it's under
> discussion whether this is a good approach and I'm considering
> optimizing the logic it’s not included in the current patch set.
>
>

Cfbot reported an error. I've attached the updated version patch set
to make cfbot happy.

Regards,

--
Masahiko Sawada
EnterpriseDB:  https://www.enterprisedb.com/

Attachment

pgsql-hackers by date:

Previous
From: Tom Lane
Date:
Subject: Re: Yet another (minor) fix in BRIN
Next
From: Dilip Kumar
Date:
Subject: Re: Re: parallel distinct union and aggregate support patch