Re: Transactions involving multiple postgres foreign servers, take 2 - Mailing list pgsql-hackers

From Masahiko Sawada
Subject Re: Transactions involving multiple postgres foreign servers, take 2
Date
Msg-id CA+fd4k7c8780Q_fHReeyP9N8x+iG2zuEysAxJvTHXvQDjsPbjA@mail.gmail.com
In response to Re: Transactions involving multiple postgres foreign servers, take 2  (Fujii Masao <masao.fujii@oss.nttdata.com>)
Responses Re: Transactions involving multiple postgres foreign servers, take 2  (Ashutosh Bapat <ashutosh.bapat.oss@gmail.com>)
RE: Transactions involving multiple postgres foreign servers, take 2  ("tsunakawa.takay@fujitsu.com" <tsunakawa.takay@fujitsu.com>)
List pgsql-hackers
On Fri, 11 Sep 2020 at 11:58, Fujii Masao <masao.fujii@oss.nttdata.com> wrote:
>
>
>
> On 2020/09/11 0:37, Masahiko Sawada wrote:
> > On Tue, 8 Sep 2020 at 13:00, tsunakawa.takay@fujitsu.com
> > <tsunakawa.takay@fujitsu.com> wrote:
> >>
> >> From: Amit Kapila <amit.kapila16@gmail.com>
> >>> I intend to say that the global-visibility work can impact this in a
> >>> major way and we have analyzed that to some extent during a discussion
> >>> on the other thread. So, I think without having a complete
> >>> design/solution that addresses both the 2PC and global-visibility, it
> >>> is not apparent what is the right way to proceed. It seems to me that
> >>> rather than working on individual (or smaller) parts one needs to come
> >>> up with a bigger picture (or overall design) and then once we have
> >>> figured that out correctly, it would be easier to decide which parts
> >>> can go first.
> >>
> >> I'm really sorry I've been getting late and late and late x10 to publish the revised scale-out design wiki to discuss the big picture!  I don't know why I'm taking this long; I feel as if I were captive in a time prison (yes, nobody is holding me captive; I'm just late.)  Please wait a few days.
> >>
> >> But to proceed with the development, let me comment on the atomic commit and global visibility.
> >>
> >> * We have to hear from Andrey about their check on the possibility that Clock-SI could be Microsoft's patent and if we can avoid it.
> >>
> >> * I have a feeling that we can adopt the algorithm used by Spanner, CockroachDB, and YugabyteDB.  That is, 2PC for multi-node atomic commit, Paxos or Raft for replica synchronization (in the process of commit) to make 2PC more highly available, and timestamp-based global visibility.  However, the timestamp-based approach shuts the database instance down when the node's clock is distant from the other nodes'.
> >>
> >> * Or, maybe we can use the following Commitment ordering, which doesn't require timestamps or any other information to be transferred among the cluster nodes.  However, it seems to have to track the order of read and write operations among concurrent transactions to ensure the correct commit order, so I'm not sure about the performance.  The MVCO paper seems to present the information we need, but I haven't understood it well yet (it's difficult.)  Could anybody kindly interpret this?
> >>
> >> Commitment ordering (CO) - yoavraz2
> >> https://sites.google.com/site/yoavraz2/the_principle_of_co
> >>
> >>
> >> As for Sawada-san's 2PC patch, which I find interesting purely as an FDW enhancement, I raised the following issues to be addressed:
> >>
> >> 1. Make the FDW API implementable by FDWs other than postgres_fdw (this is what Amit-san kindly pointed out.)  I think oracle_fdw and jdbc_fdw would be good examples to consider, while MySQL may not be a good one because it exposes the XA feature as SQL statements, not as C functions as defined in the XA specification.
> >
> > I agree that we need to verify that the new FDW APIs will be
> > suitable for FDWs other than postgres_fdw as well.
> >
> >>
> >> 2. 2PC processing is queued and serialized in one background worker.  That severely subdues transaction throughput.  Each backend should perform 2PC.
> >
> > I'm not sure it's safe for each backend to perform PREPARE and
> > COMMIT PREPARED, since the current design aims to avoid an
> > inconsistency between the actual transaction result and the result
> > the user sees.
>
> Can I check my understanding about why the resolver process is necessary?
>
> Firstly, you think that issuing the COMMIT PREPARED command to the foreign server can cause an error, for example because of a connection error, OOM, etc. On the other hand, only waiting for another process to issue the command is less likely to cause an error. Right?
>
> If an error occurs in the backend process after the commit record is WAL-logged, the error would be reported to the client, and it may misunderstand that the transaction failed even though the commit record was already flushed. So you think that each backend should not issue the COMMIT PREPARED command, to avoid that inconsistency. Instead, it's better to make another process, the resolver, issue the command and just make each backend wait for that to complete. Right?
>
> Also, using the resolver process has another merit; when there are unresolved foreign transactions but the corresponding backend has exited, the resolver can try to resolve them. If something like this automatic resolution is necessary, a process like the resolver would be necessary. Right?
>
> To the contrary, if we don't need such automatic resolution (i.e., unresolved foreign transactions always need to be resolved manually) and we can prevent the code that issues the COMMIT PREPARED command from causing an error (not sure if that's possible, though...), we probably don't need the resolver process. Right?

Yes, I'm on the same page about all the above explanations.

The resolver process has two functionalities: resolving foreign
transactions automatically when the user issues COMMIT (the case you
described in the second paragraph), and resolving foreign transactions
when the corresponding backend no longer exists or when the server
crashes in the middle of 2PC (described in the third paragraph).
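
To make the two roles concrete, here is a rough simulation in Python.
This is purely illustrative, not the actual patch code: the class and
method names (FdwXact, Resolver, request_resolution, etc.) are all
hypothetical stand-ins.

```python
# Hypothetical sketch of the resolver's two roles; not the patch code.
import queue

class FdwXact:
    """A foreign transaction left in the prepared state."""
    def __init__(self, xid, server):
        self.xid = xid
        self.server = server
        self.resolved = False

class Resolver:
    def __init__(self):
        # Role 1: resolution requests from backends that issued COMMIT.
        self.pending = queue.Queue()
        # Role 2: prepared xacts whose backend exited or whose server
        # crashed in the middle of 2PC.
        self.orphaned = []

    def request_resolution(self, xact):
        # A committing backend enqueues its prepared foreign
        # transactions and waits for the resolver to finish them.
        self.pending.put(xact)

    def run_once(self):
        # Role 1: resolve on behalf of committing backends.
        while not self.pending.empty():
            self.commit_prepared(self.pending.get())
        # Role 2: clean up transactions left behind by exited backends.
        for xact in self.orphaned:
            self.commit_prepared(xact)
        self.orphaned.clear()

    def commit_prepared(self, xact):
        # Stand-in for issuing COMMIT PREPARED on the foreign server.
        xact.resolved = True
```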

Considering a design without the resolver process, I think we can
easily replace the latter with manual resolution. OTOH, it's not
easy for the former. I have no better design idea for now,
although, as you described, if we could ensure that the process
doesn't raise an error while resolving foreign transactions after
committing the local transaction, we would not need the resolver
process.

Or a second idea would be for the backend to commit only the local
transaction and then return the acknowledgment of COMMIT to the user
without resolving the foreign transactions. The user then manually
resolves the foreign transactions by, for example, using the SQL
function pg_resolve_foreign_xact() within a separate transaction. That
way, even if an error occurred while resolving foreign transactions
(e.g., executing COMMIT PREPARED), it's okay, as the user is already
aware that the local transaction has been committed and can retry
resolving the unresolved foreign transactions. So we won't need the
resolver process while still avoiding such inconsistency.
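
A rough sketch of that flow (again a hypothetical Python simulation;
pg_resolve_foreign_xact here is only a stand-in for the SQL function,
and its real signature may differ):

```python
# Hypothetical sketch of the second idea: acknowledge COMMIT after the
# local commit only; the user resolves foreign xacts later and can
# safely retry on error.

class ForeignXact:
    def __init__(self, server):
        self.server = server
        self.state = "prepared"

def commit_local_only(local, foreign_xacts):
    # Backend: flush the local commit record and return immediately,
    # leaving the foreign transactions unresolved.
    local["committed"] = True
    return foreign_xacts

def pg_resolve_foreign_xact(xact):
    # User-invoked resolution, analogous to issuing COMMIT PREPARED.
    # Because the local commit was already acknowledged, an error here
    # does not confuse the client; the call can simply be retried.
    if xact.state == "prepared":
        xact.state = "committed"
    return xact.state == "committed"
```

Note that pg_resolve_foreign_xact() is idempotent in this sketch, so a
retry after a transient failure is harmless.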

But a drawback would be that the transaction commit doesn't ensure
that all foreign transactions are completed. Subsequent transactions
would need to check whether the previous distributed transaction has
completed before relying on its results. I'm not sure it's a good
design in terms of usability.
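
That check might look something like this (a hypothetical helper, just
to illustrate the burden placed on subsequent transactions):

```python
def distributed_results_visible(foreign_xacts):
    # foreign_xacts: list of {"server": ..., "state": "prepared"|"committed"}
    # A later transaction cannot assume the previous distributed
    # transaction completed; it must first verify that nothing is
    # left unresolved.
    return all(x["state"] == "committed" for x in foreign_xacts)
```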

Regards,

--
Masahiko Sawada            http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


