Re: Transactions involving multiple postgres foreign servers, take 2 - Mailing list pgsql-hackers

From Masahiko Sawada
Subject Re: Transactions involving multiple postgres foreign servers, take 2
Date
Msg-id CA+fd4k6z_KMAd0+qU+fM28eRaM8pnYKMsHnai5F2EamaxKfz4Q@mail.gmail.com
In response to RE: Transactions involving multiple postgres foreign servers, take 2  ("tsunakawa.takay@fujitsu.com" <tsunakawa.takay@fujitsu.com>)
Responses Re: Transactions involving multiple postgres foreign servers, take 2  (Fujii Masao <masao.fujii@oss.nttdata.com>)
RE: Transactions involving multiple postgres foreign servers, take 2  ("tsunakawa.takay@fujitsu.com" <tsunakawa.takay@fujitsu.com>)
List pgsql-hackers
On Tue, 8 Sep 2020 at 13:00, tsunakawa.takay@fujitsu.com
<tsunakawa.takay@fujitsu.com> wrote:
>
> From: Amit Kapila <amit.kapila16@gmail.com>
> > I intend to say that the global-visibility work can impact this in a
> > major way and we have analyzed that to some extent during a discussion
> > on the other thread. So, I think without having a complete
> > design/solution that addresses both the 2PC and global-visibility, it
> > is not apparent what is the right way to proceed. It seems to me that
> > rather than working on individual (or smaller) parts one needs to come
> > up with a bigger picture (or overall design) and then once we have
> > figured that out correctly, it would be easier to decide which parts
> > can go first.
>
> I'm really sorry I've been getting late and late and late x10 to
> publish the revised scale-out design wiki to discuss the big picture!
> I don't know why I'm taking this long time; I feel as if I were
> captive in a time prison (yes, nobody is holding me captive; I'm just
> late.)  Please wait a few days.
>
> But to proceed with the development, let me comment on the atomic commit and global visibility.
>
> * We have to hear from Andrey about their check on the possibility
> that Clock-SI could be Microsoft's patent and if we can avoid it.
>
> * I have a feeling that we can adopt the algorithm used by Spanner,
> CockroachDB, and YugabyteDB.  That is, 2PC for multi-node atomic
> commit, Paxos or Raft for replica synchronization (in the process of
> commit) to make 2PC more highly available, and the timestamp-based
> global visibility.  However, the timestamp-based approach makes the
> database instance shut down when the node's clock is distant from
> the other nodes.
>
> * Or, maybe we can use the following Commitment ordering that doesn't
> require the timestamp or any other information to be transferred
> among the cluster nodes.  However, this seems to have to track the
> order of read and write operations among concurrent transactions to
> ensure the correct commit order, so I'm not sure about the
> performance.  The MVCO paper seems to present the information we
> need, but I haven't understood it well yet (it's difficult.)  Could
> anybody kindly interpret this?
>
> Commitment ordering (CO) - yoavraz2
> https://sites.google.com/site/yoavraz2/the_principle_of_co
>
>
> As for Sawada-san's 2PC patch, which I find interesting purely as an
> FDW enhancement, I raised the following issues to be addressed:
>
> 1. Make FDW API implementable by other FDWs than postgres_fdw (this
> is what Amit-san kindly pointed out.)  I think oracle_fdw and
> jdbc_fdw would be good examples to consider, while MySQL may not be
> good because it exposes the XA feature as SQL statements, not C
> functions as defined in the XA specification.

I agree that we need to verify that the new FDW APIs are also suitable
for FDWs other than postgres_fdw.

>
> 2. 2PC processing is queued and serialized in one background worker.
> That severely subdues transaction throughput.  Each backend should
> perform 2PC.

I'm not sure it's safe for each backend to perform PREPARE and COMMIT
PREPARED itself, since the current design is intended to avoid an
inconsistency between the actual transaction result and the result
the user sees. But in the future, I think we can have multiple
background workers per database for better performance.
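
For readers following along, the PREPARE / COMMIT PREPARED sequence
discussed here can be sketched roughly as below. This is only an
illustration, not code from the patch: FakeServer and two_phase_commit
are hypothetical names, and the servers just record the SQL they would
receive. The SQL commands themselves (PREPARE TRANSACTION, COMMIT
PREPARED, ROLLBACK PREPARED) are the standard PostgreSQL ones.

```python
from dataclasses import dataclass, field


@dataclass
class FakeServer:
    """Hypothetical stand-in for a foreign server; records SQL instead of running it."""
    name: str
    fail_on_prepare: bool = False
    log: list = field(default_factory=list)

    def execute(self, sql: str) -> None:
        # Simulate a participant that cannot prepare.
        if self.fail_on_prepare and sql.startswith("PREPARE TRANSACTION"):
            raise RuntimeError(f"{self.name}: prepare failed")
        self.log.append(sql)


def two_phase_commit(servers, gid: str) -> bool:
    """Return True if all servers committed, False if everything was rolled back."""
    prepared = []
    try:
        # Phase 1: every participant must prepare successfully.
        for s in servers:
            s.execute(f"PREPARE TRANSACTION '{gid}'")
            prepared.append(s)
    except RuntimeError:
        # A prepare failed: roll back prepared participants, abort the others.
        for s in servers:
            if s in prepared:
                s.execute(f"ROLLBACK PREPARED '{gid}'")
            else:
                s.execute("ROLLBACK")
        return False
    # Phase 2: the outcome is now "commit" on every participant.
    for s in servers:
        s.execute(f"COMMIT PREPARED '{gid}'")
    return True
```

The sketch ignores what happens when the coordinator fails between the
two phases, which is the case where the user-visible result and the
actual result can diverge.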

>
> 3. postgres_fdw cannot detect remote updates when the UDF executed on a remote node updates data.

I assume you mean pushing the UDF down to a foreign server. If so, I
think we can handle this by improving postgres_fdw. In the current
patch, registering and unregistering a foreign server with a 2PC
group, and marking a foreign server as updated, are the FDW's
responsibility. So perhaps if we had a way to tell postgres_fdw that
the UDF might update the data on the foreign server, postgres_fdw
could mark the foreign server as updated if the UDF is shippable.
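
A rough sketch of that idea, under stated assumptions: ParticipantGroup
and note_remote_function_call are hypothetical names, not the patch's
API, and the may_write flag stands in for whatever mechanism would tell
postgres_fdw that a function might update remote data.

```python
class ParticipantGroup:
    """Hypothetical per-transaction registry of foreign servers."""

    def __init__(self):
        self._modified = {}  # server name -> True if it was marked as updated

    def register(self, server: str) -> None:
        # Registering does not by itself mark the server as updated.
        self._modified.setdefault(server, False)

    def unregister(self, server: str) -> None:
        self._modified.pop(server, None)

    def mark_modified(self, server: str) -> None:
        self.register(server)
        self._modified[server] = True

    def modified_servers(self) -> list:
        return sorted(s for s, m in self._modified.items() if m)


def note_remote_function_call(group, server: str, shippable: bool, may_write: bool) -> None:
    """Mark the server as updated only if the function runs remotely and may write."""
    group.register(server)
    if shippable and may_write:
        group.mark_modified(server)
```

For example, calling note_remote_function_call(group, "srv1", True,
True) would leave "srv1" in group.modified_servers(), while a
non-shippable or read-only function would only register the server.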

Regards,

--
Masahiko Sawada            http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


