Re: [HACKERS] pg_prepared_xact_status - Mailing list pgsql-hackers

From Konstantin Knizhnik
Subject Re: [HACKERS] pg_prepared_xact_status
Date
Msg-id 47c17b10-9936-f2e9-63e0-d669bcd2d32e@postgrespro.ru
Whole thread Raw
In response to Re: [HACKERS] pg_prepared_xact_status  (Craig Ringer <craig@2ndquadrant.com>)
List pgsql-hackers


On 29.09.2017 11:27, Craig Ringer wrote:
On 29 September 2017 at 15:57, Konstantin Knizhnik <k.knizhnik@postgrespro.ru> wrote:
 
So you are saying that Postgresql 2PC mechanism is not complete and user needs to maintain some extra information to make it work?

No, it provides what's needed for an implementation of what in XA terms is a local resource manager (LRM). What it does not provide is infrastructure to make postgres its self into a global transaction manager (GTM) for co-ordinating multiple LRMs.

It sounds like you're trying to build a GTM using PostgreSQL's existing LRM book-keeping as your authorative data store, right?

No exactly. I am trying to add 2PC to our pg_shardman: combination of pg_pathman + postgres_fdw + logical replication, which should provide HA and write scalability.
This architecture definitely not assume presence of GTM. Most of transactions are expected to be local (involves only one node) and number of participants of distributed transaction is expected to be much smaller than total number of nodes (usually 2). So we need to perform 2PC without GTM.

 
The problems with 2PC arrive when coordinator node is not available but is expected to be recovered in future.
In this case we may have not enough information to make a decision whether to abort or commit prepared transaction.
But it is a different story. We need to use 3PC or some other protocol to prevent such situation.

In that case the node sits and waits patiently for the GTM (or in more complex architectures, *a* valid voting quorum of GTMs) to be reachable again. Likely using a protocol like Raft, Paxos, 3PC etc to co-ordinate.

It can't do anything else, since if it unilaterally commits or rolls back it might later find out that the nodes on the other side of the network partition or whatever made the opposite decision and, boom!

Ok, I am not sure if  pg_prepared_xact_status can be really useful or not.
I agree with you that if we are implementing distributed transaction on top of Poasgres, then we need some better mechanism to determine transaction state.
But a lot of people are using 2PC without GTM or whatever else. For example, many Java ORMs are using 2PC for their transactions.
I think that it is better to provide to DBA or programmer some way to determine status of such transaction by GID (which is usually unique and known), as far as this information
is available in Postgres WAL.

In any case, I attached slightly improved version of this function which traverse log not only since last checkpoint, but also try iterates backward inspecting previous WAL segments.

-- 
Konstantin Knizhnik
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company 
Attachment

pgsql-hackers by date:

Previous
From: Maksim Milyutin
Date:
Subject: Re: [HACKERS] Partitions: \d vs \d+
Next
From: Pavel Stehule
Date:
Subject: Re: [HACKERS] SQL/JSON in PostgreSQL