Re: [HACKERS] pg_prepared_xact_status - Mailing list pgsql-hackers

From Konstantin Knizhnik
Subject Re: [HACKERS] pg_prepared_xact_status
Msg-id 424501b5-4da5-9b56-04d1-54aa419b4eff@postgrespro.ru
In response to Re: [HACKERS] pg_prepared_xact_status  (Michael Paquier <michael.paquier@gmail.com>)
Responses Re: [HACKERS] pg_prepared_xact_status
Re: [HACKERS] pg_prepared_xact_status
List pgsql-hackers

On 29.09.2017 06:02, Michael Paquier wrote:
> On Fri, Sep 29, 2017 at 1:53 AM, Konstantin Knizhnik
> <k.knizhnik@postgrespro.ru> wrote:
>> In Postgres 10 we have txid_status function which returns status of
>> transaction by XID.
>> I wonder if it will be also useful to have similar function for 2PC
>> transactions which can operate with GID?
>> pg_prepared_xacts view allows to get information about prepared transaction
>> which are not yet committed or aborted.
>> But if transaction is committed, then there is no way now to find status of
>> this transaction.
> But you need to keep track of the transaction XID of each transaction
> happening on the remote nodes which are part of a global 2PC
> transaction, no?

Why? We have the GID, which identifies a 2PC transaction at all
participant nodes.

>   If you have this data at hand using txid_status is
> enough to guess if a prepared transaction has been marked as committed
> or prepared. And it seems to me that tracking those XIDs is mandatory
> anyway for other consistency checks.

It is certainly possible to maintain information about the XIDs involved
in a 2PC transaction, and it can really simplify recovery. But why is it
mandatory?
Keeping track of XIDs requires some persistent storage.
So are you saying that the PostgreSQL 2PC mechanism is not complete and
the user needs to maintain some extra information to make it work?

Also, I think it is not necessary to know the XIDs of all local
transactions involved in 2PC. It is enough to know the XID of the
coordinator's transaction.
It can be included in the GID (as I proposed at the end of my mail). In
this case txid_status can be used at the coordinator to check the global
status of the 2PC transaction.
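
A minimal sketch of such a GID encoding, written in Python for brevity.
The "prefix:sysid:xid" layout and the helper names make_gid and parse_gid
are assumptions made for illustration; PostgreSQL does not define any GID
format and only requires the GID to be less than 200 bytes long.

```python
from typing import Tuple

# PREPARE TRANSACTION requires the GID to be less than 200 bytes long.
GID_MAX_BYTES = 200


def make_gid(system_id: int, coordinator_xid: int, prefix: str = "2pc") -> str:
    """Build a GID carrying the coordinator's system identifier and txid.

    The layout "prefix:system_id:xid" is hypothetical, chosen for this sketch.
    """
    if ":" in prefix:
        raise ValueError("prefix must not contain ':'")
    gid = f"{prefix}:{system_id}:{coordinator_xid}"
    if len(gid.encode("utf-8")) >= GID_MAX_BYTES:
        raise ValueError("GID must be less than 200 bytes long")
    return gid


def parse_gid(gid: str) -> Tuple[int, int]:
    """Recover (system_id, coordinator_xid) from a GID built by make_gid."""
    _prefix, system_id, xid = gid.split(":")
    return int(system_id), int(xid)
```

A participant that finds an orphaned prepared transaction could parse its
GID this way, connect to the node with that system identifier and call
txid_status() there with the recovered XID.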

The idea of the pg_prepared_xact_status function is that it returns the
status of a 2PC transaction without imposing any requirements on GIDs and
without needing any additional information about the participants of the
2PC transaction.


>
>> If crash happen during 2PC commit, then transaction can be in prepared state
>> at some nodes and committed/aborted at  other nodes.
> Handling inconsistencies here is a tricky problem, particularly if a
> given transaction is marked as both committed and aborted on many
> nodes.
How can that be?
A transaction can be aborted only at the prepare stage.
In this case the coordinator should roll back the transaction everywhere.
There should be no committed transactions in this case.

The following situations are possible:
1. The transaction is prepared at some nodes and no information about it
is available at other nodes. It means that the crash happened at the
prepare stage and the transaction was not able to complete prepare at all
nodes. It is safe to abort the transaction in this case.
2. The transaction is prepared at some nodes and aborted at other nodes.
The same as 1: we can safely abort the transaction everywhere.
3. The transaction is prepared at all nodes. It means that the coordinator
crashed before sending the commit message. It is safe to commit the
transaction everywhere.
4. The transaction is prepared at some nodes and committed at other nodes.
The commit message was not delivered to or processed by the other nodes
before the crash. It is safe to commit the transaction at all nodes.
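
The four cases above can be sketched as a small decision function. This
is only an illustration in Python, assuming the per-node statuses have
already been collected from every participant; the status names and the
function name resolve are hypothetical and not part of any PostgreSQL API.

```python
from typing import Iterable

# Hypothetical per-node status values used only in this sketch.
PREPARED = "prepared"
COMMITTED = "committed"
ABORTED = "aborted"
UNKNOWN = "unknown"  # the node has no information about this GID


def resolve(statuses: Iterable[str]) -> str:
    """Return "commit" or "abort" for a 2PC transaction after a crash."""
    seen = set(statuses)
    if not seen:
        raise ValueError("no participant statuses given")
    if COMMITTED in seen and ABORTED in seen:
        # Should be impossible: abort can happen only at the prepare stage.
        raise ValueError("inconsistent state: committed and aborted")
    if COMMITTED in seen:
        return "commit"  # case 4: finish the commit everywhere
    if ABORTED in seen or UNKNOWN in seen:
        return "abort"   # cases 1 and 2: prepare did not complete everywhere
    return "commit"      # case 3: prepared at all nodes
```

For example, resolve(["prepared", "unknown"]) gives "abort" and
resolve(["prepared", "committed"]) gives "commit".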


The problems with 2PC arise when the coordinator node is not available but
is expected to be recovered in the future.
In this case we may not have enough information to decide whether to
abort or commit a prepared transaction.
But that is a different story. We need to use 3PC or some other protocol
to prevent such a situation.

> The only way that I could think of would be to perform PITR to
> recover from the inconsistent states. So that's not an easy problem,
> becoming even more tricky if more than one transaction is involved and
> many transactions are inter-dependent across nodes.
>
>> 3. Same GID can be reused multiple times. In this case
>> pg_prepared_xact_status function will return incorrect result, because it
>> will return information about first global transaction with such GID after
>> checkpoint and not the recent one.
> Yeah, this argument alone is why I think that this is a dead-end approach.

Maybe. But I think that most real systems generate unique GIDs, because
otherwise it is difficult to address concurrency and recovery issues.

>
>> There is actually alternative approach to recovery of 2PC transactions. We
>> can include coordinator identifier in GID (we can use GetSystemIdentifier()
>> to identify coordinator's node)
>> and XID of coordinator's transaction. In this case we can use txid_status()
>> to check status of transaction at coordinator. It eliminates need to scan
>> WAL to determine status of prepared transaction.
> +    GetOldestRestartPoint(&lsn, &timeline);
> +
> +    xlogreader = XLogReaderAllocate(&read_local_xlog_page, NULL);
> +    if (!xlogreader)
> So you scan a bunch of records for each GID? This is really costly. I
> think that you would have an easier life by tracking the XID of each
> transaction involved remotely. In Postgres-XL, this is not a problem
> as XIDs are assigned globally and consistently. But you would gain in
> performance by keeping track of it on the coordinator node.

Yes, it can be costly.
But I just want to propose a more or less universal mechanism that
determines the status of a 2PC transaction based only on information
already present in the WAL, without requiring extra information stored in
the GID or in some other storage.

-- 
Konstantin Knizhnik
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company


