Thread: [HACKERS] pg_prepared_xact_status

[HACKERS] pg_prepared_xact_status

From

Konstantin Knizhnik

Date:

28 September 2017, 22:53:40

Hi,

In Postgres 10 we have txid_status function which returns status of 
transaction by XID.
I wonder if it will be also useful to have similar function for 2PC 
transactions which can operate with GID?
pg_prepared_xacts view allows to get information about prepared 
transaction which are not yet committed or aborted.
But if transaction is committed, then there is no way now to find status 
of this transaction.

If crash happen during 2PC commit, then transaction can be in prepared 
state at some nodes and committed/aborted at  other nodes.
Using pg_prepared_xacts view DBA can find out global transactions which 
were not completed.
But there is no way (except pg_waldump) to determine whether this 
transaction needs to be committed or aborted at rest of the nodes.

Attached please find small patch with pg_prepared_xact_status function.
This function has the following obvious drawbacks:

1. It is not able to extract information about prepared transaction 
preceding last checkpoint. It seems to be enough to perform recovery in 
case of failure unless
checkpoint happen just before failure or there is large gap between 
prepare and commit.
The only workaround I see at this moment is to pass to this function 
optional parameter with start position in the WAL.
Any better solution?

2. On systems with huge workload interval between checkpoints may be 
very large. In this case we have to scan large amount of WAL data to be 
able to locate our transaction.
Whoich make take significant amount of time.
We can traverse WAL in smarter way, starting from last segment, assuming 
that in-doubt transaction was prepared just before crash.
But it significantly complicates traverse logic.

3. Same GID can be reused multiple times. In this case 
pg_prepared_xact_status function will return incorrect result, because 
it will return information about first global transaction with such GID 
after checkpoint and not the recent one.


There is actually alternative approach to recovery of 2PC transactions. 
We can include coordinator identifier in GID (we can use 
GetSystemIdentifier() to identify coordinator's node)
and XID of coordinator's transaction. In this case we can use 
txid_status() to check status of transaction at coordinator. It 
eliminates need to scan WAL to determine status of prepared transaction.

-- 

Konstantin Knizhnik
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Attachment

pg_prepared-xact_status.patch

Re: [HACKERS] pg_prepared_xact_status

From

Michael Paquier

Date:

29 September 2017, 09:02:29

On Fri, Sep 29, 2017 at 1:53 AM, Konstantin Knizhnik
<k.knizhnik@postgrespro.ru> wrote:
> In Postgres 10 we have txid_status function which returns status of
> transaction by XID.
> I wonder if it will be also useful to have similar function for 2PC
> transactions which can operate with GID?
> pg_prepared_xacts view allows to get information about prepared transaction
> which are not yet committed or aborted.
> But if transaction is committed, then there is no way now to find status of
> this transaction.

But you need to keep track of the transaction XID of each transaction
happening on the remote nodes which are part of a global 2PC
transaction, no? If you have this data at hand using txid_status is
enough to guess if a prepared transaction has been marked as committed
or prepared. And it seems to me that tracking those XIDs is mandatory
anyway for other consistency checks.

> If crash happen during 2PC commit, then transaction can be in prepared state
> at some nodes and committed/aborted at  other nodes.

Handling inconsistencies here is a tricky problem, particularly if a
given transaction is marked as both committed and aborted on many
nodes. The only way that I could think of would be to perform PITR to
recover from the inconsistent states. So that's not an easy problem,
becoming even more tricky if more than one transaction is involved and
many transactions are inter-dependent across nodes.

> 3. Same GID can be reused multiple times. In this case
> pg_prepared_xact_status function will return incorrect result, because it
> will return information about first global transaction with such GID after
> checkpoint and not the recent one.

Yeah, this argument alone is why I think that this is a dead-end approach.

> There is actually alternative approach to recovery of 2PC transactions. We
> can include coordinator identifier in GID (we can use GetSystemIdentifier()
> to identify coordinator's node)
> and XID of coordinator's transaction. In this case we can use txid_status()
> to check status of transaction at coordinator. It eliminates need to scan
> WAL to determine status of prepared transaction.

+    GetOldestRestartPoint(&lsn, &timeline);
+
+    xlogreader = XLogReaderAllocate(&read_local_xlog_page, NULL);
+    if (!xlogreader)
So you scan a bunch of records for each GID? This is really costly. I
think that you would have an easier life by tracking the XID of each
transaction involved remotely. In Postgres-XL, this is not a problem
as XIDs are assigned globally and consistently. But you would gain in
performance by keeping track of it on the coordinator node.
-- 
Michael

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] pg_prepared_xact_status

From

Konstantin Knizhnik

Date:

29 September 2017, 13:57:28

On 29.09.2017 06:02, Michael Paquier wrote:
> On Fri, Sep 29, 2017 at 1:53 AM, Konstantin Knizhnik
> <k.knizhnik@postgrespro.ru> wrote:
>> In Postgres 10 we have txid_status function which returns status of
>> transaction by XID.
>> I wonder if it will be also useful to have similar function for 2PC
>> transactions which can operate with GID?
>> pg_prepared_xacts view allows to get information about prepared transaction
>> which are not yet committed or aborted.
>> But if transaction is committed, then there is no way now to find status of
>> this transaction.
> But you need to keep track of the transaction XID of each transaction
> happening on the remote nodes which are part of a global 2PC
> transaction, no?

Why? We have GID which allows to identify 2PC transaction at all 
participant nodes.

>   If you have this data at hand using txid_status is
> enough to guess if a prepared transaction has been marked as committed
> or prepared. And it seems to me that tracking those XIDs is mandatory
> anyway for other consistency checks.

It is certainly possible to maintain information about XIDs involved in 
2PC transaction.
And it can really simplify recovery. But I wonder why it is mandatory?
Keeping track of XIDs requires some persistent storage.
So you are saying that Postgresql 2PC mechanism is not complete and user 
needs to maintain some extra information to make it work?

Also, I think that it is not necessary to know XIDs of all local 
transactions involved in 2PC. It is enough to know XID of coordinator's 
transaction.
It can be included in GID (as I proposed in the end of my mail). In this 
case txid_status can be used at coordinator to check global status of 
2PC transaction.

The idea of pg_prepared_xact_status function is that it allows to get 
status of 2PC transaction without any additional requirements to GIDs 
and any other additional information about participants of 2PC transaction.

>
>> If crash happen during 2PC commit, then transaction can be in prepared state
>> at some nodes and committed/aborted at  other nodes.
> Handling inconsistencies here is a tricky problem, particularly if a
> given transaction is marked as both committed and aborted on many
> nodes.
How it can be?
Abort of transaction can happen only at prepare stage.
In this case coordinator should rollback transaction everywhere.
There should be no committed transactions in this case.

The following situations are possible:
1. Transaction is prepared at some nodes and information about it is not 
available at other nodes. It means that crash happen at prepare state 
and transaction was not able to
complete prepare at all nodes. It is safe to abort transaction in this case.
2. Transaction is prepared at some nodes and aborted at another nodes. 
The same as 1 - we can safely abort transaction everywhere.
3. Transaction is prepared at all nodes. It means that coordinator was 
crashed before sending commit message. It is safe to commit transaction 
everywhere.
4. Transaction is prepared at some nodes and committed at other nodes. 
Commit message was no delivered or proceeded by other nodes before crash.
It is safe to commit transaction at all nodes.

The problems with 2PC arrive when coordinator node is not available but 
is expected to be recovered in future.
In this case we may have not enough information to make a decision 
whether to abort or commit prepared transaction.
But it is a different story. We need to use 3PC or some other protocol 
to prevent such situation.

> The only way that I could think of would be to perform PITR to
> recover from the inconsistent states. So that's not an easy problem,
> becoming even more tricky if more than one transaction is involved and
> many transactions are inter-dependent across nodes.
>
>> 3. Same GID can be reused multiple times. In this case
>> pg_prepared_xact_status function will return incorrect result, because it
>> will return information about first global transaction with such GID after
>> checkpoint and not the recent one.
> Yeah, this argument alone is why I think that this is a dead-end approach.

May be. But I think that in most real systems unique GIDs are generated, 
because otherwise it is difficult to address concurrency and recovery 
issues.

>
>> There is actually alternative approach to recovery of 2PC transactions. We
>> can include coordinator identifier in GID (we can use GetSystemIdentifier()
>> to identify coordinator's node)
>> and XID of coordinator's transaction. In this case we can use txid_status()
>> to check status of transaction at coordinator. It eliminates need to scan
>> WAL to determine status of prepared transaction.
> +    GetOldestRestartPoint(&lsn, &timeline);
> +
> +    xlogreader = XLogReaderAllocate(&read_local_xlog_page, NULL);
> +    if (!xlogreader)
> So you scan a bunch of records for each GID? This is really costly. I
> think that you would have an easier life by tracking the XID of each
> transaction involved remotely. In Postgres-XL, this is not a problem
> as XIDs are assigned globally and consistently. But you would gain in
> performance by keeping track of it on the coordinator node.

Yes, it can be costly.
But I just want to propose more or less universal mechanism which to 
determine status of 2PC transaction based just on existed information in 
WAL and not requiring some extra information stored in GID or in some 
other storage.

-- 
Konstantin Knizhnik
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] pg_prepared_xact_status

From

Craig Ringer

Date:

29 September 2017, 14:22:01

On 29 September 2017 at 15:57, Konstantin Knizhnik <k.knizhnik@postgrespro.ru> wrote:

The idea of pg_prepared_xact_status function is that it allows to get status of 2PC transaction without any additional requirements to GIDs and any other additional information about participants of 2PC transaction.

This sounds kind-of like 1/4 of a distributed transaction resolver, without a way to make it reliable enough to build the other 3/4.

To make this practical I think you'd need a way to retain historic GIDs + their outcomes, and a way to prune that information only once an application knows all interested participants consider the transaction finalized.

I'd be all for a global xid status function if there were a good way to manage resource retention. But it's fuzzy enough for txid_status, which isn't really making any firm promises, just improving on the prior state of "no f'ing idea what happened to that tx, sorry". 2PC consumers will want absolute guarantees, not "dunno, sorry".

(Ahem, HeuristicMixedException raises its hand. You, sit down! You can only get that if you mix 2PC and !2PC xacts).

I can see it being useful for Pg to be able to report a stream of GIDs + outcomes for applications to consume. But unless you have either:

* some kind of app-controlled retention horizon and a way to multiplex it for >1 app (like slots do); or

* a node registry where Pg its self implements a full built-in transaction resolver;

then I think it's probably not going to get far.

I could see a full DTC resolver in postgres one day, once we have things like working in-core logical rep based multi-master with 2PC support. But that's a looooooong way off.

Craig Ringer http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services

Re: [HACKERS] pg_prepared_xact_status

From

Craig Ringer

Date:

29 September 2017, 14:27:58

On 29 September 2017 at 15:57, Konstantin Knizhnik <k.knizhnik@postgrespro.ru> wrote:

So you are saying that Postgresql 2PC mechanism is not complete and user needs to maintain some extra information to make it work?

No, it provides what's needed for an implementation of what in XA terms is a local resource manager (LRM). What it does not provide is infrastructure to make postgres its self into a global transaction manager (GTM) for co-ordinating multiple LRMs.

It sounds like you're trying to build a GTM using PostgreSQL's existing LRM book-keeping as your authorative data store, right?

The problems with 2PC arrive when coordinator node is not available but is expected to be recovered in future.
In this case we may have not enough information to make a decision whether to abort or commit prepared transaction.
But it is a different story. We need to use 3PC or some other protocol to prevent such situation.

In that case the node sits and waits patiently for the GTM (or in more complex architectures, *a* valid voting quorum of GTMs) to be reachable again. Likely using a protocol like Raft, Paxos, 3PC etc to co-ordinate.

It can't do anything else, since if it unilaterally commits or rolls back it might later find out that the nodes on the other side of the network partition or whatever made the opposite decision and, boom!

Craig Ringer http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services

Re: [HACKERS] pg_prepared_xact_status

From

Konstantin Knizhnik

Date:

29 September 2017, 15:31:48

On 29.09.2017 11:27, Craig Ringer wrote:

On 29 September 2017 at 15:57, Konstantin Knizhnik <k.knizhnik@postgrespro.ru> wrote:

So you are saying that Postgresql 2PC mechanism is not complete and user needs to maintain some extra information to make it work?

No, it provides what's needed for an implementation of what in XA terms is a local resource manager (LRM). What it does not provide is infrastructure to make postgres its self into a global transaction manager (GTM) for co-ordinating multiple LRMs.

It sounds like you're trying to build a GTM using PostgreSQL's existing LRM book-keeping as your authorative data store, right?

No exactly. I am trying to add 2PC to our pg_shardman: combination of pg_pathman + postgres_fdw + logical replication, which should provide HA and write scalability.
This architecture definitely not assume presence of GTM. Most of transactions are expected to be local (involves only one node) and number of participants of distributed transaction is expected to be much smaller than total number of nodes (usually 2). So we need to perform 2PC without GTM.

The problems with 2PC arrive when coordinator node is not available but is expected to be recovered in future.
In this case we may have not enough information to make a decision whether to abort or commit prepared transaction.
But it is a different story. We need to use 3PC or some other protocol to prevent such situation.

In that case the node sits and waits patiently for the GTM (or in more complex architectures, *a* valid voting quorum of GTMs) to be reachable again. Likely using a protocol like Raft, Paxos, 3PC etc to co-ordinate.

It can't do anything else, since if it unilaterally commits or rolls back it might later find out that the nodes on the other side of the network partition or whatever made the opposite decision and, boom!

Ok, I am not sure if pg_prepared_xact_status can be really useful or not.
I agree with you that if we are implementing distributed transaction on top of Poasgres, then we need some better mechanism to determine transaction state.
But a lot of people are using 2PC without GTM or whatever else. For example, many Java ORMs are using 2PC for their transactions.
I think that it is better to provide to DBA or programmer some way to determine status of such transaction by GID (which is usually unique and known), as far as this information
is available in Postgres WAL.

In any case, I attached slightly improved version of this function which traverse log not only since last checkpoint, but also try iterates backward inspecting previous WAL segments.

-- 
Konstantin Knizhnik
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company

Attachment

pg_prepared_xact_status.patch

Re: [HACKERS] pg_prepared_xact_status

From

Robert Haas

Date:

30 September 2017, 02:33:04

On Fri, Sep 29, 2017 at 4:22 AM, Craig Ringer <craig@2ndquadrant.com> wrote:
> This sounds kind-of like 1/4 of a distributed transaction resolver, without
> a way to make it reliable enough to build the other 3/4.
>
> To make this practical I think you'd need a way to retain historic GIDs +
> their outcomes, and a way to prune that information only once an application
> knows all interested participants consider the transaction finalized.
>
> I'd be all for a global xid status function if there were a good way to
> manage resource retention. But it's fuzzy enough for txid_status, which
> isn't really making any firm promises, just improving on the prior state of
> "no f'ing idea what happened to that tx, sorry". 2PC consumers will want
> absolute guarantees, not "dunno, sorry".

Very well said, and I agree.

I think the approach this patch takes is a non-starter for exactly the
reasons you have stated.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] pg_prepared_xact_status

From

konstantin knizhnik

Date:

30 September 2017, 12:10:53

On Sep 29, 2017, at 11:33 PM, Robert Haas wrote:

> On Fri, Sep 29, 2017 at 4:22 AM, Craig Ringer <craig@2ndquadrant.com> wrote:
>> This sounds kind-of like 1/4 of a distributed transaction resolver, without
>> a way to make it reliable enough to build the other 3/4.
>>
>> To make this practical I think you'd need a way to retain historic GIDs +
>> their outcomes, and a way to prune that information only once an application
>> knows all interested participants consider the transaction finalized.
>>
>> I'd be all for a global xid status function if there were a good way to
>> manage resource retention. But it's fuzzy enough for txid_status, which
>> isn't really making any firm promises, just improving on the prior state of
>> "no f'ing idea what happened to that tx, sorry". 2PC consumers will want
>> absolute guarantees, not "dunno, sorry".
>
> Very well said, and I agree.

txid_status() also not always be able to return status of transaction (if wraparound happen).
But it is still considered as one of the key features of 10 (transaction traceability...).

>
> I think the approach this patch takes is a non-starter for exactly the
> reasons you have stated.

Actually I do not propose pg_prepared_xact_status as mechanism for constructing GTM or some other complete 2PC
infrastructure.
It is just simple function, using existed PostgreSQL log iteration utilities, simplifying extraction of information
about2PC transactions. 
The same think can be done manually using pg_waldump. But it is very inconvenient.
So I do not see any troubles caused by adding this functions. And it can really be helpful for DBA in some cases.

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] pg_prepared_xact_status

From

Robert Haas

Date:

02 October 2017, 06:09:39

On Sat, Sep 30, 2017 at 2:10 AM, konstantin knizhnik
<k.knizhnik@postgrespro.ru> wrote:
> txid_status() also not always be able to return status of transaction (if wraparound happen).
> But it is still considered as one of the key features of 10 (transaction traceability...).

Not by me.  It's a feature, though, for sure.  It's also a LOT more
stable than what you're proposing.  Even on a busy system, it takes a
while to go through 200 million transactions; you probably can't
realistically do that in under an hour, and you'll probably raise the
200 million transaction limit if you're anywhere close to that rate.
In practice, you'll almost always be able to look up transactions for
several days, and often weeks or months.  With what you're proposing
here, the information could disappear nearly instantly.  I can't see
how that works out to a usable feature.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] pg_prepared_xact_status

From

Michael Paquier

Date:

02 October 2017, 06:16:22

On Mon, Oct 2, 2017 at 9:09 AM, Robert Haas <robertmhaas@gmail.com> wrote:
> On Sat, Sep 30, 2017 at 2:10 AM, konstantin knizhnik
> <k.knizhnik@postgrespro.ru> wrote:
>> txid_status() also not always be able to return status of transaction (if wraparound happen).
>> But it is still considered as one of the key features of 10 (transaction traceability...).
>
> Not by me.  It's a feature, though, for sure.  It's also a LOT more
> stable than what you're proposing.  Even on a busy system, it takes a
> while to go through 200 million transactions; you probably can't
> realistically do that in under an hour, and you'll probably raise the
> 200 million transaction limit if you're anywhere close to that rate.
> In practice, you'll almost always be able to look up transactions for
> several days, and often weeks or months.  With what you're proposing
> here, the information could disappear nearly instantly.  I can't see
> how that works out to a usable feature.

Also txid_status is way more performant as it requires only a clog
lookup. Going through potentially hundreds of WAL segments to get one
result status could take way longer than that. Even if network can
become easily the bottleneck when doing cross-node transaction
resolution, this would put too much load into the backends.
-- 
Michael


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] pg_prepared_xact_status

From

Craig Ringer

Date:

02 October 2017, 07:32:40

On 2 October 2017 at 08:09, Robert Haas <robertmhaas@gmail.com> wrote:

On Sat, Sep 30, 2017 at 2:10 AM, konstantin knizhnik
<k.knizhnik@postgrespro.ru> wrote:
> txid_status() also not always be able to return status of transaction (if wraparound happen).
> But it is still considered as one of the key features of 10 (transaction traceability...).

Not by me. It's a feature, though, for sure.

Same here. It's nice, and obviously I wanted it since I submitted it, but it's not a key feature by any stretch.

Even if Pg also reported the xid to the client when assigned it'd still be a nice utility, not a huge headline feature.

It's also a LOT more
stable than what you're proposing. Even on a busy system, it takes a
while to go through 200 million transactions; you probably can't
realistically do that in under an hour, and you'll probably raise the
200 million transaction limit if you're anywhere close to that rate.
In practice, you'll almost always be able to look up transactions for
several days, and often weeks or months.

Not necessarily, since it doesn't hold down or delay clog truncation, so if the system is really eagerly VACUUMed it might discard things sooner.

At the moment though we're quite lazy about clog truncation, mainly because we're very lazy about vacuuming template1 and template0 (which we should autofreeze, really) so they tend to hold down global datfrozenxid. If we get better about that, then we might need some way to ask Pg to keep extra clog. But for now it works well enough.

Craig Ringer http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services

Re: [HACKERS] pg_prepared_xact_status

From

Craig Ringer

Date:

02 October 2017, 07:38:32

On 30 September 2017 at 14:10, konstantin knizhnik <k.knizhnik@postgrespro.ru> wrote:

So I do not see any troubles caused by adding this functions. And it can really be helpful for DBA in some cases.

If it's self-contained and exposes existing functionality, then I'm not opposed, I just don't really see the point. I'm not seeing where it'd come in useful.

Craig Ringer http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services