Thread: BUG #18212: Functions txid_status() and pg_xact_status() return invalid status of the specified transaction
BUG #18212: Functions txid_status() and pg_xact_status() return invalid status of the specified transaction
From
PG Bug reporting form
Date:
The following bug has been logged on the website: Bug reference: 18212 Logged by: Egor Chindyaskin Email address: kyzevan23@mail.ru PostgreSQL version: 16.1 Operating system: Ubuntu 23.10 Description: Hello! Recently, Me and Andrew Bille came across a situation where the txid_status() and pg_xact_status() functions return invalid status of the specified transaction if xid is very far ahead. The following script reproduces the situation: # This script uses pg_resetwal and dd. This is a hack to move xid far ahead PGDATA=/tmp/pgsql/data killall postgres rm -rf $PGDATA initdb -U postgres -k -D $PGDATA pg_ctl -D $PGDATA -l log start psql -U postgres -c "UPDATE pg_database SET datallowconn = 't' WHERE datname = 'template0';" pg_ctl -D $PGDATA -l log stop pg_resetwal -x 536870911 -l 000000010000000000000004 $PGDATA dd if=/dev/zero of=$PGDATA/pg_xact/01FF bs=8192 count=2048 pg_ctl -D $PGDATA -l log start vacuumdb --all -U postgres pg_ctl -D $PGDATA -l log stop pg_resetwal -x 1073741822 -l 000000010000000000000004 $PGDATA dd if=/dev/zero of=$PGDATA/pg_xact/03FF bs=8192 count=2048 pg_ctl -D $PGDATA -l log start vacuumdb --all -U postgres pg_ctl -D $PGDATA -l log stop pg_resetwal -x 1610612733 -l 000000010000000000000004 $PGDATA dd if=/dev/zero of=$PGDATA/pg_xact/05FF bs=8192 count=2048 pg_ctl -D $PGDATA -l log start vacuumdb --all -U postgres pg_ctl -D $PGDATA -l log stop pg_resetwal -x 2147484382 -l 000000010000000000000004 $PGDATA dd if=/dev/zero of=$PGDATA/pg_xact/0800 bs=8192 count=2048 pg_ctl -D $PGDATA -l log start vacuumdb --all -U postgres psql -U postgres -c "SELECT txid_current()" psql -U postgres -c "SELECT txid_status(3)" -c "SELECT pg_xact_status('3'::xid8)" As a result, we get that, instead of the expected NULL, the functions return the "in progress" status of transaction 3, which is far in the past: ... txid_current -------------- 2147484382 (1 row) txid_status ------------- in progress (1 row) pg_xact_status ---------------- in progress (1 row) --- Best regards, Egor Chindyaskin Postgres Professional: http://postgrespro.com
Re: BUG #18212: Functions txid_status() and pg_xact_status() return invalid status of the specified transaction
From
Karina Litskevich
Date:
Hi,
The problem here is in TransactionIdInRecentPast() function. If
ShmemVariableCache->oldestClogXid had been stored as a FullTransactionId,
this function would be very simple for normal transaction ids: ids between
oldestClogXid and next full transaction id are in the recent past (return
true), ids before oldestClogXid are too far in the past (return false),
ids after next full xid are in the future (raise an error). Expected
results are demonstrated in the following picture, where vertical bar is
used to separate different epochs.
|pppppppppppppppp|pppppppppppprrff|ffffffffffffffff|
^ ^
| |
| next fxid
|
oldestClogXid
The problem is that oldestClogXid is a TransactionId, so it's epoch is not
stored, and the following condition is wrong:
if (xid_epoch + 1 < now_epoch
|| (xid_epoch + 1 == now_epoch && xid < now_epoch_next_xid)
|| TransactionIdPrecedes(xid, ShmemVariableCache->oldestClogXid))
return false;
First, given full transaction id (xid_epoch, xid) is compared to the next
full transaction id (now_epoch, now_epoch_next_xid), and if it's more than
one epoch older, it's far in the past. For newer ids, it's transaction id
part without an epoch is compared to oldestClogXid, but it's a modulo-2^32
comparison, so only 2^31 xids before oldestClogXid are considered preceding
it. Thus, all full transaction ids between (next full transaction id - 2^32)
and (oldestClogXid - 2^31) are mistakenly considered to be in the recent
past.
|pppppppppppppprr|rrrrpppppppprrff|ffffffffffffffff|
^ ^
| |
| next fxid
|
oldestClogXid
In the attached patch I suggest calculating an epoch for oldestClogXid
assuming it's not much older than next full transaction id, make a full
transaction id for oldestClogXid, and then just compare it to the given
full transaction id.
Best regards,
Karina Litskevich
Postgres Professional: http://postgrespro.com/
The problem here is in TransactionIdInRecentPast() function. If
ShmemVariableCache->oldestClogXid had been stored as a FullTransactionId,
this function would be very simple for normal transaction ids: ids between
oldestClogXid and next full transaction id are in the recent past (return
true), ids before oldestClogXid are too far in the past (return false),
ids after next full xid are in the future (raise an error). Expected
results are demonstrated in the following picture, where vertical bar is
used to separate different epochs.
|pppppppppppppppp|pppppppppppprrff|ffffffffffffffff|
^ ^
| |
| next fxid
|
oldestClogXid
The problem is that oldestClogXid is a TransactionId, so it's epoch is not
stored, and the following condition is wrong:
if (xid_epoch + 1 < now_epoch
|| (xid_epoch + 1 == now_epoch && xid < now_epoch_next_xid)
|| TransactionIdPrecedes(xid, ShmemVariableCache->oldestClogXid))
return false;
First, given full transaction id (xid_epoch, xid) is compared to the next
full transaction id (now_epoch, now_epoch_next_xid), and if it's more than
one epoch older, it's far in the past. For newer ids, it's transaction id
part without an epoch is compared to oldestClogXid, but it's a modulo-2^32
comparison, so only 2^31 xids before oldestClogXid are considered preceding
it. Thus, all full transaction ids between (next full transaction id - 2^32)
and (oldestClogXid - 2^31) are mistakenly considered to be in the recent
past.
|pppppppppppppprr|rrrrpppppppprrff|ffffffffffffffff|
^ ^
| |
| next fxid
|
oldestClogXid
In the attached patch I suggest calculating an epoch for oldestClogXid
assuming it's not much older than next full transaction id, make a full
transaction id for oldestClogXid, and then just compare it to the given
full transaction id.
Best regards,
Karina Litskevich
Postgres Professional: http://postgrespro.com/
Attachment
Re: BUG #18212: Functions txid_status() and pg_xact_status() return invalid status of the specified transaction
From
Kyotaro Horiguchi
Date:
Good catch! At Fri, 24 Nov 2023 14:38:23 +0300, Karina Litskevich <litskevichkarina@gmail.com> wrote in > one epoch older, it's far in the past. For newer ids, it's transaction id > part without an epoch is compared to oldestClogXid, but it's a modulo-2^32 > comparison, so only 2^31 xids before oldestClogXid are considered preceding > it. Thus, all full transaction ids between (next full transaction id - 2^32) > and (oldestClogXid - 2^31) are mistakenly considered to be in the recent > past. I'm not entirely sure about the specific range where the error occurs, but I do believe that the second and third lines of the problematic comparison seem to be invalid. It seems like there is confusion between the epoch boundary of full XIDs and the wraparound boundary of 32-bit XIDs. > In the attached patch I suggest calculating an epoch for oldestClogXid > assuming it's not much older than next full transaction id, make a full > transaction id for oldestClogXid, and then just compare it to the given > full transaction id. I considered bringing this down to a comparison of 32-bit XIDs, but couldn't come up with a clean method. Therefore, using full XID seems to be the right approach. However, it seems like there is an error in the XID comparison condition. There are cases where oldest_xid and now_epoch_next_xid can have the same value. If we skip running txid_current() in the repro in the your previous mail, and directly execute txid_status(3), it would lead to assertion failure. Also, I feel the comments could be more straight forward and simple like this: > Convert oldest_xid into a full XID to compare with the given > XID. Alghouth it's guaranteed that the the oldest and newest XIDs > are within the XID wraparound distance, they may have different > epochs. regards. -- Kyotaro Horiguchi NTT Open Source Software Center
Re: BUG #18212: Functions txid_status() and pg_xact_status() return invalid status of the specified transaction
From
Karina Litskevich
Date:
On Tue, Nov 28, 2023 at 8:20 AM Kyotaro Horiguchi <horikyota.ntt@gmail.com> wrote:
I considered bringing this down to a comparison of 32-bit XIDs, but
couldn't come up with a clean method. Therefore, using full XID seems
to be the right approach. However, it seems like there is an error in
the XID comparison condition. There are cases where oldest_xid and
now_epoch_next_xid can have the same value. If we skip running
txid_current() in the repro in the your previous mail, and directly
execute txid_status(3), it would lead to assertion failure.
Thank you for your feedback! You're right, I haven't thought about
this corner case. Fixed in v2.
this corner case. Fixed in v2.
Also, I feel the comments could be more straight forward and simple
like this:
> Convert oldest_xid into a full XID to compare with the given
> XID. Alghouth it's guaranteed that the the oldest and newest XIDs
> are within the XID wraparound distance, they may have different
> epochs.
it to whatever you like better.
Best regards,
Karina Litskevich
Postgres Professional: http://postgrespro.com/
Attachment
Re: BUG #18212: Functions txid_status() and pg_xact_status() return invalid status of the specified transaction
From
Alexander Korotkov
Date:
Hi! On Thu, Nov 30, 2023 at 7:30 AM Karina Litskevich <litskevichkarina@gmail.com> wrote: > On Tue, Nov 28, 2023 at 8:20 AM Kyotaro Horiguchi <horikyota.ntt@gmail.com> wrote: >> >> I considered bringing this down to a comparison of 32-bit XIDs, but >> couldn't come up with a clean method. Therefore, using full XID seems >> to be the right approach. However, it seems like there is an error in >> the XID comparison condition. There are cases where oldest_xid and >> now_epoch_next_xid can have the same value. If we skip running >> txid_current() in the repro in the your previous mail, and directly >> execute txid_status(3), it would lead to assertion failure. > > > Thank you for your feedback! You're right, I haven't thought about > this corner case. Fixed in v2. > >> Also, I feel the comments could be more straight forward and simple >> like this: >> >> > Convert oldest_xid into a full XID to compare with the given >> > XID. Alghouth it's guaranteed that the the oldest and newest XIDs >> > are within the XID wraparound distance, they may have different >> > epochs. > > > I tried to improve the comment too. Anyway, don't hesitate to change > it to whatever you like better. Thank you for your work on the subject. I've slightly revised the patch (attached). I'm going to push it if there are no objections. ------ Regards, Alexander Korotkov
Attachment
Re: BUG #18212: Functions txid_status() and pg_xact_status() return invalid status of the specified transaction
From
Karina Litskevich
Date:
On Mon, Feb 5, 2024 at 4:51 PM Alexander Korotkov <aekorotkov@gmail.com> wrote:
I've slightly revised the patch (attached). I'm going to push it if
there are no objections.
Thank you very much!
It looks like you based your v3 patch on the v1. Have you noticed Kyotaro's
comments and my corrections in the v2 patch?
Best regards,
Karina Litskevich
Postgres Professional: http://postgrespro.com/
Re: BUG #18212: Functions txid_status() and pg_xact_status() return invalid status of the specified transaction
From
Alexander Korotkov
Date:
Hi, Karina! On Mon, Feb 5, 2024 at 4:35 PM Karina Litskevich <litskevichkarina@gmail.com> wrote: > On Mon, Feb 5, 2024 at 4:51 PM Alexander Korotkov <aekorotkov@gmail.com> wrote: >> >> I've slightly revised the patch (attached). I'm going to push it if >> there are no objections. > > > Thank you very much! > > It looks like you based your v3 patch on the v1. Have you noticed Kyotaro's > comments and my corrections in the v2 patch? Thank you for pointing this out. Please, take a look at my v4 of the patch. ------ Regards, Alexander Korotkov
Attachment
Re: BUG #18212: Functions txid_status() and pg_xact_status() return invalid status of the specified transaction
From
Karina Litskevich
Date:
On Mon, Feb 5, 2024 at 5:50 PM Alexander Korotkov <aekorotkov@gmail.com> wrote:
Thank you for pointing this out. Please, take a look at my v4 of the patch.
Re: BUG #18212: Functions txid_status() and pg_xact_status() return invalid status of the specified transaction
From
Maxim Orlov
Date:
AFAICS, patch is committed around a month ago.
commit 165d921c9a883814a35e8161fc692793e9c945a4
Author: Alexander Korotkov <akorotkov@postgresql.org>
Date: Thu Feb 8 12:45:26 2024 +0200
Author: Alexander Korotkov <akorotkov@postgresql.org>
Date: Thu Feb 8 12:45:26 2024 +0200
So, I've mark this thread committed.
--
Best regards,
Maxim Orlov.
Re: BUG #18212: Functions txid_status() and pg_xact_status() return invalid status of the specified transaction
From
Alexander Korotkov
Date:
On Thu, Mar 7, 2024 at 3:17 PM Maxim Orlov <orlovmg@gmail.com> wrote: > AFAICS, patch is committed around a month ago. > > commit 165d921c9a883814a35e8161fc692793e9c945a4 > Author: Alexander Korotkov <akorotkov@postgresql.org> > Date: Thu Feb 8 12:45:26 2024 +0200 > > So, I've mark this thread committed. This was forgotten by me. Thanks! ------ Regards, Alexander Korotkov