Thread: BUG #18212: Functions txid_status() and pg_xact_status() return invalid status of the specified transaction

The following bug has been logged on the website:

Bug reference:      18212
Logged by:          Egor Chindyaskin
Email address:      kyzevan23@mail.ru
PostgreSQL version: 16.1
Operating system:   Ubuntu 23.10
Description:

Hello! Recently, Me and Andrew Bille came across a situation where the
txid_status() and pg_xact_status() functions return invalid status of the
specified transaction if xid is very far ahead.

The following script reproduces the situation:

# This script uses pg_resetwal and dd. This is a hack to move xid far
ahead

PGDATA=/tmp/pgsql/data

killall postgres
rm -rf $PGDATA
initdb -U postgres -k -D $PGDATA

pg_ctl -D $PGDATA -l log start
psql -U postgres -c "UPDATE pg_database SET datallowconn = 't' WHERE datname
= 'template0';"
pg_ctl -D $PGDATA -l log stop
 
pg_resetwal -x 536870911 -l 000000010000000000000004 $PGDATA 
dd if=/dev/zero of=$PGDATA/pg_xact/01FF bs=8192 count=2048 

pg_ctl -D $PGDATA -l log start
vacuumdb --all -U postgres
pg_ctl -D $PGDATA -l log stop

pg_resetwal -x 1073741822 -l 000000010000000000000004 $PGDATA 
dd if=/dev/zero of=$PGDATA/pg_xact/03FF bs=8192 count=2048 

pg_ctl -D $PGDATA -l log start
vacuumdb --all -U postgres
pg_ctl -D $PGDATA -l log stop

pg_resetwal -x 1610612733 -l 000000010000000000000004 $PGDATA 
dd if=/dev/zero of=$PGDATA/pg_xact/05FF bs=8192 count=2048 

pg_ctl -D $PGDATA -l log start
vacuumdb --all -U postgres
pg_ctl -D $PGDATA -l log stop

pg_resetwal -x 2147484382 -l 000000010000000000000004 $PGDATA 
dd if=/dev/zero of=$PGDATA/pg_xact/0800 bs=8192 count=2048 

pg_ctl -D $PGDATA -l log start
vacuumdb --all -U postgres

psql -U postgres -c "SELECT txid_current()"
psql -U postgres -c "SELECT txid_status(3)" -c "SELECT
pg_xact_status('3'::xid8)"

As a result, we get that, instead of the expected NULL, the functions return
the "in progress" status of transaction 3, which is far in the past:

...
 txid_current 
--------------
   2147484382
(1 row)

 txid_status 
-------------
 in progress
(1 row)

 pg_xact_status 
----------------
 in progress
(1 row)

---
Best regards,
Egor Chindyaskin
Postgres Professional: http://postgrespro.com


Hi,

The problem here is in TransactionIdInRecentPast() function. If
ShmemVariableCache->oldestClogXid had been stored as a FullTransactionId,
this function would be very simple for normal transaction ids: ids between
oldestClogXid and next full transaction id are in the recent past (return
true), ids before oldestClogXid are too far in the past (return false),
ids after next full xid are in the future (raise an error). Expected
results are demonstrated in the following picture, where vertical bar is
used to separate different epochs.

  |pppppppppppppppp|pppppppppppprrff|ffffffffffffffff|
                                ^ ^
                                | |
                                | next fxid
                                |
                          oldestClogXid


The problem is that oldestClogXid is a TransactionId, so it's epoch is not
stored, and the following condition is wrong:

    if (xid_epoch + 1 < now_epoch
        || (xid_epoch + 1 == now_epoch && xid < now_epoch_next_xid)
        || TransactionIdPrecedes(xid, ShmemVariableCache->oldestClogXid))
        return false;


First, given full transaction id (xid_epoch, xid) is compared to the next
full transaction id (now_epoch, now_epoch_next_xid), and if it's more than
one epoch older, it's far in the past. For newer ids, it's transaction id
part without an epoch is compared to oldestClogXid, but it's a modulo-2^32
comparison, so only 2^31 xids before oldestClogXid are considered preceding
it. Thus, all full transaction ids between (next full transaction id - 2^32)
and (oldestClogXid - 2^31) are mistakenly considered to be in the recent
past.

  |pppppppppppppprr|rrrrpppppppprrff|ffffffffffffffff|
                                ^ ^
                                | |
                                | next fxid
                                |
                          oldestClogXid


In the attached patch I suggest calculating an epoch for oldestClogXid
assuming it's not much older than next full transaction id, make a full
transaction id for oldestClogXid, and then just compare it to the given
full transaction id.


Best regards,
Karina Litskevich
Postgres Professional: http://postgrespro.com/

Attachment
Good catch!

At Fri, 24 Nov 2023 14:38:23 +0300, Karina Litskevich <litskevichkarina@gmail.com> wrote in 
> one epoch older, it's far in the past. For newer ids, it's transaction id
> part without an epoch is compared to oldestClogXid, but it's a modulo-2^32
> comparison, so only 2^31 xids before oldestClogXid are considered preceding
> it. Thus, all full transaction ids between (next full transaction id - 2^32)
> and (oldestClogXid - 2^31) are mistakenly considered to be in the recent
> past.

I'm not entirely sure about the specific range where the error occurs,
but I do believe that the second and third lines of the problematic
comparison seem to be invalid. It seems like there is confusion
between the epoch boundary of full XIDs and the wraparound boundary of
32-bit XIDs.

> In the attached patch I suggest calculating an epoch for oldestClogXid
> assuming it's not much older than next full transaction id, make a full
> transaction id for oldestClogXid, and then just compare it to the given
> full transaction id.

I considered bringing this down to a comparison of 32-bit XIDs, but
couldn't come up with a clean method. Therefore, using full XID seems
to be the right approach. However, it seems like there is an error in
the XID comparison condition. There are cases where oldest_xid and
now_epoch_next_xid can have the same value. If we skip running
txid_current() in the repro in the your previous mail, and directly
execute txid_status(3), it would lead to assertion failure.

Also, I feel the comments could be more straight forward and simple
like this:

> Convert oldest_xid into a full XID to compare with the given
> XID. Alghouth it's guaranteed that the the oldest and newest XIDs
> are within the XID wraparound distance, they may have different
> epochs.

regards.

-- 
Kyotaro Horiguchi
NTT Open Source Software Center




On Tue, Nov 28, 2023 at 8:20 AM Kyotaro Horiguchi <horikyota.ntt@gmail.com> wrote:
I considered bringing this down to a comparison of 32-bit XIDs, but
couldn't come up with a clean method. Therefore, using full XID seems
to be the right approach. However, it seems like there is an error in
the XID comparison condition. There are cases where oldest_xid and
now_epoch_next_xid can have the same value. If we skip running
txid_current() in the repro in the your previous mail, and directly
execute txid_status(3), it would lead to assertion failure.
 
Thank you for your feedback! You're right, I haven't thought about
this corner case. Fixed in v2.

Also, I feel the comments could be more straight forward and simple
like this:

> Convert oldest_xid into a full XID to compare with the given
> XID. Alghouth it's guaranteed that the the oldest and newest XIDs
> are within the XID wraparound distance, they may have different
> epochs.

I tried to improve the comment too. Anyway, don't hesitate to change
it to whatever you like better.

Best regards,
Karina Litskevich
Postgres Professional: http://postgrespro.com/ 
Attachment
Hi!

On Thu, Nov 30, 2023 at 7:30 AM Karina Litskevich
<litskevichkarina@gmail.com> wrote:
> On Tue, Nov 28, 2023 at 8:20 AM Kyotaro Horiguchi <horikyota.ntt@gmail.com> wrote:
>>
>> I considered bringing this down to a comparison of 32-bit XIDs, but
>> couldn't come up with a clean method. Therefore, using full XID seems
>> to be the right approach. However, it seems like there is an error in
>> the XID comparison condition. There are cases where oldest_xid and
>> now_epoch_next_xid can have the same value. If we skip running
>> txid_current() in the repro in the your previous mail, and directly
>> execute txid_status(3), it would lead to assertion failure.
>
>
> Thank you for your feedback! You're right, I haven't thought about
> this corner case. Fixed in v2.
>
>> Also, I feel the comments could be more straight forward and simple
>> like this:
>>
>> > Convert oldest_xid into a full XID to compare with the given
>> > XID. Alghouth it's guaranteed that the the oldest and newest XIDs
>> > are within the XID wraparound distance, they may have different
>> > epochs.
>
>
> I tried to improve the comment too. Anyway, don't hesitate to change
> it to whatever you like better.

Thank you for your work on the subject.
I've slightly revised the patch (attached).  I'm going to push it if
there are no objections.

------
Regards,
Alexander Korotkov

Attachment

On Mon, Feb 5, 2024 at 4:51 PM Alexander Korotkov <aekorotkov@gmail.com> wrote:
I've slightly revised the patch (attached).  I'm going to push it if
there are no objections.

Thank you very much!

It looks like you based your v3 patch on the v1. Have you noticed Kyotaro's
comments and my corrections in the v2 patch?


Best regards,
Karina Litskevich
Postgres Professional: http://postgrespro.com/
Hi, Karina!

On Mon, Feb 5, 2024 at 4:35 PM Karina Litskevich
<litskevichkarina@gmail.com> wrote:
> On Mon, Feb 5, 2024 at 4:51 PM Alexander Korotkov <aekorotkov@gmail.com> wrote:
>>
>> I've slightly revised the patch (attached).  I'm going to push it if
>> there are no objections.
>
>
> Thank you very much!
>
> It looks like you based your v3 patch on the v1. Have you noticed Kyotaro's
> comments and my corrections in the v2 patch?

Thank you for pointing this out.  Please, take a look at my v4 of the patch.

------
Regards,
Alexander Korotkov

Attachment


On Mon, Feb 5, 2024 at 5:50 PM Alexander Korotkov <aekorotkov@gmail.com> wrote:
Thank you for pointing this out.  Please, take a look at my v4 of the patch.

Looks good to me.

Best regards,
Karina Litskevich
Postgres Professional: http://postgrespro.com/
 
AFAICS, patch is committed around a month ago.

commit 165d921c9a883814a35e8161fc692793e9c945a4
Author: Alexander Korotkov <akorotkov@postgresql.org>
Date:   Thu Feb 8 12:45:26 2024 +0200

So, I've mark this thread committed.

--
Best regards,
Maxim Orlov.
On Thu, Mar 7, 2024 at 3:17 PM Maxim Orlov <orlovmg@gmail.com> wrote:
> AFAICS, patch is committed around a month ago.
>
> commit 165d921c9a883814a35e8161fc692793e9c945a4
> Author: Alexander Korotkov <akorotkov@postgresql.org>
> Date:   Thu Feb 8 12:45:26 2024 +0200
>
> So, I've mark this thread committed.

This was forgotten by me. Thanks!

------
Regards,
Alexander Korotkov