Thread: Null commitTS bug

Null commitTS bug

From
"Kingsborough, Alex"
Date:
Hi Hackers,
I've been working with commitTS code recently and during my testing I 
found a bug when writing commit timestamps for subxids. Normally for 
sub-transaction commit timestamps in TransactionTreeSetCommitTsData(),
we iterate through the subxids until we find one that is on the next commits
page. In the code [1] this is the jth subxid. In SetXidCommitTsInPage()
where we set all the commit timestamps up to but not including the jth
timestamp. The jth timestamp then becomes the head timestamp for next
group of timestamps on the next page. However, if the jth timestamp is 
the last subxid (put another way, if the LAST subxid is the FIRST
timestamp on a new page), then the code will break on line 188 [2] and
the timestamp for the last subxid will never be written.

This can be reproduced by enabling track_commit_timestamp and running 
a simple loop that has a single sub-transaction like:

  psql -t -c 'create table t (id int);'

  for i in {1..500}
  do
    psql -t -c 'begin; insert into t select 1; savepoint a; insert into t select 2;commit'
  done

Then querying for NULL commitTS in that table will return that there are 
unwritten timestamps:

  postgres=# select count(*) from t where pg_xact_commit_timestamp(t.xmin) is NULL;
  count

    1
  (1 row)

The fix for this is very simple


  /* if we wrote out all subxids, we're done. /
  - if (j + 1 >= nsubxids)
  + if (j >= nsubxids)
  break;
 
[1] https://github.com/postgres/postgres/blame/master/src/backend/access/transam/commit_ts.c#L178
[2] https://github.com/postgres/postgres/blame/master/src/backend/access/transam/commit_ts.c#L188



Attachment

Re: Null commitTS bug

From
Kyotaro Horiguchi
Date:
At Fri, 14 Jan 2022 22:49:59 +0000, "Kingsborough, Alex" <kingsboa@amazon.com> wrote in
> The fix for this is very simple
>
>
>   /* if we wrote out all subxids, we're done. /
>   - if (j + 1 >= nsubxids)
>   + if (j >= nsubxids)
>   break;

It looks like a thinko and the fix is correct. (It's a matter of taste
choosing between it and "j == nsubxids").

I found some confusing lines around but they need not a fix
considering back-patching conflict?

> for (i = 0, headxid = xid;;)
..
> i += j - i + 1;

regards.

--
Kyotaro Horiguchi
NTT Open Source Software Center



Re: Null commitTS bug

From
Michael Paquier
Date:
On Mon, Jan 17, 2022 at 11:17:24AM +0900, Kyotaro Horiguchi wrote:
> At Fri, 14 Jan 2022 22:49:59 +0000, "Kingsborough, Alex" <kingsboa@amazon.com> wrote in
>> The fix for this is very simple
>>
>>
>>   /* if we wrote out all subxids, we're done. /
>>   - if (j + 1 >= nsubxids)
>>   + if (j >= nsubxids)
>>   break;
>
> It looks like a thinko and the fix is correct. (It's a matter of taste
> choosing between it and "j == nsubxids").

It took me some time to understand the problem from the current code,
but I'd like to think that the suggested fix is less confusing.

> I found some confusing lines around but they need not a fix
> considering back-patching conflict?
>
>> for (i = 0, headxid = xid;;)
> ..
>> i += j - i + 1;

I am not sure.  Do you have anything specific in mind?  Perhaps
something that would help in making the code logic easier to follow?
--
Michael

Attachment

Re: Null commitTS bug

From
Tom Lane
Date:
Michael Paquier <michael@paquier.xyz> writes:
> On Mon, Jan 17, 2022 at 11:17:24AM +0900, Kyotaro Horiguchi wrote:
>> I found some confusing lines around but they need not a fix
>> considering back-patching conflict?
>>> i += j - i + 1;

> I am not sure.  Do you have anything specific in mind?  Perhaps
> something that would help in making the code logic easier to follow?

Isn't that a very bad way to write "i = j + 1"?

I agree with Horiguchi-san that

    for (i = 0, headxid = xid;;)

is not great style either.  A for-loop ought to be used to control the
number of iterations, not as a confusing variable initialization.
I think more idiomatic would be

    headxid = xid;
    i = 0;
    for (;;)

which makes it clear that this is not where the loop control is.

            regards, tom lane



Re: Null commitTS bug

From
Kyotaro Horiguchi
Date:
At Tue, 18 Jan 2022 10:43:55 +0900, Michael Paquier <michael@paquier.xyz> wrote in 
> On Sun, Jan 16, 2022 at 11:01:25PM -0500, Tom Lane wrote:
> > Isn't that a very bad way to write "i = j + 1"?
> > 
> > I agree with Horiguchi-san that
> >     for (i = 0, headxid = xid;;)
> 
> Okay.  Horiguchi-san, would you like to write a patch?

Yes, I will.

regards.

-- 
Kyotaro Horiguchi
NTT Open Source Software Center



Re: Null commitTS bug

From
Kyotaro Horiguchi
Date:
At Tue, 18 Jan 2022 13:48:11 +0900 (JST), Kyotaro Horiguchi <horikyota.ntt@gmail.com> wrote in 
> At Tue, 18 Jan 2022 10:43:55 +0900, Michael Paquier <michael@paquier.xyz> wrote in 
> > On Sun, Jan 16, 2022 at 11:01:25PM -0500, Tom Lane wrote:
> > > Isn't that a very bad way to write "i = j + 1"?
> > > 
> > > I agree with Horiguchi-san that
> > >     for (i = 0, headxid = xid;;)
> > 
> > Okay.  Horiguchi-san, would you like to write a patch?
> 
> Yes, I will.

This is that. I think this is a separate issue from the actual
bug. This is applicable at least back to 9.6 and I think this should
be applied back to all supported versions to avoid future backptach
conflicts.

regards.

-- 
Kyotaro Horiguchi
NTT Open Source Software Center
From 9211fa61513dcdbca273656454395a3dcf3ee4e7 Mon Sep 17 00:00:00 2001
From: Kyotaro Horiguchi <horikyota.ntt@gmail.com>
Date: Thu, 20 Jan 2022 10:16:48 +0900
Subject: [PATCH] Improve confusing code in TransactionTreeSetCommitTsData

TransactionTreeSetCommitTsData has a bit confusing use of the +=
operator.  Simplifying it makes the code easier to follow.  In the
same function for-loop initializes only non-controlling variables,
which is not great style.  They ought to be initialized outside the
loop.
---
 src/backend/access/transam/commit_ts.c | 7 +++++--
 1 file changed, 5 insertions(+), 2 deletions(-)

diff --git a/src/backend/access/transam/commit_ts.c b/src/backend/access/transam/commit_ts.c
index 659109f8d4..88eac10456 100644
--- a/src/backend/access/transam/commit_ts.c
+++ b/src/backend/access/transam/commit_ts.c
@@ -168,7 +168,10 @@ TransactionTreeSetCommitTsData(TransactionId xid, int nsubxids,
      * subxid not on the previous page as head.  This way, we only have to
      * lock/modify each SLRU page once.
      */
-    for (i = 0, headxid = xid;;)
+    headxid = xid;
+    i = 0;
+
+    for (;;)
     {
         int            pageno = TransactionIdToCTsPage(headxid);
         int            j;
@@ -192,7 +195,7 @@ TransactionTreeSetCommitTsData(TransactionId xid, int nsubxids,
          * just wrote.
          */
         headxid = subxids[j];
-        i += j - i + 1;
+        i = j + 1;
     }
 
     /* update the cached value in shared memory */
-- 
2.27.0