Re: HOT chain validation in verify_heapam() - Mailing list pgsql-hackers

From Robert Haas
Subject Re: HOT chain validation in verify_heapam()
Date
Msg-id CA+TgmoZuTuqTDDz+GeYkpyPFb3bx0Zjw+X86zzLasYG3J4ij+A@mail.gmail.com
Whole thread Raw
In response to Re: HOT chain validation in verify_heapam()  (Himanshu Upadhyaya <upadhyaya.himanshu@gmail.com>)
Responses Re: HOT chain validation in verify_heapam()
List pgsql-hackers
On Wed, Nov 16, 2022 at 4:51 AM Himanshu Upadhyaya
<upadhyaya.himanshu@gmail.com> wrote:
> yes, got it, have tried to test and it is giving false corruption in case of subtransaction.
> I think a better way to have this check is, we need to check that if pred_xmin is
> aborted then current_xmin should be aborted only. So there is no way that we
> validate corruption with in_progress txid.

Please note that you can't use TransactionIdDidAbort here, because
that will return false for transactions aborted by a crash. You have
to check that it's not in progress and then afterwards check that it's
not committed. Also note that if you check whether it's committed
first and then check whether it's in progress afterwards, there's a
race condition: it might commit just after you verify that it isn't
committed yet, and then it won't be in progress any more and will look
aborted.

I disagree with the idea that we can't check in progress. I think the
checks could look something like this:

pred_in_progress = TransactionIdIsInProgress(pred_xmin);
current_in_progress = TransactionIdIsInProgress(current_xmin);
if (pred_in_progress)
{
     if (current_in_progress)
        return ok;
     // recheck to avoid race condition
     if (TransactionIdIsInProgress(pred_xmin))
     {
         if (TransactionIdDidCommit(current_xmin))
             return corruption: predecessor xmin in progress, but
current xmin committed;
         else
             return corruption: predecessor xmin in progress, but
current xmin aborted;
     }
     // fallthrough: when we entered this if-block pred_xmin was still
in progress but no longer;
     pred_in_progress = false;
}

if (TransactionIdDidCommit(pred_xmin))
    return ok;

if (current_in_progress)
     return corruption: predecessor xmin aborted, but current xmin in progress;
else if (TransactionIdDidCommit(current_xmin))
     return corruption: predecessor xmin aborted, but current xmin committed;

The error messages as phrased here aren't actually what we should use;
they would need rephrasing. But I think, or hope anyway, that the
logic works. I think you basically just need the 1 recheck: if you see
the predecessor xmin in progress and then the current xmin in
progress, you have to go back and check that the predecessor xmin is
still in progress, because otherwise both could have committed or
aborted together in between.

-- 
Robert Haas
EDB: http://www.enterprisedb.com



pgsql-hackers by date:

Previous
From: Andres Freund
Date:
Subject: Re: [PoC] configurable out of disk space elog level
Next
From: Ranier Vilela
Date:
Subject: Re: Small miscellaneous fixes