Re: HOT chain validation in verify_heapam() - Mailing list pgsql-hackers
From | Robert Haas |
---|---|
Subject | Re: HOT chain validation in verify_heapam() |
Date | |
Msg-id | CA+Tgmoas7DOU9umrrvhZO1VMUc74eQsyKkeVEKeZZD4ODBLH8w@mail.gmail.com Whole thread Raw |
In response to | Re: HOT chain validation in verify_heapam() (Aleksander Alekseev <aleksander@timescale.com>) |
Responses |
Re: HOT chain validation in verify_heapam()
|
List | pgsql-hackers |
On Tue, Sep 6, 2022 at 9:38 AM Aleksander Alekseev <aleksander@timescale.com> wrote: > > Please correct me if I'm wrong, but don't we have a race condition here: > > > > ``` > > + if ((TransactionIdDidAbort(pred_xmin) || > > TransactionIdIsInProgress(pred_xmin)) > > + && !TransactionIdEquals(pred_xmin, curr_xmin)) > > { > > ``` > > It looks like I had a slight brain fade here. > > In order to report a false error either TransactionIdDidAbort() or > TransactionIdIsInProgress() should return true and > TransactionIdEquals() should be false. So actually under rare > conditions the error will NOT be reported while it should. Other than > that we seem to be safe from the concurrency perspective, unless I'm > missing something again. > > Personally I don't have a strong opinion on whether we should bother > about this scenario. Probably an explicit comment will not hurt. > > Also I suggest checking TransactionIdEquals() first though since it's cheaper. I think the check should be written like this: !TransactionIdEquals(pred_xmin, curr_xmin) && !TransctionIdDidCommit(pred_xmin) The TransactionIdEquals check should be done first for the reason you state: it's cheaper. I think that we shouldn't be using TransactionIdDidAbort() at all, because it can sometimes return false even when the transaction actually did abort. See test_lockmode_for_conflict() and TransactionIdIsInProgress() for examples of logic that copes with this. What's happening here is that TransactionIdDidAbort doesn't ever get called if the system crashes while a transaction is running. So we can use TransactionIdDidAbort() only as an optimization: if it returns true, the transaction is definitely aborted, but if it returns false, we have to check whether it's still running. If not, it aborted anyway. TransactionIdDidCommit() does not have the same issue. A transaction can abort without updating CLOG if the system crashes, but it can never commit without updating CLOG. If the transaction didn't commit, then it is either aborted or still in progress (and we don't care which, because neither is an error here). As to whether the existing formulation of the test has an error condition, you're generally right that we should test TransactionIdIsInProgress() before TransactionIdDidCommit/Abort, because we during commit or abort, we first set the status in CLOG - which is queried by TransactionIdDidCommit/Abort - and only afterward update the procarray - which is queried by TransactionIdIsInProgress. So normally TransactionIdIsInProgress should be checked first, and TransactionIdDidCommit/Abort should only be checked if it returns false, at which point we know that the return values of the latter calls can't ever change. Possibly there is an argument for including the TransactionIdInProgress check here too: !TransactionIdEquals(pred_xmin, curr_xmin) && (TransactionIdIsInProgress(pred_xmin) || !TransctionIdDidCommit(pred_xmin)) ...but I don't think it could change the answer. Most places that check TransactionIdIsInProgress() first are concerned with MVCC semantics, and here we are not. I think the only effects of including or excluding the TransactionIdIsInProgress() test are (1) performance, in that searching the procarray might avoid expense if it's cheaper than searching clog, or add expense if the reverse is true and (2) slightly changing the time at which we're first able to detect this form of corruption. I am inclined to prefer the simpler form of the test without TransactionIdIsInProgress() unless someone can say why we shouldn't go that route. -- Robert Haas EDB: http://www.enterprisedb.com
pgsql-hackers by date: