Re: HOT chain validation in verify_heapam() - Mailing list pgsql-hackers

From Himanshu Upadhyaya
Subject Re: HOT chain validation in verify_heapam()
Date
Msg-id CAPF61jBemJqfsc_nUcVYfQo924ZRWSiwyrhjc=D9fHvysOTi8A@mail.gmail.com
Whole thread Raw
In response to Re: HOT chain validation in verify_heapam()  (Robert Haas <robertmhaas@gmail.com>)
Responses Re: HOT chain validation in verify_heapam()
List pgsql-hackers


On Wed, Sep 7, 2022 at 2:49 AM Robert Haas <robertmhaas@gmail.com> wrote:

But here's one random idea: add a successor[] array and an lp_valid[]
array. In the first loop, set lp_valid[offset] = true if it passes the
check_lp() checks, and set successor[A] = B if A redirects to B or has
a CTID link to B, without matching xmin/xmax. Then, in a second loop,
iterate over the successor[] array. If successor[A] = B && lp_valid[A]
&& lp_valid[B], then check whether A.xmax = B.xmin; if so, then
complain if predecessor[B] is already set, else set predecessor[B] =
A. Then, in the third loop, iterate over the predecessor array just as
you're doing now. Then it's clear that we do the lp_valid checks
exactly once for every offset that might need them, and in order. And
it's also clear that the predecessor-based checks can never happen
unless the lp_valid checks passed for both of the offsets involved.



Approach of introducing a successor array is good but I see one overhead with having both successor and predecessor array, that is, we will traverse each offset on page thrice(one more for original loop on offset) and with each offset we have to retrieve 
/reach an ItemID(PageGetItemId) and Item(PageGetItem) itself. This is not much overhead as they are all preprocessors but there will be some overhead.
How about having new array(initialised with LP_NOT_CHECKED) of enum LPStatus as below

typedef enum LPStatus
{
LP_NOT_CHECKED,
LP_VALID,
LP_NOT_VALID
}LPStatus;

and validating and setting with proper status at three places
1) while validating Redirect Tuple
2) while validating populating predecessor array and
3) at original place  of "sanity check"


something like:
"     if (lpStatus[rdoffnum] == LP_NOT_CHECKED)
                                {
                                        ctx.offnum = rdoffnum;
                                        if (!check_lp(&ctx, ItemIdGetLength(rditem), ItemIdGetOffset(rditem)))
                                        {
                                                lpStatus[rdoffnum] = LP_NOT_VALID;
                                                continue;
                                        }
                                        lpStatus[rdoffnum] = LP_VALID;
                                }
                                else if (lpStatus[rdoffnum] == LP_NOT_VALID)
                                        continue;
"



thoughts?
--
Regards,
Himanshu Upadhyaya
EnterpriseDB: http://www.enterprisedb.com

pgsql-hackers by date:

Previous
From: Robert Haas
Date:
Subject: Re: why can't a table be part of the same publication as its schema
Next
From: Tom Lane
Date:
Subject: Re: Patch proposal: make use of regular expressions for the username in pg_hba.conf