Re: Amcheck verification of GiST and GIN - Mailing list pgsql-hackers

From Tomas Vondra
Subject Re: Amcheck verification of GiST and GIN
Date
Msg-id ec5a0c8c-1a41-4a81-a54e-e46a200689be@vondra.me
Whole thread Raw
In response to Amcheck verification of GiST and GIN  (Andrey Borodin <x4mmm@yandex-team.ru>)
List pgsql-hackers
On 5/9/25 14:43, Arseniy Mukhin wrote:
> Hello,
> 
> Thanks everybody for the patch.
> 
> I noticed there are no tests that GIN check fails if the index is
> corrupted, so I thought it would be great to have some.
> While writing tests I noticed some issues in the patch (all issues are
> for verify_gin.c)
> 
> 1) When we add new items to the entry tree stack, ptr->parenttup is always null
> because GinPageGetOpaque(page)->rightlink is never NULL.
> 
>              /* last tuple in layer has no high key */
>                 if (i != maxoff && !GinPageGetOpaque(page)->rightlink)
>                     ptr->parenttup = CopyIndexTuple(idxtuple);
>                 else
>                     ptr->parenttup = NULL;
> 
> Right way to check if entry doesn't have right neighbour is
> 
>             GinPageGetOpaque(page)->rightlink == InvalidBlockNumber
> 
> But even if we fix it, the condition would not do what the comment
> says. If we want to have NULL as parenttup
> only for the last tuple of the btree layer, the right check would be:
> 
>                 if (i == maxoff && rightlink == InvalidBlockNumber)
>                     ptr->parenttup = NULL;
>              else
>                     ptr->parenttup = CopyIndexTuple(idxtuple);
> 
> 2) Check don't use attnum in comparisons, but for multicolumn indexes
> attnum defines order. When we compare max entry page key
> with parent key we ignore attnum. It means we occasionally can try to
> compare keys of different columns.
> While checking order within the same page we skip checking order for
> tuples with different attnums now,
> but it can be checked. Fix is easy: using ginCompareAttEntries()
> instead of ginCompareEntries().
> 
> 
> 3) Here a code of the split detection
> 
>        if (rightlink != InvalidBlockNumber &&
>              ginCompareEntries(&state, attnum, page_max_key,
>                            page_max_key_category, parent_key,
>                            parent_key_category) > 0)
>           {
>              /* split page detected, install right link to the stack */
> 
> Condition seems not right, because the child page max item key never
> can be bigger then parent key.
> It can be equal to the parentkey, and it means that there was no split
> and the parent key that we cached in the stack is still
> relevant. Or it could be less then cached parent key and it means that
> split took place and old max item key moved to the
> right neighbour and current page max item key should be less then
> cached parent key. So I think we should replace > with <.
> 
> 4) Here is the code for checking the order within the entry page.
> 
>            /*
>            * First block is metadata, skip order check. Also, never check
>            * for high key on rightmost page, as this key is not really
>            * stored explicitly.
>            *
>            * Also make sure to not compare entries for different attnums,
>            * which may be stored on the same page.
>            */
>           if (i != FirstOffsetNumber && attnum == prev_attnum &&
> stack->blkno != GIN_ROOT_BLKNO &&
>              !(i == maxoff && rightlink == InvalidBlockNumber))
>           {
>              prev_key = gintuple_get_key(&state, prev_tuple,
> &prev_key_category);
>              if (ginCompareEntries(&state, attnum, prev_key,
>                               prev_key_category, current_key,
>                               current_key_category) >= 0)
> 
> We skip checking the order for the root page, it's not clear why.
> Probably there is some mess with the meta page, because
> comment says "First block is metadata, skip order check". So I think
> we can remove
> 
>                     stack->blkno != GIN_ROOT_BLKNO
> 
> 5) The same place as 4). We skip checking the order for the high key
> on the rightmost page, as this key is not really stored explicitly,
> but for leaf pages all keys are stored explicitly, so we can check the
> order for the last item of the leaf page too.
> So I think we can change the condition to this:
> 
>             !(i == maxoff && rightlink == InvalidBlockNumber &&
> !GinPageIsLeaf(page))
> 
> 6) In posting tree parent key check part:
> 
>               /*
>               * Check if this tuple is consistent with the downlink in the
>               * parent.
>               */
>              if (stack->parentblk != InvalidBlockNumber && i == maxoff &&
>                 ItemPointerCompare(&stack->parentkey, &posting_item->key) < 0)
>                 ereport(ERROR,
>                       ...
> Here we don't check if stack->parentkey is valid, so sometimes we
> compare invalid parentkey (because we can have
> valid parentblk and invalid parentkey the same time). Invalid
> parentkey is always bigger, so the code never triggers
> ereport, but it doesn't look right. so probably we can rewrite it this way:
> 
>                 if (i == maxoff && ItemPointerIsValid(&stack->parentkey) &&
>                 ItemPointerCompare(&stack->parentkey, &posting_item->key) < 0)
> 
> 7) When producing stack entries for posting tree check, we set parent
> key like this:
> 
>              /*
>               * Set rightmost parent key to invalid item pointer. Its value
>               * is 'Infinity' and not explicitly stored.
>               */
>              if (rightlink == InvalidBlockNumber)
>                 ItemPointerSetInvalid(&ptr->parentkey);
>              else
>                 ptr->parentkey = posting_item->key;
> 
> We set invalid parent key for all items of the rightmost page. But
> it's the only rightmost item that doesn't have an explicit
> parentkey (actually the comment says exactly this, but the code does a
> different thing). All others have an explicit parent
> key and we can set it. So fix can look like this:
> 
>                 if (rightlink == InvalidBlockNumber && i == maxoff)
>                 ItemPointerSetInvalid(&ptr->parentkey);
>              else
>                 ptr->parentkey = posting_item->key;
> 
> But for (rightlink == InvalidBlockNumber && i == maxoff)
> posting_item->key is always (0,0) (we check it a little bit earlier),
> so I think we can simplify it:
> 
>                 ptr->parentkey = posting_item->key;
> 
> 8) In the function gin_refind_parent() the code
> 
>                 if (ItemPointerGetBlockNumber(&(itup->t_tid)) == childblkno)
> 
> triggers Assert(ItemPointerIsValid(pointer)) within the
> ItemPointerGetBlockNumber(), because itup->t_tid could be invalid.
> AFAIS GIN code uses a special method GinGetDownlink(itup) that avoids
> this Assert. So we can use it here too.
> 
> Please find the attached patch with fixes for issues above. Also there
> are added regression test for multicolumn index,
> and several tap tests with some basic corrupted index cases. I'm not
> sure if it's the right way to write such tests and would
> be glad to hear any feedback, especially about
> invalid_entry_columns_order_test() where it seems important to
> preserve
> byte ordering. Also all tests expect standard page size 8192 now.
> 
> 
> Also there are several points that I think also worth addressing:
> 
> 9) Field 'parentlsn' is set but never actually used for any check. Or
> I missed something.
> 
> 10) README says "Vacuum never deletes tuples or pages from the entry
> tree." But check assumes that it's possible to have
> deleted leaf page with 0 entries.
> 
>     if (GinPageIsDeleted(page))
>     {
>        if (!GinPageIsLeaf(page))
>           ereport(ERROR,
>                 (errcode(ERRCODE_INDEX_CORRUPTED),
>                  errmsg("index \"%s\" has deleted internal page %u",
>                       RelationGetRelationName(rel), blockNo)));
>        if (PageGetMaxOffsetNumber(page) > InvalidOffsetNumber)
>           ereport(ERROR,
>                 (errcode(ERRCODE_INDEX_CORRUPTED),
>                  errmsg("index \"%s\" has deleted page %u with tuples",
>                       RelationGetRelationName(rel), blockNo)));
>     }
> 11) When we compare entry tree max page key with parent key:
> 
>              if (ginCompareAttEntries(&state, attnum, current_key,
>                               current_key_category, parent_key_attnum,
>                                       parent_key, parent_key_category) > 0)
>              {
>                 /*
>                  * There was a discrepancy between parent and child
>                  * tuples. We need to verify it is not a result of
>                  * concurrent call of gistplacetopage(). So, lock parent
>                  * and try to find downlink for current page. It may be
>                  * missing due to concurrent page split, this is OK.
>                  */
>                 pfree(stack->parenttup);
>                 stack->parenttup = gin_refind_parent(rel, stack->parentblk,
>                                             stack->blkno, strategy);
> 
> I think we can remove gin_refind_parent() and do ereport right away here.
> The same logic as with 3). AFAIK it's impossible to have a child item
> with a key that is higher than the cached parent key.
> Parent key bounds what keys we can insert into the child page, so it
> seems there is no way how they can appear there.
> 

These look like good points. I've added it to open items so that we
don't forget about this, I won't have time to look at this until after
pgconf.dev.


thanks

-- 
Tomas Vondra




pgsql-hackers by date:

Previous
From: Peter Geoghegan
Date:
Subject: Re: Adding skip scan (including MDAM style range skip scan) to nbtree
Next
From: Peter Geoghegan
Date:
Subject: Re: Adding skip scan (including MDAM style range skip scan) to nbtree