Re: Amcheck verification of GiST and GIN - Mailing list pgsql-hackers
From | Tomas Vondra |
---|---|
Subject | Re: Amcheck verification of GiST and GIN |
Date | |
Msg-id | ec5a0c8c-1a41-4a81-a54e-e46a200689be@vondra.me Whole thread Raw |
In response to | Amcheck verification of GiST and GIN (Andrey Borodin <x4mmm@yandex-team.ru>) |
List | pgsql-hackers |
On 5/9/25 14:43, Arseniy Mukhin wrote: > Hello, > > Thanks everybody for the patch. > > I noticed there are no tests that GIN check fails if the index is > corrupted, so I thought it would be great to have some. > While writing tests I noticed some issues in the patch (all issues are > for verify_gin.c) > > 1) When we add new items to the entry tree stack, ptr->parenttup is always null > because GinPageGetOpaque(page)->rightlink is never NULL. > > /* last tuple in layer has no high key */ > if (i != maxoff && !GinPageGetOpaque(page)->rightlink) > ptr->parenttup = CopyIndexTuple(idxtuple); > else > ptr->parenttup = NULL; > > Right way to check if entry doesn't have right neighbour is > > GinPageGetOpaque(page)->rightlink == InvalidBlockNumber > > But even if we fix it, the condition would not do what the comment > says. If we want to have NULL as parenttup > only for the last tuple of the btree layer, the right check would be: > > if (i == maxoff && rightlink == InvalidBlockNumber) > ptr->parenttup = NULL; > else > ptr->parenttup = CopyIndexTuple(idxtuple); > > 2) Check don't use attnum in comparisons, but for multicolumn indexes > attnum defines order. When we compare max entry page key > with parent key we ignore attnum. It means we occasionally can try to > compare keys of different columns. > While checking order within the same page we skip checking order for > tuples with different attnums now, > but it can be checked. Fix is easy: using ginCompareAttEntries() > instead of ginCompareEntries(). > > > 3) Here a code of the split detection > > if (rightlink != InvalidBlockNumber && > ginCompareEntries(&state, attnum, page_max_key, > page_max_key_category, parent_key, > parent_key_category) > 0) > { > /* split page detected, install right link to the stack */ > > Condition seems not right, because the child page max item key never > can be bigger then parent key. > It can be equal to the parentkey, and it means that there was no split > and the parent key that we cached in the stack is still > relevant. Or it could be less then cached parent key and it means that > split took place and old max item key moved to the > right neighbour and current page max item key should be less then > cached parent key. So I think we should replace > with <. > > 4) Here is the code for checking the order within the entry page. > > /* > * First block is metadata, skip order check. Also, never check > * for high key on rightmost page, as this key is not really > * stored explicitly. > * > * Also make sure to not compare entries for different attnums, > * which may be stored on the same page. > */ > if (i != FirstOffsetNumber && attnum == prev_attnum && > stack->blkno != GIN_ROOT_BLKNO && > !(i == maxoff && rightlink == InvalidBlockNumber)) > { > prev_key = gintuple_get_key(&state, prev_tuple, > &prev_key_category); > if (ginCompareEntries(&state, attnum, prev_key, > prev_key_category, current_key, > current_key_category) >= 0) > > We skip checking the order for the root page, it's not clear why. > Probably there is some mess with the meta page, because > comment says "First block is metadata, skip order check". So I think > we can remove > > stack->blkno != GIN_ROOT_BLKNO > > 5) The same place as 4). We skip checking the order for the high key > on the rightmost page, as this key is not really stored explicitly, > but for leaf pages all keys are stored explicitly, so we can check the > order for the last item of the leaf page too. > So I think we can change the condition to this: > > !(i == maxoff && rightlink == InvalidBlockNumber && > !GinPageIsLeaf(page)) > > 6) In posting tree parent key check part: > > /* > * Check if this tuple is consistent with the downlink in the > * parent. > */ > if (stack->parentblk != InvalidBlockNumber && i == maxoff && > ItemPointerCompare(&stack->parentkey, &posting_item->key) < 0) > ereport(ERROR, > ... > Here we don't check if stack->parentkey is valid, so sometimes we > compare invalid parentkey (because we can have > valid parentblk and invalid parentkey the same time). Invalid > parentkey is always bigger, so the code never triggers > ereport, but it doesn't look right. so probably we can rewrite it this way: > > if (i == maxoff && ItemPointerIsValid(&stack->parentkey) && > ItemPointerCompare(&stack->parentkey, &posting_item->key) < 0) > > 7) When producing stack entries for posting tree check, we set parent > key like this: > > /* > * Set rightmost parent key to invalid item pointer. Its value > * is 'Infinity' and not explicitly stored. > */ > if (rightlink == InvalidBlockNumber) > ItemPointerSetInvalid(&ptr->parentkey); > else > ptr->parentkey = posting_item->key; > > We set invalid parent key for all items of the rightmost page. But > it's the only rightmost item that doesn't have an explicit > parentkey (actually the comment says exactly this, but the code does a > different thing). All others have an explicit parent > key and we can set it. So fix can look like this: > > if (rightlink == InvalidBlockNumber && i == maxoff) > ItemPointerSetInvalid(&ptr->parentkey); > else > ptr->parentkey = posting_item->key; > > But for (rightlink == InvalidBlockNumber && i == maxoff) > posting_item->key is always (0,0) (we check it a little bit earlier), > so I think we can simplify it: > > ptr->parentkey = posting_item->key; > > 8) In the function gin_refind_parent() the code > > if (ItemPointerGetBlockNumber(&(itup->t_tid)) == childblkno) > > triggers Assert(ItemPointerIsValid(pointer)) within the > ItemPointerGetBlockNumber(), because itup->t_tid could be invalid. > AFAIS GIN code uses a special method GinGetDownlink(itup) that avoids > this Assert. So we can use it here too. > > Please find the attached patch with fixes for issues above. Also there > are added regression test for multicolumn index, > and several tap tests with some basic corrupted index cases. I'm not > sure if it's the right way to write such tests and would > be glad to hear any feedback, especially about > invalid_entry_columns_order_test() where it seems important to > preserve > byte ordering. Also all tests expect standard page size 8192 now. > > > Also there are several points that I think also worth addressing: > > 9) Field 'parentlsn' is set but never actually used for any check. Or > I missed something. > > 10) README says "Vacuum never deletes tuples or pages from the entry > tree." But check assumes that it's possible to have > deleted leaf page with 0 entries. > > if (GinPageIsDeleted(page)) > { > if (!GinPageIsLeaf(page)) > ereport(ERROR, > (errcode(ERRCODE_INDEX_CORRUPTED), > errmsg("index \"%s\" has deleted internal page %u", > RelationGetRelationName(rel), blockNo))); > if (PageGetMaxOffsetNumber(page) > InvalidOffsetNumber) > ereport(ERROR, > (errcode(ERRCODE_INDEX_CORRUPTED), > errmsg("index \"%s\" has deleted page %u with tuples", > RelationGetRelationName(rel), blockNo))); > } > 11) When we compare entry tree max page key with parent key: > > if (ginCompareAttEntries(&state, attnum, current_key, > current_key_category, parent_key_attnum, > parent_key, parent_key_category) > 0) > { > /* > * There was a discrepancy between parent and child > * tuples. We need to verify it is not a result of > * concurrent call of gistplacetopage(). So, lock parent > * and try to find downlink for current page. It may be > * missing due to concurrent page split, this is OK. > */ > pfree(stack->parenttup); > stack->parenttup = gin_refind_parent(rel, stack->parentblk, > stack->blkno, strategy); > > I think we can remove gin_refind_parent() and do ereport right away here. > The same logic as with 3). AFAIK it's impossible to have a child item > with a key that is higher than the cached parent key. > Parent key bounds what keys we can insert into the child page, so it > seems there is no way how they can appear there. > These look like good points. I've added it to open items so that we don't forget about this, I won't have time to look at this until after pgconf.dev. thanks -- Tomas Vondra
pgsql-hackers by date: