From aad49bcc32a9f077362352dba987b13653c91979 Mon Sep 17 00:00:00 2001
From: Peter Geoghegan
Date: Fri, 16 Jun 2023 13:11:40 -0700
Subject: [PATCH v1] nbtree VACUUM: cope with topparent inconsistencies.

Avoid "right sibling %u of block %u is not next child" errors when
vacuuming a corrupt nbtree index. Just LOG the issue and press on.
That way VACUUM will have a decent chance of finishing off all required
processing for the index (and for the table as a whole).

This error was seen in the field from time to time (it's more than a
theoretical risk), so giving VACUUM the ability to press on like this
has real value. Nothing short of a REINDEX is expected to fix the
underlying index corruption, so giving up (by throwing an error) risks
making a bad situation far worse. Anything that blocks forward progress
by VACUUM like this might go unnoticed for a long time. This could
eventually lead to a wraparound/xidStopLimit outage.

This is similar to recent work from commit 5abff197, as well as work
from commit 5b861baa (later backpatched as commit 43e409ce), which
taught nbtree VACUUM to press on when its "re-find" check failed. The
hardening added by this commit is closely related to the "re-find"
check. It takes place right afterwards, in the first phase of page
deletion.

Author: Peter Geoghegan
Discussion: https://postgr.es/m/??????????
Backpatch: 11- (all supported versions).
---
 src/backend/access/nbtree/nbtpage.c | 21 ++++++++++++++-------
 1 file changed, 14 insertions(+), 7 deletions(-)

diff --git a/src/backend/access/nbtree/nbtpage.c b/src/backend/access/nbtree/nbtpage.c
index c2050656e..a7ea2c4ea 100644
--- a/src/backend/access/nbtree/nbtpage.c
+++ b/src/backend/access/nbtree/nbtpage.c
@@ -2147,12 +2147,6 @@ _bt_mark_page_halfdead(Relation rel, Relation heaprel, Buffer leafbuf,
 								 &topparent, &topparentrightsib))
 		return false;
 
-	/*
-	 * Check that the parent-page index items we're about to delete/overwrite
-	 * in subtree parent page contain what we expect. This can fail if the
-	 * index has become corrupt for some reason. We want to throw any error
-	 * before entering the critical section --- otherwise it'd be a PANIC.
-	 */
 	page = BufferGetPage(subtreeparent);
 	opaque = BTPageGetOpaque(page);
 
@@ -2170,8 +2164,17 @@ _bt_mark_page_halfdead(Relation rel, Relation heaprel, Buffer leafbuf,
 	nextoffset = OffsetNumberNext(poffset);
 	itemid = PageGetItemId(page, nextoffset);
 	itup = (IndexTuple) PageGetItem(page, itemid);
+
+	/*
+	 * Check that the parent-page index items we're about to delete/overwrite
+	 * in subtree parent page contain what we expect. This can fail if the
+	 * index has become corrupt for some reason. When that happens we back
+	 * out of deletion in the leafbuf subtree. (This is just like the case
+	 * where _bt_lock_subtree_parent() cannot "re-find" leafbuf's downlink.)
+	 */
 	if (BTreeTupleGetDownLink(itup) != topparentrightsib)
-		ereport(ERROR,
+	{
+		ereport(LOG,
 				(errcode(ERRCODE_INDEX_CORRUPTED),
 				 errmsg_internal("right sibling %u of block %u is not next child %u of block %u in index \"%s\"",
 								 topparentrightsib, topparent,
@@ -2179,6 +2182,10 @@ _bt_mark_page_halfdead(Relation rel, Relation heaprel, Buffer leafbuf,
 								 BufferGetBlockNumber(subtreeparent),
 								 RelationGetRelationName(rel))));
 
+		_bt_relbuf(rel, subtreeparent);
+		return false;
+	}
+
 	/*
 	 * Any insert which would have gone on the leaf block will now go to its
 	 * right sibling. In other words, the key space moves right.
-- 
2.40.1