Re: BUG #18396: Assert in gistFindCorrectParent() fails on inserting large tuples into gist index - Mailing list pgsql-bugs

From Heikki Linnakangas
Subject Re: BUG #18396: Assert in gistFindCorrectParent() fails on inserting large tuples into gist index
Date
Msg-id a1701e0f-935a-4560-9a25-a40fd45cb35e@iki.fi
Whole thread Raw
In response to Re: BUG #18396: Assert in gistFindCorrectParent() fails on inserting large tuples into gist index  (Tender Wang <tndrwang@gmail.com>)
Responses Re: BUG #18396: Assert in gistFindCorrectParent() fails on inserting large tuples into gist index
List pgsql-bugs
On 15/11/2024 11:10, Tender Wang wrote:
> Heikki Linnakangas <hlinnaka@iki.fi <mailto:hlinnaka@iki.fi>> 于2024年3 
> 月19日周二 22:26写道:
> 
>     On 19/03/2024 14:07, Tender Wang wrote:
>      > Thanks for your report. I can reproduce this issue.
>      > I try to delete the Assert, no coredump anymore.
>      > I need some time to learn GiST to find the root cause.
>     At first glance, I think the assertion is too strict. In
>     gistFindCorrectParent(), if we walk right, we update the parent's block
>     number and reset its memorized downlinkoffnum to
>     InvalidOffsetNumber. If
>     we later call gistFindCorrectParent() again with the same stack,
>     because
>     the parent also needs to be split, we hit that assertion. But it's
>     OK in
>     that case, we don't know the downlink because we had moved right.
>     Attached patch relaxes that.
> 
> 
> I picked up this issue again, and I took some time to dig into it.  
> The patch works for me.
> Can we update the comments above the Assert together?  If we go to the 
> Assert() code,
> The comments say that the page has changed or we are in index build process.
> But in the issue, the page didn't change(because we created temp table, 
> no concurrent), but we were not in
> index build process, too. So the comments cannot reflect this case.
> 
> 
>     But now I'm having some second thoughts. gistFindCorrectParent() looks
>     like this:
> 
>      >       /*
>      >        * Scan the page to re-find the downlink. If the page was
>     split, it might
>      >        * have moved to a different page, so follow the right
>     links until we find
>      >        * it.
>      >        */
>      >       while (true)
>      >       {
>      >               OffsetNumber i;
>      >
>      >               maxoff = PageGetMaxOffsetNumber(parent->page);
>      >               for (i = FirstOffsetNumber; i <= maxoff; i =
>     OffsetNumberNext(i))
>      >               {
>      >                       iid = PageGetItemId(parent->page, i);
>      >                       idxtuple = (IndexTuple) PageGetItem(parent-
>      >page, iid);
>      >                       if (ItemPointerGetBlockNumber(&(idxtuple-
>      >t_tid)) == child->blkno)
>      >                       {
>      >                               /* yes!!, found */
>      >                               child->downlinkoffnum = i;
>      >                               return;
>      >                       }
>      >               }
>      >
>      >               parent->blkno = GistPageGetOpaque(parent->page)-
>      >rightlink;
>      >               parent->downlinkoffnum = InvalidOffsetNumber;
>      >               UnlockReleaseBuffer(parent->buffer);
>      >               if (parent->blkno == InvalidBlockNumber)
>      >               {
>      >                       /*
>      >                        * End of chain and still didn't find
>     parent. It's a very-very
>      >                        * rare situation when the root was split.
>      >                        */
>      >                       break;
>      >               }
>      >               parent->buffer = ReadBuffer(r, parent->blkno);
>      >               LockBuffer(parent->buffer, GIST_EXCLUSIVE);
>      >               gistcheckpage(r, parent->buffer);
>      >               parent->page = (Page) BufferGetPage(parent->buffer);
>      >       }
> 
>     When we step right and update parent->blkno, shouldn't we also update
>     parent->lsn? Otherwise, we might fail to detect a concurrent page split
>     with the LSN-NSN interlock checks. Or maybe it's correct, and we should
>     indeed not update the memorized LSN. Or maybe it doesn't matter because
>     we retry from the parent anyway, thanks to the 'retry_from_parent'
>     flag?
>     I'm not sure..
> 
> I agree that it doesn't matter. If internal page split due to adjust key 
> consistent with the key we're inserting.
> The 'retry_from_parent' flag can help us get corrent downlinkoffnum.

Ok, thanks for checking! Committed.

-- 
Heikki Linnakangas
Neon (https://neon.tech)



pgsql-bugs by date:

Previous
From: PG Bug reporting form
Date:
Subject: BUG #18876: HINT messages for mxid wrap-around say "drop stale slots", but that may not be appropriate
Next
From: Manika Singhal
Date:
Subject: Re: PostgreSQL v15.12 fails to perform PG_UPGRADE from v13 and v9 on Windows