Re: BUG #18396: Assert in gistFindCorrectParent() fails on inserting large tuples into gist index - Mailing list pgsql-bugs
From | Heikki Linnakangas |
---|---|
Subject | Re: BUG #18396: Assert in gistFindCorrectParent() fails on inserting large tuples into gist index |
Date | |
Msg-id | a1701e0f-935a-4560-9a25-a40fd45cb35e@iki.fi Whole thread Raw |
In response to | Re: BUG #18396: Assert in gistFindCorrectParent() fails on inserting large tuples into gist index (Tender Wang <tndrwang@gmail.com>) |
Responses |
Re: BUG #18396: Assert in gistFindCorrectParent() fails on inserting large tuples into gist index
|
List | pgsql-bugs |
On 15/11/2024 11:10, Tender Wang wrote: > Heikki Linnakangas <hlinnaka@iki.fi <mailto:hlinnaka@iki.fi>> 于2024年3 > 月19日周二 22:26写道: > > On 19/03/2024 14:07, Tender Wang wrote: > > Thanks for your report. I can reproduce this issue. > > I try to delete the Assert, no coredump anymore. > > I need some time to learn GiST to find the root cause. > At first glance, I think the assertion is too strict. In > gistFindCorrectParent(), if we walk right, we update the parent's block > number and reset its memorized downlinkoffnum to > InvalidOffsetNumber. If > we later call gistFindCorrectParent() again with the same stack, > because > the parent also needs to be split, we hit that assertion. But it's > OK in > that case, we don't know the downlink because we had moved right. > Attached patch relaxes that. > > > I picked up this issue again, and I took some time to dig into it. > The patch works for me. > Can we update the comments above the Assert together? If we go to the > Assert() code, > The comments say that the page has changed or we are in index build process. > But in the issue, the page didn't change(because we created temp table, > no concurrent), but we were not in > index build process, too. So the comments cannot reflect this case. > > > But now I'm having some second thoughts. gistFindCorrectParent() looks > like this: > > > /* > > * Scan the page to re-find the downlink. If the page was > split, it might > > * have moved to a different page, so follow the right > links until we find > > * it. > > */ > > while (true) > > { > > OffsetNumber i; > > > > maxoff = PageGetMaxOffsetNumber(parent->page); > > for (i = FirstOffsetNumber; i <= maxoff; i = > OffsetNumberNext(i)) > > { > > iid = PageGetItemId(parent->page, i); > > idxtuple = (IndexTuple) PageGetItem(parent- > >page, iid); > > if (ItemPointerGetBlockNumber(&(idxtuple- > >t_tid)) == child->blkno) > > { > > /* yes!!, found */ > > child->downlinkoffnum = i; > > return; > > } > > } > > > > parent->blkno = GistPageGetOpaque(parent->page)- > >rightlink; > > parent->downlinkoffnum = InvalidOffsetNumber; > > UnlockReleaseBuffer(parent->buffer); > > if (parent->blkno == InvalidBlockNumber) > > { > > /* > > * End of chain and still didn't find > parent. It's a very-very > > * rare situation when the root was split. > > */ > > break; > > } > > parent->buffer = ReadBuffer(r, parent->blkno); > > LockBuffer(parent->buffer, GIST_EXCLUSIVE); > > gistcheckpage(r, parent->buffer); > > parent->page = (Page) BufferGetPage(parent->buffer); > > } > > When we step right and update parent->blkno, shouldn't we also update > parent->lsn? Otherwise, we might fail to detect a concurrent page split > with the LSN-NSN interlock checks. Or maybe it's correct, and we should > indeed not update the memorized LSN. Or maybe it doesn't matter because > we retry from the parent anyway, thanks to the 'retry_from_parent' > flag? > I'm not sure.. > > I agree that it doesn't matter. If internal page split due to adjust key > consistent with the key we're inserting. > The 'retry_from_parent' flag can help us get corrent downlinkoffnum. Ok, thanks for checking! Committed. -- Heikki Linnakangas Neon (https://neon.tech)
pgsql-bugs by date: