Re: BUG #18396: Assert in gistFindCorrectParent() fails on inserting large tuples into gist index - Mailing list pgsql-bugs

From Tender Wang
Subject Re: BUG #18396: Assert in gistFindCorrectParent() fails on inserting large tuples into gist index
Date
Msg-id CAHewXNmg1NaPkhy1vw7Q-4TdHXN-0qH2_3WzrVTJCXc2em+pNw@mail.gmail.com
Whole thread Raw
In response to Re: BUG #18396: Assert in gistFindCorrectParent() fails on inserting large tuples into gist index  (Heikki Linnakangas <hlinnaka@iki.fi>)
List pgsql-bugs
Hi Heikki and Alexander,

Heikki Linnakangas <hlinnaka@iki.fi> 于2024年3月19日周二 22:26写道:
On 19/03/2024 14:07, Tender Wang wrote:
> Thanks for your report. I can reproduce this issue.
> I try to delete the Assert, no coredump anymore.
> I need some time to learn GiST to find the root cause.
At first glance, I think the assertion is too strict. In
gistFindCorrectParent(), if we walk right, we update the parent's block
number and reset its memorized downlinkoffnum to InvalidOffsetNumber. If
we later call gistFindCorrectParent() again with the same stack, because
the parent also needs to be split, we hit that assertion. But it's OK in
that case, we don't know the downlink because we had moved right.
Attached patch relaxes that.

I picked up this issue again, and I took some time to dig into it.  The patch works for me.
Can we update the comments above the Assert together?  If we go to the Assert() code, 
The comments say that the page has changed or we are in index build process.
But in the issue, the page didn't change(because we created temp table, no concurrent), but we were not in
index build process, too. So the comments cannot reflect this case.


But now I'm having some second thoughts. gistFindCorrectParent() looks
like this:

>       /*
>        * Scan the page to re-find the downlink. If the page was split, it might
>        * have moved to a different page, so follow the right links until we find
>        * it.
>        */
>       while (true)
>       {
>               OffsetNumber i;
>
>               maxoff = PageGetMaxOffsetNumber(parent->page);
>               for (i = FirstOffsetNumber; i <= maxoff; i = OffsetNumberNext(i))
>               {
>                       iid = PageGetItemId(parent->page, i);
>                       idxtuple = (IndexTuple) PageGetItem(parent->page, iid);
>                       if (ItemPointerGetBlockNumber(&(idxtuple->t_tid)) == child->blkno)
>                       {
>                               /* yes!!, found */
>                               child->downlinkoffnum = i;
>                               return;
>                       }
>               }
>
>               parent->blkno = GistPageGetOpaque(parent->page)->rightlink;
>               parent->downlinkoffnum = InvalidOffsetNumber;
>               UnlockReleaseBuffer(parent->buffer);
>               if (parent->blkno == InvalidBlockNumber)
>               {
>                       /*
>                        * End of chain and still didn't find parent. It's a very-very
>                        * rare situation when the root was split.
>                        */
>                       break;
>               }
>               parent->buffer = ReadBuffer(r, parent->blkno);
>               LockBuffer(parent->buffer, GIST_EXCLUSIVE);
>               gistcheckpage(r, parent->buffer);
>               parent->page = (Page) BufferGetPage(parent->buffer);
>       }

When we step right and update parent->blkno, shouldn't we also update
parent->lsn? Otherwise, we might fail to detect a concurrent page split
with the LSN-NSN interlock checks. Or maybe it's correct, and we should
indeed not update the memorized LSN. Or maybe it doesn't matter because
we retry from the parent anyway, thanks to the 'retry_from_parent' flag?
I'm not sure..

I agree that it doesn't matter. If internal page split due to adjust key consistent with the key we're inserting.
The 'retry_from_parent' flag can help us get corrent downlinkoffnum.  
 

>I'm also bothered by errors "no empty local buffer available" produced
>(on master, but not on REL_12_STABLE) when inserting larger tuples:
>INSERT INTO t SELECT '' FROM generate_series(1, 500) g;
>INSERT INTO t SELECT
>(SELECT string_agg('text' || g, ' ') FROM generate_series(1, 700) g)
>  FROM generate_series(1, 30);

>Are those errors expected?
I believe this error is expected. We have the same error reporting codes in the shared buffer.

By the way, I found some codes issue(maybe not a issue) in gistdoinsert(), as below:

stack->lsn = xlocked ?
PageGetLSN(stack->page) : BufferGetLSNAtomic(stack->buffer);

The buffer belongs to local buffers in this issue, so its value is lower than 0.
But in BufferGetLSNAtomic(), we call GetBufferDescriptor() to get BufferDesc.
GetBufferDescriptor() is used to share the buffer.  Out-of-bounds will happen.

I provided a patch to fix that in [1]. Any thoughts?



--
Thanks,
Tender Wang

pgsql-bugs by date:

Previous
From: Tender Wang
Date:
Subject: Re: BUG #18705: Segmentation fault when create brin index on user-defined type.
Next
From: "zengman"
Date:
Subject: Re:BUG #18710: "pg_get_viewdef" triggers assertions in special scenarios