Re: Reproducible GIST index corruption under concurrent updates - Mailing list pgsql-bugs
From | Heikki Linnakangas |
---|---|
Subject | Re: Reproducible GIST index corruption under concurrent updates |
Date | |
Msg-id | 907491b3-87c7-aadd-3fb1-be3c542b7fb7@iki.fi Whole thread Raw |
In response to | Re: Reproducible GIST index corruption under concurrent updates (Heikki Linnakangas <hlinnaka@iki.fi>) |
Responses |
Re: Reproducible GIST index corruption under concurrent updates
|
List | pgsql-bugs |
On 19/01/2021 23:53, Heikki Linnakangas wrote: > On 19/01/2021 19:12, Andrey Borodin wrote: >>> 19 янв. 2021 г., в 16:15, Duncan Sands <duncan.sands@deepbluecap.com> написал(а): >>> >>> Enjoy! >> >> Thanks! >> It reproduces on my machine. Seems like all tuples are indexed, but some root downlinks are not adjusted properly. Probably,after concurrent split. > > This is reproducable down to v12. I bisected it to this commit: > > commit e2e992c93145cfc0e3563fb84efd25b390a84bb9 (refs/bisect/bad) > Author: Heikki Linnakangas <heikki.linnakangas@iki.fi> > Date: Wed Jul 24 20:24:05 2019 +0300 > > Refactor checks for deleted GiST pages. > > The explicit check in gistScanPage() isn't currently really > necessary, as > a deleted page is always empty, so the loop would fall through without > doing anything, anyway. But it's a marginal optimization, and it > gives a > nice place to attach a comment to explain how it works. > > Backpatch to v12, where GiST page deletion was introduced. > > Reviewed-by: Andrey Borodin > Discussion: > https://www.postgresql.org/message-id/835A15A5-F1B4-4446-A711-BF48357EB602%40yandex-team.ru > > I'll dig deeper tomorrow. The culprit is this change: > @@ -858,12 +856,13 @@ gistdoinsert(Relation r, IndexTuple itup, Size freespace, > * leaf/inner is enough to recognize split for root > */ > } > - else if (GistFollowRight(stack->page) || > - stack->parent->lsn < GistPageGetNSN(stack->page)) > + else if ((GistFollowRight(stack->page) || > + stack->parent->lsn < GistPageGetNSN(stack->page)) && > + GistPageIsDeleted(stack->page)) > { > /* > - * The page was split while we momentarily unlocked the > - * page. Go back to parent. > + * The page was split or deleted while we momentarily > + * unlocked the page. Go back to parent. > */ > UnlockReleaseBuffer(stack->buffer); > xlocked = false; The comment change was correct, but the condition used &&. Should've been ||. There is another copy of basically the same condition earlier in the function that was changed correctly, but I blundered this one. Oops. The attached patch fixes this. I also added an assertion to the gistplacetopage() function, to check that we never try to insert on a deleted page. This bug could've made that happen too, although in this case the problem was a concurrent split, not a deletion. I'll backpatch and push this tomorrow. Many thanks for the easy reproducer script, Duncan! - Heikki
Attachment
pgsql-bugs by date: