Re: IRe: BUG #16792: silent corruption of GIN index resulting in SELECTs returning non-matching rows - Mailing list pgsql-bugs

From Peter Geoghegan
Subject Re: IRe: BUG #16792: silent corruption of GIN index resulting in SELECTs returning non-matching rows
Msg-id CAH2-WzmqCtkxBrxB09RGmvpG0k52son1GOg_Ua+TDtFUQsTTDg@mail.gmail.com
In response to Re: IRe: BUG #16792: silent corruption of GIN index resulting in SELECTs returning non-matching rows  (Alexander Korotkov <aekorotkov@gmail.com>)
Responses Re: IRe: BUG #16792: silent corruption of GIN index resulting in SELECTs returning non-matching rows  (Heikki Linnakangas <hlinnaka@iki.fi>)
List pgsql-bugs
On Thu, Jul 15, 2021 at 3:56 AM Alexander Korotkov <aekorotkov@gmail.com> wrote:
> I still think this is worth checking.  Despite the pending list wasn't
> involved in the index scan with wrong results, the bug could be
> related to insertion to the pending list.  Or it could be related to
> moving entries from the pending list to the posting list/tree.

If I had to guess I'd say that the chances of the pending list being
involved are high.

shiftList() deletes pages in the pending list -- this is called from
ginInsertCleanup(). But shiftList() doesn't call GinPageSetDeleteXid()
to set an XID that represents when the page is safe to recycle, which
is what ginDeletePage() always does. Why is that okay?
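
To make the concern concrete, here is a toy model of the recycle-safety
check. It is not PostgreSQL source: every toy_* name is hypothetical, XID
wraparound is ignored, and the "no XID means recyclable at once" rule is an
assumption about how an unstamped deleted page would be treated.

```c
/* Toy model only -- not PostgreSQL code. All toy_* names are hypothetical. */
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

typedef uint32_t ToyXid;                /* simplified XID, no wraparound */
#define TOY_INVALID_XID ((ToyXid) 0)

typedef struct ToyPage
{
    bool    deleted;        /* page has been unlinked from its list/tree */
    ToyXid  delete_xid;     /* XID stamped at deletion, or TOY_INVALID_XID */
} ToyPage;

/* Delete and stamp the page, in the spirit of ginDeletePage(). */
static void
toy_delete_with_xid(ToyPage *page, ToyXid next_xid)
{
    assert(next_xid != TOY_INVALID_XID);
    page->deleted = true;
    page->delete_xid = next_xid;
}

/* Delete without stamping, which is what shiftList() appears to do. */
static void
toy_delete_without_xid(ToyPage *page)
{
    page->deleted = true;
    page->delete_xid = TOY_INVALID_XID;
}

/*
 * A stamped page is recyclable only once every transaction that could
 * still hold a link to it has finished (delete_xid < oldest_running_xid).
 * An unstamped page carries no such information, so this toy check lets
 * it be recycled immediately (assumption).
 */
static bool
toy_page_is_recyclable(const ToyPage *page, ToyXid oldest_running_xid)
{
    if (!page->deleted)
        return false;
    if (page->delete_xid == TOY_INVALID_XID)
        return true;
    return page->delete_xid < oldest_running_xid;
}
```

In this model a page deleted without an XID can be handed out again while a
concurrent reader still holds a link to it, whereas a stamped page cannot.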

Note that we usually use read/share buffer locks when lock-coupling
inside ginInsertCleanup() -- so AFAICT we won't block out concurrent
readers with a link that's about to go stale due to recycling of the
page. This looks unsafe. Of course it's very hard to tell what might
be going on, since none of this seems to be explained anywhere.

Even ginDeletePage() didn't do the right thing with XIDs until bugfix
commit 52ac6cd2d0. That commit didn't touch any pending list code --
it should have at least explained why ginInsertCleanup()/shiftList()
don't need to use the GinPageSetDeleteXid() stuff.

--
Peter Geoghegan


