Re: BUG #17949: Adding an index introduces serialisation anomalies. - Mailing list pgsql-bugs

From Thomas Munro
Subject Re: BUG #17949: Adding an index introduces serialisation anomalies.
Date
Msg-id CA+hUKGJP3g6PF4vES0X0zy34uuSvHxHUhoq65A_WtzqxPpJ_6g@mail.gmail.com
In response to Re: BUG #17949: Adding an index introduces serialisation anomalies.  (Dmitry Dolgov <9erthalion6@gmail.com>)
Responses Re: BUG #17949: Adding an index introduces serialisation anomalies.  (Thomas Munro <thomas.munro@gmail.com>)
Re: BUG #17949: Adding an index introduces serialisation anomalies.  (Dmitry Dolgov <9erthalion6@gmail.com>)
List pgsql-bugs
On Tue, Jun 20, 2023 at 12:18 AM Dmitry Dolgov <9erthalion6@gmail.com> wrote:
> > On Mon, Jun 19, 2023 at 09:30:12PM +1200, Thomas Munro wrote:
> > +#if 0
> >         /*
> >          * Ignore any claimed entries past what we think is the end of the
> >          * relation. It may have been extended after the start of our scan (we
> >          * only hold an AccessShareLock, and it could be inserts from this
> >          * backend).
> >          */
> >         if (block >= hscan->rs_nblocks)
> >                 return false;
> > +#endif
>
> Great, thanks! Can confirm, after applying both the posted patch and the
> change above the issue is not reproducible anymore.

Here's a cleaned-up version of the first two changes.  What do you
think about the assertions I make in the commit message for 0002?

> One thing I've noticed is that one can observe a similar issue using a
> gin index and int[] for the "path" column, even applying changes from
> the thread. The gin implementation does something similar to btree in
> startScanEntry -- it lands in "No entry found" branch, but instead of
> locking the relation it locks "the leaf page, to lock the place where
> the entry would've been, had there been one". The similar fix retrying
> ginFindLeafPage didn't solve the problem, even if locking the whole
> relation instead, but maybe I'm missing something.

Ouch.  I would have to go and study gin's interlocking model, but one
superficial bug I spotted is that ginget.c's collectMatchBitmap()
calls PredicateLockPage(stack->buffer), where a block number is
expected.  I wish we had strong typedefs, to reject stuff like that at
compile time.  But fixing that alone isn't enough.
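To illustrate the "strong typedefs" point: in plain C, a buffer handle and a
block number are both integer typedefs, so passing one where the other is
expected compiles silently. A minimal sketch of one workaround, wrapping each
in its own struct, might look like the following. All names here
(StrongBuffer, StrongBlockNumber, the demo_* functions) are hypothetical and
not PostgreSQL APIs, and the buffer-to-block mapping is invented purely for
the demonstration.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical strong typedefs: each integer is wrapped in its own struct,
 * so the two types are no longer implicitly convertible to each other.
 */
typedef struct { int value; } StrongBuffer;
typedef struct { uint32_t value; } StrongBlockNumber;

/*
 * Stand-in for a buffer-to-block-number lookup.  The "+ 100" mapping is
 * made up for this sketch; a real lookup would consult the buffer header.
 */
static StrongBlockNumber
demo_buffer_get_block_number(StrongBuffer buf)
{
    StrongBlockNumber blkno;

    assert(buf.value != 0);     /* assumption: 0 denotes an invalid buffer */
    blkno.value = (uint32_t) buf.value + 100;
    return blkno;
}

/* Stand-in for a page-level predicate lock; returns the block it "locked". */
static uint32_t
demo_predicate_lock_page(StrongBlockNumber blkno)
{
    return blkno.value;
}

/*
 * Correct call shape: convert the buffer to a block number first.
 * Writing demo_predicate_lock_page(buf) instead would fail to compile,
 * because StrongBuffer and StrongBlockNumber are incompatible types.
 */
static uint32_t
demo_lock_for_buffer(StrongBuffer buf)
{
    return demo_predicate_lock_page(demo_buffer_get_block_number(buf));
}
```

The cost of this approach is that every arithmetic or comparison on the
wrapped value has to go through .value or a helper, which is presumably part
of why plain integer typedefs are used instead.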

In case someone who knows more about gin is interested in helping, I
attach Artem's repro, modified to use gin.

Attachment
