On Tue, Jun 20, 2023 at 12:18 AM Dmitry Dolgov <9erthalion6@gmail.com> wrote:
> > On Mon, Jun 19, 2023 at 09:30:12PM +1200, Thomas Munro wrote:
> > +#if 0
> >         /*
> >          * Ignore any claimed entries past what we think is the end of the
> >          * relation. It may have been extended after the start of our scan (we
> >          * only hold an AccessShareLock, and it could be inserts from this
> >          * backend).
> >          */
> >         if (block >= hscan->rs_nblocks)
> >                 return false;
> > +#endif
>
> Great, thanks! Can confirm, after applying both the posted patch and the
> change above the issue is not reproducible anymore.

Here's a cleaned-up version of the first two changes. What do you
think about the assertions I make in the commit message for 0002?

> One thing I've noticed is that one can observe a similar issue using a
> gin index and int[] for the "path" column, even applying changes from
> the thread. The gin implementation does something similar to btree in
> startScanEntry -- it lands in the "No entry found" branch, but instead of
> locking the relation it locks "the leaf page, to lock the place where
> the entry would've been, had there been one". A similar fix, retrying
> ginFindLeafPage, didn't solve the problem, even when locking the whole
> relation instead, but maybe I'm missing something.

Ouch. I would have to go and study gin's interlocking model, but one
superficial bug I spotted is that ginget.c's collectMatchBitmap() passes
stack->buffer (a Buffer) to PredicateLockPage(), where a block number is
expected. I wish we had strong typedefs, to reject stuff like that at
compile time. But fixing that alone isn't enough.
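
To illustrate what I mean by strong typedefs, here is a rough sketch in
plain C. Buffer and BlockNumber are ordinary integer typedefs today, so
the compiler silently converts one to the other. Wrapping each in a
single-member struct makes them distinct types. Everything below
(StrongBuffer, StrongBlockNumber, the toy_* functions and the lookup
table) is hypothetical and only stands in for the real buffer manager
and predicate.c APIs; it is not a proposal for how the tree should
actually do it.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical wrappers: distinct struct types instead of bare integers. */
typedef struct { uint32_t value; } StrongBlockNumber;
typedef struct { int value; } StrongBuffer;

/* Toy stand-in for the buffer manager: buffer ids 1..3 map to these blocks. */
static const uint32_t toy_buffer_blocks[] = {7, 19, 42};

/* Records the last block "locked", so the effect can be inspected. */
static uint32_t last_locked_block = UINT32_MAX;

/* Toy analogue of looking up a buffer's block number. */
static StrongBlockNumber
toy_buffer_get_block(StrongBuffer buf)
{
    StrongBlockNumber blkno;

    assert(buf.value >= 1 && buf.value <= 3);
    blkno.value = toy_buffer_blocks[buf.value - 1];
    return blkno;
}

/* Toy analogue of a predicate lock call that wants a block number. */
static void
toy_predicate_lock_page(StrongBlockNumber blkno)
{
    last_locked_block = blkno.value;
}

/*
 * toy_predicate_lock_page(buf) with a StrongBuffer argument is rejected
 * by the compiler (incompatible struct types), whereas with plain
 * integer typedefs the equivalent mistake compiles without complaint.
 */
```

With that, the only way to get from a buffer to the lock call is through
the explicit conversion, e.g.
toy_predicate_lock_page(toy_buffer_get_block(buf)). The cost is that
every arithmetic or comparison on block numbers has to go through
.value or helper functions, which is probably why nobody does this.
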

In case someone who knows more about gin is interested in helping, I
attach Artem's repro, modified to use gin.