On Sun, Jun 25, 2023 at 5:26 PM Thomas Munro <thomas.munro@gmail.com> wrote:
> CheckForSerializableConflictIn() was written with the assumption that
> you're inserting a tuple (ie you have the page containing the tuple
> locked), so you'll either conflict with a reader who already has a
> predicate lock at that point OR you'll insert first and then the
> reader will see your (invisible-to-snapshot) tuples, but here we're
> doing some fancy footwork with a meta page, and we screwed up the
> ordering and left a window where neither of those things happens.
FWIW, there is no fundamental reason why nbtree couldn't always
preallocate a single empty leaf page during CREATE INDEX -- this leaf
page is where values whose keyspace is between negative and positive
infinity (i.e. all items) are located. That would fix this bug. Since,
of course, it would make the theory of operation that you describe
work reliably, even with an empty index. This isn't a serious
proposal, of course -- lazily allocating the first real page has
value, and we're hardly going to throw that away just to fix this bug.
My point is that not allocating a leaf page in CREATE INDEX is the
special case here, if anything.
I'm not surprised that GIN has deeper problems than nbtree due to
things like posting trees. Many GIN features rely on the fact that GIN
only supports lossy index scans. For example, it doesn't matter if the
pending list has TIDs that appear elsewhere within the same index for
a while. It doesn't matter if you have a hard crash when merging the
pending list -- VACUUM will eventually clean everything up. Having the
same TID in two different places at the same time is tolerable. I
imagine that that kind of foundation is harder to build SSI predicate
locks on top of.
--
Peter Geoghegan