Re: BUG #17949: Adding an index introduces serialisation anomalies. - Mailing list pgsql-bugs

From Thomas Munro
Subject Re: BUG #17949: Adding an index introduces serialisation anomalies.
Msg-id CA+hUKGLE3SONfzBbj3q7T7qYx5FQ9Pk_iGx6N7nYim1qgRrvUg@mail.gmail.com
In response to Re: BUG #17949: Adding an index introduces serialisation anomalies.  (Dmitry Dolgov <9erthalion6@gmail.com>)
Responses Re: BUG #17949: Adding an index introduces serialisation anomalies.  (Thomas Munro <thomas.munro@gmail.com>)
List pgsql-bugs
On Thu, Jun 15, 2023 at 7:29 PM Dmitry Dolgov <9erthalion6@gmail.com> wrote:
> I've tried to reproduce it as well, adding more logging around the
> serialization code. If it helps, what I observe is that the second
> overlapping transaction, which started a bit later, does not error out
> because in OnConflict_CheckForSerializationFailure (when checking for
> "writer has become a pivot") there are no more conflicts received from
> SHMQueueNext. All the other reported serialization conflicts
> come from this check, so I assume the incorrect transaction should
> fail there too. Not sure yet why that is.

Some more observations: it happens on 11 and master, it happens with btrees,
and it happens with bitmapscan disabled (e.g. with a plain index scan), but so
far in my testing it doesn't happen if the table already contains one
other tuple (i.e. if you change the reproducer to insert another row
('foo') after the TRUNCATE).  There is a special case for predicate
locking empty indexes, which takes a relation-level lock (since there are no
pages to lock yet), but that doesn't seem to be wrong, and if you hack
it to lock pages 1 and 2 instead, it still reproduces.  Pondering the
empty index case made me wonder if the case "If we found one of our
own SIREAD locks to remove, remove it now" was implicated (that's
something that would not happen for a relation-level lock), but it
still reproduces if you comment out that optimisation.  So far I have
not been able to reproduce it with fewer than 8 threads.  Hmm, I wonder if there
might be a missing check/lock in some racy code path around the
initial creation of the root page...
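The judgement that the empty-index special case "doesn't seem to be wrong" rests on lock coverage: a relation-level SIREAD lock should detect a conflicting write to any page of the index, including pages that did not exist yet when the lock was taken. Below is a minimal toy model of that idea, assuming only two granularities (relation and page). `LockTarget` and `covers` are hypothetical names, not PostgreSQL's predicate.c API, and the model ignores tuple-level locks, lock promotion and page splits.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass(frozen=True)
class LockTarget:
    """Hypothetical predicate-lock target: a whole relation (page is None)
    or a single page of that relation. Not PostgreSQL's real struct."""
    relation: str
    page: Optional[int] = None


def covers(read_lock: LockTarget, write_page: Tuple[str, int]) -> bool:
    """Return True if, in this toy model, a SIREAD lock on `read_lock`
    would see a write to `write_page` as an rw-conflict."""
    relation, page = write_page
    if read_lock.relation != relation:
        return False
    # A relation-level lock covers every page, even ones created later
    # (e.g. the first root page of a previously empty btree).
    return read_lock.page is None or read_lock.page == page


# Empty index: there are no pages yet, so the reader locks the relation.
empty_index_lock = LockTarget("idx")
# Non-empty index: the reader locks only the page it actually scanned.
page_lock = LockTarget("idx", page=1)

print(covers(empty_index_lock, ("idx", 1)))  # True
print(covers(empty_index_lock, ("idx", 2)))  # True
print(covers(page_lock, ("idx", 2)))         # False
```

In this model a relation-level lock is never weaker than a page-level lock on the same relation, so the empty-index special case on its own should not lose conflicts.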
