Re: BUG #13667: SSI violation... - Mailing list pgsql-bugs

From Kevin Grittner
Subject Re: BUG #13667: SSI violation...
Msg-id 1285273220.277372.1446226561203.JavaMail.yahoo@mail.yahoo.com
In response to Re: BUG #13667: SSI violation...  (Kevin Grittner <kgrittn@ymail.com>)
Responses Re: BUG #13667: SSI violation...  (Tom Lane <tgl@sss.pgh.pa.us>)
List pgsql-bugs
On Friday, October 9, 2015 4:49 PM, Kevin Grittner <kgrittn@ymail.com> wrote:

> I agree this is a bug, and it behaves like a race condition with a
> very narrow window of time for the second process to hit.  It has
> proven very hard to find the cause.

Thanks to Thomas Munro joining me in a 2.5-day marathon hunt for
this bug, we have found it and squashed it with the attached patch.
The issue in heap_insert() in heapam.c was the bug that was
actually causing the failures in the specific tests we had, but
other locations doing insert, update, or delete had similar bugs,
and we found a few stray issues in predicate.c that are also fixed
here.

The basic problem was that in the initial implementation we tried a
little too hard to optimize, at the expense of correctness.  When
we were going to insert, for example, we checked for an existing
predicate lock on the relation first, to avoid the effort of
inserting the tuple, with WAL-logging, if we were just going to
roll back the transaction anyway because we found a rw-conflict
that completed a "dangerous structure".  The problem is, it allowed
this sequence to create an undetected serialization anomaly:

  T1            T2
  ----------    ----------
  lock
  read
  checklocks
                lock
                read
  insert

At this point things are hosed, regardless of timings past this.
The window in the test was small because T2 has to acquire its
SIReadLock and scan the whole table before T1 manages to insert a
tuple, but with enough contention T1 can stall long enough to see
this happen.  When T2 checks locks prior to its insert it will
detect the rw-conflict from T1 to T2, but the rw-conflict in the
other direction won't be detected, so no dangerous structure is
formed and nothing is rolled back.
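
To make the missed conflict concrete, here is a minimal sketch that
replays the sequence above in a toy model.  It is not PostgreSQL
code: the schedule format and the replay() function are hypothetical,
and the model assumes that any other transaction holding an
SIReadLock when a tuple is inserted has read the relation without
seeing that tuple.

```python
# Toy model, not PostgreSQL code: each step is (transaction, action).
SCHEDULE = [
    ("T1", "lock"),        # T1 takes its SIReadLock on the relation
    ("T1", "read"),
    ("T1", "checklocks"),  # pre-insert check: T2 holds no lock yet
    ("T2", "lock"),        # T2 takes its SIReadLock
    ("T2", "read"),        # T2 scans the table and cannot see T1's tuple
    ("T1", "insert"),      # the original code did not check again here
    ("T2", "checklocks"),  # T2's own pre-insert check
    ("T2", "insert"),
]


def replay(schedule):
    """Return (detected, actual) rw-conflicts as (reader, writer) pairs."""
    lock_holders = []  # transactions currently holding an SIReadLock
    detected = set()   # conflicts noticed by a "checklocks" step
    actual = set()     # conflicts that really exist once a tuple is inserted
    for txn, action in schedule:
        others = {holder for holder in lock_holders if holder != txn}
        if action == "lock":
            lock_holders.append(txn)
        elif action == "checklocks":
            detected |= {(reader, txn) for reader in others}
        elif action == "insert":
            actual |= {(reader, txn) for reader in others}
    return detected, actual


detected, actual = replay(SCHEDULE)
missed = actual - detected
```

In this model the only conflict that is never detected is the one
from T2 to T1, which is the edge needed to complete the dangerous
structure.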

Moving the check past the insert would fix the problem, but the
reason it was put in front is enshrined in a comment at each
problem location; for example:

 * We're about to do the actual insert -- but check for conflict first, to
 * avoid possibly having to roll back work we've just done.

These checks are about as close to free as you can get if the
transaction doing the check is not serializable; it doesn't even
need to take out an LWLock to determine there is nothing to be
done.  The reason given in the comment still has merit for
serializable transactions; even for them the check is orders of
magnitude cheaper than a WAL-logged tuple insert.  The early check
only needs to catch an occasional serialization failure to come out
ahead.  So rather than move the existing check, we added a recheck
after the change is made.
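
The check / insert / recheck ordering can be sketched roughly as
follows.  This is a toy model and not the patch itself: Table,
check_conflict_in(), the between hook, and run() are hypothetical
names, and treating any transaction with both an incoming and an
outgoing rw-conflict as a dangerous structure is a simplification of
what predicate.c really does.

```python
class SerializationFailure(Exception):
    """Raised when the toy model sees a dangerous structure."""


class Table:
    def __init__(self):
        self.rows = []
        self.readers = set()   # transactions holding an SIReadLock
        self.rw_edges = set()  # detected rw-conflicts as (reader, writer)

    def read(self, txn):
        self.readers.add(txn)
        return list(self.rows)

    def check_conflict_in(self, writer):
        # Record an rw-conflict from every other reader to this writer.
        for reader in self.readers - {writer}:
            self.rw_edges.add((reader, writer))
        # Simplified rule: both an incoming and an outgoing edge is fatal.
        has_in = any(w == writer for _, w in self.rw_edges)
        has_out = any(r == writer for r, _ in self.rw_edges)
        if has_in and has_out:
            raise SerializationFailure(writer)

    def insert(self, txn, row, recheck=True, between=None):
        self.check_conflict_in(txn)      # cheap early check, kept as is
        if between is not None:
            between()                    # stands in for a concurrent backend
        self.rows.append(row)            # the expensive part of the insert
        if recheck:
            self.check_conflict_in(txn)  # catches locks taken in the window


def run(recheck):
    table = Table()
    table.read("T1")
    # T2 takes its lock and reads after T1's early check, before T1's insert.
    table.insert("T1", "row from T1", recheck,
                 between=lambda: table.read("T2"))
    try:
        table.insert("T2", "row from T2", recheck)
    except SerializationFailure:
        return "rolled back"
    return "both committed"
```

Calling run(recheck=False) lets both inserts through, while
run(recheck=True) ends with T2 rolled back, because the recheck
records the conflict from T2 to T1 that the early check could not
see.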

Barring objections I will push this tomorrow, including
back-patching it to all supported branches.

--
Kevin Grittner
EDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
