Thread: Working around spurious unique constraint errors due to SERIALIZABLE bug

Working around spurious unique constraint errors due to SERIALIZABLE bug

From: Florian Weimer
SERIALIZABLE isolation level doesn't really conform to the spec
because it doesn't deal with phantoms.  The only case I've come across
where this actually matters is when you're implementing some sort of
"insert into table if not yet present" operation.  This will typically
result in a unique constraint violation.[*]

Usually, constraint violations are programming errors, but not this
one.  It's more like a detected deadlock.  Is there a way to tell this
type of constraint violation from other types, so that the transaction
can be restarted automatically (as if there was a deadlock)?
Theoretically, PostgreSQL should detect that the conflicting row
wasn't there when the snapshot for the transaction was taken, and
somehow export this piece of information, but I'm not sure if it's
available to the client.

[*] One way to work around this is to batch inserts and eventually
perform them in a background task which doesn't run in parallel, but
this approach isn't always possible.

--
Florian Weimer                <fweimer@bfk.de>
BFK edv-consulting GmbH       http://www.bfk.de/
Kriegsstraße 100              tel: +49-721-96201-1
D-76133 Karlsruhe             fax: +49-721-96201-99

Re: Working around spurious unique constraint errors due to SERIALIZABLE bug

From: "Albe Laurenz"
Florian Weimer wrote:
> SERIALIZABLE isolation level doesn't really conform to the spec
> because it doesn't deal with phantoms.  The only case I've come across
> where this actually matters is when you're implementing some sort of
> "insert into table if not yet present" operation.  This will typically
> result in a unique constraint violation.[*]
>
> Usually, constraint violations are programming errors, but not this
> one.  It's more like a detected deadlock.  Is there a way to tell this
> type of constraint violation from other types, so that the transaction
> can be restarted automatically (as if there was a deadlock)?
> Theoretically, PostgreSQL should detect that the conflicting row
> wasn't there when the snapshot for the transaction was taken, and
> somehow export this piece of information, but I'm not sure if it's
> available to the client.
>
> [*] One way to work around this is to batch inserts and eventually
> perform them in a background task which doesn't run in parallel, but
> this approach isn't always possible.

Let me construct an example:

CREATE TABLE a (id integer PRIMARY KEY);

CREATE FUNCTION ins(i integer) RETURNS boolean
   LANGUAGE plpgsql STRICT AS
$$DECLARE
   i2 integer;
BEGIN
   SELECT COUNT(id) INTO i2 FROM a WHERE id = i;
   IF i2 = 0 THEN
      /* This INSERT will never throw an exception if the
         transactions are truly serialized */
      INSERT INTO a (id) VALUES (i);
      RETURN TRUE;
   ELSE
      RETURN FALSE;
   END IF;
END;$$;

Now sessions A and B do the following:

A: START TRANSACTION ISOLATION LEVEL SERIALIZABLE;
A: SELECT * FROM a;
 id
----
(0 rows)

B: START TRANSACTION ISOLATION LEVEL SERIALIZABLE;
B: SELECT * FROM a;
 id
----
(0 rows)

B: SELECT ins(1);
 ins
-----
 t
(1 row)

A: SELECT ins(1);
Session A is blocked by B's exclusive lock.

B: COMMIT;

Now A gets:
ERROR:  duplicate key value violates unique constraint "a_pkey"
CONTEXT:  SQL statement "INSERT INTO a (id) VALUES ( $1 )"
PL/pgSQL function "ins" line 1 at SQL statement


This is what you are talking about, right?

I am not sure what exactly you mean by retrying the transaction in
Session A. Even on a second try A would not be able to insert the
duplicate key. But at least there would not be an error:

A: ROLLBACK;
A: SELECT ins(1);
 ins
-----
 f
(1 row)

The best way to work around a problem like this is to write
code that does not assume true serializability, for example:

BEGIN
   INSERT INTO a (id) VALUES (i);
   RETURN TRUE;
EXCEPTION
   WHEN unique_violation THEN
      RETURN FALSE;
END;

Maybe my example is too simple, but the approach should work similarly
whenever error conditions are involved.
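Client-side, the same attempt-the-insert-and-catch-the-duplicate control
flow can be sketched as follows. This is only an illustration (it uses
Python's sqlite3 module so it is self-contained; in PostgreSQL the error
to catch is SQLSTATE 23505, unique_violation):

```python
import sqlite3

def ins(conn, i):
    """Try to insert; treat a duplicate-key error as 'already present'.

    Client-side analogue of the PL/pgSQL EXCEPTION block above,
    demonstrated with SQLite purely to show the control flow.
    """
    try:
        conn.execute("INSERT INTO a (id) VALUES (?)", (i,))
        return True
    except sqlite3.IntegrityError:  # unique_violation, in PostgreSQL terms
        return False

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE a (id INTEGER PRIMARY KEY)")
print(ins(conn, 1))  # True: first insert succeeds
print(ins(conn, 1))  # False: the duplicate is swallowed, no error escapes
```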

Other problems will be more tricky (I am thinking of the example I
constructed for http://archives.postgresql.org/pgsql-hackers/2009-05/msg00316.php).
I don't think that there is a "royal road" that copes with all possible problems.

Yours,
Laurenz Albe

Re: Working around spurious unique constraint errors due to SERIALIZABLE bug

From: Florian Weimer
* Albe Laurenz:

>    SELECT COUNT(id) INTO i2 FROM a WHERE id = i;
>    IF i2 = 0 THEN
>       /* This INSERT will never throw an exception if the
>          transactions are truly serialized */
>       INSERT INTO a (id) VALUES (i);
>       RETURN TRUE;
>    ELSE
>       RETURN FALSE;
>    END IF;

> This is what you are talking about, right?

Yes.

> I am not sure what exactly you mean by retrying the transaction in
> Session A. Even on a second try A would not be able to insert the
> duplicate key. But at least there would not be an error:

I often need to obtain the automatically generated primary key in both
cases (with and without INSERT).

> The best way to work around a problem like this is to write
> code that does not assume true serializability, for example:
>
> BEGIN
>    INSERT INTO a (id) VALUES (i);
>    RETURN TRUE;
> EXCEPTION
>    WHEN unique_violation THEN
>       RETURN FALSE;
> END;

Oh, since when does this set an implicit savepoint?  I haven't
noticed this before.

The drawback is that some of the side effects of the INSERT occur
before the constraint check fails, so it seems to me that I still need
to perform the select.

My main concern is that the unique violation could occur for another
reason (which would be a bug), and I want to avoid an endless loop in
such cases.  But if it's possible to isolate this type of error
recovery to a single statement, this risk is greatly reduced.
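A bounded retry loop keeps that risk contained: retry only on the one
expected error class, a fixed number of times, and re-raise afterwards so
a genuine bug still surfaces instead of looping forever. A generic sketch
(the names and the simulated failure are hypothetical stand-ins for a
transaction that hits unique_violation):

```python
def retry_on(exc_type, func, attempts=3):
    """Run func(); retry up to `attempts` times while exc_type is raised.

    Re-raises after the final attempt, so a persistent error (a real
    bug rather than a serialization artifact) is never silently retried
    without bound.
    """
    for attempt in range(attempts):
        try:
            return func()
        except exc_type:
            if attempt == attempts - 1:
                raise

# Usage sketch: flaky() fails twice, then succeeds on the third call.
calls = {"n": 0}
def flaky():
    calls["n"] += 1
    if calls["n"] < 3:
        raise KeyError("simulated unique_violation")
    return "ok"

print(retry_on(KeyError, flaky))  # ok
```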

--
Florian Weimer                <fweimer@bfk.de>
BFK edv-consulting GmbH       http://www.bfk.de/
Kriegsstraße 100              tel: +49-721-96201-1
D-76133 Karlsruhe             fax: +49-721-96201-99

On Thu, 2009-07-16 at 14:13 +0000, Florian Weimer wrote:

> The drawback is that some of the side effects of the INSERT occur
> before the constraint check fails, so it seems to me that I still need
> to perform the select.

I was about to foolishly suggest:
Instead of:

SELECT 1 FROM x WHERE a = 4;
IF NOT FOUND THEN
  INSERT INTO x (a,b) VALUES (4,10);
END IF;

trying:

INSERT INTO x (a, b)
SELECT 4, 10 WHERE NOT EXISTS(SELECT 1 FROM x WHERE a = 4);

... but then realised I couldn't come up with any justification for how
it'd help (after all, the WHERE clause still has to be evaluated before
the INSERT can proceed, there's still no predicate locking, and the
statements can be evaluated concurrently) so I thought I'd test it.

The test program, attached, demonstrates what I should've known in the
first place. In SERIALIZABLE isolation, the above is *guaranteed* to
fail every time there's conflict, because concurrent transactions cannot
see changes committed by the others. The same is true of a SELECT test
followed by a separate INSERT, by the way.

In READ COMMITTED you get away with it a lot of the time because the
statement can see other transaction(s)' committed changes so the
subquery often matches - but it's a race, and eventually you'll hit a
situation where the subquery for two concurrent transactions is
evaluated before either's insert is issued or at least is committed.

In my test program I've managed as many as 1283 steps before two racing
READ COMMITTED transactions collide. That's in a program designed to
synchronize each transaction before each insert for maximum collision
potential. With six racing transactions I've rarely seen more than three
steps without a collision.

( I've attached the test program in case it's of any interest. It's a
Python test controller that spawns slave processes which it synchronises
using Pg's advisory locking. It ensures that the slaves all start each
INSERT attempt together, and all finish before starting the next
attempt. Signals are used for failure notification, cleanup, etc. )

Anyway, the point is that you're actually worse off in this particular
situation thanks to your use of SERIALIZABLE isolation. However, READ
COMMITTED just gives you a race you're likely to win most of the time
instead of a guaranteed failure whenever there's a race, so it's not
really a solution.

Given that, it seems to me you'll have to rely on Pg's internal
lower-level synchronization around unique indexes. Try the insert and see
if it fails, then ROLLBACK TO SAVEPOINT (or use a PL/PgSQL exception
block). As you noted, this does mean that certain side-effects may
occur, including:

   - advancement of sequences due to nextval(...) calls

   - triggers that've done work that can't be rolled back, eg
     dblink calls, external file writes, inter-process communication etc

If you really can't afford the INSERT side effects and can't redesign
your code to be tolerant of them, you can always lock the table before
an INSERT.

If you can't afford to lock the table due to its impact on performance,
you can potentially use Pg's advisory locking mechanism to protect your
inserts. Eg (PL/PgSQL):

PERFORM pg_advisory_lock(4);
SELECT 1 FROM x WHERE a = 4;
IF NOT FOUND THEN
  INSERT INTO x (a,b) VALUES (4,10);
END IF;
PERFORM pg_advisory_unlock(4);

(You might want to use the two-argument form of the advisory locking
calls if your IDs are INTEGER size not INT8, and use the table oid for
the first argument.)

If every possible INSERTer ensures it holds the lock on the id of
interest before inserting, you'll be fine. Yes, it's ugly, but it
preserves concurrent insert performance while eliminating failed
INSERTs. A possible way to ensure that every possible INSERTer does do
the right thing is to drop the INSERT privilege on the table and then
use a SECURITY DEFINER function that checks the caller's rights and does
the INSERT.

Also: Is this really a phantom read? Your issue is not that you read a
record that then vanishes or no longer matches your filter criteria;
rather, it's that a record is created that matches your criteria after
you tested for it.

Certainly that wouldn't be possible if the concurrent transactions were
actually executed serially, but does the standard actually require that
this be the case? If it does, then compliant implementations would have
to do predicate locking. Ouch. Does anybody do that? It seems MS-SQL
implements very limited predicate locking (enough to handle your issue)
but not enough to tackle aggregates or anything complex.

--
Craig Ringer


Re: Working around spurious unique constraint errors due to SERIALIZABLE bug

From: "Albe Laurenz"
Craig Ringer wrote:
> > The drawback is that some of the side effects of the INSERT occur
> > before the constraint check fails, so it seems to me that I still need
> > to perform the select.
> 
> If you really can't afford the INSERT side effects and can't redesign
> your code to be tolerant of them, you can always lock the table before
> an INSERT.

I wonder what could be the side effects of an INSERT that causes an error
that is caught in a plpgsql exception block.

What happens behind the scenes when an exception is caught is that the
transaction is rolled back to a savepoint that was set at the beginning
of the block.

So all changes to the database that were caused as side effects of the INSERT,
for example triggers, will be rolled back as well.

The only side effects that would remain are those caused by things that
operate outside the database, e.g. modifying files on the database
server or performing trigger-based replication to another database.
In short, everything that is not under MVCC control.

Is that the problem here?


On another note:
The original question asked was "how can I tell an error that is caused
by incomplete isolation from another error?"

If you have a code segment like
   SELECT COUNT(id) INTO i2 FROM a WHERE id = i;
   IF i2 = 0 THEN
      INSERT INTO a (id) VALUES (i);
   END IF;

Then you can be certain that any "unique_violation" thrown here must
be a serialization problem (if the only unique constraint is on "id").

Yours,
Laurenz Albe

Re: Working around spurious unique constraint errors due to SERIALIZABLE bug

From: Florian Weimer
* Craig Ringer:

> The test program, attached, demonstrates what I should've known in the
> first place. In SERIALIZABLE isolation, the above is *guaranteed* to
> fail every time there's conflict, because concurrent transactions cannot
> see changes committed by the others. The same is true of a SELECT test
> followed by a separate INSERT, by the way.

Yes, I forgot to mention that you can't use SERIALIZABLE if you use
this approach.

> Given that, it seems to me you'll have to rely on Pg's internal
> lower-level synchronization around unique indexes. Try the insert and see
> if it fails, then ROLLBACK TO SAVEPOINT (or use a PL/PgSQL exception
> block). As you noted, this does mean that certain side-effects may
> occur, including:
>
>    - advancement of sequences due to nextval(...) calls
>
>    - triggers that've done work that can't be rolled back, eg
>      dblink calls, external file writes, inter-process communication etc

It's also the cost of producing the input data for the INSERT.

> (You might want to use the two-argument form of the advisory locking
> calls if your IDs are INTEGER size not INT8, and use the table oid for
> the first argument.)

Locking on a hash value could also be an option (it's how concurrent
hash tables are sometimes implemented).
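As a sketch of that idea, an arbitrary string key can be hashed down to
the signed 64-bit integer that pg_advisory_lock(bigint) expects. The
helper name here is made up; note that collisions between distinct keys
only cost extra serialization, never correctness:

```python
import hashlib

def advisory_lock_key(key: str) -> int:
    """Map an arbitrary string key to a signed 64-bit integer suitable
    for passing to PostgreSQL's pg_advisory_lock(bigint).

    Uses the first 8 bytes of a SHA-256 digest so that every caller
    derives the same lock key for the same logical row.
    """
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big", signed=True)

# Two sessions computing the key for the same logical row agree:
print(advisory_lock_key("users:42") == advisory_lock_key("users:42"))  # True
```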

> Also: Is this really a phantom read? Your issue is not that you read a
> record that then vanishes or no longer matches your filter criteria;
> rather, it's that a record is created that matches your criteria after
> you tested for it.

It's the INSERT which performs the phantom read.

And is SQL's definition of serializability really different from the
textbook one?

> Certainly that wouldn't be possible if the concurrent transactions were
> actually executed serially, but does the standard actually require that
> this be the case? If it does, then compliant implementations would have
> to do predicate locking. Ouch. Does anybody do that?

You don't need predicate locking here.  You just have to lock on the
gap in the index you touched.  I think some implementations do this
(InnoDB calls it "next-key locking").

--
Florian Weimer                <fweimer@bfk.de>
BFK edv-consulting GmbH       http://www.bfk.de/
Kriegsstraße 100              tel: +49-721-96201-1
D-76133 Karlsruhe             fax: +49-721-96201-99

Re: Working around spurious unique constraint errors due to SERIALIZABLE bug

From: Florian Weimer
* Albe Laurenz:

> The original question asked was "how can I tell an error that is caused
> by incomplete isolation from another error?"
>
> If you have a code segment like
>    SELECT COUNT(id) INTO i2 FROM a WHERE id = i;
>    IF i2 = 0 THEN
>       INSERT INTO a (id) VALUES (i);
>    END IF;
>
> Then you can be certain that any "unique_violation" thrown here must
> be a serialization problem (if the only unique constraint is on "id").

I want to put this into a library, so I'd like something foolproof.
Right now, user code sets a flag which basically says that the wrapper
should retry the transaction a few times if a unique_violation is
detected, but I'd like to get rid of that because it's one less thing
for the programmer to worry about.

--
Florian Weimer                <fweimer@bfk.de>
BFK edv-consulting GmbH       http://www.bfk.de/
Kriegsstraße 100              tel: +49-721-96201-1
D-76133 Karlsruhe             fax: +49-721-96201-99