Thread: postgres 9.3.6, serialization error with two independent, serially consistent transactions

In the example below, client #1 and client #2 are not doing anything
that could possibly cause an inconsistent serial order of execution.
You can pick whichever one you want to run first; the final result will
be 100% consistent and exactly the same either way.

So why does postgres 9.3.6 give an error to the second client? Unless I
am reading the docs wrong, this looks like a major bug.

Best,
-H

=============== Initial setup ==========
test=# create table test (id int, value text);
CREATE TABLE

test=# insert into test (id, value) values (1, 'a');
INSERT 0 1

=============== Client #1 ==========
test=# BEGIN TRANSACTION ISOLATION LEVEL SERIALIZABLE;
BEGIN
test=# select * from test where id = 1;
 id | value
----+-------
  1 | a
(1 row)
test=# insert into test (id, value) values (1, 'b');
INSERT 0 1
test=# commit;
COMMIT

========== Client #2 (concurrently with client #1) ==========
test=# BEGIN TRANSACTION ISOLATION LEVEL SERIALIZABLE;
BEGIN
test=# select * from test where id = 2;
 id | value
----+-------
(0 rows)
test=# insert into test (id, value) values (2, 'xxx');
INSERT 0 1
test=# commit;
ERROR:  could not serialize access due to read/write dependencies among transactions
DETAIL:  Reason code: Canceled on identification as a pivot, during commit attempt.
HINT:  The transaction might succeed if retried.

Hursh Jain <hurshjain@beesell.com> wrote:

> In the example below, client #1 and client #2 are not doing
> anything that could possibly cause an inconsistent serial order of
> execution. You can pick whichever one you want to run first; the
> final result will be 100% consistent and exactly the same either
> way.
>
> So why does postgres 9.3.6 give an error to the second client?
> Unless I am reading the docs wrong, this looks like a major bug.

This is not a bug.  What the use of serializable transactions
guarantees is that the behavior of any set of successfully
committed serializable transactions is consistent with some serial
order of execution of those transactions.  If you retry the second
transaction (which got the serialization failure) it will succeed,
and you then have behavior consistent with either order of execution,
so the conditions are met.  There is no guarantee that there will be
no false positives (cases where a serialization failure is generated
even though, with infinite time and resources to analyze everything,
it could have been avoided).
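
For example, simply re-running client #2's failed transaction once
client #1 has committed should go through (a hypothetical transcript,
assuming nothing else is running concurrently):

test=# BEGIN TRANSACTION ISOLATION LEVEL SERIALIZABLE;
BEGIN
test=# select * from test where id = 2;
 id | value
----+-------
(0 rows)
test=# insert into test (id, value) values (2, 'xxx');
INSERT 0 1
test=# commit;
COMMIT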

Note that if you add a primary key on the id column, and use values
that do not cause duplicate keys, you do not get any error.  Without
an index a full table scan is needed, which causes a predicate lock
which has a rw-conflict with any insert to the table.  Two
transactions doing this concurrently will cause a serialization
failure.  If we tracked enough information to avoid this case,
resource usage and performance would not be in a range most would
consider acceptable; heuristics are needed, which inevitably allows
false positives in some circumstances.
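
A sketch of what that variant might look like (hypothetical table name
test2; note that on a one-page table the planner may still choose a
sequential scan, so whether the index is actually used matters here):

test=# create table test2 (id int primary key, value text);
CREATE TABLE
test=# insert into test2 (id, value) values (1, 'a');
INSERT 0 1
-- client #1: select * from test2 where id = 1; insert (3, 'b'); commit
-- client #2: select * from test2 where id = 2; insert (2, 'xxx'); commit

With the index used, the reads take finer-grained (page/tuple) predicate
locks instead of a relation-level one, so inserts of non-overlapping
keys need not conflict with them.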

If you try your example on any database which uses strict two phase
locking (S2PL) to implement serializable transactions (e.g., MS SQL
Server, DB2, MySQL, Berkeley DB, etc.), I think you will find that
you also get a false positive serialization failure in the form of
a deadlock.  In databases which do not actually provide
serializable behavior when you ask for it (e.g., Oracle), you will
not get an error.  If you want that level of transaction isolation
in PostgreSQL you can request REPEATABLE READ, and you will
likewise not get an error.  For a description of the differences
between these isolation levels see the docs:

http://www.postgresql.org/docs/current/interactive/mvcc.html

For some examples of cases where you get different results between
these two isolation levels, see:

https://wiki.postgresql.org/wiki/SSI

PostgreSQL gives you a choice of which behavior you prefer.
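
For instance, client #2's transaction from your example, run at
REPEATABLE READ instead, should commit without complaint (hypothetical
transcript):

test=# BEGIN TRANSACTION ISOLATION LEVEL REPEATABLE READ;
BEGIN
test=# select * from test where id = 2;
 id | value
----+-------
(0 rows)
test=# insert into test (id, value) values (2, 'xxx');
INSERT 0 1
test=# commit;
COMMIT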

--
Kevin Grittner
EDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

Thanks for this clarification.

If you look at the documentation here:
  http://www.postgresql.org/docs/9.4/static/transaction-iso.html

The Serializable example there (where the sum comes out as 330 or 300
depending on execution order) implies that the opposite would work,
i.e. that serially consistent updates would go through.
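
(Paraphrasing that example for reference: the docs use a table mytab
holding class/value rows 1|10, 1|20, 2|100, 2|200, and two concurrent
serializable transactions:

T1: SELECT SUM(value) FROM mytab WHERE class = 1;   -- gets 30
    INSERT INTO mytab VALUES (2, 30);
T2: SELECT SUM(value) FROM mytab WHERE class = 2;   -- gets 300
    INSERT INTO mytab VALUES (1, 300);

Run serially, whichever transaction ran second would compute 330
instead; run concurrently, neither sees the other's insert, so one of
them is rolled back.)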

However, as my example shows, the second transaction can still fail and
needs to be retried.

> There is no guarantee that there will be
> no false positives (cases where a serialization failure is generated
> even though, with infinite time and resources to analyze everything,
> it could have been avoided).

It would be very helpful to have the above paragraph (and in fact the
larger exposition in your reply) in the documentation itself! The
current example on that page implies that serialization *only* fails
because of order-related inconsistencies; however, it can also fail
because of false positives, and this should really be mentioned as
well.

Currently, this is only vaguely alluded to via one sentence (the last
line) on that page:
   "A sequential scan will always necessitate a relation-level predicate
lock"
which doesn't really mean much to people who are just getting started.
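
(Incidentally, that sentence gets much more concrete once you know that
the predicate locks show up in pg_locks while the serializable
transaction is still open; a sketch, run from any session, with the
column list trimmed:

test=# select locktype, relation::regclass, mode from pg_locks where mode = 'SIReadLock';
 locktype | relation |    mode
----------+----------+------------
 relation | test     | SIReadLock
(1 row)

A sequential scan shows up as a single relation-level entry like the
above, while an index scan shows page- or tuple-level entries instead.)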

Best,
--Hursh
