Thread: How do concurrent inserts work?

How do concurrent inserts work?

From
Yaroslav
Date:
Hi. After reading this:

http://postgresql.nabble.com/Re-BUG-12330-ACID-is-broken-for-unique-constraints-td5832085.html

I've come to the conclusion that I don't understand PostgreSQL transaction
isolation. :(

Here's the example:
> CREATE TABLE t(a INT PRIMARY KEY);
> INSERT INTO t VALUES(1);

-- Test number 1:
> START TRANSACTION ISOLATION LEVEL SERIALIZABLE;
> SAVEPOINT a;
> INSERT INTO t VALUES(1);
-- This results in a 'duplicate key' error, so I reason there is a row with
-- this value; let me check:
> ROLLBACK TO SAVEPOINT a;
> SELECT * FROM t WHERE a = 1;
1 -- 1 row. So yes, there is such a row.
> COMMIT; -- done with this test


-- Test number 2:
> START TRANSACTION ISOLATION LEVEL SERIALIZABLE;
> SELECT * FROM t WHERE a = 1;
1 -- 1 row
> SAVEPOINT a;

In other session> INSERT INTO t VALUES(2);
-- Back to my session:
> INSERT INTO t VALUES(2);
-- This results in a 'duplicate key' error, so I reason there is a row with
-- this value; let me check:
> ROLLBACK TO SAVEPOINT a;
> SELECT * FROM t WHERE a = 2;
-- 0 rows
-- So, I reason... wait, what? The error told me that there IS such a row, but now
-- I see there ISN'T?!

Can you enlighten me?




-----
WBR, Yaroslav Schekin.
--
View this message in context: http://postgresql.nabble.com/How-do-concurrent-inserts-work-tp5832157.html
Sent from the PostgreSQL - novice mailing list archive at Nabble.com.


Re: How do concurrent inserts work?

From
Serge Fonville
Date:
Hi,

Perhaps because your second session doesn't specify an isolation level?

Kind regards/met vriendelijke groet,

Serge Fonville

http://www.sergefonville.nl

2014-12-27 11:11 GMT+01:00 Yaroslav <ladayaroslav@yandex.ru>:


--
Sent via pgsql-novice mailing list (pgsql-novice@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-novice

Re: How do concurrent inserts work?

From
Yaroslav
Date:
Serge Fonville wrote
> Perhaps because your second session doesn't specify an isolation level?

No. ;)
It's the same with any isolation level of the second session.




-----
WBR, Yaroslav Schekin.
--
View this message in context: http://postgresql.nabble.com/How-do-concurrent-inserts-work-tp5832157p5832160.html
Sent from the PostgreSQL - novice mailing list archive at Nabble.com.


Re: How do concurrent inserts work?

From
Serge Fonville
Date:
When I tested the same queries, I got the same behaviour.
When both sessions are SERIALIZABLE, the second insert just waits.

When one is default (unspecified) and the other is SERIALIZABLE, the behaviour is as you describe.

Kind regards/met vriendelijke groet,

Serge Fonville

http://www.sergefonville.nl

2014-12-27 12:54 GMT+01:00 Yaroslav <ladayaroslav@yandex.ru>:

Re: How do concurrent inserts work?

From
Serge Fonville
Date:
My bad.

You don't have a BEGIN statement, so the other session's insert runs in autocommit mode (and commits immediately).

Kind regards/met vriendelijke groet,

Serge Fonville

http://www.sergefonville.nl

2014-12-27 13:05 GMT+01:00 Serge Fonville <serge.fonville@gmail.com>:


Re: How do concurrent inserts work?

From
Yaroslav
Date:
Serge Fonville wrote
> When I tested the same queries I get the same behaviour.
> When both are SERIALIZABLE, the second insert just waits
>
> When one is default (unspecified) and the other is SERIALIZABLE, the
> behaviour is the same as you describe.

Just re-tested, and no, it doesn't wait; it behaves exactly as I described.
My PostgreSQL version is "PostgreSQL 9.3.4, compiled by Visual C++ build
1600, 32-bit", if that's relevant.





-----
WBR, Yaroslav Schekin.
--
View this message in context: http://postgresql.nabble.com/How-do-concurrent-inserts-work-tp5832157p5832163.html
Sent from the PostgreSQL - novice mailing list archive at Nabble.com.


Re: How do concurrent inserts work?

From
David G Johnston
Date:
Yaroslav wrote
>
> Serge Fonville wrote
>> When I tested the same queries I get the same behaviour.
>> When both are SERIALIZABLE, the second insert just waits
>>
>> When one is default (unspecified) and the other is SERIALIZABLE, the
>> behaviour is the same as you describe.
> Just re-tested, and no, it doesn't wait and behaves exactly as I
> described.
> My PostgreSQL version is:  "PostgreSQL 9.3.4, compiled by Visual C++ build
> 1600, 32-bit", if it's relevant.

Thinking aloud here but...

There is nothing that says you must be able to see the value with which you
are conflicting.  The argument boils down to when the error occurs:
mid-transaction or at commit.  Mid-transaction is definitely more useful...

Savepoints and serializable transactions are, IIRC, problematic in
combination, because of the usual flow of control and because of this
behavior where you are trying to use data in-transaction that you cannot
see but that the system knows you conflict with.  The system largely
expects the error to stick, and for you to retry the whole transaction
rather than just roll back to a savepoint.

David J.



--
View this message in context: http://postgresql.nabble.com/How-do-concurrent-inserts-work-tp5832157p5832167.html
Sent from the PostgreSQL - novice mailing list archive at Nabble.com.


Re: How do concurrent inserts work?

From
Tom Lane
Date:
Yaroslav <ladayaroslav@yandex.ru> writes:
> http://postgresql.nabble.com/Re-BUG-12330-ACID-is-broken-for-unique-constraints-td5832085.html

> I've come to conclusion that I don't understand PostgreSQL transaction
> isolation. :(

In your example, you've already committed the other insertion of "2",
right?  So the serializable transaction *must* fail to insert "2".
The current coding chooses to give you a "duplicate key" error on
the grounds that that's more helpful than a generic "serialization
failure" error.  The debate around bug #12330 is about whether that
is the best choice of error code ... but one way or the other, you're
going to get an error.  On the other hand, the SELECT step isn't going
to show you the "2", because it's in the future so far as the
transaction's snapshot is concerned.
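
Tom's two-sided behaviour (the unique index sees the committed row, while the transaction's snapshot does not) can be sketched with a toy visibility model. This is purely an illustration under simplifying assumptions, not PostgreSQL's actual MVCC implementation; all names in it are made up:

```python
# Toy MVCC-snapshot model: each row records the transaction id (xid)
# that inserted it, and a snapshot only sees rows whose inserting xid
# committed before the snapshot was taken.

rows = []            # list of (value, inserting_xid)
commit_seq = {}      # xid -> commit sequence number

def visible(row_xid, snapshot_seq):
    # A row is visible to a snapshot iff its inserting transaction
    # committed before the snapshot was taken.
    return commit_seq.get(row_xid, float("inf")) < snapshot_seq

# xid 1 inserts value 1 and commits at seq 1.
rows.append((1, 1)); commit_seq[1] = 1

# Our serializable transaction takes its snapshot at seq 2.
snapshot = 2

# A concurrent xid 3 inserts value 2 and commits at seq 3.
rows.append((2, 3)); commit_seq[3] = 3

# The unique index checks all committed rows, not just the snapshot,
# so our own INSERT of 2 hits the duplicate...
index_sees_2 = any(v == 2 for v, _ in rows)

# ...but a SELECT through the snapshot does not see it: the row is
# "in the future" relative to the snapshot.
select_sees_2 = any(v == 2 and visible(x, snapshot) for v, x in rows)

print(index_sees_2, select_sees_2)  # -> True False
```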

            regards, tom lane


Re: How do concurrent inserts work?

From
Yaroslav
Date:
Tom Lane wrote
> In your example, you've already committed the other insertion of "2",
> right?  So the serializable transaction *must* fail to insert "2".

Sure.


Tom Lane wrote
> The current coding chooses to give you a "duplicate key" error on
> the grounds that that's more helpful than a generic "serialization
> failure" error.

But it seems counterintuitive. PostgreSQL reports that there is a conflicting
row, so I look... and don't see it! Surprising, IMHO.

But why is it more helpful?

It seems that in this situation, if I need, for example, to insert or update
this row (if it exists), my transaction is doomed anyway. So if I got a
"serialization failure", I wouldn't even try 'ROLLBACK TO SAVEPOINT', as
it's pointless (right?). With a "duplicate key" error, I may decide that the
committed row actually exists and try to update it in vain.


Tom Lane wrote
> The debate around bug #12330 is about whether that
> is the best choice of error code ... but one way or the other, you're
> going to get an error.  On the other hand, the SELECT step isn't going
> to show you the "2", because it's in the future so far as the
> transaction's snapshot is concerned.

Ok, I understand the principle behind it. Thanks a lot!




-----
WBR, Yaroslav Schekin.
--
View this message in context: http://postgresql.nabble.com/How-do-concurrent-inserts-work-tp5832157p5832174.html
Sent from the PostgreSQL - novice mailing list archive at Nabble.com.


Re: How do concurrent inserts work?

From
Daniel Staal
Date:
--As of December 27, 2014 12:37:28 PM -0700, Yaroslav is alleged to have
said:

> Tom Lane-2 wrote
>> The current coding chooses to give you a "duplicate key" error on
>> the grounds that that's more helpful than a generic "serialization
>> failure" error.
>
> But it seems counterintuitive. PostgreSQL reports that there is
> conflicting row, so I look... and don't see it! Surprising, IMHO.
>
> But why is it more helpful?
>
> It seems that in this situation, if I need, for example, to insert or
> update this row (if it exists), my transaction is doomed anyway. So if I
> got "serialization failure", I wouldn't even try to 'ROLLBACK TO
> SAVEPOINT', as it's pointless (right?). With "duplicate key" error, I may
> decide that committed row actually exists and try to update it in vain.

--As for the rest, it is mine.

On the other hand, with a 'duplicate key' error you could, in theory,
generate a new key (if that's possible) and succeed, whereas with a
'serialization failure' error you wouldn't know that's an option; it could
be *any* serialization failure in any of the fields/rows being updated.

Neither is perfect.  I can see the arguments for this one: it's the more
specific error, so you can try to deal with it, but it also 'leaks'
serialization details, which leads to your issue.
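
Daniel's generate-a-new-key option might look like the following sketch. `insert_row`, `DuplicateKeyError`, and the `taken` set are hypothetical stand-ins for a real driver and table; the point is only that a duplicate-key error names a recovery that a generic serialization failure would not:

```python
import itertools

class DuplicateKeyError(Exception):
    """Stand-in for SQLSTATE 23505 (unique_violation) from a real driver."""

taken = {1, 2}  # keys already committed in the (simulated) table

def insert_row(key):
    # Hypothetical insert that fails like PostgreSQL does on a duplicate key.
    if key in taken:
        raise DuplicateKeyError(f"duplicate key value {key}")
    taken.add(key)
    return key

def insert_with_fresh_key(candidates, max_tries=5):
    # On a duplicate-key error, pick the next candidate key and try
    # again, giving up after a bounded number of attempts.
    for key in itertools.islice(candidates, max_tries):
        try:
            return insert_row(key)
        except DuplicateKeyError:
            continue
    raise RuntimeError("no free key found")

print(insert_with_fresh_key(iter([1, 2, 3])))  # -> 3
```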

Daniel T. Staal

---------------------------------------------------------------
This email copyright the author.  Unless otherwise noted, you
are expressly allowed to retransmit, quote, or otherwise use
the contents for non-commercial purposes.  This copyright will
expire 5 years after the author's death, or in 30 years,
whichever is longer, unless such a period is in excess of
local copyright law.
---------------------------------------------------------------


Re: How do concurrent inserts work?

From
Kevin Grittner
Date:
Yaroslav <ladayaroslav@yandex.ru> wrote:

> Error told me that there IS such row, but now I see there ISN'T?!
>
> Can you enlighten me?

The most important issue here is that if there are concurrent
serializable transactions the effects of those which successfully
commit must be consistent with some serial (one-at-a-time) order of
execution.  Within that transaction you see the concurrent insert
(in the form of the duplicate key error), which means your
transaction must have logically run *after* the other transaction;
yet when you try to select the row you don't see it, which means
your transaction must have logically run *before* the other
transaction.  If your top-level transaction is allowed to commit,
that is indeed a failure to conform to the SQL standard and to what
(in my experience) most people who rely on serializable
transactions expect.  If it rolls back, then the invariant that the
effects on the database are consistent with some one-at-a-time
execution of the successful transactions holds, and ACID properties
(and conformance to the standard) are maintained.

As Tom mentioned, the argument on the bug report you cited is about
which error it is more useful to generate -- the one indicating
that the transaction failed due to the actions of one or more
concurrent transactions or the one indicating that there was a
duplicate key.

I'll try not to misrepresent the other position (although I admit
to having a strong opinion).

The argument in favor of the serialization failure is that most
software using that isolation level traps serialization failures at
a low level and is able to restart the transaction from the
beginning.  In your example, the new run of the transaction will
see the duplicate row, so it will be indistinguishable from your
first case (where the row exists before your transaction starts).
This retry-and-see-a-consistent-state approach is preferred in some
shops because it avoids the need to write any special code to
handle race conditions for concurrent transactions.

The argument in favor of the duplicate key error is that the detail
shows you what the duplicate key values are.  A secondary argument
is that any software framework which handles serialization failures
by retrying such transactions from the beginning should probably
also do one or more (but a finite number of) retries for a
duplicate key error, since it might indicate a problem with a
concurrent transaction.  In your case there could be special coding
to handle the duplicate key, and since it would be clear from the
contradictory visibility indications that it is the result of a
race condition, you might be able to write special edge-condition
code to handle it in a special way that might be faster than
restarting the transaction.
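
A retry policy along the lines just described might be sketched like this. The SQLSTATE values are the ones PostgreSQL documents for these conditions; the policy itself (and the `should_retry` helper) is illustrative, not any particular framework's API:

```python
SERIALIZATION_FAILURE = "40001"   # transient by definition; always retryable
UNIQUE_VIOLATION = "23505"        # may be a disguised race; retry a few times

def should_retry(sqlstate: str, attempt: int, max_dup_retries: int = 3) -> bool:
    # Serialization failures are treated as transient and retried.
    # Duplicate-key errors are retried only a bounded number of times,
    # since they may be a genuine duplicate rather than a race with a
    # concurrent transaction.
    if sqlstate == SERIALIZATION_FAILURE:
        return True
    if sqlstate == UNIQUE_VIOLATION:
        return attempt < max_dup_retries
    return False

print(should_retry("40001", 10))  # -> True
print(should_retry("23505", 1))   # -> True
print(should_retry("23505", 5))   # -> False
```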

In almost all software I've worked with in the past few decades, a
serialization failure (whether in the form of a deadlock, an MVCC
write conflict, or other) is not reported back to the application
code.  The transaction retry is automatic, so the effect from the
user PoV and the application software PoV is identical to the
transaction having been temporarily blocked.  No special coding to
handle race conditions is needed, desirable, or (by management
fiat) allowed.  I have sometimes seen special handling of duplicate
keys, but since the error doesn't tell you whether the duplicate
was from a race condition these have most often just been left to
be user-visible errors.

--
Kevin Grittner
EDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company