Thread: How do concurrent inserts work?
Hi. After reading this:
http://postgresql.nabble.com/Re-BUG-12330-ACID-is-broken-for-unique-constraints-td5832085.html
I've come to the conclusion that I don't understand PostgreSQL transaction
isolation. :(

Here's the example:

> CREATE TABLE t(a INT PRIMARY KEY);
> INSERT INTO t VALUES(1);

-- Test number 1:
> START TRANSACTION ISOLATION LEVEL SERIALIZABLE;
> SAVEPOINT a;
> INSERT INTO t VALUES(1);
-- This results in a 'duplicate key' error, so I reason there is a row with
this value. Check it:
> ROLLBACK TO SAVEPOINT a;
> SELECT * FROM t WHERE a = 1;
1 -- 1 row. So yes, there is such a row.
> COMMIT; -- done with this test

-- Test number 2:
> START TRANSACTION ISOLATION LEVEL SERIALIZABLE;
> SELECT * FROM t WHERE a = 1;
1 -- 1 row
> SAVEPOINT a;
In other session> INSERT INTO t VALUES(2);
-- Back to my session:
> INSERT INTO t VALUES(2);
-- This results in a 'duplicate key' error, so I reason there is a row with
this value. Check it:
> ROLLBACK TO SAVEPOINT a;
> SELECT * FROM t WHERE a = 2;
-- 0 rows
-- So, I reason... Stop, what? The error told me that there IS such a row,
but now I see there ISN'T?!

Can you enlighten me?

-----
WBR, Yaroslav Schekin.
--
View this message in context: http://postgresql.nabble.com/How-do-concurrent-inserts-work-tp5832157.html
Sent from the PostgreSQL - novice mailing list archive at Nabble.com.
Hi,

Perhaps because your second session doesn't specify an isolation level?

2014-12-27 11:11 GMT+01:00 Yaroslav <ladayaroslav@yandex.ru>:
--
Sent via pgsql-novice mailing list (pgsql-novice@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-novice
Serge Fonville wrote
> Perhaps because your second session doesn't specify an isolation level?

No. ;)
It's the same with any isolation level of the second session.

-----
WBR, Yaroslav Schekin.
--
View this message in context: http://postgresql.nabble.com/How-do-concurrent-inserts-work-tp5832157p5832160.html
When I tested the same queries I get the same behaviour.
When both are SERIALIZABLE, the second insert just waits.

When one is default (unspecified) and the other is SERIALIZABLE, the
behaviour is the same as you describe.

2014-12-27 12:54 GMT+01:00 Yaroslav <ladayaroslav@yandex.ru>:
My bad,

You don't have a BEGIN statement, so it's in autocommit.

2014-12-27 13:05 GMT+01:00 Serge Fonville <serge.fonville@gmail.com>:
Serge Fonville wrote
> When I tested the same queries I get the same behaviour.
> When both are SERIALIZABLE, the second insert just waits
>
> When one is default (unspecified) and the other is SERIALIZABLE, the
> behaviour is the same as you describe.

Just re-tested, and no, it doesn't wait and behaves exactly as I described.
My PostgreSQL version is: "PostgreSQL 9.3.4, compiled by Visual C++ build
1600, 32-bit", if it's relevant.

-----
WBR, Yaroslav Schekin.
--
View this message in context: http://postgresql.nabble.com/How-do-concurrent-inserts-work-tp5832157p5832163.html
Yaroslav wrote
> Serge Fonville wrote
>> When I tested the same queries I get the same behaviour.
>> When both are SERIALIZABLE, the second insert just waits
>>
>> When one is default (unspecified) and the other is SERIALIZABLE, the
>> behaviour is the same as you describe.
> Just re-tested, and no, it doesn't wait and behaves exactly as I
> described.
> My PostgreSQL version is: "PostgreSQL 9.3.4, compiled by Visual C++ build
> 1600, 32-bit", if it's relevant.

Thinking aloud here, but... there is nothing that says you must be able to
see the value with which you are conflicting. The argument boils down to
when the error occurs: mid-transaction or at commit. Mid-transaction is
definitely more useful.

Savepoints and serializable transactions, IIRC, are problematic in joint
usage because of the usual flow of control, and because of this behaviour
where you are trying to use data in-transaction that you cannot see but
that the system knows you are conflicted with. The system largely expects
the error to stick and for you to retry the whole thing, not just roll
back to a savepoint.

David J.
--
View this message in context: http://postgresql.nabble.com/How-do-concurrent-inserts-work-tp5832157p5832167.html
Yaroslav <ladayaroslav@yandex.ru> writes:
> http://postgresql.nabble.com/Re-BUG-12330-ACID-is-broken-for-unique-constraints-td5832085.html
> I've come to the conclusion that I don't understand PostgreSQL transaction
> isolation. :(

In your example, you've already committed the other insertion of "2",
right? So the serializable transaction *must* fail to insert "2". The
current coding chooses to give you a "duplicate key" error on the grounds
that that's more helpful than a generic "serialization failure" error.
The debate around bug #12330 is about whether that is the best choice of
error code ... but one way or the other, you're going to get an error.

On the other hand, the SELECT step isn't going to show you the "2",
because it's in the future so far as the transaction's snapshot is
concerned.

			regards, tom lane
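Tom's point can be sketched as a toy model: reads go through the snapshot
taken at transaction start, while the unique-index check runs against the
latest committed state. This is only an illustration of the visibility rule,
not PostgreSQL internals; every class and variable name below is invented.

```python
# Toy model of the snapshot rule, NOT PostgreSQL internals: a SERIALIZABLE
# transaction reads through a snapshot taken at transaction start, but the
# unique constraint is checked against the latest committed contents.

class DuplicateKeyError(Exception):
    pass

class Table:
    def __init__(self):
        self.committed = set()  # latest committed key values

class Transaction:
    def __init__(self, table):
        self.table = table
        self.snap = frozenset(table.committed)  # snapshot at START TRANSACTION

    def select(self, key):
        # MVCC read: only rows visible in this transaction's snapshot
        return key in self.snap

    def insert(self, key):
        # Uniqueness is enforced against the current committed state,
        # not against the snapshot -- hence the apparent contradiction.
        if key in self.table.committed:
            raise DuplicateKeyError(key)

t = Table()
t.committed.add(1)              # INSERT INTO t VALUES(1); committed earlier

txn = Transaction(t)            # our SERIALIZABLE transaction starts
t.committed.add(2)              # other session commits INSERT INTO t VALUES(2);

got_duplicate = False
try:
    txn.insert(2)
except DuplicateKeyError:
    got_duplicate = True        # the constraint sees the committed 2 ...

visible = txn.select(2)         # ... but the snapshot does not
print(got_duplicate, visible)   # True False
```

In this model both of Yaroslav's observations hold at once: the insert fails
because "2" is committed, and the SELECT returns nothing because "2" is in
the future relative to the transaction's snapshot.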
Tom Lane-2 wrote
> In your example, you've already committed the other insertion of "2",
> right? So the serializable transaction *must* fail to insert "2".

Sure.

Tom Lane-2 wrote
> The current coding chooses to give you a "duplicate key" error on
> the grounds that that's more helpful than a generic "serialization
> failure" error.

But it seems counterintuitive. PostgreSQL reports that there is a
conflicting row, so I look... and don't see it! Surprising, IMHO.

But why is it more helpful? It seems that in this situation, if I need,
for example, to insert or update this row (if it exists), my transaction
is doomed anyway. So if I got "serialization failure", I wouldn't even try
to 'ROLLBACK TO SAVEPOINT', as it's pointless (right?). With the
"duplicate key" error, I may decide that the committed row actually exists
and try to update it in vain.

Tom Lane-2 wrote
> The debate around bug #12330 is about whether that is the best choice of
> error code ... but one way or the other, you're going to get an error.
> On the other hand, the SELECT step isn't going to show you the "2",
> because it's in the future so far as the transaction's snapshot is
> concerned.

Ok, I understand the principle behind it. Thanks a lot!

-----
WBR, Yaroslav Schekin.
--
View this message in context: http://postgresql.nabble.com/How-do-concurrent-inserts-work-tp5832157p5832174.html
--As of December 27, 2014 12:37:28 PM -0700, Yaroslav is alleged to have said:

> Tom Lane-2 wrote
>> The current coding chooses to give you a "duplicate key" error on
>> the grounds that that's more helpful than a generic "serialization
>> failure" error.
>
> But it seems counterintuitive. PostgreSQL reports that there is a
> conflicting row, so I look... and don't see it! Surprising, IMHO.
>
> But why is it more helpful?
>
> It seems that in this situation, if I need, for example, to insert or
> update this row (if it exists), my transaction is doomed anyway. So if I
> got "serialization failure", I wouldn't even try to 'ROLLBACK TO
> SAVEPOINT', as it's pointless (right?). With the "duplicate key" error, I
> may decide that the committed row actually exists and try to update it in
> vain.

--As for the rest, it is mine.

On the other hand, with a 'duplicate key' error, you could in theory
generate a new key (if that's possible) and succeed, where with a
'serialization failure' error you wouldn't know that's an option; it could
be *any* serialization failure in any of the fields/rows being updated.

Neither's perfect. I can see the arguments for this one - it's the more
specific error, so you can try to deal with it, but it also 'leaks'
serialization, which leads to your issues.

Daniel T. Staal

---------------------------------------------------------------
This email copyright the author. Unless otherwise noted, you are expressly
allowed to retransmit, quote, or otherwise use the contents for
non-commercial purposes. This copyright will expire 5 years after the
author's death, or in 30 years, whichever is longer, unless such a period
is in excess of local copyright law.
---------------------------------------------------------------
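Daniel's "generate a new key and retry" option can be sketched as below. The
helper names and the in-memory set standing in for the unique index are all
invented for illustration; a real application would re-run the INSERT against
the database instead.

```python
# Sketch of retrying a duplicate-key error with a freshly generated key.
# 'existing' stands in for the unique index; every name here is illustrative.
from itertools import count

class UniqueViolation(Exception):
    pass

def insert_row(existing, key):
    # Stand-in for INSERT INTO t VALUES(key) against a unique column.
    if key in existing:
        raise UniqueViolation(key)
    existing.add(key)
    return key

def insert_with_new_key(existing, make_key, max_attempts=10):
    # A duplicate-key error tells us exactly what went wrong, so we can
    # respond by generating another key; a bare "serialization failure"
    # would not tell us that this recovery is even an option.
    for _ in range(max_attempts):
        try:
            return insert_row(existing, make_key())
        except UniqueViolation:
            continue            # that key is taken; try another
    raise RuntimeError("no free key found within max_attempts")

existing = {1, 2, 3}
keygen = count(1)               # deterministic key generator: 1, 2, 3, ...
key = insert_with_new_key(existing, lambda: next(keygen))
print(key)                      # 4: the first value not already taken
```

The retry count is bounded, matching the point made later in the thread that
any such loop should do a finite number of retries rather than spin forever.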
Yaroslav <ladayaroslav@yandex.ru> wrote:
> Error told me that there IS such row, but now I see there ISN'T?!
>
> Can you enlighten me?

The most important issue here is that if there are concurrent serializable
transactions, the effects of those which successfully commit must be
consistent with some serial (one-at-a-time) order of execution. Within
that transaction you see the concurrent insert (in the form of the
duplicate key error), which means your transaction must have logically run
*after* the other transaction; yet when you try to select the row you
don't see it, which means your transaction must have logically run
*before* the other transaction. If your top-level transaction is allowed
to commit, that is indeed a failure to conform to the SQL standard and to
what (in my experience) most people who rely on serializable transactions
expect. If it rolls back, then the invariant that the effects on the
database are consistent with some one-at-a-time execution of the
successful transactions holds, and ACID properties (and conformance to the
standard) are maintained.

As Tom mentioned, the argument on the bug report you cited is about which
error it is more useful to generate -- the one indicating that the
transaction failed due to the actions of one or more concurrent
transactions, or the one indicating that there was a duplicate key. I'll
try not to misrepresent the other position (although I admit to having a
strong opinion).

The argument in favor of the serialization failure is that most software
using that isolation level traps serialization failures at a low level and
is able to restart the transaction from the beginning. In your example,
the new run of the transaction will see the duplicate row, so it will be
indistinguishable from your first case (where the row exists before your
transaction starts). This retry-and-see-a-consistent-state approach is
preferred in some shops because it avoids the need to write any special
code to handle race conditions for concurrent transactions.

The argument in favor of the duplicate key error is that the detail shows
you what the duplicate key values are. A secondary argument is that any
software framework which handles serialization failures by retrying such
transactions from the beginning should probably also do one or more (but a
finite number of) retries for a duplicate key error, since it might
indicate a problem with a concurrent transaction. In your case there could
be special coding to handle the duplicate key, and since it would be clear
from the contradictory visibility indications that it is the result of a
race condition, you might be able to write special edge-condition code to
handle it in a special way that might be faster than restarting the
transaction.

In almost all software I've worked with in the past few decades, a
serialization failure (whether in the form of a deadlock, an MVCC write
conflict, or other) is not reported back to the application code. The
transaction retry is automatic, so the effect from the user PoV and the
application software PoV is identical to the transaction having been
temporarily blocked. No special coding to handle race conditions is
needed, desirable, or (by management fiat) allowed. I have sometimes seen
special handling of duplicate keys, but since the error doesn't tell you
whether the duplicate was from a race condition, these have most often
just been left to be user-visible errors.

--
Kevin Grittner
EDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
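The transparent-retry pattern Kevin describes might look roughly like this at
the framework level. The driver function and exception class below are
invented stand-ins, not a real client library's API; the only PostgreSQL
fact assumed is that serialization failures carry SQLSTATE 40001.

```python
# Sketch of a framework-level retry loop: serialization failures are
# retried transparently, so application code never sees them. All names
# below are invented stand-ins for a real driver's API.

class SerializationFailure(Exception):
    sqlstate = "40001"          # PostgreSQL's serialization_failure code

def run_transaction(txn_body, max_retries=10):
    for _ in range(max_retries):
        try:
            return txn_body()   # would really be wrapped in BEGIN ... COMMIT
        except SerializationFailure:
            continue            # restart from the top; no app-level handling
    raise RuntimeError("transaction kept failing; giving up")

# Simulated workload: a concurrent transaction "wins" twice, then the
# third attempt runs without conflict, as in Yaroslav's scenario.
attempts = {"n": 0}

def body():
    attempts["n"] += 1
    if attempts["n"] < 3:
        raise SerializationFailure()
    return "committed"

result = run_transaction(body)
print(result, attempts["n"])    # committed 3
```

From the application's point of view the two failed attempts are invisible;
the effect is the same as if the transaction had simply blocked for a while,
which is exactly the property Kevin's shops rely on.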