Thread: Question about concurrency control and INSERT
Hi
I currently use pgsql 7.2.4 (the following has also been seen on pgsql 7.3.3) with the transaction isolation level set to "read committed".
I have been running a lot of little tests to understand how concurrency control works. Consider this scenario:
We have a table named "test_count" with a single column named "count".
The table contains one row, with count=1.
Client 1:
BEGIN;
SELECT count FROM test_count FOR UPDATE; --> returns the only entry "1"
...
Client 2:
BEGIN;
SELECT count FROM test_count FOR UPDATE; --> this query is blocked, ok
...
We continue:
Client 1:
INSERT INTO test_count VALUES (2);
COMMIT;
Client 2: (after commit of client 1)
[The SELECT that was blocked is now unblocked, but it returns only the first row, containing "1". I'm surprised by this result.]
SELECT count FROM test_count; --> now returns two rows, one containing "1", the other containing "2"
COMMIT;
So my question is: why does the SELECT ... FOR UPDATE of client 2, when unblocked, return only one row, while a following SELECT in the same transaction returns two rows? Is there a mechanism I don't understand?
Thanks for your response.
On Wednesday 10 September 2003 08:34, Stéphane Cazeaux wrote:
> So my question is: why does the SELECT ... FOR UPDATE of client 2, when
> unblocked, return only one row, while a following SELECT in the same
> transaction returns two rows? Is there a mechanism I don't understand?

Client 2's first SELECT started before you committed the INSERT; the second SELECT started after you committed. Since you are using READ COMMITTED, you can read the results of transactions committed *before the current statement started*.

See Ch 9.2.1 (in Concurrency Control) for details: "Since in Read Committed mode each new query starts with a new snapshot that includes all transactions committed up to that instant, subsequent queries in the same transaction will see the effects of the committed concurrent transaction in any case."

You'll be wanting SERIALIZABLE transaction isolation if you don't want this to happen.

--
Richard Huxton
Archonet Ltd
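[To illustrate the SERIALIZABLE suggestion, here is a minimal sketch of the same two-session scenario, as a session transcript rather than a runnable script; it assumes the test_count table from the original message:]

```sql
-- Client 2, using SERIALIZABLE instead of the default READ COMMITTED:
BEGIN;
SET TRANSACTION ISOLATION LEVEL SERIALIZABLE;
SELECT count FROM test_count;   -- the transaction's snapshot is frozen here

-- Meanwhile, Client 1 commits:  INSERT INTO test_count VALUES (2);

-- Back in Client 2, still inside the same transaction:
SELECT count FROM test_count;   -- still returns only the row with count=1;
                                -- the concurrently committed row stays
                                -- invisible until this transaction ends
COMMIT;
```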
Richard Huxton wrote:
> Client 2's first SELECT started before you committed the INSERT; the
> second SELECT started after you committed. Since you are using READ
> COMMITTED, you can read the results of transactions committed *before
> the current statement started*.

I'm OK with this, but if I try exactly the same scenario with the INSERT replaced by a DELETE, the first SELECT of client 2 won't return any row. The behaviour is the same with an UPDATE: if client 1 updates the row and commits, the first SELECT of client 2 will return the updated row. Why isn't the behaviour the same with INSERT?
On Wed, 10 Sep 2003, Stéphane Cazeaux wrote:
> So my question is: why does the SELECT ... FOR UPDATE of client 2, when
> unblocked, return only one row, while a following SELECT in the same
> transaction returns two rows? Is there a mechanism I don't understand?

From the docs:

Read Committed is the default isolation level in PostgreSQL. When a transaction runs on this isolation level, a SELECT query sees only data committed before the query began; it never sees either uncommitted data or changes committed during query execution by concurrent transactions. ... Notice that two successive SELECT commands can see different data, even though they are within a single transaction, if other transactions commit changes during execution of the first SELECT. UPDATE, DELETE and SELECT FOR UPDATE commands behave the same as SELECT in terms of searching for target rows: they will only find target rows that were committed as of the command start time.
Stéphane Cazeaux <stephane.cazeaux@netcentrex.net> writes:
> I'm OK with this, but if I try exactly the same scenario with the INSERT
> replaced by a DELETE, the first SELECT of the client 2 won't return any
> row. This is the same behaviour with an UPDATE. If client 1 updates the
> row and commits, the first SELECT of client 2 will return the updated
> row. Why isn't it the same behaviour with INSERT?

A row inserted after your statement starts is not visible, period.

When SELECT FOR UPDATE encounters a row that would be visible to a normal SELECT (i.e., it existed at the start of the statement), it tries to acquire a row lock on that row. A byproduct of that is waiting for any other transaction that had already locked the row. When the other transaction is gone, what you get back is the *latest version* of the row (or no row, if the other guy deleted it). This is a special exception to the ordinary row visibility rules, made on the grounds that if you SELECT FOR UPDATE you had better get back the latest data, else you might update the row incorrectly. (Consider for example that the row represents a bank account, and the different transactions are adding or subtracting amounts in the balance field. You'd better add to the latest balance even if it shouldn't theoretically be visible to you.)

Another way of thinking about it is that if you SELECT FOR UPDATE a particular row and then look at it again later in the same transaction, you're guaranteed to see the same data (other than any changes you made yourself, of course). Ordinary SELECTs in read committed mode don't act that way.

This is all documented, btw: see the second paragraph in http://www.postgresql.org/docs/7.3/static/transaction-iso.html#XACT-READ-COMMITTED

regards, tom lane
On Wednesday 10 September 2003 12:54, Stéphane Cazeaux wrote:
> Richard Huxton wrote:
> > Client 2's first SELECT started before you committed the INSERT; the
> > second SELECT started after you committed. Since you are using READ
> > COMMITTED, you can read the results of transactions committed *before
> > the current statement started*.
>
> I'm OK with this, but if I try exactly the same scenario with the INSERT
> replaced by a DELETE, the first SELECT of client 2 won't return any row.
> The behaviour is the same with an UPDATE: if client 1 updates the row
> and commits, the first SELECT of client 2 will return the updated row.
> Why isn't the behaviour the same with INSERT?

Client 2's first SELECT can't decide whether to block until it has built a "candidate list" of rows it wants to return. Having built its list, it notices that one row is locked by Client 1 and waits to see what happens. If Client 1 deletes the row, Client 2 can't return it. If Client 1 inserts an additional row, the "candidate list" is already built, so the new row doesn't get included.

For Client 2 to notice any "new" rows, it would have to re-run the query. Obviously, for some queries this could be very expensive. If that wasn't bad enough, if another client was updating the table you'd have to wait again until it committed or rolled back its transaction to see what to do. With a busy database you could end up running the query dozens of times to check whether any data you were interested in had been inserted. What's worse, all your other clients might well be doing the same, waiting for Client 2.

If you use the SERIALIZABLE transaction level, it guarantees that Client 2 only sees data that was committed before Client 2's transaction started. This means that within your transaction, your view is guaranteed to be consistent. If you want to guarantee that Client 2 sees all relevant data and that no more can be added while Client 2 is operating, then you need to take an exclusive write lock on the table. Of course, this can kill your performance with multiple updating clients.

Phew! That was longer than I was expecting. Does it make it any clearer? Concurrency issues can get involved - there are books and university courses that deal with just this one topic. Hopefully I haven't got anything wrong above. Oh - the "candidate list" description above isn't necessarily precisely the way that PG does it, but that's how I think of it.

HTH
--
Richard Huxton
Archonet Ltd
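[The exclusive-write-lock approach mentioned above might look like this - a sketch only, assuming the test_count table from the thread:]

```sql
-- Client 2, wanting a row set that no other client can add to or change:
BEGIN;
LOCK TABLE test_count IN EXCLUSIVE MODE;
-- Concurrent INSERT/UPDATE/DELETE on test_count now block until we
-- commit; plain SELECTs by other clients are still allowed in this mode.
SELECT count FROM test_count;
-- ... work with a guaranteed-complete row set ...
COMMIT;   -- releases the lock
```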
It's now much clearer for me. Thanks a lot for your long explanation.
--
Stéphane