Thread: Question about concurrency control and Insert

Question about concurrency control and Insert

From: Stéphane Cazeaux
Date:
Hi

I currently use pgsql 7.2.4 (the following has also been seen on pgsql 7.3.3) with the transaction isolation level set to "read committed".
I run a lot of little tests to understand how concurrency control works. Consider this scenario:

We have a table named "test_count" with a single column named "count".
The table contains one row with count=1.

Client 1:
    BEGIN;
    SELECT count FROM test_count FOR UPDATE; --> returns the only entry "1"
    ...

Client 2:
    BEGIN;
    SELECT count FROM test_count FOR UPDATE; --> this query is blocked, ok
    ...

We continue:
 
Client 1:
    INSERT INTO test_count VALUES (2);
    COMMIT;

Client 2: (after commit of client 1)
    [The SELECT that was blocked is now free, but the result is the first row containing "1". I'm surprised by this result.]
    SELECT count FROM test_count;  --> now returns two rows, one containing "1", the other containing "2"
    COMMIT;

So my question is: why does the SELECT...FOR UPDATE of client 2, when unblocked, return only one row, while a following SELECT in the same transaction returns two rows? Is there a mechanism I don't understand?


Thanks for your response.
   
   

Re: Question about concurrency control and Insert

From: Richard Huxton
Date:
On Wednesday 10 September 2003 08:34, Stéphane Cazeaux wrote:
> Client 1:
>     BEGIN;
>     SELECT count FROM test_count FOR UPDATE; --> returns the only entry "1"
>
> Client 2 :
>     BEGIN;
>     SELECT count FROM test_count FOR UPDATE; --> this query is blocked, ok
>
> We continue :
>
> Client 1:
>     INSERT INTO test_count VALUES (2);
>     COMMIT;
>
> Client 2: (after commit of client 1)
>     [The select that was blocked is now free. But the result is the
> first row containing "1". I'm surprised by this result]
>     SELECT count FROM test_count;  --> now returns two rows, one
> containing "1", the other containing "2"
>     COMMIT;
>
> So my question is : why does the SELECT...FOR UPDATE of client 2, when
> unblocked, return only one row, while a following SELECT in the same
> transaction returns two rows ? Is there a mechanism I don't understand ?

Client2's first SELECT started before you committed the INSERT; the second
SELECT started after you committed. Since you are using READ COMMITTED you can
read the results of transactions committed *before the current statement
started*.

See Ch 9.2.1 (in Concurrency Control) for details:
"Since in Read Committed mode each new query starts with a new snapshot that
includes all transactions committed up to that instant, subsequent queries in
the same transaction will see the effects of the committed concurrent
transaction in any case."

You'll be wanting "SERIALIZABLE" transaction isolation if you don't want this
to happen.
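
For example, something like this (the SET must come before the first query in
the transaction):

    BEGIN;
    SET TRANSACTION ISOLATION LEVEL SERIALIZABLE;
    SELECT count FROM test_count FOR UPDATE;
    ...
    COMMIT;

Bear in mind that in SERIALIZABLE mode, if another transaction has meanwhile
committed a change to a row you try to lock, your transaction will fail with a
"could not serialize access" error and you'll have to retry it.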
--
  Richard Huxton
  Archonet Ltd

Re: Question about concurrency control and Insert

From: Stéphane Cazeaux
Date:


Richard Huxton wrote:
> On Wednesday 10 September 2003 08:34, Stéphane Cazeaux wrote:
>> Client 1:
>>     BEGIN;
>>     SELECT count FROM test_count FOR UPDATE; --> returns the only entry "1"
>>
>> Client 2:
>>     BEGIN;
>>     SELECT count FROM test_count FOR UPDATE; --> this query is blocked, ok
>>
>> We continue:
>>
>> Client 1:
>>     INSERT INTO test_count VALUES (2);
>>     COMMIT;
>>
>> Client 2: (after commit of client 1)
>>     [The SELECT that was blocked is now free, but the result is the
>> first row containing "1". I'm surprised by this result.]
>>     SELECT count FROM test_count;  --> now returns two rows, one
>> containing "1", the other containing "2"
>>     COMMIT;
>>
>> So my question is : why does the SELECT...FOR UPDATE of client 2, when
>> unblocked, return only one row, while a following SELECT in the same
>> transaction returns two rows ? Is there a mechanism I don't understand ?
>
> Client2's first SELECT started before you committed the INSERT, the second
> SELECT started after you committed. Since you are using READ COMMITTED you can
> read the results of transactions committed *before the current statement
> started*
I'm OK with this, but if I try exactly the same scenario, where I replace the INSERT by a DELETE, the first SELECT of client 2 won't return any row. The behaviour is the same with an UPDATE: if client 1 updates the row and commits, the first SELECT of client 2 will return the updated row. Why isn't it the same behaviour with INSERT?
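
To be precise, the DELETE variant I tried looks like this:

Client 1:
    BEGIN;
    SELECT count FROM test_count FOR UPDATE;

Client 2:
    BEGIN;
    SELECT count FROM test_count FOR UPDATE; --> blocked, as before

Client 1:
    DELETE FROM test_count;
    COMMIT;

Client 2: (after commit of client 1)
    [The blocked SELECT now returns no row at all]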


> See Ch 9.2.1 (in Concurrency Control) for details:
> "Since in Read Committed mode each new query starts with a new snapshot that
> includes all transactions committed up to that instant, subsequent queries in
> the same transaction will see the effects of the committed concurrent
> transaction in any case."
>
> You'll be wanting "SERIALIZABLE" transaction isolation if you don't want this
> to happen.

Re: Question about concurrency control and Insert

From: Stephan Szabo
Date:
On Wed, 10 Sep 2003, Stéphane Cazeaux wrote:

> So my question is : why does the SELECT...FOR UPDATE of client 2, when
> unblocked, return only one row, while a following SELECT in the same
> transaction returns two rows ? Is there a mechanism I don't understand ?

From the docs:

Read Committed is the default isolation level in PostgreSQL. When a
transaction runs on this isolation level, a SELECT query sees only data
committed before the query began; it never sees either uncommitted data or
changes committed during query execution by concurrent transactions....
Notice that two successive SELECT commands can see different data, even
though they are within a single transaction, if other transactions commit
changes during execution of the first SELECT.
UPDATE, DELETE and SELECT FOR UPDATE commands behave the same as SELECT in
terms of searching for target rows: they will only find target rows that
were committed as of the command start time.


Re: Question about concurrency control and Insert

From: Tom Lane
Date:
Stéphane Cazeaux <stephane.cazeaux@netcentrex.net> writes:
> I'm OK with this, but if I try exactly the same scenario, where I
> replace the INSERT by a DELETE, the first SELECT of client 2 won't
> return any row. The behaviour is the same with an UPDATE: if client 1
> updates the row and commits, the first SELECT of client 2 will return
> the updated row. Why isn't it the same behaviour with INSERT?

A row inserted after your statement starts is not visible, period.

When SELECT FOR UPDATE encounters a row that would be visible to a
normal select (ie, it existed at the start of the statement), then
it tries to acquire a row lock on that row.  A byproduct of that is
waiting for any other transaction that had already locked the row.
When the other transaction is gone, then what you get back is the
*latest version* of the row (or no row, if the other guy deleted it).
This is a special exception to the ordinary row visibility rules,
which is made on the grounds that if you SELECT FOR UPDATE you had
better get back the latest data, else you might update the row
incorrectly.  (Consider for example that the row represents a bank
account, and the different transactions are adding or subtracting
amounts in the balance field.  You'd better add to the latest balance
even if it shouldn't theoretically be visible to you.)

Another way of thinking about it is that if you SELECT FOR UPDATE
a particular row, and then look at it again later in the same
transaction, you're guaranteed to see the same data (other than any
changes you made yourself of course).  Ordinary SELECTs in read
committed mode don't act that way.
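
Concretely, something like this (assuming a concurrent transaction has just
committed an update of the row from 1 to 2):

    BEGIN;
    SELECT count FROM test_count FOR UPDATE;  -- returns 2, the latest committed version
    SELECT count FROM test_count;             -- that row still shows 2; our row lock
                                              -- prevents anyone changing it under us
    COMMIT;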

This is all documented btw: see second para in
http://www.postgresql.org/docs/7.3/static/transaction-iso.html#XACT-READ-COMMITTED

            regards, tom lane

Re: Question about concurrency control and Insert

From: Richard Huxton
Date:
On Wednesday 10 September 2003 12:54, Stéphane Cazeaux wrote:
> Richard Huxton wrote:
> >
> >Client2's first SELECT started before you committed the INSERT, the second
> >SELECT started after you committed. Since you are using READ COMMITTED you
> > can read the results of transactions committed *before the current
> > statement started*
>
> I'm OK with this, but if I try exactly the same scenario, where I
> replace the INSERT by a DELETE, the first SELECT of client 2 won't
> return any row. The behaviour is the same with an UPDATE: if client 1
> updates the row and commits, the first SELECT of client 2 will return
> the updated row. Why isn't it the same behaviour with INSERT?

Client2's first select can't decide whether to block until it has built a
"candidate list" of rows it wants to return. Having built its list, it
notices that one is locked by Client1 and waits to see what happens.

If Client1 deletes the row, Client2 can't return it. If Client1 inserts an
additional row, the "candidate list" is already built so it doesn't get
included.

For Client2 to notice any "new" rows, it would have to re-run the query.
Obviously, for some queries this could be very expensive.
If that wasn't bad enough, if another client was updating the table you'd have
to wait again until it committed/rolled back its transaction to see what to
do. With a busy database you could end up running the query dozens of times
to check whether any data you were interested in had been inserted. What's
worse, all your other clients might well be doing the same waiting for
Client2.

If you use a SERIALIZABLE transaction level, then it guarantees Client2 only
sees data that was committed before Client2's transaction started. This means
within your transaction, your view is guaranteed to be consistent.

If you want to guarantee that Client2 sees all relevant data and that no more
can be added while Client2 is operating, then you need to get an exclusive
write lock on the table. Of course this can kill your performance with
multiple updating clients.
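
One way to do that (how strict a lock mode you need depends on what you're
doing):

    BEGIN;
    LOCK TABLE test_count IN EXCLUSIVE MODE;  -- blocks all concurrent writers;
                                              -- plain SELECTs by others still run
    SELECT count FROM test_count;
    ...
    COMMIT;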

Phew! That was longer than I was expecting. Does it make it any clearer?
Concurrency issues can get involved - there are books and university courses
that just deal with this one topic. Hopefully I haven't got anything wrong
above. Oh - the description of "candidate list" stuff above isn't necessarily
precisely the way that PG does it, but that's how I think of it.

HTH

--
  Richard Huxton
  Archonet Ltd

Re: Question about concurrency control and Insert

From: Stéphane Cazeaux
Date:
It's now clearer for me. Thanks a lot for your long explanation.


--
Stéphane
