Thread: A few beginner's questions concerning concurrency control

A few beginner's questions concerning concurrency control

From
Yoram Biberman
Date:

I have a few questions concerning concurrency control. I shall thank whoever can help me.

Question #1
=========
Assume the following (concurrent) schedule, in which both transactions run in a serializable isolation level:
T1 begin
                                              T2 begin
T1 modifies an item A
                                              T2 reads (selects) A
                                              T2 modifies an item B
T1 reads (selects) B
T1 commits
                                              T2 commits
If I understand correctly then both transactions would be able to commit (as they modified different items). Each would see a snapshot of the database as if it ran alone,

and each would read the initial value of the item it reads (and not its value after the update). But we cannot say that the schedule is equivalent nor to the serial schedule that run T1 first (as in this schedule T2 would read the value of B after it was modified by T1), neither to the schedule that run T2 first (from a symmetric argument concerning item B). So the schedule is not serializable in the sense of the theory of database systems (e.g. Ullman’s Principles of Database Systems book). Am I right?

Question #2
=========
I was not able to understand the difference between all the lock modes, when would a transaction (or the db system) use each lock, and which data structures each lock locks. For example: what is the difference between a ROW SHARE lock mode, and a ROW EXCLUSIVE lock mode. I understand that the former is acquired by a select … for update, while the latter is acquired, for example, by an UPDATE command; but after a transaction issues select … for update it has the opportunity to modify the row, so why do we need two different lock modes? Or what is the difference between SHARE and ACCESS SHARE? Which data structures are being locked by each lock? Why is EXCLUSIVE congruent with ACCESS SHARE? If I may say so, I find

The documentation a bit partial concerning the intuition behind the lock modes,
and examples of using them (beyond the Star War example).

Question #3
=========
In some places it is said that a transaction that only reads does not lock any table or row, and is never blocked. But if a transaction T1 modifies a row r, and at the same time transaction T2 selects r, then T2 need to wait until T1 finishes (as T1 might have deleted the row, or modified it in a way that would cause T1 not to need it, as the row does not satisfy T2’s WHERE clause). Am I right? On the other hand in order to read a table T2 gets an ACCESS SHARE lock on the table, so it blocks transactions that want to drop the table (and I do not understand why it does not block transactions that want to add/delete/update rows of the table).

I would thank you if you could find the time to help me with those questions.

Thanking you in advance

Yoram Biberamn

Re: A few beginner's questions concerning concurrency control

From
Karsten Hilbert
Date:
> Question #3
> =========
> In some places it is said that a transaction that only reads does not lock
> any table or row, and is never blocked. But if a transaction T1 modifies a
> row r, and at the same time transaction T2 selects r, then T2 need to wait
> until T1 finishes (as T1 might have deleted the row, or modified it in a way
> that would cause T1 not to need it, as the row does not satisfy T2?s WHERE
> clause).
The answer lies in the application of MVCC. You already said above
that each transaction sees tuples as if it ran all by itself
(which is only true for some of the transaction isolation levels).
Aborts (if attempts to write to changed data occur) are involved.

Karsten
--
GPG key ID E4071346 @ wwwkeys.pgp.net
E167 67FD A291 2BEA 73BD  4537 78B9 A9F9 E407 1346

Re: A few beginner's questions concerning concurrency control

From
Tom Lane
Date:
Yoram Biberman <yoramb@hadassah-col.ac.il> writes:
> ... But we cannot say that the schedule is equivalent
> nor to the serial schedule that run T1 first (as in this schedule T2 would
> read the value of B after it was modified by T1), neither to the schedule
> that run T2 first (from a symmetric argument concerning item B). So the
> schedule is not serializable in the sense of the theory of database systems
> (e.g. Ullman=92s Principles of Database Systems book). Am I right?

Right.  To make the behavior mathematically serializable we would have
to add predicate locking --- that is, when T2 reads A it would have to
take out a read-lock on A (in fact, on all rows potentially matching
the WHERE condition it used for its SELECT).  This would be amazingly
expensive and it wouldn't actually improve functionality for most
applications :-(

> I was not able to understand the difference between all the lock modes, when
> would a transaction (or the db system) use each lock, and which data
> structures each lock locks.

All the locks are table locks, and the only differences between them are
which other lock types are blocked by a given lock type.  The reason for
having so many is to allow fine-grained control of what sorts of things
can be done to a table concurrently.  For instance, we can allow reads
and writes to occur concurrently, but we can't allow reads or writes to
occur concurrently with a table schema modification (such as DROP INDEX).

> In some places it is said that a transaction that only reads does not lock
> any table or row, and is never blocked. But if a transaction T1 modifies a
> row r, and at the same time transaction T2 selects r, then T2 need to wait
> until T1 finishes (as T1 might have deleted the row, or modified it in a way
> that would cause T1 not to need it, as the row does not satisfy T2=92s WHERE
> clause). Am I right?

No, because T2 is selecting the row or not based on its state at T2's
snapshot time.  What T1 did to it immediately after that time is not
interesting.

> On the other hand in order to read a table T2 gets an
> ACCESS SHARE lock on the table, so it blocks transactions that want to drop
> the table (and I do not understand why it does not block transactions that
> want to add/delete/update rows of the table).

We do that mainly because the physical act of dropping the table (ie,
removing the storage file) isn't transactional.  We could make this work
if we found a way to postpone the file unlink until after the table is
no longer visible to any running transaction --- but that seems like
more trouble than it's really worth.  In practice, concurrent read and
write is what you want for real applications; concurrent schema changes
are not important enough to install a large amount of mechanism to
support.  (IMHO anyway.)

            regards, tom lane

Re: A few beginner's questions concerning concurrency control

From
Bruno Wolff III
Date:
On Tue, Jun 29, 2004 at 11:55:50 +0200,
  Yoram Biberman <yoramb@hadassah-col.ac.il> wrote:
> that run T2 first (from a symmetric argument concerning item B). So the
> schedule is not serializable in the sense of the theory of database systems
> (e.g. Ullman’s Principles of Database Systems book). Am I right?

Your example is not serializable. The main problem case for this is when
people try to select the maximum value in a table and use a higher value
in a newly inserted record. You need to lock the table to make sure this
works.

> The documentation a bit partial concerning the intuition behind the lock
> modes,
> and examples of using them (beyond the Star War example).

I think the best way to understand the documentation is to note which
locks conflict and which statements acquire them. That way if you
want to block an operation you can figure out which locks block that
operation and choose the one that has the fewest side effects.

> In some places it is said that a transaction that only reads does not lock
> any table or row, and is never blocked. But if a transaction T1 modifies a
> row r, and at the same time transaction T2 selects r, then T2 need to wait
> until T1 finishes (as T1 might have deleted the row, or modified it in a way
> that would cause T1 not to need it, as the row does not satisfy T2’s WHERE
> clause). Am I right? On the other hand in order to read a table T2 gets an
> ACCESS SHARE lock on the table, so it blocks transactions that want to drop
> the table (and I do not understand why it does not block transactions that
> want to add/delete/update rows of the table).

No. T2 sees the row in the state before T1 touched it. Only if T2 also needs
to update that row, does it need to wait.

Re: A few beginner's questions concerning concurrency

From
"Scott Marlowe"
Date:
On Tue, 2004-06-29 at 03:55, Yoram Biberman wrote:
> I have a few questions concerning concurrency control. I shall thank
> whoever can help me.
>
> Question #1
> =========
> Assume the following (concurrent) schedule, in which both transactions
> run in a serializable isolation level:
> T1 begin
>                                               T2 begin
> T1 modifies an item A
>                                               T2 reads (selects) A
>                                               T2 modifies an item B
> T1 reads (selects) B
> T1 commits
>                                               T2 commits
> If I understand correctly then both transactions would be able to
> commit (as they modified different items). Each would see a snapshot
> of the database as if it ran alone,

If T2 is going to be using information from A to update B, then the
select for A should include a "for update" clause.  I know one would
think that "for update" should be used on the destination row, but in
this case, the for update clause on T2's select means the transaction
will fail, and the client should now detect that, rollback, and attempt
the transaction again.  Here's what happens:

T1 begin;
T1 set transaction_isolation = serializable;
   T2 begin;
   T2 set transaction_isolation = serializable;
T1 update test set info='ghi' where id='A';
  T2 select * from test where id=1 for update;
  (T2 now waits for T1 to complete or rollback)

If T1 now commits, we get this error on T2:

ERROR:  could not serialize access due to concurrent update

If T1 rolls back, T2 will complete.  This provides the user with the
equivalent to a predicate locking model.  However, with such locking
rollbacks become more common, and the client has to know to try again
should a transaction fail.

> Question #3
> =========
> In some places it is said that a transaction that only reads does not
> lock any table or row, and is never blocked. But if a transaction T1
> modifies a row r, and at the same time transaction T2 selects r, then
> T2 need to wait until T1 finishes (as T1 might have deleted the row,
> or modified it in a way that would cause T1 not to need it, as the row
> does not satisfy T2’s WHERE clause). Am I right?

Yes.  Because that's very useful for when people reading the database
aren't writing to it.  It allows the database to handle a very large
read load while still updating underneath it.

However, it requires some forethought to make update procedures that
operate on the dataset run properly.  While the readers can run in
serializable or read committed depening on their need to see coherent
data between rows and all, the writers would by necessity have to run
not just serializable, but also have to mark the rows they're reading as
for update to make sure another writer didn't get the wrong number.
Since writers, by definition, are usually written by wizards, they can
be expected to know to use for update et. al. to ensure things work out
ok.

Basically, data just read with no write lock shouldn't be used to update
other data in a serializable transaction.

> On the other hand in order to read a table T2 gets an ACCESS SHARE
> lock on the table, so it blocks transactions that want to drop the
> table (and I do not understand why it does not block transactions that
> want to add/delete/update rows of the table).

OK, you have 10,000,000,000 rows in the table.  You have 1,000 people
connected to the database.  Every time one of the 1,000 people read a
dozen to a thousand rows, you lock each one.  These locks have to be
shared with all the other backends.  How much does that cost?

OTOH, you have the same number as above, every time they read the table,
you throw one single, cheap, almost never failing lock on the table.
How much does that cost?

Finally, you have the same data set.  Out of the 1,000 people connected
to the database, three are modifiers / writers.  The rest are report
generators or sales / marketing folks doing data mining.  The three
modifiers, working on small data sets each time, lock each table they
use for select with for update.  Those individual rows get a row level
lock, that has to be shared out with all the other backends.  How much
does that cost?

In general, the third scenario is the most useful.  It allows for truly
huge read loads to exist on top of complex, slow modifiers and still get
their job done.  But it requires the writers to know what they are
doing, semantically.