Re: A few beginner's questions concerning concurrency - Mailing list pgsql-general

From: Scott Marlowe
Subject: Re: A few beginner's questions concerning concurrency
Msg-id: 1088529221.12350.26.camel@localhost.localdomain
In response to: A few beginner's questions concerning concurrency control (Yoram Biberman <yoramb@hadassah-col.ac.il>)
List: pgsql-general
On Tue, 2004-06-29 at 03:55, Yoram Biberman wrote:
> I have a few questions concerning concurrency control. I shall thank
> whoever can help me.
>
> Question #1
> =========
> Assume the following (concurrent) schedule, in which both transactions
> run in a serializable isolation level:
> T1 begin
>                                               T2 begin
> T1 modifies an item A
>                                               T2 reads (selects) A
>                                               T2 modifies an item B
> T1 reads (selects) B
> T1 commits
>                                               T2 commits
> If I understand correctly then both transactions would be able to
> commit (as they modified different items). Each would see a snapshot
> of the database as if it ran alone,

If T2 is going to use information from A to update B, then its select
of A should include a "for update" clause.  One would think "for
update" belongs on the destination row, but in this case the for
update clause on T2's select is what makes the conflict visible: T2's
transaction will fail, and the client should detect that, roll back,
and attempt the transaction again.
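
For concreteness, assume a hypothetical table along these lines (the
table and column names are just for illustration):

create table test (id text primary key, info text);
insert into test values ('A', 'abc');
insert into test values ('B', 'def');

Here's what happens: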

T1 begin;
T1 set transaction_isolation = serializable;
   T2 begin;
   T2 set transaction_isolation = serializable;
T1 update test set info='ghi' where id='A';
   T2 select * from test where id='A' for update;
   (T2 now waits for T1 to commit or roll back)

If T1 now commits, we get this error on T2:

ERROR:  could not serialize access due to concurrent update

If T1 rolls back, T2 will complete.  This gives the user the
equivalent of a predicate locking model.  However, with such locking,
rollbacks become more common, and the client has to know to try again
should a transaction fail.

> Question #3
> =========
> In some places it is said that a transaction that only reads does not
> lock any table or row, and is never blocked. But if a transaction T1
> modifies a row r, and at the same time transaction T2 selects r, then
> T2 need to wait until T1 finishes (as T1 might have deleted the row,
> or modified it in a way that would cause T1 not to need it, as the row
> does not satisfy T2’s WHERE clause). Am I right?

Yes, the first statement is the right one: a plain reader takes no row
locks and is never blocked.  Under MVCC, T2 simply reads the version
of r that was current when its snapshot was taken, so it does not wait
for T1.  That's very useful when the people reading the database
aren't writing to it: it lets the database handle a very large read
load while still updating underneath it.
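
A quick way to see this, using the same hypothetical test table as
before:

T1 begin;
T1 update test set info='jkl' where id='A';
   T2 select info from test where id='A';
   (T2 returns immediately with the old value of A's info; it does
   not wait for T1 to commit or roll back)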

However, it requires some forethought to make update procedures that
operate on the dataset run properly.  While the readers can run in
serializable or read committed, depending on whether they need to see
coherent data across rows, the writers by necessity have to run not
just serializable, but also have to mark the rows they read with FOR
UPDATE so another writer can't work from a stale value.  Since
writers, almost by definition, are written by wizards, they can be
expected to know to use FOR UPDATE et al. to make sure things work
out OK.

Basically, data read without a write lock shouldn't be used to update
other data in a serializable transaction.
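
As a sketch of the difference, again on the hypothetical test table,
the first form below is the dangerous one and the second is safe:

-- dangerous: A is read with no lock, then used to derive B's new value
begin;
set transaction_isolation = serializable;
select info from test where id='A';
update test set info='derived from A' where id='B';
commit;

-- safe: the read of A is marked FOR UPDATE
begin;
set transaction_isolation = serializable;
select info from test where id='A' for update;
update test set info='derived from A' where id='B';
commit;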

> On the other hand in order to read a table T2 gets an ACCESS SHARE
> lock on the table, so it blocks transactions that want to drop the
> table (and I do not understand why it does not block transactions that
> want to add/delete/update rows of the table).

OK, say you have 10,000,000,000 rows in the table and 1,000 people
connected to the database.  Every time one of those 1,000 people reads
a dozen to a thousand rows, you lock each one.  These locks have to be
shared with all the other backends.  How much does that cost?

OTOH, with the same numbers as above, every time someone reads the
table you take one single, cheap, almost-never-failing lock on the
table.  How much does that cost?

Finally, you have the same data set.  Out of the 1,000 people
connected to the database, three are modifiers / writers.  The rest
are report generators or sales / marketing folks doing data mining.
The three modifiers, working on small data sets each time, lock the
rows they select with FOR UPDATE.  Those individual rows get a
row-level lock that has to be shared out with all the other backends.
How much does that cost?

In general, the third scenario is the most useful.  It allows truly
huge read loads to run on top of complex, slow modifiers, and both
still get their jobs done.  But it requires the writers to know what
they are doing, semantically.
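
For example, a modifier in that third scenario might look like this
(the accounts table and values are hypothetical, just to show the
shape):

begin;
set transaction_isolation = serializable;
select balance from accounts where id = 42 for update;
-- the client computes the new balance from the value it just read
update accounts set balance = 900 where id = 42;
commit;
-- on a "could not serialize access" error: rollback and retry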



