Re: A few beginner's questions concerning concurrency - Mailing list pgsql-general
From: Scott Marlowe
Subject: Re: A few beginner's questions concerning concurrency
Date:
Msg-id: 1088529221.12350.26.camel@localhost.localdomain
In response to: A few beginner's questions concerning concurrency control (Yoram Biberman <yoramb@hadassah-col.ac.il>)
List: pgsql-general
On Tue, 2004-06-29 at 03:55, Yoram Biberman wrote:
> I have a few questions concerning concurrency control. I shall thank
> whoever can help me.
>
> Question #1
> =========
> Assume the following (concurrent) schedule, in which both transactions
> run in a serializable isolation level:
>
> T1 begin
> T2 begin
> T1 modifies an item A
> T2 reads (selects) A
> T2 modifies an item B
> T1 reads (selects) B
> T1 commits
> T2 commits
>
> If I understand correctly then both transactions would be able to
> commit (as they modified different items). Each would see a snapshot
> of the database as if it ran alone.

If T2 is going to use information from A to update B, then the select
of A should include a FOR UPDATE clause. One would think that FOR
UPDATE should be used on the destination row, but in this case the FOR
UPDATE clause on T2's select is what makes the transaction fail, and
the client should detect that, roll back, and attempt the transaction
again (there's a sketch of the retry pattern further down). Here's
what happens:

T1: begin;
T1: set transaction_isolation = serializable;
T2: begin;
T2: set transaction_isolation = serializable;
T1: update test set info='ghi' where id='A';
T2: select * from test where id='A' for update;
    (T2 now waits for T1 to commit or roll back.)

If T1 now commits, T2 gets this error:

    ERROR: could not serialize access due to concurrent update

If T1 rolls back, T2 will complete. This gives the user the equivalent
of a predicate locking model. However, with such locking, rollbacks
become more common, and the client has to know to try again should a
transaction fail.

> Question #3
> =========
> In some places it is said that a transaction that only reads does not
> lock any table or row, and is never blocked. But if a transaction T1
> modifies a row r, and at the same time transaction T2 selects r, then
> T2 needs to wait until T1 finishes (as T1 might have deleted the row,
> or modified it in a way that would cause T2 not to need it, as the row
> does not satisfy T2’s WHERE clause). Am I right?

Yes. That behavior is very useful when the people reading the database
aren't writing to it: it allows the database to handle a very large
read load while still being updated underneath. However, it requires
some forethought to make the update procedures that operate on the
data set run properly. While the readers can run in serializable or
read committed, depending on whether they need to see coherent data
across rows, the writers by necessity have to run not just
serializable, but also have to mark the rows they're reading FOR
UPDATE, to make sure another writer doesn't come away with the wrong
number. Since writers, by definition, are usually written by wizards,
they can be expected to know to use FOR UPDATE et al. to ensure things
work out OK. Basically, data just read with no write lock shouldn't be
used to update other data in a serializable transaction.
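To make that concrete, here is a minimal sketch of the writer-side
pattern, assuming the same test(id, info) table as in the transcript
above; the column names and values are illustrative, not from the
original post:

    BEGIN;
    SET TRANSACTION ISOLATION LEVEL SERIALIZABLE;
    -- Lock the row we read from, so a concurrent writer causes a
    -- serialization failure here instead of a silently wrong update:
    SELECT info FROM test WHERE id = 'A' FOR UPDATE;
    -- ... compute the new value for B from the value just read ...
    UPDATE test SET info = 'value derived from A' WHERE id = 'B';
    COMMIT;
    -- If any statement fails with
    --   ERROR: could not serialize access due to concurrent update
    -- the client must issue ROLLBACK and re-run the whole
    -- transaction from BEGIN.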
> On the other hand in order to read a table T2 gets an ACCESS SHARE
> lock on the table, so it blocks transactions that want to drop the
> table (and I do not understand why it does not block transactions
> that want to add/delete/update rows of the table).

OK, say you have 10,000,000,000 rows in the table and 1,000 people
connected to the database. Every time one of those 1,000 people reads
a dozen to a thousand rows, you lock each one, and those locks have to
be shared with all the other backends. How much does that cost?

OTOH, with the same numbers as above, every time someone reads the
table you throw one single, cheap, almost-never-failing lock on the
table. How much does that cost?

Finally, take the same data set. Of the 1,000 people connected to the
database, three are modifiers / writers; the rest are report
generators or sales / marketing folks doing data mining. The three
modifiers, working on small data sets each time, select the rows they
use with FOR UPDATE. Those individual rows get a row-level lock that
has to be shared out with all the other backends. How much does that
cost?

In general, the third scenario is the most useful. It allows truly
huge read loads to coexist with complex, slow modifiers and still get
the job done. But it requires the writers to know what they are doing,
semantically.
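The ACCESS SHARE behavior you asked about is easy to see in two
sessions. This transcript is a sketch, not from the original post, and
assumes the same test table:

    Session 1: begin;
    Session 1: select * from test;    -- takes ACCESS SHARE on test
    Session 2: update test set info='xyz' where id='A';
               -- takes ROW EXCLUSIVE, which does not conflict with
               -- ACCESS SHARE; returns immediately
    Session 2: drop table test;
               -- needs ACCESS EXCLUSIVE, which conflicts with ACCESS
               -- SHARE; blocks until Session 1 commits or rolls back
    Session 1: commit;                -- Session 2's DROP now proceeds

This is why a plain reader holds up DROP TABLE but never holds up
other sessions' row changes: ACCESS SHARE conflicts only with ACCESS
EXCLUSIVE.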