Thread: Question: update and transaction isolation
Take this update statement:

update mytable set foo=foo+1 where bar='xxx';

If that gets executed more than once at the same time by multiple instances of postgresql, will foo ever lose a count? I assumed that foo would always be correct and that the database would manage any contention, but when I think about transaction isolation, I'm not so sure. Is it possible for two or more instances of this update to run simultaneously, each getting the same value for foo, then each updating foo to the same incremented value? Is this a stupid question?
mlw writes:
> update mytable set foo=foo+1 where bar='xxx';
>
> If that gets executed more than once at the same time by multiple
> instances of postgresql, will foo ever lose a count?

No, but if you run this in read committed isolation mode then you might get into non-repeatable-read-type problems, i.e., you run it twice but every foo was only increased once. If you use serializable mode then all but one concurrent update will be aborted.

--
Peter Eisentraut   peter_e@gmx.net
Peter Eisentraut wrote:
> mlw writes:
> > update mytable set foo=foo+1 where bar='xxx';
> >
> > If that gets executed more than once at the same time by multiple
> > instances of postgresql, will foo ever lose a count?
>
> No, but if you run this in read committed isolation mode then you might
> get into non-repeatable-read-type problems, i.e., you run it twice but
> every foo was only increased once. If you use serializable mode then all
> but one concurrent update will be aborted.

I'm not sure you answered my question. Let me put it to you like this: Suppose I wanted to make a table of page counts, like this:

create table pagecounts (counter int4, pagename varchar);

For each page hit, I do this:

update pagecounts set counter = counter + 1 where pagename = 'testpag.php';

Do I have to set a particular isolation level? Or does this not work in general?
mlw writes:
> For each page hit, I do this:
>
> update pagecounts set counter = counter + 1 where pagename = 'testpag.php';
>
> Do I have to set a particular isolation level? Or does this not work in
> general?

In read committed level, if the second update launches before the first update is finished (commits), then both of these updates will operate on the old counter value. That is, you miss one page hit.

If it's possible, you might want to consider "logging" your page hits and making a view for the page counts (with group by, etc.). That will get you around the concurrency issues altogether.

--
Peter Eisentraut   peter_e@gmx.net
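A minimal sketch of the logging scheme Peter suggests; the pagehits table and its column names are invented for illustration, and pagecounts becomes a view derived from the log instead of a table maintained in place:

create table pagehits (
    pagename varchar,
    hit_time timestamp default now()
);

-- one row per page hit; concurrent INSERTs never contend with each other
insert into pagehits (pagename) values ('testpag.php');

-- derive the counts instead of updating them in place
-- (assumes the original pagecounts table has been dropped in favor of this view)
create view pagecounts as
    select pagename, count(*) as counter
    from pagehits
    group by pagename;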
mlw <markw@mohawksoft.com> writes:
> I'm not sure you answered my question. Let me put it to you like this:
> Suppose I wanted to make a table of page counts, like this:
>
> create table pagecounts (counter int4, pagename varchar);
>
> For each page hit, I do this:
>
> update pagecounts set counter = counter + 1 where pagename = 'testpag.php';
>
> Do I have to set a particular isolation level? Or does this not work in
> general?

This will work; and you are best off with the default read-committed isolation level. (In serializable level, you would sometimes get serialization failures and have to repeat the transaction.) In more complex cases the answer is different, though.

The reason it works in read-committed mode is that the second guy to arrive at the row will observe that the row has an update in progress; will block waiting for the previous updater to commit or abort; and if commit, will use the updated version of the row as the starting point for his update. (This is what the EvalPlanQual ugliness in the executor is all about.)

There are some interesting properties of this solution if your transaction actually tries to look at the row, and not just issue an UPDATE, though. Example:

regression=# create table foo (key int, val int);
CREATE
regression=# insert into foo values(1, 0);
INSERT 394248 1
regression=# begin;
BEGIN
regression=# update foo set val = val + 1 where key = 1;
UPDATE 1
regression=# select * from foo;
 key | val
-----+-----
   1 |   1
(1 row)

<< leaving this transaction open, in a second window do >>

regression=# begin;
BEGIN
regression=# select * from foo;
 key | val
-----+-----
   1 |   0
(1 row)

regression=# update foo set val = val + 1 where key = 1;

<< blocks waiting for first xact to be committed or aborted.
   In first window, now issue END. Second window then completes
   its UPDATE: >>

UPDATE 1
regression=# select * from foo;
 key | val
-----+-----
   1 |   2
(1 row)

regression=# end;

<< at this point the value "2" is visible in other transactions >>

Notice how xact 2 could only read val=0 in its first SELECT, even though it saw val=1 for purposes of the UPDATE. If your application-side logic is complex enough to get messed up by this inconsistency, then you should either use SELECT FOR UPDATE to read the values, or use serializable isolation level and be prepared to retry failed transactions.

In serializable mode, you'd have gotten a failure when you tried to update the already-updated row. This tells you that you might have tried to update on the basis of stale information. You abort and restart the transaction, taking care to re-read the info that is going to determine what you write. For example, suppose you wanted to do the increment like this:

BEGIN;
SELECT val FROM foo WHERE key = 1;
-- internally compute newval = val + 1
UPDATE foo SET val = $newval WHERE key = 1;
END;

(This is a tad silly here, but is not silly if the "internal computation" is too complex to write as an SQL expression.) In read-committed mode, concurrent executions of this sequence would do the Wrong Thing. In serializable mode, you'd get concurrent-update failures; retrying from the top of the transaction would eventually succeed with correct results. Alternatively you could do

BEGIN;
SELECT val FROM foo WHERE key = 1 FOR UPDATE;
-- internally compute newval = val + 1
UPDATE foo SET val = $newval WHERE key = 1;
END;

which will work reliably in read-committed mode; but if conflicts are infrequent then the serializable approach will give better performance.
(Basically, the serializable approach is like optimistic locking with retries; the FOR UPDATE approach is pessimistic locking.)

If you are propagating information from one row to another (or across tables) then serializable mode with a retry loop is probably the easiest way of avoiding consistency problems; especially if you are reading multiple rows to derive the info you will write back. (The FOR UPDATE approach is prone to deadlocks with multiple source rows.) The basic EvalPlanQual behavior works nicely for simple updates that only read and write individual rows, but it does not scale to cases where you read some rows and write other rows.

BTW, I've promised to give a talk at the O'Reilly con on exactly these issues ...

			regards, tom lane
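To make the serialization failure concrete, here is a sketch in the same two-window style as Tom's transcript above, reusing his foo table; the exact ERROR text varies between PostgreSQL versions:

regression=# begin;
BEGIN
regression=# set transaction isolation level serializable;
SET
regression=# update foo set val = val + 1 where key = 1;
UPDATE 1

<< in a second window >>

regression=# begin;
BEGIN
regression=# set transaction isolation level serializable;
SET
regression=# update foo set val = val + 1 where key = 1;
<< blocks; once the first window commits, the update fails: >>
ERROR:  Can't serialize access due to concurrent update
regression=# rollback;
ROLLBACK

<< the second transaction must now be retried from the top,
   re-reading whatever data determines what it will write >>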
Tom Lane writes:
> The reason it works in read-committed mode is that the second guy to
> arrive at the row will observe that the row has an update in progress;
> will block waiting for the previous updater to commit or abort; and if
> commit, will use the updated version of the row as the starting point
> for his update. (This is what the EvalPlanQual ugliness in the executor
> is all about.)

Isn't that a violation of the principle that transactions in read committed mode will look at the data that was committed *before* the statement began?

--
Peter Eisentraut   peter_e@gmx.net
Peter Eisentraut <peter_e@gmx.net> writes:
> Tom Lane writes:
>> The reason it works in read-committed mode is that the second guy to
>> arrive at the row will observe that the row has an update in progress;
>> will block waiting for the previous updater to commit or abort; and if
>> commit, will use the updated version of the row as the starting point
>> for his update. (This is what the EvalPlanQual ugliness in the executor
>> is all about.)
>
> Isn't that a violation of the principle that transactions in read
> committed mode will look at the data that was committed *before* the
> statement began?

Hey, I didn't design it. Complain to Vadim ...

But actually, SELECT FOR UPDATE also violates the principle you allege, and must do so if it's to be useful at all. The results you get are whatever's in the row after it's been locked, not what was in the row at the instant of statement start. UPDATE is essentially behaving in the same way.

To my mind, full SERIALIZABLE mode is the only approach that can be explained in terms of simple notions like "you see only the data that existed at time T". Read-committed mode is conceptually much dirtier, even though it's often simpler to use in practice.

			regards, tom lane
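As a sketch of the FOR UPDATE behavior Tom describes, again in the two-window transcript style, assuming the same foo table starting over from val = 0:

<< window 1 >>

regression=# begin;
BEGIN
regression=# update foo set val = val + 1 where key = 1;
UPDATE 1

<< window 2 >>

regression=# begin;
BEGIN
regression=# select * from foo where key = 1 for update;
<< blocks; when window 1 commits, the SELECT FOR UPDATE returns the
   updated row, not the value committed before the statement began: >>
 key | val
-----+-----
   1 |   1
(1 row)

A plain SELECT issued at the same point would have returned val = 0.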
Tom Lane wrote:
> Peter Eisentraut <peter_e@gmx.net> writes:
>> Tom Lane writes:
>
> To my mind, full SERIALIZABLE mode is the only approach that can be
> explained in terms of simple notions like "you see only the data that
> existed at time T".

There's another way. If the current value is different from that at time T, we may be able to reset the time when the statement began, which is equivalent to replacing the snapshot (this isn't allowed in serializable mode). Of course, it would be very difficult to implement (at least efficiently).

As I've already mentioned many times, SELECT and SELECT ... FOR UPDATE are alike in appearance but quite different in nature. For example, the meaning of the snapshot isn't the same, as you've pointed out already in this thread. It's meaningless for SELECT and UPDATE (SELECT ... FOR UPDATE) to have a common snapshot.

regards,
Hiroshi Inoue
> > For each page hit, I do this:
> >
> > update pagecounts set counter = counter + 1 where pagename = 'testpag.php';
> >
> > Do I have to set a particular isolation level? Or does this not work in
> > general?
>
> In read committed level, if the second update launches before the first
> update is finished (commits), then both of these updates will operate on
> the old counter value. That is, you miss one page hit.

Can you break it into this:

begin;
select counter from pagecounts where pagename='testpag.php' for update;
update pagecounts set counter=counter+1 where pagename='testpag.php';
commit;

Chris