Thread: Re: Failed Statements within Transactions
> ... if a query fails inside a transaction block,
> PostgreSQL "RollBack"s the whole transaction ...

In my experience, it's unusual for SELECT statements (queries) to cause errors. If they do, the statements themselves are likely broken. (A SELECT that returns zero rows is not an error.) Likewise, many other SQL statements (UPDATE, DELETE) rarely cause errors; they usually "fail" by modifying zero rows, which does not affect the transaction state.

However, I do agree that a failed INSERT should not automatically rollback the entire transaction. INSERTs often fail due to integrity constraints. The common suggestion of doing a test SELECT first to determine whether to INSERT or UPDATE is simple nonsense; that just creates a race condition. If you lose the race, you have to replay the entire transaction, which is a waste of CPU cycles. (Worse, you're more likely to lose the race on a more heavily loaded system, which is exactly when you most want to avoid replaying complex transactions.)

The INSERT/UPDATE problem is an inherent part of the SQL language. There is no particularly direct way in standard SQL to ensure that a particular (unique) piece of data is in a table. The most consistently reliable way to handle this is to have an integrity constraint, try the INSERT, and then possibly do an UPDATE if the INSERT fails. I'm told this is the preferred strategy on other DB systems, and it's the only one that avoids any kind of race condition. Furthermore, it is usually much faster than doing a pre-SELECT. (If the data isn't already there, the INSERT is clearly fastest; if it is, the INSERT will typically fail and return an error more quickly than a SELECT would complete. Other strategies are more efficient if the data is usually already there.)

PostgreSQL's behavior essentially forces you to do BOTH of the following:
  * SELECT first to see whether to INSERT or UPDATE
  * AND be prepared to replay the _entire_ transaction when you lose the race.
You will sometimes lose the race, so the second step is not optional. I suspect many complex PostgreSQL applications have infrequent failures precisely because they aren't prepared to replay entire transactions (with the correct INSERTs converted to UPDATEs). For a complex transaction which must ensure certain data exists in several tables, this gets pretty ugly.

Basically, the PostgreSQL developers have decided that any integrity violation is a serious error; therefore, PostgreSQL does not really permit tentative INSERTs within transactions. This violates SQL conventions that are pretty well-established in some circles, needlessly complicates applications that use complex transactions and introduces a fairly minor performance issue.

                - Tim
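For illustration, the try-the-INSERT-then-UPDATE strategy described above would look roughly like this (table and column names are invented; as written, the UPDATE fallback only works where a failed INSERT does not abort the transaction, which is exactly the PostgreSQL behavior being complained about):

    CREATE TABLE pagehits (url text PRIMARY KEY, hits int4);

    BEGIN;
    INSERT INTO pagehits VALUES ('/index.html', 1);
    -- if the INSERT fails with a duplicate-key error, the client
    -- falls back to an UPDATE of the existing row instead:
    UPDATE pagehits SET hits = hits + 1 WHERE url = '/index.html';
    COMMIT;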
Tim Kientzle wrote:
> ...
> However, I do agree that a failed INSERT should
> not automatically rollback the entire transaction.
> INSERTs often fail due to integrity constraints.
> The common suggestion of doing a test SELECT
> first to determine whether to INSERT or UPDATE
> is simple nonsense; that just creates a race
> condition...
> ...
> PostgreSQL's behavior essentially forces you to
> do BOTH of the following:
>   * SELECT first to see whether to INSERT or UPDATE
>   * AND be prepared to replay the _entire_ transaction
>     when you lose the race.
> You will sometimes lose the race, so the second
> step is not optional.
> ...

How about:

    CREATE TABLE foo (key int4, value text);

Then, in the client application,

    INSERT INTO foo SELECT 1, 'Hello'
        WHERE NOT EXISTS (SELECT key FROM foo WHERE key = 1);

The INSERT will then insert either (1) a single row, if the row did not already exist, or (2) no rows, if the row already existed. In the latter case, you may then update the row with an ordinary UPDATE. An UPDATE of zero rows would suggest that the row was deleted between the time of the INSERT and the time of the UPDATE, so one would either have to loop or take the appropriate action for when another user wishes to delete the row. And even that possibility depends upon the selected TRANSACTION ISOLATION LEVEL. But it would *not* cause an error in a transaction block, forcing a rollback of the entire transaction.

Mike Mascari
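Spelled out, the full client-side sequence Mike describes would be roughly this (illustrative only; how the reported row counts are inspected depends on the client interface):

    INSERT INTO foo SELECT 1, 'Hello'
        WHERE NOT EXISTS (SELECT key FROM foo WHERE key = 1);
    -- if the INSERT reports 0 rows, the row already existed, so:
    UPDATE foo SET value = 'Hello' WHERE key = 1;
    -- if the UPDATE also reports 0 rows, the row was deleted in
    -- between; loop back and retry, or handle the deletion.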
Mike Mascari <mascarm@mascari.com> writes:
> How about:
> INSERT INTO foo SELECT 1, 'Hello' WHERE NOT EXISTS (SELECT
> key FROM foo WHERE key = 1);

Cute idea, but it doesn't eliminate the race-condition problem. If two backends execute this code concurrently, both will find that the subselect returns no rows, and both will try to do the INSERT. If you have a unique index on the table, then one will succeed and the other will get an error (after waiting to see if the first commits or not).

There's no doubt that savepoints within transactions would be a good improvement to Postgres, and I believe Vadim has hopes of making that happen in 7.2 or 7.3. But in the meantime, transactions without savepoints are a lot better than "transactions" that fail to notice errors at all. Any MySQL partisans who contend otherwise just plain Don't Get It.

			regards, tom lane
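Concretely, the race Tom describes plays out roughly like this under the default isolation level (the session labels are only for illustration):

    Session A: BEGIN;
    Session B: BEGIN;
    Session A: INSERT INTO foo SELECT 1, 'Hello'
                   WHERE NOT EXISTS (SELECT key FROM foo WHERE key = 1);
               -- A's subselect sees no row, so A inserts one
    Session B: INSERT INTO foo SELECT 1, 'Hello'
                   WHERE NOT EXISTS (SELECT key FROM foo WHERE key = 1);
               -- B cannot see A's uncommitted row; with a unique index
               -- on key, B's INSERT blocks waiting on A; without one,
               -- B inserts a duplicate
    Session A: COMMIT;
               -- with a unique index, B's INSERT now fails with a
               -- duplicate-key error; without one, both rows commit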
At 11:14 PM 12/29/00 -0500, Mike Mascari wrote:
>INSERT INTO foo SELECT 1, 'Hello' WHERE NOT EXISTS (SELECT
>key FROM foo WHERE key = 1);
>
>The insert will then insert either (1) a single row if the
>row did not already exist, or (2) no rows if the row already
>existed. If the latter, you may then update the row using an

Will that really work? I've tried something like that before and it didn't. This is because the stuff inserted may not be seen by SELECTs of other transactions until the transaction has been committed (this depends on your transaction isolation level, but the default behaves that way, and in this case the default is recommended!).

Try starting up two psqls, do a BEGIN in each, then do your query in both transactions, then do a COMMIT in both. I strongly suspect you'll see duplicate records. Anyone relying on this method should change it "Real Soon Now".

The options I see are:

1) Use a UNIQUE constraint. I don't like this as it means rolling back stuff, and in some cases you may have to redo almost everything just to do an update instead, so you may have to store the data elsewhere first instead of stuffing it into the database as you go. Also, I prefer database errors to be exceptions, not something that occurs normally. Maybe I'm strange, but I dislike the idea of _using_ errors. It seems rather kludgy, and hints that there is a design issue somewhere. Dealing with errors is of course OK.

2) Lock the table, do the SELECT, then INSERT, then unlock the table (in PostgreSQL that means rollback/commit; you can't unlock earlier). A sketch follows at the end of this message.

3) Use a different locking mechanism. I've suggested adding a lock-on-an-arbitrary-string feature to PostgreSQL, much like the one in MySQL (GET_LOCK). So what you would be able to do is:

    begin transaction
    do lots of stuff
    lock string "table=foo,key=1"
    select count(*) from foo where key=1
    if count=0
        insert new row (key=1)
        link row to other tables
        update the other tables
    else
        update row where key=1
        update the other tables
    fi
    commit (and release lock)

This method will not block everybody, unlike locking the entire table. It just blocks those about to do the same thing. Other people can insert foo, key=2 and it doesn't matter, full speed ahead ;). When you port your application to less capable databases, you lock the entire table instead; no need for a huge rewrite.

Trouble is, I seem to be the only one interested in this feature. I'm not confident that my coding skills are up to scratch for adding this feature myself. I'd personally try to copy and adapt the lock table mechanism, but I haven't even looked at it yet to see whether that is crazy or viable :). It involves touching some core stuff. Still, it may be easier to do this than adding transaction savepoints. What do the PostgreSQL developers think? Don't you think it'll be cool to be able to:

    lock string "Working deep magic type 2 on e-commerce tables"

;). Hmm, hierarchical locks would be interesting but rather academic. I'll be happy with just lock string :).

Cheerio,
Link.

So much procrastination left to do ;).
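For comparison, option 2 above would look something like this (a sketch only; the default LOCK TABLE mode is stronger than strictly needed, and the lock is released only at commit or rollback):

    BEGIN;
    LOCK TABLE foo;
    SELECT count(*) FROM foo WHERE key = 1;
    -- if the count is 0: INSERT INTO foo VALUES (1, 'Hello');
    -- otherwise:         UPDATE foo SET value = 'Hello' WHERE key = 1;
    COMMIT;   -- the table lock is released here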
Lincoln Yeoh wrote:
> Also I prefer database errors to be exceptions, not
> something that occurs normally. Maybe I'm strange
> but I dislike the idea of _using_ errors. Seems
> rather kludgy, and hints that there is a design
> issue somewhere.

Unfortunately, in this case, the design issue is in the SQL language itself, which does not provide good support for this common operation (atomically ensuring that a unique data item is in a table). Unfortunately, it's a little late to change that... ;-)

An "error" is just a return code that indicates what happened. If return codes are well designed, then they are useful and should be used. Don't be fooled by the word "error." It's only an error if the program ends up doing the wrong thing. ;-)

                - Tim
Tim Kientzle wrote:
> [...]
>
> Basically, the PostgreSQL developers have decided
> that any integrity violation is a serious error;
> therefore, PostgreSQL does not really permit
> tentative INSERTs within transactions. This violates
> SQL conventions that are pretty well-established
> in some circles, needlessly complicates
> applications that use complex transactions
> and introduces a fairly minor performance issue.

We haven't; it's just that our code inherited this behavior from the way ROLLBACK is implemented, and if you knew even a little about the storage management in PostgreSQL, you would never have put it the way you did. With the current implementation of tuple visibility there is only all-or-nothing. Savepoints will get us out of there, but not before 7.2 or 7.3.


Jan

--
#======================================================================#
# It's easier to get forgiveness for being wrong than for being right. #
# Let's break this rule - forgive me.                                  #
#================================================== JanWieck@Yahoo.com #