Thread: Re: Revisited: Transactions, insert unique.
Hi Lincoln,

I'm not sure I'm understanding your question, but it seems like this is something that ought to be handled programmatically. That is, query the table to see if the row exists, then decide what you are going to do (insert or update) based on the results of your query.

Am I completely missing the point?

David Boerwinkle

-----Original Message-----
From: Lincoln Yeoh <lylyeoh@mecomb.com>
To: pgsql-general@postgresql.org <pgsql-general@postgresql.org>
Date: Monday, April 24, 2000 1:13 AM
Subject: [GENERAL] Revisited: Transactions, insert unique.

>Hi,
>
>Previously I wanted to ensure that I am inserting something unique into a
>table, the answer was to create a unique index on the relevant columns.
>
>But what if I don't want to get an error which would force a rollback? Say
>I want to insert something if it doesn't already exist, but update it if it
>does.
>
>Do I have to lock the whole table?
>
>Would it be a good idea to be able to request a lock on an arbitrary string
>like in MySQL? Then I could perhaps do something like
>
>LOCK HANDLE('max255charstring',TimeoutInSeconds)
>e.g.
>LOCK HANDLE('mytable,field1=x,field2=y',10)
>
>Then I could control access to a row that may not even exist, or do other
>snazzy transaction stuff.
>
>Cheerio,
>Link.
davidb@vectormath.com wrote:
> I'm not sure I'm understanding your question, but it seems like this is
> something that ought to be handled programmatically. That is, query the
> table to see if the row exists, then decide what you are going to do
> (insert or update) based on the results of your query.

Good point. And you can combine the check David suggests with the insert statement, e.g.,

   INSERT INTO mytable (id, ...)
   SELECT 7, ...
   WHERE NOT EXISTS (SELECT * FROM mytable WHERE id = 7)

And then check the return result for the number of rows inserted. '0' means an update is needed. I don't remember if there is a cleaner, more efficient manner of doing that, but probably so...

Regards,
Ed Loehr
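[The update half of Ed's pattern, run when the conditional insert reports 0 rows, might look like the following minimal sketch; the column val and the new value are illustrative, and the rows-inserted check happens in the client:

   UPDATE mytable SET val = 'new value' WHERE id = 7;
]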
Today, in a message to pgsql-general, David Boerwinkle wrote: > > it seems like this is something that ought to be handled > programmatically. That is, query the table to see if the row exists, > then decide what you are going to do (insert or update) based on the > results of your query. It certainly 'can' be handled the way you describe, but to say that it 'ought' to be handled this way is going too far. It is common practice in database programming to simply try the most likely case and fall back to alternatives when an error is encountered. For example, if one expects 99% of inserts to be unique one may simply try the insert and when this fails because of a duplicate key error one can update instead. This is slightly more efficient than doing the extra query in 100% of cases. In any case, if I'm not mistaken the SQL standard permits an automatic rollback only for deadlock errors and equivalent types of errors where the rollback may be required to resolve a lockup situation. Joachim -- private: joachim@kraut.bc.ca (http://www.kraut.bc.ca) work: joachim@mercury.bc.ca (http://www.mercury.bc.ca)
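[A sketch of the try-first pattern Joachim describes, against a hypothetical table mytable with a unique index on id. It assumes a database with statement-level aborts; as discussed later in this thread, current PostgreSQL instead aborts the whole transaction on the duplicate-key error:

   BEGIN;
   INSERT INTO mytable (id, val) VALUES (7, 'x');
   -- if this fails with a duplicate-key error, fall back to:
   UPDATE mytable SET val = 'x' WHERE id = 7;
   COMMIT;
]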
Joachim Achtzehnter wrote: > > Today, in a message to pgsql-general, David Boerwinkle wrote: > > > > it seems like this is something that ought to be handled > > programmatically. That is, query the table to see if the row exists, > > then decide what you are going to do (insert or update) based on the > > results of your query. > > It certainly 'can' be handled the way you describe, but to say that it > 'ought' to be handled this way is going too far. It is common practice in > database programming to simply try the most likely case and fall back to > alternatives when an error is encountered. For example, if one expects 99% > of inserts to be unique one may simply try the insert and when this fails > because of a duplicate key error one can update instead. This is slightly > more efficient than doing the extra query in 100% of cases. More efficient, yes. However, given the lack of "statement-only" aborts and the state of pg error codes (all strings, no numeric codes), the programmatic/query check appears not only compulsory but simpler code-wise. Regards, Ed Loehr
Today, in a message to Joachim Achtzehnter, Ed Loehr wrote: > > More efficient, yes. However, given the lack of "statement-only" > aborts and the state of pg error codes (all strings, no numeric > codes), the programmatic/query check appears not only compulsory but > simpler code-wise. Given current limitations of the postgresql implementation, you are certainly right: there is no other choice. I was merely countering the suggestion that there was something wrong with the approach of simply trying the insert first. It is a perfectly valid approach when used with an SQL92 compliant database. We just have to live without it until postgresql improves on this point. This is certainly not a show stopper for most of us unless perhaps when somebody has to port a ton of code from another database :-( Joachim -- private: joachim@kraut.bc.ca (http://www.kraut.bc.ca) work: joachim@mercury.bc.ca (http://www.mercury.bc.ca)
On Mon, Apr 24, 2000 at 11:01:57AM -0700, Joachim Achtzehnter wrote: > [...] It is a perfectly valid approach when used with an SQL92 compliant > database. We just have to live without it until postgresql improves > on this point. This is certainly not a show stopper for most of us > unless perhaps when somebody has to port a ton of code from another > database :-( I'm going to jump in here, because this is a particular axe I grind: I've bent my brain around the SQL92 standards docs, and there's _no_ requirement for this type of behavior on error. Yes, it's a useful thing to have, and yes, all the bigname commercial RDBMS's work that way, but that doesn't mean postgres isn't SQL92 compliant on that point (it misses on other points, though). So, go ahead and complain, I agree it's a pain for those porting code. But don't say it's a standards issue, until you can point to chapter and verse to defend your position. Ross (touchy about SQL92 standards compliance, for some reason) -- Ross J. Reedstrom, Ph.D., <reedstrm@rice.edu> NSBRI Research Scientist/Programmer Computer and Information Technology Institute Rice University, 6100 S. Main St., Houston, TX 77005
"Ross J. Reedstrom" wrote: > > On Mon, Apr 24, 2000 at 11:01:57AM -0700, Joachim Achtzehnter wrote: > > > [...] It is a perfectly valid approach when used with an SQL92 compliant > > database. We just have to live without it until postgresql improves > > on this point. This is certainly not a show stopper for most of us > > unless perhaps when somebody has to port a ton of code from another > > database :-( > > I'm going to jump in here, because this is a particular axe I grind: > > I've bent my brain around the SQL92 standards docs, and there's _no_ > requirement for this type of behavior on error. Yes, it's a useful thing > to have, and yes, all the bigname commercial RDBMS's work that way, but that > doesn't mean postgres isn't SQL92 compliant on that point (it misses on > other points, though). So, go ahead and complain, I agree it's a pain for > those porting code. But don't say it's a standards issue, until you can > point to chapter and verse to defend your position. Fair enough. My "non-std" comment an unfortunate choice of words. I was alluding to pgsql "doing it differently" from everyone else (i.e., in a "non-std manner"), not to a violation of an "official" SQL standard. My recollection of the recent SQL standards discussion on this point was that at least one standards document (which one?) was ambiguous on this point. Regards, Ed Loehr
Today, in a message to pgsql-general, Ross J. Reedstrom wrote:
> I've bent my brain around the SQL92 standards docs, and there's _no_
> requirement for this type of behavior on error.

Don't have access to the actual standard text; perhaps somebody who has it can confirm whether the following quotes from an earlier draft (identified by the code X3H2-92-154/DBL CBR-002) are also in the final text.

In section 4.10.1 (Checking of constraints) I find this:

   When a constraint is checked other than at the end of an SQL-
   transaction, if it is not satisfied, then an exception condition
   is raised and the SQL-statement that caused the constraint to be
   checked has no effect other than entering the exception
   information into the diagnostics area.

An automatic rollback of the whole transaction in response to a violated primary key constraint is hardly consistent with the "no effect" requirement expressed here.

The following passages from section 4.28 (SQL-transactions) also very strongly imply that an automatic rollback should not occur except in circumstances where there is no choice (serialization failure and unrecoverable errors):

   The execution of a <rollback statement> may be initiated implicitly
   by an implementation when it detects the inability to guarantee the
   serializability of two or more concurrent SQL-transactions. When
   this error occurs, an exception condition is raised: transaction
   rollback-serialization failure.

   The execution of a <rollback statement> may be initiated implicitly
   by an implementation when it detects unrecoverable errors. When
   such an error occurs, an exception condition is raised: transaction
   rollback with an implementation-defined subclass code.

   The execution of an SQL-statement within an SQL-transaction has
   no effect on SQL-data or schemas other than the effect stated in
   the General Rules for that SQL-statement, in the General Rules
   for Subclause 11.8, "<referential constraint definition>", and
   in the General Rules for Subclause 12.3, "<procedure>".

Perhaps you can make the argument that an automatic rollback in all error situations is compliant by claiming that all errors are unrecoverable. In my view this is definitely against the spirit of the standard. As you said yourself, all big-name databases behave according to my interpretation, hence it is understandable that the authors of the standard didn't see a need to spell this out more explicitly.

Joachim

--
joachim@kraut.bc.ca (http://www.kraut.bc.ca)
joachim@mercury.bc.ca (http://www.mercury.bc.ca)
Joachim Achtzehnter wrote:
> Today, in a message to pgsql-general, Ross J. Reedstrom wrote:
> >
> > I've bent my brain around the SQL92 standards docs, and there's _no_
> > requirement for this type of behavior on error.
>
> [...] Perhaps you can make the argument that an automatic rollback in
> all error situations is compliant by claiming that all errors are
> unrecoverable. In my view this is definitely against the spirit of the
> standard. As you said yourself, all big-name databases behave according
> to my interpretation, hence it is understandable that the authors of
> the standard didn't see a need to spell this out more explicitly.

I found that pretty informative. I dug up the previous conversation on this, in which Tom Lane cited section 3.3.4.1 (of what std?). Its emphasis on *statements* as opposed to *transactions* suggests to me that aborting the transaction is beyond the scope of what they had in mind, though I admittedly don't fully understand the jargon here...

   The phrase "an exception condition is raised:", followed by the
   name of a condition, is used in General Rules and elsewhere to
   indicate that the execution of a statement is unsuccessful,
   application of General Rules, other than those of Subclause 12.3,
   "<procedure>", and Subclause 20.1, "<direct SQL statement>", may
   be terminated, diagnostic information is to be made available,
   and execution of the statement is to have no effect on SQL-data
   or schemas. The effect on <target specification>s and SQL
   descriptor areas of an SQL-statement that terminates with an
   exception condition, unless explicitly defined by this
   International Standard, is implementation-dependent.

Regards,
Ed Loehr
On Mon, Apr 24, 2000 at 01:10:55PM -0700, Joachim Achtzehnter wrote:
> Perhaps you can make the argument that an automatic rollback in all error
> situations is compliant by claiming that all errors are unrecoverable. In
> my view this is definitely against the spirit of the standard. As you said
> yourself, all big-name databases behave according to my interpretation,
> hence it is understandable that the authors of the standard didn't see a
> need to spell this out more explicitly.

Joachim - I see you haven't done much Language Lawyering, have you? There is no such thing as the 'spirit' of the standard, only the written document. ;-)

This is exactly my argument, with regard to errors and the standard: _which_ errors are considered unrecoverable is not spelled out in the standard, therefore, it is implementation defined. The fact that the definition chosen by postgresql is inconvenient for users of the database is, I agree, unfortunate, but it doesn't stand in the way of us claiming compliance, which is the name of the game for these sorts of standards.

Note that postgres is careful not to _automatically_ rollback: the standard (as you quoted) indicated only certain conditions that allow for an implicit rollback of that sort. Postgres just won't let you do anything else in the current transaction. Yes, it's splitting hairs, but if you dig into any of the 'bigname' DBs, you'll find similar split ends. Often, the end is able to be split, i.e. the language in the standard is ambiguous, _because_ the commercial DBs had representatives on the committee, making sure the standard didn't get too far from their existing implementations.

I might even argue that the original definition was a good, conservative choice for the early days of postgres as a research database: you _know_ people have been messing with the server code, and if something throws an error, bailing out is the safest course. Now that the core developers have done an amazing job at cleaning up and stabilizing the code, a more relaxed attitude toward certain classes of errors is desirable. There's been a fair amount of discussion about cleaning up (defining!) the error codes returned, as well, so a complete overhaul may be in the works. That'd clearly be the time to fix this up. I believe it's already on the TODO list.

Ross
--
Ross J. Reedstrom, Ph.D., <reedstrm@rice.edu>
NSBRI Research Scientist/Programmer
Computer and Information Technology Institute
Rice University, 6100 S. Main St., Houston, TX 77005
To All -

O.K., I give! I'm surprised at the number of people willing to dig into the standards docs and try to come up with justifications. Note that the fact that whole paragraphs must be quoted in order to get at the general tone of the standard supports my argument that the behavior on error is _not_ spelled out in the standard: the exact case we're talking about is almost conspicuous by its absence, given the number of other cases covered, and the depth of the coverage. The rest of the standard is written in such a way that I keep thinking that all these Exception Conditions must be defined somewhere, with appropriate actions, but, to the best of my knowledge, they're not. Makes me think there must have been a big commercial DB without good error recovery with representatives on the committee ;-)

Suffice to say, the _right_ thing to do is make the users happy: now that postgres is more robust, there's little reason not to allow capture of 'recoverable' errors, and allow the DB developer to decide when to rollback.

Ross
--
Ross J. Reedstrom, Ph.D., <reedstrm@rice.edu>
NSBRI Research Scientist/Programmer
Computer and Information Technology Institute
Rice University, 6100 S. Main St., Houston, TX 77005
Today, in a message to pgsql-general, Ross J. Reedstrom wrote: > > I see you haven't done much Language Lawyering, have you? No, and I have no intention of getting into it now :-) > There is no such thing as the 'spirit' of the standard, only the > written document. ;-) The spirit of the standard comes into play when people who are not Language Lawyers try to decide how something should work that is not spelled out explicitly, but where the standard text contains suggestive statements that imply that the authors assumed something without spelling it out, because they thought everybody reading the standard would agree on this as a matter of course. Of course, as soon as somebody comes along who has some motivation to make a contrary assumption, perhaps to claim compliance, the fact that the assumption was not spelled out leads to the kinds of arguments we are having. > This is exactly my argument, with regard to errors and the standard: > _which_ errors are considered unrecoverable is not spelled out in the > standard, therefore, it is implementation defined. The fact the the > definition chosen by postgresql is inconvenient for users of the database > is, I agree, unfortunate, but it doesn't stand in the way of us claiming > compliance, which is the name of the game for these sort of standards. This is precisely NOT the game I'm playing! I don't care whether something is technically 100% compliant or not. I do care a lot about improving a free software database management system that is in the same league as the big-name databases. The reason I entered this discussion was not to discuss whether postgresql is or is not 100% compliant with SQL92. Supporting statement level aborts is a feature that should be supported at some point, and this talk about the current practice somehow being compliant with the letter of the standard doesn't help. > Note that postgres is careful not to _automatically_ rollback: the > standard (as you quoted) indicated only certain conditions that allow for > an implicit rollback of that sort. The standard is very explicit about some types of errors, namely constraint violations, where it says that this must have no effect except an entry in the diagnostics area. It is precisely these errors where one would like to be able to continue the transaction. Joachim -- private: joachim@kraut.bc.ca (http://www.kraut.bc.ca) work: joachim@mercury.bc.ca (http://www.mercury.bc.ca)
On Mon, Apr 24, 2000 at 02:41:46PM -0700, Joachim Achtzehnter wrote:
> The spirit of the standard comes into play when people who are not
> Language Lawyers try to decide how something should work that is not
> spelled out explicitly, but where the standard text contains suggestive
> statements that imply that the authors assumed something without spelling
> it out, because they thought everybody reading the standard would agree
> on this as a matter of course. Of course, as soon as somebody comes along
> who has some motivation to make a contrary assumption, perhaps to claim
> compliance, the fact that the assumption was not spelled out leads to the
> kinds of arguments we are having.

While I agree with you in theory that many more casual standards documents need to be read this way, the SQL standards are highly engineered, passing through multiple national and international committee bodies. Heck, any document that goes to the trouble to define the BNF for <simple Latin letter>, let alone <digit> (section 5.1), clearly aspires to being complete in and of itself. If that is so, omissions are as significant as inclusions. As to other motives, the complete hash that these same bodies have made of the SQL3 spec leads me to believe that every possible contrary assumption is already present.

> > is, I agree, unfortunate, but it doesn't stand in the way of us claiming
> > compliance, which is the name of the game for these sorts of standards.
>
> This is precisely NOT the game I'm playing! I don't care whether something
> is technically 100% compliant or not. I do care a lot about improving a
> free software database management system that is in the same league as the
> big-name databases.
>
> The reason I entered this discussion was not to discuss whether postgresql
> is or is not 100% compliant with SQL92. Supporting statement level aborts
> is a feature that should be supported at some point, and this talk about
> the current practice somehow being compliant with the letter of the
> standard doesn't help.

But it doesn't hurt (much). This is why we're having this discussion on GENERAL, and not HACKERS: the developers have already agreed that the error system needs an overhaul, mostly to provide the interface writers with consistent error numbers, rather than the current text strings. Inclusion of the ability to ignore some errors will happen. I would not have started this branch of the discussion if the original complaint had not ventured from 'other DBMSs' to 'SQL92 compliant DBMSs'. I was _very_ specific that the _only_ thing I disagree with in this is providing the enemy with ammunition, as it were, by over-interpreting the standard to PostgreSQL's detriment.

This is why _not_ having this discussion can hurt. In order to aid acceptance of PostgreSQL into many environments, being able to play the 'technically SQL92 compliant' card, without having to cross your fingers behind your back, is very important. Heck, I'd be wrestling with Oracle right now, and had a lot less grant money to put into the hardware for my server, if I hadn't been able to play the 'mostly SQL92 compliant, and climbing' card.

> The standard is very explicit about some types of errors, namely
> constraint violations, where it says that this must have no effect except
> an entry in the diagnostics area. It is precisely these errors where one
> would like to be able to continue the transaction.

And this interpretation will guide the developers in _extending_ the standard in a consistent way.
I know, because the developers that implemented the constraints for 7.0 used this (and the SQL3 spec) as guides. How's that? Ross P.S. I think we're in (quasi) violent agreement, don't you? -- Ross J. Reedstrom, Ph.D., <reedstrm@rice.edu> NSBRI Research Scientist/Programmer Computer and Information Technology Institute Rice University, 6100 S. Main St., Houston, TX 77005
> -----Original Message-----
> From: pgsql-general-owner@hub.org [mailto:pgsql-general-owner@hub.org]On
> Behalf Of Ross J. Reedstrom
>
> And this interpretation will guide the developers in _extending_
> the standard in a consistent way. I know, because the developers that
> implemented the constraints for 7.0 used this (and the SQL3 spec) as
> guides. How's that?

I don't know what is standard. However, as far as I see, few people prefer entire rollback on abort. The problem is that PostgreSQL lacks per-statement rollback functionality and unfortunately it isn't easy to implement. Vadim has already planned the implementation. AFAIK, one of the purposes of WAL is to implement savepoint functionality. Savepoint functionality would enable per-statement rollback functionality easily.

The following is an extract of Vadim's posting about 10 months ago.

   Re: [HACKERS] RE: [GENERAL] Transaction logging

   Well, I'm thinking about WAL last two weeks. Hiroshi pointed me
   problems in my approach to savepoints (when a tuple was marked
   for update and updated after it) and solution would require new
   tid field in header and both t_cmin/t_cmax => bigger header.
   I don't like it and so I switched my mind -:).

Regards.

Hiroshi Inoue
Inoue@tpf.co.jp
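[The savepoint functionality mentioned here corresponds to what SQL:1999 standardized as SAVEPOINT; a sketch of the per-statement recovery it would enable, not available in the PostgreSQL of this thread (mytable and its columns are illustrative):

   BEGIN;
   SAVEPOINT before_insert;
   INSERT INTO mytable (id, val) VALUES (7, 'x');
   -- on a duplicate-key error, undo only the failed statement and go on:
   ROLLBACK TO SAVEPOINT before_insert;
   UPDATE mytable SET val = 'x' WHERE id = 7;
   COMMIT;
]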
Hi David,

That can work if there's only one session using the database. But it's not 100% if there are multiple users. There's a small chance that a row may not exist during the select, but exist by the time of the insert. If I'm wrong please correct me - then I'll go optimize some code :).

By having the unorthodox locking mechanism suggested I can ensure at the application level that no one else is going to insert stuff before my select, update/insert, without having to lock the whole table.

So it will be

   LOCK arbitrary
   select
   if exist update
   else insert
   UNLOCK arbitrary

Which would be faster - doing the lock arbitrary method, or doing an insert with unique indexes and recovering if necessary (assuming postgresql does what other databases do)? I suspect unique indexes could slow inserts and updates down significantly.

If we don't want to do all that, how about we have a select for insert (and update), which locks things? But I figured that it would be problematic to implement in a number of scenarios tho.

Cheerio,
Link.
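[The arbitrary-string lock proposed here exists in MySQL as the GET_LOCK()/RELEASE_LOCK() built-ins; a sketch of the flow in those terms, with illustrative table and column names:

   SELECT GET_LOCK('mytable,field1=x,field2=y', 10);  -- 1 = acquired, 0 = timed out
   SELECT * FROM mytable WHERE field1 = 'x' AND field2 = 'y';
   -- the application updates if a row came back, inserts otherwise
   SELECT RELEASE_LOCK('mytable,field1=x,field2=y');
]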
Lincoln Yeoh wrote:
> By having the unorthodox locking mechanism suggested I can ensure at the
> application level that no one else is going to insert stuff before my
> select, update/insert, without having to lock the whole table.
> [...]
> If we don't want to do all that, how about we have a select for insert
> (and update), which locks things? But I figured that it would be
> problematic to implement in a number of scenarios tho.

PostgreSQL implements SELECT...FOR UPDATE to allow for the sequence you've described:

Session 1:

   BEGIN;
   SELECT webuser FROM webusers WHERE webuser = 'webuser1' FOR UPDATE;

Session 2:

   BEGIN;
   UPDATE webusers SET webuser = 'webuser2' WHERE webuser = 'webuser1';*

*At this point Session 2 blocks, waiting for Session 1 to complete. This prevents the race condition you've described and only locks those rows which were selected by Session 1's SELECT. With MVCC, table-level locking is largely a thing of the past. The MVCC documentation describes this in detail:

http://www.postgresql.org/docs/postgres/mvcc4496.htm

Hope that helps,

Mike Mascari
Maybe it's late, maybe it'll all be clear to me in the morning, but I seem to have lost the destroydb command after installing PostgreSQL 7.0 RC1. I'm still a newbie to PostgreSQL, but I believe destroydb should be executed at the command line, meaning there should be a destroydb command somewhere on my HD. I swear it isn't there. Appreciate any help. -=michael=- ***************************************************** * Michael S. Kelly * 4800 SW Griffith Dr., Ste. 202 * Beaverton, OR 97005 USA * voice: (503)644-6106 x122 fax: (503)643-8425 * <michaelk@axian.com> * http://www.axian.com/ ***************************************************** * Axian: Software Consulting and Training *****************************************************
Today, in a message to pgsql-general, Ross J. Reedstrom wrote:
> the developers have already agreed that the error system needs an
> overhaul, ... Inclusion of the ability to ignore some errors will
> happen.

This is good to hear. I suspect, though, that to support statement-level aborts more is needed than an overhaul of the error system. Statement level aborts are essentially a form of nested transaction, i.e. the ability to rollback a precisely defined subset of the top-level transaction.

> I was _very_ specific that the _only_ thing I disagree with in this is
> providing the enemy with ammunition, as it were, by over-interpreting
> the standard to PostgreSQL's detriment.

The fact that some vendors of proprietary systems think in these terms is no reason for the postgresql community to do the same. Unlike them we don't HAVE to sell the system at all cost to be successful.

> This is why _not_ having this discussion can hurt.

Don't agree.

> being able to play the 'technically SQL92 compliant' card, without
> having to cross your fingers behind your back, is very important.

But you are essentially doing that, crossing your fingers I mean, by relying on a shaky interpretation solely to be able to claim compliance, while you know full well that the alternative interpretation is the better one.

> Heck, I'd be wrestling with Oracle right now,

Well, since you mention Oracle: A few years ago I had a similar argument with them about their peculiar interpretation of what SERIALIZABLE in SQL92 means. The early versions of Oracle's multi-version concurrency mechanism didn't provide truly serializable transactions, yet they claimed it did by resorting to rather twisted interpretations of the standard. I didn't think this was acceptable then, and I certainly don't condone similar window dressing today. Postgresql doesn't need this kind of PR.

> P.S. I think we're in (quasi) violent agreement, don't you?

Well, we do seem to agree on some important points, like the fact that statement-level aborts are important to have. We seem to be talking past each other in other respects.

Let me get back to my contention that statement-level aborts, or atomicity on the level of SQL statements, are considered an obvious requirement in the database community. Because everybody assumes this to be the case, it is hard to find it explicitly mentioned. Let me quote from Gray and Reuter's "Transaction Processing", considered the bible on the subject by some:

   Even current database systems ... do not rely on nested transactions
   for their own implementation. This is quite surprising, because
   nesting the scope of commitment and backout is commonplace even in
   today's SQL systems (although the user cannot influence it). This is
   particularly true for SQL update statements that are executed as the
   subtransactions. Think of this transaction as the root transaction
   and the SQL statements as the subtransactions. Obviously, an SQL
   update statement commits its modifications to the top-level
   transaction. If it fails (assume an INSERT being rejected because a
   unique value condition is violated), it is implicitly undone and so
   appears to be atomic even in the case of a failure. In other words,
   update SQL statements have all the properties of subtransactions,
   but since the implementation techniques are typically ad hoc rather
   than guided by the idea of transaction nesting, the general
   mechanism is not available for the application programmer.
Joachim -- private: joachim@kraut.bc.ca (http://www.kraut.bc.ca) work: joachim@mercury.bc.ca (http://www.mercury.bc.ca)
>I'm still a newbie to PostgreSQL, but I believe destroydb should be executed >at the command line, meaning there should be a destroydb command somewhere >on my HD. I swear it isn't there. > >Appreciate any help. I do not want to insult you, but did you try: > find / -name destroydb Regards, Stephan -- Stephan Richter - (901) 573-3308 - srichter@cbu.edu CBU - Physics & Chemistry; Framework Web - Web Design & Development PGP Key: 735E C61E 5C64 F430 4F9C 798E DCA2 07E3 E42B 5391
At 09:16 PM 4/24/00 -0700, Michael S. Kelly wrote: >Maybe it's late, maybe it'll all be clear to me in the morning, but I seem >to have lost the destroydb command after installing PostgreSQL 7.0 RC1. I just checked. It is called dropdb now, to be consistent with other commands. Regards, Stephan -- Stephan Richter - (901) 573-3308 - srichter@cbu.edu CBU - Physics & Chemistry; Framework Web - Web Design & Development PGP Key: 735E C61E 5C64 F430 4F9C 798E DCA2 07E3 E42B 5391
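[For example, assuming a database named mydb (the name is illustrative):

   $ dropdb mydb
]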
At 10:17 PM 24-04-2000 -0400, Mike Mascari wrote:
>PostgreSQL implements SELECT...FOR UPDATE to allow for the
>sequence you've described:
> [...]

The 6.5.3 running on my site does not block for the case I described, which has an INSERT after the select.

   begin;
   select * from mytable where field=value for update;
   if rows=0 {
      insert into mytable (field,field2) values (value,value2);
   } else {
      update mytable set field=value, field2=value2;
   }
   commit;

Tested on 6.5.3. Did a select for update in two concurrent transactions; they did not block, and both inserts went through (if there was a unique index, one of the transactions would have had to be rolled back and redone from scratch).

If the behaviour is different for 7.0, I'm interested!

Cheerio,
Link.
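[One way to close this window in the PostgreSQL of this thread, at the price of serializing all writers of the table, is an explicit table lock; a sketch on the hypothetical table above (SHARE ROW EXCLUSIVE blocks other writers while plain reads proceed):

   BEGIN;
   LOCK TABLE mytable IN SHARE ROW EXCLUSIVE MODE;
   SELECT * FROM mytable WHERE field = 'value';
   -- insert if no row came back, update otherwise; no competing writer
   -- can slip in between, since it must first wait for the table lock
   COMMIT;
]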
On Mon, Apr 24, 2000 at 09:23:48PM -0700, Joachim Achtzehnter wrote:
> This is good to hear. I suspect, though, that to support statement-level
> aborts more is needed than an overhaul of the error system. Statement
> level aborts are essentially a form of nested transaction, i.e. the
> ability to rollback a precisely defined subset of the top-level
> transaction.

Hmm, sounds like you're moving the goal posts: now you want checkpoints, no? The problem under discussion is allowing the transaction to proceed when an error occurs that both the programmer and the DBMS know leaves the system in a consistent state: a simple SQL parse error, for example, or an attempt to violate a 'unique' constraint on an INSERT. Peter Eisentraut already offered a very high-level, potentially dangerous one-line patch to deal with the 'table not found' case. Digging into the code a little, I think this will also handle the SQL parse error case correctly, but may have a problem with the constraint case: I'm recompiling code right now to test it. ;-)

(Later) Hmm, not good: the bad tuple ends up going live, not immediately, but after the next select. I think I better find that old message and follow up with this info, so it's in the mailing list archives, at least.

> > being able to play the 'technically SQL92 compliant' card, without
> > having to cross your fingers behind your back, is very important.
>
> But you are essentially doing that, crossing your fingers I mean, by
> relying on a shaky interpretation solely to be able to claim compliance,
> while you know full well that the alternative interpretation is the
> better one.

Right, but often one is in a situation where technical requirements are being enforced by bureaucracies for non-technical reasons: if I _know_ that the product is good enough for the task at hand, and there exists no mechanism to get an exception to the requirement, 'technical compliance' serves my purposes. Basically, if management is going to play games, I'm going to outplay them, if I have to.

> > Heck, I'd be wrestling with Oracle right now,
>
> Well, since you mention Oracle: A few years ago I had a similar argument
> with them about their peculiar interpretation of what SERIALIZABLE in
> SQL92 means. The early versions of Oracle's multi-version concurrency
> mechanism didn't provide truly serializable transactions, yet they
> claimed it did by resorting to rather twisted interpretations of the
> standard.
>
> I didn't think this was acceptable then, and I certainly don't condone
> similar window dressing today. Postgresql doesn't need this kind of PR.

The difference between the Oracle case and the Postgresql case is that the corporation tried to give you that twisting of the standard as the _only_ answer. What I'm saying is: well, technically, this _is_ compliant, but yes, it's not the right way to do it, and we're working on it, and by the way, how do you want it to work, and here's the code, if we can't get to it fast enough, maybe you can? I'm just asking you not to make Postgresql jump a higher _political_ bar than its competitors.

Hmm, I find it odd to be on the 'practical, business' side of one of these arguments: I'm usually the one pushing the 'theoretical, academic' side.

One thing I should mention: PostgreSQL's overall transaction semantics _are not_ SQL92 standard compliant, in that the client must explicitly start the first transaction. The 'autocommit' mode is a non-SQL92 extension, which is also commonly available in other systems.
Having it be the default is non-standard, however.

> Let me get back to my contention that statement-level aborts, or
> atomicity on the level of SQL statements, are considered an obvious
> requirement in the database community. Because everybody assumes this to
> be the case, it is hard to find it explicitly mentioned. Let me quote
> from Gray and Reuter's "Transaction Processing", considered the bible on
> the subject by some:
>
>    Even current database systems ... do not rely on nested transactions
>    for their own implementation. This is quite surprising, because
>    nesting the scope of commitment and backout is commonplace even in
>    today's SQL systems (although the user cannot influence it). This is
>    particularly true for SQL update statements that are executed as the
>    subtransactions. Think of this transaction as the root transaction
>    and the SQL statements as the subtransactions. Obviously, an SQL
>    update statement commits its modifications to the top-level
>    transaction. If it fails (assume an INSERT being rejected because a
>    unique value condition is violated), it is implicitly undone and so
>    appears to be atomic even in the case of a failure. In other words,
>    update SQL statements have all the properties of subtransactions,
>    but since the implementation techniques are typically ad hoc rather
>    than guided by the idea of transaction nesting, the general
>    mechanism is not available for the application programmer.

Doesn't this quote merely bemoan the fact that nesting of transactions is _not_ a general requirement of "even today's SQL systems"? I can't follow the example, which seems to move back and forth between an UPDATE and an INSERT statement. Are they suggesting that the UPDATE could be implemented as a nested transaction containing a DELETE and an INSERT, and if the INSERT fails, the DELETE must rollback?

More likely, I suppose, is that the context is of an overall transaction, and by 'SQL update statements' they mean any SQL statement that alters the data in the tables. With that interpretation, I agree it supports your argument that "even today's SQL systems" (I can hear their academic disdain in that, can't you?) support an "ad hoc" form of transaction nesting, which postgresql does not.

Hmm, on further reflection, I'm going to come over to your side of this one: while I reserve the right to twist the standards for those who play standards compliance games, I will agree that much of the verbiage in the standard makes the _most_ sense when statement level aborts are assumed to be present.

Ross

P.S. Thanks for having the patience to bring me around on this one.
--
Ross J. Reedstrom, Ph.D., <reedstrm@rice.edu>
NSBRI Research Scientist/Programmer
Computer and Information Technology Institute
Rice University, 6100 S. Main St., Houston, TX 77005
I have not researched this issue thoroughly, or even superficially, but I understand that PostgreSQL has something called multi-version concurrency control (MVCC). This is, I'm told, a GOOD THING, and I believe it. But the documentation and tutorials on MVCC are far too terse for me to feel comfortable with my understanding. If both transactions (yours and your competitor's) are serializable, then you cannot see his insertion, and his transaction will be aborted (I think? maybe?). This is not standard SQL, but it has the look and feel of a better way. I wish there were more information available on MVCC, and how to use it effectively.
Thanks Stephan, dropdb worked just fine. Also, I was not at all offended by your previous suggestion. I'm still pretty new to Linux.

This issue surfaces an opportunity for me to help the cause. As I go through the process of setting up PostgreSQL 7.0 and developing a new database in it, I could provide some QA on the documentation. Two items have already surfaced. The first, of course, is that destroydb is now dropdb in 7.0. The second is in the INSTALL document. In the section that describes how to make PostgreSQL start at boot-time there is the following paragraph:

   In RedHat Linux add a file /etc/rc.d/init.d/postgres.init which is
   based on the example in contrib/linux/. Then make a softlink to this
   file from /etc/rc.d/rc5.d/S98postgres.init.

There's a much simpler way to accomplish this. Simply execute the following:

   contrib/linux/postgres.init.sh install

Who would I pass these suggestions on to?

-=michael=-
At 10:58 AM 24-04-2000 -0500, Ed Loehr wrote:
>Good point. And you can combine the check David suggests with the insert
>statement, e.g.,
>
>   INSERT INTO mytable (id, ...)
>   SELECT 7, ...
>   WHERE NOT EXISTS (SELECT * FROM mytable WHERE id = 7)
>
>And then check the return result for the number of rows inserted. '0'
>means an update is needed. I don't remember if there is a cleaner, more
>efficient manner of doing that, but probably so...

At first it looked like it might work, but it doesn't ensure uniqueness when I checked. And after that I realised the situation was even more difficult.

e.g.

   create table test (a int, b text);

Transaction A

   begin;
   insert into test (a,b) select 4,'four'
   where not exists (select * from test where a=4);

Transaction B

   begin;
   insert into test (a,b) select 4,'four'
   where not exists (select * from test where a=4);

Then you do a commit on both, and you end up with two rows.

The reason this happens is because inserted stuff in one transaction is not visible to other transactions until a commit. So even David's suggestion won't work, nor will a "select for update", since with the current implementation there is nothing to lock on till the commit. This is correct behaviour, but that means we have to try something else to ensure unique rows.

If we resort to using UNIQUE then we'll have to redo whole transactions till the error point. If we use the proposed arbitrary lock, it's a lot more work for the developers, plus it shifts so much more responsibility to the application compared to stuff like select for update. I kind of like the arbitrary lock thingy, but I'm not in a "large" environment.

For large environments maybe there should be a select for insert? Or the behaviour for select for update should change? That is, the statement blocks if there's any other prior select for insert with a potentially interfering WHERE clause (ouch!), even if there are NO relevant ROWS to return (ouch!). It does not even begin to fetch rows till after the other transactions are completed. This doesn't look easy to do. But it's one of the ways to maintain the Postgresql "no error transaction" behaviour. To maintain this behaviour there need to be a few more ways to check for stuff (and lock the state) before attempting to change or create stuff. Stuff like create table and so on.

Aside: would an arbitrary lock mechanism be better for performance than using a UNIQUE on a table? In most cases you're inserting to an indexed table, so having a UNIQUE isn't going to cost much more, right?

Cheerio,
Link.
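[For comparison, the UNIQUE route mentioned above, sketched on Lincoln's test table; the index name is illustrative, and in the PostgreSQL of this thread the losing transaction must be redone from the start:

   create unique index test_a_key on test (a);

   begin;
   insert into test (a,b) values (4,'four');
   -- two concurrent inserts of a=4 can no longer both succeed: the second
   -- gets a duplicate-key error and its whole transaction must be retried,
   -- typically with an update in place of the insert
   commit;
]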
Today, in a message to pgsql-general, Ross J. Reedstrom wrote:
> On Mon, Apr 24, 2000 at 09:23:48PM -0700, Joachim Achtzehnter wrote:
> >
> > I suspect, though, that to support statement-level aborts
> > more is needed than an overhaul of the error system. Statement
> > level aborts are essentially a form of nested transaction,...
>
> Hmm, sounds like you're moving the goal posts: now you want checkpoints,
> no?

I didn't say that. Was merely pointing out that supporting statement level aborts may be more difficult than it seems. This is not just an error reporting issue. What it amounts to is a special case of nested transactions, i.e. where every statement is an implicit nested transaction. Keep in mind that a statement can be very complex, and it can update, directly or indirectly via triggers, many rows in many tables. To precisely undo the effect of such a statement may not be trivial, in general.

[quote from Gray and Reuter omitted]

> Doesn't this quote merely bemoan the fact that nesting of transactions
> is _not_ a general requirement of "even today's SQL systems"? I can't
> follow the example,

The quote essentially said this: Most existing SQL systems don't support user-controlled nested transactions. The authors find this surprising because these same systems DO support a special form of nested transactions, namely statement-level abort. What they are complaining about is the fact that vendors choose ad hoc approaches to implement the special case, instead of implementing the general case. The general mechanism could then be used to achieve statement-level aborts as well as supporting user-controlled nested transactions.

Joachim

--
private: joachim@kraut.bc.ca (http://www.kraut.bc.ca)
work: joachim@mercury.bc.ca (http://www.mercury.bc.ca)
In a message to Ed Loehr and pgsql-general, Lincoln Yeoh wrote:
>
> Transaction A
>    begin;
>    insert into test (a,b) select 4,'four'
>    where not exists (select * from test where a=4);
>
> Transaction B
>    begin;
>    insert into test (a,b) select 4,'four'
>    where not exists (select * from test where a=4);
>
> Then you do a commit on both, and you end up with two rows.

This is disappointing indeed! What this means is that Postgresql transactions are, in fact, not truly serializable. The concurrent execution of these two transactions is not equivalent to some serial execution.

Have to read up on the documentation, which I thought was claiming serializability.

Joachim

--
private: joachim@kraut.bc.ca (http://www.kraut.bc.ca)
work: joachim@mercury.bc.ca (http://www.mercury.bc.ca)
> -----Original Message-----
> From: pgsql-general-owner@hub.org [mailto:pgsql-general-owner@hub.org]On
> Behalf Of Joachim Achtzehnter
>
> In a message to Ed Loehr and pgsql-general, Lincoln Yeoh wrote:
> >
> > Transaction A
> >    begin;
> >    insert into test (a,b) select 4,'four'
> >    where not exists (select * from test where a=4);
> >
> > Transaction B
> >    begin;
> >    insert into test (a,b) select 4,'four'
> >    where not exists (select * from test where a=4);
> >
> > Then you do a commit on both, and you end up with two rows.
>
> This is disappointing indeed! What this means is that Postgresql
> transactions are, in fact, not truly serializable. The concurrent
> execution of these two transactions is not equivalent to some serial
> execution.

Strictly speaking, the SERIALIZABLE isolation level of PostgreSQL isn't SERIALIZABLE. It's the same as Oracle. AFAIK, there's no way to block a (logically) non-existent row (key) except a unique constraint.

Regards.

Hiroshi Inoue
Inoue@tpf.co.jp

P.S. Note that the default isolation level of PostgreSQL is READ COMMITTED.
In a message to pgsql-general, Hiroshi Inoue wrote:
> Strictly speaking, the SERIALIZABLE isolation level of PostgreSQL isn't
> SERIALIZABLE.

Thanks for clarifying this.

> It's the same as Oracle.

So, even the latest versions of Oracle still have this problem?

> AFAIK, there's no way to block a (logically) non-existent row (key)
> except a unique constraint.

A couple of comments:

There is, of course, a way to achieve this with a concurrency mechanism that is based on locking, rather than a multi-version system. Locking systems in serializable mode would use shared locks for reads and hold the locks until the end of the transaction. The trick with the non-existent rows is that the locks must be placed on the access path rather than just individual rows.

For example, if the select query is executed using a hash index, it would place a shared lock on the hash bucket where the non-existing row would have been. If the second transaction does its select before the first one upgrades its lock to exclusive mode the two transactions will deadlock, otherwise the second transaction's select blocks. Either way, the problem is avoided.

Clearly, if a table has no index the lock would have to be placed on the table because the access method is a full table scan.

The 1,000,000 dollar question is whether there is some way to achieve the same with a multi-version concurrency system without giving up its advantages?

My second comment is about the remark "except a unique constraint": It is true that a unique constraint would have stopped the incorrect second insert in the example under discussion. Furthermore, a unique constraint or primary key is definitely the way to go in such a case because performance would be really bad with large tables. However, a unique constraint does not prevent all serialization violations caused by non-existent row effects.

> P.S. Note that the default isolation level of PostgreSQL is READ
> COMMITTED.

I see. This is another important point to keep in mind. Thanks for pointing this out (another minor SQL92 non-compliance :-)

Joachim

--
private: joachim@kraut.bc.ca (http://www.kraut.bc.ca)
work: joachim@mercury.bc.ca (http://www.mercury.bc.ca)
> -----Original Message-----
> From: pgsql-general-owner@hub.org [mailto:pgsql-general-owner@hub.org]On
> Behalf Of Joachim Achtzehnter
>
> In a message to pgsql-general, Hiroshi Inoue wrote:
> >
> > Strictly speaking, the SERIALIZABLE isolation level of PostgreSQL isn't
> > SERIALIZABLE.
>
> Thanks for clarifying this.
>
> > It's the same as Oracle.
>
> So, even the latest versions of Oracle still have this problem?

I don't know recent versions of Oracle. However, a truly SERIALIZABLE isolation level seems very hard to implement if read-only queries acquire no lock.

> > AFAIK, there's no way to block a (logically) non-existent row (key)
> > except a unique constraint.

Sorry, my above comment lacks the words *in current PostgreSQL*.

> A couple of comments:
>
> There is, of course, a way to achieve this with a concurrency mechanism
> that is based on locking, rather than a multi-version system. [...]
>
> The 1,000,000 dollar question is whether there is some way to achieve the
> same with a multi-version concurrency system without giving up its
> advantages?

There might be some discussion about dirty reads, though I don't remember well. The unique constraint stuff has to read dirty data to check uniqueness. I don't know of other code using *dirty read* in current PostgreSQL. It's what I meant by "except a unique constraint".

> My second comment is about the remark "except a unique constraint": It is
> true that a unique constraint would have stopped the incorrect second
> insert in the example under discussion. Furthermore, a unique constraint
> or primary key is definitely the way to go in such a case because
> performance would be really bad with large tables. However, a unique
> constraint does not prevent all serialization violations caused by
> non-existent row effects.

You are right. That seems to be why Oracle 8.1.5's manual refers to FOR UPDATE, referential integrity, etc. with regard to serializability.

Regards.

Hiroshi Inoue
Inoue@tpf.co.jp
Joachim Achtzehnter wrote:
>
> In a message to Ed Loehr and pgsql-general, Lincoln Yeoh wrote:
> >
> > Transaction A
> > begin;
> > insert into test (a,b) select 4,'four' from test
> >   where not exists (select * from test where a=4);
> >
> > Transaction B
> > begin;
> > insert into test (a,b) select 4,'four' from test
> >   where not exists (select * from test where a=4);
> >
> > Then you do a commit on both, and you end up with two rows.
>
> This is disappointing indeed! What this means is that PostgreSQL
> transactions are, in fact, not truly serializable. The concurrent
> execution of these two transactions is not equivalent to some serial
> execution.
>
> Have to read up on the documentation, which I thought was claiming
> serializability.

Hmmm. This surprised me, too. The serialization doesn't work as I
expected it to. Maybe someone can illuminate. Here's what confused me...

The documentation says:

    Serializable provides the highest transaction isolation.
    When a transaction is on the serializable level, a query
    sees only data committed before the transaction began and
    never see either dirty data or concurrent transaction
    changes committed during transaction execution. So, this
    level emulates serial transaction execution, as if
    transactions would be executed one after another, serially,
    rather than concurrently.
    - http://www.postgresql.org/docs/postgres/mvcc4508.htm (4/25/2000)

But the following sequence seems to contradict this:

-- Transaction A
DROP TABLE foo;
CREATE TABLE foo (id INTEGER);
SET TRANSACTION ISOLATION LEVEL SERIALIZABLE;
BEGIN;

-- Transaction B
SET TRANSACTION ISOLATION LEVEL SERIALIZABLE;
BEGIN;
INSERT INTO foo (id)
  SELECT 1 WHERE NOT EXISTS (SELECT * FROM foo WHERE id = 1);
COMMIT;
SELECT * FROM foo;

-- Transaction A
SELECT * FROM foo;
INSERT INTO foo (id)
  SELECT 1 WHERE NOT EXISTS (SELECT * FROM foo WHERE id = 1);
SELECT * FROM foo;
COMMIT;

This sequence allows B's inserts, AFTER A began, to be seen by A (at
least, in pgsql 7.0beta3).

Anyone understand what's going on here? Bug?

Regards,
Ed Loehr
Doesn't

    SET TRANSACTION ISOLATION LEVEL SERIALIZABLE;

have to come within the transaction - i.e.

    BEGIN;
    SET TRANSACTION ISOLATION LEVEL SERIALIZABLE;

In this order your test seems to behave correctly - only 1 row inserted.

--
Rex McMaster                                 rex@mcmaster.wattle.id.au
http://www.compsoft.com.au/~rmcm/pgp-pk

Ed Loehr writes:
>
> But the following sequence seems to contradict this:
>
> -- Transaction A
> DROP TABLE foo;
> CREATE TABLE foo (id INTEGER);
> SET TRANSACTION ISOLATION LEVEL SERIALIZABLE;
> BEGIN;
>
> -- Transaction B
> SET TRANSACTION ISOLATION LEVEL SERIALIZABLE;
> BEGIN;
> INSERT INTO foo (id)
>   SELECT 1 WHERE NOT EXISTS (SELECT * FROM foo WHERE id = 1);
> COMMIT;
> SELECT * FROM foo;
>
> -- Transaction A
> SELECT * FROM foo;
> INSERT INTO foo (id)
>   SELECT 1 WHERE NOT EXISTS (SELECT * FROM foo WHERE id = 1);
> SELECT * FROM foo;
> COMMIT;
>
> This sequence allows B's inserts, AFTER A began, to be seen by A (at
> least, in pgsql 7.0beta3).
>
> Anyone understand what's going on here? Bug?
>
> Regards,
> Ed Loehr
rmcm@compsoft.com.au wrote:
>
> Doesn't
>
>     SET TRANSACTION ISOLATION LEVEL SERIALIZABLE;
>
> have to come within the transaction - i.e.
>
>     BEGIN;
>     SET TRANSACTION ISOLATION LEVEL SERIALIZABLE;
>
> In this order your test seems to behave correctly - only 1 row inserted.

Yes, my initial ordering was in error. But even after reordering, the
point is not that only 1 row was inserted, but rather that Transaction A
was able to see the effects of transaction B when it clearly should not.

Regards,
Ed Loehr
Sorry, mistake in my previous email -

> - only 1 row inserted

this was before the second commit. After both commits, 2 rows are
visible. Neither transaction can see the effects of the other till both
are committed.

A:
    CREATE TABLE foo (id INTEGER);
        ===> CREATE
    BEGIN;
        ===> BEGIN
    SET TRANSACTION ISOLATION LEVEL SERIALIZABLE;
        ===> SET VARIABLE
    INSERT INTO foo (id)
      SELECT 1 WHERE NOT EXISTS (SELECT * FROM foo WHERE id = 1);
        ===> INSERT 959179 1

B:
    BEGIN;
        ===> BEGIN
    SET TRANSACTION ISOLATION LEVEL SERIALIZABLE;
        ===> SET VARIABLE
    SELECT * FROM foo;
        ===> 0 rows

A:
    SELECT * FROM foo;
        ===> 1 row    <== the row inserted in A

B:
    INSERT INTO foo (id)
      SELECT 1 WHERE NOT EXISTS (SELECT * FROM foo WHERE id = 1);
        ===> INSERT 959155 1
    SELECT * FROM foo;
        ===> 1 row    <== the row inserted in B

A:
    SELECT * FROM foo;
        ===> 1 row

B:
    COMMIT;
        ===> END
    SELECT * FROM foo;
        ===> 1 row

A:
    SELECT * FROM foo;
        ===> 1 row
    COMMIT;
        ===> END
    SELECT * FROM foo;
        ===> 2 rows

B:
    SELECT * FROM foo;
        ===> 2 rows

Ed Loehr writes:
> rmcm@compsoft.com.au wrote:
> >
> > Doesn't
> >
> >     SET TRANSACTION ISOLATION LEVEL SERIALIZABLE;
> >
> > have to come within the transaction - i.e.
> >
> >     BEGIN;
> >     SET TRANSACTION ISOLATION LEVEL SERIALIZABLE;
> >
> > In this order your test seems to behave correctly - only 1 row
> > inserted.
>
> Yes, my initial ordering was in error. But even after reordering, the
> point is not that only 1 row was inserted, but rather that Transaction
> A was able to see the effects of transaction B when it clearly should
> not.
>
> Regards,
> Ed Loehr

--
Rex McMaster                                 rex@mcmaster.wattle.id.au
http://www.compsoft.com.au/~rmcm/pgp-pk
Rex McMaster wrote in an email addressed to Ed Loehr and pgsql-general:
>
> After both commits, 2 rows are visible. Neither transaction can see the
> effects of the other till both are committed.
>
> A:
>     BEGIN;
>     SET TRANSACTION ISOLATION LEVEL SERIALIZABLE;
>     INSERT INTO foo (id)
>       SELECT 1 WHERE NOT EXISTS (SELECT * FROM foo WHERE id = 1);
> B:
>     BEGIN;
>     SET TRANSACTION ISOLATION LEVEL SERIALIZABLE;
>     INSERT INTO foo (id)
>       SELECT 1 WHERE NOT EXISTS (SELECT * FROM foo WHERE id = 1);
>     COMMIT;
> A:
>     COMMIT;

While it is true that neither transaction sees the effect of the other,
the sequence of execution in PostgreSQL depicted above is not
serializable. There is no serial execution of the two transactions,
neither A followed by B, nor B followed by A, that would produce the
same results as the interleaved execution of the two transactions.

The fact that NEITHER transaction sees ANY effect of the other is
precisely the problem! One SHOULD see the effects of the other. The
outcome should be equivalent to one occurring entirely before the other.
With a concurrency control implementation based on locks, one
transaction would see the effect of the other by being blocked at a
suitable point, or by being terminated with a deadlock exception.

Joachim

--
joachim@kraut.bc.ca (http://www.kraut.bc.ca)
joachim@mercury.bc.ca (http://www.mercury.bc.ca)
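Spelling that argument out against the two possible serial orders
(nothing new here, just the bookkeeping):

    -- Serial order A then B: A's NOT EXISTS finds no row, so A inserts;
    --   B's NOT EXISTS then sees A's row, so B inserts nothing. Result: 1 row.
    -- Serial order B then A: symmetric.                         Result: 1 row.
    -- Interleaved run above: each NOT EXISTS runs against a snapshot
    --   without the other's row, so both inserts succeed.       Result: 2 rows.
    -- Two rows matches neither serial order, hence the execution is
    -- not serializable.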
> -----Original Message-----
> From: pgsql-general-owner@hub.org [mailto:pgsql-general-owner@hub.org]
> On Behalf Of Ed Loehr
>
> > Have to read up on the documentation, which I thought was claiming
> > serializability.
>
> Hmmm. This surprised me, too. The serialization doesn't work as I
> expected it to. Maybe someone can illuminate. Here's what confused
> me...
>
> The documentation says:
>
>     Serializable provides the highest transaction isolation.
>     When a transaction is on the serializable level, a query
>     sees only data committed before the transaction began and
>     never see either dirty data or concurrent transaction
>     changes committed during transaction execution. So, this
>     level emulates serial transaction execution, as if
>     transactions would be executed one after another, serially,
>     rather than concurrently.
>     - http://www.postgresql.org/docs/postgres/mvcc4508.htm (4/25/2000)
>
> But the following sequence seems to contradict this:
>
> -- Transaction A
> DROP TABLE foo;
> CREATE TABLE foo (id INTEGER);
> SET TRANSACTION ISOLATION LEVEL SERIALIZABLE;
> BEGIN;
>
> -- Transaction B
> SET TRANSACTION ISOLATION LEVEL SERIALIZABLE;
> BEGIN;
> INSERT INTO foo (id)
>   SELECT 1 WHERE NOT EXISTS (SELECT * FROM foo WHERE id = 1);
> COMMIT;
> SELECT * FROM foo;
>
> -- Transaction A
> SELECT * FROM foo;
> INSERT INTO foo (id)
>   SELECT 1 WHERE NOT EXISTS (SELECT * FROM foo WHERE id = 1);
> SELECT * FROM foo;
> COMMIT;
>
> This sequence allows B's inserts, AFTER A began, to be seen by A (at
> least, in pgsql 7.0beta3).
>
> Anyone understand what's going on here? Bug?

You should call SET TRANSACTION immediately after BEGIN. Note that SET
TRANSACTION .. is a per-transaction command.

PostgreSQL's SERIALIZABLE isolation level would allow both inserts. The
READ COMMITTED isolation level wouldn't allow A's insert.

As I mentioned in another posting, PostgreSQL's SERIALIZABLE isolation
level isn't completely serializable, and it's the same as Oracle.
Probably Vadim refers to this incomplete serializability somewhere in
the documentation. It seems almost impossible to implement complete
serializability under MVCC without using table-level locking. I love
MVCC much more than theoretically beautiful complete serializability.

Regards.
Hiroshi Inoue
Inoue@tpf.co.jp
Hiroshi Inoue wrote:
>
> You should call SET TRANSACTION immediately after BEGIN. Note that SET
> TRANSACTION .. is a per-transaction command.
>
> PostgreSQL's SERIALIZABLE isolation level would allow both inserts. The
> READ COMMITTED isolation level wouldn't allow A's insert.

Even if I call SET after BEGIN, it is not consistent with the docs or
the standard (see the pghackers discussion under the same subject), as
the two scripts below seem to demonstrate.

[Rex: There is a subtle difference between your sequence and mine.
Insert from B first, and don't do any select from A before you attempt
the insert, as below.]

> As I mentioned in another posting, PostgreSQL's SERIALIZABLE isolation
> level isn't completely serializable, and it's the same as Oracle.
> Probably Vadim refers to this incomplete serializability somewhere in
> the documentation. It seems almost impossible to implement complete
> serializability under MVCC without using table-level locking. I love
> MVCC much more than theoretically beautiful complete serializability.

Interesting, thanks. I certainly don't have any big gripes about PG
concurrency, as it has worked flawlessly for me so far at the READ
COMMITTED level. All this has been strictly in the interest of
clarifying a discrepancy between the docs/std and 7.0. I could see it
mattering to some, but not for my app.

Regards,
Ed Loehr

===

This sequence, AFAICT, appears to fail the SERIALIZABLE standard,
allowing A to see effects from B that permute the serializability...

--------------------- START -----------------------------
-- Within transaction A --------------------------
DROP TABLE foo;
CREATE TABLE foo (id INTEGER, msg VARCHAR);
BEGIN;
SET TRANSACTION ISOLATION LEVEL SERIALIZABLE;

-- Within transaction B --------------------------
BEGIN;
--SET TRANSACTION ISOLATION LEVEL SERIALIZABLE;
INSERT INTO foo (id, msg)
  SELECT 1, 'From B'
  WHERE NOT EXISTS (SELECT * FROM foo WHERE id = 1);
COMMIT;
SELECT * FROM foo;

-- Within transaction A --------------------------
SELECT * FROM foo;
INSERT INTO foo (id, msg)
  SELECT 1, 'From A'
  WHERE NOT EXISTS (SELECT * FROM foo WHERE id = 1);
SELECT * FROM foo;
COMMIT;
--------------------- END -----------------------------

...while this sequence, which I would've thought to be functionally
identical, is compliant...

--------------------- START -----------------------------
-- Within transaction A ----------------------------
DROP TABLE foo;
CREATE TABLE foo (id INTEGER, msg VARCHAR);
BEGIN;
SET TRANSACTION ISOLATION LEVEL SERIALIZABLE;

-- Within transaction B ----------------------------
BEGIN;
INSERT INTO foo (id, msg)
  SELECT 1, 'From B'
  WHERE NOT EXISTS (SELECT * FROM foo WHERE id = 1);

-- Within transaction A ----------------------------
SELECT * FROM foo;  -- The magic query.

-- Within transaction B ----------------------------
COMMIT;
SELECT * FROM foo;

-- Within transaction A ----------------------------
SELECT * FROM foo;
INSERT INTO foo (id, msg)
  SELECT 1, 'From A'
  WHERE NOT EXISTS (SELECT * FROM foo WHERE id = 1);
SELECT * FROM foo;
COMMIT;
--------------------- END -----------------------------
At 11:59 AM 26-04-2000 -0500, Ed Loehr wrote:
>Joachim Achtzehnter wrote:
>>
>> In a message to Ed Loehr and pgsql-general, Lincoln Yeoh wrote:
>> > Then you do a commit on both, and you end up with two rows.
>>
>> This is disappointing indeed! What this means is that PostgreSQL
>> transactions are, in fact, not truly serializable. The concurrent

Well, it's not that disappointing, as it's not easily avoidable given
that transactions usually consist of multiple consecutive, non-zero-time,
order-sensitive statements. :)

So the database has to figure out how much information to share amongst
partially completed transactions in order to achieve maximum
serialisation/atomicity.

There will also be transactions which, to be serialised, must actually
occur serially. That is, a whole transaction has to go through before
the other one (or every other one), or at least a group of statements
within the transaction.

Stuff like select for update gives the database engine some hints, but
in the case I mentioned it is not enough (unless modified). Basically
the database needs to get a need-to-know summary of the transaction in a
single atomic statement; then it can better decide what to block. The
application is in the best position to give this summary.

So in the end, for maximum serialisation it may be a good idea to have
an arbitrary lock mechanism to cover the troublesome cases. e.g.

    getlock('arbitrary string'[, timeout]);
    getlock('all temporary product tables, created or about to be')
    getlock('all rows potential or real on all tables where key1=X or key2=Y')

If another transaction tries to get the same lock, it will wait for the
timeout, then abort with an error (possible timeout values: -1 = never,
0 or greater = time to wait in seconds).

Ok, so I'm biased toward how MySQL does it (it's simple and has a good
chance of working well). Yes, it shifts a lot to the application. But if
people have to do things like issue their multiple select-for-updates in
the right order (to prevent deadlocks), they might as well start using
something like this instead (or fix their architecture if possible ;) ).

And it's likely to be faster! Anyone else here like this arbitrary lock
thingy?

I'm very interested to know of other ways to achieve good serialisation,
especially database-centric methods.

Cheerio,
Link.

p.s. Would anyone actually need timeouts of a day (86400) or greater?
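For reference, the MySQL facility Link is describing is its
GET_LOCK/RELEASE_LOCK pair; roughly (the lock string is arbitrary and
chosen by the application):

    SELECT GET_LOCK('mytable,field1=x,field2=y', 10);
    -- returns 1 if the lock was obtained, 0 if the 10-second timeout
    -- expired while another connection held it, NULL on error

    -- ... perform the existence check and insert/update here ...

    SELECT RELEASE_LOCK('mytable,field1=x,field2=y');
    -- the lock is also released if the connection terminates

Because the lock lives on a name rather than a row, it can guard a row
that does not exist yet, which is exactly the non-existent-row case the
unique-index discussion keeps running into.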
> -----Original Message-----
> From: ed [mailto:ed] On Behalf Of Ed Loehr
>
> Hiroshi Inoue wrote:
> >
> > You should call SET TRANSACTION immediately after BEGIN. Note that
> > SET TRANSACTION .. is a per-transaction command.
> >
> > PostgreSQL's SERIALIZABLE isolation level would allow both inserts.
> > The READ COMMITTED isolation level wouldn't allow A's insert.
>
> Even if I call SET after BEGIN, it is not consistent with the docs or
> the standard (see the pghackers discussion under the same subject), as
> the two scripts below seem to demonstrate.

Hmm, the magic query ??? The snapshot may be created when the first
query is issued, though I don't remember well.

> This sequence, AFAICT, appears to fail the SERIALIZABLE standard,
> allowing A to see effects from B that permute the serializability...

AFAIK, serializability doesn't imply permutability.

Regards.
Hiroshi Inoue
Inoue@tpf.co.jp
Ed,

I think the lack of a SELECT may be the critical item here, with respect
to reconciling pg's behaviour with (some of) its docs. The documentation
for SET indicates that the snapshot is taken from the first DML
statement (I cannot locate a similar reference in the MVCC docs). This
appears to be consistent with your observation below.

Rex

Ed Loehr writes:
> Even if I call SET after BEGIN, it is not consistent with the docs or
> the standard (see the pghackers discussion under the same subject), as
> the two scripts below seem to demonstrate.
>
> [Rex: There is a subtle difference between your sequence and mine.
> Insert from B first, and don't do any select from A before you attempt
> the insert, as below.]

--
Rex McMaster                                 rex@mcmaster.wattle.id.au
http://www.compsoft.com.au/~rmcm/pgp-pk
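If that reading is right, the practical rule would be: when a
transaction's view must be frozen as of BEGIN, issue a query right away.
A sketch, based only on the behaviour observed in this thread on
7.0beta3:

    BEGIN;
    SET TRANSACTION ISOLATION LEVEL SERIALIZABLE;
    SELECT * FROM foo;  -- the "magic query": the snapshot is taken at
                        -- the first DML statement, so run one now rather
                        -- than after other sessions have committed
    -- ... subsequent statements see the database as of this point ...
    COMMIT;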
In a message to Hiroshi Inoue and pgsql-general, Ed Loehr wrote:
>
> -- Within transaction A --------------------------
> BEGIN;
> SET TRANSACTION ISOLATION LEVEL SERIALIZABLE;
>
> -- Within transaction B --------------------------
> BEGIN;
> SET TRANSACTION ISOLATION LEVEL SERIALIZABLE;
> INSERT INTO foo (id, msg)
>   SELECT 1, 'From B'
>   WHERE NOT EXISTS (SELECT * FROM foo WHERE id = 1);
> COMMIT;
>
> -- Within transaction A --------------------------
> SELECT * FROM foo;
> ...

In this case, it is actually OK for A to see the committed results of B,
because the overall outcome is then equivalent to B occurring entirely
before A.

In general, it is important to remember what SERIALIZABLE means: A
particular concurrent execution of several transactions must have an
observable outcome that is equivalent to running the same transactions
one after the other (serialized). It is NOT required that the outcome be
equivalent to the result that would be observed by running the
transactions in a particular order, such as the order in which they were
actually started. The outcome is only required to be equivalent to some
(arbitrary) order.

A concurrency mechanism supports the SERIALIZABLE isolation level if it
guarantees that every concurrent execution of transactions is
serializable.

Joachim

--
private: joachim@kraut.bc.ca (http://www.kraut.bc.ca)
work:    joachim@mercury.bc.ca (http://www.mercury.bc.ca)
On Thu, 27 Apr 2000, Hiroshi Inoue wrote:

> PostgreSQL's SERIALIZABLE isolation level would allow both inserts. The
> READ COMMITTED isolation level wouldn't allow A's insert.
>
> As I mentioned in another posting, PostgreSQL's SERIALIZABLE isolation
> level isn't completely serializable, and it's the same as Oracle.
> Probably Vadim refers to this incomplete serializability somewhere in
> the documentation. It seems almost impossible to implement complete
> serializability under MVCC without using table-level locking. I love
> MVCC much more than theoretically beautiful complete serializability.

Given that Postgres correctly recognizes concurrent updates and aborts
one of the transactions, how difficult would it be to do the same for
inserts?

--
Peter Eisentraut                  Sernanders väg 10:115
peter_e@gmx.net                   75262 Uppsala
http://yi.org/peter-e/            Sweden
> -----Original Message-----
> From: Peter Eisentraut [mailto:e99re41@DoCS.UU.SE]
> Sent: Thursday, April 27, 2000 4:46 PM
> To: Hiroshi Inoue
>
> On Thu, 27 Apr 2000, Hiroshi Inoue wrote:
>
> > PostgreSQL's SERIALIZABLE isolation level would allow both inserts.
> > The READ COMMITTED isolation level wouldn't allow A's insert.
> >
> > As I mentioned in another posting, PostgreSQL's SERIALIZABLE
> > isolation level isn't completely serializable, and it's the same as
> > Oracle. Probably Vadim refers to this incomplete serializability
> > somewhere in the documentation. It seems almost impossible to
> > implement complete serializability under MVCC without using
> > table-level locking. I love MVCC much more than theoretically
> > beautiful complete serializability.
>
> Given that Postgres correctly recognizes concurrent updates and aborts
> one of the transactions,

Is this what you mean: when a transaction is about to update a row which
has been updated by another transaction under the SERIALIZABLE isolation
level, the update is rejected with the message

    ERROR: Can't serialize access due to concurrent update.

> how difficult would it be to do the same for inserts?

Should INSERT/UPDATE/SELECT .. FOR UPDATE statements wait until
being-inserted rows are committed/aborted? This would mean INSERT
operations block all update operations on the same table.

Regards.
Hiroshi Inoue
Inoue@tpf.co.jp
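The update case Peter refers to can be reproduced with two sessions. A
sketch, reusing the foo(id, msg) table from earlier in the thread (the
error text is the one Hiroshi quotes above):

Session A:
    BEGIN;
    SET TRANSACTION ISOLATION LEVEL SERIALIZABLE;
    UPDATE foo SET msg = 'from A' WHERE id = 1;

Session B:
    BEGIN;
    SET TRANSACTION ISOLATION LEVEL SERIALIZABLE;
    UPDATE foo SET msg = 'from B' WHERE id = 1;
    -- blocks, waiting for A's outcome

Session A:
    COMMIT;

Session B:
    -- aborts: ERROR: Can't serialize access due to concurrent update
    -- (had A rolled back instead, B's update would have proceeded)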
> Ok, so I'm biased toward how MySQL does it (it's simple and has a good
> chance of working well). Yes, it shifts a lot to the application. But
> if people have to do things like issue their multiple select-for-updates
> in the right order (to prevent deadlocks), they might as well start
> using something like this instead (or fix their architecture if
> possible ;) ).
>
> And it's likely to be faster! Anyone else here like this arbitrary lock
> thingy?
>
> I'm very interested to know of other ways to achieve good
> serialisation, especially database-centric methods.
>
> Cheerio,
>
> Link.
>
> p.s. Would anyone actually need timeouts of a day (86400) or greater?

Are you asking for sub-second timeout values? If so, we could add that.

--
  Bruce Momjian                        |  http://www.op.net/~candle
  pgman@candle.pha.pa.us               |  (610) 853-3000
  +  If your life is a hard drive,     |  830 Blythe Avenue
  +  Christ can be your backup.        |  Drexel Hill, Pennsylvania 19026
rmcm@compsoft.com.au wrote:
>
> ...the snapshot is taken from the first DML statement...

That explains it. I missed that in the docs, and was misled by the
SERIALIZABLE doc sections.

Regards,
Ed Loehr
Joachim Achtzehnter wrote:
>
> ...It is NOT required that the outcome be equivalent to the result that
> would be observed by running the transactions in a particular order,
> such as the order in which they were actually started. The outcome is
> only required to be equivalent to some (arbitrary) order.

That was another fundamental misunderstanding of mine about
SERIALIZABLE. I appreciate the explanation, and have a deeper
appreciation for Vadim's MVCC work!!

Regards,
Ed Loehr
On Thu, 27 Apr 2000, Bruce Momjian wrote:

> Are you asking for sub-second timeout values? If so, we could add that.

I already have millisecond resolution for deadlock timeouts implemented.
Is that okay?

--
Peter Eisentraut                  Sernanders väg 10:115
peter_e@gmx.net                   75262 Uppsala
http://yi.org/peter-e/            Sweden
At 07:35 AM 27-04-2000 -0400, Bruce Momjian wrote:
>> p.s. Would anyone actually need timeouts of a day (86400) or greater?
>
>Are you asking for sub-second timeout values? If so, we could add that.

No. I was wondering about the other end - timeouts of a day or more.
That would affect how many bytes are needed to store timeout values.

Cheerio,
Link.
> On Thu, 27 Apr 2000, Bruce Momjian wrote:
>
> > Are you asking for sub-second timeout values? If so, we could add
> > that.
>
> I already have millisecond resolution for deadlock timeouts
> implemented. Is that okay?

Sure, that should work.

--
  Bruce Momjian                        |  http://www.op.net/~candle
  pgman@candle.pha.pa.us               |  (610) 853-3000
  +  If your life is a hard drive,     |  830 Blythe Avenue
  +  Christ can be your backup.        |  Drexel Hill, Pennsylvania 19026