Thread: Replacing records
Hi,
MySQL has a nice feature that lets you call REPLACE rather than INSERT: it attempts an insert if the record is not present and replaces it if it is. Is there any similar feature in PostgreSQL?

Currently I run a SELECT prior to any insert, and then UPDATE or INSERT depending on the result of the SELECT. The problem here is that I have just doubled the number of queries. Doing so on a table with 5 million rows and 100k inserts will take time, and I would like to have an efficient way of doing it.

Any ideas?

Thanks,
Alex
On Thu, 2003-09-04 at 03:00, Alex wrote:
> Hi,
> MySQL has a nice feature that allows to call a replace rather insert
> which will attempt to insert if record not present and replace if it does.
>
> Is there any similar feature ?
>
> Currently I run a select prior to any insert and then update or insert
> depending on the result of the select. The problem here is that I just
> doubled the queries. Doing so on a table with 5Mio rows and on 100k
> inserts will take time and I would like to have an efficient way of
> doing it.

What if you try the INSERT, and if it returns with a "key exists" error, do the UPDATE?

Will the SELECT really slow things down that much, since the record will be in the buffers after you touch it the first time?

--
-----------------------------------------------------------------
Ron Johnson, Jr.     ron.l.johnson@cox.net
Jefferson, LA USA

"All machines, no matter how complex, are considered to be based
on 6 simple elements: the lever, the pulley, the wheel and axle,
the screw, the wedge and the inclined plane."
Marilyn Vos Savant
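The insert-then-update pattern Ron suggests can be sketched in client code. This is only an illustration: SQLite stands in for PostgreSQL, and the table and column names are invented.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE counters (name TEXT PRIMARY KEY, value INTEGER)")

def upsert(name, value):
    try:
        # Try the INSERT first...
        conn.execute("INSERT INTO counters (name, value) VALUES (?, ?)",
                     (name, value))
    except sqlite3.IntegrityError:
        # ...and on a "key exists" error, fall back to the UPDATE.
        conn.execute("UPDATE counters SET value = ? WHERE name = ?",
                     (value, name))

upsert("hits", 1)   # first call inserts the row
upsert("hits", 2)   # second call hits the unique key and updates it
```

Note that in PostgreSQL of this era the failed INSERT aborts the whole transaction, which is exactly the problem discussed below; the sketch only works as-is with autocommit or per-statement transactions.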
[philosophical post regarding a missing feature of Postgres]

Hi all,

This is exactly the problem I'm facing right now, and I found there's no good solution to this in Postgres.

Basically I have a complex database operation which spans a transaction across multiple simple DB operations that can also be executed atomically on their own. The separate pieces must also work separately. Now, one of the pieces updates a table with data if a row with the given key exists, and inserts if not. There is a unique constraint on the key. I found there's no way to avoid failed inserts due to unique constraint violations, which cause automatic roll-back of the running transaction.

Contention on insert has quite a high probability for this operation in our application. It's unacceptable to roll back and retry the whole transaction just because this insert failed: partly because of performance (there's a lot of work done beforehand, and there are lots of threads/clustered machines doing inserts at the same time, so constantly retrying would slow things down painfully), and partly because it would make our code a lot more complex than it already is.

Locking is also a bad option: this is about inserts, so you don't have anything useful to lock, short of locking the whole table.

Finally, I'm using this solution because performance-wise it's about the same as retrying the transaction (in this particular case at least), but I'm completely unhappy about it.

This problem would be easily solved if the current transaction were not automatically rolled back on the failed insert. Given that, it would be as easy as trying the insert, and if it fails, doing the update.

I know this feature is not an easy one, but I would like to point out that it's really useful, and it's one of the barriers to porting complex applications to Postgres, given that other databases have it readily available.

Cheers,
Csaba.
On 4 Sep 2003 at 12:17, Csaba Nagy wrote:
> This problem would be easily solved if the current transaction would not
> be automatically rolled back on the failed insert. Given this, it would
> be as easy as trying the insert, and if fails, do the update.

That violates the definition of a transaction. You need nested transactions, which aren't there yet.

You can use a sequence for the insert. If the next value of the sequence is more than the value you have, probably somebody has already inserted the value; then modify it.

Or do a SELECT FOR UPDATE. If it returns no row, there is no record, so insert; otherwise update. It might still fail, but the chances are much smaller.

HTH

Bye
Shridhar

--
Air Force Inertia Axiom: Consistency is always easier to defend than correctness.
Csaba Nagy <nagy@ecircle-ag.com> writes:
> This problem would be easily solved if the current transaction would not
> be automatically rolled back on the failed insert. Given this, it would
> be as easy as trying the insert, and if fails, do the update.

Yeah, that would be nested transactions; it's on the TODO list :)

FWIW, even if you took that approach you would still need to handle retrying if the record was deleted between the attempted insert and the attempted update, unless you know nothing is deleting these records.

Is there any possibility of moving this insert outside the transaction into a transaction of its own? If the rest of the transaction commits but the insert/update hasn't been committed yet, is the database in an invalid state? If not, you could try postponing the insert/update until after the main transaction commits and then performing it in its own transaction.

A more complex, also flawed, approach would be to do the insert/update in a separate connection. This would mean it would commit before the rest of the transaction was committed.

Out of curiosity, what does the MySQL syntax look like? How would you handle something where the insert and update were quite different, like:

    INSERT INTO foo (pk,value,count,date_ins) values (?,?,1,now())
    OR UPDATE foo set value=?, count=count+1, date_upd=now() where pk = ?

--
greg
On Thu, 2003-09-04 at 15:52, Greg Stark wrote:
> Csaba Nagy <nagy@ecircle-ag.com> writes:
> > This problem would be easily solved if the current transaction would not
> > be automatically rolled back on the failed insert. Given this, it would
> > be as easy as trying the insert, and if fails, do the update.
>
> Yeah, that would be nested transactions, it's on the TODO list :)

Very good :) The sooner implemented the better ;)

> Fwiw, even if you took that approach you would still need to handle retrying
> if the record was deleted between the attempted insert and the attempted
> update. Unless you know nothing is deleting these records.

In this case there's a burst of inserts/updates and no deletion, for sure. In any case it would be acceptable if sometimes the transaction fails, but only if it happens with a very low probability. These records are deleted only a considerable time after all updating is finished.

> Is there any possibility of moving this insert outside the transaction into a
> transaction of its own? If the rest of the transaction commits but the
> insert/update hasn't been committed yet is the database in an invalid state?
> If not you could try postponing the insert/update until after the main
> transaction commits and then performing it in its own transaction.
>
> A more complex, also flawed, approach would be to do the insert/update in a
> separate connection. This would mean it would commit first before the rest of
> the transaction was committed.

Either of these is out of the question: all should succeed, or nothing. The problem here is that "success" from a logical point of view can happen even when some individual queries fail. This is where nested transactions come in handy, or the possibility of by default continuing the in-process transaction instead of failing it (as Oracle does).

> Out of curiosity, what does the mysql syntax look like? How would you handle
> something where the insert and update were quite different like:
>
> INSERT INTO foo (pk,value,count,date_ins) values (?,?,1,now())
> OR UPDATE foo set value=?, count=count+1, date_upd=now() where pk = ?

No idea, I'm not using MySQL; just the problem was the same.

Cheers,
Csaba.
In article <87ekywbqz1.fsf@stark.dyndns.tv>, Greg Stark <gsstark@mit.edu> writes:
> Out of curiosity, what does the mysql syntax look like? How would you handle
> something where the insert and update were quite different like:
> INSERT INTO foo (pk,value,count,date_ins) values (?,?,1,now())
> OR UPDATE foo set value=?, count=count+1, date_upd=now() where pk = ?

You can't. The only thing MySQL has to offer is

    REPLACE INTO tbl_name [(col_name,...)] VALUES (expr,...)

`REPLACE' works exactly like `INSERT', except that if an old record in the table has the same value as a new record on a `UNIQUE' index or `PRIMARY KEY', the old record is deleted before the new record is inserted.
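SQLite's INSERT OR REPLACE happens to have the same delete-then-insert semantics as the MySQL REPLACE Harald quotes, so it can illustrate the behaviour concretely (the schema here is invented): because the old row is deleted first, any column you don't supply falls back to its default instead of keeping its old value.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""CREATE TABLE tbl (
    id    INTEGER PRIMARY KEY,
    name  TEXT,
    notes TEXT DEFAULT 'none')""")

conn.execute("INSERT INTO tbl (id, name, notes) VALUES (1, 'alex', 'vip')")

# "Replace" row 1, supplying only id and name.  The old row is deleted,
# so notes silently reverts to its default ('none'); 'vip' is lost.
conn.execute("INSERT OR REPLACE INTO tbl (id, name) VALUES (1, 'alexander')")

row = conn.execute("SELECT name, notes FROM tbl WHERE id = 1").fetchone()
```

This data loss on unmentioned columns is exactly the side effect Peter Childs objects to later in the thread.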
Harald Fuchs <nospam@sap.com> writes:
> You can't. The only thing MySQL has to offer is

Actually I found two things related to this:

http://www.mysql.com/doc/en/INSERT.html
http://www.mysql.com/doc/en/REPLACE.html

You can do

    INSERT INTO tab (...) VALUES (...) ON DUPLICATE KEY UPDATE col=val, ...

It seems to be newer than REPLACE.

In any case, both seem doable in Postgres, since in its MVCC every update is a delete+insert anyway. It means doing the delete if necessary and doing the insert unconditionally.

But I'm not sure how convinced the developers are of its usefulness beyond satisfying MySQL migrations. I've never used MySQL, and I have seen a few times when it might have been useful. Not a lot, but a few. And it seems to be a frequent question on the mailing list.

--
greg
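The ON DUPLICATE KEY UPDATE idea did eventually land in both PostgreSQL (INSERT ... ON CONFLICT, in 9.5) and SQLite (3.24+). A sketch of the latter, used here only because it is easy to run, also answers Greg's question about an insert and update that differ: each branch has its own column expressions. Table and column names are made up.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE foo (pk TEXT PRIMARY KEY, value TEXT, cnt INTEGER)")

def record(pk, value):
    # One statement: INSERT a fresh row with cnt=1, or on a duplicate key
    # take the new value (via excluded.value) and bump the existing cnt.
    conn.execute("""
        INSERT INTO foo (pk, value, cnt) VALUES (?, ?, 1)
        ON CONFLICT(pk) DO UPDATE SET value = excluded.value, cnt = cnt + 1
        """, (pk, value))

record("k", "first")    # takes the INSERT branch
record("k", "second")   # takes the UPDATE branch
```

Unlike REPLACE, the conflicting row is updated in place rather than deleted and re-inserted, so unmentioned columns keep their values.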
> You can do
> INSERT INTO tab (...) VALUES (...) ON DUPLICATE KEY UPDATE col=val, ...

This would make sure the insert won't throw exceptions on unique constraint violations. Good enough to avoid breaking transactions.

> But I'm not sure how convinced the developers are of its usefulness beyond
> satisfying mysql migrations. I've never used mysql and I have seen a few times
> it might have been useful. Not a lot, but a few. And it seems to be a frequent
> question on the mailing list.

If nested transactions are easier to implement, that would be better: it covers more of the current problems.

Cheers,
Csaba.
I think this would be very useful.

Jon

On 4 Sep 2003, Greg Stark wrote:
> Harald Fuchs <nospam@sap.com> writes:
>
> > You can't. The only thing MySQL has to offer is
>
> Actually I found two things related to this:
>
> http://www.mysql.com/doc/en/INSERT.html
> http://www.mysql.com/doc/en/REPLACE.html
>
> You can do
> INSERT INTO tab (...) VALUES (...) ON DUPLICATE KEY UPDATE col=val, ...
>
> It seems to be newer than REPLACE.
>
> In any case, both seem doable in postgres since in its MVCC every update is a
> delete+insert anyways. It means doing the delete if necessary and doing the
> insert unconditionally.
>
> But I'm not sure how convinced the developers are of its usefulness beyond
> satisfying mysql migrations. I've never used mysql and I have seen a few times
> it might have been useful. Not a lot, but a few. And it seems to be a frequent
> question on the mailing list.
>
> --
> greg
On Thu, Sep 04, 2003 at 12:17:35PM +0200, Csaba Nagy wrote:
> [philosophical post regarding a missing feature of Postgres]
>
> I found there's no way to avoid failed inserts because of
> unique constraint violations, causing automatic roll-back of the running
> transaction.
>
> Now contention on insert has a quite high probability for this operation
> in our application.

Did you ever try this:

    insert into test (a, b, c, d)
      (select 1, 2, 3, 4 where not exists
        (select 1 from test where a=1 and b=2 and c=3 and d=4)
      );

If your table contains a=1, b=2, c=3, and d=4, nothing will happen, and there will be no failed transaction. If your table does not contain a=1, b=2, c=3, and d=4, you'll get an insert of a row containing 1, 2, 3, 4.
On Thu, 4 Sep 2003, Richard Ellis wrote:
> On Thu, Sep 04, 2003 at 12:17:35PM +0200, Csaba Nagy wrote:
> > [philosophical post regarding a missing feature of Postgres]
> >
> > I found there's no way to avoid failed inserts because of
> > unique constraint violations, causing automatic roll-back of the running
> > transaction.
> >
> > Now contention on insert has a quite high probability for this operation
> > in our application.
>
> Did you ever try this:
>
> insert into test (a, b, c, d)
> (select 1, 2, 3, 4 where not exists
> (select 1 from test where a=1 and b=2 and c=3 and d=4)
> );
>
> If your table contains a=1, b=2, c=3, and d=4, nothing will happen, and
> there will be no failed transaction. If your table does not contain a=1,
> b=2, c=3, and d=4, you'll get an insert of a row containing 1, 2, 3, 4.

Unfortunately that doesn't work if two concurrently running transactions want to insert a row containing 1,2,3,4.
However, that doesn't cover the case where you want to update the record if it already exists.

Jon

> insert into test (a, b, c, d)
> (select 1, 2, 3, 4 where not exists
> (select 1 from test where a=1 and b=2 and c=3 and d=4)
> );
>
> If your table contains a=1, b=2, c=3, and d=4, nothing will happen, and
> there will be no failed transaction. If your table does not contain a=1,
> b=2, c=3, and d=4, you'll get an insert of a row containing 1, 2, 3, 4.
On Thu, Sep 04, 2003 at 12:29:17PM -0700, Stephan Szabo wrote:
> On Thu, 4 Sep 2003, Richard Ellis wrote:
>
> > Did you ever try this:
> >
> > insert into test (a, b, c, d)
> > (select 1, 2, 3, 4 where not exists
> > (select 1 from test where a=1 and b=2 and c=3 and d=4)
> > );
> >
> > If your table contains a=1, b=2, c=3, and d=4, nothing will
> > happen, and there will be no failed transaction. If your table
> > does not contain a=1, b=2, c=3, and d=4, you'll get an insert of
> > a row containing 1, 2, 3, 4.
>
> Unfortunately that doesn't work if two transactions want to insert
> a row containing 1,2,3,4 that are running concurrently.

True, if the row does not already exist. But in that situation, because of the unique constraint premise in the original quote, there is always going to be at least one failed transaction, so the battle is already lost before it's even begun.

If, however, the same row already exists in the table, then both of these inserts will silently do nothing, and both transactions will continue without aborting.
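Richard's conditional INSERT can be run twice to show it is a no-op when the row already exists. SQLite stands in for PostgreSQL here; within a single connection there is of course no concurrency, which is exactly Stephan's caveat: two concurrent transactions can still both pass the NOT EXISTS check.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE test (a INTEGER, b INTEGER, c INTEGER, d INTEGER, "
             "UNIQUE (a, b, c, d))")

sql = """INSERT INTO test (a, b, c, d)
         SELECT 1, 2, 3, 4
         WHERE NOT EXISTS
             (SELECT 1 FROM test WHERE a=1 AND b=2 AND c=3 AND d=4)"""

conn.execute(sql)   # row is absent: inserts (1, 2, 3, 4)
conn.execute(sql)   # row is present: the SELECT yields nothing, no error
```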
Whatever you guys try or suggest, it's doomed to suffer.

The whole problem stems from using a non-standard feature. And in my opinion MySQL's "REPLACE INTO" is less a feature or extension to the standard than yet another stupid, poorly-thought-through addition of apparently speed-gaining crap at the cost of proper design.

One possible reason why this sort of "feature" was left out of the SQL standard could be that the source of an ID that is supposed to be unique in the end should by default ensure its uniqueness. Defining a column UNIQUE is a last line of defense, and aborted actions because of constraint violations should be the exception, not the normal mode of operation. If it's the DB that ensures uniqueness, it has to generate the ID, and one can use a sequence. If it's the application that generates it, the application should know whether this is an INSERT or an UPDATE.

Wherever one is using this "REPLACE INTO" language violation, the client application, or even something in front of it, is generating IDs but is not sure whether it is sending down a new or an existing one. The real question is: "why is this piece of garbage unable to tell whether the ID is newly created or has to exist already?"

I don't think there should be a way to substitute this. Fix the application design instead.

Jan

Richard Ellis wrote:
> On Thu, Sep 04, 2003 at 12:29:17PM -0700, Stephan Szabo wrote:
>> On Thu, 4 Sep 2003, Richard Ellis wrote:
>> > On Thu, Sep 04, 2003 at 12:17:35PM +0200, Csaba Nagy wrote:
>> > > [philosophical post regarding a missing feature of Postgres]
>> > >
>> > > I found there's no way to avoid failed inserts because of
>> > > unique constraint violations, causing automatic roll-back of
>> > > the running transaction.
>> > >
>> > > Now contention on insert has a quite high probability for this
>> > > operation in our application.
>> >
>> > Did you ever try this:
>> >
>> > insert into test (a, b, c, d)
>> > (select 1, 2, 3, 4 where not exists
>> > (select 1 from test where a=1 and b=2 and c=3 and d=4)
>> > );
>> >
>> > If your table contains a=1, b=2, c=3, and d=4, nothing will
>> > happen, and there will be no failed transaction. If your table
>> > does not contain a=1, b=2, c=3, and d=4, you'll get an insert of
>> > a row containing 1, 2, 3, 4.
>>
>> Unfortunately that doesn't work if two transactions want to insert
>> a row containing 1,2,3,4 that are running concurrently.
>
> True, if the row does not already exist. But in that situation,
> because of the unique constraint premise in the original quote, there
> is always going to be at least one failed transaction. So the battle
> is already lost before it's even begun.
>
> If, however, the same row already exists in the table, then both of these
> inserts will silently do nothing, and both transactions will continue
> without aborting.

--
#======================================================================#
# It's easier to get forgiveness for being wrong than for being right. #
# Let's break this rule - forgive me.                                  #
#================================================== JanWieck@Yahoo.com #
Richard Ellis <rellis9@yahoo.com> writes:
> True, if the row does not already exist. But in that situation,
> because of the unique constraint premise in the original quote, there
> is always going to be at least one failed transaction. So the battle
> is already lost before it's even begun.

Well, no, that was the point. Ideally he wants to catch the duplicate without producing an exception, because he can't do nested transactions.

Is there no parent record to this record in another table? You could lock the parent record with SELECT FOR UPDATE, then do a SELECT count(*) on this table, do the insert or update as appropriate, and then release the lock on the parent record. That's not great if you're doing lots of inserts on the same parent record, or if the parent record is being updated frequently, but it's way better than a table lock.

FWIW:

Jan Wieck <JanWieck@Yahoo.com> writes:
> Defining a column UNIQUE is a last line of defense, and aborted actions
> because of constraint violation should be the exception, not the normal mode
> of operation.

Well, that's one approach. I don't agree. The database is a tool, and unique key constraints are a tool; they're good at doing certain things, like ensuring atomic semantics for cases just like this. Why try to reinvent the wheel using inferior tools in the application? You're doomed to fail and introduce race conditions.

In fact, in this situation I usually prefer trying the insert and handling exceptions over any of the other approaches. It's cleaner, clearer, faster in the normal case, and has the least likelihood of race conditions (none, if the table never has deletes).

> Wherever one is using this "REPLACE INTO" language violation, the client
> application or even something in front of it is generating ID's but it's not
> sure if it is sending down a new or existing one. The real question is "why is
> this piece of garbage unable to tell the ID is newly created or has to exist
> already?"

Well, because that's the database's job. If the application tried to do that, it would have to solve all the same concurrency and atomicity issues that the database already solves. That's why I'm using a database in the first place.

--
greg
[rant mode]

I have to answer this: I'm not trying to use a non-standard feature, I'm trying to solve a problem, namely to be able to try an insert and, on failure, continue the transaction. This is by no means a non-standard feature. AFAICT the standard says nothing about automatically rolling back a transaction on error; it just says that YOU should be able to roll it back or commit it, and then all or nothing of the changes should be executed.

The application design can be "fixed", but that means ugly workarounds. In my case a simple fix would be to always insert all the possible records before any update happens, but that would bloat the table 10-fold; I think you'll agree that's unacceptable.

Please understand me: I'm not trying to piss off the Postgres developers by telling them Postgres is not up to it. I'm trying to insist that nested transactions are a very important feature, one that can solve lots of problems which apparently have nothing to do with nested transactions.

Cheers,
Csaba.

On Fri, 2003-09-05 at 04:38, Jan Wieck wrote:
> Whatever you guy's try or suggest, it's doomed to suffer.
>
> The whole problem stems from using a non-standard feature. And in my
> opinion MySQL's "REPLACE INTO" is less a feature or extension to the
> standard than more another stupid and lesser thought through addition of
> apparently speed gaining crap at the cost of proper design.
>
> One possible reason why this sort of "feature" was left out of the SQL
> standard could be that the source of an ID, that is supposed to be
> unique in the end, should by default ensure it's uniqueness. Defining a
> column UNIQUE is a last line of defense, and aborted actions because of
> constraint violation should be the exception, not the normal mode of
> operation. If it's the DB to ensure uniqueness, it has to generate the
> ID and one can use a sequence. If it's the application to generate it,
> the application should know if this is an INSERT or an UPDATE.
>
> Wherever one is using this "REPLACE INTO" language violation, the client
> application or even something in front of it is generating ID's but it's
> not sure if it is sending down a new or existing one. The real question
> is "why is this piece of garbage unable to tell the ID is newly created
> or has to exist already?"
>
> I don't think there should be a way to subsitute this. Fix the
> application design instead.
>
> Jan
Fine, says he, seeing both sides and wanting to back both.

REPLACE INTO throws away data without the user knowing that they have ditched it. This means it has side effects, and standard programming theory says that functions should not have unintended side effects. REPLACE INTO is in fact a DELETE followed by an INSERT.

It's also unclear what REPLACE should do with missing fields:
1. Fill them in with the defaults.
2. Leave them alone.

If 1, it's very dangerous; if 2, you should know what you are changing anyway. At the very least it should return "UPDATE" or "INSERT" depending on what REPLACE actually did!

On the other hand, nested transactions are a good idea. There is one problem however: when do you impose referential integrity for deferred checks? At the last commit? Or do you need a CHECK REFERENCES command?

It's beside the point that an UPDATE on most databases implies COPY, CHANGE COPY, DELETE steps so you can roll back if necessary.

REPLACE also needs to know the table structure to work; UPDATE, INSERT and DELETE don't, they only need to check the constraints.

As I'm sure I've said before, SQL has huge holes and inconsistencies and needs a complete re-write. It's like VHS video: not the best, just what's been sold to everyone.

Peter Childs

On 5 Sep 2003, Csaba Nagy wrote:
> [rant mode]
> I have to answer this: I'm not trying to use a non-standard feature, I
> try to solve a problem. Namely to be able to try to insert and on
> failure continue the transaction. This is by no means a non-standard
> feature.
> AFAIKT the standard says nothing about rolling back automatically a
> transaction on error, it just says that YOU should be able to roll it
> back or commit it, and then all or nothing of the changes should be
> executed.
> The application design can be "fixed", but that means ugly workarounds.
> In my case a simple fix would be to always insert all the possible
> records before any update would happen, but that would bloat the table
> 10-fold - I think you agree this is unacceptable.
> Please understand me: I'm not after pissing off the postgres developers
> by telling Postgres is not up to it, I try to insist that nested
> transactions are a very important feature, which can solve lots of
> problems which apparently might have nothing to do with nested
> transactions.
>
> Cheers,
> Csaba.
>
> On Fri, 2003-09-05 at 04:38, Jan Wieck wrote:
> > Whatever you guy's try or suggest, it's doomed to suffer.
> >
> > The whole problem stems from using a non-standard feature. And in my
> > opinion MySQL's "REPLACE INTO" is less a feature or extension to the
> > standard than more another stupid and lesser thought through addition of
> > apparently speed gaining crap at the cost of proper design.
> >
> > One possible reason why this sort of "feature" was left out of the SQL
> > standard could be that the source of an ID, that is supposed to be
> > unique in the end, should by default ensure it's uniqueness. Defining a
> > column UNIQUE is a last line of defense, and aborted actions because of
> > constraint violation should be the exception, not the normal mode of
> > operation. If it's the DB to ensure uniqueness, it has to generate the
> > ID and one can use a sequence. If it's the application to generate it,
> > the application should know if this is an INSERT or an UPDATE.
> >
> > Wherever one is using this "REPLACE INTO" language violation, the client
> > application or even something in front of it is generating ID's but it's
> > not sure if it is sending down a new or existing one. The real question
> > is "why is this piece of garbage unable to tell the ID is newly created
> > or has to exist already?"
> >
> > I don't think there should be a way to subsitute this. Fix the
> > application design instead.
> >
> > Jan
Thinking about it, there's probably no easy way to avoid race conditions (in a truly transactional DB at least) when inserting into a table with unique constraints. The REPLACE syntax will definitely not do it, because I can't imagine what it should do when two threads try to REPLACE the same key in concurrent transactions: both will see the key as missing and try to insert it, so we're back at the same problem INSERT has.

Cheers,
Csaba.

On Fri, 2003-09-05 at 12:06, Peter Childs wrote:
> REPLACE INTO throws away data with out the user knowing that they
> have ditched the data. This means it has side effects. Standard
> Programming Theory says that functions should not have unintended side
> effects. REPLACE INTO is in-fact a
>
> DELETE followed by a INSERT
>
> Its also unclear what replace should do with missing fields
> 1. Fill them in with the defaults.
> 2. Leave them alone.
>
> If 1 its very dangerous and if 2 you should know what you are
> changing anyway and the very least it should return "UPDATE" or "INSERT"
> depending on what "REPLACE" actually did!
>
> On the other hand Nesting are a good idea. There is one problem
> however. When do you impose referential integrity for deferred checks. at
> the last commit. or do you need a check references command.
>
> Its not the point that a UPDATE on most databases infers a COPY,
> CHANGE COPY, DELETE steps so you can roll back if nessessary.
>
> Replace also needs to know the table stucture to work Update,
> Insert and Delete don't they only need to check the constraints.
>
> As I'm sure I've said before SQL has huge holes and inconsistencies
> and needs a complete re-write. its like VHS-Video not the best just whats
> been sold to everyone.
>
> Peter Childs
It was not meant against anyone in person and I agree that nested transactions and/or catchable exceptions and continuing afterwards is usefull and missing in PostgreSQL. What Stephan and Richard where actually discussing was more like emulating the REPLACE INTO, and I was responding to that. However, even with nested transactions and exceptions and all that, your problem will not be cleanly solvable. You basically have 2 choices, trying the INSERT first and if that fails with a duplicate key then do the UPDATE, or try the UPDATE first and if no rows got hit do an INSERT. Now if 2 concurrent transactions do try the UPDATE they can both not find the row and do INSERT - one has a dupkey error. But if you try to INSERT and get a duplicate key, in the time between you get the error and issue the UPDATE someone else can issue a DELETE - the row is gone and your UPDATE will fail. So you have to go into a loop and try INSERTorUPDATEorINSERT... until you either get bored or succeed ... that's not exactly what I call a solution. Jan Csaba Nagy wrote: > [rant mode] > I have to answer this: I'm not trying to use a non-standard feature, I > try to solve a problem. Namely to be able to try to insert and on > failure continue the transaction. This is by no means a non-standard > feature. > AFAIKT the standard says nothing about rolling back automatically a > transaction on error, it just says that YOU should be able to roll it > back or commit it, and then all or nothing of the changes should be > executed. > The application design can be "fixed", but that means ugly workarounds. > In my case a simple fix would be to always insert all the possible > records before any update would happen, but that would bloat the table > 10-fold - I think you agree this is unacceptable. 
> Please understand me: I'm not after pissing off the postgres developers
> by telling Postgres is not up to it, I try to insist that nested
> transactions are a very important feature, which can solve lots of
> problems which apparently might have nothing to do with nested
> transactions.
>
> Cheers,
> Csaba.
>
>
> On Fri, 2003-09-05 at 04:38, Jan Wieck wrote:
>> Whatever you guys try or suggest, it's doomed to suffer.
>>
>> The whole problem stems from using a non-standard feature. And in my
>> opinion MySQL's "REPLACE INTO" is less a feature or extension to the
>> standard than yet another ill-thought-out addition of apparently
>> speed-gaining crap at the cost of proper design.
>>
>> One possible reason why this sort of "feature" was left out of the SQL
>> standard could be that the source of an ID that is supposed to be
>> unique in the end should by default ensure its uniqueness. Defining a
>> column UNIQUE is a last line of defense, and aborted actions because of
>> constraint violations should be the exception, not the normal mode of
>> operation. If it's the DB that ensures uniqueness, it has to generate
>> the ID and one can use a sequence. If it's the application that
>> generates it, the application should know if this is an INSERT or an
>> UPDATE.
>>
>> Wherever one is using this "REPLACE INTO" language violation, the
>> client application or even something in front of it is generating IDs
>> but it's not sure if it is sending down a new or existing one. The
>> real question is "why is this piece of garbage unable to tell if the
>> ID is newly created or has to exist already?"
>>
>> I don't think there should be a way to substitute this. Fix the
>> application design instead.
>>
>>
>> Jan

--
#======================================================================#
# It's easier to get forgiveness for being wrong than for being right. #
# Let's break this rule - forgive me.                                  #
#================================================== JanWieck@Yahoo.com #
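[The insert-or-update loop Jan describes can be sketched in client code.
A minimal illustration, using Python's stdlib sqlite3 as a stand-in for
a real PostgreSQL driver; the table `kv` and its columns are invented
for the example. Note that in PostgreSQL of this era the duplicate-key
error would abort the whole transaction - which is exactly the point of
the thread - so the sketch only shows the control flow, not a fix:]

```python
import sqlite3

def upsert(conn, key, value, max_tries=5):
    """Try INSERT; on a duplicate key fall back to UPDATE; if the row
    vanished between the two statements (a concurrent DELETE), loop."""
    for _ in range(max_tries):
        try:
            conn.execute("INSERT INTO kv (k, v) VALUES (?, ?)", (key, value))
            return "inserted"
        except sqlite3.IntegrityError:
            # Key already exists -- try to UPDATE the existing row instead.
            cur = conn.execute("UPDATE kv SET v = ? WHERE k = ?", (value, key))
            if cur.rowcount == 1:
                return "updated"
            # rowcount == 0: the row was deleted in the meantime; retry.
    raise RuntimeError("upsert did not converge")

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE kv (k TEXT PRIMARY KEY, v TEXT)")
print(upsert(conn, "a", "1"))  # inserted
print(upsert(conn, "a", "2"))  # updated
```

[The bounded `max_tries` loop is the "until you either get bored or
succeed" part: without it, a pathological insert/delete interleaving
could starve the caller forever.]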
> You're completely right that not all possible problems are solved by
> this, but different solutions are better or worse based also on the
> odds of the problem actually occurring. My code can deal with broken
> transactions; it's just unacceptable if they are broken too often -
> that slows down the system. You must admit that the odds of the
> insert-update-delete happening at the same time are much lower than of
> just 2 inserts happening at the same time, whatever the application
> usage pattern. In particular, it's fairly easy to make sure
> there's no delete when updates happen:
> select the row for update. Only the
  ^^^^^^^^^^^^^^^^^^^^^^^^^

Of course that's stupid. When you do an update it selects the row for
update... and that will not help in this case. But the update will not
fail. It will just have nothing to update, which usually is just
alright if the row was deleted, meaning that its life time ended.

BTW, in my particular problem I can make sure there will be no delete
until all inserts/updates are finished.

> insert is the problem, because you don't have the row to lock
> beforehand.
>
> Cheers,
> Csaba.
>
On Fri, 2003-09-05 at 15:29, Jan Wieck wrote:
> However, even with nested transactions and exceptions and all that, your
> problem will not be cleanly solvable. You basically have 2 choices,
> trying the INSERT first and if that fails with a duplicate key then do
> the UPDATE, or try the UPDATE first and if no rows got hit do an INSERT.
> Now if 2 concurrent transactions do try the UPDATE they can both not
> find the row and do INSERT - one has a dupkey error. But if you try to
> INSERT and get a duplicate key, in the time between you get the error
> and issue the UPDATE someone else can issue a DELETE - the row is gone
> and your UPDATE will fail.
>
> So you have to go into a loop and try INSERTorUPDATEorINSERT... until
> you either get bored or succeed ... that's not exactly what I call a
> solution.

You're completely right that not all possible problems are solved by
this, but different solutions are better or worse based also on the odds
of the problem actually occurring. My code can deal with broken
transactions; it's just unacceptable if they are broken too often - that
slows down the system. You must admit that the odds of the
insert-update-delete happening at the same time are much lower than of
just 2 inserts happening at the same time, whatever the application
usage pattern. In particular, it's fairly easy to make sure there's no
delete when updates happen: select the row for update. Only the insert
is the problem, because you don't have the row to lock beforehand.

Cheers,
Csaba.
On Fri, 2003-09-05 at 08:29, Jan Wieck wrote:
> It was not meant against anyone in person and I agree that nested
> transactions and/or catchable exceptions and continuing afterwards is
> useful and missing in PostgreSQL. What Stephan and Richard were
> actually discussing was more like emulating the REPLACE INTO, and I was
> responding to that.
>
> However, even with nested transactions and exceptions and all that, your
> problem will not be cleanly solvable. You basically have 2 choices,
> trying the INSERT first and if that fails with a duplicate key then do
> the UPDATE, or try the UPDATE first and if no rows got hit do an INSERT.
> Now if 2 concurrent transactions do try the UPDATE they can both not
> find the row and do INSERT - one has a dupkey error. But if you try to
> INSERT and get a duplicate key, in the time between you get the error
> and issue the UPDATE someone else can issue a DELETE - the row is gone
> and your UPDATE will fail.

SERIALIZABLE transactions will solve this.

--
-----------------------------------------------------------------
Ron Johnson, Jr.
ron.l.johnson@cox.net
Jefferson, LA USA

Thanks to the good people in Microsoft, a great deal of the data
that flows is dependent on one company. That is not a healthy
ecosystem. The issue is that creativity gets filtered through the
business plan of one company.
Mitchell Baker, "Chief Lizard Wrangler" at Mozilla
Csaba Nagy <nagy@ecircle-ag.com> writes:
> Thinking about it, there's probably no easy way to avoid race conditions
> (in a true transactional DB at least) when inserting into a table with
> unique constraints. The REPLACE syntax will definitely not do it,
> because I can't imagine what it should do when 2 threads try to REPLACE
> the same key in concurrent transactions. Both will see the key as
> missing, and try to insert it, so back we are at the same problem INSERT
> has.

Assuming that you've got a unique constraint defined, one thread will
succeed in doing the INSERT, and the other will fail with a duplicate
key error --- whereupon it should loop back and try the REPLACE part
again.

So what this all comes down to is having control over recovery from a
dup-key error. You have to be able to not have that abort your whole
transaction.

			regards, tom lane
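[Tom's "loop back and try the REPLACE part again" is the other arm of
Jan's two choices: UPDATE first, INSERT on a miss, and loop back when a
concurrent inserter wins the race. A sketch under the same assumptions
as before - stdlib sqlite3 standing in for a real driver, invented
table `kv`:]

```python
import sqlite3

def replace_into(conn, key, value, max_tries=5):
    """UPDATE first; if no row was hit, INSERT; if the INSERT loses a
    race to a concurrent inserter (duplicate key), loop back to the
    UPDATE -- the winner's row is now there to be updated."""
    for _ in range(max_tries):
        cur = conn.execute("UPDATE kv SET v = ? WHERE k = ?", (value, key))
        if cur.rowcount == 1:
            return "updated"
        try:
            conn.execute("INSERT INTO kv (k, v) VALUES (?, ?)", (key, value))
            return "inserted"
        except sqlite3.IntegrityError:
            continue  # concurrent insert won; retry the UPDATE
    raise RuntimeError("replace_into did not converge")

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE kv (k TEXT PRIMARY KEY, v TEXT)")
print(replace_into(conn, "x", "1"))  # inserted
print(replace_into(conn, "x", "2"))  # updated
```

[Which arm to prefer depends on the workload: if most calls hit existing
rows, UPDATE-first avoids paying for a failed INSERT on the common path.]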
Ron Johnson wrote:
> On Fri, 2003-09-05 at 08:29, Jan Wieck wrote:
>> It was not meant against anyone in person and I agree that nested
>> transactions and/or catchable exceptions and continuing afterwards is
>> useful and missing in PostgreSQL. What Stephan and Richard were
>> actually discussing was more like emulating the REPLACE INTO, and I was
>> responding to that.
>>
>> However, even with nested transactions and exceptions and all that, your
>> problem will not be cleanly solvable. You basically have 2 choices,
>> trying the INSERT first and if that fails with a duplicate key then do
>> the UPDATE, or try the UPDATE first and if no rows got hit do an INSERT.
>> Now if 2 concurrent transactions do try the UPDATE they can both not
>> find the row and do INSERT - one has a dupkey error. But if you try to
>> INSERT and get a duplicate key, in the time between you get the error
>> and issue the UPDATE someone else can issue a DELETE - the row is gone
>> and your UPDATE will fail.
>
> SERIALIZABLE transactions will solve this.

Sure they will.

Care to elaborate a bit on the side effects of SERIALIZABLE? I mean
semantics AND performance wise ... people tend to use suggestions like
this without thinking (about the consequences).


Jan :-T

--
#======================================================================#
# It's easier to get forgiveness for being wrong than for being right. #
# Let's break this rule - forgive me.                                  #
#================================================== JanWieck@Yahoo.com #
On Wed, 2003-09-10 at 00:31, Jan Wieck wrote:
> Ron Johnson wrote:
>
> > On Fri, 2003-09-05 at 08:29, Jan Wieck wrote:
> >> It was not meant against anyone in person and I agree that nested
> >> transactions and/or catchable exceptions and continuing afterwards is
> >> useful and missing in PostgreSQL. What Stephan and Richard were
> >> actually discussing was more like emulating the REPLACE INTO, and I was
> >> responding to that.
> >>
> >> However, even with nested transactions and exceptions and all that, your
> >> problem will not be cleanly solvable. You basically have 2 choices,
> >> trying the INSERT first and if that fails with a duplicate key then do
> >> the UPDATE, or try the UPDATE first and if no rows got hit do an INSERT.
> >> Now if 2 concurrent transactions do try the UPDATE they can both not
> >> find the row and do INSERT - one has a dupkey error. But if you try to
> >> INSERT and get a duplicate key, in the time between you get the error
> >> and issue the UPDATE someone else can issue a DELETE - the row is gone
> >> and your UPDATE will fail.
> >
> > SERIALIZABLE transactions will solve this.
>
> Sure they will.
>
> Care to elaborate a bit on the side effects of SERIALIZABLE? I mean
> semantics AND performance wise ... people tend to use suggestions like
> this without thinking (about the consequences).

Well, unless INSERT/UPDATE/DELETE transactions are very short, there
will definitely be a performance hit because of increased locking.
However, I prefer that consequence to the artifacts of READ COMMITTED.

--
-----------------------------------------------------------------
Ron Johnson, Jr.
ron.l.johnson@cox.net
Jefferson, LA USA

LUKE: Is Perl better than Python?
YODA: No... no... no. Quicker, easier, more seductive.
LUKE: But how will I know why Python is better than Perl?
YODA: You will know. When your code you try to read six months
from now.
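[One consequence of the SERIALIZABLE route that the thread leaves
implicit: the client must be prepared to catch serialization failures
and retry the whole transaction. A hedged sketch of the generic retry
wrapper every SERIALIZABLE client ends up needing; `SerializationError`
is a placeholder for whatever error the actual driver raises, and the
flaky stub only simulates a transaction that fails once:]

```python
class SerializationError(Exception):
    """Placeholder for a driver's 'could not serialize access' error."""

def with_retry(txn, max_tries=3):
    """Run a transaction callable, retrying it on serialization failure.

    The callable must be safe to re-execute from scratch, since the
    database rolled back all of its work on the failed attempt."""
    for attempt in range(1, max_tries + 1):
        try:
            return txn()
        except SerializationError:
            if attempt == max_tries:
                raise  # give up after max_tries and surface the error

# Demonstration with a stub transaction that fails once, then succeeds.
calls = {"n": 0}
def flaky_txn():
    calls["n"] += 1
    if calls["n"] == 1:
        raise SerializationError("simulated conflict")
    return "committed"

print(with_retry(flaky_txn))  # committed
```

[This is the locking/retry cost Ron and Jan are weighing: under
SERIALIZABLE the dupkey race disappears, but long transactions make
both conflicts and the price of each retry correspondingly larger.]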