Thread: someone working to add merge?
Hi, is there someone working on adding the MERGE statement? I don't find much about what a good implementation of MERGE must have... I think what it needs to do is something like:

- try to lock the rows for update
- if the lock cannot be immediately acquired, ask why
- if the rows are already locked, wait and try again?
- if no rows were found, try the insert part
- if there was any other error, abort
- else, update

so I suppose we can reuse much of the code by breaking the merge into 3 pieces... for now these are just thoughts; I will think more on this and try to implement it...

comments? ideas? suggestions?

--
regards,
Jaime Casanova
(DBA: DataBase Aniquilator ;)
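For context, the MERGE statement under discussion is defined by SQL:2003; its general shape looks roughly like this (the table and column names here are invented purely for illustration):

```sql
-- Sketch of standard SQL:2003 MERGE syntax; "target", "source",
-- "id" and "val" are illustrative names, not proposed PostgreSQL code.
MERGE INTO target t
USING source s
   ON (t.id = s.id)
 WHEN MATCHED THEN
      UPDATE SET val = s.val
 WHEN NOT MATCHED THEN
      INSERT (id, val) VALUES (s.id, s.val);
```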
Jaime,

> so i suppose we can reuse many of the code breaking the merge in 3
> pieces... for now they are just thougths, i will think more in this
> and try to implement it...
>
> comments? ideas? suggestions?

Funny, we were just discussing this at OpenDBCon. Seems that you can't do a full implementation of MERGE without Predicate Locking (the ability to say "lock this table against inserts or updates of any row with key=5"). However, Peter suggested that we could do a proof-of-concept implementation, working out syntax and trigger issues, based on a full table lock, and do the hard work once it was proved to be feasible.

Peter?

--
Josh Berkus
Aglio Database Solutions
San Francisco
On 11/11/05, Josh Berkus <josh@agliodbs.com> wrote:
> Jaime,
>
> > so i suppose we can reuse many of the code breaking the merge in 3
> > pieces... for now they are just thougths, i will think more in this
> > and try to implement it...
> >
> > comments? ideas? suggestions?
>
> Funny, we were just discussing this at OpenDBCon. Seems that you can't do a
> full implementation of MERGE without Predicate Locking (the ability to say
> "lock this table against inserts or updates of any row with key=5").

isn't that what SELECT FOR UPDATE does?

> However, Peter suggested that we could do a proof-of-concept implementation,
> working out syntax and trigger issues, based on a full table lock and do the
> hard work once it was proved to be feasible.
>
> Peter?

--
Atentamente,
Jaime Casanova
(DBA: DataBase Aniquilator ;)
Jaime Casanova wrote:
>> Funny, we were just discussing this at OpenDBCon. Seems that you can't do a
>> full implementation of MERGE without Predicate Locking (the ability to say
>> "lock this table against inserts or updates of any row with key=5").
>
> it isn't what select for update does?

It won't prevent the insertion of a row with the given predicate.

cheers

andrew
On Fri, 2005-11-11 at 18:15, Jaime Casanova wrote:
> On 11/11/05, Josh Berkus <josh@agliodbs.com> wrote:
> > Funny, we were just discussing this at OpenDBCon. Seems that you can't do a
> > full implementation of MERGE without Predicate Locking (the ability to say
> > "lock this table against inserts or updates of any row with key=5").
>
> it isn't what select for update does?

SELECT FOR UPDATE only works if the row is already there. If there's no row, you can't lock it. So you then want to insert it, but then it is possible that somebody inserted it before you, immediately after your check... so the solution would be more like:

- try the insert;
- if the insert fails, do the update;

You can already do that, but you have to place a savepoint before the insert, so you can continue your transaction even if the insert fails. Without knowledge of postgres internals, the simplest approach would be to make the "continue transaction if insert fails" part work with the cheapest possible price to pay. This would mean wrapping up existing code, except for that "continue transaction after failure of insert" part. All this might be completely bull*it of course, I don't know too much about postgres internals.

[snip]

Cheers,
Csaba.
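The insert-else-update idea above can be sketched like this, assuming a table t with a primary key on id (all names illustrative). Note that plain SQL has no conditional branching, so the error handling has to be driven by the client or a PL/pgSQL exception block:

```sql
BEGIN;
SAVEPOINT upsert;
INSERT INTO t (id, val) VALUES (1, 42);
-- If the INSERT fails with a duplicate-key error, the client issues:
--   ROLLBACK TO SAVEPOINT upsert;
--   UPDATE t SET val = 42 WHERE id = 1;
-- In either case the transaction can then continue:
RELEASE SAVEPOINT upsert;
COMMIT;
```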
Josh Berkus wrote: > Funny, we were just discussing this at OpenDBCon. Seems that you > can't do a full implementation of MERGE without Predicate Locking > (the ability to say "lock this table against inserts or updates of > any row with key=5"). However, Peter suggested that we could do a > proof-of-concept implementation, working out syntax and trigger > issues, based on a full table lock and do the hard work once it was > proved to be feasable. Yes, I've started to work on this. Realizing that the current way to manually do an UPDATE-else-INSERT or DELETE-then-INSERT involves a table lock anyway, a MERGE implementation using a table lock would at least give some convenience benefit to users. (And possibly some performance, too, if the decision logic is currently run in the client.) A predicate locking implementation for MERGE might actually not be all that complicated, because you only need to look on pk = constant, not on arbitrary expressions. Nevertheless, I think it's best to write the MERGE command first and then optimize the locking. -- Peter Eisentraut http://developer.postgresql.org/~petere/
Josh Berkus <josh@agliodbs.com> writes: > Funny, we were just discussing this at OpenDBCon. Seems that you can't do a > full implementation of MERGE without Predicate Locking (the ability to say > "lock this table against inserts or updates of any row with key=5"). > However, Peter suggested that we could do a proof-of-concept implementation, > working out syntax and trigger issues, based on a full table lock and do the > hard work once it was proved to be feasable. If you don't have any better idea how to do it than a full table lock, you might as well not do it at all. A "proof of concept" that doesn't solve the hard part of the problem is no proof :-( My first guess about a real implementation would involve extending the index AM API to offer a function "insert this key, or return the existing match if there already is one". This might tie into refactoring the existing implementation of unique indexes, in which all the problem is put on the AM's head (which is why only btree copes at the moment). regards, tom lane
OK, I'm relatively new on this list, and I might have missed a few discussions on this topic. I wonder if doing it this way would not be better than using a table lock:

- set a save point;
- insert the row;
- on error:
  - roll back to the save point;
  - update the row;
- on success, release the save point;

This would provide less contention while paying the price for the save point. In low contention scenarios the table lock would be better, and I wonder which is better for high contention scenarios: the table lock, or the save point version... Of course the table lock version is the future if predicate locking is going to be implemented later.

Cheers,
Csaba.

On Fri, 2005-11-11 at 18:37, Peter Eisentraut wrote:
> Josh Berkus wrote:
> > Funny, we were just discussing this at OpenDBCon. Seems that you
> > can't do a full implementation of MERGE without Predicate Locking
> > (the ability to say "lock this table against inserts or updates of
> > any row with key=5"). However, Peter suggested that we could do a
> > proof-of-concept implementation, working out syntax and trigger
> > issues, based on a full table lock and do the hard work once it was
> > proved to be feasable.
>
> Yes, I've started to work on this. Realizing that the current way to
> manually do an UPDATE-else-INSERT or DELETE-then-INSERT involves a
> table lock anyway, a MERGE implementation using a table lock would at
> least give some convenience benefit to users. (And possibly some
> performance, too, if the decision logic is currently run in the
> client.)
>
> A predicate locking implementation for MERGE might actually not be all
> that complicated, because you only need to look on pk = constant, not
> on arbitrary expressions. Nevertheless, I think it's best to write the
> MERGE command first and then optimize the locking.
Tom Lane wrote:
> If you don't have any better idea how to do it than a full table
> lock, you might as well not do it at all. A "proof of concept" that
> doesn't solve the hard part of the problem is no proof :-(

But the problem here is not to break any kind of performance barrier, but to give people migrating from MySQL an alternative to the REPLACE command.

> My first guess about a real implementation would involve extending
> the index AM API to offer a function "insert this key, or return the
> existing match if there already is one".

This assumes that there are indexes defined for the columns involved in the merge condition, which is not required anywhere.

--
Peter Eisentraut
http://developer.postgresql.org/~petere/
Peter Eisentraut <peter_e@gmx.net> writes: > This assumes that there are indexes defined for the columns involved in > the merge condition, which is not required anywhere. Surely they require a unique constraint --- else the behavior isn't even well defined, is it? regards, tom lane
On 11/11/05, Peter Eisentraut <peter_e@gmx.net> wrote:
> Tom Lane wrote:
> > If you don't have any better idea how to do it than a full table
> > lock, you might as well not do it at all. A "proof of concept" that
> > doesn't solve the hard part of the problem is no proof :-(
>
> But the problem here is not to break any kind of performance barrier,
> but to give people migrating from MySQL an alternative to the REPLACE
> command.

But MERGE isn't REPLACE...

REPLACE will delete old records to insert new ones; MERGE tries to insert, and if the record exists it can UPDATE just a few values, maybe incrementing them (all the calculation is done by the MERGE).

> > My first guess about a real implementation would involve extending
> > the index AM API to offer a function "insert this key, or return the
> > existing match if there already is one".
>
> This assumes that there are indexes defined for the columns involved in
> the merge condition, which is not required anywhere.

--
regards,
Jaime Casanova
(DBA: DataBase Aniquilator ;)
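The difference can be sketched side by side (table t with primary key id and columns val and hits; all names invented for illustration):

```sql
-- MySQL REPLACE: deletes any row with the same key, then inserts,
-- so columns not mentioned are reset to their defaults:
REPLACE INTO t (id, val) VALUES (1, 42);

-- MERGE: updates in place, so it can touch just a few columns and
-- even compute new values from the old row:
MERGE INTO t
USING (SELECT 1 AS id, 42 AS val) s
   ON (t.id = s.id)
 WHEN MATCHED THEN
      UPDATE SET val = s.val, hits = t.hits + 1
 WHEN NOT MATCHED THEN
      INSERT (id, val, hits) VALUES (s.id, s.val, 1);
```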
Tom Lane wrote: > Surely they require a unique constraint --- else the behavior isn't > even well defined, is it? They require that the merge condition does not match for more than one row, but since the merge condition can do just about anything, there is no guarantee that a unique constraint encompasses it. -- Peter Eisentraut http://developer.postgresql.org/~petere/
Jaime Casanova wrote: > REPLACE will delete old records to insert new ones; MERGE try to > insert and if the record exists then can UPDATE just a few values, > maybe incrementing them with a value (all the calculation are doing > by the MERGE) I'm not the expert on REPLACE, but it would seem that REPLACE is a special case of MERGE. -- Peter Eisentraut http://developer.postgresql.org/~petere/
Jaime Casanova Wrote: > But MERGE isn't REPLACE... > > REPLACE will delete old records to insert new ones; MERGE try > to insert and if the record exists then can UPDATE just a few > values, maybe incrementing them with a value (all the > calculation are doing by the MERGE) That sounds like MySQL's 'INSERT INTO ... ON DUPLICATE KEY UPDATE', which they recommend over REPLACE anyways.
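For comparison, the MySQL statement mentioned above looks like this (table and column names invented for illustration):

```sql
-- MySQL's upsert form: insert, or on a PRIMARY KEY / UNIQUE conflict
-- update the existing row in place instead of delete-and-reinsert:
INSERT INTO t (id, val) VALUES (1, 42)
    ON DUPLICATE KEY UPDATE val = VALUES(val);
```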
On Fri, Nov 11, 2005 at 18:48:33 +0100, Csaba Nagy <nagy@ecircle-ag.com> wrote:
> OK, I'm relatively new on this list, and I might have missed a few
> discussions on this topic.
> I wonder if doing it this way would not be better than using a table
> lock:
>
> - set a save point;
> - insert the row;
> - on error:
>   - roll back to the save point;
>   - update the row;
> - on success release the save point;
>
> This would provide less contention while paying the prise for the save
> point. In low contention scenarios the table lock would be better, and I
> wonder for high contention scenarios which is better, the table lock, or
> the save point version...

You may not be able to update the row after the insert fails. If there is an insert occurring in another transaction, the row may not be visible to the current transaction, in which case you can neither insert nor update the row. You need to wait for the other transaction to commit or roll back.
Peter Eisentraut <peter_e@gmx.net> writes: > Tom Lane wrote: >> Surely they require a unique constraint --- else the behavior isn't >> even well defined, is it? > They require that the merge condition does not match for more than one > row, but since the merge condition can do just about anything, there is > no guarantee that a unique constraint encompasses it. ISTM to be a reasonable implementation restriction that there be a constraint by which the system can prove that there is at most one matching row. Per other comments in this thread, we'd not be the only implementation making such a restriction. (Certainly, if I were a DBA and were told that the performance of MERGE would go to hell in a handbasket if I had no such constraint, I'd make sure there was one. I don't think there is very much of a use-case for the general scenario.) regards, tom lane
Tom Lane Wrote: > Surely they require a unique constraint --- else the behavior > isn't even well defined, is it? From the mysql manual: 'REPLACE works exactly like INSERT, except that if an old record in the table has the same value as a new record for a PRIMARY KEY or a UNIQUE index, the old record is deleted before the new record is inserted. See Section 13.2.4, "INSERT Syntax".' ... John
I Wrote: > From the mysql manual: > > 'REPLACE works exactly like INSERT, except that if an old > record in the table has the same value as a new record for a > PRIMARY KEY or a UNIQUE index, the old record is deleted > before the new record is inserted. See Section 13.2.4, > "INSERT Syntax".' It also says: Note that unless the table has a PRIMARY KEY or UNIQUE index, using a REPLACE statement makes no sense. It becomes equivalent to INSERT, because there is no index to be used to determine whether a new row duplicates another. ... John
On Fri, 2005-11-11 at 20:22, Bruno Wolff III wrote:
> On Fri, Nov 11, 2005 at 18:48:33 +0100, Csaba Nagy <nagy@ecircle-ag.com> wrote:
> > I wonder if doing it this way would not be better than using a table
> > lock:
> >
> > - set a save point;
> > - insert the row;
> > - on error:
> >   - roll back to the save point;
> >   - update the row;
> > - on success release the save point;
>
> You may not be able to update the row after the insert fails. If there is
> insert occurring in another transaction, the row may not be visible to
> the current transaction. In which case you can neither insert or update the
> row. You need to wait for the other transaction to commit or rollback.

Are you sure? From what I understand, the insert will actively wait for the other transaction's commit or rollback, and only fail if the other transaction commits. Look at this:

session_1=> create table test (col smallint primary key);
NOTICE:  CREATE TABLE / PRIMARY KEY will create implicit index "test_pkey" for table "test"
CREATE TABLE
session_1=> begin;
BEGIN
cnagy=> insert into test values (1);
INSERT 165068987 1

session_2=> begin;
BEGIN
session_2=> insert into test values (1);

[session_2 is now waiting]

session_1=> commit;
COMMIT

[session_2 wakes up]
ERROR:  duplicate key violates unique constraint "test_pkey"

So it looks like predicate locking is already in place for primary key conditions...

Cheers,
Csaba.
Csaba Nagy wrote: > session_1=> create table test (col smallint primary key); > NOTICE: CREATE TABLE / PRIMARY KEY will create implicit index > "test_pkey" for table "test" > CREATE TABLE > session_1=> begin; > BEGIN > cnagy=> insert into test values (1); > INSERT 165068987 1 > > session_2=> begin; > BEGIN > session_2=> insert into test values (1); > > [session_2 is now waiting] This only happens because of the unique index. There's no predicate locking involved. The btree code goes some lengths to make this work; it would be probably simple to modify this to support MERGE or REPLACE on the limited cases where there's a UNIQUE index. Tom has already said this twice (on this thread only; he has already said it before IIRC.) -- Alvaro Herrera http://www.CommandPrompt.com/ PostgreSQL Replication, Consulting, Custom Development, 24x7 support
Well, from my point of view it is a special case of predicate locking supported well by existing code, in this case the unique index (you said that, I'm not familiar with the code). I don't see why this cannot be capitalized on to implement a subset of what predicate locking is, based on the mechanism already existing. I guess most of the users who need some kind of merge, replace, or insert-or-update feature (or some other reason to block inserts/updates on a specific row) would be happy for now with the restriction that the condition must be backed by a unique index.

So basically the only thing I'm trying to say is that a partial implementation, which might be easily implementable (I might be wrong here), without too big a performance penalty, and which covers a very valid problem, is better than chasing the complete solution, which is either too complex to be implemented easily or could be easily implemented but then has performance disadvantages. The only thing is to label it correctly so people don't expect something it isn't.

Of course all this is hand-waving; I can't really help even if I wanted to, my C skills are less than mediocre.

Cheers,
Csaba.

On Tue, 2005-11-15 at 12:14, Alvaro Herrera wrote:
> Csaba Nagy wrote:
>
> > session_1=> create table test (col smallint primary key);
> > NOTICE: CREATE TABLE / PRIMARY KEY will create implicit index
> > "test_pkey" for table "test"
> > CREATE TABLE
> > session_1=> begin;
> > BEGIN
> > cnagy=> insert into test values (1);
> > INSERT 165068987 1
> >
> > session_2=> begin;
> > BEGIN
> > session_2=> insert into test values (1);
> >
> > [session_2 is now waiting]
>
> This only happens because of the unique index. There's no predicate
> locking involved. The btree code goes some lengths to make this work;
> it would be probably simple to modify this to support MERGE or REPLACE
> on the limited cases where there's a UNIQUE index. Tom has already said
> this twice (on this thread only; he has already said it before IIRC.)
On 11/15/05, Alvaro Herrera <alvherre@commandprompt.com> wrote:
> This only happens because of the unique index. There's no predicate
> locking involved. The btree code goes some lengths to make this work;
> it would be probably simple to modify this to support MERGE or REPLACE
> on the limited cases where there's a UNIQUE index. Tom has already said
> this twice (on this thread only; he has already said it before IIRC.)

And that's the only type of predicate locking we need for MySQL's REPLACE, because it needs a pk or unique index to know it has to replace; otherwise it inserts the row... that's the way it works, as the mysql docs say...

--
Atentamente,
Jaime Casanova
(DBA: DataBase Aniquilator ;)
Alvaro Herrera <alvherre@commandprompt.com> writes: > This only happens because of the unique index. There's no predicate > locking involved. The btree code goes some lengths to make this work; That's one way to look at it; the other is to say that we have predicate locking for a very specific class of predicate, ie, equality of a unique key. In practice I think we only have a useful lock there for *primary* keys, because unique without NOT NULL doesn't actually constrain you to just one matching row ... regards, tom lane
Tom Lane wrote:
> Alvaro Herrera <alvherre@commandprompt.com> writes:
> > This only happens because of the unique index. There's no predicate
> > locking involved. The btree code goes some lengths to make this work;
>
> That's one way to look at it; the other is to say that we have predicate
> locking for a very specific class of predicate, ie, equality of a
> unique key.
>
> In practice I think we only have a useful lock there for *primary* keys,
> because unique without NOT NULL doesn't actually constrain you to just
> one matching row ...

I agree --- an implementation that needs to use a table lock is useless, and one with no primary key is too hard to implement and also near useless. I have updated the TODO item to reflect this:

* Add MERGE command that does UPDATE/DELETE, or on failure, INSERT
  (rules, triggers?)

  To implement this cleanly requires that the table have a unique
  index so duplicate checking can be easily performed.

--
  Bruce Momjian                        |  http://candle.pha.pa.us
  pgman@candle.pha.pa.us               |  (610) 359-1001
  +  If your life is a hard drive,     |  13 Roberts Road
  +  Christ can be your backup.        |  Newtown Square, Pennsylvania 19073
Bruce Momjian wrote: > I agree --- an implementation that needs to use a table lock is > useless, and one with no primary key is too hard to implement and > also near useless. Well, there were just a couple of people saying the opposite. > I have update the TODO item to reflect this: > > * Add MERGE command that does UPDATE/DELETE, or on failure, INSERT > (rules, triggers?) > > To implement this cleanly requires that the table have a unique > index so duplicate checking can be easily performed. We're still trying to work out the semantic relationship between MERGE and REPLACE and what-we-actually-want. This entry doesn't seem to take that into account. -- Peter Eisentraut http://developer.postgresql.org/~petere/
On Fri, Nov 18, 2005 at 05:30:34PM +0100, Peter Eisentraut wrote:
> Bruce Momjian wrote:
> > I have update the TODO item to reflect this:
> >
> > * Add MERGE command that does UPDATE/DELETE, or on failure, INSERT
> >   (rules, triggers?)
> >
> >   To implement this cleanly requires that the table have a unique
> >   index so duplicate checking can be easily performed.
>
> We're still trying to work out the semantic relationship between MERGE
> and REPLACE and what-we-actually-want. This entry doesn't seem to take
> that into account.

Right. From my reading of the spec (which someone posted here somewhere), MERGE has no special rules regarding visibility. So it's just as susceptible to "duplicate keys" and "lost updates" as any current method. That doesn't diminish its usefulness; it's just not what some people thought it was.

My current position is that since REPLACE seems to violate normal transaction semantics (it must not fail no matter what other backends are doing), any implementation will probably have to play fancy footwork with locking and savepoints within a single statement. And that's not MERGE.

I'd say implement SQL MERGE, which doesn't have any really unusual features, and separately implement some kind of INSERT OR UPDATE which works only for a table with a primary key.

Have a nice day,
--
Martijn van Oosterhout <kleptog@svana.org> http://svana.org/kleptog/
> Patent. n. Genius is 5% inspiration and 95% perspiration. A patent is a
> tool for doing 5% of the work and then sitting around waiting for someone
> else to do the other 95% so you can sue them.
Peter Eisentraut wrote:
> Bruce Momjian wrote:
> > I agree --- an implementation that needs to use a table lock is
> > useless, and one with no primary key is too hard to implement and
> > also near useless.
>
> Well, there were just a couple of people saying the opposite.
>
> > I have update the TODO item to reflect this:
> >
> > * Add MERGE command that does UPDATE/DELETE, or on failure, INSERT
> >   (rules, triggers?)
> >
> >   To implement this cleanly requires that the table have a unique
> >   index so duplicate checking can be easily performed.
>
> We're still trying to work out the semantic relationship between MERGE
> and REPLACE and what-we-actually-want. This entry doesn't seem to take
> that into account.

Right. Once we are done I will update it.

--
  Bruce Momjian                        |  http://candle.pha.pa.us
  pgman@candle.pha.pa.us               |  (610) 359-1001
  +  If your life is a hard drive,     |  13 Roberts Road
  +  Christ can be your backup.        |  Newtown Square, Pennsylvania 19073
On Fri, Nov 18, 2005 at 09:03:25PM +0100, Martijn van Oosterhout wrote:
> I'd say implement SQL MERGE which doesn't have any really unusual
> features. And seperately implement some kind of INSERT OR UPDATE which
> works only for a table with a primary key.

Is there any reason this has to be a PK; shouldn't a unique index with no nullable fields work just as well? It seems bad to limit this to just a PK if we can avoid it.

For example, if you have something that's logging hits to web pages, you might have this table:

CREATE TABLE url (
    url_id  serial PRIMARY KEY,
    url     text NOT NULL UNIQUE
);

I prefer having url_id as the PK because it's how you normally access the table. But ISTM that there are cases where you'd want to be able to merge on two different sets of fields in one table, which is impossible if we limit it to PK merges only.

--
Jim C. Nasby, Sr. Engineering Consultant      jnasby@pervasive.com
Pervasive Software      http://pervasive.com    work: 512-231-6117
vcard: http://jim.nasby.net/pervasive.vcf       cell: 512-569-9461
On 11/11/2005 2:20 PM, Tom Lane wrote:
> Peter Eisentraut <peter_e@gmx.net> writes:
>> Tom Lane wrote:
>>> Surely they require a unique constraint --- else the behavior isn't
>>> even well defined, is it?
>
>> They require that the merge condition does not match for more than one
>> row, but since the merge condition can do just about anything, there is
>> no guarantee that a unique constraint encompasses it.
>
> ISTM to be a reasonable implementation restriction that there be a
> constraint by which the system can prove that there is at most one
> matching row. Per other comments in this thread, we'd not be the only
> implementation making such a restriction.
>
> (Certainly, if I were a DBA and were told that the performance of MERGE
> would go to hell in a handbasket if I had no such constraint, I'd make
> sure there was one. I don't think there is very much of a use-case for
> the general scenario.)

Such a restriction does look reasonable. Especially because ...

The largest problem I see with MERGE is the question of BEFORE triggers. Consider a BEFORE INSERT trigger that modifies a third table, after which the constraint, or whatever post-heap_insert check we might use, detects a conflict. How do we undo the actions of the BEFORE trigger? The only way to do that is to plan the query as a nestloop, with the USING part as the outer loop. If the (updating) scan of the INTO relation did not hit any tuple, then do the INSERT. We can only undo the side effects of any BEFORE trigger by wrapping each and every nested INTO relation insert attempt in its own subtransaction.

Sure, we "could" of course do the insert and then rescan the whole thing with read-committed to see if our row is now the only one ... needless to say that in the case of a sequential scan inside the loop, that nestloop will suck big time even without that second scan. But ... hmmm ...
we could get away with that and if we don't find a constraint that will ensure uniqueness, then we do a rescan to check for it. But I would vote for a "please_no_notice_about_stupid_usage_of_merge" runtime option that suppresses the corresponding NOTICE. Jan -- #======================================================================# # It's easier to get forgiveness for being wrong than for being right. # # Let's break this rule - forgive me. # #================================================== JanWieck@Yahoo.com #
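The BEFORE-trigger hazard described above can be sketched as follows (all names here are invented for illustration; this is PostgreSQL trigger syntax of the era, not proposed MERGE internals):

```sql
-- A BEFORE INSERT trigger with a side effect on a third table:
CREATE FUNCTION log_attempt() RETURNS trigger AS $$
BEGIN
    INSERT INTO audit (msg) VALUES ('inserting id ' || NEW.id);
    RETURN NEW;
END;
$$ LANGUAGE plpgsql;

CREATE TRIGGER log_before BEFORE INSERT ON target
    FOR EACH ROW EXECUTE PROCEDURE log_attempt();

-- If MERGE's insert attempt then hits a duplicate key and falls back
-- to the UPDATE arm, the audit row written by the trigger must be
-- undone as well -- hence the per-attempt subtransaction.
```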
On Wed, Nov 23, 2005 at 04:55:25PM -0500, Jan Wieck wrote:
> The largest problem I see with MERGE is the question of BEFORE triggers.
> Consider a BEFORE INSERT trigger that modifies a third table, after
> which the constraint or whatever post-heap_insert-attempt we might use
> detects a conflict. How do we undo the actions of the BEFORE trigger?
> The only way to do that is to plan the query as a nestloop, with the
> USING part as the outer loop. If the (updating) scan of the INTO
> relation did not hit any tuple, then do the INSERT. We can only undo the
> side effects of any BEFORE trigger by wrapping each and evey nested INTO
> relation insert attempt into its own subtransaction.

Umm, if there are any errors you abort the transaction, just like any other case. ACID requires that either the whole statement is done, or none. If a trigger causes the INSERT or UPDATE to fail you have no choice but to abort the transaction.

Besides, someone posted an Oracle example; they don't require an index, so I don't think we realistically can say that people need one. If two concurrent MERGEs, which can't see each other's output, both end up INSERTing, that's not an error unless the user has a UNIQUE constraint, so the problem vanishes.

Have a nice day,
--
Martijn van Oosterhout <kleptog@svana.org> http://svana.org/kleptog/
> Patent. n. Genius is 5% inspiration and 95% perspiration. A patent is a
> tool for doing 5% of the work and then sitting around waiting for someone
> else to do the other 95% so you can sue them.
On 11/24/2005 1:30 AM, Martijn van Oosterhout wrote:
> On Wed, Nov 23, 2005 at 04:55:25PM -0500, Jan Wieck wrote:
>> The largest problem I see with MERGE is the question of BEFORE triggers.
>> Consider a BEFORE INSERT trigger that modifies a third table, after
>> which the constraint or whatever post-heap_insert-attempt we might use
>> detects a conflict. How do we undo the actions of the BEFORE trigger?
>
> Umm, if there are any errors you abort the transaction, just like any
> other case. ACID requires that either the whole statement is done, or
> none. If a trigger causes the INSERT or UPDATE to fail you have no
> choice but to abort the transaction.

I guess you misunderstood. What I am talking about is a problem in the order of execution. Since we don't have predicate locking, there is a possibility that our implementation of MERGE decides to do an INSERT while another transaction does the same. What has to happen is that the BEFORE INSERT trigger is called, then the heap tuple inserted, then the index tuples created. At this time, the duplicate key error occurs, telling us that we had a conflict and that we have to try an UPDATE instead. That means, in the end, this particular row's INSERT has never happened and we have to undo the BEFORE INSERT trigger's actions too.

> Besides, someone posted an example on Oracle, they don't require an
> index so I don't think we realistically can say that people need one.
> If two concurrent MERGEs, which can't see eachothers output, both end
> up INSERTing, that not an error unless the user has a UNIQUE
> constraint, so the problem vanishes.

Not following the semantics is an error.
MERGE is not supposed to do multiple inserts for the same match, concurrency or not. Jan -- #======================================================================# # It's easier to get forgiveness for being wrong than for being right. # # Let's break this rule - forgive me. # #================================================== JanWieck@Yahoo.com #
On Thu, Nov 24, 2005 at 11:11:34AM -0500, Jan Wieck wrote:
> On 11/24/2005 1:30 AM, Martijn van Oosterhout wrote:
> >Umm, if there are any errors you abort the transaction, just like any
> >other case. ACID requires that either the whole statement is done, or
> >none. If a trigger causes the INSERT or UPDATE to fail you have no
> >choice but to abort the transaction.
>
> I guess you misunderstood. What I am talking about is a problem in the
> order of execution. Since we don't have predicate locking, there is a
> possibility that our implementation of MERGE decides to do an INSERT
> while another transaction does the same. What has to happen is that the
> BEFORE INSERT trigger is called, then the heap tuple inserted, then the
> index tuples created. At this point the duplicate key error occurs,
> telling us that we had a conflict and that we have to try an UPDATE
> instead. That means that, in the end, this particular row's INSERT never
> happened, and we have to undo the BEFORE INSERT trigger's actions too.

But I'm not sure we're supposed to handle that case anyway. Oracle at
least doesn't require an index on the table being merged. And if I look
at it from a visibility viewpoint, if someone else does an INSERT in
another transaction, then MERGE cannot see it and thus it will INSERT
too. This isn't an error.

Consider the case of a deferred unique constraint (Postgres doesn't
support these yet, but bear with me). Say in one session you start a
transaction, do a MERGE and then a few other statements. In the
meantime, in another session, someone inserts a row into the table you
merged. Are you asserting that the first session should undo
everything, do an UPDATE instead of an INSERT, and redo all your
queries since? Obviously not.

The above relies on something Postgres doesn't support, but you would
be able to emulate the same without a unique key. For example:

Session 1:
  CREATE TEMP TABLE foo (id integer, val integer);
  BEGIN;
  SELECT * FROM foo;

Session 2:
  BEGIN;
  INSERT INTO foo VALUES (1, 3);

Session 1:
  MERGE INTO foo USING (SELECT 1) ON (foo.id = 1)
    WHEN MATCHED THEN UPDATE SET val = 1
    WHEN NOT MATCHED THEN INSERT (id, val) VALUES (1, 2);

Session 2:
  COMMIT;

Session 1:
  COMMIT;

Now, (IMO) in serializable mode, the MERGE should block on reading the
row inserted by the second session and, when that commits, do the
UPDATE, thus leaving you with a table with one row, (1,1). In read
committed mode, the MERGE shouldn't block and you should end up with a
table with two rows, (1,3) and (1,2). If you switch the order of the
INSERT and the MERGE you should get the same results in both cases,
(1,2) and (1,3).

I think you are arguing that the result should be (1,1) in all cases. I
honestly don't see how that is feasible, and it is certainly not
supported by my reading of the standard. I would be interested to know
how other databases handle the above case.

> Not following the semantics is an error. MERGE is not supposed to do
> multiple inserts for the same match, concurrency or not.

Yes, any single MERGE cannot insert twice. However, two MERGEs running
concurrently, or a MERGE with an INSERT/UPDATE/DELETE in another
session, could very well end up with multiple rows on the same key. I
maintain that MERGE has no special rules w.r.t. visibility or locking,
and we should not be acting as if it does. If at the end of the
transaction there is a duplicate key, we should throw the error and let
the client deal with it.

Have a nice day,
--
Martijn van Oosterhout <kleptog@svana.org> http://svana.org/kleptog/
> Patent. n. Genius is 5% inspiration and 95% perspiration. A patent is a
> tool for doing 5% of the work and then sitting around waiting for someone
> else to do the other 95% so you can sue them.
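[Editor's note: the read-committed outcome Martijn predicts can be
modeled with a toy snapshot in plain Python — each transaction decides
based only on the rows it can see, so session 1's MERGE cannot see
session 2's uncommitted (1, 3) and correctly takes the INSERT arm. All
names are illustrative; this is a model of the visibility argument,
not of any real executor.]

```python
committed = []                       # table foo, as visible after COMMITs

def merge(visible, key_id, update_val, insert_row):
    """Return the rows this MERGE would write, given what it can see."""
    matched = [r for r in visible if r[0] == key_id]
    if matched:
        return [(r[0], update_val) for r in matched]   # WHEN MATCHED
    return [insert_row]                                # WHEN NOT MATCHED

session2_pending = [(1, 3)]          # session 2: INSERT, not yet committed

# Session 1's snapshot holds committed rows only, so (1, 3) is invisible
# and the MERGE chooses its NOT MATCHED (insert) arm.
writes = merge(committed, key_id=1, update_val=1, insert_row=(1, 2))

committed.extend(session2_pending)   # session 2: COMMIT
committed.extend(writes)             # session 1: COMMIT

print(sorted(committed))  # [(1, 2), (1, 3)] -- two rows, no error
```

This is the "two rows, (1,3) and (1,2)" result: neither statement
misbehaved, the duplicate simply follows from ordinary snapshot
visibility — which is Martijn's point that MERGE needs no special rules.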
On 11/25/2005 7:14 AM, Martijn van Oosterhout wrote:
> On Thu, Nov 24, 2005 at 11:11:34AM -0500, Jan Wieck wrote:
>> I guess you misunderstood. [...]
>
> But I'm not sure we're supposed to handle that case anyway. Oracle at
> least doesn't require an index on the table being merged. And if I look
> at it from a visibility viewpoint, if someone else does an INSERT in
> another transaction, then MERGE cannot see it and thus it will INSERT
> too. This isn't an error.

Hmmm ... so you maintain that MERGE without an explicit LOCK TABLE,
done by the user before performing the MERGE, can create duplicate rows
(WRT the merge condition) and consequently raise a duplicate key error
if there is a UNIQUE constraint.

If that is what the standard describes, then it can be implemented
without any sort of index or constraint requirement. The query tree for
MERGE will have the INTO relation as a left outer join. In the case of
a match on this outer join, one set of target list expressions is used
to form the result tuple containing the INTO relation's ctid. That
result tuple is usable for heap_update() or heap_delete(). In the case
of no match, another set of target list expressions is used, suitable
for heap_insert(). This way, MERGE will work with one single sequential
scan of the INTO relation in case there is no suitable index.


Jan

--
#======================================================================#
# It's easier to get forgiveness for being wrong than for being right. #
# Let's break this rule - forgive me.                                  #
#================================================== JanWieck@Yahoo.com #
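[Editor's note: Jan's left-outer-join execution model can be sketched
in a few lines of Python, with a list of dicts standing in for the INTO
relation and the list index standing in for the ctid. A hypothetical
sketch of the control flow only — matched join rows carry a row
identifier and feed the UPDATE arm; unmatched ones feed the INSERT arm,
in one pass with no index.]

```python
def merge(target, source, key, update_fn, insert_fn):
    """target: list of dicts (the INTO relation; list index plays ctid).
    source: list of dicts (the USING relation)."""
    actions = []
    for src in source:
        # left outer join of source against target on the merge condition
        matches = [i for i, t in enumerate(target) if t[key] == src[key]]
        if matches:
            for ctid in matches:               # WHEN MATCHED: carry ctid
                actions.append(("update", ctid, src))
        else:                                  # WHEN NOT MATCHED
            actions.append(("insert", None, src))
    for kind, ctid, src in actions:            # apply the decided actions
        if kind == "update":
            target[ctid] = update_fn(target[ctid], src)
        else:
            target.append(insert_fn(src))
    return target

tab = [{"id": 1, "val": 10}]
merge(tab,
      [{"id": 1, "val": 99}, {"id": 2, "val": 7}],
      key="id",
      update_fn=lambda old, src: {**old, "val": src["val"]},
      insert_fn=lambda src: dict(src))
print(tab)  # [{'id': 1, 'val': 99}, {'id': 2, 'val': 7}]
```

The inner list scan per source row corresponds to the single sequential
scan Jan mentions when no suitable index exists on the INTO relation.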
On Fri, Nov 25, 2005 at 09:14:47AM -0500, Jan Wieck wrote:
> Hmmm ... so you maintain that MERGE without an explicit LOCK TABLE,
> done by the user before performing the MERGE, can create duplicate rows
> (WRT the merge condition) and consequently raise a duplicate key error
> if there is a UNIQUE constraint.
>
> If that is what the standard describes, then it can be implemented
> without any sort of index or constraint requirement. The query tree for
> MERGE will have the INTO relation as a left outer join. In the case of
> a match on this outer join, one set of target list expressions is used
> to form the result tuple containing the INTO relation's ctid. That
> result tuple is usable for heap_update() or heap_delete(). In the case
> of no match, another set of target list expressions is used, suitable
> for heap_insert(). This way, MERGE will work with one single sequential
> scan of the INTO relation in case there is no suitable index.

Yes, that's the way I read the standard and how I was thinking it could
be implemented. It also simplifies the case suggested by the people who
want an atomic REPLACE, because you only have one statement to repeat
until you get success:

  do
      MERGE ...;
  while (not error and modified_rows <> 1);

I was thinking that we could make a separate REPLACE command which can
only insert or update one row, but can do it atomically. Basically a
loop like the above with built-in savepoints.

Alternatively, we could special-case the MERGE-without-USING case, in
the cases where the plan simply devolves into an index scan, but I
don't like special-casing. If we're going to have special semantics we
should have a separate statement so it's clear that it's special.

Have a nice day,
--
Martijn van Oosterhout <kleptog@svana.org> http://svana.org/kleptog/
> Patent. n. Genius is 5% inspiration and 95% perspiration. A patent is a
> tool for doing 5% of the work and then sitting around waiting for someone
> else to do the other 95% so you can sue them.
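[Editor's note: the "loop until exactly one row was modified, with
built-in savepoints" REPLACE that Martijn sketches can be written
client-side today. Below is one such retry loop using SQLite from
Python; the function name and schema are invented for the example, and
the IntegrityError branch is the path a concurrent inserter would
trigger.]

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (id INTEGER PRIMARY KEY, val INTEGER)")

def replace(conn, id_, val):
    """Loop until exactly one row is modified: try UPDATE first; if no
    row matched, attempt an INSERT inside a savepoint. A duplicate-key
    error there means a concurrent writer inserted the key between our
    UPDATE and INSERT, so roll back the savepoint and retry the UPDATE."""
    while True:
        cur = conn.execute("UPDATE t SET val = ? WHERE id = ?", (val, id_))
        if cur.rowcount == 1:
            return "updated"
        conn.execute("SAVEPOINT rep")
        try:
            conn.execute("INSERT INTO t VALUES (?, ?)", (id_, val))
            conn.execute("RELEASE SAVEPOINT rep")
            return "inserted"
        except sqlite3.IntegrityError:
            conn.execute("ROLLBACK TO SAVEPOINT rep")  # undo, then retry
            conn.execute("RELEASE SAVEPOINT rep")

print(replace(conn, 1, 10))  # inserted
print(replace(conn, 1, 20))  # updated
print(conn.execute("SELECT id, val FROM t").fetchall())  # [(1, 20)]
```

A dedicated REPLACE command would, as proposed, push this whole loop —
including the savepoint bookkeeping — into a single atomic statement on
the server side.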