Thread: MERGE vs REPLACE
It seems to me that it has always been implicitly assumed around here that the MERGE command would be a substitute for a MySQL-like REPLACE functionality. After rereading the spec it seems that this is not the case. MERGE always operates on two different tables, which REPLACE doesn't do. That said, what kind of support for insert-or-update-this-row do we want to provide, if any? Should it be a REPLACE command, an extension of the INSERT command, a modication of the MERGE syntax, or something else? -- Peter Eisentraut http://developer.postgresql.org/~petere/
Peter Eisentraut wrote: > It seems to me that it has always been implicitly assumed around here > that the MERGE command would be a substitute for a MySQL-like REPLACE > functionality. After rereading the spec it seems that this is not the > case. MERGE always operates on two different tables, which REPLACE > doesn't do. > > That said, what kind of support for insert-or-update-this-row do we want > to provide, if any? Should it be a REPLACE command, an extension of > the INSERT command, a modication of the MERGE syntax, or something > else? > MERGE of course, it's standard, REPLACE is mysql extension -- Regards Petr Jelinek (PJMODOS) www.parba.cz
On 11/11/05, Peter Eisentraut <peter_e@gmx.net> wrote: > It seems to me that it has always been implicitly assumed around here > that the MERGE command would be a substitute for a MySQL-like REPLACE > functionality. After rereading the spec it seems that this is not the > case. MERGE always operates on two different tables, which REPLACE > doesn't do. > > That said, what kind of support for insert-or-update-this-row do we want > to provide, if any? Should it be a REPLACE command, an extension of > the INSERT command, a modication of the MERGE syntax, or something > else? > > -- > Peter Eisentraut > http://developer.postgresql.org/~petere/ > MERGE seems to me the better option... not just because is standard but at least i can see some use cases for it... -- regards, Jaime Casanova (DBA: DataBase Aniquilator ;)
Jaime Casanova wrote: > MERGE seems to me the better option... not just because is standard > but at least i can see some use cases for it... I don't think you understand my message: MERGE does not do what REPLACE does. -- Peter Eisentraut http://developer.postgresql.org/~petere/
Peter Eisentraut <peter_e@gmx.net> writes: > It seems to me that it has always been implicitly assumed around here > that the MERGE command would be a substitute for a MySQL-like REPLACE > functionality. After rereading the spec it seems that this is not the > case. MERGE always operates on two different tables, which REPLACE > doesn't do. Normally I'd plump for following the standard ... but AFAIR, we have had bucketloads of requests for REPLACE functionality, and not one request for spec-compatible MERGE. If, as it appears, full-spec MERGE is also a whole lot harder and slower than REPLACE, it seems that we could do worse than to concentrate on doing REPLACE for now. (We can always come back to MERGE some other day.) regards, tom lane
On 11/11/05, Peter Eisentraut <peter_e@gmx.net> wrote: > Jaime Casanova wrote: > > MERGE seems to me the better option... not just because is standard > > but at least i can see some use cases for it... > > I don't think you understand my message: MERGE does not do what REPLACE > does. > > -- > Peter Eisentraut > http://developer.postgresql.org/~petere/ > I understand you well... what i was trying to say is that i prefer MERGE (standard SQL command) to be done because the functionally it has (basically a merge of two tables) seems to me to be more usefull than REPLACE (MySql Command)... -- regards, Jaime Casanova (DBA: DataBase Aniquilator ;)
Guys, > I understand you well... what i was trying to say is that i prefer > MERGE (standard SQL command) to be done because the functionally it > has (basically a merge of two tables) seems to me to be more usefull > than REPLACE (MySql Command)... But even REPLACE requires predicate locking. There's no real way to get around it. --Josh -- --Josh Josh Berkus Aglio Database Solutions San Francisco
On 11/11/05, Josh Berkus <josh@agliodbs.com> wrote: > Guys, > > > I understand you well... what i was trying to say is that i prefer > > MERGE (standard SQL command) to be done because the functionally it > > has (basically a merge of two tables) seems to me to be more usefull > > than REPLACE (MySql Command)... > > But even REPLACE requires predicate locking. There's no real way to get > around it. > > --Josh > why? seems that REPLACE only work if there are at least one row matching... -- Atentamente, Jaime Casanova (DBA: DataBase Aniquilator ;)
Jaime, > why? seems that REPLACE only work if there are at least one row > matching... Scenario: session1: REPLACE .... 1 session2: REPLACE ..... 1 session1: check to see that "1" exists .... no session2: check to see that "1" exists .... no session1: INSERT 1 session2: INSERT 1 .... ERROR Get the picture? The only way to avoid a race condition is to be able to do "predicate locking", that is to lock the table against any data write matching that predicate. -- --Josh Josh Berkus Aglio Database Solutions San Francisco
On Fri, 2005-11-11 at 14:40 -0800, Josh Berkus wrote: > Jaime, > > > why? seems that REPLACE only work if there are at least one row > > matching... > Get the picture? The only way to avoid a race condition is to be able to > do "predicate locking", that is to lock the table against any data write > matching that predicate. So? That is what save points are for. You can even skip the select for update if you don't mind dead tuples from the attempted insert. SELECT ... FOR UPDATE; IF not exists THENSAVEPOINT;INSERT ;IF UNIQUE VIOLATION THEN /* Someone else inserted between the SELECT and our INSERT*/ ROLLBACK TO SAVEPOINT; UPDATE;ELSE RELEASE SAVEPOINT;FI ELSEUPDATE; FI --
Josh Berkus <josh@agliodbs.com> writes: > But even REPLACE requires predicate locking. There's no real way to get > around it. The point though is that REPLACE is restricted to a type of predicate narrow enough to be enforced through a unique-index mechanism, and so it's implementable without solving the general case of predicate locking. Predicate locking for narrow cases isn't very hard; it's the general case of arbitrary predicates that's hard. regards, tom lane
On Fri, 11 Nov 2005, Josh Berkus wrote: > Jaime, > > > why? seems that REPLACE only work if there are at least one row > > matching... > > Scenario: > > session1: REPLACE .... 1 > session2: REPLACE ..... 1 > session1: check to see that "1" exists .... no > session2: check to see that "1" exists .... no > session1: INSERT 1 > session2: INSERT 1 .... ERROR > > Get the picture? The only way to avoid a race condition is to be able to > do "predicate locking", that is to lock the table against any data write > matching that predicate. When it comes to predicate locking, I think we should defer to Peter's comment at Open DB Con: http://www.treehou.se/~swm/peter_merge.jpg Gavin
On Fri, Nov 11, 2005 at 06:00:32PM -0500, Rod Taylor wrote: > So? That is what save points are for. You can even skip the select for > update if you don't mind dead tuples from the attempted insert. > SELECT ... FOR UPDATE; > IF not exists THEN > SAVEPOINT; > INSERT ; > IF UNIQUE VIOLATION THEN > /* Someone else inserted between the SELECT and our INSERT */ > ROLLBACK TO SAVEPOINT; > UPDATE; > ELSE > RELEASE SAVEPOINT; > FI > ELSE > UPDATE; > FI Isn't there still a race between INSERT and UPDATE? Low probability, for sure, as it would have had to not exist, then exist, then not exist, but still possible. I'd like a REPLACE that could be safe, or at least cause a COMMIT to fail, for this reason. Cheers, mark -- mark@mielke.cc / markm@ncf.ca / markm@nortel.com __________________________ . . _ ._ . . .__ . . ._. .__ . . . .__ | Neighbourhood Coder |\/| |_| |_| |/ |_ |\/| | |_ | |/ |_ | | | | | | \ | \ |__ . | | .|. |__ |__ | \ |__ | Ottawa, Ontario, Canada One ring to rule them all, one ring to find them, one ring to bring them all and in the darkness bindthem... http://mark.mielke.cc/
On Fri, 2005-11-11 at 18:36 -0500, mark@mark.mielke.cc wrote: > On Fri, Nov 11, 2005 at 06:00:32PM -0500, Rod Taylor wrote: > > So? That is what save points are for. You can even skip the select for > > update if you don't mind dead tuples from the attempted insert. > > SELECT ... FOR UPDATE; > > IF not exists THEN > > SAVEPOINT; > > INSERT ; > > IF UNIQUE VIOLATION THEN > > /* Someone else inserted between the SELECT and our INSERT */ > > ROLLBACK TO SAVEPOINT; > > UPDATE; > > ELSE > > RELEASE SAVEPOINT; > > FI > > ELSE > > UPDATE; > > FI > > Isn't there still a race between INSERT and UPDATE? I suppose there is although I hadn't noticed before. I've never run into it and always check to ensure the expected number of tuples were touched by the update or delete. Within the PostgreSQL backend you might get away with having your insert hold a lock on the index page and follow it up with a FOR UPDATE lock on the offending tuple thus ensuring that your update will succeed. If you hack index mechanisms for the support you don't need the SAVEPOINT either -- just don't throw an error when you run across the existing entry. For client side code one possibility is to repeat until successful. WHILESELECT FOR UPDATE;IF NOT EXISTS THEN SAVEPOINT INSERT; IF UNIQUE VIOLATION THEN ROLLBACK TO SAVEPOINT; ELSE RELEASE SAVEPOINT EXIT; FIELSE UPDATE; EXIT;END -- Check for infinite loop END --
Tom Lane wrote: > Peter Eisentraut <peter_e@gmx.net> writes: >> It seems to me that it has always been implicitly assumed around here >> that the MERGE command would be a substitute for a MySQL-like REPLACE >> functionality. After rereading the spec it seems that this is not the >> case. MERGE always operates on two different tables, which REPLACE >> doesn't do. > > Normally I'd plump for following the standard ... but AFAIR, we have had > bucketloads of requests for REPLACE functionality, and not one request > for spec-compatible MERGE. If, as it appears, full-spec MERGE is also a > whole lot harder and slower than REPLACE, it seems that we could do > worse than to concentrate on doing REPLACE for now. (We can always come > back to MERGE some other day.) I would also like to add that MySQL's REPLACE is not exactly an INSERT OR UPDATE, rather and INSERT OR (DELETE then INSERT): I mean that the fields not specified in the query are set to their defaults: i.e. CREATE TABLE t (a int PRIMARY KEY, b int, c int); INSERT INTO t (a, b, c) VALUES (1, 1, 2); SELECT * FROM t; +---+------+------+ | a | b | c | +---+------+------+ | 1 | 1 | 2 | +---+------+------+ REPLACE INTO t (a, b) VALUES (1, 1); SELECT * FROM t; +---+------+------+ | a | b | c | +---+------+------+ | 1 | 1 | NULL | +---+------+------+ I wanted to point it out this because people are commonly mistaking this. Best regards -- Matteo Beccati http://phpadsnew.com http://phppgads.com
On Saturday 12 November 2005 04:06, Matteo Beccati wrote: > Tom Lane wrote: > > Peter Eisentraut <peter_e@gmx.net> writes: > >> It seems to me that it has always been implicitly assumed around here > >> that the MERGE command would be a substitute for a MySQL-like REPLACE > >> functionality. After rereading the spec it seems that this is not the > >> case. MERGE always operates on two different tables, which REPLACE > >> doesn't do. > > > > Normally I'd plump for following the standard ... but AFAIR, we have had > > bucketloads of requests for REPLACE functionality, and not one request > > for spec-compatible MERGE. If, as it appears, full-spec MERGE is also a > > whole lot harder and slower than REPLACE, it seems that we could do > > worse than to concentrate on doing REPLACE for now. (We can always come > > back to MERGE some other day.) > > I would also like to add that MySQL's REPLACE is not exactly an INSERT > OR UPDATE, rather and INSERT OR (DELETE then INSERT): I mean that the > fields not specified in the query are set to their defaults: > > i.e. > > CREATE TABLE t (a int PRIMARY KEY, b int, c int); > > INSERT INTO t (a, b, c) VALUES (1, 1, 2); > > SELECT * FROM t; > +---+------+------+ > > | a | b | c | > > +---+------+------+ > > | 1 | 1 | 2 | > > +---+------+------+ > > REPLACE INTO t (a, b) VALUES (1, 1); > > SELECT * FROM t; > +---+------+------+ > > | a | b | c | > > +---+------+------+ > > | 1 | 1 | NULL | > > +---+------+------+ > > > I wanted to point it out this because people are commonly mistaking this. > > Wow, that seems ugly.... maybe there's a reason for it, but I'm not sure we could deviate from my$ql's behavior on this even if we wanted... they are the "standard" here. -- Robert Treat Build A Brighter Lamp :: Linux Apache {middleware} PostgreSQL
On 11/13/05, Robert Treat <xzilla@users.sourceforge.net> wrote: > On Saturday 12 November 2005 04:06, Matteo Beccati wrote: > > | 1 | 1 | NULL | > Wow, that seems ugly.... maybe there's a reason for it, but I'm not sure we > could deviate from my$ql's behavior on this even if we wanted... they are the > "standard" here. I don't think that's ugly, I think that's exactly working as advertised. Replace behaves exactly like deleting the record with the matching primary key and inserting the provided input. ... not merging together old data with new.
On Sunday 13 November 2005 10:01, Gregory Maxwell wrote: > On 11/13/05, Robert Treat <xzilla@users.sourceforge.net> wrote: > > On Saturday 12 November 2005 04:06, Matteo Beccati wrote: > > > | 1 | 1 | NULL | > > > > Wow, that seems ugly.... maybe there's a reason for it, but I'm not sure > > we could deviate from my$ql's behavior on this even if we wanted... they > > are the "standard" here. > > I don't think that's ugly, I think that's exactly working as > advertised. Replace behaves exactly like deleting the record with the > matching primary key and inserting the provided input. ... not merging > together old data with new. I disagree in that REPLACE is advertised as a solution for the INSERT else UPDATE problem, but has a different behavior than a true INSERT else UPDATE would produce. Maybe that's a problem with the implementation, or maybe it's a problem in the advertisment, but there is certainly a discrepency there. -- Robert Treat Build A Brighter Lamp :: Linux Apache {middleware} PostgreSQL
Robert Treat <xzilla@users.sourceforge.net> writes: > I disagree in that REPLACE is advertised as a solution for the INSERT else > UPDATE problem, but has a different behavior than a true INSERT else UPDATE > would produce. Maybe that's a problem with the implementation, or maybe > it's a problem in the advertisment, but there is certainly a discrepency > there. Yeah. REPLACE fails to solve common examples like a web hit counter ("if key doesn't exist, insert row with count 1; if it does exist, add 1 to the current count"). IIRC, SQL's MERGE deals with this by offering two quite separate specifications of what to do when there is or isn't already a matching row. I don't necessarily feel that we have to slavishly duplicate what MySQL offers. I do think that it's reasonable to restrict the functionality to updating/replacing a row with matching primary key --- that gets us out of the problem of needing a full predicate-locking mechanism, while still covering most all of the practical use-cases that I can see. It'd be useful to look at what comparable functionality is offered by other DBs besides MySQL. Anyone know what DB2 or Oracle have in this area? regards, tom lane
Tom Lane wrote: > > It'd be useful to look at what comparable functionality is offered by > other DBs besides MySQL. Anyone know what DB2 or Oracle have in this > area? > IIRC they both have MERGE. -- Regards Petr Jelinek (PJMODOS)
Petr Jelinek wrote: > Tom Lane wrote: > >> >> It'd be useful to look at what comparable functionality is offered by >> other DBs besides MySQL. Anyone know what DB2 or Oracle have in this >> area? > Oracle: http://www.psoug.org/reference/merge.html http://www.psoug.org/reference/translate_replace.html But the oracle replace seems completely different to the topic at hand. DB2: Merge: http://www.databasejournal.com/features/db2/article.php/10896_3322041_2 I was not able to easily find information on REPLACE. Joshua D. Drake -- The PostgreSQL Company - Command Prompt, Inc. 1.503.667.4564 PostgreSQL Replication, Consulting, Custom Development, 24x7 support Managed Services, Shared and Dedicated Hosting Co-Authors: plPHP, plPerlNG - http://www.commandprompt.com/
Tom Lane wrote: > IIRC, SQL's MERGE deals with this by offering two quite separate > specifications of what to do when there is or isn't already a > matching row. In that regard, MERGE is quite flexible, but MERGE doesn't address the point of REPLACE, because MERGE requires *two* tables as input, whereas REPLACE only takes *one*. Unless someone can show that you can trick MERGE into doing the REPLACE job anyway, we're not discussing the same thing. -- Peter Eisentraut http://developer.postgresql.org/~petere/
Peter Eisentraut wrote: > > In that regard, MERGE is quite flexible, but MERGE doesn't address the > point of REPLACE, because MERGE requires *two* tables as input, whereas > REPLACE only takes *one*. Unless someone can show that you can trick > MERGE into doing the REPLACE job anyway, we're not discussing the same > thing. > I am really not db expert and I don't have copy of sql standard but you don't need to use 2 tables I think - USING part can also be subquery (some SELECT) and if I am right then you could simulate what REPLACE does because in PostgreSQL you are not forced to specify FROM clause in SELECT. So you could in theory do MERGE INTO tablename USING (SELECT 1 AS myid) ON (tablename.id = myid) ... But I am not sure if this is what you want. -- Regards Petr Jelinek (PJMODOS)
On Sun, Nov 13, 2005 at 11:32:47PM +0100, Petr Jelinek wrote: > I am really not db expert and I don't have copy of sql standard but you > don't need to use 2 tables I think - USING part can also be subquery > (some SELECT) and if I am right then you could simulate what REPLACE > does because in PostgreSQL you are not forced to specify FROM clause in > SELECT. So you could in theory do > MERGE INTO tablename USING (SELECT 1 AS myid) ON (tablename.id = myid) ... > But I am not sure if this is what you want. Well, the obvious extension to this is that the extire USING clause is in fact optional: MERGE INTO tablename ON id = 1 ... Which starts looking a lot simpler. BTW, my reading of the MERGE examples given earlier is that there no notes in there at all about guarenteeing concurrency. None of the documentation says that using MERGE will avoid duplicate key errors if someone else does the same thing concurrently. It seems more like a performence hack to avoid scanning the table twice. Basically, you could implement this by taking the USING clause, do a left outer join with the merge table and for the blank rows fill in a CTID for insert and instead of NULLs the values of the INSERT portion. Which is kind of a bummer for the people who want to do the "insert zero if not there else add 1" thing a lot and expecting this to solve the concurrency for them. Have a nice day, -- Martijn van Oosterhout <kleptog@svana.org> http://svana.org/kleptog/ > Patent. n. Genius is 5% inspiration and 95% perspiration. A patent is a > tool for doing 5% of the work and then sitting around waiting for someone > else to do the other 95% so you can sue them.
On 11/13/05, Petr Jelinek wrote: > > I am really not db expert and I don't have copy of sql standard but you > don't need to use 2 tables I think - USING part can also be subquery > (some SELECT) and if I am right then you could simulate what REPLACE > does because in PostgreSQL you are not forced to specify FROM clause in > SELECT. So you could in theory do > MERGE INTO tablename USING (SELECT 1 AS myid) ON (tablename.id = myid) ... I think the MySQL statement: REPLACE INTO table (pk, col1, col2, col3) VALUES (2, '0000-00-00', NULL, 3) would translate into the following MERGE statement: MERGE INTO table target USING (2 as pknew , NULL as col1new, NULL as col2new, 3 as col3new) source ON target.pknew = source.pk WHEN MATCHED THEN UPDATE SET col1 = col1new, col2 = col2new, col3 = col3new WHEN NOT MATCHED THEN INSERT (pk, col1, col2, col3) VALUES (pknew, col1new, col2new, col3new) It might not be the most elegant solution, but I don't see why it won't work. Jochem
I think you translated it correctly, MySQL has another way of specifying this which is "INSERT ... ON DUPLICATE KEY UPDATE ..." (http://dev.mysql.com/doc/refman/5.0/en/insert.html) Regards Paolo Jochem van Dieten <jochemd@gmail.com> ha scritto > On 11/13/05, Petr Jelinek wrote: > > > > I am really not db expert and I don't have copy of sql standard but you > > don't need to use 2 tables I think - USING part can also be subquery > > (some SELECT) and if I am right then you could simulate what REPLACE > > does because in PostgreSQL you are not forced to specify FROM clause in > > SELECT. So you could in theory do > > MERGE INTO tablename USING (SELECT 1 AS myid) ON (tablename.id = myid) ... > I think the MySQL statement: > REPLACE INTO table (pk, col1, col2, col3) VALUES (2, '0000-00-00', NULL, 3) > would translate into the following MERGE statement: > MERGE INTO table target > USING (2 as pknew , NULL as col1new, NULL as col2new, 3 as col3new) source > ON target.pknew = source.pk > WHEN MATCHED THEN UPDATE SET col1 = col1new, col2 = col2new, col3 = col3new > WHEN NOT MATCHED THEN INSERT (pk, col1, col2, col3) VALUES (pknew, > col1new, col2new, col3new) > It might not be the most elegant solution, but I don't see why it won't work. > Jochem > ---------------------------(end of broadcast)--------------------------- > TIP 4: Have you searched our list archives? > http://archives.postgresql.org >
See 'merge_db' in http://lnk.nu/postgresql.org/5sl.html On Fri, Nov 11, 2005 at 10:07:07PM -0500, Rod Taylor wrote: > On Fri, 2005-11-11 at 18:36 -0500, mark@mark.mielke.cc wrote: > > On Fri, Nov 11, 2005 at 06:00:32PM -0500, Rod Taylor wrote: > > > So? That is what save points are for. You can even skip the select for > > > update if you don't mind dead tuples from the attempted insert. > > > SELECT ... FOR UPDATE; > > > IF not exists THEN > > > SAVEPOINT; > > > INSERT ; > > > IF UNIQUE VIOLATION THEN > > > /* Someone else inserted between the SELECT and our INSERT */ > > > ROLLBACK TO SAVEPOINT; > > > UPDATE; > > > ELSE > > > RELEASE SAVEPOINT; > > > FI > > > ELSE > > > UPDATE; > > > FI > > > > Isn't there still a race between INSERT and UPDATE? > > I suppose there is although I hadn't noticed before. I've never run into > it and always check to ensure the expected number of tuples were touched > by the update or delete. > > Within the PostgreSQL backend you might get away with having your insert > hold a lock on the index page and follow it up with a FOR UPDATE lock on > the offending tuple thus ensuring that your update will succeed. If you > hack index mechanisms for the support you don't need the SAVEPOINT > either -- just don't throw an error when you run across the existing > entry. > > For client side code one possibility is to repeat until successful. > > WHILE > SELECT FOR UPDATE; > IF NOT EXISTS THEN > SAVEPOINT > INSERT; > IF UNIQUE VIOLATION THEN > ROLLBACK TO SAVEPOINT; > ELSE > RELEASE SAVEPOINT > EXIT; > FI > ELSE > UPDATE; > EXIT; > END > > -- Check for infinite loop > END > > -- > > > ---------------------------(end of broadcast)--------------------------- > TIP 1: if posting/reading through Usenet, please send an appropriate > subscribe-nomail command to majordomo@postgresql.org so that your > message can get through to the mailing list cleanly > -- Jim C. Nasby, Sr. Engineering Consultant jnasby@pervasive.com Pervasive Software http://pervasive.com work: 512-231-6117 vcard: http://jim.nasby.net/pervasive.vcf cell: 512-569-9461
On Fri, Nov 11, 2005 at 03:42:38PM -0500, Tom Lane wrote: > Peter Eisentraut <peter_e@gmx.net> writes: > > It seems to me that it has always been implicitly assumed around here > > that the MERGE command would be a substitute for a MySQL-like REPLACE > > functionality. After rereading the spec it seems that this is not the > > case. MERGE always operates on two different tables, which REPLACE > > doesn't do. > > Normally I'd plump for following the standard ... but AFAIR, we have had > bucketloads of requests for REPLACE functionality, and not one request > for spec-compatible MERGE. If, as it appears, full-spec MERGE is also a > whole lot harder and slower than REPLACE, it seems that we could do > worse than to concentrate on doing REPLACE for now. (We can always come > back to MERGE some other day.) I suspect a lot of those requests are from people who actually want merge and don't realize that mysql has a replace. On another note, is there any reason we can't put an equivalent to example 36-1 (http://lnk.nu/postgresql.org/617.html) into the backend? Presumably it wouldn't be as fast as a more elegant solution, but OTOH it'd probably be faster than plpgsql... -- Jim C. Nasby, Sr. Engineering Consultant jnasby@pervasive.com Pervasive Software http://pervasive.com work: 512-231-6117 vcard: http://jim.nasby.net/pervasive.vcf cell: 512-569-9461
On Sun, 2005-11-13 at 23:56 +0100, Martijn van Oosterhout wrote: > On Sun, Nov 13, 2005 at 11:32:47PM +0100, Petr Jelinek wrote: > > I am really not db expert and I don't have copy of sql standard but you > > don't need to use 2 tables I think - USING part can also be subquery > > (some SELECT) and if I am right then you could simulate what REPLACE > > does because in PostgreSQL you are not forced to specify FROM clause in > > SELECT. So you could in theory do > > MERGE INTO tablename USING (SELECT 1 AS myid) ON (tablename.id = myid) ... > > But I am not sure if this is what you want. > > Well, the obvious extension to this is that the extire USING clause is > in fact optional: > > MERGE INTO tablename ON id = 1 ... > > Which starts looking a lot simpler. > ...choosing a place to jump in is a little hard in this thread...so I'll jump in with some general info and thoughts on topics so far: MERGE is useful both for OLTP systems and for Data Warehousing, where it is sometimes known as the UPSERT. The MERGE statement in SQL:2003 requires a target table and a table statement. I don't see anything in that to always require two separate tables - this is just the same as a self-referencing INSERT SELECT statement. The USING clause is also a compulsory part of SQL:2003. One of the more interesting ways to use MERGE is with Oracle external tables. The same idea for us would be to have MERGE become a variant of the PostgreSQL COPY FROM command. That would be very cool. The above is the reason why MERGE doesn't seem to provide for external data being passed, as does INSERT or MySQL REPLACE. Neither DB2 or Oracle perform predicate locking. DB2 is more cautious, and some would say more efficient, thats all. PostgreSQL's locking features are just fine for pragmatic implementation of MERGE, AFAICS. Where there is doubt, we should fall back to table locking just like the rest of the world, IMHO. Making this work with partitioning will be hard enough without overusing the predicate solving logic. The UPSERT concept is also supported by Teradata, who simply append an ELSE INSERT clause onto the standard UPDATE syntax. MySQL REPLACE seems to me to be a fairly small subset of MERGE functionality and we ought to be able to offer that functionality as a side branch of the main work. I've been looking at ways of doing INSERT then UPDATE, but it doesn't seem very easy to avoid unique index violations in that case. So doing the UPDATE first then INSERTs later seems like the way to go. Best Regards, Simon Riggs
Simon, > The UPSERT concept is also supported by Teradata, who simply append an > ELSE INSERT clause onto the standard UPDATE syntax. MySQL REPLACE seems > to me to be a fairly small subset of MERGE functionality and we ought to > be able to offer that functionality as a side branch of the main work. Yes, I guess my hesitation on the full-table-lock strategy is that it doesn't really fulfill the mandate for why people want REPLACE-like statements ... to give them an INSERT-or-UPDATE with *higher* efficiency and concurrency than doing two statements. That being said, I've personally designed more than a dozen web applications and have not yet been faced with a single circumstance of not knowing whether I wanted to INSERT or UPDATE. I've even ported MySQL apps and found it easy to re-code them to do "if $id = 0, then insert ..." without even needing to use a pl/pgsql hack. So we thus have two seperate use cases. The first, for bulk loading/ETL is what MERGE fulfills rather neatly and for that full table locking is perfectly OK, even desirable. You really don't want to MERGE-load the same table on two threads at once. The second case is for applications coded for MySQL; this is the REPLACE case. However, the most common MySQL applications doing this use full table locking (MyISAM) anyway! So, while full table locking wouldn't gain them any performance over using two statements, it shouldn't lose them anything they're used to having. -- --Josh Josh Berkus Aglio Database Solutions San Francisco
On Tue, Nov 15, 2005 at 10:27:10AM -0800, Josh Berkus wrote: > Yes, I guess my hesitation on the full-table-lock strategy is that it > doesn't really fulfill the mandate for why people want REPLACE-like > statements ... to give them an INSERT-or-UPDATE with *higher* efficiency > and concurrency than doing two statements. That being said, I've > personally designed more than a dozen web applications and have not yet > been faced with a single circumstance of not knowing whether I wanted to > INSERT or UPDATE. I've even ported MySQL apps and found it easy to > re-code them to do "if $id = 0, then insert ..." without even needing to > use a pl/pgsql hack. > So we thus have two seperate use cases. The first, for bulk loading/ETL is > what MERGE fulfills rather neatly and for that full table locking is > perfectly OK, even desirable. You really don't want to MERGE-load the > same table on two threads at once. Yes. > The second case is for applications coded for MySQL; this is the REPLACE > case. However, the most common MySQL applications doing this use full > table locking (MyISAM) anyway! So, while full table locking wouldn't gain > them any performance over using two statements, it shouldn't lose them > anything they're used to having. The last two times I wished to use it: Objects with dynamic fields. The table rows are identified by the object key, a field key, that refer to a text value. I believe I still have a race in this case, as I allow INSERT, UPDATE, and DELETE. The DELETE means that an UPDATE, followed by an INSERT, may fail, if the DELETE occurs between the two. DELETE is fine as is, as it is a single operation, that won't fail. Combining UPDATE and INSERT into one, as per the MySQL REPLACE, would eliminate my race caused by the DELETE. If REPLACE locked the whole table, it would not be a candidate, as this table is frequently modified by many users at once. REPLACE using the primary key index to do locking would be great. Timestamping the last signin and access time (~30 seconds error allowed in the access time to eliminate a majority of the updates using memcached as we all know how much PostgreSQL hates this) for a table that doesn't have a 1:1 mapping between the fields associated with the user, and the row that records their last signin/access time. The signin/access time can be associated with a context, that is a part of the primary key. Currently, I rely on the code never deleting rows, and perform update first, and if the update does not change any rows, I fall back to insert. As it is, though, I ended up combining the signin and access time into the same row, to use one table instead of two, and I'm not sure that MySQL replace really gives me what I want in this situation. I have never used MERGE, so can't say whether this would do what I want. I would not want the solution to lock the entire table, as these updates, although throttled by the the ~30 seconds error allowed, and memcached queries, would be potentially performed by hundreds of users every 30 seconds. I can extend the error factor to 60 seconds, but that only cuts the queries in half. In general, however, the current model does work fine, and doesn't require replacement. UPDATE will be used most of the time, and be the correct operation. Unfortunately, there is a race that I see here. If both UPDATE operations fail at the same time, then one of the INSERTS will fail. I don't handle this currently, but perhaps I could argue myself into considering this an ignorable failure. Cheers, mark -- mark@mielke.cc / markm@ncf.ca / markm@nortel.com __________________________ . . _ ._ . . .__ . . ._. .__ . . . .__ | Neighbourhood Coder |\/| |_| |_| |/ |_ |\/| | |_ | |/ |_ | | | | | | \ | \ |__ . | | .|. |__ |__ | \ |__ | Ottawa, Ontario, Canada One ring to rule them all, one ring to find them, one ring to bring them all and in the darkness bindthem... http://mark.mielke.cc/
* Tom Lane (tgl@sss.pgh.pa.us) wrote: > Normally I'd plump for following the standard ... but AFAIR, we have had > bucketloads of requests for REPLACE functionality, and not one request > for spec-compatible MERGE. If, as it appears, full-spec MERGE is also a > whole lot harder and slower than REPLACE, it seems that we could do > worse than to concentrate on doing REPLACE for now. (We can always come > back to MERGE some other day.) Not to be too much of a pain, but I asked for full-spec MERGE a while back... :) I don't think I was the only one asking for full-spec MERGE in the "What features would you like to see in Postgres?" thread a while ago either, though I could be wrong. I'd like to see MySQL-like 'replace' too, of course. :) Thanks, Stephen
On 11/15/05, Josh Berkus <josh@agliodbs.com> wrote: > Simon, > > > The UPSERT concept is also supported by Teradata, who simply append an > > ELSE INSERT clause onto the standard UPDATE syntax. MySQL REPLACE seems > > to me to be a fairly small subset of MERGE functionality and we ought to > > be able to offer that functionality as a side branch of the main work. > > Yes, I guess my hesitation on the full-table-lock strategy is that it > doesn't really fulfill the mandate for why people want REPLACE-like > statements ... to give them an INSERT-or-UPDATE with *higher* efficiency > and concurrency than doing two statements. That being said, I've > personally designed more than a dozen web applications and have not yet > been faced with a single circumstance of not knowing whether I wanted to > INSERT or UPDATE. I've even ported MySQL apps and found it easy to > re-code them to do "if $id = 0, then insert ..." without even needing to > use a pl/pgsql hack. > Actually REPLACE is not INSERT or UPDATE... REPLACE means INSERT if already exists DELETE then INSERT can be used as an UPDATE if you use the SET clause but, it is optional -- regards, Jaime Casanova (DBA: DataBase Aniquilator ;)
On Tue, 2005-11-15 at 10:27 -0800, Josh Berkus wrote: > > The UPSERT concept is also supported by Teradata, who simply append an > > ELSE INSERT clause onto the standard UPDATE syntax. MySQL REPLACE seems > > to me to be a fairly small subset of MERGE functionality and we ought to > > be able to offer that functionality as a side branch of the main work. > > Yes, I guess my hesitation on the full-table-lock strategy is that it > doesn't really fulfill the mandate for why people want REPLACE-like > statements ... to give them an INSERT-or-UPDATE with *higher* efficiency > and concurrency than doing two statements. Agreed... OK, what I said was "...Where there is doubt, we should fall back to table locking just like the rest of the world, IMHO." I didn't mean we should use full table locking all of the time. Sorry if I wasn't clear. What I meant, in context was, IMHO - we do no need predicate locking - we should use row level locks when these can be used - we should use table level locks other times On Fri, 2005-11-11 at 14:40 -0800, Josh Berkus wrote: > Scenario: > > session1: REPLACE .... 1 > session2: REPLACE ..... 1 > session1: check to see that "1" exists .... no > session2: check to see that "1" exists .... no > session1: INSERT 1 > session2: INSERT 1 .... ERROR > > Get the picture? The only way to avoid a race condition is to be able to > do "predicate locking", that is to lock the table against any data write > matching that predicate. The above situation happens now if you have two people doing SELECT then INSERT on the same row. Or even if you do UPDATE then INSERT. Currently if two people INSERT a row with a duplicate PK, we may find that both INSERTs made it into a data block, but one has its txn aborted because of a unique index violation. Nobody complains about that, nor says we should have predicate locking for that case, so why worry about it for MERGE? I don't see we need a special effort to avoid: if you try to do the same thing more than once, only the first one will work. At least you get a nice error message to let you know its happened. Seems like we - run the query in the USING clause - join it to the target table using pseudo outer join logic - if it matches we UPDATE - if it doesn't we INSERT ...but we do the last two in a single step, to minimise the window of opportunity for wierd situations. > That being said, I've > personally designed more than a dozen web applications and have not yet > been faced with a single circumstance of not knowing whether I wanted to > INSERT or UPDATE. I've even ported MySQL apps and found it easy to > re-code them to do "if $id = 0, then insert ..." without even needing to > use a pl/pgsql hack. OK, but not everybody knows what they're doing as well as you do. :-) > So we thus have two seperate use cases. The first, for bulk loading/ETL is > what MERGE fulfills rather neatly and for that full table locking is > perfectly OK, even desirable. You really don't want to MERGE-load the > same table on two threads at once. Probably true, but I do want to avoid full table locking for MERGE whenever possible. Concurrency is important even in data warehousing. Best Regards, Simon Riggs
Josh Berkus wrote: > Jaime, > > > why? seems that REPLACE only work if there are at least one row > > matching... > > Scenario: > > session1: REPLACE .... 1 > session2: REPLACE ..... 1 > session1: check to see that "1" exists .... no > session2: check to see that "1" exists .... no > session1: INSERT 1 > session2: INSERT 1 .... ERROR > > Get the picture? The only way to avoid a race condition is to be able to > do "predicate locking", that is to lock the table against any data write > matching that predicate. Isn't the standard practice to do the INSERT, and if that fails, do the UPDATE? In fact, if someone is already doing a INSERT inside a transaction, it will wait for the transaction to complete so you can then do the UPDATE, or complete the INSERT if the transaction aborted. -- Bruce Momjian | http://candle.pha.pa.us pgman@candle.pha.pa.us | (610) 359-1001+ If your life is a hard drive, | 13 Roberts Road + Christ can be your backup. | Newtown Square, Pennsylvania19073
Tom Lane wrote: > Josh Berkus <josh@agliodbs.com> writes: > > But even REPLACE requires predicate locking. There's no real way to get > > around it. > > The point though is that REPLACE is restricted to a type of predicate > narrow enough to be enforced through a unique-index mechanism, and so > it's implementable without solving the general case of predicate > locking. > > Predicate locking for narrow cases isn't very hard; it's the general > case of arbitrary predicates that's hard. My feeling is we should implement MERGE for the limited cases we can, and throw an error for cases we can not (or require table locking), and then see what reports we get from users. -- Bruce Momjian | http://candle.pha.pa.us pgman@candle.pha.pa.us | (610) 359-1001+ If your life is a hard drive, | 13 Roberts Road + Christ can be your backup. | Newtown Square, Pennsylvania19073
Simon Riggs wrote: > I've been looking at ways of doing INSERT then UPDATE, but it doesn't > seem very easy to avoid unique index violations in that case. So doing > the UPDATE first then INSERTs later seems like the way to go. INSERT has to be first to avoid a race condition (see my previous emails). We just need a way to get a unique index violation and continue with the UPDATE. We have savepoints now so it certainly seems possible. -- Bruce Momjian | http://candle.pha.pa.us pgman@candle.pha.pa.us | (610) 359-1001+ If your life is a hard drive, | 13 Roberts Road + Christ can be your backup. | Newtown Square, Pennsylvania19073
Josh Berkus wrote: > So we thus have two seperate use cases. The first, for bulk loading/ETL is > what MERGE fulfills rather neatly and for that full table locking is > perfectly OK, even desirable. You really don't want to MERGE-load the > same table on two threads at once. > > The second case is for applications coded for MySQL; this is the REPLACE > case. However, the most common MySQL applications doing this use full > table locking (MyISAM) anyway! So, while full table locking wouldn't gain > them any performance over using two statements, it shouldn't lose them > anything they're used to having. For any kind of efficiency, I assume MySQL REPLACE wants a unique index in place, so practially everyone doing merge probably already has the setup we need to avoid new non-index predicate locking code. -- Bruce Momjian | http://candle.pha.pa.us pgman@candle.pha.pa.us | (610) 359-1001+ If your life is a hard drive, | 13 Roberts Road + Christ can be your backup. | Newtown Square, Pennsylvania19073
Just to summarize, MySQL REPLACE is INSERT or DELETE/INSERT, while they have a SET clauses that allows UPDATE, and INSERT has a ON DUPLICATE KEY UPDATE clause too. I think the INSERT ... ON DUPLICATE KEY is undesirable because this functionality should have a new keyword in the first position, e.g MERGE, and I think the REPLACE is out because most people feel that the DELETE/INSERT functionality is near-useless if we can give users the INSERT/UPDATE functionality of MERGE. I think even if we have to restrict MERGE to requiring a unique index, it is better to go that way than to drag REPLACE into our syntax. MERGE can be extended over time, while REPLACE has a non-optimal initial behavior. The idea that MERGE can use a constant list (not requiring a second table) makes it a valid replacement for REPLACE, and other database support for MERGE reinforces this. --------------------------------------------------------------------------- pmagnoli@systemevolution.it wrote: > I think you translated it correctly, MySQL has another way of specifying this > which is "INSERT ... ON DUPLICATE KEY UPDATE ..." > (http://dev.mysql.com/doc/refman/5.0/en/insert.html) > Regards > > Paolo > > Jochem van Dieten <jochemd@gmail.com> ha scritto > > > On 11/13/05, Petr Jelinek wrote: > > > > > > I am really not db expert and I don't have copy of sql standard but you > > > don't need to use 2 tables I think - USING part can also be subquery > > > (some SELECT) and if I am right then you could simulate what REPLACE > > > does because in PostgreSQL you are not forced to specify FROM clause in > > > SELECT. So you could in theory do > > > MERGE INTO tablename USING (SELECT 1 AS myid) ON (tablename.id = myid) > ... > > > I think the MySQL statement: > > REPLACE INTO table (pk, col1, col2, col3) VALUES (2, '0000-00-00', NULL, > 3) > > > would translate into the following MERGE statement: > > MERGE INTO table target > > USING (2 as pknew , NULL as col1new, NULL as col2new, 3 as col3new) source > > ON target.pknew = source.pk > > WHEN MATCHED THEN UPDATE SET col1 = col1new, col2 = col2new, col3 = > col3new > > WHEN NOT MATCHED THEN INSERT (pk, col1, col2, col3) VALUES (pknew, > > col1new, col2new, col3new) > > > It might not be the most elegant solution, but I don't see why it won't > work. > > > Jochem > > > ---------------------------(end of broadcast)--------------------------- > > TIP 4: Have you searched our list archives? > > > http://archives.postgresql.org > > > > > > > ---------------------------(end of broadcast)--------------------------- > TIP 4: Have you searched our list archives? > > http://archives.postgresql.org > -- Bruce Momjian | http://candle.pha.pa.us pgman@candle.pha.pa.us | (610) 359-1001+ If your life is a hard drive, | 13 Roberts Road + Christ can be your backup. | Newtown Square, Pennsylvania19073
On Tue, Nov 15, 2005 at 07:16:21PM -0500, Bruce Momjian wrote: > Tom Lane wrote: > > Josh Berkus <josh@agliodbs.com> writes: > > > But even REPLACE requires predicate locking. There's no real way to get > > > around it. > > > > The point though is that REPLACE is restricted to a type of predicate > > narrow enough to be enforced through a unique-index mechanism, and so > > it's implementable without solving the general case of predicate > > locking. > > > > Predicate locking for narrow cases isn't very hard; it's the general > > case of arbitrary predicates that's hard. > > My feeling is we should implement MERGE for the limited cases we can, > and throw an error for cases we can not (or require table locking), and > then see what reports we get from users. We should probably throw a notice or warning if we go to a table lock, too. -- Jim C. Nasby, Sr. Engineering Consultant jnasby@pervasive.com Pervasive Software http://pervasive.com work: 512-231-6117 vcard: http://jim.nasby.net/pervasive.vcf cell: 512-569-9461
> We should probably throw a notice or warning if we go to a table lock, > too. That's not very useful, because you can only do somethign about it AFTER the 1 hour exclusive lock merge has already run :) Chris
On Wed, Nov 16, 2005 at 01:27:29PM +0800, Christopher Kings-Lynne wrote: > >We should probably throw a notice or warning if we go to a table lock, > >too. > > That's not very useful, because you can only do somethign about it AFTER > the 1 hour exclusive lock merge has already run :) Not true; it would be useful for development when you'd like to know that some statement is grabbing a table lock. This is something that you wouldn't normally notice in a dev environment, and it sounds like it'd be easy to do a merge that has the unintended effect of grabbing a table lock. -- Jim C. Nasby, Sr. Engineering Consultant jnasby@pervasive.com Pervasive Software http://pervasive.com work: 512-231-6117 vcard: http://jim.nasby.net/pervasive.vcf cell: 512-569-9461
Christopher Kings-Lynne <chriskl@familyhealth.com.au> writes: >> We should probably throw a notice or warning if we go to a table lock, >> too. > That's not very useful, because you can only do somethign about it AFTER > the 1 hour exclusive lock merge has already run :) We shouldn't do anything remotely like that. A statement whose locking effects can't be predicted on sight is horrid both from the user's viewpoint and from the implementation viewpoint. In particular, if we have to do planning before we can determine whether the table needs just a SELECT lock or something stronger, then we have to take a weak lock to do the planning and then we are faced with upgrading to the stronger lock at runtime. Can you say "deadlock risk"? I think we should do REPLACE-like functionality that simply fails if the match condition isn't equality on a primary key. If we can use SQL-spec MERGE syntax for this, that's fine, but let's not think in terms of silently changing to a stronger table lock and a much slower implementation when the condition isn't a primary key. That's a whole lot of work that isn't solving any real-world problems, and *is* creating a foot-gun for people to cause themselves performance and deadlock problems anytime they spell the WHERE condition slightly wrong. regards, tom lane
I agree. I would never ever ever want it to silently start doing table locks. I would simply avoid using merge at all if that was a possibility. However it seems like the idea is to eventually flesh out full fledged merge. And to do that it sounds like you would need to do one of the following: 1) implement predicate locking beyond the simple "match on unique index" case that we have here 2) do full table locks. It sounds like #1 isn't going to happen for a while. So in order to do more complicated merges you will need to do #2. If you are going to implement more complicated merge functionality I certainly wouldn't want it throwing a warning telling me about a table lock if I had already knew it would get the table lock and decided I wanted to go ahead with using merge anyway. Could you let the user create the lock himself to handle this situation? For instance: analyze the merge if merge condition matches unique indexmerge without table locking elseif needed table lock already existsmerge elsethrow an error You could also just add something to the merge syntax like ALLOW TABLE LOCK or something. The idea is just that the user can explicitly allow the table lock and thus the more complicated merge. I don't really know anything about the implementation details but that is the behavior that I would prefer. That way I could always do a complicated merge if I wanted to but there is no way it would ever do an implicit table lock on me. And it would never throw an error/ warning unless I actually did something questionable. Does that make sense. Rick Gigger On Nov 16, 2005, at 7:49 AM, Tom Lane wrote: > Christopher Kings-Lynne <chriskl@familyhealth.com.au> writes: >>> We should probably throw a notice or warning if we go to a table >>> lock, >>> too. > >> That's not very useful, because you can only do somethign about it >> AFTER >> the 1 hour exclusive lock merge has already run :) > > We shouldn't do anything remotely like that. A statement whose > locking > effects can't be predicted on sight is horrid both from the user's > viewpoint and from the implementation viewpoint. In particular, if we > have to do planning before we can determine whether the table needs > just > a SELECT lock or something stronger, then we have to take a weak > lock to > do the planning and then we are faced with upgrading to the stronger > lock at runtime. Can you say "deadlock risk"? > > I think we should do REPLACE-like functionality that simply fails > if the > match condition isn't equality on a primary key. If we can use SQL- > spec > MERGE syntax for this, that's fine, but let's not think in terms of > silently changing to a stronger table lock and a much slower > implementation when the condition isn't a primary key. That's a whole > lot of work that isn't solving any real-world problems, and *is* > creating a foot-gun for people to cause themselves performance and > deadlock problems anytime they spell the WHERE condition slightly > wrong. > > regards, tom lane > > ---------------------------(end of > broadcast)--------------------------- > TIP 1: if posting/reading through Usenet, please send an appropriate > subscribe-nomail command to majordomo@postgresql.org so that > your > message can get through to the mailing list cleanly >
> You could also just add something to the merge syntax like ALLOW > TABLE LOCK or something. The idea is just that the user can > explicitly allow the table lock and thus the more complicated merge. > The problem here is that many people will see that option and think it's safe to do it... i mean, many people will shoot themselves in the foot and the culprit will be PostgreSQL because he let a ready to shoot gun in a visible place when are kids around -- regards, Jaime Casanova (DBA: DataBase Aniquilator ;)
Interesting approach. Actually, we could tell the user they have to use BEGIN;LOCK tab before doing MERGE, and throw an error if we don't already have a table lock. --------------------------------------------------------------------------- Rick Gigger wrote: > I agree. I would never ever ever want it to silently start doing > table locks. I would simply avoid using merge at all if that was a > possibility. > > However it seems like the idea is to eventually flesh out full > fledged merge. And to do that it sounds like you would need to do > one of the following: > > 1) implement predicate locking beyond the simple "match on unique > index" case that we have here > 2) do full table locks. > > It sounds like #1 isn't going to happen for a while. So in order to > do more complicated merges you will need to do #2. If you are going > to implement more complicated merge functionality I certainly > wouldn't want it throwing a warning telling me about a table lock if > I had already knew it would get the table lock and decided I wanted > to go ahead with using merge anyway. > > Could you let the user create the lock himself to handle this > situation? For instance: > > analyze the merge > if merge condition matches unique index > merge without table locking > elseif needed table lock already exists > merge > else > throw an error > > You could also just add something to the merge syntax like ALLOW > TABLE LOCK or something. The idea is just that the user can > explicitly allow the table lock and thus the more complicated merge. > > I don't really know anything about the implementation details but > that is the behavior that I would prefer. That way I could always do > a complicated merge if I wanted to but there is no way it would ever > do an implicit table lock on me. And it would never throw an error/ > warning unless I actually did something questionable. > > Does that make sense. > > Rick Gigger > > On Nov 16, 2005, at 7:49 AM, Tom Lane wrote: > > > Christopher Kings-Lynne <chriskl@familyhealth.com.au> writes: > >>> We should probably throw a notice or warning if we go to a table > >>> lock, > >>> too. > > > >> That's not very useful, because you can only do somethign about it > >> AFTER > >> the 1 hour exclusive lock merge has already run :) > > > > We shouldn't do anything remotely like that. A statement whose > > locking > > effects can't be predicted on sight is horrid both from the user's > > viewpoint and from the implementation viewpoint. In particular, if we > > have to do planning before we can determine whether the table needs > > just > > a SELECT lock or something stronger, then we have to take a weak > > lock to > > do the planning and then we are faced with upgrading to the stronger > > lock at runtime. Can you say "deadlock risk"? > > > > I think we should do REPLACE-like functionality that simply fails > > if the > > match condition isn't equality on a primary key. If we can use SQL- > > spec > > MERGE syntax for this, that's fine, but let's not think in terms of > > silently changing to a stronger table lock and a much slower > > implementation when the condition isn't a primary key. That's a whole > > lot of work that isn't solving any real-world problems, and *is* > > creating a foot-gun for people to cause themselves performance and > > deadlock problems anytime they spell the WHERE condition slightly > > wrong. > > > > regards, tom lane > > > > ---------------------------(end of > > broadcast)--------------------------- > > TIP 1: if posting/reading through Usenet, please send an appropriate > > subscribe-nomail command to majordomo@postgresql.org so that > > your > > message can get through to the mailing list cleanly > > > > > ---------------------------(end of broadcast)--------------------------- > TIP 3: Have you checked our extensive FAQ? > > http://www.postgresql.org/docs/faq > -- Bruce Momjian | http://candle.pha.pa.us pgman@candle.pha.pa.us | (610) 359-1001+ If your life is a hard drive, | 13 Roberts Road + Christ can be your backup. | Newtown Square, Pennsylvania19073
On 11/16/05, Bruce Momjian <pgman@candle.pha.pa.us> wrote: > > Interesting approach. Actually, we could tell the user they have to use > BEGIN;LOCK tab before doing MERGE, and throw an error if we don't > already have a table lock. > If the lock will be required, what's the problem in doing it internally? without user interaction? -- regards, Jaime Casanova (DBA: DataBase Aniquilator ;)
Jaime Casanova <systemguards@gmail.com> writes: > If the lock will be required, what's the problem in doing it > internally? I already explained that: lock upgrading is deadlock-prone. regards, tom lane
Jaime Casanova wrote: > On 11/16/05, Bruce Momjian <pgman@candle.pha.pa.us> wrote: > > > > Interesting approach. Actually, we could tell the user they have to use > > BEGIN;LOCK tab before doing MERGE, and throw an error if we don't > > already have a table lock. > > > > If the lock will be required, what's the problem in doing it > internally? without user interaction? Because first, we are making it visible to the user, and second the lock is taken out _before_ we actually execute the statement, meaning we don't have to escalate our locks. -- Bruce Momjian | http://candle.pha.pa.us pgman@candle.pha.pa.us | (610) 359-1001+ If your life is a hard drive, | 13 Roberts Road + Christ can be your backup. | Newtown Square, Pennsylvania19073
On Wed, Nov 16, 2005 at 11:37:46AM -0500, Bruce Momjian wrote: > > Interesting approach. Actually, we could tell the user they have to use > BEGIN;LOCK tab before doing MERGE, and throw an error if we don't > already have a table lock. The bit I'm still missing is why there needs to be a lock at all. The SQL standard doesn't say anywhere that concurrent MERGE operations can't conflict. It seems to me that standard visibility rules apply. If neither MERGE statement can see the results of the other, then they will both INSERT. If you don't have a UNIQUE constraint to prevent this then what's the problem? It seems to me people would like, in the case of an existing UNIQUE constraint, to be able to use it to prevent "duplicate key" errors. This is nice, but the standard doesn't require that either. In other words, if we can use an index to avoid duplicate key errors, fine. But if there is no index available, it is not an error to do an INSERT because another INSERT was hidden from you. Conceptually, a MERGE statement is just a long string of INSERTs and UPDATEs in the same transaction and I think we should treat it as such. Have a nice day, -- Martijn van Oosterhout <kleptog@svana.org> http://svana.org/kleptog/ > Patent. n. Genius is 5% inspiration and 95% perspiration. A patent is a > tool for doing 5% of the work and then sitting around waiting for someone > else to do the other 95% so you can sue them.
Martijn van Oosterhout wrote: -- Start of PGP signed section. > On Wed, Nov 16, 2005 at 11:37:46AM -0500, Bruce Momjian wrote: > > > > Interesting approach. Actually, we could tell the user they have to use > > BEGIN;LOCK tab before doing MERGE, and throw an error if we don't > > already have a table lock. > > The bit I'm still missing is why there needs to be a lock at all. The > SQL standard doesn't say anywhere that concurrent MERGE operations > can't conflict. It seems to me that standard visibility rules apply. If > neither MERGE statement can see the results of the other, then they > will both INSERT. If you don't have a UNIQUE constraint to prevent this > then what's the problem? I assume they want MERGE because they don't want duplicates. If they don't care, they would have used INSERT. -- Bruce Momjian | http://candle.pha.pa.us pgman@candle.pha.pa.us | (610) 359-1001+ If your life is a hard drive, | 13 Roberts Road + Christ can be your backup. | Newtown Square, Pennsylvania19073
On Wed, 2005-11-16 at 18:34 +0100, Martijn van Oosterhout wrote: > On Wed, Nov 16, 2005 at 11:37:46AM -0500, Bruce Momjian wrote: > > > > Interesting approach. Actually, we could tell the user they have to use > > BEGIN;LOCK tab before doing MERGE, and throw an error if we don't > > already have a table lock. > > The bit I'm still missing is why there needs to be a lock at all. The > SQL standard doesn't say anywhere that concurrent MERGE operations > can't conflict. It seems to me that standard visibility rules apply. If > neither MERGE statement can see the results of the other, then they > will both INSERT. If you don't have a UNIQUE constraint to prevent this > then what's the problem? > > It seems to me people would like, in the case of an existing UNIQUE > constraint, to be able to use it to prevent "duplicate key" errors. > This is nice, but the standard doesn't require that either. > > In other words, if we can use an index to avoid duplicate key errors, > fine. But if there is no index available, it is not an error to do an > INSERT because another INSERT was hidden from you. > > Conceptually, a MERGE statement is just a long string of INSERTs and > UPDATEs in the same transaction and I think we should treat it as > such. Agreed. Best Regards, Simon Riggs
On Wed, Nov 16, 2005 at 12:59:36PM -0500, Bruce Momjian wrote: > Martijn van Oosterhout wrote: > > The bit I'm still missing is why there needs to be a lock at all. The > > SQL standard doesn't say anywhere that concurrent MERGE operations > > can't conflict. It seems to me that standard visibility rules apply. If > > neither MERGE statement can see the results of the other, then they > > will both INSERT. If you don't have a UNIQUE constraint to prevent this > > then what's the problem? > > I assume they want MERGE because they don't want duplicates. If they > don't care, they would have used INSERT. The whole point of MERGE was because you can do different things on INSERT or UPDATE. If you didn't care about the UPDATE case you could indeed do just inserts... I just don't think we should tie ourselves in knots over details that the spec doesn't require. If the spec doesn't require "predicate locking" then why on earth should we provide it if it's so difficult? Have a nice day, -- Martijn van Oosterhout <kleptog@svana.org> http://svana.org/kleptog/ > Patent. n. Genius is 5% inspiration and 95% perspiration. A patent is a > tool for doing 5% of the work and then sitting around waiting for someone > else to do the other 95% so you can sue them.
> -----Original Message----- > From: pgsql-hackers-owner@postgresql.org [mailto:pgsql-hackers- > owner@postgresql.org] On Behalf Of Simon Riggs > Sent: Wednesday, November 16, 2005 10:35 AM > To: Martijn van Oosterhout > Cc: Bruce Momjian; Rick Gigger; Tom Lane; Christopher Kings-Lynne; Jim C. > Nasby; josh@agliodbs.com; pgsql-hackers@postgresql.org; Jaime Casanova; > Peter Eisentraut > Subject: Re: [HACKERS] MERGE vs REPLACE > > On Wed, 2005-11-16 at 18:34 +0100, Martijn van Oosterhout wrote: > > On Wed, Nov 16, 2005 at 11:37:46AM -0500, Bruce Momjian wrote: > > > > > > Interesting approach. Actually, we could tell the user they have to > use > > > BEGIN;LOCK tab before doing MERGE, and throw an error if we don't > > > already have a table lock. > > > > The bit I'm still missing is why there needs to be a lock at all. The > > SQL standard doesn't say anywhere that concurrent MERGE operations > > can't conflict. It seems to me that standard visibility rules apply. If > > neither MERGE statement can see the results of the other, then they > > will both INSERT. If you don't have a UNIQUE constraint to prevent this > > then what's the problem? > > > > It seems to me people would like, in the case of an existing UNIQUE > > constraint, to be able to use it to prevent "duplicate key" errors. > > This is nice, but the standard doesn't require that either. > > > > In other words, if we can use an index to avoid duplicate key errors, > > fine. But if there is no index available, it is not an error to do an > > INSERT because another INSERT was hidden from you. > > > > Conceptually, a MERGE statement is just a long string of INSERTs and > > UPDATEs in the same transaction and I think we should treat it as > > such. Merge could also be considered as a long string of deletes and inserts. I guess that deleting those records that already exist and then inserting all of the records is faster because it could be done like a single join to perform the delete and then a single batch insert.
On Wed, Nov 16, 2005 at 11:06:15AM -0800, Dann Corbit wrote: > > -----Original Message----- > > From: pgsql-hackers-owner@postgresql.org [mailto:pgsql-hackers- > > owner@postgresql.org] On Behalf Of Simon Riggs > > > Conceptually, a MERGE statement is just a long string of INSERTs and > > > UPDATEs in the same transaction and I think we should treat it as > > > such. > > Merge could also be considered as a long string of deletes and inserts. > I guess that deleting those records that already exist and then > inserting all of the records is faster because it could be done like a > single join to perform the delete and then a single batch insert. And for us it makes no difference because in MVCC, UPDATE == DELETE + INSERT. IMHO it's just a nice construct to specify UPDATEs and INSERTs in the same statement. Have a nice day, -- Martijn van Oosterhout <kleptog@svana.org> http://svana.org/kleptog/ > Patent. n. Genius is 5% inspiration and 95% perspiration. A patent is a > tool for doing 5% of the work and then sitting around waiting for someone > else to do the other 95% so you can sue them.
>> Conceptually, a MERGE statement is just a long string of INSERTs and >> UPDATEs in the same transaction and I think we should treat it as >> such. I've just got one question about this. Everyone seems to be saying that "try to insert and if that fails update" is the same as "try to insert and if that fails delete and then insert". What about the following scenario: mytable ------------ id serial primary key, a int4, b int4, data text I've got an id field on every table because it simplifies a lot of things (such as slony configuration for example) But I've also got a unique key on (a, b) and if I was to do a merge I would most likely do it in (a, b) not id. If merge does a delete insert then it creates new values for the id columns which could cause me problems. Basically any default fields are going to change or for that matter any fields not specified would be reinitialized whereas an update would leave them in place. It seems to me that "try to update and if that fails insert" seems to be the best approach for not messing with existing data. I guess "try to insert and if that fails update" gets you the same effect. - Rick Gigger
On Wed, Nov 16, 2005 at 09:49:28AM -0500, Tom Lane wrote: > I think we should do REPLACE-like functionality that simply fails if the > match condition isn't equality on a primary key. If we can use SQL-spec > MERGE syntax for this, that's fine, but let's not think in terms of > silently changing to a stronger table lock and a much slower > implementation when the condition isn't a primary key. That's a whole I agree, but would like to relax the primary key requirement to simply a unique index. I can see use cases for unique so long as not null keys, so it would be nice if the MERGE operation would work for these. As nulls are not "equal" anyway this doesn't seem to do too much violence to the semantics. -dg -- David Gould daveg@sonic.net If simplicity worked, the world would be overrun with insects.
daveg <daveg@sonic.net> writes: > I agree, but would like to relax the primary key requirement to simply > a unique index. I can see use cases for unique so long as not null keys, > so it would be nice if the MERGE operation would work for these. As nulls > are not "equal" anyway this doesn't seem to do too much violence to the > semantics. But a "unique" key doesn't guarantee that there's only one matching row, so ISTM you're right back to needing a predicate lock if you do that. regards, tom lane
On Wed, Nov 16, 2005 at 04:51:07PM -0500, Tom Lane wrote: > daveg <daveg@sonic.net> writes: > > I agree, but would like to relax the primary key requirement to simply > > a unique index. I can see use cases for unique so long as not null keys, > > so it would be nice if the MERGE operation would work for these. As nulls > > are not "equal" anyway this doesn't seem to do too much violence to the > > semantics. > > But a "unique" key doesn't guarantee that there's only one matching row, > so ISTM you're right back to needing a predicate lock if you do that. But there is no need to guarentee anything. As the spec says, if the join of the table with the other clauses matches a row in the table more than once, raise a cardinality exception. If someone creates a join that matches more than once the whole statement fails. But you can work that out at runtime. If the user specifies NOT NULL in the join condition then it can work and there no reason to forbid that. Have a nice day, -- Martijn van Oosterhout <kleptog@svana.org> http://svana.org/kleptog/ > Patent. n. Genius is 5% inspiration and 95% perspiration. A patent is a > tool for doing 5% of the work and then sitting around waiting for someone > else to do the other 95% so you can sue them.
Well, from my point of view it is more than delete and insert. That I can do right now with existing infrastructure. The problem I try to solve is something along: a bunch of clients try to update a count, and ONE of them must initialize the count if it does not exist... this can't be done with current infrastructure without race conditions. Our current solution is to initialize all the possible counts beforehand, but that suboptimal as only a few of them will actually have data coming in later... And of course the import problem... we can have multiple concurrent imports, which must insert just once per some unique criteria, and update if the record is already there, and all this in a batch. This is also not possible without race conditions or aggressive locking. So for me the atomic, consistent and without performance penalties "update_or_insert_based_on_unique_criteria" does have a value, and that's coming exactly from the locking of the unique index which eliminates the race condition from this operation. I don't care about syntax sugar, just about things I could do more efficiently if this mechanism were in place... Cheers, Csaba. On Wed, 2005-11-16 at 20:33, Martijn van Oosterhout wrote: > On Wed, Nov 16, 2005 at 11:06:15AM -0800, Dann Corbit wrote: > > > -----Original Message----- > > > From: pgsql-hackers-owner@postgresql.org [mailto:pgsql-hackers- > > > owner@postgresql.org] On Behalf Of Simon Riggs > > > > Conceptually, a MERGE statement is just a long string of INSERTs and > > > > UPDATEs in the same transaction and I think we should treat it as > > > > such. > > > > Merge could also be considered as a long string of deletes and inserts. > > I guess that deleting those records that already exist and then > > inserting all of the records is faster because it could be done like a > > single join to perform the delete and then a single batch insert. > > And for us it makes no difference because in MVCC, UPDATE == DELETE + > INSERT. IMHO it's just a nice construct to specify UPDATEs and INSERTs > in the same statement. > > Have a nice day,
> The problem I try to solve is something along: a bunch of clients try to update a count, and ONE of them must initialize > the count if it does not exist... this can't be done with current infrastructure without race conditions. The solution without merge but a unique key in other db's is: update if no rows updated insert if duplicate key update if no rows updated goto insert note, that the counter updates need to be of the form set x = x + ? where key=y do you see a potential race condition with this ? In pg you also need a savepoint before the insert for this to work. Depending on the ratio of insert vs update we also start with insert when the insert succeeds more that 50% (I would use a higher percentage with pg though): insert if duplicate key update if no rows updated goto insert Andreas
Yes, these algorithms are clear to me, but they don't work for batch updates in postgres without savepoints before each row insert/update, which is not good for performance (not to mention on older postgres versions without savepoint support it won't work at all). If there is a way of no race condition, no performance penalty, that would be something new and useful. I just guess the MERGE would provide that. Cheers, Csaba. On Thu, 2005-11-17 at 12:34, Zeugswetter Andreas DCP SD wrote: > > The problem I try to solve is something along: a bunch of clients try > to update a count, and ONE of them must initialize > the count if it > does not exist... this can't be done with current infrastructure without > race conditions. > > The solution without merge but a unique key in other db's is: > > update > if no rows updated > insert > if duplicate key > update > if no rows updated goto insert > > note, that the counter updates need to be of the form set x = x + ? > where key=y > do you see a potential race condition with this ? > In pg you also need a savepoint before the insert for this to work. > > Depending on the ratio of insert vs update we also start with insert > when > the insert succeeds more that 50% (I would use a higher percentage with > pg though): > > insert > if duplicate key > update > if no rows updated goto insert > > Andreas
On Thu, Nov 17, 2005 at 12:52:53PM +0100, Csaba Nagy wrote: > Yes, these algorithms are clear to me, but they don't work for batch > updates in postgres without savepoints before each row insert/update, > which is not good for performance (not to mention on older postgres > versions without savepoint support it won't work at all). If there is a > way of no race condition, no performance penalty, that would be > something new and useful. I just guess the MERGE would provide that. Well, then you guess wrong. This isn't what MERGE is for. MERGE is just a neat way of specifying the UPDATE and INSERT cases in the same statement. It doesn't remove the possibility duplicate inserts and thus primary key violations. If someone wants to make extensions to MERGE so that it can avoid the race condition and avoid the duplicate key violations, that's fine. But be aware that this is outside of the spec. It may be a useful addition, but perhaps we should consider MERGE and REPLACE as completely seperate targets. MERGE has a whole join construction with subqueries that would be a pain to make work in a way that is truly serialisable. REPLACE deals with only one row and tries to solve the race for that case only. Much easier to consider them seperately, no? I guess what's really irritating is that this clearly exposes the case listed in the docs as "Why SERIALIZABLE isn't in all cases". If we could solve that for MERGE, we could probably solve it in the general case too. Have a nice day, -- Martijn van Oosterhout <kleptog@svana.org> http://svana.org/kleptog/ > Patent. n. Genius is 5% inspiration and 95% perspiration. A patent is a > tool for doing 5% of the work and then sitting around waiting for someone > else to do the other 95% so you can sue them.
OK, in this case I don't care about either MERGE or REPLACE, but for an UPSERT which does the locking :-) Cheers, Csaba. On Thu, 2005-11-17 at 13:32, Martijn van Oosterhout wrote: > On Thu, Nov 17, 2005 at 12:52:53PM +0100, Csaba Nagy wrote: > > Yes, these algorithms are clear to me, but they don't work for batch > > updates in postgres without savepoints before each row insert/update, > > which is not good for performance (not to mention on older postgres > > versions without savepoint support it won't work at all). If there is a > > way of no race condition, no performance penalty, that would be > > something new and useful. I just guess the MERGE would provide that. > > Well, then you guess wrong. This isn't what MERGE is for. MERGE is just > a neat way of specifying the UPDATE and INSERT cases in the same > statement. It doesn't remove the possibility duplicate inserts and thus > primary key violations. > > If someone wants to make extensions to MERGE so that it can avoid the > race condition and avoid the duplicate key violations, that's fine. But > be aware that this is outside of the spec. It may be a useful addition, > but perhaps we should consider MERGE and REPLACE as completely seperate > targets. > > MERGE has a whole join construction with subqueries that would be a > pain to make work in a way that is truly serialisable. REPLACE deals > with only one row and tries to solve the race for that case only. Much > easier to consider them seperately, no? > > I guess what's really irritating is that this clearly exposes the case > listed in the docs as "Why SERIALIZABLE isn't in all cases". If we > could solve that for MERGE, we could probably solve it in the general > case too. > > Have a nice day,
Csaba Nagy <nagy@ecircle-ag.com> writes: > OK, in this case I don't care about either MERGE or REPLACE, but for an > UPSERT which does the locking :-) This is exactly the point --- pretty much nobody has come to us and asked for a feature that does what Peter and Martijn say MERGE does. (I haven't bothered to look at the 2003 spec, I'm assuming they read it correctly.) What we *have* been asked for, over and over, is an insert-or-update feature that's not so tedious and inefficient as the savepoint-insert-rollback-update kluge. That's what we ought to be concentrating on providing. regards, tom lane
* Tom Lane (tgl@sss.pgh.pa.us) wrote: > This is exactly the point --- pretty much nobody has come to us and > asked for a feature that does what Peter and Martijn say MERGE does. > (I haven't bothered to look at the 2003 spec, I'm assuming they read it > correctly.) What we *have* been asked for, over and over, is an > insert-or-update feature that's not so tedious and inefficient as the > savepoint-insert-rollback-update kluge. That's what we ought to be > concentrating on providing. I guess to be clear on what this distinction actually is, specifically: MERGE under SQL2003 doesn't appear to be intended to be used concurrently. For data warehousing situations this can be just fine such as in my case where I get a monthly update of some information and need to merge that update in with the prior information. In this case there's only one MERGE running and I'd hope it'd be faster than doing check for existance, insert/update on each row in plpgsql or something (since there'd be multiple index lookups, etc, I think). Concurrent MERGEs running *can* fail, just like whole transactions which do the check/insert/update can fail. REPLACE/INSERT ON DUPLICATE UPDATE appears to essentially be a transaction which is supposed to not fail but instead do locking to ensure that it doesn't fail. This requires predicate locking to be efficient because you want to tell the concurrent transaction "if you have the same key as me, just wait a second and you can do an update 'cause I'm going to create the key if it doesn't exist before I'm done". I think REPLACE/INSERT ON DUPLICATE UPDATE is definitely harder to do than MERGE because of the idea that it isn't supposed to fail generally. I think SQL2003 MERGE would be reasonably easy to do and to get the efficiency benefits out of it (assuming there are some to be had in the end). I don't think MERGE can really be made to be both though, in which case it should really be the SQL2003 MERGE and we can make REPLACE/INSERT ON DUPLICATE UPDATE something else. Perhaps a special form of MERGE where you know it's going to be doing that locking. I really don't like the idea of making the SQL2003 version of MERGE be the MERGE special case (by requiring someone to take a table lock ahead of time or do something else odd). Thanks, Stephen
On Thu, Nov 17, 2005 at 10:15:30AM -0500, Stephen Frost wrote: > REPLACE/INSERT ON DUPLICATE UPDATE appears to essentially be a > transaction which is supposed to not fail but instead do locking to > ensure that it doesn't fail. This requires predicate locking to be > efficient because you want to tell the concurrent transaction "if you > have the same key as me, just wait a second and you can do an update > 'cause I'm going to create the key if it doesn't exist before I'm done". Is the requirement for predicate locking, over and above a unique constraint on an index that involves the record key, to deal with the scenario of two inserts executing at the same time, both before commit? Cheers, mark -- mark@mielke.cc / markm@ncf.ca / markm@nortel.com __________________________ . . _ ._ . . .__ . . ._. .__ . . . .__ | Neighbourhood Coder |\/| |_| |_| |/ |_ |\/| | |_ | |/ |_ | | | | | | \ | \ |__ . | | .|. |__ |__ | \ |__ | Ottawa, Ontario, Canada One ring to rule them all, one ring to find them, one ring to bring them all and in the darkness bindthem... http://mark.mielke.cc/
Unless you have a table lock, INSERT has to be before UPDATE, think UPDATE, UPDATE (both fail), INSERT, INSERT. --------------------------------------------------------------------------- Zeugswetter Andreas DCP SD wrote: > > The problem I try to solve is something along: a bunch of clients try > to update a count, and ONE of them must initialize > the count if it > does not exist... this can't be done with current infrastructure without > race conditions. > > The solution without merge but a unique key in other db's is: > > update > if no rows updated > insert > if duplicate key > update > if no rows updated goto insert > > note, that the counter updates need to be of the form set x = x + ? > where key=y > do you see a potential race condition with this ? > In pg you also need a savepoint before the insert for this to work. > > Depending on the ratio of insert vs update we also start with insert > when > the insert succeeds more that 50% (I would use a higher percentage with > pg though): > > insert > if duplicate key > update > if no rows updated goto insert > > Andreas > -- Bruce Momjian | http://candle.pha.pa.us pgman@candle.pha.pa.us | (610) 359-1001+ If your life is a hard drive, | 13 Roberts Road + Christ can be your backup. | Newtown Square, Pennsylvania19073
mark@mark.mielke.cc wrote: > On Thu, Nov 17, 2005 at 10:15:30AM -0500, Stephen Frost wrote: > > REPLACE/INSERT ON DUPLICATE UPDATE appears to essentially be a > > transaction which is supposed to not fail but instead do locking to > > ensure that it doesn't fail. This requires predicate locking to be > > efficient because you want to tell the concurrent transaction "if you > > have the same key as me, just wait a second and you can do an update > > 'cause I'm going to create the key if it doesn't exist before I'm done". > > Is the requirement for predicate locking, over and above a unique > constraint on an index that involves the record key, to deal with > the scenario of two inserts executing at the same time, both before > commit? No. If you have a primary key you can easily prevent duplicates. You need a table lock or predicate locking to prevent duplicates if you do not have a primary key. -- Bruce Momjian | http://candle.pha.pa.us pgman@candle.pha.pa.us | (610) 359-1001+ If your life is a hard drive, | 13 Roberts Road + Christ can be your backup. | Newtown Square, Pennsylvania19073
Tom Lane wrote: > Csaba Nagy <nagy@ecircle-ag.com> writes: > > OK, in this case I don't care about either MERGE or REPLACE, but for an > > UPSERT which does the locking :-) > > This is exactly the point --- pretty much nobody has come to us and > asked for a feature that does what Peter and Martijn say MERGE does. > (I haven't bothered to look at the 2003 spec, I'm assuming they read it > correctly.) What we *have* been asked for, over and over, is an > insert-or-update feature that's not so tedious and inefficient as the > savepoint-insert-rollback-update kluge. That's what we ought to be > concentrating on providing. I am confused over the various options. I have heard these syntaxes: SQL2003 MERGEMySQL REPLACE http://dev.mysql.com/doc/refman/5.1/en/replace.htmlMySQL INSERT VIOLATION ...UPSERT So it seems MERGE does not have the use-case we most need, though it can be bent to do it. (Given their MATCH syntax, it doesn't seem there is any logic that it tries INSERT first). Looking at the MySQL URL above, REPLACE has three possible syntaxes with normal (DELETE), SET (UPDATE), and SELECT. Is this the direction we need to go? I don't like INSERT ... VIOLATION because I would like a new keyword for this. Is UPSERT the same as REPLACE? Should we use UPSERT instead? -- Bruce Momjian | http://candle.pha.pa.us pgman@candle.pha.pa.us | (610) 359-1001+ If your life is a hard drive, | 13 Roberts Road + Christ can be your backup. | Newtown Square, Pennsylvania19073
On Thu, 17 Nov 2005, Bruce Momjian wrote: > Unless you have a table lock, INSERT has to be before UPDATE, think > UPDATE, UPDATE (both fail), INSERT, INSERT. No matter what operation you start with you need a loop that try insert/update until one of them succeed like in this example: http://www.postgresql.org/docs/8.1/static/plpgsql-control-structures.html#PLPGSQL-UPSERT-EXAMPLE Without a loop you might not get to execute neither the insert nor the update. Why? Think about this example: BEGIN INSERT <- fail because there is a row already <- before we manage to do the update someone delete the row (which we can see in the default transaction isolation level) UPDATE <- fail because there is no row so we will loop and try the insert again <- before we manage to do the insert someone else does an insert INSERT <- fail because there is a row already <- before we manage to do the update someone delete the row .... You might need to loop any number of times before you manage to perform one of the two operations. Which operation you should start with depends on which of the two cases is the common one. -- /Dennis Björklund
> Unless you have a table lock, INSERT has to be before UPDATE, think UPDATE, UPDATE (both fail), INSERT, INSERT. > > update > > if no rows updated > > insert > > if duplicate key > > update > > if no rows updated goto insert That is why you have the loop. This is not a problem with above code, because only one insert succeeds while the others then do the update. Andreas
Oh, good point. I was thinking just about concurrent MERGEs. However, it is more complicated than that. By definitaion you can not see changes from other transactions while your statement is being run (even if you increment CommandCounter), so to be atomic, you would still see the row even though some other transaction had deleted it. I think we avoid that now because UPDATE, (which is a DELETE then INSERT) chains the tuples together so others see the activity happening. Seems like we are going to have to peek at rows like we do now for INSERT and peek at index rows, if I remember correctly. I can't think of any other place in the code where we loop around other backend's activity like this. This could be tricky. --------------------------------------------------------------------------- Dennis Bjorklund wrote: > On Thu, 17 Nov 2005, Bruce Momjian wrote: > > > Unless you have a table lock, INSERT has to be before UPDATE, think > > UPDATE, UPDATE (both fail), INSERT, INSERT. > > No matter what operation you start with you need a loop that try > insert/update until one of them succeed like in this example: > > http://www.postgresql.org/docs/8.1/static/plpgsql-control-structures.html#PLPGSQL-UPSERT-EXAMPLE > > Without a loop you might not get to execute neither the insert nor the > update. Why? Think about this example: > > BEGIN > > INSERT <- fail because there is a row already > > <- before we manage to do the update someone > delete the row (which we can see in the > default transaction isolation level) > > UPDATE <- fail because there is no row so we will loop > and try the insert again > > <- before we manage to do the insert someone else does > an insert > > INSERT <- fail because there is a row already > > <- before we manage to do the update someone > delete the row > .... > > > You might need to loop any number of times before you manage to perform > one of the two operations. Which operation you should start with depends > on which of the two cases is the common one. > > -- > /Dennis Bj?rklund > > > ---------------------------(end of broadcast)--------------------------- > TIP 4: Have you searched our list archives? > > http://archives.postgresql.org > -- Bruce Momjian | http://candle.pha.pa.us pgman@candle.pha.pa.us | (610) 359-1001+ If your life is a hard drive, | 13 Roberts Road + Christ can be your backup. | Newtown Square, Pennsylvania19073
Bruce Momjian <pgman@candle.pha.pa.us> writes: > Oh, good point. I was thinking just about concurrent MERGEs. However, > it is more complicated than that. By definitaion you can not see > changes from other transactions while your statement is being run (even > if you increment CommandCounter), so to be atomic, you would still see > the row even though some other transaction had deleted it. We would have to use the same semantics we use now for read-committed UPDATE, that is look at the latest version of the row even though this would not normally be visible to the transaction's snapshot. In the case of a serializable transaction, no doubt we should fail if any concurrent change actually happens. regards, tom lane
Tom Lane wrote: > Bruce Momjian <pgman@candle.pha.pa.us> writes: > > Oh, good point. I was thinking just about concurrent MERGEs. However, > > it is more complicated than that. By definitaion you can not see > > changes from other transactions while your statement is being run (even > > if you increment CommandCounter), so to be atomic, you would still see > > the row even though some other transaction had deleted it. > > We would have to use the same semantics we use now for read-committed > UPDATE, that is look at the latest version of the row even though this > would not normally be visible to the transaction's snapshot. > > In the case of a serializable transaction, no doubt we should fail if > any concurrent change actually happens. I have some psaudocode to explain what we want for this feature, whatever syntax we choose: StartCheck unique indexFound lock row for update if zero rows, return to start if more than one row, fail updaterowNotfound create savepoint insert row into heap lock index page if conflicting index entry, abort savepoint,return to start add index entry unlock index page While the "notfound" case might look strange, we actually use this exact method for inserts now, see ExecInsert() and _bt_doinsert(). Particularly see this comment in the second function: /* * If we're not allowing duplicates, make sure the key isn't already in * the index. * * NOTE: obviously,_bt_check_uniquecan only detect keys that are already in * the index; so it cannot defend against concurrentinsertions of the * same key. We protect against that by means of holding a write lock on * the target page. Any other would-be inserter of the same key must * acquire a write lock on the same target page, so only one would-be * inserter can be making the check at one time. Furthermore, once we are * past the check we hold write lockscontinuously until we have performed * our insertion, so no later inserter can fail to see our insertion. * (Thisrequires some care in _bt_insertonpg.) * * If we must wait for another xact, we release the lock while waiting,and * then must start over completely. */ Here is the unique check error from _bt_check_unique(): ereport(ERROR, (errcode(ERRCODE_UNIQUE_VIOLATION), errmsg("duplicatekey violates unique constraint \"%s\"", RelationGetRelationName(rel)))); I think the problem here is that it is going to longjump() back to postgres.c (and out of your code loop). While we have savepoints, I think they only work coming from client applications, rather than inside our code. Ideally you would like to be able to say: savepoint();func();rollback_to_savepoint(); but you can't, so I think you are going to have to factor out that unique error callback and return a failure code to the caller. I suppose some boolean flag need to be added to _bt_doinsert(), but that is called via a function pointer for the index type, so you are going to have to update the insert function signatures for all access methods. The good news is that only btree supports unique indexes, according to the documentation ("Only B-tree currently supports unique indexes") so for the other access methods the extra parameter is just ignored. Another issue is multiple unique indexes. What if the first unique index matches one row, but a different row matches the second unique indexed column? Fail because unique checks do not identify exactly one row? Or the _new_ value for the second indexed column conflicts with the second unique index. The MERGE/REPLACE should fail. The UPDATE block will handle this on its own, but the INSERT block will need to check for that an really error out, rather than return to the caller, so the loop in ExecInsertIndexTuples() has to restart on unique failure _only_ on the first index check, not the subsequent ones. One simplification would be to allow MERGE/REPLACE only on a table that has a single unique index. -- Bruce Momjian | http://candle.pha.pa.us pgman@candle.pha.pa.us | (610) 359-1001+ If your life is a hard drive, | 13 Roberts Road + Christ can be your backup. | Newtown Square, Pennsylvania19073
On Thu, Nov 17, 2005 at 09:30:43PM -0500, Bruce Momjian wrote: > > Is the requirement for predicate locking, over and above a unique > > constraint on an index that involves the record key, to deal with > > the scenario of two inserts executing at the same time, both before > > commit? > > No. If you have a primary key you can easily prevent duplicates. You > need a table lock or predicate locking to prevent duplicates if you do > not have a primary key. AFAIK you can also accomplish this without a table lock as long as you have a unique index on the right set of fields and those fields are also NOT NULL. ISTM it would be good to support that case as well, since you might want to MERGE based on something other than the PK. -- Jim C. Nasby, Sr. Engineering Consultant jnasby@pervasive.com Pervasive Software http://pervasive.com work: 512-231-6117 vcard: http://jim.nasby.net/pervasive.vcf cell: 512-569-9461
On Thu, Nov 17, 2005 at 10:15:30AM -0500, Stephen Frost wrote: > I don't think MERGE can really be made to be both though, in which case > it should really be the SQL2003 MERGE and we can make REPLACE/INSERT ON > DUPLICATE UPDATE something else. Perhaps a special form of MERGE where > you know it's going to be doing that locking. I really don't like the > idea of making the SQL2003 version of MERGE be the MERGE special case > (by requiring someone to take a table lock ahead of time or do something > else odd). Anyone know off-hand what the big 3 do? If the industry consensus is that merge should actually be REPLACE/INSERT ON DUPLICATE UPDATE then it's probably better to follow that lead. -- Jim C. Nasby, Sr. Engineering Consultant jnasby@pervasive.com Pervasive Software http://pervasive.com work: 512-231-6117 vcard: http://jim.nasby.net/pervasive.vcf cell: 512-569-9461
Jim C. Nasby wrote: > On Thu, Nov 17, 2005 at 10:15:30AM -0500, Stephen Frost wrote: > >>I don't think MERGE can really be made to be both though, in which case >>it should really be the SQL2003 MERGE and we can make REPLACE/INSERT ON >>DUPLICATE UPDATE something else. Perhaps a special form of MERGE where >>you know it's going to be doing that locking. I really don't like the >>idea of making the SQL2003 version of MERGE be the MERGE special case >>(by requiring someone to take a table lock ahead of time or do something >>else odd). > > > Anyone know off-hand what the big 3 do? If the industry consensus is > that merge should actually be REPLACE/INSERT ON DUPLICATE UPDATE then > it's probably better to follow that lead. It was already said here that oracle and db2 both use MERGE, dunno about mssql. And yes merge CAN be used to do REPLACE (oracle uses their dummy table for this, we can use the fact that FROM clause isn't required in postgres). -- Regards Petr Jelinek (PJMODOS)
> > And yes merge CAN be used to do REPLACE (oracle uses their dummy table > for this, we can use the fact that FROM clause isn't required in postgres). > the FROM clause is required by default (starting with 8.1) unless you change a postgresql.conf parameter. and i don't think that idea will have any fan... -- regards, Jaime Casanova (DBA: DataBase Aniquilator ;)
On Tue, Nov 22, 2005 at 04:20:12AM +0100, Petr Jelinek wrote: > It was already said here that oracle and db2 both use MERGE, dunno about > mssql. > > And yes merge CAN be used to do REPLACE (oracle uses their dummy table > for this, we can use the fact that FROM clause isn't required in postgres). Statements about MERGE on the web: http://www.dba-oracle.com/oracle_tips_rittman_merge.htm http://databasejournal.com/features/db2/article.php/3322041 http://certcities.com/editorial/columns/story.asp?EditorialsID=51 http://publib.boulder.ibm.com/infocenter/ids9help/index.jsp?topic=/com.ibm.sqls.doc/sqls578.htm http://www.jdixon.dotnetdevelopersjournal.com/i_want_my_sql_2005_merge_statement.htm http://publib.boulder.ibm.com/infocenter/db2help/topic/com.ibm.db2.udb.doc/admin/r0010873.htm http://expertanswercenter.techtarget.com/eac/knowledgebaseAnswer/0,295199,sid63_gci978700,00.html Not one (*not one!*) of these mentions any special handling of duplicate keys. They even go to pains to say that any errors cause everything to rollback. The last one is especially interesting: : Is there any way to capture errors from a MERGE statement? Also, is : there any way to know how many records were inserted or updated for the : MERGE statement like SQL%ROWCOUNT? Any assistance greatly appreciated. : : You capture errors the same way you would if you were doing regular : INSERT and UPDATE statements....with exception handlers. Just include a : WHEN OTHERS exception handler in the block where your MERGE statement : is and have to display SQLCODE and SQLERRM if an error occurs. Then you : can figure out which specific errors are occurring and create : individual exception handlers for those. There are even places that tell you how to decompose your MERGE into an INSERT plus UPDATE statement. The real advantage of MERGE is that the semantics prevent your updating a row you just inserted, which is harder in the general case but easy if the executor is handling the rows one at a time. Rather than trying to make MERGE do something it wasn't designed for, we should probably be spending our efforts on triggers for error conditions. Maybe something like: CREATE TRIGGER foo AFTER ERROR ON bar EXECUTE baz(); Where baz would be passed NEW and OLD just like a normal trigger and if the trigger return NULL, the update is ignored. In the meantime the function can divert the insert to another table if it likes. This seems like a much more workable and useful addition. Have a nice day, -- Martijn van Oosterhout <kleptog@svana.org> http://svana.org/kleptog/ > Patent. n. Genius is 5% inspiration and 95% perspiration. A patent is a > tool for doing 5% of the work and then sitting around waiting for someone > else to do the other 95% so you can sue them.
Jaime Casanova wrote: > > > > And yes merge CAN be used to do REPLACE (oracle uses their dummy table > > for this, we can use the fact that FROM clause isn't required in postgres). > > > > the FROM clause is required by default (starting with 8.1) unless you > change a postgresql.conf parameter. > > and i don't think that idea will have any fan... No, it is not, try SELECT 1. Oracle requires SELECT 1 FROM dual. The change in 8.1 is that SELECT pg_class.relname no longer works. You have to do SELECT relname FROM pg_class. -- Bruce Momjian | http://candle.pha.pa.us pgman@candle.pha.pa.us | (610) 359-1001+ If your life is a hard drive, | 13 Roberts Road + Christ can be your backup. | Newtown Square, Pennsylvania19073
On 11/22/05, Bruce Momjian <pgman@candle.pha.pa.us> wrote: > Jaime Casanova wrote: > > > > > > And yes merge CAN be used to do REPLACE (oracle uses their dummy table > > > for this, we can use the fact that FROM clause isn't required in postgres). > > > > > > > the FROM clause is required by default (starting with 8.1) unless you > > change a postgresql.conf parameter. > > > > and i don't think that idea will have any fan... > > No, it is not, try SELECT 1. Oracle requires SELECT 1 FROM dual. The > change in 8.1 is that SELECT pg_class.relname no longer works. You have to > do SELECT relname FROM pg_class. > > -- > Bruce Momjian | http://candle.pha.pa.us > pgman@candle.pha.pa.us | (610) 359-1001 > + If your life is a hard drive, | 13 Roberts Road > + Christ can be your backup. | Newtown Square, Pennsylvania 19073 > touche... -- regards, Jaime Casanova (DBA: DataBase Aniquilator ;)
On Tue, Nov 22, 2005 at 11:57:48AM +0100, Martijn van Oosterhout wrote: <excellent research snipped> > Rather than trying to make MERGE do something it wasn't designed for, > we should probably be spending our efforts on triggers for error > conditions. Maybe something like: > > CREATE TRIGGER foo AFTER ERROR ON bar EXECUTE baz(); > > Where baz would be passed NEW and OLD just like a normal trigger and if > the trigger return NULL, the update is ignored. In the meantime the > function can divert the insert to another table if it likes. This seems > like a much more workable and useful addition. I agree that we shouldn't try and distort MERGE into something fancy. The AFTER ERROR trigger is a very interesting idea, since it could handle many different cases. But I'm worried that people might not want that behavior on by default for everything done against some table. I think it'd be better to have some way to specify in a command that you want to use some kind of error-handling trigger. Though presumably the underlying framework would be same, so it shouldn't be hard to support both. -- Jim C. Nasby, Sr. Engineering Consultant jnasby@pervasive.com Pervasive Software http://pervasive.com work: 512-231-6117 vcard: http://jim.nasby.net/pervasive.vcf cell: 512-569-9461
Jaime Casanova wrote: > > the FROM clause is required by default (starting with 8.1) unless you > change a postgresql.conf parameter. > > and i don't think that idea will have any fan... > Bruce already replied to your first statement so, what idea won't have any fan ? It's not that we would change what MERGE does. Postgres just does not requeire FROM clause in SELECT and second parameter of MERGE can be SELECT which means you can do what REPLACE) does without problems and without breaking something or violating standard and like I said you can do the same in oracle using dual. Btw about that keys, oracle gives error on many-to-one or many-to-many relationship between the source and target tables. -- Regards Petr Jelinek (PJMODOS)
On Wed, Nov 23, 2005 at 12:24:24AM +0100, Petr Jelinek wrote: > Btw about that keys, oracle gives error on many-to-one or many-to-many > relationship between the source and target tables. The standard has something called a "cardinality violation" if the to-be-merged table doesn't match 1-1 with the rest of the statement. If I had access to an Oracle I'd run two tests on MERGE: 1. Does the joining column have to have an index? For example, make a column that's full of unique values but no unique index. According to my reading of the the standard, this should still work (just slower). 2. Additionally, only the rows involved in the MERGE need to be uniquely referenced, so if you add duplicate values but add a WHERE clause to exclude those, it should also work. My feeling is that requiring an index will limit it's usefulness as a general tool. Have a nice day, -- Martijn van Oosterhout <kleptog@svana.org> http://svana.org/kleptog/ > Patent. n. Genius is 5% inspiration and 95% perspiration. A patent is a > tool for doing 5% of the work and then sitting around waiting for someone > else to do the other 95% so you can sue them.
Martijn, Here is a quick test (Oracle 10.1.0.3/Linux): SQL> select banner from v$version; BANNER ---------------------------------------------------------------- Oracle Database 10g Enterprise Edition Release 10.1.0.3.0 - Prod PL/SQL Release 10.1.0.3.0 - Production CORE 10.1.0.3.0 Production TNS for Linux: Version 10.1.0.3.0 - Production NLSRTL Version 10.1.0.3.0 - Production SQL> select * from merge_test_1; ID NAME ---------- -------------------- 1 aaa 2 bbb 3 ccc 4 ddd 5 eee 1 xxx 6 rows selected. SQL> select * from merge_test_2; ID NAME ---------- -------------------- 1 AAA 2 BBB 6 FFF SQL> select index_name from user_indexes where table_name like 'merge_test%'; no rows selected SQL> merge into merge_test_1 a1 2 using merge_test_2 a2 3 on (a1.id = a2.id) 4 when matched then 5 update seta1.name = a2.name 6 when not matched then 7 insert (id, name) values (a2.id, a2.name); 4 rows merged. SQL> select * from merge_test_1; ID NAME ---------- -------------------- 1 AAA 2 BBB 3 ccc 4 ddd 5 eee 1 AAA 6 FFF 7 rows selected. Regards, Lubomir Petrov Martijn van Oosterhout wrote: > On Wed, Nov 23, 2005 at 12:24:24AM +0100, Petr Jelinek wrote: > >> Btw about that keys, oracle gives error on many-to-one or many-to-many >> relationship between the source and target tables. >> > > The standard has something called a "cardinality violation" if the > to-be-merged table doesn't match 1-1 with the rest of the statement. If > I had access to an Oracle I'd run two tests on MERGE: > > 1. Does the joining column have to have an index? For example, make a > column that's full of unique values but no unique index. According to > my reading of the the standard, this should still work (just slower). > > 2. Additionally, only the rows involved in the MERGE need to be > uniquely referenced, so if you add duplicate values but add a WHERE > clause to exclude those, it should also work. > > My feeling is that requiring an index will limit it's usefulness as a > general tool. > > Have a nice day, >