Thread: MERGE vs REPLACE

MERGE vs REPLACE

From
Peter Eisentraut
Date:
It seems to me that it has always been implicitly assumed around here 
that the MERGE command would be a substitute for a MySQL-like REPLACE 
functionality.  After rereading the spec it seems that this is not the 
case.  MERGE always operates on two different tables, which REPLACE 
doesn't do.

That said, what kind of support for insert-or-update-this-row do we want 
to provide, if any?  Should it be a REPLACE command, an extension of 
the INSERT command, a modication of the MERGE syntax, or something 
else?

-- 
Peter Eisentraut
http://developer.postgresql.org/~petere/


Re: MERGE vs REPLACE

From
Petr Jelinek
Date:
Peter Eisentraut wrote:
> It seems to me that it has always been implicitly assumed around here 
> that the MERGE command would be a substitute for a MySQL-like REPLACE 
> functionality.  After rereading the spec it seems that this is not the 
> case.  MERGE always operates on two different tables, which REPLACE 
> doesn't do.
> 
> That said, what kind of support for insert-or-update-this-row do we want 
> to provide, if any?  Should it be a REPLACE command, an extension of 
> the INSERT command, a modication of the MERGE syntax, or something 
> else?
> 

MERGE of course, it's standard, REPLACE is mysql extension


-- 
Regards
Petr Jelinek (PJMODOS)
www.parba.cz


Re: MERGE vs REPLACE

From
Jaime Casanova
Date:
On 11/11/05, Peter Eisentraut <peter_e@gmx.net> wrote:
> It seems to me that it has always been implicitly assumed around here
> that the MERGE command would be a substitute for a MySQL-like REPLACE
> functionality.  After rereading the spec it seems that this is not the
> case.  MERGE always operates on two different tables, which REPLACE
> doesn't do.
>
> That said, what kind of support for insert-or-update-this-row do we want
> to provide, if any?  Should it be a REPLACE command, an extension of
> the INSERT command, a modication of the MERGE syntax, or something
> else?
>
> --
> Peter Eisentraut
> http://developer.postgresql.org/~petere/
>

MERGE seems to me the better option... not just because is standard
but at least i can see some use cases for it...


--
regards,
Jaime Casanova
(DBA: DataBase Aniquilator ;)


Re: MERGE vs REPLACE

From
Peter Eisentraut
Date:
Jaime Casanova wrote:
> MERGE seems to me the better option... not just because is standard
> but at least i can see some use cases for it...

I don't think you understand my message: MERGE does not do what REPLACE 
does.

-- 
Peter Eisentraut
http://developer.postgresql.org/~petere/


Re: MERGE vs REPLACE

From
Tom Lane
Date:
Peter Eisentraut <peter_e@gmx.net> writes:
> It seems to me that it has always been implicitly assumed around here 
> that the MERGE command would be a substitute for a MySQL-like REPLACE 
> functionality.  After rereading the spec it seems that this is not the 
> case.  MERGE always operates on two different tables, which REPLACE 
> doesn't do.

Normally I'd plump for following the standard ... but AFAIR, we have had
bucketloads of requests for REPLACE functionality, and not one request
for spec-compatible MERGE.  If, as it appears, full-spec MERGE is also a
whole lot harder and slower than REPLACE, it seems that we could do
worse than to concentrate on doing REPLACE for now.  (We can always come
back to MERGE some other day.)
        regards, tom lane


Re: MERGE vs REPLACE

From
Jaime Casanova
Date:
On 11/11/05, Peter Eisentraut <peter_e@gmx.net> wrote:
> Jaime Casanova wrote:
> > MERGE seems to me the better option... not just because is standard
> > but at least i can see some use cases for it...
>
> I don't think you understand my message: MERGE does not do what REPLACE
> does.
>
> --
> Peter Eisentraut
> http://developer.postgresql.org/~petere/
>

I understand you well... what i was trying to say is that i prefer
MERGE (standard SQL command) to be done because the functionally it
has (basically a merge of two tables) seems to me to be more usefull
than REPLACE (MySql Command)...

--
regards,
Jaime Casanova
(DBA: DataBase Aniquilator ;)


Re: MERGE vs REPLACE

From
Josh Berkus
Date:
Guys,

> I understand you well... what i was trying to say is that i prefer
> MERGE (standard SQL command) to be done because the functionally it
> has (basically a merge of two tables) seems to me to be more usefull
> than REPLACE (MySql Command)...

But even REPLACE requires predicate locking.  There's no real way to get 
around it.

--Josh

-- 
--Josh

Josh Berkus
Aglio Database Solutions
San Francisco


Re: MERGE vs REPLACE

From
Jaime Casanova
Date:
On 11/11/05, Josh Berkus <josh@agliodbs.com> wrote:
> Guys,
>
> > I understand you well... what i was trying to say is that i prefer
> > MERGE (standard SQL command) to be done because the functionally it
> > has (basically a merge of two tables) seems to me to be more usefull
> > than REPLACE (MySql Command)...
>
> But even REPLACE requires predicate locking.  There's no real way to get
> around it.
>
> --Josh
>

why? seems that REPLACE only work if there are at least one row matching...

--
Atentamente,
Jaime Casanova
(DBA: DataBase Aniquilator ;)


Re: MERGE vs REPLACE

From
Josh Berkus
Date:
Jaime,

> why? seems that REPLACE only work if there are at least one row
> matching...

Scenario:

session1: REPLACE .... 1                            session2:  REPLACE ..... 1
session1: check to see that "1" exists .... no                 session2: check to see that "1" exists .... no
session1: INSERT 1                 session2: INSERT 1 .... ERROR

Get the picture?  The only way to avoid a race condition is to be able to 
do "predicate locking", that is to lock the table against any data write 
matching that predicate.

-- 
--Josh

Josh Berkus
Aglio Database Solutions
San Francisco


Re: MERGE vs REPLACE

From
Rod Taylor
Date:
On Fri, 2005-11-11 at 14:40 -0800, Josh Berkus wrote:
> Jaime,
> 
> > why? seems that REPLACE only work if there are at least one row
> > matching...

> Get the picture?  The only way to avoid a race condition is to be able to 
> do "predicate locking", that is to lock the table against any data write 
> matching that predicate.

So? That is what save points are for.  You can even skip the select for
update if you don't mind dead tuples from the attempted insert.

SELECT ... FOR UPDATE;
IF not exists THENSAVEPOINT;INSERT ;IF UNIQUE VIOLATION THEN    /* Someone else inserted between the SELECT and our
INSERT*/    ROLLBACK TO SAVEPOINT;    UPDATE;ELSE    RELEASE SAVEPOINT;FI
 
ELSEUPDATE;
FI
-- 



Re: MERGE vs REPLACE

From
Tom Lane
Date:
Josh Berkus <josh@agliodbs.com> writes:
> But even REPLACE requires predicate locking.  There's no real way to get 
> around it.

The point though is that REPLACE is restricted to a type of predicate
narrow enough to be enforced through a unique-index mechanism, and so
it's implementable without solving the general case of predicate
locking.

Predicate locking for narrow cases isn't very hard; it's the general
case of arbitrary predicates that's hard.
        regards, tom lane


Re: MERGE vs REPLACE

From
Gavin Sherry
Date:
On Fri, 11 Nov 2005, Josh Berkus wrote:

> Jaime,
>
> > why? seems that REPLACE only work if there are at least one row
> > matching...
>
> Scenario:
>
> session1: REPLACE .... 1
>                   session2:  REPLACE ..... 1
> session1: check to see that "1" exists .... no
>                   session2: check to see that "1" exists .... no
> session1: INSERT 1
>                   session2: INSERT 1 .... ERROR
>
> Get the picture?  The only way to avoid a race condition is to be able to
> do "predicate locking", that is to lock the table against any data write
> matching that predicate.

When it comes to predicate locking, I think we should defer to Peter's
comment at Open DB Con:

http://www.treehou.se/~swm/peter_merge.jpg

Gavin


Re: MERGE vs REPLACE

From
mark@mark.mielke.cc
Date:
On Fri, Nov 11, 2005 at 06:00:32PM -0500, Rod Taylor wrote:
> So? That is what save points are for.  You can even skip the select for
> update if you don't mind dead tuples from the attempted insert.
> SELECT ... FOR UPDATE;
> IF not exists THEN
>     SAVEPOINT;
>     INSERT ;
>     IF UNIQUE VIOLATION THEN
>         /* Someone else inserted between the SELECT and our INSERT */
>         ROLLBACK TO SAVEPOINT;
>         UPDATE;
>     ELSE
>         RELEASE SAVEPOINT;
>     FI
> ELSE
>     UPDATE;
> FI

Isn't there still a race between INSERT and UPDATE?

Low probability, for sure, as it would have had to not exist, then
exist, then not exist, but still possible.

I'd like a REPLACE that could be safe, or at least cause a COMMIT to
fail, for this reason.

Cheers,
mark

-- 
mark@mielke.cc / markm@ncf.ca / markm@nortel.com     __________________________
.  .  _  ._  . .   .__    .  . ._. .__ .   . . .__  | Neighbourhood Coder
|\/| |_| |_| |/    |_     |\/|  |  |_  |   |/  |_   | 
|  | | | | \ | \   |__ .  |  | .|. |__ |__ | \ |__  | Ottawa, Ontario, Canada
 One ring to rule them all, one ring to find them, one ring to bring them all                      and in the darkness
bindthem...
 
                          http://mark.mielke.cc/



Re: MERGE vs REPLACE

From
Rod Taylor
Date:
On Fri, 2005-11-11 at 18:36 -0500, mark@mark.mielke.cc wrote:
> On Fri, Nov 11, 2005 at 06:00:32PM -0500, Rod Taylor wrote:
> > So? That is what save points are for.  You can even skip the select for
> > update if you don't mind dead tuples from the attempted insert.
> > SELECT ... FOR UPDATE;
> > IF not exists THEN
> >     SAVEPOINT;
> >     INSERT ;
> >     IF UNIQUE VIOLATION THEN
> >         /* Someone else inserted between the SELECT and our INSERT */
> >         ROLLBACK TO SAVEPOINT;
> >         UPDATE;
> >     ELSE
> >         RELEASE SAVEPOINT;
> >     FI
> > ELSE
> >     UPDATE;
> > FI
> 
> Isn't there still a race between INSERT and UPDATE?

I suppose there is although I hadn't noticed before. I've never run into
it and always check to ensure the expected number of tuples were touched
by the update or delete.

Within the PostgreSQL backend you might get away with having your insert
hold a lock on the index page and follow it up with a FOR UPDATE lock on
the offending tuple thus ensuring that your update will succeed. If you
hack index mechanisms for the support you don't need the SAVEPOINT
either -- just don't throw an error when you run across the existing
entry.

For client side code one possibility is to repeat until successful.

WHILESELECT FOR UPDATE;IF NOT EXISTS THEN    SAVEPOINT    INSERT;    IF UNIQUE VIOLATION THEN        ROLLBACK TO
SAVEPOINT;   ELSE        RELEASE SAVEPOINT        EXIT;    FIELSE    UPDATE;    EXIT;END
 
-- Check for infinite loop
END

-- 



Re: MERGE vs REPLACE

From
Matteo Beccati
Date:
Tom Lane wrote:
> Peter Eisentraut <peter_e@gmx.net> writes:
>> It seems to me that it has always been implicitly assumed around here 
>> that the MERGE command would be a substitute for a MySQL-like REPLACE 
>> functionality.  After rereading the spec it seems that this is not the 
>> case.  MERGE always operates on two different tables, which REPLACE 
>> doesn't do.
> 
> Normally I'd plump for following the standard ... but AFAIR, we have had
> bucketloads of requests for REPLACE functionality, and not one request
> for spec-compatible MERGE.  If, as it appears, full-spec MERGE is also a
> whole lot harder and slower than REPLACE, it seems that we could do
> worse than to concentrate on doing REPLACE for now.  (We can always come
> back to MERGE some other day.)

I would also like to add that MySQL's REPLACE is not exactly an INSERT 
OR UPDATE, rather and INSERT OR (DELETE then INSERT): I mean that the 
fields not specified in the query are set to their defaults:

i.e.

CREATE TABLE t (a int PRIMARY KEY, b int, c int);

INSERT INTO t (a, b, c) VALUES (1, 1, 2);

SELECT * FROM t;
+---+------+------+
| a | b    | c    |
+---+------+------+
| 1 |    1 |    2 |
+---+------+------+

REPLACE INTO t (a, b) VALUES (1, 1);

SELECT * FROM t;
+---+------+------+
| a | b    | c    |
+---+------+------+
| 1 |    1 | NULL |
+---+------+------+


I wanted to point it out this because people are commonly mistaking this.


Best regards
--
Matteo Beccati
http://phpadsnew.com
http://phppgads.com


Re: MERGE vs REPLACE

From
Robert Treat
Date:
On Saturday 12 November 2005 04:06, Matteo Beccati wrote:
> Tom Lane wrote:
> > Peter Eisentraut <peter_e@gmx.net> writes:
> >> It seems to me that it has always been implicitly assumed around here
> >> that the MERGE command would be a substitute for a MySQL-like REPLACE
> >> functionality.  After rereading the spec it seems that this is not the
> >> case.  MERGE always operates on two different tables, which REPLACE
> >> doesn't do.
> >
> > Normally I'd plump for following the standard ... but AFAIR, we have had
> > bucketloads of requests for REPLACE functionality, and not one request
> > for spec-compatible MERGE.  If, as it appears, full-spec MERGE is also a
> > whole lot harder and slower than REPLACE, it seems that we could do
> > worse than to concentrate on doing REPLACE for now.  (We can always come
> > back to MERGE some other day.)
>
> I would also like to add that MySQL's REPLACE is not exactly an INSERT
> OR UPDATE, rather and INSERT OR (DELETE then INSERT): I mean that the
> fields not specified in the query are set to their defaults:
>
> i.e.
>
> CREATE TABLE t (a int PRIMARY KEY, b int, c int);
>
> INSERT INTO t (a, b, c) VALUES (1, 1, 2);
>
> SELECT * FROM t;
> +---+------+------+
>
> | a | b    | c    |
>
> +---+------+------+
>
> | 1 |    1 |    2 |
>
> +---+------+------+
>
> REPLACE INTO t (a, b) VALUES (1, 1);
>
> SELECT * FROM t;
> +---+------+------+
>
> | a | b    | c    |
>
> +---+------+------+
>
> | 1 |    1 | NULL |
>
> +---+------+------+
>
>
> I wanted to point it out this because people are commonly mistaking this.
>
>

Wow, that seems ugly.... maybe there's a reason for it, but I'm not sure we 
could deviate from my$ql's behavior on this even if we wanted... they are the 
"standard" here.

-- 
Robert Treat
Build A Brighter Lamp :: Linux Apache {middleware} PostgreSQL


Re: MERGE vs REPLACE

From
Gregory Maxwell
Date:
On 11/13/05, Robert Treat <xzilla@users.sourceforge.net> wrote:
> On Saturday 12 November 2005 04:06, Matteo Beccati wrote:
> > | 1 |    1 | NULL |
> Wow, that seems ugly.... maybe there's a reason for it, but I'm not sure we
> could deviate from my$ql's behavior on this even if we wanted... they are the
> "standard" here.

I don't think that's ugly, I think that's exactly working as
advertised. Replace behaves exactly like deleting the record with the
matching primary key and inserting the provided input. ... not merging
together old data with new.


Re: MERGE vs REPLACE

From
Robert Treat
Date:
On Sunday 13 November 2005 10:01, Gregory Maxwell wrote:
> On 11/13/05, Robert Treat <xzilla@users.sourceforge.net> wrote:
> > On Saturday 12 November 2005 04:06, Matteo Beccati wrote:
> > > | 1 |    1 | NULL |
> >
> > Wow, that seems ugly.... maybe there's a reason for it, but I'm not sure
> > we could deviate from my$ql's behavior on this even if we wanted... they
> > are the "standard" here.
>
> I don't think that's ugly, I think that's exactly working as
> advertised. Replace behaves exactly like deleting the record with the
> matching primary key and inserting the provided input. ... not merging
> together old data with new.

I disagree in that REPLACE is advertised as a solution for the INSERT else 
UPDATE problem, but has a different behavior than a true INSERT else UPDATE 
would produce.   Maybe that's a problem with the implementation, or maybe 
it's a problem in the advertisment, but there is certainly a discrepency 
there. 

-- 
Robert Treat
Build A Brighter Lamp :: Linux Apache {middleware} PostgreSQL


Re: MERGE vs REPLACE

From
Tom Lane
Date:
Robert Treat <xzilla@users.sourceforge.net> writes:
> I disagree in that REPLACE is advertised as a solution for the INSERT else 
> UPDATE problem, but has a different behavior than a true INSERT else UPDATE 
> would produce.   Maybe that's a problem with the implementation, or maybe 
> it's a problem in the advertisment, but there is certainly a discrepency 
> there. 

Yeah.  REPLACE fails to solve common examples like a web hit counter
("if key doesn't exist, insert row with count 1; if it does exist,
add 1 to the current count").

IIRC, SQL's MERGE deals with this by offering two quite separate
specifications of what to do when there is or isn't already a matching
row.

I don't necessarily feel that we have to slavishly duplicate what MySQL
offers.  I do think that it's reasonable to restrict the functionality
to updating/replacing a row with matching primary key --- that gets us
out of the problem of needing a full predicate-locking mechanism, while
still covering most all of the practical use-cases that I can see.

It'd be useful to look at what comparable functionality is offered by
other DBs besides MySQL.  Anyone know what DB2 or Oracle have in this
area?
        regards, tom lane


Re: MERGE vs REPLACE

From
Petr Jelinek
Date:
Tom Lane wrote:
> 
> It'd be useful to look at what comparable functionality is offered by
> other DBs besides MySQL.  Anyone know what DB2 or Oracle have in this
> area?
> 

IIRC they both have MERGE.

-- 
Regards
Petr Jelinek (PJMODOS)



Re: MERGE vs REPLACE

From
"Joshua D. Drake"
Date:
Petr Jelinek wrote:

> Tom Lane wrote:
>
>>
>> It'd be useful to look at what comparable functionality is offered by
>> other DBs besides MySQL.  Anyone know what DB2 or Oracle have in this
>> area?
>
Oracle:
http://www.psoug.org/reference/merge.html
http://www.psoug.org/reference/translate_replace.html

But the oracle replace seems completely different to the topic at hand.

DB2:  Merge:     
http://www.databasejournal.com/features/db2/article.php/10896_3322041_2
  I was not able to easily find information on REPLACE.

Joshua D. Drake




-- 
The PostgreSQL Company - Command Prompt, Inc. 1.503.667.4564
PostgreSQL Replication, Consulting, Custom Development, 24x7 support
Managed Services, Shared and Dedicated Hosting
Co-Authors: plPHP, plPerlNG - http://www.commandprompt.com/



Re: MERGE vs REPLACE

From
Peter Eisentraut
Date:
Tom Lane wrote:
> IIRC, SQL's MERGE deals with this by offering two quite separate
> specifications of what to do when there is or isn't already a
> matching row.

In that regard, MERGE is quite flexible, but MERGE doesn't address the 
point of REPLACE, because MERGE requires *two* tables as input, whereas 
REPLACE only takes *one*.  Unless someone can show that you can trick 
MERGE into doing the REPLACE job anyway, we're not discussing the same 
thing.

-- 
Peter Eisentraut
http://developer.postgresql.org/~petere/


Re: MERGE vs REPLACE

From
Petr Jelinek
Date:
Peter Eisentraut wrote:
> 
> In that regard, MERGE is quite flexible, but MERGE doesn't address the 
> point of REPLACE, because MERGE requires *two* tables as input, whereas 
> REPLACE only takes *one*.  Unless someone can show that you can trick 
> MERGE into doing the REPLACE job anyway, we're not discussing the same 
> thing.
> 

I am really not db expert and I don't have copy of sql standard but you 
don't need to use 2 tables I think - USING part can also be subquery 
(some SELECT) and if I am right then you could simulate what REPLACE 
does because in PostgreSQL you are not forced to specify FROM clause in 
SELECT. So you could in theory do
MERGE INTO tablename USING (SELECT 1 AS myid) ON (tablename.id = myid) ...
But I am not sure if this is what you want.

-- 
Regards
Petr Jelinek (PJMODOS)


Re: MERGE vs REPLACE

From
Martijn van Oosterhout
Date:
On Sun, Nov 13, 2005 at 11:32:47PM +0100, Petr Jelinek wrote:
> I am really not db expert and I don't have copy of sql standard but you
> don't need to use 2 tables I think - USING part can also be subquery
> (some SELECT) and if I am right then you could simulate what REPLACE
> does because in PostgreSQL you are not forced to specify FROM clause in
> SELECT. So you could in theory do
> MERGE INTO tablename USING (SELECT 1 AS myid) ON (tablename.id = myid) ...
> But I am not sure if this is what you want.

Well, the obvious extension to this is that the extire USING clause is
in fact optional:

MERGE INTO tablename ON id = 1 ...

Which starts looking a lot simpler.

BTW, my reading of the MERGE examples given earlier is that there no
notes in there at all about guarenteeing concurrency. None of the
documentation says that using MERGE will avoid duplicate key errors if
someone else does the same thing concurrently. It seems more like a
performence hack to avoid scanning the table twice.

Basically, you could implement this by taking the USING clause, do a
left outer join with the merge table and for the blank rows fill in a
CTID for insert and instead of NULLs the values of the INSERT portion.

Which is kind of a bummer for the people who want to do the "insert
zero if not there else add 1" thing a lot and expecting this to solve
the concurrency for them.

Have a nice day,
--
Martijn van Oosterhout   <kleptog@svana.org>   http://svana.org/kleptog/
> Patent. n. Genius is 5% inspiration and 95% perspiration. A patent is a
> tool for doing 5% of the work and then sitting around waiting for someone
> else to do the other 95% so you can sue them.

Re: MERGE vs REPLACE

From
Jochem van Dieten
Date:
On 11/13/05, Petr Jelinek wrote:
>
> I am really not db expert and I don't have copy of sql standard but you
> don't need to use 2 tables I think - USING part can also be subquery
> (some SELECT) and if I am right then you could simulate what REPLACE
> does because in PostgreSQL you are not forced to specify FROM clause in
> SELECT. So you could in theory do
> MERGE INTO tablename USING (SELECT 1 AS myid) ON (tablename.id = myid) ...

I think the MySQL statement:
REPLACE INTO table (pk, col1, col2, col3) VALUES (2, '0000-00-00', NULL, 3)

would translate into the following MERGE statement:
MERGE INTO table target
USING (2 as pknew , NULL as col1new, NULL as col2new, 3 as col3new) source
ON target.pknew = source.pk
WHEN MATCHED THEN UPDATE SET col1 = col1new, col2 = col2new, col3 = col3new
WHEN NOT MATCHED THEN INSERT (pk, col1, col2, col3) VALUES (pknew,
col1new, col2new, col3new)

It might not be the most elegant solution, but I don't see why it won't work.

Jochem

Re: MERGE vs REPLACE

From
Date:
I think you translated it correctly, MySQL has another way of specifying this
which is "INSERT ... ON DUPLICATE KEY UPDATE ..."
(http://dev.mysql.com/doc/refman/5.0/en/insert.html)
Regards

Paolo

Jochem van Dieten <jochemd@gmail.com> ha scritto

> On 11/13/05, Petr Jelinek wrote:
> >
> > I am really not db expert and I don't have copy of sql standard but you
> > don't need to use 2 tables I think - USING part can also be subquery
> > (some SELECT) and if I am right then you could simulate what REPLACE
> > does because in PostgreSQL you are not forced to specify FROM clause in
> > SELECT. So you could in theory do
> > MERGE INTO tablename USING (SELECT 1 AS myid) ON (tablename.id = myid)
...

> I think the MySQL statement:
> REPLACE INTO table (pk, col1, col2, col3) VALUES (2, '0000-00-00', NULL,
3)

> would translate into the following MERGE statement:
> MERGE INTO table target
> USING (2 as pknew , NULL as col1new, NULL as col2new, 3 as col3new) source
> ON target.pknew = source.pk
> WHEN MATCHED THEN UPDATE SET col1 = col1new, col2 = col2new, col3 =
col3new
> WHEN NOT MATCHED THEN INSERT (pk, col1, col2, col3) VALUES (pknew,
> col1new, col2new, col3new)

> It might not be the most elegant solution, but I don't see why it won't
work.

> Jochem

> ---------------------------(end of broadcast)---------------------------
> TIP 4: Have you searched our list archives?

>                http://archives.postgresql.org
> 





Re: MERGE vs REPLACE

From
"Jim C. Nasby"
Date:
See 'merge_db' in http://lnk.nu/postgresql.org/5sl.html

On Fri, Nov 11, 2005 at 10:07:07PM -0500, Rod Taylor wrote:
> On Fri, 2005-11-11 at 18:36 -0500, mark@mark.mielke.cc wrote:
> > On Fri, Nov 11, 2005 at 06:00:32PM -0500, Rod Taylor wrote:
> > > So? That is what save points are for.  You can even skip the select for
> > > update if you don't mind dead tuples from the attempted insert.
> > > SELECT ... FOR UPDATE;
> > > IF not exists THEN
> > >     SAVEPOINT;
> > >     INSERT ;
> > >     IF UNIQUE VIOLATION THEN
> > >         /* Someone else inserted between the SELECT and our INSERT */
> > >         ROLLBACK TO SAVEPOINT;
> > >         UPDATE;
> > >     ELSE
> > >         RELEASE SAVEPOINT;
> > >     FI
> > > ELSE
> > >     UPDATE;
> > > FI
> > 
> > Isn't there still a race between INSERT and UPDATE?
> 
> I suppose there is although I hadn't noticed before. I've never run into
> it and always check to ensure the expected number of tuples were touched
> by the update or delete.
> 
> Within the PostgreSQL backend you might get away with having your insert
> hold a lock on the index page and follow it up with a FOR UPDATE lock on
> the offending tuple thus ensuring that your update will succeed. If you
> hack index mechanisms for the support you don't need the SAVEPOINT
> either -- just don't throw an error when you run across the existing
> entry.
> 
> For client side code one possibility is to repeat until successful.
> 
> WHILE
>     SELECT FOR UPDATE;
>     IF NOT EXISTS THEN
>         SAVEPOINT
>         INSERT;
>         IF UNIQUE VIOLATION THEN
>             ROLLBACK TO SAVEPOINT;
>         ELSE
>             RELEASE SAVEPOINT
>             EXIT;
>         FI
>     ELSE
>         UPDATE;
>         EXIT;
>     END
> 
>     -- Check for infinite loop
> END
> 
> -- 
> 
> 
> ---------------------------(end of broadcast)---------------------------
> TIP 1: if posting/reading through Usenet, please send an appropriate
>        subscribe-nomail command to majordomo@postgresql.org so that your
>        message can get through to the mailing list cleanly
> 

-- 
Jim C. Nasby, Sr. Engineering Consultant      jnasby@pervasive.com
Pervasive Software      http://pervasive.com    work: 512-231-6117
vcard: http://jim.nasby.net/pervasive.vcf       cell: 512-569-9461


Re: MERGE vs REPLACE

From
"Jim C. Nasby"
Date:
On Fri, Nov 11, 2005 at 03:42:38PM -0500, Tom Lane wrote:
> Peter Eisentraut <peter_e@gmx.net> writes:
> > It seems to me that it has always been implicitly assumed around here 
> > that the MERGE command would be a substitute for a MySQL-like REPLACE 
> > functionality.  After rereading the spec it seems that this is not the 
> > case.  MERGE always operates on two different tables, which REPLACE 
> > doesn't do.
> 
> Normally I'd plump for following the standard ... but AFAIR, we have had
> bucketloads of requests for REPLACE functionality, and not one request
> for spec-compatible MERGE.  If, as it appears, full-spec MERGE is also a
> whole lot harder and slower than REPLACE, it seems that we could do
> worse than to concentrate on doing REPLACE for now.  (We can always come
> back to MERGE some other day.)

I suspect a lot of those requests are from people who actually want
merge and don't realize that mysql has a replace.

On another note, is there any reason we can't put an equivalent to
example 36-1 (http://lnk.nu/postgresql.org/617.html) into the backend?
Presumably it wouldn't be as fast as a more elegant solution, but OTOH
it'd probably be faster than plpgsql...
-- 
Jim C. Nasby, Sr. Engineering Consultant      jnasby@pervasive.com
Pervasive Software      http://pervasive.com    work: 512-231-6117
vcard: http://jim.nasby.net/pervasive.vcf       cell: 512-569-9461


Re: MERGE vs REPLACE

From
Simon Riggs
Date:
On Sun, 2005-11-13 at 23:56 +0100, Martijn van Oosterhout wrote:
> On Sun, Nov 13, 2005 at 11:32:47PM +0100, Petr Jelinek wrote:
> > I am really not db expert and I don't have copy of sql standard but you 
> > don't need to use 2 tables I think - USING part can also be subquery 
> > (some SELECT) and if I am right then you could simulate what REPLACE 
> > does because in PostgreSQL you are not forced to specify FROM clause in 
> > SELECT. So you could in theory do
> > MERGE INTO tablename USING (SELECT 1 AS myid) ON (tablename.id = myid) ...
> > But I am not sure if this is what you want.
> 
> Well, the obvious extension to this is that the extire USING clause is
> in fact optional:
> 
> MERGE INTO tablename ON id = 1 ...
> 
> Which starts looking a lot simpler.
> 

...choosing a place to jump in is a little hard in this thread...so I'll
jump in with some general info and thoughts on topics so far:

MERGE is useful both for OLTP systems and for Data Warehousing, where it
is sometimes known as the UPSERT. The MERGE statement in SQL:2003
requires a target table and a table statement. I don't see anything in
that to always require two separate tables - this is just the same as a
self-referencing INSERT SELECT statement. The USING clause is also a
compulsory part of SQL:2003.

One of the more interesting ways to use MERGE is with Oracle external
tables. The same idea for us would be to have MERGE become a variant of
the PostgreSQL COPY FROM command. That would be very cool.

The above is the reason why MERGE doesn't seem to provide for external
data being passed, as does INSERT or MySQL REPLACE.

Neither DB2 or Oracle perform predicate locking. DB2 is more cautious,
and some would say more efficient, thats all. PostgreSQL's locking
features are just fine for pragmatic implementation of MERGE, AFAICS.
Where there is doubt, we should fall back to table locking just like the
rest of the world, IMHO. Making this work with partitioning will be hard
enough without overusing the predicate solving logic.

The UPSERT concept is also supported by Teradata, who simply append an
ELSE INSERT clause onto the standard UPDATE syntax. MySQL REPLACE seems
to me to be a fairly small subset of MERGE functionality and we ought to
be able to offer that functionality as a side branch of the main work.

I've been looking at ways of doing INSERT then UPDATE, but it doesn't
seem very easy to avoid unique index violations in that case. So doing
the UPDATE first then INSERTs later seems like the way to go.

Best Regards, Simon Riggs



Re: MERGE vs REPLACE

From
Josh Berkus
Date:
Simon,

> The UPSERT concept is also supported by Teradata, who simply append an
> ELSE INSERT clause onto the standard UPDATE syntax. MySQL REPLACE seems
> to me to be a fairly small subset of MERGE functionality and we ought to
> be able to offer that functionality as a side branch of the main work.

Yes, I guess my hesitation on the full-table-lock strategy is that it 
doesn't really fulfill the mandate for why people want REPLACE-like 
statements ... to give them an INSERT-or-UPDATE with *higher* efficiency 
and concurrency than doing two statements.  That being said, I've 
personally designed more than a dozen web applications and have not yet 
been faced with a single circumstance of not knowing whether I wanted to 
INSERT or UPDATE.  I've even ported MySQL apps and found it easy to 
re-code them to do "if $id = 0, then insert ..." without even needing to 
use a pl/pgsql hack.

So we thus have two seperate use cases.  The first, for bulk loading/ETL is 
what MERGE fulfills rather neatly and for that full table locking is 
perfectly OK, even desirable.  You really don't want to MERGE-load the 
same table on two threads at once.  

The second case is for applications coded for MySQL; this is the REPLACE 
case.  However, the most common MySQL applications doing this use full 
table locking (MyISAM) anyway!  So, while full table locking wouldn't gain 
them any performance over using two statements, it shouldn't lose them 
anything they're used to having.

-- 
--Josh

Josh Berkus
Aglio Database Solutions
San Francisco


Re: MERGE vs REPLACE

From
mark@mark.mielke.cc
Date:
On Tue, Nov 15, 2005 at 10:27:10AM -0800, Josh Berkus wrote:
> Yes, I guess my hesitation on the full-table-lock strategy is that it 
> doesn't really fulfill the mandate for why people want REPLACE-like 
> statements ... to give them an INSERT-or-UPDATE with *higher* efficiency 
> and concurrency than doing two statements.  That being said, I've 
> personally designed more than a dozen web applications and have not yet 
> been faced with a single circumstance of not knowing whether I wanted to 
> INSERT or UPDATE.  I've even ported MySQL apps and found it easy to 
> re-code them to do "if $id = 0, then insert ..." without even needing to 
> use a pl/pgsql hack.

> So we thus have two seperate use cases.  The first, for bulk loading/ETL is 
> what MERGE fulfills rather neatly and for that full table locking is 
> perfectly OK, even desirable.  You really don't want to MERGE-load the 
> same table on two threads at once.  

Yes.

> The second case is for applications coded for MySQL; this is the REPLACE 
> case.  However, the most common MySQL applications doing this use full 
> table locking (MyISAM) anyway!  So, while full table locking wouldn't gain 
> them any performance over using two statements, it shouldn't lose them 
> anything they're used to having.

The last two times I wished to use it:

Objects with dynamic fields. The table rows are identified by the
object key, a field key, that refer to a text value. I believe I
still have a race in this case, as I allow INSERT, UPDATE, and DELETE.
The DELETE means that an UPDATE, followed by an INSERT, may fail,
if the DELETE occurs between the two. DELETE is fine as is, as it
is a single operation, that won't fail. Combining UPDATE and INSERT
into one, as per the MySQL REPLACE, would eliminate my race caused
by the DELETE. If REPLACE locked the whole table, it would not be a
candidate, as this table is frequently modified by many users at
once. REPLACE using the primary key index to do locking would be
great.

Timestamping the last signin and access time (~30 seconds error
allowed in the access time to eliminate a majority of the updates
using memcached as we all know how much PostgreSQL hates this) for a
table that doesn't have a 1:1 mapping between the fields associated
with the user, and the row that records their last signin/access time.
The signin/access time can be associated with a context, that is a
part of the primary key. Currently, I rely on the code never deleting
rows, and perform update first, and if the update does not change any
rows, I fall back to insert.

As it is, though, I ended up combining the signin and access time
into the same row, to use one table instead of two, and I'm not sure
that MySQL replace really gives me what I want in this situation. I
have never used MERGE, so can't say whether this would do what I want.
I would not want the solution to lock the entire table, as these updates,
although throttled by the the ~30 seconds error allowed, and memcached
queries, would be potentially performed by hundreds of users every
30 seconds. I can extend the error factor to 60 seconds, but that only
cuts the queries in half. In general, however, the current model does
work fine, and doesn't require replacement. UPDATE will be used most
of the time, and be the correct operation.

Unfortunately, there is a race that I see here. If both UPDATE
operations fail at the same time, then one of the INSERTS will fail.
I don't handle this currently, but perhaps I could argue myself into
considering this an ignorable failure.

Cheers,
mark

-- 
mark@mielke.cc / markm@ncf.ca / markm@nortel.com     __________________________
.  .  _  ._  . .   .__    .  . ._. .__ .   . . .__  | Neighbourhood Coder
|\/| |_| |_| |/    |_     |\/|  |  |_  |   |/  |_   | 
|  | | | | \ | \   |__ .  |  | .|. |__ |__ | \ |__  | Ottawa, Ontario, Canada
 One ring to rule them all, one ring to find them, one ring to bring them all                      and in the darkness
bindthem...
 
                          http://mark.mielke.cc/



Re: MERGE vs REPLACE

From
Stephen Frost
Date:
* Tom Lane (tgl@sss.pgh.pa.us) wrote:
> Normally I'd plump for following the standard ... but AFAIR, we have had
> bucketloads of requests for REPLACE functionality, and not one request
> for spec-compatible MERGE.  If, as it appears, full-spec MERGE is also a
> whole lot harder and slower than REPLACE, it seems that we could do
> worse than to concentrate on doing REPLACE for now.  (We can always come
> back to MERGE some other day.)

Not to be too much of a pain, but I asked for full-spec MERGE a while
back... :)  I don't think I was the only one asking for full-spec MERGE
in the "What features would you like to see in Postgres?" thread a while
ago either, though I could be wrong.

I'd like to see MySQL-like 'replace' too, of course. :)
Thanks,
    Stephen

Re: MERGE vs REPLACE

From
Jaime Casanova
Date:
On 11/15/05, Josh Berkus <josh@agliodbs.com> wrote:
> Simon,
>
> > The UPSERT concept is also supported by Teradata, who simply append an
> > ELSE INSERT clause onto the standard UPDATE syntax. MySQL REPLACE seems
> > to me to be a fairly small subset of MERGE functionality and we ought to
> > be able to offer that functionality as a side branch of the main work.
>
> Yes, I guess my hesitation on the full-table-lock strategy is that it
> doesn't really fulfill the mandate for why people want REPLACE-like
> statements ... to give them an INSERT-or-UPDATE with *higher* efficiency
> and concurrency than doing two statements.  That being said, I've
> personally designed more than a dozen web applications and have not yet
> been faced with a single circumstance of not knowing whether I wanted to
> INSERT or UPDATE.  I've even ported MySQL apps and found it easy to
> re-code them to do "if $id = 0, then insert ..." without even needing to
> use a pl/pgsql hack.
>

Actually REPLACE is not INSERT or UPDATE...
REPLACE means INSERT if already exists DELETE then INSERT

can be used as an UPDATE if you use the SET clause but, it is optional


--
regards,
Jaime Casanova
(DBA: DataBase Aniquilator ;)


Re: MERGE vs REPLACE

From
Simon Riggs
Date:
On Tue, 2005-11-15 at 10:27 -0800, Josh Berkus wrote:

> > The UPSERT concept is also supported by Teradata, who simply append an
> > ELSE INSERT clause onto the standard UPDATE syntax. MySQL REPLACE seems
> > to me to be a fairly small subset of MERGE functionality and we ought to
> > be able to offer that functionality as a side branch of the main work.
> 
> Yes, I guess my hesitation on the full-table-lock strategy is that it 
> doesn't really fulfill the mandate for why people want REPLACE-like 
> statements ... to give them an INSERT-or-UPDATE with *higher* efficiency 
> and concurrency than doing two statements.  

Agreed...

OK, what I said was "...Where there is doubt, we should fall back to
table locking just like the rest of the world, IMHO." I didn't mean we
should use full table locking all of the time. Sorry if I wasn't clear.

What I meant, in context was, IMHO
- we do no need predicate locking
- we should use row level locks when these can be used
- we should use table level locks other times

On Fri, 2005-11-11 at 14:40 -0800, Josh Berkus wrote: 
> Scenario:
> 
> session1: REPLACE .... 1           
>                   session2:  REPLACE ..... 1
> session1: check to see that "1" exists .... no
>                   session2: check to see that "1" exists .... no
> session1: INSERT 1
>                   session2: INSERT 1 .... ERROR
> 
> Get the picture?  The only way to avoid a race condition is to be able to 
> do "predicate locking", that is to lock the table against any data write 
> matching that predicate.

The above situation happens now if you have two people doing SELECT then
INSERT on the same row. Or even if you do UPDATE then INSERT.

Currently if two people INSERT a row with a duplicate PK, we may find
that both INSERTs made it into a data block, but one has its txn aborted
because of a unique index violation.

Nobody complains about that, nor says we should have predicate locking
for that case, so why worry about it for MERGE? I don't see we need a
special effort to avoid: if you try to do the same thing more than once,
only the first one will work. At least you get a nice error message to
let you know its happened. 

Seems like we
- run the query in the USING clause
- join it to the target table using pseudo outer join logic
- if it matches we UPDATE
- if it doesn't we INSERT
...but we do the last two in a single step, to minimise the window of
opportunity for wierd situations.

> That being said, I've 
> personally designed more than a dozen web applications and have not yet 
> been faced with a single circumstance of not knowing whether I wanted to 
> INSERT or UPDATE.  I've even ported MySQL apps and found it easy to 
> re-code them to do "if $id = 0, then insert ..." without even needing to 
> use a pl/pgsql hack.

OK, but not everybody knows what they're doing as well as you do. :-)

> So we thus have two seperate use cases.  The first, for bulk loading/ETL is 
> what MERGE fulfills rather neatly and for that full table locking is 
> perfectly OK, even desirable.  You really don't want to MERGE-load the 
> same table on two threads at once.  

Probably true, but I do want to avoid full table locking for MERGE
whenever possible. Concurrency is important even in data warehousing.

Best Regards, Simon Riggs



Re: MERGE vs REPLACE

From
Bruce Momjian
Date:
Josh Berkus wrote:
> Jaime,
> 
> > why? seems that REPLACE only work if there are at least one row
> > matching...
> 
> Scenario:
> 
> session1: REPLACE .... 1           
>                   session2:  REPLACE ..... 1
> session1: check to see that "1" exists .... no
>                   session2: check to see that "1" exists .... no
> session1: INSERT 1
>                   session2: INSERT 1 .... ERROR
> 
> Get the picture?  The only way to avoid a race condition is to be able to 
> do "predicate locking", that is to lock the table against any data write 
> matching that predicate.

Isn't the standard practice to do the INSERT, and if that fails, do
the UPDATE?  In fact, if someone is already doing a INSERT inside a
transaction, it will wait for the transaction to complete so you can
then do the UPDATE, or complete the INSERT if the transaction aborted.

--  Bruce Momjian                        |  http://candle.pha.pa.us pgman@candle.pha.pa.us               |  (610)
359-1001+  If your life is a hard drive,     |  13 Roberts Road +  Christ can be your backup.        |  Newtown Square,
Pennsylvania19073
 


Re: MERGE vs REPLACE

From
Bruce Momjian
Date:
Tom Lane wrote:
> Josh Berkus <josh@agliodbs.com> writes:
> > But even REPLACE requires predicate locking.  There's no real way to get 
> > around it.
> 
> The point though is that REPLACE is restricted to a type of predicate
> narrow enough to be enforced through a unique-index mechanism, and so
> it's implementable without solving the general case of predicate
> locking.
> 
> Predicate locking for narrow cases isn't very hard; it's the general
> case of arbitrary predicates that's hard.

My feeling is we should implement MERGE for the limited cases we can,
and throw an error for cases we can not (or require table locking), and
then see what reports we get from users.

--  Bruce Momjian                        |  http://candle.pha.pa.us pgman@candle.pha.pa.us               |  (610)
359-1001+  If your life is a hard drive,     |  13 Roberts Road +  Christ can be your backup.        |  Newtown Square,
Pennsylvania19073
 


Re: MERGE vs REPLACE

From
Bruce Momjian
Date:
Simon Riggs wrote:
> I've been looking at ways of doing INSERT then UPDATE, but it doesn't
> seem very easy to avoid unique index violations in that case. So doing
> the UPDATE first then INSERTs later seems like the way to go.

INSERT has to be first to avoid a race condition (see my previous
emails).  We just need a way to get a unique index violation and
continue with the UPDATE.  We have savepoints now so it certainly seems
possible.

--  Bruce Momjian                        |  http://candle.pha.pa.us pgman@candle.pha.pa.us               |  (610)
359-1001+  If your life is a hard drive,     |  13 Roberts Road +  Christ can be your backup.        |  Newtown Square,
Pennsylvania19073
 


Re: MERGE vs REPLACE

From
Bruce Momjian
Date:
Josh Berkus wrote:
> So we thus have two seperate use cases.  The first, for bulk loading/ETL is 
> what MERGE fulfills rather neatly and for that full table locking is 
> perfectly OK, even desirable.  You really don't want to MERGE-load the 
> same table on two threads at once.  
> 
> The second case is for applications coded for MySQL; this is the REPLACE 
> case.  However, the most common MySQL applications doing this use full 
> table locking (MyISAM) anyway!  So, while full table locking wouldn't gain 
> them any performance over using two statements, it shouldn't lose them 
> anything they're used to having.

For any kind of efficiency, I assume MySQL REPLACE wants a unique index
in place, so practially everyone doing merge probably already has the
setup we need to avoid new non-index predicate locking code.

--  Bruce Momjian                        |  http://candle.pha.pa.us pgman@candle.pha.pa.us               |  (610)
359-1001+  If your life is a hard drive,     |  13 Roberts Road +  Christ can be your backup.        |  Newtown Square,
Pennsylvania19073
 


Re: MERGE vs REPLACE

From
Bruce Momjian
Date:
Just to summarize, MySQL REPLACE is INSERT or DELETE/INSERT, while they
have a SET clauses that allows UPDATE, and INSERT has a ON DUPLICATE KEY
UPDATE clause too.

I think the INSERT ...  ON DUPLICATE KEY is undesirable because this
functionality should have a new keyword in the first position, e.g
MERGE, and I think the REPLACE is out because most people feel that the
DELETE/INSERT functionality is near-useless if we can give users the
INSERT/UPDATE functionality of MERGE.

I think even if we have to restrict MERGE to requiring a unique index,
it is better to go that way than to drag REPLACE into our syntax.  MERGE
can be extended over time, while REPLACE has a non-optimal initial
behavior.  The idea that MERGE can use a constant list (not requiring a
second table) makes it a valid replacement for REPLACE, and other
database support for MERGE reinforces this.

---------------------------------------------------------------------------

pmagnoli@systemevolution.it wrote:
> I think you translated it correctly, MySQL has another way of specifying this
> which is "INSERT ... ON DUPLICATE KEY UPDATE ..."
> (http://dev.mysql.com/doc/refman/5.0/en/insert.html)
> Regards
> 
> Paolo
> 
> Jochem van Dieten <jochemd@gmail.com> ha scritto
> 
> > On 11/13/05, Petr Jelinek wrote:
> > >
> > > I am really not db expert and I don't have copy of sql standard but you
> > > don't need to use 2 tables I think - USING part can also be subquery
> > > (some SELECT) and if I am right then you could simulate what REPLACE
> > > does because in PostgreSQL you are not forced to specify FROM clause in
> > > SELECT. So you could in theory do
> > > MERGE INTO tablename USING (SELECT 1 AS myid) ON (tablename.id = myid)
> ...
> 
> > I think the MySQL statement:
> > REPLACE INTO table (pk, col1, col2, col3) VALUES (2, '0000-00-00', NULL,
> 3)
> 
> > would translate into the following MERGE statement:
> > MERGE INTO table target
> > USING (2 as pknew , NULL as col1new, NULL as col2new, 3 as col3new) source
> > ON target.pknew = source.pk
> > WHEN MATCHED THEN UPDATE SET col1 = col1new, col2 = col2new, col3 =
> col3new
> > WHEN NOT MATCHED THEN INSERT (pk, col1, col2, col3) VALUES (pknew,
> > col1new, col2new, col3new)
> 
> > It might not be the most elegant solution, but I don't see why it won't
> work.
> 
> > Jochem
> 
> > ---------------------------(end of broadcast)---------------------------
> > TIP 4: Have you searched our list archives?
> 
> >                http://archives.postgresql.org
> > 
> 
> 
> 
> 
> ---------------------------(end of broadcast)---------------------------
> TIP 4: Have you searched our list archives?
> 
>                http://archives.postgresql.org
> 

--  Bruce Momjian                        |  http://candle.pha.pa.us pgman@candle.pha.pa.us               |  (610)
359-1001+  If your life is a hard drive,     |  13 Roberts Road +  Christ can be your backup.        |  Newtown Square,
Pennsylvania19073
 


Re: MERGE vs REPLACE

From
"Jim C. Nasby"
Date:
On Tue, Nov 15, 2005 at 07:16:21PM -0500, Bruce Momjian wrote:
> Tom Lane wrote:
> > Josh Berkus <josh@agliodbs.com> writes:
> > > But even REPLACE requires predicate locking.  There's no real way to get 
> > > around it.
> > 
> > The point though is that REPLACE is restricted to a type of predicate
> > narrow enough to be enforced through a unique-index mechanism, and so
> > it's implementable without solving the general case of predicate
> > locking.
> > 
> > Predicate locking for narrow cases isn't very hard; it's the general
> > case of arbitrary predicates that's hard.
> 
> My feeling is we should implement MERGE for the limited cases we can,
> and throw an error for cases we can not (or require table locking), and
> then see what reports we get from users.

We should probably throw a notice or warning if we go to a table lock,
too.
-- 
Jim C. Nasby, Sr. Engineering Consultant      jnasby@pervasive.com
Pervasive Software      http://pervasive.com    work: 512-231-6117
vcard: http://jim.nasby.net/pervasive.vcf       cell: 512-569-9461


Re: MERGE vs REPLACE

From
Christopher Kings-Lynne
Date:
> We should probably throw a notice or warning if we go to a table lock,
> too.

That's not very useful, because you can only do somethign about it AFTER 
the 1 hour exclusive lock merge has already run :)

Chris



Re: MERGE vs REPLACE

From
"Jim C. Nasby"
Date:
On Wed, Nov 16, 2005 at 01:27:29PM +0800, Christopher Kings-Lynne wrote:
> >We should probably throw a notice or warning if we go to a table lock,
> >too.
> 
> That's not very useful, because you can only do somethign about it AFTER 
> the 1 hour exclusive lock merge has already run :)

Not true; it would be useful for development when you'd like to know
that some statement is grabbing a table lock. This is something that you
wouldn't normally notice in a dev environment, and it sounds like it'd
be easy to do a merge that has the unintended effect of grabbing a table
lock.
-- 
Jim C. Nasby, Sr. Engineering Consultant      jnasby@pervasive.com
Pervasive Software      http://pervasive.com    work: 512-231-6117
vcard: http://jim.nasby.net/pervasive.vcf       cell: 512-569-9461


Re: MERGE vs REPLACE

From
Tom Lane
Date:
Christopher Kings-Lynne <chriskl@familyhealth.com.au> writes:
>> We should probably throw a notice or warning if we go to a table lock,
>> too.

> That's not very useful, because you can only do somethign about it AFTER 
> the 1 hour exclusive lock merge has already run :)

We shouldn't do anything remotely like that.  A statement whose locking
effects can't be predicted on sight is horrid both from the user's
viewpoint and from the implementation viewpoint.  In particular, if we
have to do planning before we can determine whether the table needs just
a SELECT lock or something stronger, then we have to take a weak lock to
do the planning and then we are faced with upgrading to the stronger
lock at runtime.  Can you say "deadlock risk"?

I think we should do REPLACE-like functionality that simply fails if the
match condition isn't equality on a primary key.  If we can use SQL-spec
MERGE syntax for this, that's fine, but let's not think in terms of
silently changing to a stronger table lock and a much slower
implementation when the condition isn't a primary key.  That's a whole
lot of work that isn't solving any real-world problems, and *is*
creating a foot-gun for people to cause themselves performance and
deadlock problems anytime they spell the WHERE condition slightly wrong.
        regards, tom lane


Re: MERGE vs REPLACE

From
Rick Gigger
Date:
I agree.  I would never ever ever want it to silently start doing  
table locks.  I would simply avoid using merge at all if that was a  
possibility.

However it seems like the idea is to eventually flesh out full  
fledged merge.  And to do that it sounds like you would need to do  
one of the following:

1) implement predicate locking beyond the simple "match on unique  
index" case that we have here
2) do full table locks.

It sounds like #1 isn't going to happen for a while.  So in order to  
do more complicated merges you will need to do #2.  If you are going  
to implement more complicated merge functionality I certainly  
wouldn't want it throwing a warning telling me about a table lock if  
I had already knew it would get the table lock and decided I wanted  
to go ahead with using merge anyway.

Could you let the user create the lock himself to handle this  
situation?  For instance:

analyze the merge
if merge condition matches unique indexmerge without  table locking
elseif needed table lock already existsmerge
elsethrow an error

You could also just add something to the merge syntax like ALLOW  
TABLE LOCK or something.  The idea is just that the user can  
explicitly allow the table lock and thus the more complicated merge.

I don't really know anything about the implementation details but  
that is the behavior that I would prefer.  That way I could always do  
a complicated merge if I wanted to but there is no way it would ever  
do an implicit table lock on me.  And it would never throw an error/ 
warning unless I actually did something questionable.

Does that make sense.

Rick Gigger

On Nov 16, 2005, at 7:49 AM, Tom Lane wrote:

> Christopher Kings-Lynne <chriskl@familyhealth.com.au> writes:
>>> We should probably throw a notice or warning if we go to a table  
>>> lock,
>>> too.
>
>> That's not very useful, because you can only do somethign about it  
>> AFTER
>> the 1 hour exclusive lock merge has already run :)
>
> We shouldn't do anything remotely like that.  A statement whose  
> locking
> effects can't be predicted on sight is horrid both from the user's
> viewpoint and from the implementation viewpoint.  In particular, if we
> have to do planning before we can determine whether the table needs  
> just
> a SELECT lock or something stronger, then we have to take a weak  
> lock to
> do the planning and then we are faced with upgrading to the stronger
> lock at runtime.  Can you say "deadlock risk"?
>
> I think we should do REPLACE-like functionality that simply fails  
> if the
> match condition isn't equality on a primary key.  If we can use SQL- 
> spec
> MERGE syntax for this, that's fine, but let's not think in terms of
> silently changing to a stronger table lock and a much slower
> implementation when the condition isn't a primary key.  That's a whole
> lot of work that isn't solving any real-world problems, and *is*
> creating a foot-gun for people to cause themselves performance and
> deadlock problems anytime they spell the WHERE condition slightly  
> wrong.
>
>             regards, tom lane
>
> ---------------------------(end of  
> broadcast)---------------------------
> TIP 1: if posting/reading through Usenet, please send an appropriate
>        subscribe-nomail command to majordomo@postgresql.org so that  
> your
>        message can get through to the mailing list cleanly
>



Re: MERGE vs REPLACE

From
Jaime Casanova
Date:
> You could also just add something to the merge syntax like ALLOW
> TABLE LOCK or something.  The idea is just that the user can
> explicitly allow the table lock and thus the more complicated merge.
>

The problem here is that many people will see that option and think
it's safe to do it... i mean, many people will shoot themselves in the
foot and the culprit will be PostgreSQL because he let a ready to
shoot gun in a visible place when are kids around

--
regards,
Jaime Casanova
(DBA: DataBase Aniquilator ;)


Re: MERGE vs REPLACE

From
Bruce Momjian
Date:
Interesting approach.  Actually, we could tell the user they have to use
BEGIN;LOCK tab before doing MERGE, and throw an error if we don't
already have a table lock.

---------------------------------------------------------------------------

Rick Gigger wrote:
> I agree.  I would never ever ever want it to silently start doing  
> table locks.  I would simply avoid using merge at all if that was a  
> possibility.
> 
> However it seems like the idea is to eventually flesh out full  
> fledged merge.  And to do that it sounds like you would need to do  
> one of the following:
> 
> 1) implement predicate locking beyond the simple "match on unique  
> index" case that we have here
> 2) do full table locks.
> 
> It sounds like #1 isn't going to happen for a while.  So in order to  
> do more complicated merges you will need to do #2.  If you are going  
> to implement more complicated merge functionality I certainly  
> wouldn't want it throwing a warning telling me about a table lock if  
> I had already knew it would get the table lock and decided I wanted  
> to go ahead with using merge anyway.
> 
> Could you let the user create the lock himself to handle this  
> situation?  For instance:
> 
> analyze the merge
> if merge condition matches unique index
>     merge without  table locking
> elseif needed table lock already exists
>     merge
> else
>     throw an error
> 
> You could also just add something to the merge syntax like ALLOW  
> TABLE LOCK or something.  The idea is just that the user can  
> explicitly allow the table lock and thus the more complicated merge.
> 
> I don't really know anything about the implementation details but  
> that is the behavior that I would prefer.  That way I could always do  
> a complicated merge if I wanted to but there is no way it would ever  
> do an implicit table lock on me.  And it would never throw an error/ 
> warning unless I actually did something questionable.
> 
> Does that make sense.
> 
> Rick Gigger
> 
> On Nov 16, 2005, at 7:49 AM, Tom Lane wrote:
> 
> > Christopher Kings-Lynne <chriskl@familyhealth.com.au> writes:
> >>> We should probably throw a notice or warning if we go to a table  
> >>> lock,
> >>> too.
> >
> >> That's not very useful, because you can only do somethign about it  
> >> AFTER
> >> the 1 hour exclusive lock merge has already run :)
> >
> > We shouldn't do anything remotely like that.  A statement whose  
> > locking
> > effects can't be predicted on sight is horrid both from the user's
> > viewpoint and from the implementation viewpoint.  In particular, if we
> > have to do planning before we can determine whether the table needs  
> > just
> > a SELECT lock or something stronger, then we have to take a weak  
> > lock to
> > do the planning and then we are faced with upgrading to the stronger
> > lock at runtime.  Can you say "deadlock risk"?
> >
> > I think we should do REPLACE-like functionality that simply fails  
> > if the
> > match condition isn't equality on a primary key.  If we can use SQL- 
> > spec
> > MERGE syntax for this, that's fine, but let's not think in terms of
> > silently changing to a stronger table lock and a much slower
> > implementation when the condition isn't a primary key.  That's a whole
> > lot of work that isn't solving any real-world problems, and *is*
> > creating a foot-gun for people to cause themselves performance and
> > deadlock problems anytime they spell the WHERE condition slightly  
> > wrong.
> >
> >             regards, tom lane
> >
> > ---------------------------(end of  
> > broadcast)---------------------------
> > TIP 1: if posting/reading through Usenet, please send an appropriate
> >        subscribe-nomail command to majordomo@postgresql.org so that  
> > your
> >        message can get through to the mailing list cleanly
> >
> 
> 
> ---------------------------(end of broadcast)---------------------------
> TIP 3: Have you checked our extensive FAQ?
> 
>                http://www.postgresql.org/docs/faq
> 

--  Bruce Momjian                        |  http://candle.pha.pa.us pgman@candle.pha.pa.us               |  (610)
359-1001+  If your life is a hard drive,     |  13 Roberts Road +  Christ can be your backup.        |  Newtown Square,
Pennsylvania19073
 


Re: MERGE vs REPLACE

From
Jaime Casanova
Date:
On 11/16/05, Bruce Momjian <pgman@candle.pha.pa.us> wrote:
>
> Interesting approach.  Actually, we could tell the user they have to use
> BEGIN;LOCK tab before doing MERGE, and throw an error if we don't
> already have a table lock.
>

If the lock will be required, what's the problem in doing it
internally? without user interaction?

--
regards,
Jaime Casanova
(DBA: DataBase Aniquilator ;)


Re: MERGE vs REPLACE

From
Tom Lane
Date:
Jaime Casanova <systemguards@gmail.com> writes:
> If the lock will be required, what's the problem in doing it
> internally?

I already explained that: lock upgrading is deadlock-prone.
        regards, tom lane


Re: MERGE vs REPLACE

From
Bruce Momjian
Date:
Jaime Casanova wrote:
> On 11/16/05, Bruce Momjian <pgman@candle.pha.pa.us> wrote:
> >
> > Interesting approach.  Actually, we could tell the user they have to use
> > BEGIN;LOCK tab before doing MERGE, and throw an error if we don't
> > already have a table lock.
> >
> 
> If the lock will be required, what's the problem in doing it
> internally? without user interaction?

Because first, we are making it visible to the user, and second the lock
is taken out _before_ we actually execute the statement, meaning we
don't have to escalate our locks.

--  Bruce Momjian                        |  http://candle.pha.pa.us pgman@candle.pha.pa.us               |  (610)
359-1001+  If your life is a hard drive,     |  13 Roberts Road +  Christ can be your backup.        |  Newtown Square,
Pennsylvania19073
 


Re: MERGE vs REPLACE

From
Martijn van Oosterhout
Date:
On Wed, Nov 16, 2005 at 11:37:46AM -0500, Bruce Momjian wrote:
>
> Interesting approach.  Actually, we could tell the user they have to use
> BEGIN;LOCK tab before doing MERGE, and throw an error if we don't
> already have a table lock.

The bit I'm still missing is why there needs to be a lock at all. The
SQL standard doesn't say anywhere that concurrent MERGE operations
can't conflict. It seems to me that standard visibility rules apply. If
neither MERGE statement can see the results of the other, then they
will both INSERT. If you don't have a UNIQUE constraint to prevent this
then what's the problem?

It seems to me people would like, in the case of an existing UNIQUE
constraint, to be able to use it to prevent "duplicate key" errors.
This is nice, but the standard doesn't require that either.

In other words, if we can use an index to avoid duplicate key errors,
fine. But if there is no index available, it is not an error to do an
INSERT because another INSERT was hidden from you.

Conceptually, a MERGE statement is just a long string of INSERTs and
UPDATEs in the same transaction and I think we should treat it as
such.

Have a nice day,
--
Martijn van Oosterhout   <kleptog@svana.org>   http://svana.org/kleptog/
> Patent. n. Genius is 5% inspiration and 95% perspiration. A patent is a
> tool for doing 5% of the work and then sitting around waiting for someone
> else to do the other 95% so you can sue them.

Re: MERGE vs REPLACE

From
Bruce Momjian
Date:
Martijn van Oosterhout wrote:
-- Start of PGP signed section.
> On Wed, Nov 16, 2005 at 11:37:46AM -0500, Bruce Momjian wrote:
> > 
> > Interesting approach.  Actually, we could tell the user they have to use
> > BEGIN;LOCK tab before doing MERGE, and throw an error if we don't
> > already have a table lock.
> 
> The bit I'm still missing is why there needs to be a lock at all. The
> SQL standard doesn't say anywhere that concurrent MERGE operations
> can't conflict. It seems to me that standard visibility rules apply. If
> neither MERGE statement can see the results of the other, then they
> will both INSERT. If you don't have a UNIQUE constraint to prevent this
> then what's the problem?

I assume they want MERGE because they don't want duplicates.  If they
don't care, they would have used INSERT.

--  Bruce Momjian                        |  http://candle.pha.pa.us pgman@candle.pha.pa.us               |  (610)
359-1001+  If your life is a hard drive,     |  13 Roberts Road +  Christ can be your backup.        |  Newtown Square,
Pennsylvania19073
 


Re: MERGE vs REPLACE

From
Simon Riggs
Date:
On Wed, 2005-11-16 at 18:34 +0100, Martijn van Oosterhout wrote:
> On Wed, Nov 16, 2005 at 11:37:46AM -0500, Bruce Momjian wrote:
> > 
> > Interesting approach.  Actually, we could tell the user they have to use
> > BEGIN;LOCK tab before doing MERGE, and throw an error if we don't
> > already have a table lock.
> 
> The bit I'm still missing is why there needs to be a lock at all. The
> SQL standard doesn't say anywhere that concurrent MERGE operations
> can't conflict. It seems to me that standard visibility rules apply. If
> neither MERGE statement can see the results of the other, then they
> will both INSERT. If you don't have a UNIQUE constraint to prevent this
> then what's the problem?
> 
> It seems to me people would like, in the case of an existing UNIQUE
> constraint, to be able to use it to prevent "duplicate key" errors.
> This is nice, but the standard doesn't require that either. 
> 
> In other words, if we can use an index to avoid duplicate key errors,
> fine. But if there is no index available, it is not an error to do an
> INSERT because another INSERT was hidden from you.
> 
> Conceptually, a MERGE statement is just a long string of INSERTs and
> UPDATEs in the same transaction and I think we should treat it as
> such.

Agreed.

Best Regards, Simon Riggs



Re: MERGE vs REPLACE

From
Martijn van Oosterhout
Date:
On Wed, Nov 16, 2005 at 12:59:36PM -0500, Bruce Momjian wrote:
> Martijn van Oosterhout wrote:
> > The bit I'm still missing is why there needs to be a lock at all. The
> > SQL standard doesn't say anywhere that concurrent MERGE operations
> > can't conflict. It seems to me that standard visibility rules apply. If
> > neither MERGE statement can see the results of the other, then they
> > will both INSERT. If you don't have a UNIQUE constraint to prevent this
> > then what's the problem?
>
> I assume they want MERGE because they don't want duplicates.  If they
> don't care, they would have used INSERT.

The whole point of MERGE was because you can do different things on
INSERT or UPDATE. If you didn't care about the UPDATE case you could
indeed do just inserts...

I just don't think we should tie ourselves in knots over details that
the spec doesn't require. If the spec doesn't require "predicate
locking" then why on earth should we provide it if it's so difficult?

Have a nice day,
--
Martijn van Oosterhout   <kleptog@svana.org>   http://svana.org/kleptog/
> Patent. n. Genius is 5% inspiration and 95% perspiration. A patent is a
> tool for doing 5% of the work and then sitting around waiting for someone
> else to do the other 95% so you can sue them.

Re: MERGE vs REPLACE

From
"Dann Corbit"
Date:
> -----Original Message-----
> From: pgsql-hackers-owner@postgresql.org [mailto:pgsql-hackers-
> owner@postgresql.org] On Behalf Of Simon Riggs
> Sent: Wednesday, November 16, 2005 10:35 AM
> To: Martijn van Oosterhout
> Cc: Bruce Momjian; Rick Gigger; Tom Lane; Christopher Kings-Lynne; Jim
C.
> Nasby; josh@agliodbs.com; pgsql-hackers@postgresql.org; Jaime
Casanova;
> Peter Eisentraut
> Subject: Re: [HACKERS] MERGE vs REPLACE
>
> On Wed, 2005-11-16 at 18:34 +0100, Martijn van Oosterhout wrote:
> > On Wed, Nov 16, 2005 at 11:37:46AM -0500, Bruce Momjian wrote:
> > >
> > > Interesting approach.  Actually, we could tell the user they have
to
> use
> > > BEGIN;LOCK tab before doing MERGE, and throw an error if we don't
> > > already have a table lock.
> >
> > The bit I'm still missing is why there needs to be a lock at all.
The
> > SQL standard doesn't say anywhere that concurrent MERGE operations
> > can't conflict. It seems to me that standard visibility rules apply.
If
> > neither MERGE statement can see the results of the other, then they
> > will both INSERT. If you don't have a UNIQUE constraint to prevent
this
> > then what's the problem?
> >
> > It seems to me people would like, in the case of an existing UNIQUE
> > constraint, to be able to use it to prevent "duplicate key" errors.
> > This is nice, but the standard doesn't require that either.
> >
> > In other words, if we can use an index to avoid duplicate key
errors,
> > fine. But if there is no index available, it is not an error to do
an
> > INSERT because another INSERT was hidden from you.
> >
> > Conceptually, a MERGE statement is just a long string of INSERTs and
> > UPDATEs in the same transaction and I think we should treat it as
> > such.

Merge could also be considered as a long string of deletes and inserts.
I guess that deleting those records that already exist and then
inserting all of the records is faster because it could be done like a
single join to perform the delete and then a single batch insert.


Re: MERGE vs REPLACE

From
Martijn van Oosterhout
Date:
On Wed, Nov 16, 2005 at 11:06:15AM -0800, Dann Corbit wrote:
> > -----Original Message-----
> > From: pgsql-hackers-owner@postgresql.org [mailto:pgsql-hackers-
> > owner@postgresql.org] On Behalf Of Simon Riggs
> > > Conceptually, a MERGE statement is just a long string of INSERTs and
> > > UPDATEs in the same transaction and I think we should treat it as
> > > such.
>
> Merge could also be considered as a long string of deletes and inserts.
> I guess that deleting those records that already exist and then
> inserting all of the records is faster because it could be done like a
> single join to perform the delete and then a single batch insert.

And for us it makes no difference because in MVCC, UPDATE == DELETE +
INSERT. IMHO it's just a nice construct to specify UPDATEs and INSERTs
in the same statement.

Have a nice day,
--
Martijn van Oosterhout   <kleptog@svana.org>   http://svana.org/kleptog/
> Patent. n. Genius is 5% inspiration and 95% perspiration. A patent is a
> tool for doing 5% of the work and then sitting around waiting for someone
> else to do the other 95% so you can sue them.

Re: MERGE vs REPLACE

From
Rick Gigger
Date:
>> Conceptually, a MERGE statement is just a long string of INSERTs and
>> UPDATEs in the same transaction and I think we should treat it as
>> such.

I've just got one question about this.  Everyone seems to be saying  
that "try to insert and if that fails update" is the same as "try to  
insert and if that fails delete and then insert".

What about the following scenario:

mytable
------------
id  serial primary key,
a  int4,
b  int4,
data text

I've got an id field on every table because it simplifies a lot of  
things (such as slony configuration for example)
But I've also got a unique key on (a, b) and if I was to do a merge I  
would most likely do it in (a, b) not id.

If merge does a delete insert then it creates new values for the id  
columns which could cause me problems.  Basically any default fields  
are going to change or for that matter any fields not specified would  
be reinitialized whereas an update would leave them in place.

It seems to me that "try to update and if that fails insert" seems to  
be the best approach for not messing with existing data.  I guess   
"try to insert and if that fails update" gets you the same effect.

- Rick Gigger


Re: MERGE vs REPLACE

From
daveg
Date:
On Wed, Nov 16, 2005 at 09:49:28AM -0500, Tom Lane wrote:
> I think we should do REPLACE-like functionality that simply fails if the
> match condition isn't equality on a primary key.  If we can use SQL-spec
> MERGE syntax for this, that's fine, but let's not think in terms of
> silently changing to a stronger table lock and a much slower
> implementation when the condition isn't a primary key.  That's a whole

I agree, but would like to relax the primary key requirement to simply
a unique index. I can see use cases for unique so long as not null keys,
so it would be nice if the MERGE operation would work for these. As nulls
are not "equal" anyway this doesn't seem to do too much violence to the
semantics.

-dg

-- 
David Gould                      daveg@sonic.net
If simplicity worked, the world would be overrun with insects.


Re: MERGE vs REPLACE

From
Tom Lane
Date:
daveg <daveg@sonic.net> writes:
> I agree, but would like to relax the primary key requirement to simply
> a unique index. I can see use cases for unique so long as not null keys,
> so it would be nice if the MERGE operation would work for these. As nulls
> are not "equal" anyway this doesn't seem to do too much violence to the
> semantics.

But a "unique" key doesn't guarantee that there's only one matching row,
so ISTM you're right back to needing a predicate lock if you do that.
        regards, tom lane


Re: MERGE vs REPLACE

From
Martijn van Oosterhout
Date:
On Wed, Nov 16, 2005 at 04:51:07PM -0500, Tom Lane wrote:
> daveg <daveg@sonic.net> writes:
> > I agree, but would like to relax the primary key requirement to simply
> > a unique index. I can see use cases for unique so long as not null keys,
> > so it would be nice if the MERGE operation would work for these. As nulls
> > are not "equal" anyway this doesn't seem to do too much violence to the
> > semantics.
>
> But a "unique" key doesn't guarantee that there's only one matching row,
> so ISTM you're right back to needing a predicate lock if you do that.

But there is no need to guarentee anything. As the spec says, if the
join of the table with the other clauses matches a row in the table
more than once, raise a cardinality exception. If someone creates a
join that matches more than once the whole statement fails. But you can
work that out at runtime. If the user specifies NOT NULL in the join
condition then it can work and there no reason to forbid that.

Have a nice day,
--
Martijn van Oosterhout   <kleptog@svana.org>   http://svana.org/kleptog/
> Patent. n. Genius is 5% inspiration and 95% perspiration. A patent is a
> tool for doing 5% of the work and then sitting around waiting for someone
> else to do the other 95% so you can sue them.

Re: MERGE vs REPLACE

From
Csaba Nagy
Date:
Well, from my point of view it is more than delete and insert. That I
can do right now with existing infrastructure. The problem I try to
solve is something along: a bunch of clients try to update a count, and
ONE of them must initialize the count if it does not exist... this can't
be done with current infrastructure without race conditions.
Our current solution is to initialize all the possible counts
beforehand, but that suboptimal as only a few of them will actually have
data coming in later...
And of course the import problem... we can have multiple concurrent
imports, which must insert just once per some unique criteria, and
update if the record is already there, and all this in a batch. This is
also not possible without race conditions or aggressive locking.
So for me the atomic, consistent and without performance penalties
"update_or_insert_based_on_unique_criteria" does have a value, and
that's coming exactly from the locking of the unique index which
eliminates the race condition from this operation. I don't care about
syntax sugar, just about things I could do more efficiently if this
mechanism were in place...

Cheers,
Csaba.

On Wed, 2005-11-16 at 20:33, Martijn van Oosterhout wrote:
> On Wed, Nov 16, 2005 at 11:06:15AM -0800, Dann Corbit wrote:
> > > -----Original Message-----
> > > From: pgsql-hackers-owner@postgresql.org [mailto:pgsql-hackers-
> > > owner@postgresql.org] On Behalf Of Simon Riggs
> > > > Conceptually, a MERGE statement is just a long string of INSERTs and
> > > > UPDATEs in the same transaction and I think we should treat it as
> > > > such.
> > 
> > Merge could also be considered as a long string of deletes and inserts.
> > I guess that deleting those records that already exist and then
> > inserting all of the records is faster because it could be done like a
> > single join to perform the delete and then a single batch insert.
> 
> And for us it makes no difference because in MVCC, UPDATE == DELETE +
> INSERT. IMHO it's just a nice construct to specify UPDATEs and INSERTs
> in the same statement.
> 
> Have a nice day,



Re: MERGE vs REPLACE

From
"Zeugswetter Andreas DCP SD"
Date:
> The problem I try to solve is something along: a bunch of clients try
to update a count, and ONE of them must initialize > the count if it
does not exist... this can't be done with current infrastructure without
race conditions.

The solution without merge but a unique key in other db's is:

update
if no rows updated insert if duplicate key   update   if no rows updated goto insert

note, that the counter updates need to be of the form set x = x + ?
where key=y
do you see a potential race condition with this ?
In pg you also need a savepoint before the insert for this to work.

Depending on the ratio of insert vs update we also start with insert
when
the insert succeeds more that 50% (I would use a higher percentage with
pg though):

insert if duplicate key   update   if no rows updated goto insert

Andreas


Re: MERGE vs REPLACE

From
Csaba Nagy
Date:
Yes, these algorithms are clear to me, but they don't work for batch
updates in postgres without savepoints before each row insert/update,
which is not good for performance (not to mention on older postgres
versions without savepoint support it won't work at all). If there is a
way of no race condition, no performance penalty, that would be
something new and useful. I just guess the MERGE would provide that.

Cheers,
Csaba.

On Thu, 2005-11-17 at 12:34, Zeugswetter Andreas DCP SD wrote:
> > The problem I try to solve is something along: a bunch of clients try
> to update a count, and ONE of them must initialize > the count if it
> does not exist... this can't be done with current infrastructure without
> race conditions.
> 
> The solution without merge but a unique key in other db's is:
> 
> update
> if no rows updated
>   insert
>   if duplicate key
>     update
>     if no rows updated goto insert
> 
> note, that the counter updates need to be of the form set x = x + ?
> where key=y 
> do you see a potential race condition with this ?
> In pg you also need a savepoint before the insert for this to work.
> 
> Depending on the ratio of insert vs update we also start with insert
> when 
> the insert succeeds more that 50% (I would use a higher percentage with
> pg though):
> 
> insert
>   if duplicate key
>     update
>     if no rows updated goto insert
> 
> Andreas



Re: MERGE vs REPLACE

From
Martijn van Oosterhout
Date:
On Thu, Nov 17, 2005 at 12:52:53PM +0100, Csaba Nagy wrote:
> Yes, these algorithms are clear to me, but they don't work for batch
> updates in postgres without savepoints before each row insert/update,
> which is not good for performance (not to mention on older postgres
> versions without savepoint support it won't work at all). If there is a
> way of no race condition, no performance penalty, that would be
> something new and useful. I just guess the MERGE would provide that.

Well, then you guess wrong. This isn't what MERGE is for. MERGE is just
a neat way of specifying the UPDATE and INSERT cases in the same
statement. It doesn't remove the possibility duplicate inserts and thus
primary key violations.

If someone wants to make extensions to MERGE so that it can avoid the
race condition and avoid the duplicate key violations, that's fine. But
be aware that this is outside of the spec. It may be a useful addition,
but perhaps we should consider MERGE and REPLACE as completely seperate
targets.

MERGE has a whole join construction with subqueries that would be a
pain to make work in a way that is truly serialisable. REPLACE deals
with only one row and tries to solve the race for that case only. Much
easier to consider them seperately, no?

I guess what's really irritating is that this clearly exposes the case
listed in the docs as "Why SERIALIZABLE isn't in all cases". If we
could solve that for MERGE, we could probably solve it in the general
case too.

Have a nice day,
--
Martijn van Oosterhout   <kleptog@svana.org>   http://svana.org/kleptog/
> Patent. n. Genius is 5% inspiration and 95% perspiration. A patent is a
> tool for doing 5% of the work and then sitting around waiting for someone
> else to do the other 95% so you can sue them.

Re: MERGE vs REPLACE

From
Csaba Nagy
Date:
OK, in this case I don't care about either MERGE or REPLACE, but for an
UPSERT which does the locking :-)

Cheers,
Csaba.

On Thu, 2005-11-17 at 13:32, Martijn van Oosterhout wrote:
> On Thu, Nov 17, 2005 at 12:52:53PM +0100, Csaba Nagy wrote:
> > Yes, these algorithms are clear to me, but they don't work for batch
> > updates in postgres without savepoints before each row insert/update,
> > which is not good for performance (not to mention on older postgres
> > versions without savepoint support it won't work at all). If there is a
> > way of no race condition, no performance penalty, that would be
> > something new and useful. I just guess the MERGE would provide that.
> 
> Well, then you guess wrong. This isn't what MERGE is for. MERGE is just
> a neat way of specifying the UPDATE and INSERT cases in the same
> statement. It doesn't remove the possibility duplicate inserts and thus
> primary key violations.
> 
> If someone wants to make extensions to MERGE so that it can avoid the
> race condition and avoid the duplicate key violations, that's fine. But
> be aware that this is outside of the spec. It may be a useful addition,
> but perhaps we should consider MERGE and REPLACE as completely seperate
> targets.
> 
> MERGE has a whole join construction with subqueries that would be a
> pain to make work in a way that is truly serialisable. REPLACE deals
> with only one row and tries to solve the race for that case only. Much
> easier to consider them seperately, no?
> 
> I guess what's really irritating is that this clearly exposes the case
> listed in the docs as "Why SERIALIZABLE isn't in all cases". If we
> could solve that for MERGE, we could probably solve it in the general
> case too.
> 
> Have a nice day,



Re: MERGE vs REPLACE

From
Tom Lane
Date:
Csaba Nagy <nagy@ecircle-ag.com> writes:
> OK, in this case I don't care about either MERGE or REPLACE, but for an
> UPSERT which does the locking :-)

This is exactly the point --- pretty much nobody has come to us and
asked for a feature that does what Peter and Martijn say MERGE does.
(I haven't bothered to look at the 2003 spec, I'm assuming they read it
correctly.)  What we *have* been asked for, over and over, is an
insert-or-update feature that's not so tedious and inefficient as the
savepoint-insert-rollback-update kluge.  That's what we ought to be
concentrating on providing.
        regards, tom lane


Re: MERGE vs REPLACE

From
Stephen Frost
Date:
* Tom Lane (tgl@sss.pgh.pa.us) wrote:
> This is exactly the point --- pretty much nobody has come to us and
> asked for a feature that does what Peter and Martijn say MERGE does.
> (I haven't bothered to look at the 2003 spec, I'm assuming they read it
> correctly.)  What we *have* been asked for, over and over, is an
> insert-or-update feature that's not so tedious and inefficient as the
> savepoint-insert-rollback-update kluge.  That's what we ought to be
> concentrating on providing.

I guess to be clear on what this distinction actually is, specifically:
MERGE under SQL2003 doesn't appear to be intended to be used
concurrently.  For data warehousing situations this can be just fine
such as in my case where I get a monthly update of some information and
need to merge that update in with the prior information.  In this case
there's only one MERGE running and I'd hope it'd be faster than doing
check for existance, insert/update on each row in plpgsql or something
(since there'd be multiple index lookups, etc, I think).  Concurrent
MERGEs running *can* fail, just like whole transactions which do the
check/insert/update can fail.

REPLACE/INSERT ON DUPLICATE UPDATE appears to essentially be a
transaction which is supposed to not fail but instead do locking to
ensure that it doesn't fail.  This requires predicate locking to be
efficient because you want to tell the concurrent transaction "if you
have the same key as me, just wait a second and you can do an update
'cause I'm going to create the key if it doesn't exist before I'm done".

I think REPLACE/INSERT ON DUPLICATE UPDATE is definitely harder to do
than MERGE because of the idea that it isn't supposed to fail generally.
I think SQL2003 MERGE would be reasonably easy to do and to get the
efficiency benefits out of it (assuming there are some to be had in the
end).

I don't think MERGE can really be made to be both though, in which case
it should really be the SQL2003 MERGE and we can make REPLACE/INSERT ON
DUPLICATE UPDATE something else.  Perhaps a special form of MERGE where
you know it's going to be doing that locking.  I really don't like the
idea of making the SQL2003 version of MERGE be the MERGE special case
(by requiring someone to take a table lock ahead of time or do something
else odd).
Thanks,
    Stephen

Re: MERGE vs REPLACE

From
mark@mark.mielke.cc
Date:
On Thu, Nov 17, 2005 at 10:15:30AM -0500, Stephen Frost wrote:
> REPLACE/INSERT ON DUPLICATE UPDATE appears to essentially be a
> transaction which is supposed to not fail but instead do locking to
> ensure that it doesn't fail.  This requires predicate locking to be
> efficient because you want to tell the concurrent transaction "if you
> have the same key as me, just wait a second and you can do an update
> 'cause I'm going to create the key if it doesn't exist before I'm done".

Is the requirement for predicate locking, over and above a unique
constraint on an index that involves the record key, to deal with
the scenario of two inserts executing at the same time, both before
commit?

Cheers,
mark

-- 
mark@mielke.cc / markm@ncf.ca / markm@nortel.com     __________________________
.  .  _  ._  . .   .__    .  . ._. .__ .   . . .__  | Neighbourhood Coder
|\/| |_| |_| |/    |_     |\/|  |  |_  |   |/  |_   | 
|  | | | | \ | \   |__ .  |  | .|. |__ |__ | \ |__  | Ottawa, Ontario, Canada
 One ring to rule them all, one ring to find them, one ring to bring them all                      and in the darkness
bindthem...
 
                          http://mark.mielke.cc/



Re: MERGE vs REPLACE

From
Bruce Momjian
Date:
Unless you have a table lock, INSERT has to be before UPDATE, think
UPDATE, UPDATE (both fail), INSERT, INSERT.

---------------------------------------------------------------------------

Zeugswetter Andreas DCP SD wrote:
> > The problem I try to solve is something along: a bunch of clients try
> to update a count, and ONE of them must initialize > the count if it
> does not exist... this can't be done with current infrastructure without
> race conditions.
> 
> The solution without merge but a unique key in other db's is:
> 
> update
> if no rows updated
>   insert
>   if duplicate key
>     update
>     if no rows updated goto insert
> 
> note, that the counter updates need to be of the form set x = x + ?
> where key=y 
> do you see a potential race condition with this ?
> In pg you also need a savepoint before the insert for this to work.
> 
> Depending on the ratio of insert vs update we also start with insert
> when 
> the insert succeeds more that 50% (I would use a higher percentage with
> pg though):
> 
> insert
>   if duplicate key
>     update
>     if no rows updated goto insert
> 
> Andreas
> 

--  Bruce Momjian                        |  http://candle.pha.pa.us pgman@candle.pha.pa.us               |  (610)
359-1001+  If your life is a hard drive,     |  13 Roberts Road +  Christ can be your backup.        |  Newtown Square,
Pennsylvania19073
 


Re: MERGE vs REPLACE

From
Bruce Momjian
Date:
mark@mark.mielke.cc wrote:
> On Thu, Nov 17, 2005 at 10:15:30AM -0500, Stephen Frost wrote:
> > REPLACE/INSERT ON DUPLICATE UPDATE appears to essentially be a
> > transaction which is supposed to not fail but instead do locking to
> > ensure that it doesn't fail.  This requires predicate locking to be
> > efficient because you want to tell the concurrent transaction "if you
> > have the same key as me, just wait a second and you can do an update
> > 'cause I'm going to create the key if it doesn't exist before I'm done".
> 
> Is the requirement for predicate locking, over and above a unique
> constraint on an index that involves the record key, to deal with
> the scenario of two inserts executing at the same time, both before
> commit?

No.  If you have a primary key you can easily prevent duplicates.  You
need a table lock or predicate locking to prevent duplicates if you do
not have a primary key.

--  Bruce Momjian                        |  http://candle.pha.pa.us pgman@candle.pha.pa.us               |  (610)
359-1001+  If your life is a hard drive,     |  13 Roberts Road +  Christ can be your backup.        |  Newtown Square,
Pennsylvania19073
 


Re: MERGE vs REPLACE

From
Bruce Momjian
Date:
Tom Lane wrote:
> Csaba Nagy <nagy@ecircle-ag.com> writes:
> > OK, in this case I don't care about either MERGE or REPLACE, but for an
> > UPSERT which does the locking :-)
> 
> This is exactly the point --- pretty much nobody has come to us and
> asked for a feature that does what Peter and Martijn say MERGE does.
> (I haven't bothered to look at the 2003 spec, I'm assuming they read it
> correctly.)  What we *have* been asked for, over and over, is an
> insert-or-update feature that's not so tedious and inefficient as the
> savepoint-insert-rollback-update kluge.  That's what we ought to be
> concentrating on providing.

I am confused over the various options.  I have heard these syntaxes:
SQL2003 MERGEMySQL REPLACE    http://dev.mysql.com/doc/refman/5.1/en/replace.htmlMySQL INSERT VIOLATION ...UPSERT

So it seems MERGE does not have the use-case we most need, though it can
be bent to do it.  (Given their MATCH syntax, it doesn't seem there is
any logic that it tries INSERT first).

Looking at the MySQL URL above, REPLACE has three possible syntaxes with
normal (DELETE), SET (UPDATE), and SELECT.  Is this the direction we
need to go?  I don't like INSERT ... VIOLATION because I would like a
new keyword for this.  Is UPSERT the same as REPLACE?  Should we use
UPSERT instead?
--  Bruce Momjian                        |  http://candle.pha.pa.us pgman@candle.pha.pa.us               |  (610)
359-1001+  If your life is a hard drive,     |  13 Roberts Road +  Christ can be your backup.        |  Newtown Square,
Pennsylvania19073
 


Re: MERGE vs REPLACE

From
Dennis Bjorklund
Date:
On Thu, 17 Nov 2005, Bruce Momjian wrote:

> Unless you have a table lock, INSERT has to be before UPDATE, think
> UPDATE, UPDATE (both fail), INSERT, INSERT.

No matter what operation you start with you need a loop that try 
insert/update until one of them succeed like in this example:

http://www.postgresql.org/docs/8.1/static/plpgsql-control-structures.html#PLPGSQL-UPSERT-EXAMPLE

Without a loop you might not get to execute neither the insert nor the 
update. Why? Think about this example:

BEGIN

INSERT      <- fail because there is a row already
           <- before we manage to do the update someone              delete the row (which we can see in the
 default transaction isolation level)
 

UPDATE      <- fail because there is no row so we will loop              and try the insert again
           <- before we manage to do the insert someone else does              an insert

INSERT      <- fail because there is a row already
           <- before we manage to do the update someone              delete the row 
....


You might need to loop any number of times before you manage to perform
one of the two operations. Which operation you should start with depends
on which of the two cases is the common one.

-- 
/Dennis Björklund



Re: MERGE vs REPLACE

From
"Zeugswetter Andreas DCP SD"
Date:
> Unless you have a table lock, INSERT has to be before UPDATE, think
UPDATE, UPDATE (both fail), INSERT, INSERT.

> > update
> > if no rows updated
> >   insert
> >   if duplicate key
> >     update
> >     if no rows updated goto insert

That is why you have the loop. This is not a problem with above code,
because only one insert succeeds
while the others then do the update.

Andreas


Re: MERGE vs REPLACE

From
Bruce Momjian
Date:
Oh, good point.  I was thinking just about concurrent MERGEs.  However,
it is more complicated than that.  By definitaion you can not see
changes from other transactions while your statement is being run (even
if you increment CommandCounter), so to be atomic, you would still see
the row even though some other transaction had deleted it.

I think we avoid that now because UPDATE, (which is a DELETE then
INSERT) chains the tuples together so others see the activity happening.

Seems like we are going to have to peek at rows like we do now for
INSERT and peek at index rows, if I remember correctly.  I can't think
of any other place in the code where we loop around other backend's
activity like this.

This could be tricky.

---------------------------------------------------------------------------

Dennis Bjorklund wrote:
> On Thu, 17 Nov 2005, Bruce Momjian wrote:
> 
> > Unless you have a table lock, INSERT has to be before UPDATE, think
> > UPDATE, UPDATE (both fail), INSERT, INSERT.
> 
> No matter what operation you start with you need a loop that try 
> insert/update until one of them succeed like in this example:
> 
> http://www.postgresql.org/docs/8.1/static/plpgsql-control-structures.html#PLPGSQL-UPSERT-EXAMPLE
> 
> Without a loop you might not get to execute neither the insert nor the 
> update. Why? Think about this example:
> 
> BEGIN
> 
> INSERT      <- fail because there is a row already
> 
>             <- before we manage to do the update someone
>                delete the row (which we can see in the
>                default transaction isolation level)
> 
> UPDATE      <- fail because there is no row so we will loop
>                and try the insert again
> 
>             <- before we manage to do the insert someone else does
>                an insert
> 
> INSERT      <- fail because there is a row already
> 
>             <- before we manage to do the update someone
>                delete the row 
> ....
> 
> 
> You might need to loop any number of times before you manage to perform
> one of the two operations. Which operation you should start with depends
> on which of the two cases is the common one.
> 
> -- 
> /Dennis Bj?rklund
> 
> 
> ---------------------------(end of broadcast)---------------------------
> TIP 4: Have you searched our list archives?
> 
>                http://archives.postgresql.org
> 

--  Bruce Momjian                        |  http://candle.pha.pa.us pgman@candle.pha.pa.us               |  (610)
359-1001+  If your life is a hard drive,     |  13 Roberts Road +  Christ can be your backup.        |  Newtown Square,
Pennsylvania19073
 


Re: MERGE vs REPLACE

From
Tom Lane
Date:
Bruce Momjian <pgman@candle.pha.pa.us> writes:
> Oh, good point.  I was thinking just about concurrent MERGEs.  However,
> it is more complicated than that.  By definitaion you can not see
> changes from other transactions while your statement is being run (even
> if you increment CommandCounter), so to be atomic, you would still see
> the row even though some other transaction had deleted it.

We would have to use the same semantics we use now for read-committed
UPDATE, that is look at the latest version of the row even though this
would not normally be visible to the transaction's snapshot.

In the case of a serializable transaction, no doubt we should fail if
any concurrent change actually happens.
        regards, tom lane


Re: MERGE vs REPLACE

From
Bruce Momjian
Date:
Tom Lane wrote:
> Bruce Momjian <pgman@candle.pha.pa.us> writes:
> > Oh, good point.  I was thinking just about concurrent MERGEs.  However,
> > it is more complicated than that.  By definitaion you can not see
> > changes from other transactions while your statement is being run (even
> > if you increment CommandCounter), so to be atomic, you would still see
> > the row even though some other transaction had deleted it.
> 
> We would have to use the same semantics we use now for read-committed
> UPDATE, that is look at the latest version of the row even though this
> would not normally be visible to the transaction's snapshot.
> 
> In the case of a serializable transaction, no doubt we should fail if
> any concurrent change actually happens.

I have some psaudocode to explain what we want for this feature,
whatever syntax we choose:
StartCheck unique indexFound    lock row for update    if zero rows, return to start    if more than one row, fail
updaterowNotfound    create savepoint    insert row into heap    lock index page    if conflicting index entry, abort
savepoint,return to start    add index entry    unlock index page
 

While the "notfound" case might look strange, we actually use this exact
method for inserts now, see ExecInsert() and _bt_doinsert(). 
Particularly see this comment in the second function:
   /*    * If we're not allowing duplicates, make sure the key isn't already in    * the index.    *    * NOTE:
obviously,_bt_check_uniquecan only detect keys that are already in    * the index; so it cannot defend against
concurrentinsertions of the    * same key.  We protect against that by means of holding a write lock on    * the target
page. Any other would-be inserter of the same key must    * acquire a write lock on the same target page, so only one
would-be   * inserter can be making the check at one time.  Furthermore, once we are    * past the check we hold write
lockscontinuously until we have performed    * our insertion, so no later inserter can fail to see our insertion.    *
(Thisrequires some care in _bt_insertonpg.)    *    * If we must wait for another xact, we release the lock while
waiting,and    * then must start over completely.    */
 

Here is the unique check error from _bt_check_unique():
                   ereport(ERROR,                           (errcode(ERRCODE_UNIQUE_VIOLATION),
errmsg("duplicatekey violates unique constraint \"%s\"",                          RelationGetRelationName(rel))));
 

I think the problem here is that it is going to longjump() back to
postgres.c (and out of your code loop).  While we have savepoints, I
think they only work coming from client applications, rather than inside
our code.  Ideally you would like to be able to say:
savepoint();func();rollback_to_savepoint();

but you can't, so I think you are going to have to factor out that
unique error callback and return a failure code to the caller.  I
suppose some boolean flag need to be added to _bt_doinsert(), but that
is called via a function pointer for the index type, so you are going to
have to update the insert function signatures for all access methods. 
The good news is that only btree supports unique indexes, according to
the documentation ("Only B-tree currently supports unique indexes") so
for the other access methods the extra parameter is just ignored.

Another issue is multiple unique indexes.  What if the first unique
index matches one row, but a different row matches the second unique
indexed column?  Fail because unique checks do not identify exactly one
row?

Or the _new_ value for the second indexed column conflicts with the
second unique index. The MERGE/REPLACE should fail.  The UPDATE block
will handle this on its own, but the INSERT block will need to check for
that an really error out, rather than return to the caller, so the loop
in ExecInsertIndexTuples() has to restart on unique failure _only_ on
the first index check, not the subsequent ones.

One simplification would be to allow MERGE/REPLACE only on a table that
has a single unique index.

--  Bruce Momjian                        |  http://candle.pha.pa.us pgman@candle.pha.pa.us               |  (610)
359-1001+  If your life is a hard drive,     |  13 Roberts Road +  Christ can be your backup.        |  Newtown Square,
Pennsylvania19073
 


Re: MERGE vs REPLACE

From
"Jim C. Nasby"
Date:
On Thu, Nov 17, 2005 at 09:30:43PM -0500, Bruce Momjian wrote:
> > Is the requirement for predicate locking, over and above a unique
> > constraint on an index that involves the record key, to deal with
> > the scenario of two inserts executing at the same time, both before
> > commit?
> 
> No.  If you have a primary key you can easily prevent duplicates.  You
> need a table lock or predicate locking to prevent duplicates if you do
> not have a primary key.

AFAIK you can also accomplish this without a table lock as long as you
have a unique index on the right set of fields and those fields are also
NOT NULL. ISTM it would be good to support that case as well, since you
might want to MERGE based on something other than the PK.
-- 
Jim C. Nasby, Sr. Engineering Consultant      jnasby@pervasive.com
Pervasive Software      http://pervasive.com    work: 512-231-6117
vcard: http://jim.nasby.net/pervasive.vcf       cell: 512-569-9461


Re: MERGE vs REPLACE

From
"Jim C. Nasby"
Date:
On Thu, Nov 17, 2005 at 10:15:30AM -0500, Stephen Frost wrote:
> I don't think MERGE can really be made to be both though, in which case
> it should really be the SQL2003 MERGE and we can make REPLACE/INSERT ON
> DUPLICATE UPDATE something else.  Perhaps a special form of MERGE where
> you know it's going to be doing that locking.  I really don't like the
> idea of making the SQL2003 version of MERGE be the MERGE special case
> (by requiring someone to take a table lock ahead of time or do something
> else odd).

Anyone know off-hand what the big 3 do? If the industry consensus is
that merge should actually be REPLACE/INSERT ON DUPLICATE UPDATE then
it's probably better to follow that lead.
-- 
Jim C. Nasby, Sr. Engineering Consultant      jnasby@pervasive.com
Pervasive Software      http://pervasive.com    work: 512-231-6117
vcard: http://jim.nasby.net/pervasive.vcf       cell: 512-569-9461


Re: MERGE vs REPLACE

From
Petr Jelinek
Date:
Jim C. Nasby wrote:
> On Thu, Nov 17, 2005 at 10:15:30AM -0500, Stephen Frost wrote:
> 
>>I don't think MERGE can really be made to be both though, in which case
>>it should really be the SQL2003 MERGE and we can make REPLACE/INSERT ON
>>DUPLICATE UPDATE something else.  Perhaps a special form of MERGE where
>>you know it's going to be doing that locking.  I really don't like the
>>idea of making the SQL2003 version of MERGE be the MERGE special case
>>(by requiring someone to take a table lock ahead of time or do something
>>else odd).
> 
> 
> Anyone know off-hand what the big 3 do? If the industry consensus is
> that merge should actually be REPLACE/INSERT ON DUPLICATE UPDATE then
> it's probably better to follow that lead.

It was already said here that oracle and db2 both use MERGE, dunno about 
mssql.

And yes merge CAN be used to do REPLACE (oracle uses their dummy table 
for this, we can use the fact that FROM clause isn't required in postgres).

-- 
Regards
Petr Jelinek (PJMODOS)


Re: MERGE vs REPLACE

From
Jaime Casanova
Date:
>
> And yes merge CAN be used to do REPLACE (oracle uses their dummy table
> for this, we can use the fact that FROM clause isn't required in postgres).
>

the FROM clause is required by default (starting with 8.1) unless you
change a postgresql.conf parameter.

and i don't think that idea will have any fan...


--
regards,
Jaime Casanova
(DBA: DataBase Aniquilator ;)


Re: MERGE vs REPLACE

From
Martijn van Oosterhout
Date:
On Tue, Nov 22, 2005 at 04:20:12AM +0100, Petr Jelinek wrote:
> It was already said here that oracle and db2 both use MERGE, dunno about
> mssql.
>
> And yes merge CAN be used to do REPLACE (oracle uses their dummy table
> for this, we can use the fact that FROM clause isn't required in postgres).

Statements about MERGE on the web:

http://www.dba-oracle.com/oracle_tips_rittman_merge.htm
http://databasejournal.com/features/db2/article.php/3322041
http://certcities.com/editorial/columns/story.asp?EditorialsID=51
http://publib.boulder.ibm.com/infocenter/ids9help/index.jsp?topic=/com.ibm.sqls.doc/sqls578.htm
http://www.jdixon.dotnetdevelopersjournal.com/i_want_my_sql_2005_merge_statement.htm
http://publib.boulder.ibm.com/infocenter/db2help/topic/com.ibm.db2.udb.doc/admin/r0010873.htm
http://expertanswercenter.techtarget.com/eac/knowledgebaseAnswer/0,295199,sid63_gci978700,00.html

Not one (*not one!*) of these mentions any special handling of
duplicate keys. They even go to pains to say that any errors cause
everything to rollback. The last one is especially interesting:

: Is there any way to capture errors from a MERGE statement? Also, is
: there any way to know how many records were inserted or updated for the
: MERGE statement like SQL%ROWCOUNT? Any assistance greatly appreciated.
:
: You capture errors the same way you would if you were doing regular
: INSERT and UPDATE statements....with exception handlers. Just include a
: WHEN OTHERS exception handler in the block where your MERGE statement
: is and have to display SQLCODE and SQLERRM if an error occurs. Then you
: can figure out which specific errors are occurring and create
: individual exception handlers for those.

There are even places that tell you how to decompose your MERGE into an
INSERT plus UPDATE statement. The real advantage of MERGE is that the
semantics prevent your updating a row you just inserted, which is
harder in the general case but easy if the executor is handling the
rows one at a time.

Rather than trying to make MERGE do something it wasn't designed for,
we should probably be spending our efforts on triggers for error
conditions. Maybe something like:

CREATE TRIGGER foo AFTER ERROR ON bar EXECUTE baz();

Where baz would be passed NEW and OLD just like a normal trigger and if
the trigger return NULL, the update is ignored. In the meantime the
function can divert the insert to another table if it likes. This seems
like a much more workable and useful addition.

Have a nice day,
--
Martijn van Oosterhout   <kleptog@svana.org>   http://svana.org/kleptog/
> Patent. n. Genius is 5% inspiration and 95% perspiration. A patent is a
> tool for doing 5% of the work and then sitting around waiting for someone
> else to do the other 95% so you can sue them.

Re: MERGE vs REPLACE

From
Bruce Momjian
Date:
Jaime Casanova wrote:
> >
> > And yes merge CAN be used to do REPLACE (oracle uses their dummy table
> > for this, we can use the fact that FROM clause isn't required in postgres).
> >
> 
> the FROM clause is required by default (starting with 8.1) unless you
> change a postgresql.conf parameter.
> 
> and i don't think that idea will have any fan...

No, it is not, try SELECT 1.  Oracle requires SELECT 1 FROM dual.  The
change in 8.1 is that SELECT pg_class.relname no longer works.  You have to
do SELECT relname FROM pg_class.

--  Bruce Momjian                        |  http://candle.pha.pa.us pgman@candle.pha.pa.us               |  (610)
359-1001+  If your life is a hard drive,     |  13 Roberts Road +  Christ can be your backup.        |  Newtown Square,
Pennsylvania19073
 


Re: MERGE vs REPLACE

From
Jaime Casanova
Date:
On 11/22/05, Bruce Momjian <pgman@candle.pha.pa.us> wrote:
> Jaime Casanova wrote:
> > >
> > > And yes merge CAN be used to do REPLACE (oracle uses their dummy table
> > > for this, we can use the fact that FROM clause isn't required in postgres).
> > >
> >
> > the FROM clause is required by default (starting with 8.1) unless you
> > change a postgresql.conf parameter.
> >
> > and i don't think that idea will have any fan...
>
> No, it is not, try SELECT 1.  Oracle requires SELECT 1 FROM dual.  The
> change in 8.1 is that SELECT pg_class.relname no longer works.  You have to
> do SELECT relname FROM pg_class.
>
> --
>  Bruce Momjian                        |  http://candle.pha.pa.us
>  pgman@candle.pha.pa.us               |  (610) 359-1001
>  +  If your life is a hard drive,     |  13 Roberts Road
>  +  Christ can be your backup.        |  Newtown Square, Pennsylvania 19073
>

touche...

--
regards,
Jaime Casanova
(DBA: DataBase Aniquilator ;)


Re: MERGE vs REPLACE

From
"Jim C. Nasby"
Date:
On Tue, Nov 22, 2005 at 11:57:48AM +0100, Martijn van Oosterhout wrote:
<excellent research snipped>

> Rather than trying to make MERGE do something it wasn't designed for,
> we should probably be spending our efforts on triggers for error
> conditions. Maybe something like:
> 
> CREATE TRIGGER foo AFTER ERROR ON bar EXECUTE baz();
> 
> Where baz would be passed NEW and OLD just like a normal trigger and if
> the trigger return NULL, the update is ignored. In the meantime the
> function can divert the insert to another table if it likes. This seems
> like a much more workable and useful addition.

I agree that we shouldn't try and distort MERGE into something fancy.
The AFTER ERROR trigger is a very interesting idea, since it could
handle many different cases. But I'm worried that people might not want
that behavior on by default for everything done against some table. I
think it'd be better to have some way to specify in a command that
you want to use some kind of error-handling trigger. Though presumably
the underlying framework would be same, so it shouldn't be hard to
support both.
-- 
Jim C. Nasby, Sr. Engineering Consultant      jnasby@pervasive.com
Pervasive Software      http://pervasive.com    work: 512-231-6117
vcard: http://jim.nasby.net/pervasive.vcf       cell: 512-569-9461


Re: MERGE vs REPLACE

From
Petr Jelinek
Date:
Jaime Casanova wrote:
> 
> the FROM clause is required by default (starting with 8.1) unless you
> change a postgresql.conf parameter.
> 
> and i don't think that idea will have any fan...
> 

Bruce already replied to your first statement so, what idea won't have 
any fan ? It's not that we would change what MERGE does. Postgres just 
does not requeire FROM clause in SELECT and second parameter of MERGE 
can be SELECT which means you can do what REPLACE) does without problems 
and without breaking something or violating standard and like I said you 
can do the same in oracle using dual.

Btw about that keys, oracle gives error on many-to-one or many-to-many 
relationship between the source and target tables.

-- 
Regards
Petr Jelinek (PJMODOS)



Re: MERGE vs REPLACE

From
Martijn van Oosterhout
Date:
On Wed, Nov 23, 2005 at 12:24:24AM +0100, Petr Jelinek wrote:
> Btw about that keys, oracle gives error on many-to-one or many-to-many
> relationship between the source and target tables.

The standard has something called a "cardinality violation" if the
to-be-merged table doesn't match 1-1 with the rest of the statement. If
I had access to an Oracle I'd run two tests on MERGE:

1. Does the joining column have to have an index? For example, make a
column that's full of unique values but no unique index. According to
my reading of the the standard, this should still work (just slower).

2. Additionally, only the rows involved in the MERGE need to be
uniquely referenced, so if you add duplicate values but add a WHERE
clause to exclude those, it should also work.

My feeling is that requiring an index will limit it's usefulness as a
general tool.

Have a nice day,
--
Martijn van Oosterhout   <kleptog@svana.org>   http://svana.org/kleptog/
> Patent. n. Genius is 5% inspiration and 95% perspiration. A patent is a
> tool for doing 5% of the work and then sitting around waiting for someone
> else to do the other 95% so you can sue them.

Re: MERGE vs REPLACE

From
Lyubomir Petrov
Date:
Martijn,

Here is a quick test (Oracle 10.1.0.3/Linux):


SQL> select banner from v$version;
BANNER
----------------------------------------------------------------
Oracle Database 10g Enterprise Edition Release 10.1.0.3.0 - Prod
PL/SQL Release 10.1.0.3.0 - Production
CORE    10.1.0.3.0      Production
TNS for Linux: Version 10.1.0.3.0 - Production
NLSRTL Version 10.1.0.3.0 - Production

SQL> select * from merge_test_1;       ID NAME
---------- --------------------        1 aaa        2 bbb        3 ccc        4 ddd        5 eee        1 xxx
6 rows selected.

SQL> select * from merge_test_2;       ID NAME
---------- --------------------        1 AAA        2 BBB        6 FFF

SQL> select index_name from user_indexes where table_name like 
'merge_test%';
no rows selected

SQL> merge into merge_test_1 a1 2  using merge_test_2 a2 3      on (a1.id = a2.id) 4  when matched then 5      update
seta1.name = a2.name 6  when not matched then 7      insert (id, name) values (a2.id, a2.name);
 
4 rows merged.

SQL> select * from merge_test_1;       ID NAME
---------- --------------------        1 AAA        2 BBB        3 ccc        4 ddd        5 eee        1 AAA        6
FFF
7 rows selected.



Regards,
Lubomir Petrov



Martijn van Oosterhout wrote:
> On Wed, Nov 23, 2005 at 12:24:24AM +0100, Petr Jelinek wrote:
>   
>> Btw about that keys, oracle gives error on many-to-one or many-to-many 
>> relationship between the source and target tables.
>>     
>
> The standard has something called a "cardinality violation" if the
> to-be-merged table doesn't match 1-1 with the rest of the statement. If
> I had access to an Oracle I'd run two tests on MERGE:
>
> 1. Does the joining column have to have an index? For example, make a
> column that's full of unique values but no unique index. According to
> my reading of the the standard, this should still work (just slower).
>
> 2. Additionally, only the rows involved in the MERGE need to be
> uniquely referenced, so if you add duplicate values but add a WHERE
> clause to exclude those, it should also work.
>
> My feeling is that requiring an index will limit it's usefulness as a
> general tool.
>
> Have a nice day,
>