Thread: MERGE Support (SQL2003)

MERGE Support (SQL2003)

From

Stephen Frost

Date:

14 April 2004, 17:26:55

Greetings,
 As mentioned in the PostgreSQL Weekly News, SQL2003 has been approved. Looking at one of the recent drafts it appears
thatMERGE has been added to the spec.  MERGE is described as "Conditionally update rows of a table, or insert new rows
intoa table, or both."  Support for this has been asked for in the past and if anything demand for this has increased.
Isanyone working on this?  Is there any status on it?
 
 Additionally, from the draft I'm reading the SQL2003 standard makes MERGE very capable but without obvious defaults
forthe simple case. While I believe support for the standard should be paramount it would make a great deal of sense to
allowsimple operations without unnecessary complexity.
 
 For example: Table T1 exists with columns a, b, c where a, b is the primary key.  T1 contains a single tuple '1, 2,
3'. I would now like to either update or insert the primary key '1, 2' with the value '4'.
 
 From what I understand of the specification this would be done by:

a) insert into T2 values (1,2,4); merge into T1       USING T2ON T1.a=T2.a and T1.b=T2.b   WHEN MATCHED THEN
UPDATESET T1.c = T2.c       ON T1.a=T2.a and T1.b=T2.b   WHEN NOT MATCHED THEN        INSERT (a,b,c) VALUES
(T2.a,T2.b,T2.c);
 This requires an addtional table (T2).  There may be a better way around this but I know that PostgreSQL already
allowsselect statements without a from clause, so this could be simplified to:
 

b) merge into T1       USING (select 1 as a, 2 as b, 4 as c) as T2ON T1.a=T2.a and T1.b=T2.b   WHEN MATCHED THEN
UPDATESET T1.c = T2.c       ON T1.a=T2.a and T1.b=T2.b   WHEN NOT MATCHED THEN        INSERT (a,b,c) VALUES
(T2.a,T2.b,T2.c);
 Still pretty long-winded for what most would consider a relatively simple request.  My goal would be the ability to
haveimplied ON/WHEN clauses and USING VALUES, like so:
 

c) merge into T1 USING VALUES (1,2,4);
 The ON clauses are implied primary key matches.  The WHEN clause for MATCHED is then to UPDATE SET all columns which
arenot part of the primary key.  The WHEN clause for NOT MATCHED is to INSERT the row. Clearly this last usage is what
Iwould prefer for this case.  It also parallels the 'replace into' which MySQL has which would make migration from
MySQLto PostgreSQL much easier for programs and users. Please let me know if there's something I'm missing in the
specificationthat would allow for a simple case similar to what I've illustrated, either with MERGE or without.  Of
course,the expectation is that MERGE wouldn't be able to fail because of another instance adding a row with the same
primarykey.
 
 I plan to forward this suggestion on to the SQL committee as well, though I don't know what kind of response, if any,
I'llget.  Feel free to address the standard MERGE support seperately from my suggestion.  I have need for both and so
bothare of interest and do not have to come at the same time.
 
     Many thanks,
    Stephen

Re: MERGE Support (SQL2003)

From

Greg Stark

Date:

15 April 2004, 14:29:56

Stephen Frost <sfrost@snowman.net> writes:

> c)
>   merge into T1 USING VALUES (1,2,4);

I'm not happy with the implied use of the primary key. a) some tables can have
two effective primary keys, even if only one is explicitly declared as such.
and b) some update/inserts are done with where clauses that aren't primary
keys at all. and c) you might want to update any record for a date within the
last week but if you insert then insert with the current date.

I see this as vaguely analogous to the NATURAL JOIN/ON/USING where I feel
using NATURAL JOIN to work automatically is somewhat risky and hides too much
of whats happening. I would think the key columns should always be explicitly
named in permanent code.

Similarly vaguely analogous to the ON/USING distinction I would think you
would want a form of this where the key columns simply named. And one where
the where clauses could be enumerated.

So I would suggest doing something like this which is basically exactly
equivalent to an INSERT statement except with an added ON/USING clause exactly
like the syntax in SELECT.

MERGE INTO t1 USING (col1,col2) VALUES (1,2,4)

Or explicitly naming columns (which I argue should be done on inserts):

MERGE INTO t1 USING (col1,col2) (col1,col2,col3) VALUES (1,2,4)

and 

MERGE INTO t2 ON (col1='1',col2='2' OR col2='3') (col1,col2,col3) 
VALUES(1,2,4)

-- 
greg

Re: MERGE Support (SQL2003)

From

Stephen Frost

Date:

15 April 2004, 19:11:54

* Greg Stark (gsstark@mit.edu) wrote:
> Stephen Frost <sfrost@snowman.net> writes:
>
> > c)
> >   merge into T1 USING VALUES (1,2,4);
>
> I'm not happy with the implied use of the primary key. a) some tables can have
> two effective primary keys, even if only one is explicitly declared as such.
> and b) some update/inserts are done with where clauses that aren't primary
> keys at all. and c) you might want to update any record for a date within the
> last week but if you insert then insert with the current date.

Certainly using the primary key won't work for all cases, it's not meant
to.  The intent was to allow a simpler syntax for what is likely to be
a common case.  Also, I didn't want to diviate too much from the
specification by changing intent or ordering of clauses.  The cases you
describe above would look something like:

merge into T1 USING VALUES (1,2,4) ON (a=1,b=2);

> I see this as vaguely analogous to the NATURAL JOIN/ON/USING where I feel
> using NATURAL JOIN to work automatically is somewhat risky and hides too much
> of whats happening. I would think the key columns should always be explicitly
> named in permanent code.

Explicitly naming what columns are key vs. what columns are data and
should be replaced is possible with the formal syntax from the
specification, which I certainly think should be supported in addition
to anything else.  I'd really like to be able to use a more simplified
syntax for the common/simple case though.

> Similarly vaguely analogous to the ON/USING distinction I would think you
> would want a form of this where the key columns simply named. And one where
> the where clauses could be enumerated.

That's possible but as I mentioned I didn't want to move too far away
from the specification either..

> So I would suggest doing something like this which is basically exactly
> equivalent to an INSERT statement except with an added ON/USING clause exactly
> like the syntax in SELECT.
>
> MERGE INTO t1 USING (col1,col2) VALUES (1,2,4)

I don't have a problem with this from a verbosity standpoint but I am
concerned that it deviates from the specification by what's expected in
the 'USING' clause.  The USING clause is intended to be the source not
the match/search clause (that's the 'ON' clause which follows the USING
clause).

> Or explicitly naming columns (which I argue should be done on inserts):
>
> MERGE INTO t1 USING (col1,col2) (col1,col2,col3) VALUES (1,2,4)

Same issue as above for this.

> MERGE INTO t2 ON (col1='1',col2='2' OR col2='3')
>  (col1,col2,col3)
> VALUES
>  (1,2,4)

This is closer to how the specification lays out the clauses but starts
to get more verbose and doesn't include the 'USING' keyword the
specification lays out for defining the source.  Perhaps something like
this:

merge into t1 using (select 1 as col1,2 as col2,3 as col3) ON a=col1 and b=col2;

This follows the specification except the 'WHEN' clauses are implied
here and having a select inside the using clause.  It's also rather
length for the simple case in my view.  This would be better, but
deviates more:

merge into T1 using col1=1,col2=2,col3=3 ON a=col1 and b=col2;

Or

merge into T1 using (col1,col2,col3) (1,2,3) ON a=col1 and b=col2;
Thanks,
    Stephen