Thread: someone working to add merge?

someone working to add merge?

From
Jaime Casanova
Date:
Hi,


there is someone working in add the MERGE statement?

i don't find much about what a good implementation of merge must have...

i think what it needs to do is something like:

- try to lock the rows for update   - if the lock cannot be immediatly acquire ask why   - if the rows are already
locked,wait and try again?   - if no rows were found try de insert part   - if there was any other error, abort   -
elseupdate 

so i suppose we can reuse many of the code breaking the merge in 3
pieces... for now they are just thougths, i will think more in this
and try to implement it...

comments? ideas? suggestions?

--
regards,
Jaime Casanova
(DBA: DataBase Aniquilator ;)


Re: someone working to add merge?

From
Josh Berkus
Date:
Jaime,

> so i suppose we can reuse many of the code breaking the merge in 3
> pieces... for now they are just thougths, i will think more in this
> and try to implement it...
>
> comments? ideas? suggestions?

Funny, we were just discussing this at OpenDBCon.   Seems that you can't do a 
full implementation of MERGE without Predicate Locking (the ability to say 
"lock this table against inserts or updates of any row with key=5").  
However, Peter suggested that we could do a proof-of-concept implementation, 
working out syntax and trigger issues, based on a full table lock and do the 
hard work once it was proved to be feasable.

Peter?

-- 
Josh Berkus
Aglio Database Solutions
San Francisco


Re: someone working to add merge?

From
Jaime Casanova
Date:
On 11/11/05, Josh Berkus <josh@agliodbs.com> wrote:
> Jaime,
>
> > so i suppose we can reuse many of the code breaking the merge in 3
> > pieces... for now they are just thougths, i will think more in this
> > and try to implement it...
> >
> > comments? ideas? suggestions?
>
> Funny, we were just discussing this at OpenDBCon.   Seems that you can't do a
> full implementation of MERGE without Predicate Locking (the ability to say
> "lock this table against inserts or updates of any row with key=5").

it isn't what select for update does?

> However, Peter suggested that we could do a proof-of-concept implementation,
> working out syntax and trigger issues, based on a full table lock and do the
> hard work once it was proved to be feasable.
>
> Peter?
>
> --
> Josh Berkus
> Aglio Database Solutions
> San Francisco
>

--
Atentamente,
Jaime Casanova
(DBA: DataBase Aniquilator ;)


Re: someone working to add merge?

From
Andrew Dunstan
Date:

Jaime Casanova wrote:

>>Funny, we were just discussing this at OpenDBCon.   Seems that you can't do a
>>full implementation of MERGE without Predicate Locking (the ability to say
>>"lock this table against inserts or updates of any row with key=5").
>>    
>>
>
>it isn't what select for update does?
>
>  
>
>  
>

It won't prevent the insertion of a row with the given predicate.

cheers

andrew


Re: someone working to add merge?

From
Csaba Nagy
Date:
On Fri, 2005-11-11 at 18:15, Jaime Casanova wrote:
> On 11/11/05, Josh Berkus <josh@agliodbs.com> wrote:
> > Jaime,
> >
> > > so i suppose we can reuse many of the code breaking the merge in 3
> > > pieces... for now they are just thougths, i will think more in this
> > > and try to implement it...
> > >
> > > comments? ideas? suggestions?
> >
> > Funny, we were just discussing this at OpenDBCon.   Seems that you can't do a
> > full implementation of MERGE without Predicate Locking (the ability to say
> > "lock this table against inserts or updates of any row with key=5").
> 
> it isn't what select for update does?

Select for update only works if the row is already there. If there's no
row, you can't lock it. So you want then to insert it, but then it is
possible that somebody inserted it before you, immediately after your
update... so the solution would be more like:
 - try insert; - if insert fails, do update;

You can already do that, but you have to place a save-point before the
insert, so you can continue your transaction even if the insert fails.
Without knowledge of postgres internals, the simplest would be to be
able to do the "continue transaction if insert fails" with the cheapest
prise to pay. This would mean wrap up existing code, except that
"continue transaction after failure of insert" part.

All this might be completely bull*it of course, I don't know too much
about postgres internals.

[snip]

Cheers,
Csaba.





Re: someone working to add merge?

From
Peter Eisentraut
Date:
Josh Berkus wrote:
> Funny, we were just discussing this at OpenDBCon.   Seems that you
> can't do a full implementation of MERGE without Predicate Locking
> (the ability to say "lock this table against inserts or updates of
> any row with key=5"). However, Peter suggested that we could do a
> proof-of-concept implementation, working out syntax and trigger
> issues, based on a full table lock and do the hard work once it was
> proved to be feasable.

Yes, I've started to work on this.  Realizing that the current way to 
manually do an UPDATE-else-INSERT or DELETE-then-INSERT involves a 
table lock anyway, a MERGE implementation using a table lock would at 
least give some convenience benefit to users. (And possibly some 
performance, too, if the decision logic is currently run in the 
client.)

A predicate locking implementation for MERGE might actually not be all 
that complicated, because you only need to look on pk = constant, not 
on arbitrary expressions.  Nevertheless, I think it's best to write the 
MERGE command first and then optimize the locking.

-- 
Peter Eisentraut
http://developer.postgresql.org/~petere/


Re: someone working to add merge?

From
Tom Lane
Date:
Josh Berkus <josh@agliodbs.com> writes:
> Funny, we were just discussing this at OpenDBCon.   Seems that you can't do a
> full implementation of MERGE without Predicate Locking (the ability to say 
> "lock this table against inserts or updates of any row with key=5").  
> However, Peter suggested that we could do a proof-of-concept implementation, 
> working out syntax and trigger issues, based on a full table lock and do the 
> hard work once it was proved to be feasable.

If you don't have any better idea how to do it than a full table lock,
you might as well not do it at all.  A "proof of concept" that doesn't
solve the hard part of the problem is no proof :-(

My first guess about a real implementation would involve extending the
index AM API to offer a function "insert this key, or return the
existing match if there already is one".  This might tie into
refactoring the existing implementation of unique indexes, in which all
the problem is put on the AM's head (which is why only btree copes at
the moment).
        regards, tom lane


Re: someone working to add merge?

From
Csaba Nagy
Date:
OK, I'm relatively new on this list, and I might have missed a few
discussions on this topic.
I wonder if doing it this way would not be better than using a table
lock:
- set a save point;- insert the row;    - on error:       - roll back to the save point;       - update the row;    -
onsuccess release the save point;
 

This would provide less contention while paying the prise for the save
point. In low contention scenarios the table lock would be better, and I
wonder for high contention scenarios which is better, the table lock, or
the save point version...

Of course the table lock version is the future if predicate locking is
going to be implemented later.

Cheers,
Csaba.


On Fri, 2005-11-11 at 18:37, Peter Eisentraut wrote:
> Josh Berkus wrote:
> > Funny, we were just discussing this at OpenDBCon.   Seems that you
> > can't do a full implementation of MERGE without Predicate Locking
> > (the ability to say "lock this table against inserts or updates of
> > any row with key=5"). However, Peter suggested that we could do a
> > proof-of-concept implementation, working out syntax and trigger
> > issues, based on a full table lock and do the hard work once it was
> > proved to be feasable.
> 
> Yes, I've started to work on this.  Realizing that the current way to 
> manually do an UPDATE-else-INSERT or DELETE-then-INSERT involves a 
> table lock anyway, a MERGE implementation using a table lock would at 
> least give some convenience benefit to users. (And possibly some 
> performance, too, if the decision logic is currently run in the 
> client.)
> 
> A predicate locking implementation for MERGE might actually not be all 
> that complicated, because you only need to look on pk = constant, not 
> on arbitrary expressions.  Nevertheless, I think it's best to write the 
> MERGE command first and then optimize the locking.



Re: someone working to add merge?

From
Peter Eisentraut
Date:
Tom Lane wrote:
> If you don't have any better idea how to do it than a full table
> lock, you might as well not do it at all.  A "proof of concept" that
> doesn't solve the hard part of the problem is no proof :-(

But the problem here is not to break any kind of performance barrier, 
but to give people migrating from MySQL and alternative for REPLACE 
command.

> My first guess about a real implementation would involve extending
> the index AM API to offer a function "insert this key, or return the
> existing match if there already is one".

This assumes that there are indexes defined for the columns involved in 
the merge condition, which is not required anywhere.

-- 
Peter Eisentraut
http://developer.postgresql.org/~petere/


Re: someone working to add merge?

From
Tom Lane
Date:
Peter Eisentraut <peter_e@gmx.net> writes:
> This assumes that there are indexes defined for the columns involved in 
> the merge condition, which is not required anywhere.

Surely they require a unique constraint --- else the behavior isn't even
well defined, is it?
        regards, tom lane


Re: someone working to add merge?

From
Jaime Casanova
Date:
On 11/11/05, Peter Eisentraut <peter_e@gmx.net> wrote:
> Tom Lane wrote:
> > If you don't have any better idea how to do it than a full table
> > lock, you might as well not do it at all.  A "proof of concept" that
> > doesn't solve the hard part of the problem is no proof :-(
>
> But the problem here is not to break any kind of performance barrier,
> but to give people migrating from MySQL and alternative for REPLACE
> command.
>

But MERGE isn't REPLACE...

REPLACE will delete old records to insert new ones; MERGE try to
insert and if the record exists then can UPDATE just a few values,
maybe incrementing them with a value (all the calculation are doing by
the MERGE)


> > My first guess about a real implementation would involve extending
> > the index AM API to offer a function "insert this key, or return the
> > existing match if there already is one".
>
> This assumes that there are indexes defined for the columns involved in
> the merge condition, which is not required anywhere.
>

--
regards,
Jaime Casanova
(DBA: DataBase Aniquilator ;)


Re: someone working to add merge?

From
Peter Eisentraut
Date:
Tom Lane wrote:
> Surely they require a unique constraint --- else the behavior isn't
> even well defined, is it?

They require that the merge condition does not match for more than one 
row, but since the merge condition can do just about anything, there is 
no guarantee that a unique constraint encompasses it.

-- 
Peter Eisentraut
http://developer.postgresql.org/~petere/


Re: someone working to add merge?

From
Peter Eisentraut
Date:
Jaime Casanova wrote:
> REPLACE will delete old records to insert new ones; MERGE try to
> insert and if the record exists then can UPDATE just a few values,
> maybe incrementing them with a value (all the calculation are doing
> by the MERGE)

I'm not the expert on REPLACE, but it would seem that REPLACE is a 
special case of MERGE.

-- 
Peter Eisentraut
http://developer.postgresql.org/~petere/


Re: someone working to add merge?

From
"John Hansen"
Date:
Jaime Casanova Wrote:
> But MERGE isn't REPLACE...
>
> REPLACE will delete old records to insert new ones; MERGE try
> to insert and if the record exists then can UPDATE just a few
> values, maybe incrementing them with a value (all the
> calculation are doing by the MERGE)

That sounds like MySQL's 'INSERT INTO ... ON DUPLICATE KEY UPDATE',
which they recommend over REPLACE anyways.



Re: someone working to add merge?

From
Bruno Wolff III
Date:
On Fri, Nov 11, 2005 at 18:48:33 +0100, Csaba Nagy <nagy@ecircle-ag.com> wrote:
> OK, I'm relatively new on this list, and I might have missed a few
> discussions on this topic.
> I wonder if doing it this way would not be better than using a table
> lock:
> 
>  - set a save point;
>  - insert the row;
>      - on error:
>         - roll back to the save point;
>         - update the row;
>      - on success release the save point;
> 
> This would provide less contention while paying the prise for the save
> point. In low contention scenarios the table lock would be better, and I
> wonder for high contention scenarios which is better, the table lock, or
> the save point version...

You may not be able to update the row after the insert fails. If there is
insert occurring in another transaction, the row may not be visible to
the current transaction. In which case you can neither insert or update the
row. You need to wait for the other transaction to commit or rollback.


Re: someone working to add merge?

From
Tom Lane
Date:
Peter Eisentraut <peter_e@gmx.net> writes:
> Tom Lane wrote:
>> Surely they require a unique constraint --- else the behavior isn't
>> even well defined, is it?

> They require that the merge condition does not match for more than one 
> row, but since the merge condition can do just about anything, there is 
> no guarantee that a unique constraint encompasses it.

ISTM to be a reasonable implementation restriction that there be a
constraint by which the system can prove that there is at most one
matching row.  Per other comments in this thread, we'd not be the only
implementation making such a restriction.

(Certainly, if I were a DBA and were told that the performance of MERGE
would go to hell in a handbasket if I had no such constraint, I'd make
sure there was one.  I don't think there is very much of a use-case for
the general scenario.)
        regards, tom lane


Re: someone working to add merge?

From
"John Hansen"
Date:
Tom Lane Wrote:

> Surely they require a unique constraint --- else the behavior
> isn't even well defined, is it?

From the mysql manual:

'REPLACE works exactly like INSERT, except that if an old record in the
table has the same value as a new record for a PRIMARY KEY or a UNIQUE
index, the old record is deleted before the new record is inserted. See
Section 13.2.4, "INSERT Syntax".'

... John


Re: someone working to add merge?

From
"John Hansen"
Date:
I Wrote:

> From the mysql manual:
>
> 'REPLACE works exactly like INSERT, except that if an old
> record in the table has the same value as a new record for a
> PRIMARY KEY or a UNIQUE index, the old record is deleted
> before the new record is inserted. See Section 13.2.4,
> "INSERT Syntax".'

It also says:
Note that unless the table has a PRIMARY KEY  or UNIQUE index, using a
REPLACE statement makes no sense. It becomes equivalent to INSERT,
because there is no index to be used to determine whether a new row
duplicates another.


... John


Re: someone working to add merge?

From
Csaba Nagy
Date:
On Fri, 2005-11-11 at 20:22, Bruno Wolff III wrote:
> On Fri, Nov 11, 2005 at 18:48:33 +0100,
>   Csaba Nagy <nagy@ecircle-ag.com> wrote:
> > OK, I'm relatively new on this list, and I might have missed a few
> > discussions on this topic.
> > I wonder if doing it this way would not be better than using a table
> > lock:
> > 
> >  - set a save point;
> >  - insert the row;
> >      - on error:
> >         - roll back to the save point;
> >         - update the row;
> >      - on success release the save point;
> > 
> > This would provide less contention while paying the prise for the save
> > point. In low contention scenarios the table lock would be better, and I
> > wonder for high contention scenarios which is better, the table lock, or
> > the save point version...
> 
> You may not be able to update the row after the insert fails. If there is
> insert occurring in another transaction, the row may not be visible to
> the current transaction. In which case you can neither insert or update the
> row. You need to wait for the other transaction to commit or rollback.

Are you sure ? From what I understand, the insert will only fail when
the other transaction commits, and actively wait for the commit or
rollback. Look at this:


session_1=> create table test (col smallint primary key);
NOTICE:  CREATE TABLE / PRIMARY KEY will create implicit index
"test_pkey" for table "test"
CREATE TABLE
session_1=> begin;
BEGIN
cnagy=> insert into test values (1);
INSERT 165068987 1

session_2=> begin;
BEGIN
session_2=> insert into test values (1);

[session_2 is now waiting]

session_1=> commit;
COMMIT

[session_2 wakes up]

ERROR:  duplicate key violates unique constraint "test_pkey"


So it looks like predicate locking is already in place for primary key
conditions...

Cheers,
Csaba.








Re: someone working to add merge?

From
Alvaro Herrera
Date:
Csaba Nagy wrote:

> session_1=> create table test (col smallint primary key);
> NOTICE:  CREATE TABLE / PRIMARY KEY will create implicit index
> "test_pkey" for table "test"
> CREATE TABLE
> session_1=> begin;
> BEGIN
> cnagy=> insert into test values (1);
> INSERT 165068987 1
> 
> session_2=> begin;
> BEGIN
> session_2=> insert into test values (1);
> 
> [session_2 is now waiting]

This only happens because of the unique index.  There's no predicate
locking involved.  The btree code goes some lengths to make this work;
it would be probably simple to modify this to support MERGE or REPLACE
on the limited cases where there's a UNIQUE index.  Tom has already said
this twice (on this thread only; he has already said it before IIRC.)

-- 
Alvaro Herrera                                http://www.CommandPrompt.com/
PostgreSQL Replication, Consulting, Custom Development, 24x7 support


Re: someone working to add merge?

From
Csaba Nagy
Date:
Well, from my point of view it is a special case of predicate locking
supported well by existing code, in this case the unique index (you said
that, I'm not familiar with the code). I don't see why this cannot be
capitalized on, to implement a sub-set of what predicate locking is,
based on the mechanism already existing. I guess most of the users who
need some kind of merge, replace, insert-or-update feature (or other
reasons to block inserts/updates on a specific row) would be happy for
now with the restriction that the condition must be backed by a unique
index.

So basically the only thing I'm trying to say is that a partial
implementation which might be easily implementable (I might be wrong
here), without too big performance penalties and covers a very valid
problem is better than chasing the complete solution which is too
complex to be implemented easily or it could be easily implemented but
then has performance disadvantages. The only thing is to be labeled
correctly so people don't expect something it isn't.

Of course all this is hand-waving, I can't really help even if I wanted,
my C skills are less then mediocre.

Cheers,
Csaba.

On Tue, 2005-11-15 at 12:14, Alvaro Herrera wrote:
> Csaba Nagy wrote:
> 
> > session_1=> create table test (col smallint primary key);
> > NOTICE:  CREATE TABLE / PRIMARY KEY will create implicit index
> > "test_pkey" for table "test"
> > CREATE TABLE
> > session_1=> begin;
> > BEGIN
> > cnagy=> insert into test values (1);
> > INSERT 165068987 1
> > 
> > session_2=> begin;
> > BEGIN
> > session_2=> insert into test values (1);
> > 
> > [session_2 is now waiting]
> 
> This only happens because of the unique index.  There's no predicate
> locking involved.  The btree code goes some lengths to make this work;
> it would be probably simple to modify this to support MERGE or REPLACE
> on the limited cases where there's a UNIQUE index.  Tom has already said
> this twice (on this thread only; he has already said it before IIRC.)




Re: someone working to add merge?

From
Jaime Casanova
Date:
On 11/15/05, Alvaro Herrera <alvherre@commandprompt.com> wrote:
> Csaba Nagy wrote:
>
> > session_1=> create table test (col smallint primary key);
> > NOTICE:  CREATE TABLE / PRIMARY KEY will create implicit index
> > "test_pkey" for table "test"
> > CREATE TABLE
> > session_1=> begin;
> > BEGIN
> > cnagy=> insert into test values (1);
> > INSERT 165068987 1
> >
> > session_2=> begin;
> > BEGIN
> > session_2=> insert into test values (1);
> >
> > [session_2 is now waiting]
>
> This only happens because of the unique index.  There's no predicate
> locking involved.  The btree code goes some lengths to make this work;
> it would be probably simple to modify this to support MERGE or REPLACE
> on the limited cases where there's a UNIQUE index.  Tom has already said
> this twice (on this thread only; he has already said it before IIRC.)
>
> --
> Alvaro Herrera                                http://www.CommandPrompt.com/
> PostgreSQL Replication, Consulting, Custom Development, 24x7 support
>

And the only type of predicate locking we need for MySQL REPLACE
because it needs a pk or unique index to know it has to replace
otherwise it inserts the row...
that's the way it works as mysql spec said...



--
Atentamente,
Jaime Casanova
(DBA: DataBase Aniquilator ;)


Re: someone working to add merge?

From
Tom Lane
Date:
Alvaro Herrera <alvherre@commandprompt.com> writes:
> This only happens because of the unique index.  There's no predicate
> locking involved.  The btree code goes some lengths to make this work;

That's one way to look at it; the other is to say that we have predicate
locking for a very specific class of predicate, ie, equality of a
unique key.

In practice I think we only have a useful lock there for *primary* keys,
because unique without NOT NULL doesn't actually constrain you to just
one matching row ...
        regards, tom lane


Re: someone working to add merge?

From
Bruce Momjian
Date:
Tom Lane wrote:
> Alvaro Herrera <alvherre@commandprompt.com> writes:
> > This only happens because of the unique index.  There's no predicate
> > locking involved.  The btree code goes some lengths to make this work;
> 
> That's one way to look at it; the other is to say that we have predicate
> locking for a very specific class of predicate, ie, equality of a
> unique key.
> 
> In practice I think we only have a useful lock there for *primary* keys,
> because unique without NOT NULL doesn't actually constrain you to just
> one matching row ...

I agree --- an implementation that needs to use a table lock is useless,
and one with no primary key is too hard to implement and also near
useless.

I have update the TODO item to reflect this:
* Add MERGE command that does UPDATE/DELETE, or on failure, INSERT  (rules, triggers?)  To implement this cleanly
requiresthat the table have a unique index  so duplicate checking can be easily performed.
 

--  Bruce Momjian                        |  http://candle.pha.pa.us pgman@candle.pha.pa.us               |  (610)
359-1001+  If your life is a hard drive,     |  13 Roberts Road +  Christ can be your backup.        |  Newtown Square,
Pennsylvania19073
 


Re: someone working to add merge?

From
Peter Eisentraut
Date:
Bruce Momjian wrote:
> I agree --- an implementation that needs to use a table lock is
> useless, and one with no primary key is too hard to implement and
> also near useless.

Well, there were just a couple of people saying the opposite.

> I have update the TODO item to reflect this:
>
>     * Add MERGE command that does UPDATE/DELETE, or on failure, INSERT
>       (rules, triggers?)
>
>       To implement this cleanly requires that the table have a unique
> index so duplicate checking can be easily performed.

We're still trying to work out the semantic relationship between MERGE 
and REPLACE and what-we-actually-want.  This entry doesn't seem to take 
that into account.

-- 
Peter Eisentraut
http://developer.postgresql.org/~petere/


Re: someone working to add merge?

From
Martijn van Oosterhout
Date:
On Fri, Nov 18, 2005 at 05:30:34PM +0100, Peter Eisentraut wrote:
> Bruce Momjian wrote:
> > I have update the TODO item to reflect this:
> >
> >     * Add MERGE command that does UPDATE/DELETE, or on failure, INSERT
> >       (rules, triggers?)
> >
> >       To implement this cleanly requires that the table have a unique
> > index so duplicate checking can be easily performed.
>
> We're still trying to work out the semantic relationship between MERGE
> and REPLACE and what-we-actually-want.  This entry doesn't seem to take
> that into account.

Right. From my reading of the spec (which someone posted here
somewhere) MERGE has no special rules regarding visibility. So it's
just as susceptable to "duplicate keys" and "lost updates" as any
current method. That doesn't dimish its usefulness, it's just not what
some people thought it was.

My current position is that since REPLACE seems to violate normal
transaction semantics (must not fail no matter what other backends are
doing) that any implementation will probably have to play fancy
footwork with locking and savepoints within a single statement. And
that's not MERGE.

I'd say implement SQL MERGE which doesn't have any really unusual
features. And seperately implement some kind of INSERT OR UPDATE which
works only for a table with a primary key.

Have a nice day,
--
Martijn van Oosterhout   <kleptog@svana.org>   http://svana.org/kleptog/
> Patent. n. Genius is 5% inspiration and 95% perspiration. A patent is a
> tool for doing 5% of the work and then sitting around waiting for someone
> else to do the other 95% so you can sue them.

Re: someone working to add merge?

From
Bruce Momjian
Date:
Peter Eisentraut wrote:
> Bruce Momjian wrote:
> > I agree --- an implementation that needs to use a table lock is
> > useless, and one with no primary key is too hard to implement and
> > also near useless.
> 
> Well, there were just a couple of people saying the opposite.
> 
> > I have update the TODO item to reflect this:
> >
> >     * Add MERGE command that does UPDATE/DELETE, or on failure, INSERT
> >       (rules, triggers?)
> >
> >       To implement this cleanly requires that the table have a unique
> > index so duplicate checking can be easily performed.
> 
> We're still trying to work out the semantic relationship between MERGE 
> and REPLACE and what-we-actually-want.  This entry doesn't seem to take 
> that into account.

Right.  Once we are done I will update it.

--  Bruce Momjian                        |  http://candle.pha.pa.us pgman@candle.pha.pa.us               |  (610)
359-1001+  If your life is a hard drive,     |  13 Roberts Road +  Christ can be your backup.        |  Newtown Square,
Pennsylvania19073
 


Re: someone working to add merge?

From
"Jim C. Nasby"
Date:
On Fri, Nov 18, 2005 at 09:03:25PM +0100, Martijn van Oosterhout wrote:
> I'd say implement SQL MERGE which doesn't have any really unusual
> features. And seperately implement some kind of INSERT OR UPDATE which
> works only for a table with a primary key.

Is there any reeason this has to be a PK; shouldn't a unique index with
no nullable fields work just as well?

It seems bad to limit this to just a PK if we can avoid it. For example,
if you have something that's logging hits to web pages, you might have
this table:

CREATE TABLE url (   url_id      serial  PRIMARY KEY,   url         text    NOT NULL UNIQUE
);

I prefer having url_id as the PK because it's how you normally access
the table. But ISTM that there are cases where yo'd want to be able to
merge on two different sets of fields in one table, which is impossible
if we limit it to PK merges only.
-- 
Jim C. Nasby, Sr. Engineering Consultant      jnasby@pervasive.com
Pervasive Software      http://pervasive.com    work: 512-231-6117
vcard: http://jim.nasby.net/pervasive.vcf       cell: 512-569-9461


Re: someone working to add merge?

From
Jan Wieck
Date:
On 11/11/2005 2:20 PM, Tom Lane wrote:

> Peter Eisentraut <peter_e@gmx.net> writes:
>> Tom Lane wrote:
>>> Surely they require a unique constraint --- else the behavior isn't
>>> even well defined, is it?
> 
>> They require that the merge condition does not match for more than one 
>> row, but since the merge condition can do just about anything, there is 
>> no guarantee that a unique constraint encompasses it.
> 
> ISTM to be a reasonable implementation restriction that there be a
> constraint by which the system can prove that there is at most one
> matching row.  Per other comments in this thread, we'd not be the only
> implementation making such a restriction.
> 
> (Certainly, if I were a DBA and were told that the performance of MERGE
> would go to hell in a handbasket if I had no such constraint, I'd make
> sure there was one.  I don't think there is very much of a use-case for
> the general scenario.)

Such restriction does look reasonable. Especially because ...

The largest problem I see with MERGE is the question of BEFORE triggers. 
Consider a BEFORE INSERT trigger that modifies a third table, after 
which the constraint or whatever post-heap_insert-attempt we might use 
detects a conflict. How do we undo the actions of the BEFORE trigger? 
The only way to do that is to plan the query as a nestloop, with the 
USING part as the outer loop. If the (updating) scan of the INTO 
relation did not hit any tuple, then do the INSERT. We can only undo the 
side effects of any BEFORE trigger by wrapping each and evey nested INTO 
relation insert attempt into its own subtransaction.

Sure, we "could" of course do the insert and then rescan the whole thing 
with read-committed to see if our row is now the only one ... needless 
to say that in the case of a sequential scan inside the loop, that 
nestloop will suck big times even without that second scan. But ... hmmm 
... we could get away with that and if we don't find a constraint that 
will ensure uniqueness, then we do a rescan to check for it. But I would 
vote for a "please_no_notice_about_stupid_usage_of_merge" runtime option 
that suppresses the corresponding NOTICE.


Jan

-- 
#======================================================================#
# It's easier to get forgiveness for being wrong than for being right. #
# Let's break this rule - forgive me.                                  #
#================================================== JanWieck@Yahoo.com #


Re: someone working to add merge?

From
Martijn van Oosterhout
Date:
On Wed, Nov 23, 2005 at 04:55:25PM -0500, Jan Wieck wrote:
> The largest problem I see with MERGE is the question of BEFORE triggers.
> Consider a BEFORE INSERT trigger that modifies a third table, after
> which the constraint or whatever post-heap_insert-attempt we might use
> detects a conflict. How do we undo the actions of the BEFORE trigger?
> The only way to do that is to plan the query as a nestloop, with the
> USING part as the outer loop. If the (updating) scan of the INTO
> relation did not hit any tuple, then do the INSERT. We can only undo the
> side effects of any BEFORE trigger by wrapping each and evey nested INTO
> relation insert attempt into its own subtransaction.

Umm, if there are any errors you abort the transaction, just like any
other case. ACID requires that either the whole statement is done, or
none. If a trigger causes the INSERT or UPDATE to fail you have no
choice but to abort the transaction.

Besides, someone posted an example on Oracle, they don't require an
index so I don't think we realistically can say that people need one.
If two concurrent MERGEs, which can't see eachothers output, both end
up INSERTing, that not an error unless the user has a UNIQUE
constraint, so the problem vanishes.

Have a nice day,
--
Martijn van Oosterhout   <kleptog@svana.org>   http://svana.org/kleptog/
> Patent. n. Genius is 5% inspiration and 95% perspiration. A patent is a
> tool for doing 5% of the work and then sitting around waiting for someone
> else to do the other 95% so you can sue them.

Re: someone working to add merge?

From
Jan Wieck
Date:
On 11/24/2005 1:30 AM, Martijn van Oosterhout wrote:

> On Wed, Nov 23, 2005 at 04:55:25PM -0500, Jan Wieck wrote:
>> The largest problem I see with MERGE is the question of BEFORE triggers. 
>> Consider a BEFORE INSERT trigger that modifies a third table, after 
>> which the constraint or whatever post-heap_insert-attempt we might use 
>> detects a conflict. How do we undo the actions of the BEFORE trigger? 
>> The only way to do that is to plan the query as a nestloop, with the 
>> USING part as the outer loop. If the (updating) scan of the INTO 
>> relation did not hit any tuple, then do the INSERT. We can only undo the 
>> side effects of any BEFORE trigger by wrapping each and evey nested INTO 
>> relation insert attempt into its own subtransaction.
> 
> Umm, if there are any errors you abort the transaction, just like any
> other case. ACID requires that either the whole statement is done, or
> none. If a trigger causes the INSERT or UPDATE to fail you have no
> choice but to abort the transaction.

I guess you misunderstood. What I am talking about is a problem in the 
order of execution. since we don't have predicate locking, there is a 
possibility that our implementation of MERGE decides to do an INSERT 
while another transaction does the same. What has to happen is that the 
BEFORE INSERT trigger is called, then the heap tuple inserted, then the 
index tuples created. At this time, the duplicate key error occurs, 
telling us that we had a conflict and that we have to try an UPDATE 
instead. That means, in the end this particular row's INSERT has never 
happened and we have to undo the BEFORE INSERT triggers actions too.

> 
> Besides, someone posted an example on Oracle, they don't require an
> index so I don't think we realistically can say that people need one.
> If two concurrent MERGEs, which can't see eachothers output, both end
> up INSERTing, that not an error unless the user has a UNIQUE
> constraint, so the problem vanishes.

Not following the semantics is an error. MERGE is not supposed to do 
multiple inserts for the same match, concurrency or not.


Jan

-- 
#======================================================================#
# It's easier to get forgiveness for being wrong than for being right. #
# Let's break this rule - forgive me.                                  #
#================================================== JanWieck@Yahoo.com #


Re: someone working to add merge?

From
Martijn van Oosterhout
Date:
On Thu, Nov 24, 2005 at 11:11:34AM -0500, Jan Wieck wrote:
> On 11/24/2005 1:30 AM, Martijn van Oosterhout wrote:
> >Umm, if there are any errors you abort the transaction, just like any
> >other case. ACID requires that either the whole statement is done, or
> >none. If a trigger causes the INSERT or UPDATE to fail you have no
> >choice but to abort the transaction.
>
> I guess you misunderstood. What I am talking about is a problem in the
> order of execution. since we don't have predicate locking, there is a
> possibility that our implementation of MERGE decides to do an INSERT
> while another transaction does the same. What has to happen is that the
> BEFORE INSERT trigger is called, then the heap tuple inserted, then the
> index tuples created. At this time, the duplicate key error occurs,
> telling us that we had a conflict and that we have to try an UPDATE
> instead. That means, in the end this particular row's INSERT has never
> happened and we have to undo the BEFORE INSERT triggers actions too.

But I'm not sure we're supposed to handle that case anyway. Oracle at
least doesn't require an index on the table being merged. And if I look
at it from a visibility view point, if someone else does an INSERT in
another transaction, then MERGE cannot see it and thus it will INSERT
too. This isn't an error.

Consider the case of a deferred unique constraint (Postgres doesn't
support these yet but bear with me). Say in one session you start a
transaction, do a MERGE and then a few other statements. In the
meantime in another session someone inserts a row on the table you
merged. Are you asserting that the first session should undo
everything, do an UPDATE instead of an INSERT and redo all your queries
since? Obviously not.

Though the above relies on something Postgres doesn't support, but you
would be able to emulate the some without a unique key. For example:

Session 1:
CREATE TEMP TABLE foo (id integer, val integer);

BEGIN;
SELECT * FROM foo;

Session 2:
BEGIN;

INSERT INTO foo (1,3);

Session 1:
MERGE INTO foo USING (SELECT 1) ON (foo.id = 1) WHEN MATCHED THEN UPDATE SET val = 1 WHEN NOT MATCHED THEN INSERT
(id,val)VALUES (1,2); 

Session 2:
COMMIT;

Session 1:
COMMIT;

Now, (IMO) in serializable mode, the MERGE should block on reading the
row inserted by the second session and when that commits do the UPDATE,
thus leaving you with a table with one row (1.1).

In read committed mode, the MERGE shouldn't block and you should end up
with a table with two rows (1,3) and (1,2).

If you switch the order of the insert and merge you should get the same
results in both cases, (1,2) and (1,3).

I think you are arguing that the result should be (1,1) in all cases. I
honestly don't see how that is feasible and certainly not supported by
my reading of the standard. I would be interested to know how other
databases handle the above case.

> Not following the semantics is an error. MERGE is not supposed to do
> multiple inserts for the same match, concurrency or not.

Yes, any single MERGE cannot insert twice. However, two MERGEs running
concurrently, or a MERGE with an INSERT/UPDATE/DELETE in another
session could very well end up with multiple rows on the same key. I
maintain that MERGE has no special rules w.r.t. visibility or locking
and we should not be acting as if it does. If at the end of the
transaction there a duplicate key we should throw the error and let the
client deal with it.

Have a nice day,
--
Martijn van Oosterhout   <kleptog@svana.org>   http://svana.org/kleptog/
> Patent. n. Genius is 5% inspiration and 95% perspiration. A patent is a
> tool for doing 5% of the work and then sitting around waiting for someone
> else to do the other 95% so you can sue them.

Re: someone working to add merge?

From
Jan Wieck
Date:
On 11/25/2005 7:14 AM, Martijn van Oosterhout wrote:
> On Thu, Nov 24, 2005 at 11:11:34AM -0500, Jan Wieck wrote:
>> I guess you misunderstood. [...]
> 
> But I'm not sure we're supposed to handle that case anyway. Oracle at
> least doesn't require an index on the table being merged. And if I look
> at it from a visibility view point, if someone else does an INSERT in
> another transaction, then MERGE cannot see it and thus it will INSERT
> too. This isn't an error.

Hmmm ... so you maintain that MERGE without an explicit LOCK TABLE, done 
by the user before performing the MERGE, can create duplicate rows (WRT 
the merge condition) and consequently raise a duplicate key error if 
there is a UNIQUE constraint.

If that is what the standard describes, then it can be implemented 
without any sort of index or constraint requirement. The query tree for 
MERGE will have the INTO relation as a left outer join. In the case of a 
match of this outer join, one set of targetlist expressions is used to 
form the result tuple containing the INTO-relations ctid. That result 
tuple us useable for heap_update() or heap_delete(). In the case of 
no-match another set of target list expressions is used, suitable for 
heap_insert(). This way, MERGE will work with one single sequential scan 
of the INTO relation in case there is no suitable index.


Jan

-- 
#======================================================================#
# It's easier to get forgiveness for being wrong than for being right. #
# Let's break this rule - forgive me.                                  #
#================================================== JanWieck@Yahoo.com #


Re: someone working to add merge?

From
Martijn van Oosterhout
Date:
On Fri, Nov 25, 2005 at 09:14:47AM -0500, Jan Wieck wrote:
> Hmmm ... so you maintain that MERGE without an explicit LOCK TABLE, done
> by the user before performing the MERGE, can create duplicate rows (WRT
> the merge condition) and consequently raise a duplicate key error if
> there is a UNIQUE constraint.
>
> If that is what the standard describes, then it can be implemented
> without any sort of index or constraint requirement. The query tree for
> MERGE will have the INTO relation as a left outer join. In the case of a
> match of this outer join, one set of targetlist expressions is used to
> form the result tuple containing the INTO-relations ctid. That result
> tuple us useable for heap_update() or heap_delete(). In the case of
> no-match another set of target list expressions is used, suitable for
> heap_insert(). This way, MERGE will work with one single sequential scan
> of the INTO relation in case there is no suitable index.

Yes, that's the way I read the standard and how I was thinking it could
be implemented. It does simplify the case suggested by people that want
atomic REPLACE, because you only have one statement to repeat until you
get success.

do  MERGE ...;
while( not error and modified_rows <> 1 )

I was thinking that we could make a seperate REPLACE command which can
only insert or update one row, but can do it atomically. Basically a
loop like above with builtin savepoints.

Alternativly, we could special case the MERGE-without-USING case, in
the cases where the plan simply devolves into an Index Scan, but I
don't like special casing. If we're going to have special semantics we
should have a seperate statement so it's clear that it's special.

Have a nice day,
--
Martijn van Oosterhout   <kleptog@svana.org>   http://svana.org/kleptog/
> Patent. n. Genius is 5% inspiration and 95% perspiration. A patent is a
> tool for doing 5% of the work and then sitting around waiting for someone
> else to do the other 95% so you can sue them.