Thread: MERGE Specification

MERGE Specification

From
Simon Riggs
Date:
The following two files specify the behaviour of the MERGE statement and
how it will work in the world of PostgreSQL. In places, this supercedes
my recent commentary on MERGE, particularly with regard to triggers.

Neither of these files is intended for CVS.

The HTML file was generated from SGML source, though the latter is not
included here for clarity.

The test file shows how I expect a successful test run to look when a
regression test is executed with a working version of final MERGE patch
applied. It has behavioural comments in it also, to make it slightly
more readable.

If anybody has any questions, ask them now please, before I begin
detailed implementation.

--
  Simon Riggs
  2ndQuadrant  http://www.2ndQuadrant.com

Attachment

Re: MERGE Specification

From
"Pavel Stehule"
Date:
Hello Simon,

would you support RETURNING clause?

Regards
Pavel Stehule

On 21/04/2008, Simon Riggs <simon@2ndquadrant.com> wrote:
> The following two files specify the behaviour of the MERGE statement and
>  how it will work in the world of PostgreSQL. In places, this supercedes
>  my recent commentary on MERGE, particularly with regard to triggers.
>
>  Neither of these files is intended for CVS.
>
>  The HTML file was generated from SGML source, though the latter is not
>  included here for clarity.
>
>  The test file shows how I expect a successful test run to look when a
>  regression test is executed with a working version of final MERGE patch
>  applied. It has behavioural comments in it also, to make it slightly
>  more readable.
>
>  If anybody has any questions, ask them now please, before I begin
>  detailed implementation.
>
>
>  --
>   Simon Riggs
>   2ndQuadrant  http://www.2ndQuadrant.com
>
>
>  --
>  Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
>  To make changes to your subscription:
>  http://www.postgresql.org/mailpref/pgsql-hackers
>
>
>


Re: MERGE Specification

From
Simon Riggs
Date:
On Mon, 2008-04-21 at 22:18 +0200, Pavel Stehule wrote:

> would you support RETURNING clause?

I wouldn't rule it out completely, but not in the first implementation
because
- its more work
- its not in the SQL Standard
- neither Oracle nor DB2 support it either, so its only going to provide
incompatibility
- there are some wrinkles with MERGE that means I don't want to
over-complicate it because it looks to me like it will change in future
versions of the standard
- not sure what the use case for that will be

--  Simon Riggs 2ndQuadrant  http://www.2ndQuadrant.com



Re: MERGE Specification

From
"A.M."
Date:
On Apr 21, 2008, at 4:08 PM, Simon Riggs wrote:

> The following two files specify the behaviour of the MERGE statement  
> and
> how it will work in the world of PostgreSQL. In places, this  
> supercedes
> my recent commentary on MERGE, particularly with regard to triggers.
>
> Neither of these files is intended for CVS.
>
> The HTML file was generated from SGML source, though the latter is not
> included here for clarity.
>
> The test file shows how I expect a successful test run to look when a
> regression test is executed with a working version of final MERGE  
> patch
> applied. It has behavioural comments in it also, to make it slightly
> more readable.
>
> If anybody has any questions, ask them now please, before I begin
> detailed implementation.


"MERGE will not invoke Rules." Does this imply that MERGE cannot be  
used on views or that the resulting INSERTs or UPDATEs do not work on  
views?

Cheers,
M


Re: MERGE Specification

From
Simon Riggs
Date:
On Mon, 2008-04-21 at 16:38 -0400, A.M. wrote:

> "MERGE will not invoke Rules." Does this imply that MERGE cannot be  
> used on views or that the resulting INSERTs or UPDATEs do not work on  
> views?

Yes, that's right. Just like COPY. That seems fine to me because you're
likely to be doing a MERGE immediately after a COPY anyway, so the
restriction just continues.

Rules for Insert, Update and Delete are almost identical to the way
MERGE works anyway, so there's no particular loss of functionality. That
was why I co-opted the ability to DO NOTHING in a WHEN clause from the
way PostgreSQL Rules work.

I'm not taking any explicit decisions to exclude them permanently. I do
think its possible that we could support them and possibly very cheaply,
but I wouldn't make any promises initially.

--  Simon Riggs 2ndQuadrant  http://www.2ndQuadrant.com



Re: MERGE Specification

From
Tom Lane
Date:
Simon Riggs <simon@2ndquadrant.com> writes:
> On Mon, 2008-04-21 at 16:38 -0400, A.M. wrote:
>> "MERGE will not invoke Rules." Does this imply that MERGE cannot be  
>> used on views or that the resulting INSERTs or UPDATEs do not work on  
>> views?

> Yes, that's right. Just like COPY.

I find this to be pretty ugly.  COPY is a special case because
(a) it is a utility statement not a plannable one, and (b) its only
reason to exist is to transfer data as fast as possible, so bypassing
rules isn't an unreasonable restriction.  MERGE has neither of those
properties, and thus arguing that it can or should be like COPY is an
entirely unconvincing proposition.  (In fact, I don't even want to think
about what kind of crock you're going to need in order to get it through
the planner without also going through the rewriter.)

Please think a bit harder.
        regards, tom lane


Re: MERGE Specification

From
Simon Riggs
Date:
On Mon, 2008-04-21 at 20:28 -0400, Tom Lane wrote:

> In fact, I don't even want to think
> about what kind of crock you're going to need in order to get it through
> the planner without also going through the rewriter.

Hmmm, I hadn't thought I might be adding work rather than avoiding it.

I'll give it a go.

--  Simon Riggs 2ndQuadrant  http://www.2ndQuadrant.com



Re: MERGE Specification

From
Simon Riggs
Date:
On Mon, 2008-04-21 at 20:28 -0400, Tom Lane wrote:
> Simon Riggs <simon@2ndquadrant.com> writes:
> > On Mon, 2008-04-21 at 16:38 -0400, A.M. wrote:
> >> "MERGE will not invoke Rules." Does this imply that MERGE cannot be  
> >> used on views or that the resulting INSERTs or UPDATEs do not work on  
> >> views?
> 
> > Yes, that's right. Just like COPY.
> 
> I find this to be pretty ugly.  COPY is a special case because
> (a) it is a utility statement not a plannable one, and (b) its only
> reason to exist is to transfer data as fast as possible, so bypassing
> rules isn't an unreasonable restriction.  MERGE has neither of those
> properties, and thus arguing that it can or should be like COPY is an
> entirely unconvincing proposition. 

Unrelated to rule processing, you did read the bit about MERGE and race
conditions? ISTM that MERGE as it stands isn't very useful for anything
other than large data loads since its going to cause problems if used
concurrently.

--  Simon Riggs 2ndQuadrant  http://www.2ndQuadrant.com



Re: MERGE Specification

From
Alvaro Herrera
Date:
Simon Riggs wrote:

> Unrelated to rule processing, you did read the bit about MERGE and race
> conditions? ISTM that MERGE as it stands isn't very useful for anything
> other than large data loads since its going to cause problems if used
> concurrently.

But that's how the committee designed it, yes?

-- 
Alvaro Herrera                                http://www.CommandPrompt.com/
PostgreSQL Replication, Consulting, Custom Development, 24x7 support


Re: MERGE Specification

From
Gregory Stark
Date:
"Simon Riggs" <simon@2ndquadrant.com> writes:

> Unrelated to rule processing, you did read the bit about MERGE and race
> conditions? ISTM that MERGE as it stands isn't very useful for anything
> other than large data loads since its going to cause problems if used
> concurrently.

If there are race conditions what advantage does it offer over writing plpgsql
or client code to do it?

I thought the whole advantage of having a built-in command is that it could do
the kind of locking our unique constraints do to avoid race conditions.


--  Gregory Stark EnterpriseDB          http://www.enterprisedb.com Ask me about EnterpriseDB's On-Demand Production
Tuning


Re: MERGE Specification

From
Simon Riggs
Date:
On Mon, 2008-04-21 at 21:57 -0400, Alvaro Herrera wrote:
> Simon Riggs wrote:
> 
> > Unrelated to rule processing, you did read the bit about MERGE and race
> > conditions? ISTM that MERGE as it stands isn't very useful for anything
> > other than large data loads since its going to cause problems if used
> > concurrently.
> 
> But that's how the committee designed it, yes?

Yes. Not sure if I see your point there, but yes, that's how its been
designed.

Both DB2 and Oracle have additional items to get around the shortcomings
of the command.

The way MERGE works we first test to see if it matches or not, then if
not matched we would activate the NOT MATCHED action, which standard
says must be an insert. The gap between the two actions allows a race
condition to exist. 

We could close the gap by taking a lock on the row when we perform the
is-matched test, but that would be expensive for bulk operations. ISTM
the lock should be optional. Not sure what the default should be. Input
welcome.

--  Simon Riggs 2ndQuadrant  http://www.2ndQuadrant.com



Re: MERGE Specification

From
Martijn van Oosterhout
Date:
On Tue, Apr 22, 2008 at 08:24:58AM +0100, Simon Riggs wrote:
> The way MERGE works we first test to see if it matches or not, then if
> not matched we would activate the NOT MATCHED action, which standard
> says must be an insert. The gap between the two actions allows a race
> condition to exist.
>
> We could close the gap by taking a lock on the row when we perform the
> is-matched test, but that would be expensive for bulk operations. ISTM
> the lock should be optional. Not sure what the default should be. Input
> welcome.

ISTM that if the original select does a SELECT FOR UPDATE then it
should work fine for UPDATEs since any update with overwrite the xmax
field anyway.

What you can't do is prevent multiple inserts. Though if its a unique
index you should be able to do the same trick as normal inserts: create
the row, try to insert into the index and if that fails fall back to
doing an update.

What you really need for this though is a non-fatal _bt_check_unique so
you can recover without having a savepoint for every row.

Have a nice day,
--
Martijn van Oosterhout   <kleptog@svana.org>   http://svana.org/kleptog/
> Please line up in a tree and maintain the heap invariant while
> boarding. Thank you for flying nlogn airlines.

Re: MERGE Specification

From
Simon Riggs
Date:
On Tue, 2008-04-22 at 10:02 +0200, Martijn van Oosterhout wrote:
> On Tue, Apr 22, 2008 at 08:24:58AM +0100, Simon Riggs wrote:
> > The way MERGE works we first test to see if it matches or not, then if
> > not matched we would activate the NOT MATCHED action, which standard
> > says must be an insert. The gap between the two actions allows a race
> > condition to exist. 
> > 
> > We could close the gap by taking a lock on the row when we perform the
> > is-matched test, but that would be expensive for bulk operations. ISTM
> > the lock should be optional. Not sure what the default should be. Input
> > welcome.
> 
> ISTM that if the original select does a SELECT FOR UPDATE then it
> should work fine for UPDATEs since any update with overwrite the xmax
> field anyway.

Yes, agreed, that's what I meant by the lock on the row.

Incidentally, this is essentially the same problem that occurs with
SERIALIZABLE updates.

It should be easy enough to put an optional "LOCK MATCHED ROW" clause
into the MERGE statement, as an extension. The Standard doesn't specify
the lock timing. 

> What you can't do is prevent multiple inserts. Though if its a unique
> index you should be able to do the same trick as normal inserts: create
> the row, try to insert into the index and if that fails fall back to
> doing an update.

The Standard doesn't really allow that. It's either matched or its not. 

MERGE is specifically
1. Match
2. Update or Insert as per step (1), following complex logic

rather than

1. Update
2. if not matched Insert

which is exactly what the MySQL and Teradata upsert statements do, but
only for single row operations, unlike MERGE.

For MERGE, there is no "lets try one of these and if not, I'll switch".
You decide which it is going to be and then do it. Which can fail... 

I guess we could just spin through, re-testing the match each time and
re-initiating an action, but I see problems there also, not least of
which is it violates the standard. That may not be that clever, but
there may be reasons we can't see yet, or reasons that would affect
other implementors. Guidance, please, if anybody sees clearly?

> What you really need for this though is a non-fatal _bt_check_unique so
> you can recover without having a savepoint for every row.

Oracle simply fails in the event of a uniqueness violation, even though
it logs other errors. DB2 fails unconditionally if there is even a
single error. The MySQL and Teradata syntax don't seem to offer any
protection from concurrent inserts either. Teradata and DB2 both use
locking, so they would lock the value prior to the update anyway, so the
update, insert issue would not happen for them at least.

--  Simon Riggs 2ndQuadrant  http://www.2ndQuadrant.com



Re: MERGE Specification

From
Simon Riggs
Date:
On Mon, 2008-04-21 at 22:27 -0400, Gregory Stark wrote:
> "Simon Riggs" <simon@2ndquadrant.com> writes:
> 
> > Unrelated to rule processing, you did read the bit about MERGE and race
> > conditions? ISTM that MERGE as it stands isn't very useful for anything
> > other than large data loads since its going to cause problems if used
> > concurrently.
> 
> If there are race conditions what advantage does it offer over writing plpgsql
> or client code to do it?

That's an excellent question. I'm not trying to sell you anything here.
MERGE is a SQL Standard command, supported by Oracle, DB2, SQLServer
etc, so there is reason enough to implement it.

There may be also reasons to implement other syntaxes, other behaviours,
which would be non-standard. If people want the latter first/second/not
at all then please speak, its not an either-or situation.

I would expect MERGE to be slightly faster than a well coded PL/pgSQL
function, but there won't be too much in it. It will allow the statement
to be more easily parallelised in the form it currently takes, I would
note.

> I thought the whole advantage of having a built-in command is that it could do
> the kind of locking our unique constraints do to avoid race conditions.

As I've said elsewhere, we could have it lock each row, its just more
overhead if we do and not necessary at all for bulk data merging.

I'll presume we want locking as an option, unless people say otherwise.

--  Simon Riggs 2ndQuadrant  http://www.2ndQuadrant.com



Re: MERGE Specification

From
Gregory Stark
Date:
"Simon Riggs" <simon@2ndquadrant.com> writes:

> As I've said elsewhere, we could have it lock each row, its just more
> overhead if we do and not necessary at all for bulk data merging.
>
> I'll presume we want locking as an option, unless people say otherwise.

It's not so simple. If you look for a row to merge into and don't find one
there's no row to lock. What unique constraints do is hold the lock on the
index page where the entry would have to be added.

That's the trick that plpgsql cannot implement. That's why users are forced to
loop and retry until they manage to do an update successfully or insert
successfully.

--  Gregory Stark EnterpriseDB          http://www.enterprisedb.com Ask me about EnterpriseDB's Slony Replication
support!


Re: MERGE Specification

From
"A.M."
Date:
On Apr 22, 2008, at 1:47 PM, Simon Riggs wrote:

> On Mon, 2008-04-21 at 22:27 -0400, Gregory Stark wrote:
>> "Simon Riggs" <simon@2ndquadrant.com> writes:
>>
>>> Unrelated to rule processing, you did read the bit about MERGE and  
>>> race
>>> conditions? ISTM that MERGE as it stands isn't very useful for  
>>> anything
>>> other than large data loads since its going to cause problems if  
>>> used
>>> concurrently.
>>
>> If there are race conditions what advantage does it offer over  
>> writing plpgsql
>> or client code to do it?
>
> That's an excellent question. I'm not trying to sell you anything  
> here.
> MERGE is a SQL Standard command, supported by Oracle, DB2, SQLServer
> etc, so there is reason enough to implement it.
>
> There may be also reasons to implement other syntaxes, other  
> behaviours,
> which would be non-standard. If people want the latter first/second/ 
> not
> at all then please speak, its not an either-or situation.
>
> I would expect MERGE to be slightly faster than a well coded PL/pgSQL
> function, but there won't be too much in it. It will allow the  
> statement
> to be more easily parallelised in the form it currently takes, I would
> note.
>
>> I thought the whole advantage of having a built-in command is that  
>> it could do
>> the kind of locking our unique constraints do to avoid race  
>> conditions.
>
> As I've said elsewhere, we could have it lock each row, its just more
> overhead if we do and not necessary at all for bulk data merging.

I was hoping to use MERGE alongside the other standard DML. Its  
purpose is to set the "truth" regardless of past states.

Why should it be relegated to the bulk-loading basement alongside COPY?

Cheers,
M



Re: MERGE Specification

From
Decibel!
Date:
On Apr 22, 2008, at 1:17 PM, Gregory Stark wrote:
> "Simon Riggs" <simon@2ndquadrant.com> writes:
>
>> As I've said elsewhere, we could have it lock each row, its just more
>> overhead if we do and not necessary at all for bulk data merging.
>>
>> I'll presume we want locking as an option, unless people say  
>> otherwise.
>
> It's not so simple. If you look for a row to merge into and don't  
> find one
> there's no row to lock. What unique constraints do is hold the lock  
> on the
> index page where the entry would have to be added.
>
> That's the trick that plpgsql cannot implement. That's why users  
> are forced to
> loop and retry until they manage to do an update successfully or  
> insert
> successfully.

Yeah, hopefully there's a better way to do this other than row locks.

But no matter how this is done, I think we need to handle the race  
conditions, and handle them by default. If people *really* know what  
they're doing, they can disable the row locking (perhaps one way to  
do this would be to grab an explicit lock on the table and have merge  
check for that...).

On a different note, if you intend for the SGML to become the doc  
page for MERGE, I'd really like to see some more complex examples  
showing both delete and more than 2 WHEN cases. Something like the  
"multiple actions on single target row" test case would work.
-- 
Decibel!, aka Jim C. Nasby, Database Architect  decibel@decibel.org
Give your computer some brain candy! www.distributed.net Team #1828



Re: MERGE Specification

From
Martijn van Oosterhout
Date:
On Tue, Apr 22, 2008 at 02:19:24PM -0500, Decibel! wrote:
> But no matter how this is done, I think we need to handle the race
> conditions, and handle them by default. If people *really* know what
> they're doing, they can disable the row locking (perhaps one way to
> do this would be to grab an explicit lock on the table and have merge
> check for that...).

I disagree. The spec doesn't require it and MERGE is useful without it.
For a first cut I would say implement as the spec says, race conditions
and all. Later we can think on whether it's worth handling them
directly.

Have a nice day,
--
Martijn van Oosterhout   <kleptog@svana.org>   http://svana.org/kleptog/
> Please line up in a tree and maintain the heap invariant while
> boarding. Thank you for flying nlogn airlines.

Re: MERGE Specification

From
Decibel!
Date:
On Apr 22, 2008, at 3:20 PM, Martijn van Oosterhout wrote:

> On Tue, Apr 22, 2008 at 02:19:24PM -0500, Decibel! wrote:
>> But no matter how this is done, I think we need to handle the race
>> conditions, and handle them by default. If people *really* know what
>> they're doing, they can disable the row locking (perhaps one way to
>> do this would be to grab an explicit lock on the table and have merge
>> check for that...).
>
> I disagree. The spec doesn't require it and MERGE is useful without  
> it.
> For a first cut I would say implement as the spec says, race  
> conditions
> and all. Later we can think on whether it's worth handling them
> directly.


That really strikes me as taking the "MySQL route". If push comes to  
shove, I'll take a MERGE with race conditions over no merge at all,  
but I think it's very important that it does the right thing. Just  
because the spec doesn't say anything about it doesn't mean it's ok.
-- 
Decibel!, aka Jim C. Nasby, Database Architect  decibel@decibel.org
Give your computer some brain candy! www.distributed.net Team #1828



Re: MERGE Specification

From
Tom Lane
Date:
Decibel! <decibel@decibel.org> writes:
> That really strikes me as taking the "MySQL route". If push comes to  
> shove, I'll take a MERGE with race conditions over no merge at all,  
> but I think it's very important that it does the right thing. Just  
> because the spec doesn't say anything about it doesn't mean it's ok.

Agreed.  It seems to me that in the last set of discussions, we rejected
implementing MERGE precisely because it failed to provide a solution to
the race-condition problem.  I'm not satisfied with a version that
doesn't handle that, because I think that is *exactly* what most people
will try to use it for.  The non-concurrent bulk update case that Simon
is arguing for is the uncommon usage.
        regards, tom lane


Re: MERGE Specification

From
Simon Riggs
Date:
On Thu, 2008-04-24 at 12:19 -0400, Tom Lane wrote:
> Decibel! <decibel@decibel.org> writes:
> > That really strikes me as taking the "MySQL route". If push comes to  
> > shove, I'll take a MERGE with race conditions over no merge at all,  
> > but I think it's very important that it does the right thing. Just  
> > because the spec doesn't say anything about it doesn't mean it's ok.
> 
> Agreed.  It seems to me that in the last set of discussions, we rejected
> implementing MERGE precisely because it failed to provide a solution to
> the race-condition problem.  I'm not satisfied with a version that
> doesn't handle that, because I think that is *exactly* what most people
> will try to use it for.  The non-concurrent bulk update case that Simon
> is arguing for is the uncommon usage.

If y'all think that, then I will do it that way.

The only protection from the race condition is to do the Insert first. 

With MERGE, we would need to do it like this:

1. If there are any WHEN NOT MATCHED clauses that trigger INSERTs, then
attempt to apply them first, no matter what order they were in with
respect to the WHEN MATCHED clauses. Start loop at step (3) every time.
If there aren't any, start loop straight at step (5). Note that we would
need to check to see that INSERTs had not been removed by Rules.

2. For each row retrieved by outer join, goto either step 3 or 5 as
established before the loop starts.

3. Try to apply the WHEN NOT MATCHED clauses. The ordering of the
clauses with respect to each other will remain exactly as stated. If one
of the clauses activates an INSERT, we start an internal subtransaction
and perform the INSERT action. If it succeeds, we commit the
subtransaction and continue.

4. If the INSERT fails with a uniqueness violation, we shrug. The ERROR
has caused the subtransaction to abort.

5. Process WHEN MATCHED clauses and continue.

Technically, this is a standards violation because of the potentially
out-of-order execution of the when clauses. Though the end result is not
distinguishable from standards compliant behaviour, AFAICS.

Note that in step 3 we *must* use subtransactions if there is more than
1 unique index on a table, otherwise we might succeed with the first
index and fail with the second. Using a subtransaction per row pretty
much rules out an efficient bulk load.

Note also that this results in a version optimised for INSERT. If we end
up doing an UPDATE there will be two dead rows, probably in two separate
blocks. We hope that doesn't matter because of HOT.

There's probably a reasonable argument for having an optional keyword to
make MERGE behave differently for bulk loads, but I'll save that now for
another day. Focus now is on a command that works well for OLTP cases.

--  Simon Riggs 2ndQuadrant  http://www.2ndQuadrant.com



Re: MERGE Specification

From
Chris Browne
Date:
decibel@decibel.org (Decibel!) writes:

> On Apr 22, 2008, at 1:17 PM, Gregory Stark wrote:
>> "Simon Riggs" <simon@2ndquadrant.com> writes:
>>
>>> As I've said elsewhere, we could have it lock each row, its just more
>>> overhead if we do and not necessary at all for bulk data merging.
>>>
>>> I'll presume we want locking as an option, unless people say
>>> otherwise.
>>
>> It's not so simple. If you look for a row to merge into and don't
>> find one
>> there's no row to lock. What unique constraints do is hold the lock
>> on the
>> index page where the entry would have to be added.
>>
>> That's the trick that plpgsql cannot implement. That's why users
>> are forced to
>> loop and retry until they manage to do an update successfully or
>> insert
>> successfully.
>
> Yeah, hopefully there's a better way to do this other than row locks.
>
> But no matter how this is done, I think we need to handle the race
> conditions, and handle them by default. If people *really* know what
> they're doing, they can disable the row locking (perhaps one way to
> do this would be to grab an explicit lock on the table and have merge
> check for that...).

I agree that handling the race conditions by default is preferable.

Consider: An excellent reason to prefer MERGE is if it handles race
conditions that would otherwise require application code be more
carefully and cleverly written to avoid the race conditions.

If MERGE "solves it automatically," and eliminates hand-written code,
that's TWO benefits, that quite likely outweigh any performance costs.

I know we've had cases where race conditions [that a well-done MERGE
would probably solve easily] bit us badly, requiring considerable
development effort, and the addition of extra table columns and such.

There are some possibilities of "worst case" MERGE being "pretty
slow;" it's likely to be faster than the alternatives we used in its
absence ;-).
-- 
(reverse (concatenate 'string "ofni.secnanifxunil" "@" "enworbbc"))
http://www3.sympatico.ca/cbbrowne/wp.html
"I once went  to a shrink.  He  told me to speak freely.   I did.  The
damn fool tried to charge me $90 an hour."
-- jimjr@qis.net (Jim Moore Jr)


Re: MERGE Specification

From
Robert Treat
Date:
On Thursday 24 April 2008 12:19, Tom Lane wrote:
> Decibel! <decibel@decibel.org> writes:
> > That really strikes me as taking the "MySQL route". If push comes to
> > shove, I'll take a MERGE with race conditions over no merge at all,
> > but I think it's very important that it does the right thing. Just
> > because the spec doesn't say anything about it doesn't mean it's ok.
>
> Agreed.  It seems to me that in the last set of discussions, we rejected
> implementing MERGE precisely because it failed to provide a solution to
> the race-condition problem.  I'm not satisfied with a version that
> doesn't handle that, because I think that is *exactly* what most people
> will try to use it for.  The non-concurrent bulk update case that Simon
> is arguing for is the uncommon usage.
>

Perhaps a better option would be to implement Merge per spec, and then 
implement a "replace into" command for the oltp scenario.  This way you keep 
the spec behavior for the spec syntax, and have a clearly non-spec command 
for non-spec behavior. 

-- 
Robert Treat
Build A Brighter LAMP :: Linux Apache {middleware} PostgreSQL


Re: MERGE Specification

From
Tom Lane
Date:
Robert Treat <xzilla@users.sourceforge.net> writes:
> Perhaps a better option would be to implement Merge per spec, and then 
> implement a "replace into" command for the oltp scenario.  This way you keep 
> the spec behavior for the spec syntax, and have a clearly non-spec command 
> for non-spec behavior. 

In that case, it's a fair question to ask just who will use the "spec"
syntax.  As far as I can tell from years of watching the mailing lists,
there is plenty of demand for a concurrent-safe insert-or-update
behavior, and *exactly zero* demand for the other.  I challenge you to
find even one request for the "spec" behavior in the mailing list
archives.  (Simon doesn't count.)

I recently came across the expression "YAGNI", and think it's probably
pretty relevant to this discussion:
http://en.wikipedia.org/wiki/You_Ain't_Gonna_Need_It
        regards, tom lane


Re: MERGE Specification

From
Hannu Krosing
Date:
On Tue, 2008-04-22 at 00:24 +0100, Simon Riggs wrote:
> On Mon, 2008-04-21 at 16:38 -0400, A.M. wrote:
> 
> > "MERGE will not invoke Rules." Does this imply that MERGE cannot be  
> > used on views or that the resulting INSERTs or UPDATEs do not work on  
> > views?
> 
> Yes, that's right. Just like COPY. That seems fine to me because you're
> likely to be doing a MERGE immediately after a COPY anyway, so the
> restriction just continues.

May be the bulk data merging variant of MERGE to be used after initial
COPY should be a variant of COPY with special keyword(s) instead of
MERGE ?

------------
Hannu




Re: MERGE Specification

From
Simon Riggs
Date:
On Thu, 2008-04-24 at 23:40 -0400, Tom Lane wrote:
> Robert Treat <xzilla@users.sourceforge.net> writes:
> > Perhaps a better option would be to implement Merge per spec, and then 
> > implement a "replace into" command for the oltp scenario.  This way you keep 
> > the spec behavior for the spec syntax, and have a clearly non-spec command 
> > for non-spec behavior. 
> 
> In that case, it's a fair question to ask just who will use the "spec"
> syntax.  As far as I can tell from years of watching the mailing lists,
> there is plenty of demand for a concurrent-safe insert-or-update
> behavior, and *exactly zero* demand for the other.  I challenge you to
> find even one request for the "spec" behavior in the mailing list
> archives.  (Simon doesn't count.)

A Freudian slip? Hopefully, you meant "apart from Simon's request."  ;-)

> I recently came across the expression "YAGNI", and think it's probably
> pretty relevant to this discussion:
> http://en.wikipedia.org/wiki/You_Ain't_Gonna_Need_It

In matters of technical implementation, I follow you almost without
question, and very happily so.

I think all of us should be careful when expressing views on what other
people need or don't need. We sleep soundly after having given such an
opinion, but that doesn't make those opinions valid. I'm not sure if
there is a pithy acronym for that thought.

In this case, I had already agreed to do it the safe way, for OLTP. I
believe there is a need for other behaviour as well, but that isn't the
use case that the majority are expressing at this time.

--  Simon Riggs 2ndQuadrant  http://www.2ndQuadrant.com



Re: MERGE Specification

From
Martijn van Oosterhout
Date:
On Thu, Apr 24, 2008 at 11:40:22PM -0400, Tom Lane wrote:
> In that case, it's a fair question to ask just who will use the "spec"
> syntax.  As far as I can tell from years of watching the mailing lists,
> there is plenty of demand for a concurrent-safe insert-or-update
> behavior, and *exactly zero* demand for the other.  I challenge you to
> find even one request for the "spec" behavior in the mailing list
> archives.  (Simon doesn't count.)

I could have used something like this a few years ago. I don't think it
would get mentioned on the lists, because frankly it's not something I
would've expected a DBMS to handle internally. Certainly I'd never
heard of the MERGE command until recently. I just wrote a program to do
it (and no, race conditions wern't an issue).

Making a race condition free version is fine, just as long as merging
on a condition without a unique index is also supported.

Have a nice day,
--
Martijn van Oosterhout   <kleptog@svana.org>   http://svana.org/kleptog/
> Please line up in a tree and maintain the heap invariant while
> boarding. Thank you for flying nlogn airlines.

Re: MERGE Specification

From
Petr Jelinek
Date:
Tom Lane wrote:
> Robert Treat <xzilla@users.sourceforge.net> writes:
>> Perhaps a better option would be to implement Merge per spec, and then 
>> implement a "replace into" command for the oltp scenario.  This way you keep 
>> the spec behavior for the spec syntax, and have a clearly non-spec command 
>> for non-spec behavior. 
> 
> In that case, it's a fair question to ask just who will use the "spec"
> syntax.  As far as I can tell from years of watching the mailing lists,
> there is plenty of demand for a concurrent-safe insert-or-update
> behavior, and *exactly zero* demand for the other.  I challenge you to
> find even one request for the "spec" behavior in the mailing list
> archives.  (Simon doesn't count.)

While I agree that there is zero demand for anything else then "UPSERT" 
version of MERGE INTO, I don't think that doing the *exact opposite* of 
what SQL standard says without any change of syntax to clearly 
differentiate the behavior is best thing to do.

Another thing is, the table on which we do SELECT (the one in USING) can 
be different from target table and we can use columns from that table in 
INSERT/UPDATE statement (probably one the reasons why spec says the 
"SELECT" query has to be executed before any changes). How you want to 
use the "INSERT first" implementation in this scenario ? IMHO you still 
need to have both implementations in the end. So we probably need to 
implement the standard one first and then implement our version and put 
some restrictions of what can be in USING or INSERT part when using it.

Regards
Petr Jelinek
-- 
Regards
Petr Jelinek (PJMODOS)


Re: MERGE Specification

From
Simon Riggs
Date:
On Thu, 2008-04-24 at 20:28 +0100, Simon Riggs wrote:

> With MERGE, we would need to do it like this:
> 
> 1. If there are any WHEN NOT MATCHED clauses that trigger INSERTs, then
> attempt to apply them first, no matter what order they were in with
> respect to the WHEN MATCHED clauses. Start loop at step (3) every time.
> If there aren't any, start loop straight at step (5). Note that we would
> need to check to see that INSERTs had not been removed by Rules.

This would only be needed when there is at least one unique index.

MERGE will work, whether or not a unique index exists. If there is no
unique index then MERGE will silently ignore duplicate rows produced by
concurrent INSERTs, though that is no different from what can occur now.

No special action will be required to handle DELETEs, since attempting
to delete a row twice doesn't produce an error.

--  Simon Riggs 2ndQuadrant  http://www.2ndQuadrant.com



Re: MERGE Specification

From
Simon Riggs
Date:
On Fri, 2008-04-25 at 11:07 +0200, Petr Jelinek wrote:

> Another thing is, the table on which we do SELECT (the one in USING) can 
> be different from target table and we can use columns from that table in 
> INSERT/UPDATE statement (probably one the reasons why spec says the 
> "SELECT" query has to be executed before any changes). How you want to 
> use the "INSERT first" implementation in this scenario ? IMHO you still 
> need to have both implementations in the end. So we probably need to 
> implement the standard one first and then implement our version and put 
> some restrictions of what can be in USING or INSERT part when using it.

We do this:

1. Run using clause,

then for each row

2. NOT MATCHED rules
3. MATCHED

I'm now happy that we can get a spec-compliant end result by always
forcing NOT MATCHED rules to occur before MATCHED rules, when we have at
least one unique index.

--  Simon Riggs 2ndQuadrant  http://www.2ndQuadrant.com



Re: MERGE Specification

From
Decibel!
Date:
On Apr 25, 2008, at 3:28 AM, Simon Riggs wrote:
>> I recently came across the expression "YAGNI", and think it's  
>> probably
>> pretty relevant to this discussion:
>> http://en.wikipedia.org/wiki/You_Ain't_Gonna_Need_It
>
> In matters of technical implementation, I follow you almost without
> question, and very happily so.
>
> I think all of us should be careful when expressing views on what  
> other
> people need or don't need. We sleep soundly after having given such an
> opinion, but that doesn't make those opinions valid. I'm not sure if
> there is a pithy acronym for that thought.


Agreed. Many people on hackers don't actually deal with production  
systems, so it's easy to look at things from an academic standpoint  
and not a practical one. This is generally a Good Thing (I think  
MySQL is an example of what happens when you don't do that), but it  
does need to be balanced by real-world needs. And not all of those  
needs are always well represented on the lists.

In this case, I have bulk-load code that could certainly use MERGE.  
It's not that hard to write code that will handle this in a way  
that's not safe from race conditions, so it's unlikely that we'll see  
that many requests, but that doesn't mean a fast MERGE wouldn't be  
useful. It certainly would have saved me some effort, and it would  
probably out-perform the current code.
-- 
Decibel!, aka Jim C. Nasby, Database Architect  decibel@decibel.org
Give your computer some brain candy! www.distributed.net Team #1828



Re: MERGE Specification

From
Simon Riggs
Date:
On Fri, 2008-04-25 at 10:03 +0300, Hannu Krosing wrote:
> On Tue, 2008-04-22 at 00:24 +0100, Simon Riggs wrote:
> > On Mon, 2008-04-21 at 16:38 -0400, A.M. wrote:
> > 
> > > "MERGE will not invoke Rules." Does this imply that MERGE cannot be  
> > > used on views or that the resulting INSERTs or UPDATEs do not work on  
> > > views?
> > 
> > Yes, that's right. Just like COPY. That seems fine to me because you're
> > likely to be doing a MERGE immediately after a COPY anyway, so the
> > restriction just continues.
> 
> May be the bulk data merging variant of MERGE to be used after initial
> COPY should be a variant of COPY with special keyword(s) instead of
> MERGE ?

That does sound like a good way of differentiating between the OLTP and
bulk loading cases.

I'll bear that in mind as we develop.

--  Simon Riggs 2ndQuadrant  http://www.2ndQuadrant.com



Re: MERGE Specification

From
Andrew Dunstan
Date:

Robert Treat wrote:
>
> Perhaps a better option would be to implement Merge per spec, and then 
> implement a "replace into" command for the oltp scenario.  This way you keep 
> the spec behavior for the spec syntax, and have a clearly non-spec command 
> for non-spec behavior. 
>   

MySQL's "REPLACE INTO" is *NOT* semantically equivalent to any flavor of 
"insert or update". It is "delete plus insert". They do have "INSERT ... 
ON DUPLICATE KEY UPDATE ..."


Presumably, if we implement MERGE with transaction-safe semantics, which 
Simon has agreed to do, we would not need to consider anything like the 
latter, but we might still want to consider REPLACE INTO (in the MySQL 
sense).

cheers

andrew


Re: MERGE Specification

From
Alvaro Herrera
Date:
Simon Riggs wrote:

> I'm now happy that we can get a spec-compliant end result by always
> forcing NOT MATCHED rules to occur before MATCHED rules, when we have at
> least one unique index.

... and raise an ERROR when there is no unique index?

-- 
Alvaro Herrera                                http://www.CommandPrompt.com/
PostgreSQL Replication, Consulting, Custom Development, 24x7 support


Re: MERGE Specification

From
Robert Treat
Date:
On Thursday 24 April 2008 23:40, Tom Lane wrote:
> Robert Treat <xzilla@users.sourceforge.net> writes:
> > Perhaps a better option would be to implement Merge per spec, and then
> > implement a "replace into" command for the oltp scenario.  This way you
> > keep the spec behavior for the spec syntax, and have a clearly non-spec
> > command for non-spec behavior.
>
> In that case, it's a fair question to ask just who will use the "spec"
> syntax.  As far as I can tell from years of watching the mailing lists,
> there is plenty of demand for a concurrent-safe insert-or-update
> behavior, and *exactly zero* demand for the other.  I challenge you to
> find even one request for the "spec" behavior in the mailing list
> archives.  (Simon doesn't count.)
>

AIUI the current implementation is designed to avoid race conditions partially 
at the cost of being insert friendly and somewhat update unfriendly. My guess 
is that most of the people wanting this for OLTP use want an update friendly 
implementation (that's certainly been the majority of cases I've needed 
myself, and that I have seen others use). 

-- 
Robert Treat
Build A Brighter LAMP :: Linux Apache {middleware} PostgreSQL


Re: MERGE Specification

From
Hannes Dorbath
Date:
Simon Riggs wrote:
> On Fri, 2008-04-25 at 10:03 +0300, Hannu Krosing wrote:
>> On Tue, 2008-04-22 at 00:24 +0100, Simon Riggs wrote:
>>> On Mon, 2008-04-21 at 16:38 -0400, A.M. wrote:
>>>
>>>> "MERGE will not invoke Rules." Does this imply that MERGE cannot be  
>>>> used on views or that the resulting INSERTs or UPDATEs do not work on  
>>>> views?
>>> Yes, that's right. Just like COPY. That seems fine to me because you're
>>> likely to be doing a MERGE immediately after a COPY anyway, so the
>>> restriction just continues.
>> May be the bulk data merging variant of MERGE to be used after initial
>> COPY should be a variant of COPY with special keyword(s) instead of
>> MERGE ?
> 
> That does sound like a good way of differentiating between the OLTP and
> bulk loading cases.
> 
> I'll bear that in mind as we develop.

To me, a simple user, it'd be important that MERGE implementation does 
not place any unpredictable restrictions. For example in Oracle you can 
break any MERGE statement by placing a full text index on the table. So 
I'd really expect a MERGE in PostgreSQL to be fine with views, rules, 
tsearch, etc.

Just my 2 cent.


-- 
Best regards,
Hannes Dorbath


Re: MERGE Specification

From
Simon Riggs
Date:
On Fri, 2008-04-25 at 09:10 -0400, Alvaro Herrera wrote:
> Simon Riggs wrote:
> 
> > I'm now happy that we can get a spec-compliant end result by always
> > forcing NOT MATCHED rules to occur before MATCHED rules, when we have at
> > least one unique index.
> 
> ... and raise an ERROR when there is no unique index?

No, I think an ERROR is not required, nor desirable.

In the absence of a unique index we allow exactly duplicate rows to
exist in a table. This is effectively user defined behaviour, albeit the
default setting.

We have two choices of behaviour:

1. If a MERGE statement runs and sees a row in the target table is NOT
MATCHED then it will insert a row. It is possible that a concurrent
MERGE statement could also see the row in the target table as NOT
MATCHED and then insert a duplicate row.

2. In the absence of a Unique Index, throw an ERROR because a concurrent
MERGE *might* result in duplicate Inserts. (i.e. prevent the above).

(1) is a situation possible with concurrent INSERTs into a table without
a unique index, so I see no reason to make MERGE follow (2) when INSERTs
do not.

Also, it is possible for a MERGE to generate duplicate rows in a table
if the INSERT clause contained constants for example. In the absence of
an applicable rule the MERGE will generate INSERT DEFAULT VALUES, i.e.
an all-constant insert will take place. So the MERGE spec allows the
inserting of duplicate rows without error.

We could include additional options to control this behaviour, if anyone
thinks it worthwhile, but ISTM more restrictive than protective.

--  Simon Riggs 2ndQuadrant  http://www.2ndQuadrant.com



Re: MERGE Specification

From
"Marko Kreen"
Date:
On 4/25/08, Robert Treat <xzilla@users.sourceforge.net> wrote:
> On Thursday 24 April 2008 23:40, Tom Lane wrote:
>  > Robert Treat <xzilla@users.sourceforge.net> writes:
>  > > Perhaps a better option would be to implement Merge per spec, and then
>  > > implement a "replace into" command for the oltp scenario.  This way you
>  > > keep the spec behavior for the spec syntax, and have a clearly non-spec
>  > > command for non-spec behavior.
>  >
>  > In that case, it's a fair question to ask just who will use the "spec"
>  > syntax.  As far as I can tell from years of watching the mailing lists,
>  > there is plenty of demand for a concurrent-safe insert-or-update
>  > behavior, and *exactly zero* demand for the other.  I challenge you to
>  > find even one request for the "spec" behavior in the mailing list
>  > archives.  (Simon doesn't count.)
>  >
>
>
> AIUI the current implementation is designed to avoid race conditions partially
>  at the cost of being insert friendly and somewhat update unfriendly. My guess
>  is that most of the people wanting this for OLTP use want an update friendly
>  implementation (that's certainly been the majority of cases I've needed
>  myself, and that I have seen others use).

This seems to hint that there should be 2 variants of merge/upsert
- insert-friendly and update-friendly...  It seems unlikely one implementation
can be both.  And especially bad would be implementation that is neither.

-- 
marko


Re: MERGE Specification

From
Simon Riggs
Date:
On Mon, 2008-04-28 at 11:57 +0300, Marko Kreen wrote:
> On 4/25/08, Robert Treat <xzilla@users.sourceforge.net> wrote:
> > On Thursday 24 April 2008 23:40, Tom Lane wrote:
> >  > Robert Treat <xzilla@users.sourceforge.net> writes:
> >  > > Perhaps a better option would be to implement Merge per spec, and then
> >  > > implement a "replace into" command for the oltp scenario.  This way you
> >  > > keep the spec behavior for the spec syntax, and have a clearly non-spec
> >  > > command for non-spec behavior.
> >  >
> >  > In that case, it's a fair question to ask just who will use the "spec"
> >  > syntax.  As far as I can tell from years of watching the mailing lists,
> >  > there is plenty of demand for a concurrent-safe insert-or-update
> >  > behavior, and *exactly zero* demand for the other.  I challenge you to
> >  > find even one request for the "spec" behavior in the mailing list
> >  > archives.  (Simon doesn't count.)
> >  >
> >
> >
> > AIUI the current implementation is designed to avoid race conditions partially
> >  at the cost of being insert friendly and somewhat update unfriendly. My guess
> >  is that most of the people wanting this for OLTP use want an update friendly
> >  implementation (that's certainly been the majority of cases I've needed
> >  myself, and that I have seen others use).
> 
> This seems to hint that there should be 2 variants of merge/upsert
> - insert-friendly and update-friendly...  It seems unlikely one implementation
> can be both.  And especially bad would be implementation that is neither.

Not sure what an option that was "neither" would look like ...

I would summarise the two MERGE behaviour proposals as

1. Correctly protects against concurrent inserts. Uses one
sub-transaction per row and leaves 2 dead rows per update. Requires us
to perform tasks in different order than required by SQL spec, but the
end result seems identical to me (now).
Has been noted as suitable for OLTP, and poor for bulk data maintenance.
Has been described as "insert-friendly" and "non-spec".

2. Does not protect against concurrent inserts. Leaves 1 dead row per
update. Much more efficient for updates, not sure about any efficiency
gain for inserts.
Has been noted as being unsuitable for OLTP, though likely to offer more
acceptable performance for bulk operations.
Has been described as "update-friendly".

By consensus, I'm doing (1). 

It looks likely that doing (2) should be fairly small change and so can
be offered as an option. For example, we can have an additional
action-order clause with two options (the first of which is default)
[INSERT BEFORE UPDATE ACTION ORDER | DEFAULT ACTION ORDER]
So the default is to force inserts to occur before updates, as required
by (1). The other option "DEFAULT ACTION ORDER" tests the WHEN clauses
in the order specified in the statement, allowing the user to choose
whether they want to test for updates or inserts first.

Overall, the difference between these behaviours is small in comparison
with making MERGE work in the first place...

--  Simon Riggs 2ndQuadrant  http://www.2ndQuadrant.com



Re: MERGE Specification

From
Simon Riggs
Date:
On Mon, 2008-04-21 at 21:08 +0100, Simon Riggs wrote:
> The following two files specify the behaviour of the MERGE statement and
> how it will work in the world of PostgreSQL.

> The HTML file was generated from SGML source, though the latter is not
> included here for clarity.

Enclose merge.sgml docs for forthcoming MERGE command, as originally
written.

--
 Simon Riggs           www.2ndQuadrant.com
 PostgreSQL Development, 24x7 Support, Training and Services

Attachment

Re: MERGE Specification

From
Heikki Linnakangas
Date:
On 05/08/10 10:46, Simon Riggs wrote:
> On Mon, 2008-04-21 at 21:08 +0100, Simon Riggs wrote:
>> The following two files specify the behaviour of the MERGE statement and
>> how it will work in the world of PostgreSQL.
>
>> The HTML file was generated from SGML source, though the latter is not
>> included here for clarity.
>
> Enclose merge.sgml docs for forthcoming MERGE command, as originally
> written.

Oh, cool, I wasn't aware you had written that already. Boxuan, please 
include this in your patch, after reviewing and removing/editing 
anything that doesn't apply to your patch.

--   Heikki Linnakangas  EnterpriseDB   http://www.enterprisedb.com


Re: MERGE Specification

From
Simon Riggs
Date:
On Thu, 2010-08-05 at 12:29 +0300, Heikki Linnakangas wrote:
> On 05/08/10 10:46, Simon Riggs wrote:
> > On Mon, 2008-04-21 at 21:08 +0100, Simon Riggs wrote:
> >> The following two files specify the behaviour of the MERGE statement and
> >> how it will work in the world of PostgreSQL.
> >
> >> The HTML file was generated from SGML source, though the latter is not
> >> included here for clarity.
> >
> > Enclose merge.sgml docs for forthcoming MERGE command, as originally
> > written.
>
> Oh, cool, I wasn't aware you had written that already. Boxuan, please
> include this in your patch, after reviewing and removing/editing
> anything that doesn't apply to your patch.

Also had these fragments as well, if they're still useful. Probably just
useful as pointers as to what else to change to include the docs.


The tests and docs were written from SQL standard, so any deviations
would need to be flagged. The idea of writing the tests first was that
they provide an objective test of whether the implementation works
according to spec.

I'd quite like a commentary on anything that needs changing. Not saying
I will necessarily object to differences, but knowing the differences
sounds important for us.

--
 Simon Riggs           www.2ndQuadrant.com
 PostgreSQL Development, 24x7 Support, Training and Services

Attachment

Re: MERGE Specification

From
Boxuan Zhai
Date:


On Thu, Aug 5, 2010 at 7:25 PM, Simon Riggs <simon@2ndquadrant.com> wrote:
On Thu, 2010-08-05 at 12:29 +0300, Heikki Linnakangas wrote:
> On 05/08/10 10:46, Simon Riggs wrote:
> > On Mon, 2008-04-21 at 21:08 +0100, Simon Riggs wrote:
> >> The following two files specify the behaviour of the MERGE statement and
> >> how it will work in the world of PostgreSQL.
> >
> >> The HTML file was generated from SGML source, though the latter is not
> >> included here for clarity.
> >
> > Enclose merge.sgml docs for forthcoming MERGE command, as originally
> > written.
>
> Oh, cool, I wasn't aware you had written that already. Boxuan, please
> include this in your patch, after reviewing and removing/editing
> anything that doesn't apply to your patch.

 
Thanks a lot for the instruction file of MERGE command. I have read through it carefully. It is really a great work. I have to admit that I am not familiar with the sgml language, and I cannot write the instruction by myself. 
 
All features of MERGE demonstrated in this file are consistent with my implementation, EXCEPT the DO NOTHING option. In current edition, we don't have the DO NOTHING action type. That is, during the execution of MERGE commands, if one tuple is not caught by any of the merge actions, it will be ignored. In another word, DO NOTING (although cannot be specified explicitly by user) is the DEFAULT action for tuples.
 
In the contrary, Simon's instruction says that the DEFAULT action for the tuple caught by no actions is
WHEN NOT MATCHED THEN INSERT DEFAULT VALUES
 
From the user's point of view, these two kinds of MERGE command may have not much differences. But, as the coder, I prefer current setting, because we can save the implementation for a new type of MERGE actions (DO NOTHING is a special merge action type). And, thus, no checks and special process for it. (For example, we need to make sure that DO NOTHING is the last WHEN clause, and it has no additional qual. And we have to generate a INSERT DEFAULT VALUES action for the MERGE command if we don't find the DO NOTHING action)
 
Well, if people want the DO NOTHING action, I will add it in the system.
 
 
Now, I have changed the RULE strategy of MERGE to the better logic. And I am working on triggers for MERGE, which is also mentioned in the instruction file. I will build a new patch with no long comment and blank line around functions, and possibly contain the regress test file and this sgml instructions in it.
 
I wish we can reach a agreement on the DO NOTHING thing before my next submission, so I can make necessary modification on my code for it. (the new patch may be finished in one or two days, I think)
 
Thanks!
 
PS: I have an embarrassing question: how to view the sgml instructions of postgres in web page form, rather than read the source code of them?
 
Also had these fragments as well, if they're still useful. Probably just
useful as pointers as to what else to change to include the docs.


The tests and docs were written from SQL standard, so any deviations
would need to be flagged. The idea of writing the tests first was that
they provide an objective test of whether the implementation works
according to spec.

I'd quite like a commentary on anything that needs changing. Not saying
I will necessarily object to differences, but knowing the differences
sounds important for us.

--
 Simon Riggs           www.2ndQuadrant.com
 PostgreSQL Development, 24x7 Support, Training and Services

Re: MERGE Specification

From
David Fetter
Date:
On Thu, Aug 05, 2010 at 09:55:29PM +0800, Boxuan Zhai wrote:
> On Thu, Aug 5, 2010 at 7:25 PM, Simon Riggs <simon@2ndquadrant.com> wrote:
> 
> > On Thu, 2010-08-05 at 12:29 +0300, Heikki Linnakangas wrote:
> > > On 05/08/10 10:46, Simon Riggs wrote:
> > > > On Mon, 2008-04-21 at 21:08 +0100, Simon Riggs wrote:
> > > >> The following two files specify the behaviour of the MERGE statement
> > and
> > > >> how it will work in the world of PostgreSQL.
> > > >
> > > >> The HTML file was generated from SGML source, though the latter is not
> > > >> included here for clarity.
> > > >
> > > > Enclose merge.sgml docs for forthcoming MERGE command, as originally
> > > > written.
> > >
> > > Oh, cool, I wasn't aware you had written that already. Boxuan, please
> > > include this in your patch, after reviewing and removing/editing
> > > anything that doesn't apply to your patch.
> >
> Thanks a lot for the instruction file of MERGE command. I have read through
> it carefully. It is really a great work. I have to admit that I am not
> familiar with the sgml language, and I cannot write the instruction by
> myself.

It's really not super complicated.  It's quite like (and ancestral to)
HTML.

> All features of MERGE demonstrated in this file are consistent with my
> implementation, EXCEPT the DO NOTHING option. In current edition, we don't
> have the DO NOTHING action type. That is, during the execution of MERGE
> commands, if one tuple is not caught by any of the merge actions, it will be
> ignored. In another word, DO NOTING (although cannot be specified explicitly
> by user) is the DEFAULT action for tuples.
> 
> In the contrary, Simon's instruction says that the DEFAULT action for the
> tuple caught by no actions is
> WHEN NOT MATCHED THEN INSERT DEFAULT VALUES

I believe that the SQL standard specifies this behavior, and I don't
think we have a compelling reason to do something different from what
the SQL standard specifies.

> Well, if people want the DO NOTHING action, I will add it in the system.

That'd be great :)

>  Now, I have changed the RULE strategy of MERGE to the better logic. And I
> am working on triggers for MERGE, which is also mentioned in the instruction
> file. I will build a new patch with no long comment and blank line around
> functions, and possibly contain the regress test file and this sgml
> instructions in it.
> 
> I wish we can reach a agreement on the DO NOTHING thing before my next
> submission, so I can make necessary modification on my code for it. (the new
> patch may be finished in one or two days, I think)
> 
> Thanks!
> 
> PS: I have an embarrassing question: how to view the sgml instructions of
> postgres in web page form, rather than read the source code of them?

After you've built postgresql, do this:

cd doc/src/sgml
make

Then you can point a web browser at the doc/src/sgml/html/index.html
(and similar)

http://www.postgresql.org/docs/current/static/docguide.html

has information about the tools you will need for the above to work.

Cheers,
David.
-- 
David Fetter <david@fetter.org> http://fetter.org/
Phone: +1 415 235 3778  AIM: dfetter666  Yahoo!: dfetter
Skype: davidfetter      XMPP: david.fetter@gmail.com
iCal: webcal://www.tripit.com/feed/ical/people/david74/tripit.ics

Remember to vote!
Consider donating to Postgres: http://www.postgresql.org/about/donate


Re: MERGE Specification

From
Simon Riggs
Date:
On Thu, 2010-08-05 at 21:55 +0800, Boxuan Zhai wrote:

> In the contrary, Simon's instruction says that the DEFAULT action for
> the tuple caught by no actions is 
> WHEN NOT MATCHED THEN INSERT DEFAULT VALUES
>  
> From the user's point of view, these two kinds of MERGE command may
> have not much differences. But, as the coder, I prefer current
> setting, because we can save the implementation for a new type
> of MERGE actions (DO NOTHING is a special merge action type). And,
> thus, no checks and special process for it. (For example, we need to
> make sure that DO NOTHING is the last WHEN clause, and it has no
> additional qual. And we have to generate a INSERT DEFAULT VALUES
> action for the MERGE command if we don't find the DO NOTHING action)
>  
> Well, if people want the DO NOTHING action, I will add it in the
> system. 

This is only important when using AND <search condition>, so its not
important for the common UPSERT case of unconditional UPDATE/INSERT.

Personally, I would prefer the default action to be RAISE ERROR or
similar. Otherwise its just too easy to get complex logic wrong and lose
a few rows without noticing. If that was the case then you would
definitely need DO NOTHING when you explicitly wanted to lose a few
rows.

You may think that's a bit strong, but consider that PostgreSQL uses
default => ERROR in vast majority of switch() statements. I think its a
safe coding practice and the annoyance of having run-time errors is much
better than losing rows.
The INSERT DEFAULT VALUES was behaviour taken from another DBMS, its not
part of the standard AFAICS.

> Now, I have changed the RULE strategy of MERGE to the better logic.
> And I am working on triggers for MERGE, which is also mentioned in the
> instruction file. I will build a new patch with no long comment and
> blank line around functions, and possibly contain the regress test
> file and this sgml instructions in it.
>  
> I wish we can reach a agreement on the DO NOTHING thing before my next
> submission, so I can make necessary modification on my code for
> it. (the new patch may be finished in one or two days, I think)
>  
> Thanks!
>  
> PS: I have an embarrassing question: how to view the sgml instructions
> of postgres in web page form, rather than read the source code of
> them?

If you edit the files, as shown in the patches here, then you just need
to drop into the doc/sgml/src directory and type "make". The SGML will
then be compiled into HTML and you can view the resulting file directly
in your web browser.

-- Simon Riggs           www.2ndQuadrant.comPostgreSQL Development, 24x7 Support, Training and Services



Re: MERGE Specification

From
Heikki Linnakangas
Date:
On 05/08/10 17:22, Simon Riggs wrote:
> On Thu, 2010-08-05 at 21:55 +0800, Boxuan Zhai wrote:
>
>> In the contrary, Simon's instruction says that the DEFAULT action for
>> the tuple caught by no actions is
>> WHEN NOT MATCHED THEN INSERT DEFAULT VALUES
>>
>>  From the user's point of view, these two kinds of MERGE command may
>> have not much differences. But, as the coder, I prefer current
>> setting, because we can save the implementation for a new type
>> of MERGE actions (DO NOTHING is a special merge action type). And,
>> thus, no checks and special process for it. (For example, we need to
>> make sure that DO NOTHING is the last WHEN clause, and it has no
>> additional qual. And we have to generate a INSERT DEFAULT VALUES
>> action for the MERGE command if we don't find the DO NOTHING action)
>>
>> Well, if people want the DO NOTHING action, I will add it in the
>> system.
>
> This is only important when using AND<search condition>, so its not
> important for the common UPSERT case of unconditional UPDATE/INSERT.

Assuming the default action if no other action matches is to do nothing, 
then an explicit DO NOTHING is just a convenience. You can have the same 
effect by having an "AND NOT <condition>" to all the actions following 
the DO NOTHING action. I admit it's quite handy, but let's avoid 
PostgreSQL extensions at this point.

> Personally, I would prefer the default action to be RAISE ERROR or
> similar. Otherwise its just too easy to get complex logic wrong and lose
> a few rows without noticing. If that was the case then you would
> definitely need DO NOTHING when you explicitly wanted to lose a few
> rows.
>
> You may think that's a bit strong, but consider that PostgreSQL uses
> default =>  ERROR in vast majority of switch() statements. I think its a
> safe coding practice and the annoyance of having run-time errors is much
> better than losing rows.
>
> The INSERT DEFAULT VALUES was behaviour taken from another DBMS, its not
> part of the standard AFAICS.

What does the standard say about this? We should follow the standard, I 
don't see enough reason to deviate here.

--   Heikki Linnakangas  EnterpriseDB   http://www.enterprisedb.com


Re: MERGE Specification

From
Simon Riggs
Date:
On Thu, 2010-08-05 at 18:17 +0300, Heikki Linnakangas wrote:
> On 05/08/10 17:22, Simon Riggs wrote:
> > On Thu, 2010-08-05 at 21:55 +0800, Boxuan Zhai wrote:
> >
> >> In the contrary, Simon's instruction says that the DEFAULT action for
> >> the tuple caught by no actions is
> >> WHEN NOT MATCHED THEN INSERT DEFAULT VALUES
> >>
> >>  From the user's point of view, these two kinds of MERGE command may
> >> have not much differences. But, as the coder, I prefer current
> >> setting, because we can save the implementation for a new type
> >> of MERGE actions (DO NOTHING is a special merge action type). And,
> >> thus, no checks and special process for it. (For example, we need to
> >> make sure that DO NOTHING is the last WHEN clause, and it has no
> >> additional qual. And we have to generate a INSERT DEFAULT VALUES
> >> action for the MERGE command if we don't find the DO NOTHING action)
> >>
> >> Well, if people want the DO NOTHING action, I will add it in the
> >> system.
> >
> > This is only important when using AND<search condition>, so its not
> > important for the common UPSERT case of unconditional UPDATE/INSERT.
> 
> Assuming the default action if no other action matches is to do nothing, 
> then an explicit DO NOTHING is just a convenience. You can have the same 
> effect by having an "AND NOT <condition>" to all the actions following 
> the DO NOTHING action. I admit it's quite handy, but let's avoid 
> PostgreSQL extensions at this point.

err...

* DELETE is an extension to the standard, though supported by Oracle,
DB2 and SQLServer and damn useful

* INSERT DEFAULT VALUES is an extension to the standard, though matches
options on the normal INSERT clause

* rule support is an extension to the standard

* It appears we would be in violation of the standard on 
14.12 General Rule 6 a) i) 2) B), p.890
(Oh, I wish I was joking, there really is such a paragraph number)
which specifies that the join between source and target table must not
return multiple rows or must return "cardinality violation". That's
pretty difficult thing to check and not very useful if it does do that.

anyway, that list isn't an argument in favour of change. The argument in
favour of a fail-safe default is that it is a safe coding practice that
the PostgreSQL project already uses itself. The only way to write a safe
MERGE SQL statement is with an extension to the standard...

Principle of minimal extension would mean we only need to support RAISE
ERROR, to allow people to specify they actively want statement to fail
if the list of WHEN clauses does not produce a match.

> > Personally, I would prefer the default action to be RAISE ERROR or
> > similar. Otherwise its just too easy to get complex logic wrong and lose
> > a few rows without noticing. If that was the case then you would
> > definitely need DO NOTHING when you explicitly wanted to lose a few
> > rows.
> >
> > You may think that's a bit strong, but consider that PostgreSQL uses
> > default =>  ERROR in vast majority of switch() statements. I think its a
> > safe coding practice and the annoyance of having run-time errors is much
> > better than losing rows.
> >
> > The INSERT DEFAULT VALUES was behaviour taken from another DBMS, its not
> > part of the standard AFAICS.
> 
> What does the standard say about this? We should follow the standard, I 
> don't see enough reason to deviate here.

I checked the standard before commenting previously and have done so
again here. I can't see anything that refers to this (in SQL:2008),
either way.

-- Simon Riggs           www.2ndQuadrant.comPostgreSQL Development, 24x7 Support, Training and Services



Re: MERGE Specification

From
Robert Haas
Date:
On Thu, Aug 5, 2010 at 11:35 AM, Simon Riggs <simon@2ndquadrant.com> wrote:
> * It appears we would be in violation of the standard on
> 14.12 General Rule 6 a) i) 2) B), p.890
> (Oh, I wish I was joking, there really is such a paragraph number)

Just shoot me.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise Postgres Company


Re: MERGE Specification

From
Merlin Moncure
Date:
On Thu, Aug 5, 2010 at 7:25 AM, Simon Riggs <simon@2ndquadrant.com> wrote:
> Also had these fragments as well, if they're still useful. Probably just
> useful as pointers as to what else to change to include the docs.
>
>
> The tests and docs were written from SQL standard, so any deviations
> would need to be flagged. The idea of writing the tests first was that
> they provide an objective test of whether the implementation works
> according to spec.
>
> I'd quite like a commentary on anything that needs changing. Not saying
> I will necessarily object to differences, but knowing the differences
> sounds important for us.

I think this is a wonderful feature.  A couple of thoughts:

*) Would however very much like to see RETURNING support if it's not
there.  Our  other DML statements support it, and this one should too.wCTE if/when we get it will make the lack of it
especiallyglaring.
 
(OTOH, no issue if there is no rule support...they should be
deprecated)

*) The decision to stay on the standard and not do a 'race free'
version was IMO a good one.  I am starting to come around to the point
of view that the *only* safe way to guarantee race free merge with the
current locking model is to take an appropriate table lock.  BTW, our
pl/pgsql upsert example we've been encouraging people to use has a
horrible bug (see:
http://postgresql.1045698.n5.nabble.com/Danger-of-idiomatic-plpgsql-loop-for-merging-data-td2257700.html).

If we want to rework the locking model to support anticipatory locks
then fine (but that has nothing to do with MERGE specifically).

merlin


Re: MERGE Specification

From
Boxuan Zhai
Date:
Dear All,
 
I have seen a lively discussion about the DO NOTING action in MERGE command. And, I think most people want it. So it will be added to my next patch.
 
Before the implementation, I still have some questions to confirm:
 
1. If we have a DO NOTHING action specified, it should be the last WHEN clause. It must be of the NOT MATCHED cases, and it CANNOT have any additional action qualifications. Am I correct?
 
2. If no DO NOTHING specified, we will imply a INSERT DEFAULT VALUES action as the end of MERGE.
My question is, is this action taken only for the NOT MATCHED tuples?  If this is the case, then what about the MATCHED tuples that match not previous actions? Ignore them?
That means we are in fact going to add two implicit WHEN clause:
               a) WHEN NOT MATCHED INSERT default values;
               b) WHEN MATCHED THEN DO NOTHING.
OR, is the INSERT DEFAULT VALUES applied to ALL tuples not matter they are MATCHED or not?
 
 
Besides, (I mean no offense, but) can this method really avoid losing row?
 
So far as I know, the DEFAULT values for table attributes are defined when the table is created and no variables are allowed in the default value expressions. That means, they are usually constants or simple serial numbers.  
 
Image that we have a MERGE command that has thousands of NOT MATCHED tuples going to the implicit action. Then, the target table will inserted with thousands of rows with DEAULT VALUES. These row will have similar (if not exactly the same) simple content, which contains NO information from the source table of MERGE. Is this really what we want? If it is not, then what is the use of the INSERT DEFAULT VALUES action?
 
Regards 

Re: MERGE Specification

From
Heikki Linnakangas
Date:
On 06/08/10 04:39, Boxuan Zhai wrote:
> I have seen a lively discussion about the DO NOTING action in MERGE command.
> And, I think most people want it. So it will be added to my next patch.
>
> Before the implementation, I still have some questions to confirm:
>
> 1. If we have a DO NOTHING action specified, it should be the last WHEN
> clause. It must be of the NOT MATCHED cases, and it CANNOT have any
> additional action qualifications. Am I correct?

It would be useful to specify it in WHEN MATCHED sometimes, and not 
necessarily the last. For example:

MERGE INTO Stock S
USING DailySales DS ON S.Item = DS.Item
WHEN MATCHED AND (QtyOnHand ‐ QtySold = 0) THEN DELETE
WHEN MATCHED THEN UPDATE SET QtyOnHand = QtyOnHand ‐ QtySold -- Don't add new inactive items to stock if not there
already
WHEN MATCHED AND (itemtype = 'inactive') THEN DO NOTHING
WHEN NOT MATCHED THEN INSERT VALUES (Item, QtySold);

It shouldn't be difficult to support DO NOTHING in all cases, right?

> 2. If no DO NOTHING specified, we will imply a INSERT DEFAULT VALUES action
> as the end of MERGE.
> My question is, is this action taken only for the NOT MATCHED tuples?  If
> this is the case, then what about the MATCHED tuples that match not previous
> actions? Ignore them?
> That means we are in fact going to add two implicit WHEN clause:
>                 a) WHEN NOT MATCHED INSERT default values;
>                 b) WHEN MATCHED THEN DO NOTHING.
> OR, is the INSERT DEFAULT VALUES applied to ALL tuples not matter they are
> MATCHED or not?

We'll need to figure out what the SQL standard says about this. I tried 
reading the spec but couldn't readily understand what the default action 
should be. Does someone else know that? What do other DBMSs do?

--   Heikki Linnakangas  EnterpriseDB   http://www.enterprisedb.com


Re: MERGE Specification

From
Simon Riggs
Date:
On Fri, 2010-08-06 at 09:39 +0800, Boxuan Zhai wrote:
> Besides, (I mean no offense, but) can this method really avoid losing
> row? 

Not as you just specified, no.

You need *both* actions of RAISE ERROR and DO NOTHING, or you may as
well have neither.

(1) Natural style allows missing rows if you are not careful - and also
allows missing rows in future when COL is allowed to take value 'C',
which may not have been originally considered when SQL first written

WHEN NOT MATCHED AND COL = 'A' INSERT...
WHEN NOT MATCHED AND COL = 'B' INSERT...

(2) Shows code style required to explicitly avoid missing rows

WHEN NOT MATCHED AND COL = 'A' INSERT...
WHEN NOT MATCHED AND COL = 'B' INSERT...
WHEN NOT MATCHED RAISE ERROR

(3) More complex example, with explicit DO NOTHING, showing how it can
provide well structured code

WHEN NOT MATCHED AND COL = 'A' DO NOTHING
WHEN NOT MATCHED AND COL = 'B' INSERT...
WHEN NOT MATCHED RAISE ERROR


So DO NOTHING is the default and implies silently ignoring rows. RAISE
ERROR is the opposite.

Coding for those seems very easy, its just a question of "should we do
it?". DB2 has it; SQL:2008 does not. But then SQL:2008 followed the DB2
introduction of AND clauses, and SQL:2011 has so far followed the DB2
introduction of DELETE action also.

Given that Peter is now attending SQL Standards meetings, I would
suggest we leave out my suggestion above, for now. We have time to raise
this at standards meetings and influence the outcome and then follow
later.

There is a workaround:

WHEN NOT MATCHED AND COL = 'A' DO NOTHING
WHEN NOT MATCHED AND COL = 'B' INSERT...
WHEN NOT MATCHED AND TRUE INSERT INTO ERROR_TABLE (errortext);

where ERROR_TABLE has an INSERT trigger which throws an ERROR with given
text.

SQL:2011 makes no mention of how MERGE should react to statement level
triggers. MERGE is not a trigger action even. Given considerable
confusion in this area, IMHO we should just say the MERGE does not call
statement triggers at all, of any kind.

-- Simon Riggs           www.2ndQuadrant.comPostgreSQL Development, 24x7 Support, Training and Services



Re: MERGE Specification

From
Heikki Linnakangas
Date:
On 06/08/10 10:12, Simon Riggs wrote:
> So DO NOTHING is the default and implies silently ignoring rows. RAISE
> ERROR is the opposite.
>
> Coding for those seems very easy, its just a question of "should we do
> it?". DB2 has it; SQL:2008 does not. But then SQL:2008 followed the DB2
> introduction of AND clauses, and SQL:2011 has so far followed the DB2
> introduction of DELETE action also.

I see neither DO NOTHING or RAISE ERROR in the documentation of DB2, 
Oracle, or MSSQL server.

> Given that Peter is now attending SQL Standards meetings, I would
> suggest we leave out my suggestion above, for now. We have time to raise
> this at standards meetings and influence the outcome and then follow
> later.

Ok, fair enough.

> SQL:2011 makes no mention of how MERGE should react to statement level
> triggers. MERGE is not a trigger action even. Given considerable
> confusion in this area, IMHO we should just say the MERGE does not call
> statement triggers at all, of any kind.

IMO the UPDATE/DELETE/INSERT actions should fire the respective 
statement level triggers, but the MERGE itself should not.

--   Heikki Linnakangas  EnterpriseDB   http://www.enterprisedb.com


Re: MERGE Specification

From
Simon Riggs
Date:
On Fri, 2010-08-06 at 10:28 +0300, Heikki Linnakangas wrote:

> > SQL:2011 makes no mention of how MERGE should react to statement level
> > triggers. MERGE is not a trigger action even. Given considerable
> > confusion in this area, IMHO we should just say the MERGE does not call
> > statement triggers at all, of any kind.
> 
> IMO the UPDATE/DELETE/INSERT actions should fire the respective 
> statement level triggers, but the MERGE itself should not.

When, and how?

If an UPDATE is mentioned 5 times, do we call the trigger 5 times? What
happens if none of the UPDATEs are ever executed? 

Best explain exactly what you mean.

-- Simon Riggs           www.2ndQuadrant.comPostgreSQL Development, 24x7 Support, Training and Services



Re: MERGE Specification

From
Simon Riggs
Date:
On Fri, 2010-08-06 at 10:28 +0300, Heikki Linnakangas wrote:
> On 06/08/10 10:12, Simon Riggs wrote:
> > So DO NOTHING is the default and implies silently ignoring rows. RAISE
> > ERROR is the opposite.
> >
> > Coding for those seems very easy, its just a question of "should we do
> > it?". DB2 has it; SQL:2008 does not. But then SQL:2008 followed the DB2
> > introduction of AND clauses, and SQL:2011 has so far followed the DB2
> > introduction of DELETE action also.
> 
> I see neither DO NOTHING or RAISE ERROR in the documentation of DB2, 
> Oracle, or MSSQL server.

Agreed, Oracle and MSSQL server does not have these. 

However, DB2 very clearly does have these features

* SIGNAL which raises an error and can be used in place of any action,
at any point in sequence of WHEN clauses. DB2 already supports SIGNAL as
part of SQL/PSM, which we do not, so RAISE ERROR was the nearest
equivalent command for PostgreSQL.

* ELSE IGNORE which does same thing as DO NOTHING, except it must always
be last statement in a sequence of WHEN clauses. DO NOTHING is already a
phrase with exactly this meaning in PostgreSQL, so I suggest that.

-- Simon Riggs           www.2ndQuadrant.comPostgreSQL Development, 24x7 Support, Training and Services



Re: MERGE Specification

From
Boxuan Zhai
Date:


On Fri, Aug 6, 2010 at 3:41 PM, Simon Riggs <simon@2ndquadrant.com> wrote:
On Fri, 2010-08-06 at 10:28 +0300, Heikki Linnakangas wrote:

> > SQL:2011 makes no mention of how MERGE should react to statement level
> > triggers. MERGE is not a trigger action even. Given considerable
> > confusion in this area, IMHO we should just say the MERGE does not call
> > statement triggers at all, of any kind.
>
> IMO the UPDATE/DELETE/INSERT actions should fire the respective
> statement level triggers, but the MERGE itself should not.

When, and how?

If an UPDATE is mentioned 5 times, do we call the trigger 5 times?
 
My current process for BEFOR / AFTER STATEMENT trigger on MERGE is to fire the triggers for all action types that appears in the command, unless it is replaced by a INSTEAD rule. But the triggers for one action type will be fired only once. That means you will get both UPDATE and INSERT triggers be activated for only once if you are executing a MERGE command with 5 UPDATEs and 10 INSERTs.   
 
What happens if none of the UPDATEs are ever executed?

The triggers (I mean triggers for statement) will be fired anyway even the UPDATE action matches no tuple. This is not for MERGE only. If you update a table with the command
UPDATE foo SET ... WHERE false;
It will also fire the STATEMENT triggers of UPDATE type on foo (I think so).
 
 
And, even not been asked, I want to say that, in current implementation of MERGE,  the row level triggers are fired by the actions that take the tuples. If one tuple is caught by an UPDATE action, then the UPDATE row trigger will be fired on this tuple. If it is handled by INSERT action, then the INSRET row triggers are on.
 
Hope you agree with my designs.   
 
Best explain exactly what you mean.

--
 Simon Riggs           www.2ndQuadrant.com
 PostgreSQL Development, 24x7 Support, Training and Services


Re: MERGE Specification

From
Boxuan Zhai
Date:


On Fri, Aug 6, 2010 at 3:53 PM, Simon Riggs <simon@2ndquadrant.com> wrote:
On Fri, 2010-08-06 at 10:28 +0300, Heikki Linnakangas wrote:
> On 06/08/10 10:12, Simon Riggs wrote:
> > So DO NOTHING is the default and implies silently ignoring rows. RAISE
> > ERROR is the opposite.
> >
> > Coding for those seems very easy, its just a question of "should we do
> > it?". DB2 has it; SQL:2008 does not. But then SQL:2008 followed the DB2
> > introduction of AND clauses, and SQL:2011 has so far followed the DB2
> > introduction of DELETE action also.
>
> I see neither DO NOTHING or RAISE ERROR in the documentation of DB2,
> Oracle, or MSSQL server.

Agreed, Oracle and MSSQL server does not have these.

However, DB2 very clearly does have these features

* SIGNAL which raises an error and can be used in place of any action,
at any point in sequence of WHEN clauses. DB2 already supports SIGNAL as
part of SQL/PSM, which we do not, so RAISE ERROR was the nearest
equivalent command for PostgreSQL.

* ELSE IGNORE which does same thing as DO NOTHING, except it must always
be last statement in a sequence of WHEN clauses. DO NOTHING is already a
phrase with exactly this meaning in PostgreSQL, so I suggest that.

 
So, we need to add both DO NOTHING and RAISE ERROR actions in the MERGE command now !? What will RAISE ERROR do? To stop the whole MERGE command OR, just throw an error notice for the row and move on.
 
 
--
 Simon Riggs           www.2ndQuadrant.com
 PostgreSQL Development, 24x7 Support, Training and Services


Re: MERGE Specification

From
Simon Riggs
Date:
On Fri, 2010-08-06 at 16:26 +0800, Boxuan Zhai wrote:

> So, we need to add both DO NOTHING and RAISE ERROR actions in the
> MERGE command now !? What will RAISE ERROR do?

Let's get the rest of it working first. This would be a later extension,
though I think an important one for our developers.

> To stop the whole MERGE command OR, just throw an error notice for the
> row and move on.

As you say, it would be even better to be able to report errors in some
way and move onto next row. NOTICE is not the way though.

Maybe one of the actions would be to EXECUTE a procedure, so we can call
an error logging function.

-- Simon Riggs           www.2ndQuadrant.comPostgreSQL Development, 24x7 Support, Training and Services



Re: MERGE Specification

From
Peter Eisentraut
Date:
On tor, 2010-08-05 at 16:35 +0100, Simon Riggs wrote:
> * DELETE is an extension to the standard, though supported by Oracle,
> DB2 and SQLServer and damn useful

-> SQL:2011



Re: MERGE Specification

From
Boxuan Zhai
Date:
Dear All,
 
I have just finished a new patch, with the following feature:
 
1. The rule rewriter is changed to a better logic, which is the actions replaced by INSTEAD rules will still catch tuples, but do nothing for them.
 
2. Triggers can work on MERGE. We don't have CREATE TRIGGER ON MERGE... command. But the triggers (for each statement and for each row) of UPDATE, DELETE and INSERT will be activated by the actions in MERGE.
 
3. We have DO NOTHING and RAISE ERROR actions now. They can used in both MATCHED and NOT MATCHED situation, and can have additional quals. Currently RAISE ERROR just elog a NOTICE message, since we are stil not sure what should be done for these errors. I just build up the framework for it, preparing for the further extension.  
 
4. The default action of MERGE is RAISE ERROR.
 
I explain the usage of the new features in my pages https://wiki.postgresql.org/wiki/MergeTestExamples
 
Please find the patch file in the attachment.
 
Thanks
 
Boxuan
Attachment

Re: MERGE Specification

From
Heikki Linnakangas
Date:
On 07/08/10 10:58, Boxuan Zhai wrote:
> I have just finished a new patch, with the following feature:

Please include the regression tests in the patch too. Also, I note that 
there's a few merge conflicts when applied over CVS HEAD from today, can 
you please fix the bitrot?

--   Heikki Linnakangas  EnterpriseDB   http://www.enterprisedb.com


Re: MERGE Specification

From
Heikki Linnakangas
Date:
On 09/08/10 14:47, Heikki Linnakangas wrote:
> On 07/08/10 10:58, Boxuan Zhai wrote:
>> I have just finished a new patch, with the following feature:
>
> Please include the regression tests in the patch too....

And the docs changes too.

--   Heikki Linnakangas  EnterpriseDB   http://www.enterprisedb.com


Re: MERGE Specification

From
Heikki Linnakangas
Date:
On 10/08/10 06:03, Boxuan Zhai wrote:
> I have put everything in one patch, against the latest git repository. The
> program is tested on my machine.

Thanks! I get a few compiler warnings:

analyze.c: In function ‘transformMergeStmt’:
analyze.c:2476: warning: unused variable ‘lastaction’
gram.y: In function ‘base_yyparse’:
gram.y:7437: warning: assignment from incompatible pointer type
gram.y:7441: warning: assignment from incompatible pointer type
trigger.c: In function ‘ExecBSMergeTriggers’:
trigger.c:2360: warning: assignment from incompatible pointer type
trigger.c: In function ‘ExecASMergeTriggers’:
trigger.c:2411: warning: assignment from incompatible pointer type
planner.c: In function ‘merge_action_planner’:
planner.c:681: warning: assignment from incompatible pointer type
var.c: In function ‘push_up_merge_action_vars’:
var.c:738: warning: passing argument 1 of 
‘push_up_merge_action_vars_walker’ from incompatible pointer type
var.c:96: note: expected ‘struct Node *’ but argument is of type ‘struct 
List *’
var.c:740: warning: passing argument 1 of 
‘push_up_merge_action_vars_walker’ from incompatible pointer type
var.c:96: note: expected ‘struct Node *’ but argument is of type ‘struct 
List *’

The merge.sgml file should be in doc/src/sgml/ref, not doc/src/sgml. 
After moving it there, I get a few errors from compiling the docs:

openjade:ref/merge.sgml:128:55:X: reference to non-existent ID 
"SQL-SELECT-TITLE"
openjade:ref/merge.sgml:129:55:X: reference to non-existent ID 
"SQL-VALUES-TITLE"
openjade:ref/merge.sgml:185:42:X: reference to non-existent ID 
"SQL-INSERT-TITLE"
openjade:ref/merge.sgml:170:42:X: reference to non-existent ID 
"SQL-UPDATE-TITLE"

Those can be fixed by simply removing the endterm attributes from those 
lines, they're not needed.

--   Heikki Linnakangas  EnterpriseDB   http://www.enterprisedb.com


Re: MERGE Specification

From
Boxuan Zhai
Date:


On Mon, Aug 9, 2010 at 8:30 PM, Heikki Linnakangas <heikki.linnakangas@enterprisedb.com> wrote:
On 09/08/10 14:47, Heikki Linnakangas wrote:
On 07/08/10 10:58, Boxuan Zhai wrote:
I have just finished a new patch, with the following feature:

Please include the regression tests in the patch too....

And the docs changes too.

 
I have put everything in one patch, against the latest git repository. The program is tested on my machine.
 
Thanks

--
 Heikki Linnakangas
 EnterpriseDB   http://www.enterprisedb.com

Attachment

Re: MERGE Specification

From
Boxuan Zhai
Date:


On Tue, Aug 10, 2010 at 2:49 PM, Heikki Linnakangas <heikki.linnakangas@enterprisedb.com> wrote:
On 10/08/10 06:03, Boxuan Zhai wrote:
I have put everything in one patch, against the latest git repository. The
program is tested on my machine.

Thanks! I get a few compiler warnings:

analyze.c: In function ‘transformMergeStmt’:
analyze.c:2476: warning: unused variable ‘lastaction’
gram.y: In function ‘base_yyparse’:
gram.y:7437: warning: assignment from incompatible pointer type
gram.y:7441: warning: assignment from incompatible pointer type
trigger.c: In function ‘ExecBSMergeTriggers’:
trigger.c:2360: warning: assignment from incompatible pointer type
trigger.c: In function ‘ExecASMergeTriggers’:
trigger.c:2411: warning: assignment from incompatible pointer type
planner.c: In function ‘merge_action_planner’:
planner.c:681: warning: assignment from incompatible pointer type
var.c: In function ‘push_up_merge_action_vars’:
var.c:738: warning: passing argument 1 of ‘push_up_merge_action_vars_walker’ from incompatible pointer type
var.c:96: note: expected ‘struct Node *’ but argument is of type ‘struct List *’
var.c:740: warning: passing argument 1 of ‘push_up_merge_action_vars_walker’ from incompatible pointer type
var.c:96: note: expected ‘struct Node *’ but argument is of type ‘struct List *’

The merge.sgml file should be in doc/src/sgml/ref, not doc/src/sgml. After moving it there, I get a few errors from compiling the docs:

openjade:ref/merge.sgml:128:55:X: reference to non-existent ID "SQL-SELECT-TITLE"
openjade:ref/merge.sgml:129:55:X: reference to non-existent ID "SQL-VALUES-TITLE"
openjade:ref/merge.sgml:185:42:X: reference to non-existent ID "SQL-INSERT-TITLE"
openjade:ref/merge.sgml:170:42:X: reference to non-existent ID "SQL-UPDATE-TITLE"

Those can be fixed by simply removing the endterm attributes from those lines, they're not needed.

 
Thanks for your feedback. I fixed all the above waring bugs. Find the new patch in attachement.

--
 Heikki Linnakangas
 EnterpriseDB   http://www.enterprisedb.com

Attachment

Re: MERGE Specification

From
Heikki Linnakangas
Date:
On 10/08/10 12:08, Boxuan Zhai wrote:
> Thanks for your feedback. I fixed all the above waring bugs. Find the new
> patch in attachement.

Thanks.

I'm getting an assertion failure with this statement:

CREATE TABLE foo (id int4);

MERGE into foo t
USING (select id FROM generate_series(1,5) id) AS s
ON t.id = s.id
WHEN NOT MATCHED THEN INSERT (id) VALUES (s.id);

TRAP: FailedAssertion("!(ActiveSnapshotSet())", File: "postgres.c", 
Line: 749)

That's easily fixed - you need to add "case T_MergeStmt" to the list of 
optimizable command types in analyze_requires_snapshot() function.

Unfortunately that doesn't get you far, the query then trips another 
assertion:

TRAP: FailedAssertion("!(list_length(resultRelations) == 
list_length(subplans))", File: "createplan.c", Line: 3929)

--   Heikki Linnakangas  EnterpriseDB   http://www.enterprisedb.com


Re: MERGE Specification

From
Boxuan Zhai
Date:


On Tue, Aug 10, 2010 at 10:29 PM, Heikki Linnakangas <heikki.linnakangas@enterprisedb.com> wrote:
On 10/08/10 12:08, Boxuan Zhai wrote:
Thanks for your feedback. I fixed all the above waring bugs. Find the new
patch in attachement.

Thanks.

I'm getting an assertion failure with this statement:

CREATE TABLE foo (id int4);

MERGE into foo t
USING (select id FROM generate_series(1,5) id) AS s
ON t.id = s.id
WHEN NOT MATCHED THEN INSERT (id) VALUES (s.id);

The query works on my machine.
 
TRAP: FailedAssertion("!(ActiveSnapshotSet())", File: "postgres.c", Line: 749)

That's easily fixed - you need to add "case T_MergeStmt" to the list of optimizable command types in analyze_requires_snapshot() function.

Unfortunately that doesn't get you far, the query then trips another assertion:

TRAP: FailedAssertion("!(list_length(resultRelations) == list_length(subplans))", File: "createplan.c", Line: 3929)


 
I just found that no Assert() works in my codes. I think it is because the assertion is no enabled. How to enable assertion. To define USE_ASSERT_CHECKING somewhere?
 
 
--
 Heikki Linnakangas
 EnterpriseDB   http://www.enterprisedb.com

Re: MERGE Specification

From
Greg Smith
Date:
Boxuan Zhai wrote:
> I just found that no Assert() works in my codes. I think it is because 
> the assertion is no enabled. How to enable assertion. To define 
> USE_ASSERT_CHECKING somewhere?

When you run "configure" before "make", use "--enable-cassert".  The 
normal trio for working on the PostgreSQL code is:

./configure --enable-depend --enable-cassert --enable-debug

Generally the only reason to build as a developer without asserts on is 
to do performance testing.  They will slow some portions of the code 
down significantly.

-- 
Greg Smith  2ndQuadrant US  Baltimore, MD
PostgreSQL Training, Services and Support
greg@2ndQuadrant.com   www.2ndQuadrant.us



Re: MERGE Specification

From
Boxuan Zhai
Date:


On Wed, Aug 11, 2010 at 12:14 PM, Greg Smith <greg@2ndquadrant.com> wrote:
Boxuan Zhai wrote:
I just found that no Assert() works in my codes. I think it is because the assertion is no enabled. How to enable assertion. To define USE_ASSERT_CHECKING somewhere?

When you run "configure" before "make", use "--enable-cassert".  The normal trio for working on the PostgreSQL code is:

./configure --enable-depend --enable-cassert --enable-debug

Generally the only reason to build as a developer without asserts on is to do performance testing.  They will slow some portions of the code down significantly.

 
Thanks. I will test MERGE under this new configuration. A new patch will be submitted once I fix all the asserting bugs.

--
Greg Smith  2ndQuadrant US  Baltimore, MD
PostgreSQL Training, Services and Support
greg@2ndQuadrant.com   www.2ndQuadrant.us


Re: MERGE Specification

From
Boxuan Zhai
Date:


On Wed, Aug 11, 2010 at 12:18 PM, Boxuan Zhai <bxzhai2010@gmail.com> wrote:


On Wed, Aug 11, 2010 at 12:14 PM, Greg Smith <greg@2ndquadrant.com> wrote:
Boxuan Zhai wrote:
I just found that no Assert() works in my codes. I think it is because the assertion is no enabled. How to enable assertion. To define USE_ASSERT_CHECKING somewhere?

When you run "configure" before "make", use "--enable-cassert".  The normal trio for working on the PostgreSQL code is:

./configure --enable-depend --enable-cassert --enable-debug

Generally the only reason to build as a developer without asserts on is to do performance testing.  They will slow some portions of the code down significantly.

 
Thanks. I will test MERGE under this new configuration. A new patch will be submitted once I fix all the asserting bugs.

 
The new patch is done. I named it as merge_v102. (1 means it is the non-inheritance merge command, 02 means this is the second time of fixing reported bugs) 
--
Greg Smith  2ndQuadrant US  Baltimore, MD
PostgreSQL Training, Services and Support
greg@2ndQuadrant.com   www.2ndQuadrant.us



Attachment

Re: MERGE Specification

From
Peter Eisentraut
Date:
On fre, 2010-08-06 at 10:28 +0300, Heikki Linnakangas wrote:
> IMO the UPDATE/DELETE/INSERT actions should fire the respective 
> statement level triggers, but the MERGE itself should not.

Yes, SQL defines the triggering of triggers as part of the modification
of rows, not as part of any particular statement that causes the
modification.



Re: MERGE Specification

From
Peter Eisentraut
Date:
On fre, 2010-08-06 at 08:12 +0100, Simon Riggs wrote:
> Given that Peter is now attending SQL Standards meetings, I would
> suggest we leave out my suggestion above, for now. We have time to
> raise this at standards meetings and influence the outcome and then
> follow later.

I'm not actually attending any (further) meetings, because no one has
agreed to fund it yet.



Re: MERGE Specification

From
Heikki Linnakangas
Date:
On 11/08/10 11:11, Boxuan Zhai wrote:
> The new patch is done. I named it as merge_v102. (1 means it is the
> non-inheritance merge command, 02 means this is the second time of fixing
> reported bugs)

Thanks. I went through this, fixing all the spurious end-of-line
whitespace first with "git apply --whitespace=fix", and then manually
fixing comment and whitespace style, typos, and doing some other small
comment editing. Resulting patch attached. It is also available at my
git repository at
git://git.postgresql.org/git/users/heikki/postgres.git, branch
'mergestmt'. Please base any further patch versions on this patch, so
that we don't need to redo this cleanup.

I'll continue reviewing this sometime next week, but here's few
miscellaneous issues for starters;

* Explain output of actions needs some work:

 >  Merge  (cost=246.37..272.96 rows=1770 width=38)
 >    ACTION: UPDATE WHEN NOT MATCHED
 >    ACTION: INSERT WHEN NOT MATCHED
 >    ->  Merge Left Join  (cost=246.37..272.96 rows=1770 width=38)

Should print out more details of the action, like for normal
updates/inserts/deletes. And all uppercase doesn't fit the surrounding
style.

* Is the behavior now SQL-compliant? We had long discussions about the
default action being insert or do nothing or raise error, but I never
got a clear picture of what the SQL spec says and whether we're compliant.

* What does the "one tuple is error" notice mean? We'll have to do
something about that.. I don't think we've properly thought through the
meaning of RAISE ERROR. Let's cut it out from the patch until then. It's
only a few dozen lines to put back when we know what to do about it.

* Do you need the MergeInsert/MergeUpdate/MergeDelete alias nodes? Would
it be simpler to just use InsertStmt/UpdateStmt/DeleteStmt directly?

* Do you need the out-function for DeleteStmt? Why not for UpdateStmt
and InsertStmt?

* I wonder if it would be simpler to check the fact that you can only
have UPDATE/DELETE as a WHEN MATCHED action and only INSERT as a WHEN
NOT MATCHED action directly in the grammar?

* Regarding this speculation in the MergeStmt grammar rule:
 >     /*although we have only one USING table,
 >     we still make it a list, maybe in future
 >     we will allow multiple USING tables.*/

I wonder what the semantics of having multiple USING tables would be? If
it would be like "USING (SELECT * FROM a UNION ALL SELECT * FROM b)",
then we don't ever need it because you can already achieve it with that
subquery. If it's something like "USING (SELECT * FROM a,b WHERE ...)",
then we again don't need it because you can write that instead. So I
think we should give up on the notion that source can be a list of
tables, and simplify the code everywhere accordingly.

* Instead of scanning the list of actions in ExecBS/ExecASMergeTriggers
every time, should set flags at plan time to mark what kind of actions
there is in the statement. Or should we defer firing the statement
triggers until we hit the first matching row and execute the action for
the first time?

* Do we need the 'replaced' field? Could you just replace the action
with a DO NOTHING action instead.

* This crashes:

postgres=# CREATE TABLE target AS (SELECT 1 AS id);
SELECT 1
postgres=# MERGE into target t
USING (select 1 AS id) AS s
ON t.id = s.id
WHEN MATCHED THEN
         UPDATE SET id = (SELECT COUNT(*) FROM generate_series(1,10))
;
server closed the connection unexpectedly
    This probably means the server terminated abnormally
    before or while processing the request.
The connection to the server was lost. Attempting reset: Failed.


--
   Heikki Linnakangas
   EnterpriseDB   http://www.enterprisedb.com

Attachment

Re: MERGE Specification

From
Alvaro Herrera
Date:
Excerpts from Peter Eisentraut's message of mié ago 11 11:23:19 -0400 2010:
> On fre, 2010-08-06 at 08:12 +0100, Simon Riggs wrote:
> > Given that Peter is now attending SQL Standards meetings, I would
> > suggest we leave out my suggestion above, for now. We have time to
> > raise this at standards meetings and influence the outcome and then
> > follow later.
> 
> I'm not actually attending any (further) meetings, because no one has
> agreed to fund it yet.

Eh?  Surely we have enough money for this in the SPI account.

-- 
Álvaro Herrera <alvherre@commandprompt.com>
The PostgreSQL Company - Command Prompt, Inc.
PostgreSQL Replication, Consulting, Custom Development, 24x7 support


Re: MERGE Specification

From
Boxuan Zhai
Date:


On Sun, Aug 15, 2010 at 4:05 AM, Heikki Linnakangas <heikki.linnakangas@enterprisedb.com> wrote:
On 11/08/10 11:11, Boxuan Zhai wrote:
The new patch is done. I named it as merge_v102. (1 means it is the
non-inheritance merge command, 02 means this is the second time of fixing
reported bugs)

Thanks. I went through this, fixing all the spurious end-of-line whitespace first with "git apply --whitespace=fix", and then manually fixing comment and whitespace style, typos, and doing some other small comment editing. Resulting patch attached. It is also available at my git repository at git://git.postgresql.org/git/users/heikki/postgres.git, branch 'mergestmt'. Please base any further patch versions on this patch, so that we don't need to redo this cleanup.

 
Thanks for the cleanup. The codes looks better now. My problem is that I have done some more modifications after merge_v102. Is there any way to apply the cleanup patch without erasing my new codes?
 
I'll continue reviewing this sometime next week, but here's few miscellaneous issues for starters;

* Explain output of actions needs some work:

>  Merge  (cost=246.37..272.96 rows=1770 width=38)
>    ACTION: UPDATE WHEN NOT MATCHED
>    ACTION: INSERT WHEN NOT MATCHED
>    ->  Merge Left Join  (cost=246.37..272.96 rows=1770 width=38)

Should print out more details of the action, like for normal updates/inserts/deletes.
 
Well, I think, in normal UPDATE/INSERT/DELETE explanation, there are no more details than what we have here except the costs, rows and width, which is print out at the MERGE command line. 
 
For example:
 
Explain
update buy set volume = volume + 1;
                          QUERY PLAN
--------------------------------------------------------------
 Update  (cost=0.00..36.75 rows=2140 width=14)
   ->  Seq Scan on buy  (cost=0.00..36.75 rows=2140 width=14)
(2 rows)
 
For the explanation of an UPDATE command, we only have a Update title followed by the costs, then, after the arrow -> there comes the plan tree.
In a MERGE command, no action has its real private plan. They all share the main plan.  Thus, the costs for a merge command is that of the main plan. What is useful for a merge action is only the target list and quals. So my design is to show the merge command costs in first line. Then print out the actions and their qualifications in a list, followed by the main plan tree.  
 
Is there any other thing you suggest to print out for each action?
 
And all uppercase doesn't fit the surrounding style.
 
This  will be fixed.
 
* Is the behavior now SQL-compliant? We had long discussions about the default action being insert or do nothing or raise error, but I never got a clear picture of what the SQL spec says and whether we're compliant.

* What does the "one tuple is error" notice mean? We'll have to do something about that.. I don't think we've properly thought through the meaning of RAISE ERROR. Let's cut it out from the patch until then. It's only a few dozen lines to put back when we know what to do about it.
I find that we have not reached an agreement on MERGE's syntax yet. Personally, I support Simon's idea, that is the default action should be RAISE ERROR. However, I am no sure what RAISE ERROR should do when we get a missing tuple. Here I just use a simple elog(NOTICE, "a tuple is error"); for this situation.
 
I leave this for further extension when a more specific design for RAISE ERROR is available.
 
Well, I have to say the current RAISE ERROR elog is a little bit ugly. Do you want me to chage the default action back to DO NOTHING? Or any other suggetions? (In fact, my personal thinking is to add a non-omissible "ELSE" clause as the end of the action list which forces the user to specify the default action for merge).
 
* Do you need the MergeInsert/MergeUpdate/MergeDelete alias nodes? Would it be simpler to just use InsertStmt/UpdateStmt/DeleteStmt directly?

 
 
I need one flag in these statement to differentiate them from normal InsertStmt/UpdateStmt/DeleteStmt. There are slight difference for the process of these two kinds of statements in the transfromStmt() function. I define a set of alias nodes which have different nodetags. This can make the code simpler. 
 
* Do you need the out-function for DeleteStmt? Why not for UpdateStmt and InsertStmt?
 
Ah, I add this function because I want to have a look of the content of DeleteStmt. I did this long ago, just after I started the project. If you don't want this function, I will remove it.
 
* I wonder if it would be simpler to check the fact that you can only have UPDATE/DELETE as a WHEN MATCHED action and only INSERT as a WHEN NOT MATCHED action directly in the grammar?

 
We can do this, but, I think it is better to keep it as it is now.
 
 
* Regarding this speculation in the MergeStmt grammar rule:
>       /*although we have only one USING table,
>       we still make it a list, maybe in future
>       we will allow multiple USING tables.*/

I wonder what the semantics of having multiple USING tables would be? If it would be like "USING (SELECT * FROM a UNION ALL SELECT * FROM b)", then we don't ever need it because you can already achieve it with that subquery. If it's something like "USING (SELECT * FROM a,b WHERE ...)", then we again don't need it because you can write that instead. So I think we should give up on the notion that source can be a list of tables, and simplify the code everywhere accordingly.

 
Yes, we should give up it. I was considering to allow the user input multiple source tables as a short way of "USING (SELECT * FROM a,b WHERE ...)". Now, this idea seems not interesting at all.  
 
 
* Instead of scanning the list of actions in ExecBS/ExecASMergeTriggers every time, should set flags at plan time to mark what kind of actions there is in the statement.
 
First of all, I need to say that the functions of ExecBSMergeTriggers() and ExecASMergeTriggers() are for per-statement triggers, and are invoked only once in the whole process of a MERGE command. Setting flags at plan time can save the scanning. I may add it to next patch.  But don't expect it save much execution time.
 
Or should we defer firing the statement triggers until we hit the first matching row and execute the action for the first time?
 
No, we cannot do this. These are Per-statement triggers. They should be fired before/after the main plan is executed. The statement triggers are fired even no tuple is really matched with the actions.
 
* Do we need the 'replaced' field? Could you just replace the action with a DO NOTHING action instead.

 
Yes, I think it is a good idea to replace them with DO NOTHING. The replaced field was there before DO NOTHING is added to the system. But since we have DO NOTING action now, we can turn all the actions replaced by INSTEAD rules into DO NOTINGs. 
 
* This crashes:

postgres=# CREATE TABLE target AS (SELECT 1 AS id);
SELECT 1
postgres=# MERGE into target t
USING (select 1 AS id) AS s

ON t.id = s.id
WHEN MATCHED THEN
       UPDATE SET id = (SELECT COUNT(*) FROM generate_series(1,10))
;
server closed the connection unexpectedly
       This probably means the server terminated abnormally
       before or while processing the request.
The connection to the server was lost. Attempting reset: Failed.


 
 
Oh, a really serious bug. I have not considered this case (user series generator in actions) before. I will see what I can do for it.
 
 

--
 Heikki Linnakangas
 EnterpriseDB   http://www.enterprisedb.com

Re: MERGE Specification

From
Heikki Linnakangas
Date:
On 15/08/10 10:31, Boxuan Zhai wrote:
> On Sun, Aug 15, 2010 at 4:05 AM, Heikki Linnakangas<
> heikki.linnakangas@enterprisedb.com>  wrote:
>> Thanks. I went through this, fixing all the spurious end-of-line whitespace
>> first with "git apply --whitespace=fix", and then manually fixing comment
>> and whitespace style, typos, and doing some other small comment editing.
>> Resulting patch attached. It is also available at my git repository at
>> git://git.postgresql.org/git/users/heikki/postgres.git, branch
>> 'mergestmt'. Please base any further patch versions on this patch, so that
>> we don't need to redo this cleanup.
>>
> Thanks for the cleanup. The codes looks better now. My problem is that I
> have done some more modifications after merge_v102. Is there any way to
> apply the cleanup patch without erasing my new codes?

Yeah, there's multiple ways. You can create a patch of what you have 
now, and run interdiff against your previous patch that I worked 
against. That gives you a diff of changes since the last patch you sent 
to the list. You can then apply that patch to the codetree from my git 
repository.

Or you can just generate a new patch of what you have as usual, and I'll 
incorporate the changes to the cleaned-up patch.

>> I'll continue reviewing this sometime next week, but here's few
>> miscellaneous issues for starters;
>>
>> * Explain output of actions needs some work:
>>
>>>   Merge  (cost=246.37..272.96 rows=1770 width=38)
>>>     ACTION: UPDATE WHEN NOT MATCHED
>>>     ACTION: INSERT WHEN NOT MATCHED
>>>     ->   Merge Left Join  (cost=246.37..272.96 rows=1770 width=38)
>>
>> Should print out more details of the action, like for normal
>> updates/inserts/deletes.
>
>
> Well, I think, in normal UPDATE/INSERT/DELETE explanation, there are no more
> details than what we have here except the costs, rows and width, which is
> print out at the MERGE command line.
>
> For example:
>
> Explain
> update buy set volume = volume + 1;
>                            QUERY PLAN
> --------------------------------------------------------------
>   Update  (cost=0.00..36.75 rows=2140 width=14)
>     ->   Seq Scan on buy  (cost=0.00..36.75 rows=2140 width=14)
> (2 rows)
>
> For the explanation of an UPDATE command, we only have a Update title
> followed by the costs, then, after the arrow ->  there comes the plan tree.
> In a MERGE command, no action has its real private plan. They all share the
> main plan.  Thus, the costs for a merge command is that of the main plan.
> What is useful for a merge action is only the target list and quals. So my
> design is to show the merge command costs in first line. Then print out the
> actions and their qualifications in a list, followed by the main plan tree.

It's more more interesting with more complex statement:

postgres=# explain UPDATE target SET id = (SELECT COUNT(*) FROM 
generate_series(1,10));                                      QUERY PLAN 

-------------------------------------------------------------------------------------- Update  (cost=12.51..52.52
rows=2400width=6)   InitPlan 1 (returns $0)     ->  Aggregate  (cost=12.50..12.51 rows=1 width=0)           ->
FunctionScan on generate_series  (cost=0.00..10.00 
 
rows=1000 width=0)   ->  Seq Scan on target  (cost=0.00..40.00 rows=2400 width=6)
(5 rows)

> Is there any other thing you suggest to print out for each action?

It should match the output of a normal Update/Insert as closely as possible.

>> * Is the behavior now SQL-compliant? We had long discussions about the
>> default action being insert or do nothing or raise error, but I never got a
>> clear picture of what the SQL spec says and whether we're compliant.
>>
>> * What does the "one tuple is error" notice mean? We'll have to do
>> something about that.. I don't think we've properly thought through the
>> meaning of RAISE ERROR. Let's cut it out from the patch until then. It's
>> only a few dozen lines to put back when we know what to do about it.
>>
> I find that we have not reached an agreement on MERGE's syntax yet.
> Personally, I support Simon's idea, that is the default action should be
> RAISE ERROR. However, I am no sure what RAISE ERROR should do when we
> get a missing tuple. Here I just use a simple elog(NOTICE, "a tuple is
> error"); for this situation.
>
> I leave this for further extension when a more specific design for RAISE
> ERROR is available.
>
> Well, I have to say the current RAISE ERROR elog is a little bit ugly. Do
> you want me to chage the default action back to DO NOTHING? Or any other
> suggetions? (In fact, my personal thinking is to add a non-omissible "ELSE"
> clause as the end of the action list which forces the user to specify the
> default action for merge).

Whatever the spec says is what we should do.

>> * Do you need the MergeInsert/MergeUpdate/MergeDelete alias nodes? Would it
>> be simpler to just use InsertStmt/UpdateStmt/DeleteStmt directly?
>
> I need one flag in these statement to differentiate them from normal
> InsertStmt/UpdateStmt/DeleteStmt. There are slight difference for the
> process of these two kinds of statements in the transfromStmt() function. I
> define a set of alias nodes which have different nodetags. This can make
> the code simpler.

Ok. You might also consider just adding a "isMergeAction" boolean to 
InsertStmt/UpdateStmt/DeleteStmt instead, or if the difference between 
the regular statements and merge actions grow big, create a completely 
separate node struct for them.

>> * Do you need the out-function for DeleteStmt? Why not for UpdateStmt and
>> InsertStmt?
>
> Ah, I add this function because I want to have a look of the content of
> DeleteStmt. I did this long ago, just after I started the project. If you
> don't want this function, I will remove it.

Ok.

>> * Instead of scanning the list of actions in ExecBS/ExecASMergeTriggers
>> every time, should set flags at plan time to mark what kind of actions there
>> is in the statement.
>
> First of all, I need to say that the functions of ExecBSMergeTriggers() and
> ExecASMergeTriggers() are for per-statement triggers, and are invoked only
> once in the whole process of a MERGE command. Setting flags at plan time can
> save the scanning. I may add it to next patch.  But don't expect it save
> much execution time.

Yeah, I'm not so much concerned about performance, it just seems like it 
might make the code slightly simpler.

>> Or should we defer firing the statement triggers until we hit the first
>> matching row and execute the action for the first time?
>
> No, we cannot do this. These are Per-statement triggers. They should be
> fired before/after the main plan is executed. The statement triggers are
> fired even no tuple is really matched with the actions.

Ok.

>> * This crashes:
>>
>> postgres=# CREATE TABLE target AS (SELECT 1 AS id);
>> SELECT 1
>> postgres=# MERGE into target t
>> USING (select 1 AS id) AS s
>>
>> ON t.id = s.id
>> WHEN MATCHED THEN
>>         UPDATE SET id = (SELECT COUNT(*) FROM generate_series(1,10))
>> ;
>> server closed the connection unexpectedly
>>         This probably means the server terminated abnormally
>>         before or while processing the request.
>> The connection to the server was lost. Attempting reset: Failed.
>
> Oh, a really serious bug. I have not considered this case (user series
> generator in actions) before. I will see what I can do for it.

It's not specific to generate_series() but subqueries in general that 
are the problem.

--   Heikki Linnakangas  EnterpriseDB   http://www.enterprisedb.com