MERGE SQL Statement - Mailing list pgsql-hackers
From | Simon Riggs |
---|---|
Subject | MERGE SQL Statement |
Date | |
Msg-id | 1208372338.4259.202.camel@ebony.site Whole thread Raw |
Responses |
Re: MERGE SQL Statement
|
List | pgsql-hackers |
I've analysed various flavours of MERGE command to understand and propose what we should use for PostgreSQL. The results aren't what you'd expect from a quick flick through the standard, so lets look at my main concerns: 1. The simplest syntax is for SQL:2003. The syntax for DB2, SQL Server and Oracle is more complex, with SQL:2008(final draft) being very similar to DB2 and SQL Server, so unlikely to be a point of contention in the standard. I suggest we go with the latter, but yes, its still in draft (yawn). 2. MySQL and Teradata have their own syntax for the row-oriented Upsert operation. Both of those are more useful (IMHO) than MERGE for OLTP apps, while MERGE is very useful for bulk data loads. I'm open to the idea that we do something like this in addition to MERGE. 3. The way MERGE works is to define a left outer join between source and target, then specify a series of WHEN clauses that may or may not apply. It **isn't** just a simple Update/Insert and so much of what we have discussed previously goes straight in the trash. AFAICS the way it is specified to work it would be fairly straightforward to cause race conditions and failures when using multiple concurrent MERGE statements. General Example of the recommended syntax for PostgreSQL MERGE INTO Stock S /* target */ USING DailySales DS /* source table */ ON S.Item = DS.Item /* left outer join source to target */ WHEN MATCHED AND (QtyOnHand - QtySold = 0) THEN /* delete item if no stock remaining */ DELETE WHEN MATCHED THEN /* No AND clause, so drop thru */ /* update value if some remains */ UPDATE SET QtyOnHand = QtyOnHand - QtySold WHEN NOT MATCHED THEN /* insert a row if the stock is new */ INSERT VALUES (Item, QtySold) ; So lets look at the syntaxes and then review how it might work. SYNTAX ====== SQL:2003 -------- MERGE INTO target [AS correlation-name] USING [table-ref | subquery] ON <search-condition> [WHEN MATCHED THEN MergeUpdate] [WHEN NOT MATCHED THEN MergeInsert] Oracle 11g ---------- MERGE INTO target [AS correlation-name] USING [table-ref | subquery] ON <search-condition> [WHEN MATCHED THEN MergeUpdate WHERE <where-clause> DELETE WHERE <where-clause>] [WHEN NOT MATCHED THEN MergeInsert WHERE <where-clause>] Differences from SQL:2003 are * Update and Insert have WHERE clauses on them * Oracle allows multiple WHEN ... WHERE clauses * Oracle allows an error logging clause also * optional DELETE statement as part of the UPDATE, so you can only DELETE what you update (yeh, really) * WHEN MATCHED/WHEN NOT MATCHED must be in fixed order, only IBM DB2 ------- MERGE INTO target [AS correlation-name] USING [table-ref | subquery] ON <search-condition> [WHEN MATCHED [AND <where-clause>] THEN MergeUpdate | MergeDelete] [WHEN NOT MATCHED [AND <where-clause>] THEN MergeInsert | SignalClause] [ELSE IGNORE] Differences from SQL:2003 are * Update and Insert have AND clauses on them (like WHERE...) * DB2 allows multiple WHEN ... AND clauses * DELETE is also a full-strength option, not part of the MergeUpdate clause as it is in Oracle * DB2 allows a SIGNAL statement, similar to RAISE * ELSE IGNORE is an optional syntax, which does nothing SQL Server 2008 --------------- MERGE [INTO] target [AS correlation-name] USING [table-ref | subquery] ON <search-condition> [WHEN MATCHED [AND <where-clause>] THEN MergeUpdate | MergeDelete] [WHEN NOT MATCHED [AND <where-clause>] THEN MergeInsert] Differences from SQL:2003 are * Update and Insert have AND clauses on them (like WHERE...) * DB2 allows multiple WHEN ... AND clauses * DELETE is also a full-strength option, not part of the MergeUpdate clause as it is in Oracle SQL:2008 -------- MERGE INTO target [AS correlation-name] USING [table-ref | subquery] ON <search-condition> [WHEN MATCHED [AND <where-clause>] THEN MergeUpdate] [WHEN NOT MATCHED [AND <where-clause>] THEN MergeInsert] Differences from SQL:2003 are * Update and Insert have AND clauses on them (like WHERE...) * Allows multiple WHEN ... AND clauses Alternate Syntax ---------------- MySQL supports * REPLACE INTO * INSERT ... ON DUPLICATE KEY UPDATE ... Teradata supports * UPDATE ... ELSE INSERT ... * MERGE with an additional error logging clause The Teradata and Oracle error logging clauses are very cute and I suggest we do something similar for COPY, at least. Proposed Syntax for PostgreSQL ============================== MERGE INTO table [[AS] alias] USING [table-ref | query] ON join-condition [WHEN MATCHED [AND condition] THEN MergeUpdate | DELETE] [WHEN NOT MATCHED [AND condition] THEN MergeInsert] MergeUpdate is UPDATE SET { column = { expression | DEFAULT } | ( column [, ...] ) = ( { expression | DEFAULT } [, ...] ) } [, ...] (yes, there is no WHERE clause here) MergeInsert is INSERT [ ( column [, ...] ) ] { DEFAULT VALUES | VALUES ( { expression | DEFAULT } [, ...] ) [, ...]} (no subquery allowed) Notes and behaviours -------------------- * It is possible for concurrent MERGE statements to cause duplicate INSERT violations because of a race condition between when we check whether the row is matching/not matching and when we apply the appropriate WHEN clause, if any. This is just the same as what we do now with try-UPDATE-then-INSERT logic. (This seems to end the discussion about whether we do inserts/updates first because the matching test is always performed before we take the action; or perhaps it means we don't like MERGE as much as we did before and would prefer alternate syntaxes...). Maybe we could avoid some problems by applying heap_lock_tuple() to each matched row, so we know it will stay matched while we evaluate the WHEN clauses? Maybe not. * USING query can be a VALUES clause if we wish to do single/few row operations, so MERGE can be used for bulk-loading and OLTP * There is no RETURNING option for MERGE, nor for any INSERT/UPDATE sub-clauses * WHERE CURRENT OF cursor is not supported anywhere * The join can't be recursive, so no WITH support (common expressions, i.e. non-recursive WITH are supported by SQLServer 2008) * conditions would not allow sub-selects * MERGE would work on base tables only, just like COPY * Changes are made only to that single table * Cannot update a column mentioned in the ON clause cos that would make my head hurt too much * MERGE will perform a left outer join between source on left and target on right. There must be no more than 1 row from table-ref for each row in the table. Each row in the table can only be updated once during each MERGE statement. Each non-matching row in the table-ref will result in one INSERT into table. * WHEN clauses are tested in the order specified. If the AND condition returns false then we skip onto the next WHEN clause. We stop once a WHEN clause activates, so only *one* action is ever activated for each row. * AND clauses need not form a complete set, i.e. it is possible that no action will result. It is also possible that some WHEN clauses will never activate because of the execution order; we would not try to prevent this, just document it as a possible user error. * MERGE will respect Triggers, but not Rules since the rules behaviour is roughly orthogonal to the WHEN clauses * MERGE fires UPDATE and INSERT triggers according to which WHEN clause is activated (if any) * It's unclear whether MERGE should activate statement-level triggers, or not. Few of the above sources are explicit either way on this point. DB2 always runs both UPDATE and INSERT statement-level triggers, whether or not rows have been changed; I would suggest we do that also for before triggers. For after statement triggers I would suggest that we track how many updates and inserts are caused and if updates > 0 then we activate the after statement for update triggers, and if inserts > 0 then we activate the after statement for insert triggers. If a statement level trigger is activated by both update and insert then it would be possible for both TRIGGER_FIRED_BY_UPDATE() and TRIGGER_FIRED_BY_DELETE() to be true (for statement level triggers only), which would be a change from what we do now, even if the old behaviour was not explicitly mutually exclusive. In both cases I suggest we run Update triggers before Insert triggers consistently for both before and after statement triggers. * The number of rows changed should be (inserts + updates) which should be < number of rows returned by table-ref. It would be good to get access to the number of rows inserted and updated, so I propose that we return a NOTICE statement with this information. * The way MERGE is specified, the internals design seems to fall out fairly clearly: - set up a left outer join between source and target, with a junk attribute that might be NULL to indicate NOT MATCHED. Let planner optimise that as it wishes, nothing special needed - in ExecutePlan have a new operation type of CMD_MERGE, which then allows us to apply the WHEN clauses to decide what kind of final operation will result for that tuple (U, D or I) I'm planning to agree the syntax and write regression tests first, so we all agree the behaviour we are aiming towards. Thoughts? (Apart from jokes about having a WHY clause...) -- Simon Riggs 2ndQuadrant http://www.2ndQuadrant.com
pgsql-hackers by date: