Thread: ask for review of MERGE
Dear All,
I have just generate a new patch of MERGE command.
One main change in this edition is the removal of RASIE ERROR action from MEREG, because its semantics is not well defined yet.
I also rewrote the regress test file merge.sql, to make it consistent with the examples I used in my wiki page.
Some little (error and warning) bugs are fixed.
In this patch, all the main features of MERGE (sublinks, explain, rule and trigger, inheritance) are not changed. And so is the DO NOTHING action.
I do wish the MERGE command can be added into psql 9.1. And I wonder what improvement should be made on current edition.
Could you please have a review on this patch, if you have time and interest?
Your feedback will be highly appreciated.
Thanks
Yours Boxuan
Attachment
On 23 September 2010 11:31, Boxuan Zhai <bxzhai2010@gmail.com> wrote: > Dear All, > I have just generate a new patch of MERGE command. > One main change in this edition is the removal of RASIE ERROR action from > MEREG, because its semantics is not well defined yet. > I also rewrote the regress test file merge.sql, to make it consistent with > the examples I used in my wiki page. > Some little (error and warning) bugs are fixed. > In this patch, all the main features of MERGE (sublinks, explain, rule and > trigger, inheritance) are not changed. And so is the DO NOTHING action. > I do wish the MERGE command can be added into psql 9.1. And I wonder what > improvement should be made on current edition. > Could you please have a review on this patch, if you have time and interest? > Your feedback will be highly appreciated. > Thanks > > Yours Boxuan A few corrections: in src/backend/executor/nodeModifyTable.c: s/excute/execute/ s/orignial/original/ in src/backend/optimizer/plan/planner.c s/expreesions/expressions/ s/So,we/So, we/ s/comand/command/ s/fileds/fields/ in src/backend/optimizer/prep/preptlist.c: s/aggresive/aggressive/ in src/backend/optimizer/util/var.c s/targe/target/ -- this appears twice s/sourse/source/ in src/backend/parser/analyze.c s/takend/taken/ s/relaion/relation/ s/targe/target/ -- this appears twice s/consitency/consistency/ s/commond/command/ s/seperate/separate/ in src/backend/rewrite/rewriteHandler.c s/acton/action/ in src/include/nodes/execnodes.h s/meger/merge/ in src/include/nodes/parsenodes.h s/proecess/process/ s/allwo/allow/ s/elments/elements/ in src/test/regress/expected/merge.out s/qulifications/qualifications/ -- this appears twice s/suceeds/succeeds/ -- thisappears twice -- Thom Brown Twitter: @darkixion IRC (freenode): dark_ixion Registered Linux user: #516935
On 2010-09-23 1:31 PM +0300, Boxuan Zhai wrote: > I have just generate a new patch of MERGE command. I haven't followed the discussion very closely, but this part in the regression tests caught my attention: +-- we now have a duplicate key in Buy, so when we join to +-- Stock we will generate 2 matching rows, not one. +-- According to standard this command should fail. +-- But it suceeds in PostgreSQL implementation by simply ignoring the second It doesn't seem like a very good idea to go against the standard here. The "second" row is not well defined in this case so the results are unpredictable. The patch is also missing a (trivial) change to explain.c. Regards, Marko Tiikkaja
On Thu, Sep 23, 2010 at 7:55 PM, Marko Tiikkaja <marko.tiikkaja@cs.helsinki.fi> wrote:
On 2010-09-23 1:31 PM +0300, Boxuan Zhai wrote:I haven't followed the discussion very closely, but this part in the regression tests caught my attention:I have just generate a new patch of MERGE command.
+-- we now have a duplicate key in Buy, so when we join to
+-- Stock we will generate 2 matching rows, not one.
+-- According to standard this command should fail.
+-- But it suceeds in PostgreSQL implementation by simply ignoring the second
It doesn't seem like a very good idea to go against the standard here. The "second" row is not well defined in this case so the results are unpredictable.
Yes, the result is uncertain. It depends on which row is scanned first, which is almost out of the control of users.
But, in postgres, this is what the system do for UPDATE.
For example, consider a simple update query like the following:
CREATE TABLE target (id int, val int);
INSERT INTO target VALUES (1, 10);
CREATE TABLE source (id int, add int);
INSERT INTO source VALUES (1, 100);
INSERT INTO source VALUES (1, 100000);
-- DO the update query with source table, which has multiple matched rows
UPDATE target SET val = val + add FROM source
t=# SELECT * FROM target;
id | val
----+-----
1 | 110
(1 row)
The target tuple has two matched source tuples, but it is only updated once. And, yet, this query is not forbidden by postgres. The result is also uncertain.
The patch is also missing a (trivial) change to explain.c.
Sorry, I massed up the files. Here comes the new patch file, with EXPLAIN in it.
Regards,
Marko Tiikkaja
Attachment
Finding time for a review as large as this one is a bit tough, but I've managed to set aside a couple of days for it over the next week. I'm delivering a large project tonight and intend to start in on the review work tomorrow onced that's cleared up. If you're ever not sure who is working on your patch and what state they feel it's in, check https://commitfest.postgresql.org/action/commitfest_view?id=7 for an update; that's where we keep track of all that information. Did you ever end up keeping a current version of this patch in an alternate repository location, such as github? I thought I saw a suggestion from you about that, but after looking through the history here all I see are the diff patches you've been sending to the list. That's fine, just trying to confirm where everything is at. -- Greg Smith, 2ndQuadrant US greg@2ndQuadrant.com Baltimore, MD PostgreSQL Training, Services and Support www.2ndQuadrant.us Author, "PostgreSQL 9.0 High Performance" Pre-ordering at: https://www.packtpub.com/postgresql-9-0-high-performance/book
Starting looking at the latest MERGE patch from Boxuan here tonight. The regression tests pass for me here, good starting sign. I expect to move onto trying to break it more creatively next, then onto performance tests. Nothing much more exciting than that to report so far. It had suffered some bit rot, I think because of the security label changes. Attached is a rebased version against the new git HEAD so nobody else has to duplicate that to apply the patch. Also, to provide an alternate interface for anyone who wants to do testing/browsing of this patch, I've made a Github fork with a merge branch in it. I plan to commit intermediate stuff to there that keeps up to date with review changes: http://github.com/greg2ndQuadrant/postgres/tree/merge Probably easier to read http://github.com/greg2ndQuadrant/postgres/compare/merge than most local patch viewers, so I gzip'ed the attached updated patch to save some bytes. One compiler warning I noticed that needs to get resolved: src/backend/commands/explain.c: explain.c: In function ‘ExplainMergeActions’: explain.c:1528: warning: comparison of distinct pointer types lacks a cast That is complaining about this section: if (mt_planstate->operation != CMD_MERGE || mt_planstate->mt_mergeActPstates == NIL) return; So presumably that comparison with NIL needs a cast. Once I get more familiar with the code I'll fix that myself if Boxuan doesn't offer a suggestion first. The rest of the compiler warnings I saw didn't look related to his code, maybe stuff my picky Ubuntu compiler is noticing that was done recently to HEAD. I haven't checked HEAD without this patch yet to confirm, and am done for the night now. Here's the list if anyone is interested: Warning in src/backend/parser/scan.c: gcc -O2 -Wall -Wmissing-prototypes -Wpointer-arith -Wdeclaration-after-statement -Wendif-labels -fno-strict-aliasing -fwrapv -g -I../../../src/include -D_GNU_SOURCE -c -o index.o index.c -MMD -MP -MF .deps/index.Po In file included from gram.y:12172: scan.c: In function ‘yy_try_NUL_trans’: scan.c:16256: warning: unused variable ‘yyg’ Warning in src/backend/utils/error/elog.c: gcc -O2 -Wall -Wmissing-prototypes -Wpointer-arith -Wdeclaration-after-statement -Wendif-labels -fno-strict-aliasing -fwrapv -g -I../../../../src/include -D_GNU_SOURCE -c -o ts_cache.o ts_cache.c -MMD -MP -MF .deps/ts_cache.Po elog.c: In function ‘write_console’: elog.c:1698: warning: ignoring return value of ‘write’, declared with attribute warn_unused_result elog.c: In function ‘write_pipe_chunks’: elog.c:2388: warning: ignoring return value of ‘write’, declared with attribute warn_unused_result elog.c:2397: warning: ignoring return value of ‘write’, declared with attribute warn_unused_result Warning in src/bin/scripts/common.c: gcc -O2 -Wall -Wmissing-prototypes -Wpointer-arith -Wdeclaration-after-statement -Wendif-labels -fno-strict-aliasing -fwrapv -g -I. -I. -I../../../src/interfaces/libpq -I../../../src/bin/pg_dump -I../../../src/include -D_GNU_SOURCE -c -o input.o input.c -MMD -MP -MF .deps/input.Po common.c: In function ‘handle_sigint’: common.c:247: warning: ignoring return value of ‘write’, declared with attribute warn_unused_result common.c:250: warning: ignoring return value of ‘write’, declared with attribute warn_unused_result common.c:251: warning: ignoring return value of ‘write’, declared with attribute warn_unused_result -- Greg Smith, 2ndQuadrant US greg@2ndQuadrant.com Baltimore, MD PostgreSQL Training, Services and Support www.2ndQuadrant.us Author, "PostgreSQL 9.0 High Performance" Pre-ordering at: https://www.packtpub.com/postgresql-9-0-high-performance/book
Attachment
On Wed, Sep 29, 2010 at 2:44 AM, Greg Smith <greg@2ndquadrant.com> wrote: > The rest of the compiler warnings I saw didn't look related to his code, > maybe stuff my picky Ubuntu compiler is noticing that was done recently to > HEAD. I haven't checked HEAD without this patch yet to confirm, and am done > for the night now. Here's the list if anyone is interested: > > Warning in src/backend/parser/scan.c: > > gcc -O2 -Wall -Wmissing-prototypes -Wpointer-arith > -Wdeclaration-after-statement -Wendif-labels -fno-strict-aliasing -fwrapv -g > -I../../../src/include -D_GNU_SOURCE -c -o index.o index.c -MMD -MP -MF > .deps/index.Po > In file included from gram.y:12172: > scan.c: In function ‘yy_try_NUL_trans’: > scan.c:16256: warning: unused variable ‘yyg’ Known problem: http://archives.postgresql.org/pgsql-hackers/2009-07/msg00657.php I'm pretty sure I've seen the warn_unused_result warnings on HEAD as well. Josh
On Wed, Sep 29, 2010 at 2:44 AM, Greg Smith <greg@2ndquadrant.com> wrote: > One compiler warning I noticed that needs to get resolved: > > src/backend/commands/explain.c: > > explain.c: In function ‘ExplainMergeActions’: > explain.c:1528: warning: comparison of distinct pointer types lacks a cast > > That is complaining about this section: > > if (mt_planstate->operation != CMD_MERGE || > mt_planstate->mt_mergeActPstates == NIL) > return; > > So presumably that comparison with NIL needs a cast. Once I get more > familiar with the code I'll fix that myself if Boxuan doesn't offer a > suggestion first. Possibly NULL was meant instead of NIL. NIL is specifically for a List. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise Postgres Company
On Wed, Sep 29, 2010 at 9:16 PM, Robert Haas <robertmhaas@gmail.com> wrote:
On Wed, Sep 29, 2010 at 2:44 AM, Greg Smith <greg@2ndquadrant.com> wrote:> One compiler warning I noticed that needs to get resolved:Possibly NULL was meant instead of NIL. NIL is specifically for a List.
>
> src/backend/commands/explain.c:
>
> explain.c: In function ‘ExplainMergeActions’:
> explain.c:1528: warning: comparison of distinct pointer types lacks a cast
>
> That is complaining about this section:
>
> if (mt_planstate->operation != CMD_MERGE ||
> mt_planstate->mt_mergeActPstates == NIL)
> return;
>
> So presumably that comparison with NIL needs a cast. Once I get more
> familiar with the code I'll fix that myself if Boxuan doesn't offer a
> suggestion first.
Yes, it should be NULL instead of NIL.
At first, I designed this filed as a List. But I changed it into an array in the last editions. This is why I have an unmatched assignment here. Sorry for that.
--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise Postgres Company
On Wed, Sep 29, 2010 at 2:44 AM, Greg Smith <greg@2ndquadrant.com> wrote: > Starting looking at the latest MERGE patch from Boxuan here tonight. The > regression tests pass for me here, good starting sign. I expect to move onto > trying to break it more creatively next, then onto performance tests. > Nothing much more exciting than that to report so far. Greg, are you still working on a review of this patch? -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Robert Haas wrote: > Greg, are you still working on a review of this patch? > Yes, just had more distractions while coming to speed up on this area than I'd hoped. I'll get a second round of looking at this done by the weekend. -- Greg Smith, 2ndQuadrant US greg@2ndQuadrant.com Baltimore, MD PostgreSQL Training, Services and Support www.2ndQuadrant.us
Boxuan Zhai wrote: > Yes, it should be NULL instead of NIL. OK, I applied that patch to my local copy and pushed it out to github: http://github.com/greg2ndQuadrant/postgres/commit/9013ba9e81490e3623add1b029760817021297c0 That represents what I tested against. However, when I tried to merge against HEAD once I was finished, I discovered this patch has bit-rotted significantly. If you have a local copy that works for you, I would not recommend pulling in the PostgreSQL repo updates done in the last couple of weeks yet. Friday's "Allow WITH clauses to be attached to INSERT, UPDATE, DELETE statements" commit in particular conflicts quite a bit with your changes. Attached is a rebased patch that applies to HEAD now after a round of fixes to resolve those. But it doesn't work yet, because of recent change to the ExecUpdate and ExecDelete functions you're calling from src/backend/executor/nodeModifyTable.c inside ExecMerge. If you can continue working on the patch without merging recent repo work, I can hack away at fixing that once I figure out what got changed there recently. It's taken some painful git work to sort out what I've done so far, there's more left to do, and I know that's not an area you specialize in. Back to the feature review...I dove into how I expected this to work, relative to what it actually does at the moment. That didn't really go too well so far, but I don't know that this represents any fundamental issue with the patch. Just situations the existing code didn't really anticipate we have to flush out. As a general community FYI here, while it's taken me a while to get up to speed on this whole feature, I expect to keep chugging away on this regardless of the CommitFest boundaries. This feature is both too big and too important to just stop working on it because a date has passed. Onto the test cases. The examples that Boxuan has been working with, and that the regression tests included with the patch exercise, all involve two tables being joined together using MERGE. The use case I decided to test instead was when you're trying to simulate an UPSERT where only a single table is involved. I couldn't get to this to work correctly. Maybe I'm just using MERGE wrong here, but I tried so many different variations without success (including one that's basically copied from Simon's original regression test set suggestions) that I suspect there may be a subtle problem with the implementation instead. To replicate the most straightforward variations of what I ran into, you can start with the same data that's populated by the regression test set: CREATE TABLE Stock(item_id int UNIQUE, balance int); INSERT INTO Stock VALUES (10, 2200); INSERT INTO Stock VALUES (20, 1900); SELECT * FROM Stock; item_id | balance ---------+--------- 10 | 2200 20 | 1900 If you now execute the following: MERGE INTO Stock t USING (SELECT * FROM Stock WHERE item_id=10) AS s ON s.item_id=t.item_id WHEN MATCHED THEN UPDATE SET balance=s.balance + 1 WHEN NOT MATCHED THEN INSERT VALUES (10,1) ; This works fine, and updates the matching row: item_id | balance ---------+--------- 20 | 1900 10 | 2201 But if I give it a key that doesn't exist instead: MERGE INTO Stock t USING (SELECT * FROM Stock WHERE item_id=30) AS s ON s.item_id=t.item_id WHEN MATCHED THEN UPDATE SET balance=s.balance + 1 WHEN NOT MATCHED THEN INSERT VALUES (30,1) ; This doesn't execute the NOT MATCHED case and INSERT the way I expected it to. It just gives back "MERGE 0". Since I wasn't sure if the whole "subquery in the USING clause" case was really implemented fully, I then tried to do this with something more like the working regression test examples. I expected this to do the same thing as the first example: MERGE INTO Stock t USING Stock s ON s.item_id=10 AND s.item_id=t.item_id WHEN MATCHED THEN UPDATE SET balance=s.balance + 1 WHEN NOT MATCHED THEN INSERT VALUES (10,1) ; But it gives back this: ERROR: duplicate key value violates unique constraint "stock_item_id_key" DETAIL: Key (item_id)=(10) already exists. Can't tell from that whether it's hitting the MATCHED or NOT MATCHED side of things to generate that. But it doesn't work any better if you give it an example that doesn't exist: MERGE INTO Stock t USING Stock s ON s.item_id=30 AND s.item_id=t.item_id WHEN MATCHED THEN UPDATE SET balance=s.balance + 1 WHEN NOT MATCHED THEN INSERT VALUES (30,1) ; ERROR: duplicate key value violates unique constraint "stock_item_id_key" DETAIL: Key (item_id)=(30) already exists. Now that one is really weird, because that key value certainly doesn't exist yet in the table. There seem to be a couple of issues in the area of joining a table with itself here. Feel free to tell me I just don't know how to use MERGE if that's the case in any of these. The other thing I noticed that may take some work to sort out is that I haven't had any luck getting MERGE to execute from within a plpgsql function. I was hoping I could use this to update the pgbench tables: CREATE OR REPLACE FUNCTION merge_account_balance(key INT, delta NUMERIC) RETURNS VOID AS $$ BEGIN MERGE INTO pgbench_accounts t USING (SELECT * FROM pgbench_accounts WHERE aid = key) AS s ON t.aid=s.aid WHEN MATCHED THEN UPDATE SET abalance = s.abalance + delta WHEN NOT MATCHED THEN INSERT VALUES (key,1+(key / 100000)::integer,delta,''); END; $$ LANGUAGE plpgsql; But I just get this instead: ERROR: "pgbench_accounts" is not a known variable LINE 4: MERGE INTO pgbench_accounts t USING (SELECT * FROM p... The other way I wrote the MERGE statement above (not using a subquery) does the same thing. I know that error messages is coming from the changes introduced in http://archives.postgresql.org/pgsql-committers/2010-01/msg00161.php but I'm not familiar enough with the whole grammar implementation to know what that means yet. That's what I found so far in my second pass over this. Once the first problem here is sorted out, I've already worked out how to test the performance of the code using pgbench. Have all the scripts ready to go once the correct MERGE statement is plugged into them, just ran into this same class of problems when I tried them. So far I was only able to see how fast the UPDATE path worked though, which isn't very helpful yet. My hope here is to test the MERGE implementation vs. the classic pl/pgsql implementation of UPSERT, calling both within a function so it's a fair comparison, and see how that goes. This may flush out concurrency bugs that are in the MERGE code as well. -- Greg Smith, 2ndQuadrant US greg@2ndQuadrant.com Baltimore, MD PostgreSQL Training, Services and Support www.2ndQuadrant.us diff --git a/doc/src/sgml/ref/allfiles.sgml b/doc/src/sgml/ref/allfiles.sgml index f5d67a2..29652ac 100644 --- a/doc/src/sgml/ref/allfiles.sgml +++ b/doc/src/sgml/ref/allfiles.sgml @@ -119,6 +119,7 @@ Complete list of usable sgml source files in this directory. <!entity listen system "listen.sgml"> <!entity load system "load.sgml"> <!entity lock system "lock.sgml"> +<!entity merge system "merge.sgml"> <!entity move system "move.sgml"> <!entity notify system "notify.sgml"> <!entity prepare system "prepare.sgml"> diff --git a/doc/src/sgml/ref/merge.sgml b/doc/src/sgml/ref/merge.sgml new file mode 100644 index 0000000..7c73623 --- /dev/null +++ b/doc/src/sgml/ref/merge.sgml @@ -0,0 +1,409 @@ +<!-- +$PostgreSQL$ +--> + +<refentry id="SQL-MERGE"> + <refmeta> + <refentrytitle id="SQL-MERGE-TITLE">MERGE</refentrytitle> + <refmiscinfo>SQL - Language Statements</refmiscinfo> + </refmeta> + + <refnamediv> + <refname>MERGE</refname> + <refpurpose>update, insert or delete rows of a table based upon source data</refpurpose> + </refnamediv> + + <indexterm zone="sql-merge"> + <primary>MERGE</primary> + </indexterm> + + <refsynopsisdiv> +<synopsis> +MERGE INTO <replaceable class="PARAMETER">table</replaceable> [ [ AS ] <replaceable class="parameter">alias</replaceable>] +USING <replaceable class="PARAMETER">source-query</replaceable> +ON <replaceable class="PARAMETER">join_condition</replaceable> +[<replaceable class="PARAMETER">when_clause</replaceable> [...]] + +where <replaceable class="PARAMETER">when_clause</replaceable> is + +{ WHEN MATCHED [ AND <replaceable class="PARAMETER">condition</replaceable> ] THEN { <replaceable class="PARAMETER">merge_update</replaceable>| DELETE | DO NOTHING | RAISE ERROR} + WHEN NOT MATCHED [ AND <replaceable class="PARAMETER">condition</replaceable> ] THEN { <replaceable class="PARAMETER">merge_insert</replaceable>| DO NOTHING | RAISE ERROR} } + +where <replaceable class="PARAMETER">merge_update</replaceable> is + +UPDATE SET { <replaceable class="PARAMETER">column</replaceable> = { <replaceable class="PARAMETER">expression</replaceable>| DEFAULT } | + ( <replaceable class="PARAMETER">column</replaceable> [, ...] ) = ( { <replaceable class="PARAMETER">expression</replaceable>| DEFAULT } [, ...] ) } [, ...] + +and <replaceable class="PARAMETER">merge_insert</replaceable> is + +INSERT [( <replaceable class="PARAMETER">column</replaceable> [, ...] )] { VALUES ( { <replaceable class="PARAMETER">expression</replaceable>| DEFAULT } [, ...] ) | DEFAULT VALUES } +</synopsis> + </refsynopsisdiv> + + <refsect1> + <title>Description</title> + + <para> + <command>MERGE</command> performs at most one action on each row from + the target table, driven by the rows from the source query. This + provides a way to specify a single SQL statement that can conditionally + <command>UPDATE</command> or <command>INSERT</command> rows, a task + that would otherwise require multiple procedural language statements. + </para> + + <para> + First, the <command>MERGE</command> command performs a left outer join + from source query to target table, producing zero or more merged rows. For + each merged row, <literal>WHEN</> clauses are evaluated in the + specified order until one of them is activated. The corresponding action + is then applied and processing continues for the next row. + </para> + + <para> + <command>MERGE</command> actions have the same effect as + regular <command>UPDATE</command>, <command>INSERT</command>, or + <command>DELETE</command> commands of the same names, though the syntax + is slightly different. + </para> + + <para> + If no <literal>WHEN</> clause activates then an implicit action of + <literal>RAISE ERROR</> is performed for that row. If that + implicit action is not desirable an explicit action of + <literal>DO NOTHING</> may be specified instead. + </para> + + <para> + <command>MERGE</command> will only affect rows only in the specified table. + </para> + + <para> + There is no <literal>RETURNING</> clause with <command>MERGE</command>. + </para> + + <para> + There is no MERGE privilege. + You must have the <literal>UPDATE</literal> privilege on the table + if you specify an update action, the <literal>INSERT</literal> privilege if + you specify an insert action and/or the <literal>DELETE</literal> privilege + if you wish to delete. You will also require the + <literal>SELECT</literal> privilege to any table whose values are read + in the <replaceable class="parameter">expressions</replaceable> or + <replaceable class="parameter">condition</replaceable>. + </para> + </refsect1> + + <refsect1> + <title>Parameters</title> + + <variablelist> + <varlistentry> + <term><replaceable class="PARAMETER">table</replaceable></term> + <listitem> + <para> + The name (optionally schema-qualified) of the table to merge into. + </para> + </listitem> + </varlistentry> + + <varlistentry> + <term><replaceable class="parameter">alias</replaceable></term> + <listitem> + <para> + A substitute name for the target table. When an alias is + provided, it completely hides the actual name of the table. For + example, given <literal>MERGE foo AS f</>, the remainder of the + <command>MERGE</command> statement must refer to this table as + <literal>f</> not <literal>foo</>. + </para> + </listitem> + </varlistentry> + + <varlistentry> + <term><replaceable class="PARAMETER">source-query</replaceable></term> + <listitem> + <para> + A query (<command>SELECT</command> statement or <command>VALUES</command> + statement) that supplies the rows to be merged into the target table. + Refer to the <xref linkend="sql-select"> + statement or <xref linkend="sql-values"> + statement for a description of the syntax. + </para> + </listitem> + </varlistentry> + + <varlistentry> + <term><replaceable class="PARAMETER">join_condition</replaceable></term> + <listitem> + <para> + <replaceable class="parameter">join_condition</replaceable> is + an expression resulting in a value of type + <type>boolean</type> (similar to a <literal>WHERE</literal> + clause) that specifies which rows in the join are considered to + match. You should ensure that the join produces at most one output + row for each row to be modified. An attempt to modify any row of the + target table more than once will result in an error. This behaviour + requires the user to take greater care in using <command>MERGE</command>, + though is required explicitly by the SQL Standard. + </para> + </listitem> + </varlistentry> + + <varlistentry> + <term><replaceable class="PARAMETER">condition</replaceable></term> + <listitem> + <para> + An expression that returns a value of type <type>boolean</type>. + If this expression returns <literal>true</> then the <literal>WHEN</> + clause will be activated and the corresponding action will occur for + that row. + </para> + </listitem> + </varlistentry> + + <varlistentry> + <term><replaceable class="PARAMETER">merge_update</replaceable></term> + <listitem> + <para> + The specification of an <literal>UPDATE</> action. Do not include + the table name, as you would normally do with an + <xref linkend="sql-update"> command. + For example, <literal>UPDATE tab SET col = 1</> is invalid. Also, + do not include a <literal>WHERE</> clause, since only the current + can be updated. For example, + <literal>UPDATE SET col = 1 WHERE key = 57</> is invalid. + </para> + </listitem> + </varlistentry> + + <varlistentry> + <term><replaceable class="PARAMETER">merge_insert</replaceable></term> + <listitem> + <para> + The specification of an <literal>INSERT</> action. Do not include + the table name, as you would normally do with an + <xref linkend="sql-insert"> command. + For example, <literal>INSERT INTO tab VALUES (1, 50)</> is invalid. + </para> + </listitem> + </varlistentry> + + <varlistentry> + <term><replaceable class="PARAMETER">column</replaceable></term> + <listitem> + <para> + The name of a column in <replaceable + class="PARAMETER">table</replaceable>. + The column name can be qualified with a subfield name or array + subscript, if needed. Do not include the table's name in the + specification of a target column — for example, + <literal>UPDATE SET tab.col = 1</> is invalid. + </para> + </listitem> + </varlistentry> + + <varlistentry> + <term><replaceable class="PARAMETER">expression</replaceable></term> + <listitem> + <para> + An expression to assign to the column. The expression can use the + old values of this and other columns in the table. + </para> + </listitem> + </varlistentry> + + <varlistentry> + <term><literal>DEFAULT</literal></term> + <listitem> + <para> + Set the column to its default value (which will be NULL if no + specific default expression has been assigned to it). + </para> + </listitem> + </varlistentry> + + </variablelist> + </refsect1> + + <refsect1> + <title>Outputs</title> + + <para> + On successful completion, a <command>MERGE</> command returns a command + tag of the form +<screen> +MERGE <replaceable class="parameter">total-count</replaceable> +</screen> + The <replaceable class="parameter">total-count</replaceable> is the number + of rows changed (either updated, inserted or deleted). + If <replaceable class="parameter">total-count</replaceable> is 0, no rows + were changed (this is not considered an error). + </para> + + <para> + The number of rows updated, inserted or deleted is not available as part + of the command tag. An optional NOTIFY message can be generated to + present this information, if desired. +<screen> +NOTIFY: 34 rows processed: 11 updated, 5 deleted, 15 inserted, 3 default inserts, 0 no action +</screen> + </para> + + </refsect1> + + <refsect1> + <title>Notes</title> + + <para> + What essentially happens is that the target table is left outer-joined to + the tables mentioned in the <replaceable>source-query</replaceable>, and + each output row of the join may then activate at most one when-clause. + The row will be matched only once per statement, so the status of + <literal>MATCHED</> or <literal>NOT MATCHED</> cannot change once testing + of <literal>WHEN</> clauses has begun. <command>MERGE</command> will not + invoke Rules. + </para> + + <para> + The following steps take place during the execution of + <command>MERGE</command>. + <orderedlist> + <listitem> + <para> + Perform any BEFORE STATEMENT triggers for actions specified, whether or + not they actually occur. + </para> + </listitem> + <listitem> + <para> + Perform left outer join from source to target table. Then for each row: + <orderedlist> + <listitem> + <para> + Evaluate whether each row is MATCHED or NOT MATCHED. + </para> + </listitem> + <listitem> + <para> + Test each WHEN condition in the order specified until one activates. + Identify the action and its event type. + </para> + </listitem> + <listitem> + <para> + Perform any BEFORE ROW triggers that fire for the action's event type. + </para> + </listitem> + <listitem> + <para> + Apply the action specified. + </para> + </listitem> + <listitem> + <para> + Perform any AFTER ROW triggers that fire for the action's event type. + </para> + </listitem> + </orderedlist> + </para> + </listitem> + <listitem> + <para> + Perform any AFTER STATEMENT triggers for actions specified, whether or + not they actually occur. + </para> + </listitem> + </orderedlist> + In summary, statement triggers for an event type (say, INSERT) will + be fired whenever we <emphasis>specify</> an action of that kind. Row-level + triggers will fire only for event type <emphasis>activated</>. + So a <command>MERGE</command> might fire statement triggers for both + <literal>UPDATE</> and <literal>INSERT</>, even though only + <literal>UPDATE</> row triggers were fired. + </para> + + </refsect1> + + <refsect1> + <title>Examples</title> + + <para> + Attempt to insert a new stock item along with the quantity of stock. If + the item already exists, instead update the stock count of the existing + item. +<programlisting> +MERGE INTO wines w +USING (VALUES('Chateau Lafite 2003', '24')) v +ON v.column1 = w.winename +WHEN NOT MATCHED THEN + INSERT VALUES(v.column1, v.column2) +WHEN MATCHED THEN + UPDATE SET stock = stock + v.column2; +</programlisting> + </para> + + <para> + Perform maintenance on CustomerAccounts based upon new Transactions. + The following statement will fail if any accounts have had more than + one transaction + +<programlisting> +MERGE CustomerAccount CA + +USING (SELECT CustomerId, TransactionValue, + FROM Transactions + WHERE TransactionId > 35345678) AS T + +ON T.CustomerId = CA.CustomerId + +WHEN MATCHED THEN + UPDATE SET Balance = Balance - TransactionValue + +WHEN NOT MATCHED THEN + INSERT (CustomerId, Balance) + VALUES (T.CustomerId, T.TransactionValue) +; +</programlisting> + + so the right way to do this is to pre-aggregate the data + +<programlisting> +MERGE CustomerAccount CA + +USING (SELECT CustomerId, Sum(TransactionValue) As TransactionSum + FROM Transactions + WHERE TransactionId > 35345678 + GROUP BY CustomerId) AS T + +ON T.CustomerId = CA.CustomerId + +WHEN MATCHED THEN + UPDATE SET Balance = Balance - TransactionSum + +WHEN NOT MATCHED THEN + INSERT (CustomerId, Balance) + VALUES (T.CustomerId, T.TransactionSum) +; +</programlisting> + </para> + + </refsect1> + + <refsect1> + <title>Compatibility</title> + + <para> + This command conforms to the <acronym>SQL</acronym> standard, except + that the <literal>DELETE</literal> and <literal>DO NOTHING</> actions + are <productname>PostgreSQL</productname> extensions. + </para> + + <para> + According to the standard, the column-list syntax for an <literal>UPDATE</> + action should allow a list of columns to be assigned from a single + row-valued expression. + This is not currently implemented — the source must be a list + of independent expressions. + </para> + </refsect1> +</refentry> diff --git a/doc/src/sgml/ref/update.sgml b/doc/src/sgml/ref/update.sgml index 5968db1..4b681bb 100644 --- a/doc/src/sgml/ref/update.sgml +++ b/doc/src/sgml/ref/update.sgml @@ -340,6 +340,9 @@ UPDATE wines SET stock = stock + 24 WHERE winename = 'Chateau Lafite 2003'; -- continue with other operations, and eventually COMMIT; </programlisting> + + This operation can be executed in a single statement using + <xref linkend="sql-merge" endterm="sql-merge-title">. </para> <para> diff --git a/doc/src/sgml/reference.sgml b/doc/src/sgml/reference.sgml index 463746c..cf9d678 100644 --- a/doc/src/sgml/reference.sgml +++ b/doc/src/sgml/reference.sgml @@ -147,6 +147,7 @@ &listen; &load; &lock; + &merge; &move; ¬ify; &prepare; diff --git a/src/backend/commands/explain.c b/src/backend/commands/explain.c index f494ec9..a5814a7 100644 --- a/src/backend/commands/explain.c +++ b/src/backend/commands/explain.c @@ -81,6 +81,8 @@ static void show_sort_keys_common(PlanState *planstate, static void show_sort_info(SortState *sortstate, ExplainState *es); static void show_hash_info(HashState *hashstate, ExplainState *es); static const char *explain_get_index_name(Oid indexId); +static void ExplainMergeActions(ModifyTableState *mt_planstate, + List *ancestors, ExplainState *es); static void ExplainScanTarget(Scan *plan, ExplainState *es); static void ExplainMemberNodes(List *plans, PlanState **planstates, List *ancestors, ExplainState *es); @@ -644,6 +646,9 @@ ExplainNode(PlanState *planstate, List *ancestors, case CMD_DELETE: pname = operation = "Delete"; break; + case CMD_MERGE: + pname = operation = "Merge"; + break; default: pname = "???"; break; @@ -1207,6 +1212,11 @@ ExplainNode(PlanState *planstate, List *ancestors, if (innerPlanState(planstate)) ExplainNode(innerPlanState(planstate), ancestors, "Inner", NULL, es); + + if (IsA(plan, ModifyTable) && + ((ModifyTable *)plan)->operation == CMD_MERGE) + ExplainMergeActions((ModifyTableState *) planstate, + ancestors, es); /* special child plans */ switch (nodeTag(plan)) @@ -1547,6 +1557,89 @@ explain_get_index_name(Oid indexId) return result; } +static void +ExplainMergeActions(ModifyTableState *mt_planstate, List *ancestors, + ExplainState *es) +{ + ListCell *l; + int actno = 1; + StringInfo buf = makeStringInfo(); + StringInfo acttitle = makeStringInfo(); + MergeActionSet *actset; + + if (mt_planstate->operation != CMD_MERGE || + mt_planstate->mt_mergeActPstates == NULL) + return; + + actset = mt_planstate->mt_mergeActPstates[0]; + + foreach(l, actset->actions) + { + ModifyTableState *mt_state = (ModifyTableState *) lfirst(l); + MergeActionState *act_pstate = (MergeActionState *) mt_state->mt_plans[0]; + MergeAction *act_plan = (MergeAction *) act_pstate->ps.plan; + + /*prepare the title*/ + resetStringInfo(acttitle); + appendStringInfo(acttitle, "Action %d", actno); + + /*prepare the string for printing*/ + resetStringInfo(buf); + switch(act_pstate->operation) + { + case CMD_INSERT: + appendStringInfoString(buf, "Insert When "); + break; + case CMD_UPDATE: + appendStringInfoString(buf, "Update When "); + break; + case CMD_DELETE: + appendStringInfoString(buf, "Delete When "); + break; + case CMD_DONOTHING: + appendStringInfoString(buf, "Do Nothing When "); + break; + default: + elog(ERROR, "unknown merge action"); + } + + if (act_plan->matched) + appendStringInfoString(buf, "Matched "); + else + appendStringInfoString(buf, "Not Mactched "); + + if (act_plan->flattenedqual) + appendStringInfoString(buf, "And "); + + /*print the action type*/ + ExplainPropertyText(acttitle->data, buf->data, es); + + /*print the action qual*/ + show_qual(act_plan->flattenedqual, "Qual", + &act_pstate->ps, ancestors, true, es); + + /*print the target list of action*/ + if (es->verbose && + (act_plan->operation == CMD_INSERT || + act_plan->operation == CMD_UPDATE)) + { + List *orignialtlist; + + orignialtlist = act_plan->plan.targetlist; + act_plan->plan.targetlist = act_plan->flattenedtlist; + show_plan_tlist((PlanState *) act_pstate, ancestors, es); + act_plan->plan.targetlist = orignialtlist; + } + + if (act_pstate->ps.subPlan) + ExplainSubPlans(act_pstate->ps.subPlan, ancestors, "SubPlan", es); + + actno++; + } + + ExplainPropertyText("MainPlan", "", es); +} + /* * Show the target of a Scan node */ diff --git a/src/backend/commands/trigger.c b/src/backend/commands/trigger.c index d69fdcf..0bd0b68 100644 --- a/src/backend/commands/trigger.c +++ b/src/backend/commands/trigger.c @@ -2443,6 +2443,87 @@ ExecASTruncateTriggers(EState *estate, ResultRelInfo *relinfo) false, NULL, NULL, NIL, NULL); } +void +ExecBSMergeTriggers(ModifyTableState *mt_state) +{ + ListCell *l; + MergeActionSet *actset; + bool doUpdateTriggers = false; + bool doInsertTriggers = false; + bool doDeleteTriggers = false; + + /* Scan the actions to see what kind of statements there is */ + actset = mt_state->mt_mergeActPstates[0]; + foreach(l, actset->actions) + { + ModifyTableState *actmtstate; + MergeActionState *actPstate; + MergeAction *actplan; + + actmtstate = (ModifyTableState *) lfirst(l); + actPstate = (MergeActionState *) actmtstate->mt_plans[0]; + actplan = (MergeAction *) actPstate->ps.plan; + + if (actplan->operation == CMD_UPDATE) + doUpdateTriggers = true; + else if (actplan->operation == CMD_INSERT) + doInsertTriggers = true; + else if (actplan->operation == CMD_DELETE) + doDeleteTriggers = true; + } + + /* And fire the triggers */ + if (doUpdateTriggers) + ExecBSUpdateTriggers(mt_state->ps.state, + mt_state->ps.state->es_result_relations); + if (doInsertTriggers) + ExecBSInsertTriggers(mt_state->ps.state, + mt_state->ps.state->es_result_relations); + if (doDeleteTriggers) + ExecBSDeleteTriggers(mt_state->ps.state, + mt_state->ps.state->es_result_relations); +} + +void +ExecASMergeTriggers(ModifyTableState *mt_state) +{ + ListCell *l; + MergeActionSet *actset; + bool doUpdateTriggers = false; + bool doInsertTriggers = false; + bool doDeleteTriggers = false; + + /* Scan the actions to see what kind of statements there is */ + actset = mt_state->mt_mergeActPstates[0]; + foreach(l, actset->actions) + { + ModifyTableState *actmtstate; + MergeActionState *actPstate; + MergeAction *actplan; + + actmtstate = (ModifyTableState *)lfirst(l); + actPstate = (MergeActionState *)actmtstate->mt_plans[0]; + actplan = (MergeAction *)actPstate->ps.plan; + + if(actplan->operation == CMD_UPDATE) + doUpdateTriggers = true; + else if(actplan->operation == CMD_INSERT) + doInsertTriggers = true; + else if(actplan->operation == CMD_DELETE) + doDeleteTriggers = true; + } + + /* And fire the triggers */ + if (doUpdateTriggers) + ExecASUpdateTriggers(mt_state->ps.state, + mt_state->ps.state->es_result_relations); + if (doInsertTriggers) + ExecASInsertTriggers(mt_state->ps.state, + mt_state->ps.state->es_result_relations); + if (doDeleteTriggers) + ExecASDeleteTriggers(mt_state->ps.state, + mt_state->ps.state->es_result_relations); +} static HeapTuple GetTupleForTrigger(EState *estate, diff --git a/src/backend/executor/execMain.c b/src/backend/executor/execMain.c index 69f3a28..8637fa8 100644 --- a/src/backend/executor/execMain.c +++ b/src/backend/executor/execMain.c @@ -170,6 +170,7 @@ standard_ExecutorStart(QueryDesc *queryDesc, int eflags) case CMD_INSERT: case CMD_DELETE: case CMD_UPDATE: + case CMD_MERGE: estate->es_output_cid = GetCurrentCommandId(true); break; diff --git a/src/backend/executor/execProcnode.c b/src/backend/executor/execProcnode.c index edd175e..2be0491 100644 --- a/src/backend/executor/execProcnode.c +++ b/src/backend/executor/execProcnode.c @@ -154,6 +154,11 @@ ExecInitNode(Plan *node, EState *estate, int eflags) estate, eflags); break; + case T_MergeAction: + result = (PlanState *) ExecInitMergeAction((MergeAction *) node, + estate, eflags); + break; + case T_Append: result = (PlanState *) ExecInitAppend((Append *) node, estate, eflags); diff --git a/src/backend/executor/nodeModifyTable.c b/src/backend/executor/nodeModifyTable.c index 541adaf..893794f 100644 --- a/src/backend/executor/nodeModifyTable.c +++ b/src/backend/executor/nodeModifyTable.c @@ -704,6 +704,103 @@ lreplace:; return NULL; } +static TupleTableSlot * +ExecMerge(ItemPointer tupleid, + TupleTableSlot *slot, + TupleTableSlot *planSlot, + MergeActionSet *actset, + EState *estate) +{ + + TupleTableSlot *actslot = NULL; + ListCell *each; + + /* + * Try the merge actions one by one until we have a match. + */ + foreach(each, actset->actions) + { + ModifyTableState *mt_pstate; + MergeActionState *action_pstate; + ExprContext *econtext; + bool matched; + + mt_pstate = (ModifyTableState *) lfirst(each); + Assert(IsA(mt_pstate, ModifyTableState)); + + /* + * mt_pstate is supposed to have only ONE mt_plans, + * which is a MergeActionState + */ + action_pstate = (MergeActionState *) mt_pstate->mt_plans[0]; + matched = ((MergeAction *)action_pstate->ps.plan)->matched; + + /* + * If tupleid == NULL, it is a NOT MATCHED case, + * else, it is a MATCHED case, + */ + if ((tupleid == NULL && matched) || + (tupleid != NULL && !matched)) + continue; + + /* Setup the expression context. */ + econtext = action_pstate->ps.ps_ExprContext; + + /* + * Check that additional quals match, if any. + */ + if (action_pstate->ps.qual) + { + ResetExprContext(econtext); + + econtext->ecxt_scantuple = slot; + econtext->ecxt_outertuple = planSlot; + + if (!ExecQual(action_pstate->ps.qual, econtext, false)) + continue; + } + + /* Ok, we have a match. Perform the action */ + + /* First project any RETURNING result tuple slot, if needed */ + if (action_pstate->operation == CMD_INSERT || + action_pstate->operation == CMD_UPDATE) + actslot = ExecProcessReturning(action_pstate->ps.ps_ProjInfo, + slot, planSlot); + + switch (action_pstate->operation) + { + case CMD_INSERT: + return ExecInsert(actslot, planSlot, estate); + + case CMD_UPDATE: + return ExecUpdate(tupleid, + actslot, + planSlot, + &mt_pstate->mt_epqstate, + estate); + + case CMD_DELETE: + return ExecDelete(tupleid, + planSlot, + &mt_pstate->mt_epqstate, + estate); + + case CMD_DONOTHING: + return NULL; + + default: + elog(ERROR, "unknown merge action type for execute"); + break; + } + } + + /* + * No matching action found. Perform the default action, which is + * DO NOTHING. + */ + return NULL; +} /* * Process BEFORE EACH STATEMENT triggers @@ -725,6 +822,9 @@ fireBSTriggers(ModifyTableState *node) ExecBSDeleteTriggers(node->ps.state, node->ps.state->es_result_relations); break; + case CMD_MERGE: + ExecBSMergeTriggers(node); + break; default: elog(ERROR, "unknown operation"); break; @@ -751,6 +851,9 @@ fireASTriggers(ModifyTableState *node) ExecASDeleteTriggers(node->ps.state, node->ps.state->es_result_relations); break; + case CMD_MERGE: + ExecASMergeTriggers(node); + break; default: elog(ERROR, "unknown operation"); break; @@ -777,6 +880,7 @@ ExecModifyTable(ModifyTableState *node) ItemPointer tupleid = NULL; ItemPointerData tuple_ctid; HeapTupleHeader oldtuple = NULL; + MergeActionSet *mergeActSet = NULL; /* * On first call, fire BEFORE STATEMENT triggers before proceeding. @@ -798,6 +902,8 @@ ExecModifyTable(ModifyTableState *node) /* Preload local variables */ subplanstate = node->mt_plans[node->mt_whichplan]; junkfilter = estate->es_result_relation_info->ri_junkFilter; + if(node->mt_mergeActPstates) + mergeActSet = node->mt_mergeActPstates[node->mt_whichplan]; /* * Fetch rows from subplan(s), and execute the required table modification @@ -824,6 +930,8 @@ ExecModifyTable(ModifyTableState *node) estate->es_result_relation_info++; subplanstate = node->mt_plans[node->mt_whichplan]; junkfilter = estate->es_result_relation_info->ri_junkFilter; + if(node->mt_mergeActPstates) + mergeActSet = node->mt_mergeActPstates[node->mt_whichplan]; EvalPlanQualSetPlan(&node->mt_epqstate, subplanstate->plan); continue; } @@ -839,7 +947,7 @@ ExecModifyTable(ModifyTableState *node) /* * extract the 'ctid' or 'wholerow' junk attribute. */ - if (operation == CMD_UPDATE || operation == CMD_DELETE) + if (operation == CMD_UPDATE || operation == CMD_DELETE || operation == CMD_MERGE) { Datum datum; bool isNull; @@ -849,13 +957,24 @@ ExecModifyTable(ModifyTableState *node) datum = ExecGetJunkAttribute(slot, junkfilter->jf_junkAttNo, &isNull); - /* shouldn't ever get a null result... */ if (isNull) - elog(ERROR, "ctid is NULL"); + { + /* + * Shouldn't ever get a null result for UPDATE or DELETE. + * MERGE gets a null ctid in "NOT MATCHED" case + */ + if (operation != CMD_MERGE) + elog(ERROR, "ctid is NULL"); + else + tupleid = NULL; + } + else + { - tupleid = (ItemPointer) DatumGetPointer(datum); - tuple_ctid = *tupleid; /* be sure we don't free ctid!! */ - tupleid = &tuple_ctid; + tupleid = (ItemPointer) DatumGetPointer(datum); + tuple_ctid = *tupleid; /* be sure we don't free the ctid!! */ + tupleid = &tuple_ctid; + } } else { @@ -890,6 +1009,10 @@ ExecModifyTable(ModifyTableState *node) slot = ExecDelete(tupleid, oldtuple, planSlot, &node->mt_epqstate, estate); break; + case CMD_MERGE: + slot = ExecMerge(tupleid, slot, planSlot, + mergeActSet, estate); + break; default: elog(ERROR, "unknown operation"); break; @@ -917,6 +1040,69 @@ ExecModifyTable(ModifyTableState *node) return NULL; } +/* + * When init a merge plan, we also need init its action plans. + * These action plans are "MergeAction" plans. + * + * This function mainly handles the tlist and qual in the plan. + * The returning result is a "MergeActionState". + */ +MergeActionState * +ExecInitMergeAction(MergeAction *node, EState *estate, int eflags) +{ + MergeActionState *result; + + /* + * do nothing when we get to the end of a leaf on tree. + */ + if (node == NULL) + return NULL; + + /* + * create state structure + */ + result = makeNode(MergeActionState); + result->operation = node->operation; + result->ps.plan = (Plan *)node; + result->ps.state = estate; + + /* + * tuple table initialization + */ + ExecInitResultTupleSlot(estate, &result->ps); + + /* + * initialize tuple type + */ + ExecAssignResultTypeFromTL(&result->ps); + + /* + * create expression context for node + */ + ExecAssignExprContext(estate, &result->ps); + + /* + * initialize child expressions + */ + result->ps.targetlist = (List *) + ExecInitExpr((Expr *) node->plan.targetlist, &result->ps); + + result->ps.qual = (List *) + ExecInitExpr((Expr *) node->plan.qual, &result->ps); + + /* + * init the projection information + */ + ExecAssignProjectionInfo(&result->ps, NULL); + + /* + * XXX: do we need a check for the plan output here ? + * (by calling the ExecCheckPlanOutput() function + */ + + return result; +} + /* ---------------------------------------------------------------- * ExecInitModifyTable * ---------------------------------------------------------------- @@ -932,6 +1118,7 @@ ExecInitModifyTable(ModifyTable *node, EState *estate, int eflags) Plan *subplan; ListCell *l; int i; + bool isMergeAction = false; /* check for unsupported flags */ Assert(!(eflags & (EXEC_FLAG_BACKWARD | EXEC_FLAG_MARK))); @@ -972,6 +1159,16 @@ ExecInitModifyTable(ModifyTable *node, EState *estate, int eflags) foreach(l, node->plans) { subplan = (Plan *) lfirst(l); + + /* + * test if this subplan node is a MergeAction. + * We need this information for setting the junkfilter. + * junkfilter is necessary for an ordinary UPDATE/DELETE plan, + * but not for an UPDATE/DELETE merge action + */ + if (IsA(subplan, MergeAction)) + isMergeAction = true; + mtstate->mt_plans[i] = ExecInitNode(subplan, estate, eflags); estate->es_result_relation_info++; i++; @@ -1101,7 +1298,11 @@ ExecInitModifyTable(ModifyTable *node, EState *estate, int eflags) break; case CMD_UPDATE: case CMD_DELETE: - junk_filter_needed = true; + case CMD_MERGE: + if(!isMergeAction) + junk_filter_needed = true; + break; + case CMD_DONOTHING: break; default: elog(ERROR, "unknown operation"); @@ -1124,9 +1325,10 @@ ExecInitModifyTable(ModifyTable *node, EState *estate, int eflags) resultRelInfo->ri_RelationDesc->rd_att->tdhasoid, ExecInitExtraTupleSlot(estate)); - if (operation == CMD_UPDATE || operation == CMD_DELETE) + if (operation == CMD_UPDATE || operation == CMD_DELETE || + operation == CMD_MERGE) { - /* For UPDATE/DELETE, find the appropriate junk attr now */ + /* For UPDATE/DELETE/MERGE, find the appropriate junk attr now */ if (resultRelInfo->ri_RelationDesc->rd_rel->relkind == RELKIND_RELATION) { j->jf_junkAttNo = ExecFindJunkAttribute(j, "ctid"); @@ -1161,6 +1363,36 @@ ExecInitModifyTable(ModifyTable *node, EState *estate, int eflags) if (estate->es_trig_tuple_slot == NULL) estate->es_trig_tuple_slot = ExecInitExtraTupleSlot(estate); + /* + * for the merge actions, we need to do similar things as above. + *Each action in each action set should be init by the ExecInitNode. + *The returned planstates will take the place of the original plan nodes. + *These new action set will be put in the mt_mergeActPstates array. + */ + if(node->operation == CMD_MERGE) + { + /*we have one action set for each result relation (main plan)*/ + mtstate->mt_mergeActPstates = + (MergeActionSet **) palloc0(sizeof(MergeActionSet*) * nplans); + + estate->es_result_relation_info = estate->es_result_relations; + i = 0; + foreach(l, node->mergeActPlan) + { + ListCell *e; + MergeActionSet *actset = (MergeActionSet *) lfirst(l); + + foreach(e, actset->actions) + { + lfirst(e) = ExecInitNode((Plan *)lfirst(e), estate, 0); + } + mtstate->mt_mergeActPstates[i] = actset; + estate->es_result_relation_info++; + i++; + } + estate->es_result_relation_info = NULL; + } + return mtstate; } diff --git a/src/backend/nodes/copyfuncs.c b/src/backend/nodes/copyfuncs.c index 508d7c7..4a57aa7 100644 --- a/src/backend/nodes/copyfuncs.c +++ b/src/backend/nodes/copyfuncs.c @@ -177,6 +177,42 @@ _copyModifyTable(ModifyTable *from) COPY_NODE_FIELD(returningLists); COPY_NODE_FIELD(rowMarks); COPY_SCALAR_FIELD(epqParam); + COPY_NODE_FIELD(mergeActPlan); + + return newnode; +} + +/* + * _copyMergeAction + */ +static MergeAction * +_copyMergeAction(MergeAction *from) +{ + MergeAction *newnode = makeNode(MergeAction); + + /* + * copy node superclass fields + */ + CopyPlanFields((Plan *) from, (Plan *) newnode); + + COPY_SCALAR_FIELD(operation); + COPY_SCALAR_FIELD(matched); + COPY_NODE_FIELD(flattenedqual); + COPY_NODE_FIELD(flattenedtlist); + + return newnode; +} + +/* + * _copyMergeActionSet + */ +static MergeActionSet * +_copyMergeActionSet(MergeActionSet *from) +{ + MergeActionSet *newnode = makeNode(MergeActionSet); + + COPY_SCALAR_FIELD(result_relation); + COPY_NODE_FIELD(actions); return newnode; } @@ -2300,6 +2336,10 @@ _copyQuery(Query *from) COPY_NODE_FIELD(rowMarks); COPY_NODE_FIELD(setOperations); COPY_NODE_FIELD(constraintDeps); + COPY_SCALAR_FIELD(isMergeAction); + COPY_SCALAR_FIELD(matched); + COPY_SCALAR_FIELD(sourceAttrNo); + COPY_NODE_FIELD(mergeActQry); return newnode; } @@ -2309,6 +2349,7 @@ _copyInsertStmt(InsertStmt *from) { InsertStmt *newnode = makeNode(InsertStmt); + COPY_SCALAR_FIELD(isMergeAction); COPY_NODE_FIELD(relation); COPY_NODE_FIELD(cols); COPY_NODE_FIELD(selectStmt); @@ -2323,6 +2364,7 @@ _copyDeleteStmt(DeleteStmt *from) { DeleteStmt *newnode = makeNode(DeleteStmt); + COPY_SCALAR_FIELD(isMergeAction); COPY_NODE_FIELD(relation); COPY_NODE_FIELD(usingClause); COPY_NODE_FIELD(whereClause); @@ -2337,6 +2379,7 @@ _copyUpdateStmt(UpdateStmt *from) { UpdateStmt *newnode = makeNode(UpdateStmt); + COPY_SCALAR_FIELD(isMergeAction); COPY_NODE_FIELD(relation); COPY_NODE_FIELD(targetList); COPY_NODE_FIELD(whereClause); @@ -2374,6 +2417,47 @@ _copySelectStmt(SelectStmt *from) return newnode; } +static MergeStmt * +_copyMergeStmt(MergeStmt *from) +{ + MergeStmt *newnode = makeNode(MergeStmt); + + COPY_NODE_FIELD(relation); + COPY_NODE_FIELD(source); + COPY_NODE_FIELD(matchCondition); + COPY_NODE_FIELD(actions); + + return newnode; +} + +static MergeConditionAction * +_copyMergeConditionAction(MergeConditionAction *from) +{ + MergeConditionAction *newnode = makeNode(MergeConditionAction); + + COPY_SCALAR_FIELD(match); + COPY_NODE_FIELD(condition); + COPY_NODE_FIELD(action); + + return newnode; +} + +static MergeDoNothing * +_copyMergeDoNothing(MergeDoNothing *from) +{ + MergeDoNothing *newnode = makeNode(MergeDoNothing); + + return newnode; +} + +static MergeError* +_copyMergeError(MergeError *from) +{ + MergeError *newnode = makeNode(MergeError); + + return newnode; +} + static SetOperationStmt * _copySetOperationStmt(SetOperationStmt *from) { @@ -3650,6 +3734,12 @@ copyObject(void *from) case T_ModifyTable: retval = _copyModifyTable(from); break; + case T_MergeAction: + retval = _copyMergeAction(from); + break; + case T_MergeActionSet: + retval = _copyMergeActionSet(from); + break; case T_Append: retval = _copyAppend(from); break; @@ -3953,6 +4043,18 @@ copyObject(void *from) case T_SelectStmt: retval = _copySelectStmt(from); break; + case T_MergeStmt: + retval = _copyMergeStmt(from); + break; + case T_MergeConditionAction: + retval = _copyMergeConditionAction(from); + break; + case T_MergeDoNothing: + retval = _copyMergeDoNothing(from); + break; + case T_MergeError: + retval = _copyMergeError(from); + break; case T_SetOperationStmt: retval = _copySetOperationStmt(from); break; diff --git a/src/backend/nodes/equalfuncs.c b/src/backend/nodes/equalfuncs.c index 19262aa..a95057b 100644 --- a/src/backend/nodes/equalfuncs.c +++ b/src/backend/nodes/equalfuncs.c @@ -879,6 +879,10 @@ _equalQuery(Query *a, Query *b) COMPARE_NODE_FIELD(rowMarks); COMPARE_NODE_FIELD(setOperations); COMPARE_NODE_FIELD(constraintDeps); + COMPARE_SCALAR_FIELD(isMergeAction); + COMPARE_SCALAR_FIELD(matched); + COMPARE_SCALAR_FIELD(sourceAttrNo); + COMPARE_NODE_FIELD(mergeActQry); return true; } @@ -886,6 +890,7 @@ _equalQuery(Query *a, Query *b) static bool _equalInsertStmt(InsertStmt *a, InsertStmt *b) { + COMPARE_SCALAR_FIELD(isMergeAction); COMPARE_NODE_FIELD(relation); COMPARE_NODE_FIELD(cols); COMPARE_NODE_FIELD(selectStmt); @@ -898,6 +903,7 @@ _equalInsertStmt(InsertStmt *a, InsertStmt *b) static bool _equalDeleteStmt(DeleteStmt *a, DeleteStmt *b) { + COMPARE_SCALAR_FIELD(isMergeAction); COMPARE_NODE_FIELD(relation); COMPARE_NODE_FIELD(usingClause); COMPARE_NODE_FIELD(whereClause); @@ -910,6 +916,7 @@ _equalDeleteStmt(DeleteStmt *a, DeleteStmt *b) static bool _equalUpdateStmt(UpdateStmt *a, UpdateStmt *b) { + COMPARE_SCALAR_FIELD(isMergeAction); COMPARE_NODE_FIELD(relation); COMPARE_NODE_FIELD(targetList); COMPARE_NODE_FIELD(whereClause); diff --git a/src/backend/nodes/outfuncs.c b/src/backend/nodes/outfuncs.c index ee2aeb0..3106040 100644 --- a/src/backend/nodes/outfuncs.c +++ b/src/backend/nodes/outfuncs.c @@ -332,6 +332,7 @@ _outModifyTable(StringInfo str, ModifyTable *node) WRITE_NODE_FIELD(returningLists); WRITE_NODE_FIELD(rowMarks); WRITE_INT_FIELD(epqParam); + WRITE_NODE_FIELD(mergeActPlan); } static void @@ -2062,6 +2063,53 @@ _outQuery(StringInfo str, Query *node) WRITE_NODE_FIELD(rowMarks); WRITE_NODE_FIELD(setOperations); WRITE_NODE_FIELD(constraintDeps); + WRITE_BOOL_FIELD(isMergeAction); + WRITE_BOOL_FIELD(matched); + WRITE_INT_FIELD(sourceAttrNo); + WRITE_NODE_FIELD(mergeActQry); +} + +static void +_outMergeConditionAction(StringInfo str, MergeConditionAction *node) +{ + WRITE_NODE_TYPE("MERGECONDITIONACTION"); + + WRITE_BOOL_FIELD(match); + WRITE_NODE_FIELD(condition); + WRITE_NODE_FIELD(action); +} + +static void +_outMergeStmt(StringInfo str, MergeStmt *node) +{ + WRITE_NODE_TYPE("MERGESTMT"); + + WRITE_NODE_FIELD(relation); + WRITE_NODE_FIELD(source); + WRITE_NODE_FIELD(matchCondition); + WRITE_NODE_FIELD(actions); +} + +static void +_outMergeAction(StringInfo str, MergeAction *node) +{ + WRITE_NODE_TYPE("MERGEACTION"); + + _outPlanInfo(str, (Plan *)node); + + WRITE_ENUM_FIELD(operation, CmdType); + WRITE_BOOL_FIELD(matched); + WRITE_NODE_FIELD(flattenedqual); + WRITE_NODE_FIELD(flattenedtlist); +} + +static void +_outMergeActionSet(StringInfo str, MergeActionSet *node) +{ + WRITE_NODE_TYPE("MERGEACTIONSET"); + + WRITE_INT_FIELD(result_relation); + WRITE_NODE_FIELD(actions); } static void @@ -2953,6 +3001,18 @@ _outNode(StringInfo str, void *obj) case T_XmlSerialize: _outXmlSerialize(str, obj); break; + case T_MergeAction: + _outMergeAction(str, obj); + break; + case T_MergeActionSet: + _outMergeActionSet(str, obj); + break; + case T_MergeStmt: + _outMergeStmt(str, obj); + break; + case T_MergeConditionAction: + _outMergeConditionAction(str,obj); + break; default: diff --git a/src/backend/nodes/readfuncs.c b/src/backend/nodes/readfuncs.c index 9a4e03e..e06c5b8 100644 --- a/src/backend/nodes/readfuncs.c +++ b/src/backend/nodes/readfuncs.c @@ -219,6 +219,10 @@ _readQuery(void) READ_NODE_FIELD(rowMarks); READ_NODE_FIELD(setOperations); READ_NODE_FIELD(constraintDeps); + READ_BOOL_FIELD(isMergeAction); + READ_BOOL_FIELD(matched); + READ_INT_FIELD(sourceAttrNo); + READ_NODE_FIELD(mergeActQry); READ_DONE(); } diff --git a/src/backend/optimizer/plan/createplan.c b/src/backend/optimizer/plan/createplan.c index 52dd27b..ec6dd22 100644 --- a/src/backend/optimizer/plan/createplan.c +++ b/src/backend/optimizer/plan/createplan.c @@ -4090,7 +4090,7 @@ make_result(PlannerInfo *root, ModifyTable * make_modifytable(CmdType operation, List *resultRelations, List *subplans, List *returningLists, - List *rowMarks, int epqParam) + List *rowMarks, List *mergeActPlans, int epqParam) { ModifyTable *node = makeNode(ModifyTable); Plan *plan = &node->plan; @@ -4143,6 +4143,8 @@ make_modifytable(CmdType operation, List *resultRelations, node->returningLists = returningLists; node->rowMarks = rowMarks; node->epqParam = epqParam; + if (operation == CMD_MERGE) + node->mergeActPlan = mergeActPlans; return node; } diff --git a/src/backend/optimizer/plan/planner.c b/src/backend/optimizer/plan/planner.c index 93daedc..55ca01f 100644 --- a/src/backend/optimizer/plan/planner.c +++ b/src/backend/optimizer/plan/planner.c @@ -103,7 +103,8 @@ static void get_column_info_for_window(PlannerInfo *root, WindowClause *wc, int *ordNumCols, AttrNumber **ordColIdx, Oid **ordOperators); - +static ModifyTable *merge_action_planner(PlannerInfo *root, Plan *mainPlan); +static MergeActionSet *merge_action_set_planner(PlannerInfo *root, Plan *mainPlan); /***************************************************************************** * @@ -453,6 +454,27 @@ subquery_planner(PlannerGlobal *glob, Query *parse, } /* + * for MERGE command, we need to preprocess + * the expressions for merge actions too. + */ + if(parse->commandType == CMD_MERGE) + { + ListCell *e; + + foreach(e, parse->mergeActQry) + { + Query *actqry = (Query *)lfirst(e); + + actqry->targetList = (List *) preprocess_expression(root, + (Node *) actqry->targetList, + EXPRKIND_TARGET); + actqry->jointree->quals = preprocess_expression(root, + (Node *) actqry->jointree->quals, + EXPRKIND_QUAL); + } + } + + /* * In some cases we may want to transfer a HAVING clause into WHERE. We * cannot do so if the HAVING clause contains aggregates (obviously) or * volatile functions (since a HAVING clause is supposed to be executed @@ -529,6 +551,7 @@ subquery_planner(PlannerGlobal *glob, Query *parse, { List *returningLists; List *rowMarks; + List *mergeAction; /* * Deal with the RETURNING clause if any. It's convenient to pass @@ -560,12 +583,19 @@ subquery_planner(PlannerGlobal *glob, Query *parse, else rowMarks = root->rowMarks; + /*if here is a MERGE command, we need to plan the actions too*/ + if(parse->commandType == CMD_MERGE) + mergeAction = list_make1(merge_action_set_planner(root, plan)); + else + mergeAction = NIL; + plan = (Plan *) make_modifytable(parse->commandType, copyObject(root->resultRelations), list_make1(plan), returningLists, rowMarks, - SS_assign_special_param(root)); + mergeAction, + SS_assign_special_param(root)); } } @@ -586,6 +616,130 @@ subquery_planner(PlannerGlobal *glob, Query *parse, } /* +to generate the merge action set for the main plan. +Call this function after the grouping_planner() + +Only works for MERGE command +*/ +static MergeActionSet * +merge_action_set_planner(PlannerInfo *root, Plan *mainPlan) +{ + MergeActionSet *result; + Query *mainQry = root->parse; + ListCell *l; + PlannerInfo subroot; + + /*for non-merge command, no need to plan the merge actions*/ + if(mainQry->commandType != CMD_MERGE || + mainQry->mergeActQry == NIL) + return NULL; + + /*do a copy of the root info*/ + memcpy(&subroot, root, sizeof(PlannerInfo)); + + /*create the result node*/ + result = makeNode(MergeActionSet); + result->result_relation = mainQry->resultRelation; + result->actions = NIL; + + /*plan the actions one by one*/ + foreach(l, mainQry->mergeActQry) + { + ModifyTable *actplan; + + /*put the action query into the subroot*/ + subroot.parse = (Query *) lfirst(l); + actplan = merge_action_planner(&subroot, mainPlan); + result->actions = lappend(result->actions, actplan); + } + return result; +} + +/* create plan for a single merge action */ +static ModifyTable * +merge_action_planner(PlannerInfo *root, Plan *mainPlan) +{ + Query *parse = root->parse; + MergeAction *actplan; + ModifyTable *result; + + List *returningLists; + List *rowMarks; + + /* + * no having clause in a merge action + */ + Assert(parse->havingQual == NULL); + + /* + * Create the action plan node + */ + actplan = makeNode(MergeAction); + actplan->operation = parse->commandType; + actplan->matched = parse->matched; + + /* copy the cost from the top_plan */ + actplan->plan.startup_cost = mainPlan->startup_cost; + actplan->plan.total_cost = mainPlan->total_cost; + actplan->plan.plan_rows = mainPlan->plan_rows; + actplan->plan.plan_width = mainPlan->plan_width; + + /* + * Here, the quals expressions are flattened, which is accepted + * by deparse functions in EXPLAIN. + * But, these expressions will be processed by push_up_vars + * latterly and become not flat again. + * So, we need to keep a copy of current quals for explaining. + */ + actplan->flattenedqual = (List *) copyObject(parse->jointree->quals); + actplan->plan.qual = (List *)parse->jointree->quals; + + /* prepare the target list */ + if (parse->targetList) + { + actplan->plan.targetlist = preprocess_targetlist(root, + parse->targetList); + /*the target list should also be copied for EXPLAIN*/ + actplan->flattenedtlist = (List *) copyObject(actplan->plan.targetlist); + } + + /* + *In general situation, all the vars in target list and quals are flattened. + *But, we want them to point to the attributes of the top join plan, not to + *the subplans. So push them up again here. + */ + push_up_merge_action_vars(actplan, parse); + + if (parse->returningList) + { + List *rlist; + + Assert(parse->resultRelation); + rlist = set_returning_clause_references(root->glob, + parse->returningList, + &actplan->plan, + parse->resultRelation); + returningLists = list_make1(rlist); + } + else + returningLists = NIL; + + if (parse->rowMarks) + rowMarks = NIL; + else + rowMarks = root->rowMarks; + + result = make_modifytable(parse->commandType, + list_make1_int(parse->resultRelation), + list_make1(actplan), + returningLists, + rowMarks, + NIL, + SS_assign_special_param(root)); + return result; +} + +/* * preprocess_expression * Do subquery_planner's preprocessing work for an expression, * which can be a targetlist, a WHERE clause (including JOIN/ON @@ -730,6 +884,7 @@ inheritance_planner(PlannerInfo *root) List *returningLists = NIL; List *rtable = NIL; List *rowMarks; + List *mergeActSets = NIL; List *tlist; PlannerInfo subroot; ListCell *l; @@ -791,6 +946,66 @@ inheritance_planner(PlannerInfo *root) appinfo->child_relid); returningLists = lappend(returningLists, rlist); } + + /* + *For a merge command, we need to generate a set of action plans corresponding + *to current result relations. + *The adjust_appendrel_attrs() will not process the merge action list of + *the main query. And, We need to do this here for a MERGE command. + *In fact, no need to do a full query tree mutator on the action queries. + *only adjust the target list and quals. + */ + if (parse->commandType == CMD_MERGE) + { + ListCell *e; + MergeActionSet *maset; + + /* + *the parse in subroot now is a copy of the main query of current result relation + *Here we need to generate a copy of the action queries and shift their target table + *to current result relation + */ + subroot.parse->mergeActQry = NIL; + + foreach(e, parse->mergeActQry) + { + Query *actqry = (Query *)lfirst(e); + Query *newactqry = makeNode(Query); + + /*copy most of the common fields from original query*/ + *newactqry = *actqry; + + /*reset the result relation to current child table*/ + newactqry->resultRelation = subroot.parse->resultRelation; + + /*make the range table to be consistent with current main query*/ + newactqry->rtable = subroot.parse->rtable; + + /*adjust the target list*/ + newactqry->targetList = (List *) adjust_appendrel_attrs( + (Node *) actqry->targetList, + appinfo); + newactqry->targetList = adjust_inherited_tlist(newactqry->targetList, + appinfo); + /*and qual*/ + newactqry->jointree = makeNode(FromExpr); + newactqry->jointree->fromlist = subroot.parse->jointree->fromlist; + newactqry->jointree->quals = adjust_appendrel_attrs( + actqry->jointree->quals, + appinfo); + + /*put this new action query in to the action list of current main query*/ + subroot.parse->mergeActQry = lappend(subroot.parse->mergeActQry, newactqry); + } + + /* + * now we have a complete query (main query + action queries) that has been + *shifted to current result relation. Plan these actions here. + */ + maset = merge_action_set_planner(&subroot, subplan); + Assert(maset != NULL); + mergeActSets = lappend(mergeActSets, maset); + } } root->resultRelations = resultRelations; @@ -843,6 +1058,7 @@ inheritance_planner(PlannerInfo *root) subplans, returningLists, rowMarks, + mergeActSets, SS_assign_special_param(root)); } diff --git a/src/backend/optimizer/prep/preptlist.c b/src/backend/optimizer/prep/preptlist.c index 2c97c71..73e832d 100644 --- a/src/backend/optimizer/prep/preptlist.c +++ b/src/backend/optimizer/prep/preptlist.c @@ -78,6 +78,76 @@ preprocess_targetlist(PlannerInfo *root, List *tlist) result_relation, range_table); /* + for MERGE command, we also need to expend the target list of the main query. + Note that, the target list of main query is the combination of attrs + from Source table and target table. We only want to expend the part + of target table. + + We do this in an aggressive way: + 1. Truncate the old target list, keep only the entries for source table + 2. expend the target list of result relation from an NIL list. + 3. Append this new list at the end of old target list. + */ + if (command_type == CMD_MERGE) + { + ListCell *l; + List *TLforResultRelatoin = NIL; + int new_resno; + + Assert(parse->sourceAttrNo > 0); + + tlist = list_truncate(tlist, parse->sourceAttrNo); + + TLforResultRelatoin = expand_targetlist(TLforResultRelatoin, command_type, + result_relation, range_table); + new_resno = parse->sourceAttrNo + 1; + foreach(l, TLforResultRelatoin) + { + TargetEntry *te = (TargetEntry *)lfirst(l); + te->resno = new_resno++; + } + + tlist = list_concat(tlist, TLforResultRelatoin); + } + + /* + * for "update" , "delete" and "merge" queries, add ctid of the result relation into + * the target list so that the ctid will propagate through execution and + * ExecutePlan() will be able to identify the right tuple to replace or + * delete. This extra field is marked "junk" so that it is not stored + * back into the tuple. + * + * BUT, if the query node is a merge action, + * we don't need to expand the ctid attribute in tlist. + * The tlist of the merge top level plan already contains + * a "ctid" junk attr of the target relation. + */ + if (!parse->isMergeAction && + (command_type == CMD_UPDATE || + command_type == CMD_DELETE || + command_type == CMD_MERGE)) + { + TargetEntry *tle; + Var *var; + + var = makeVar(result_relation, SelfItemPointerAttributeNumber, + TIDOID, -1, 0); + tle = makeTargetEntry((Expr *) var, + list_length(tlist) + 1, + pstrdup("ctid"), + true); + /* + * For an UPDATE, expand_targetlist already created a fresh tlist. For + * DELETE, better do a listCopy so that we don't destructively modify + * the original tlist (is this really necessary?). + */ + if (command_type == CMD_DELETE) + tlist = list_copy(tlist); + + tlist = lappend(tlist, tle); + } + + /* * Add necessary junk columns for rowmarked rels. These values are needed * for locking of rels selected FOR UPDATE/SHARE, and to do EvalPlanQual * rechecking. While we are at it, store these junk attnos in the @@ -305,6 +375,7 @@ expand_targetlist(List *tlist, int command_type, } break; case CMD_UPDATE: + case CMD_MERGE: if (!att_tup->attisdropped) { new_expr = (Node *) makeVar(result_relation, diff --git a/src/backend/optimizer/prep/prepunion.c b/src/backend/optimizer/prep/prepunion.c index 0d3a739..2fda504 100644 --- a/src/backend/optimizer/prep/prepunion.c +++ b/src/backend/optimizer/prep/prepunion.c @@ -101,9 +101,6 @@ static Bitmapset *translate_col_privs(const Bitmapset *parent_privs, static Node *adjust_appendrel_attrs_mutator(Node *node, AppendRelInfo *context); static Relids adjust_relid_set(Relids relids, Index oldrelid, Index newrelid); -static List *adjust_inherited_tlist(List *tlist, - AppendRelInfo *context); - /* * plan_set_operations @@ -1741,8 +1738,9 @@ adjust_relid_set(Relids relids, Index oldrelid, Index newrelid) * scribble on. * * Note that this is not needed for INSERT because INSERT isn't inheritable. + * But the INSERT actions in MERGE need this function */ -static List * +List * adjust_inherited_tlist(List *tlist, AppendRelInfo *context) { bool changed_it = false; diff --git a/src/backend/optimizer/util/var.c b/src/backend/optimizer/util/var.c index 399f3a9..862dc5c 100644 --- a/src/backend/optimizer/util/var.c +++ b/src/backend/optimizer/util/var.c @@ -67,6 +67,16 @@ typedef struct bool inserted_sublink; /* have we inserted a SubLink? */ } flatten_join_alias_vars_context; +typedef struct +{ + int varno_source; + int varno_target; + int varno_join; + + int offset_source; + int offset_target; +} push_up_merge_action_vars_context; + static bool pull_varnos_walker(Node *node, pull_varnos_context *context); static bool pull_varattnos_walker(Node *node, Bitmapset **varattnos); @@ -83,6 +93,8 @@ static bool pull_var_clause_walker(Node *node, static Node *flatten_join_alias_vars_mutator(Node *node, flatten_join_alias_vars_context *context); static Relids alias_relid_set(PlannerInfo *root, Relids relids); +static bool push_up_merge_action_vars_walker(Node *node, + push_up_merge_action_vars_context *context); /* @@ -677,6 +689,79 @@ pull_var_clause_walker(Node *node, pull_var_clause_context *context) (void *) context); } +/* + * When prepare for the MERGE command, we have made a + * left join between the Source table and target table as the + * main plan. + * + * In this case, the range table contains ONLY THREE range table entries: + * 1. the source table, which may be a subquery or a plain table + * 2. the entry of the target table, which is a plain table + * 3. join expression with the source table and target table as its parameters. + * + * Each merge action of the command has its own query and + * plan nodes as well. And, the vars in its target list and qual + * expressions may refers to the attribute in any one of the above 3 + * range table entries. + * + * However, since the result tuple slots of merge actions are + * projected from the returned tuple of the join, we need to + * mapping the vars of source table and target table to their + * corresponding attributes in the third range table entry. + * + * This function does the opposite of the flatten_join_alias_vars() + * function. It walks through the target list and qual of a + * MergeAction plan, changes the vars' varno and varattno to the + * corresponding position in the upper level join RTE. + */ +void +push_up_merge_action_vars(MergeAction *actplan, Query *actqry) +{ + push_up_merge_action_vars_context context; + + context.varno_source = Merge_SourceTableRTindex; + context.varno_target = actqry->resultRelation; + context.varno_join = Merge_TopJoinTableRTindex; + context.offset_source = 0; + context.offset_target = actqry->sourceAttrNo; + + push_up_merge_action_vars_walker((Node *)actplan->plan.targetlist, + &context); + push_up_merge_action_vars_walker((Node *)actplan->plan.qual, + &context); +} + +static bool +push_up_merge_action_vars_walker(Node *node, + push_up_merge_action_vars_context *context) +{ + if (node == NULL) + return false; + if (IsA(node, Var)) + { + Var *var = (Var *)node; + + if(var->varno == context->varno_source) + { + var->varno = context->varno_join; + var->varattno += context->offset_source; + return false; + } + else if(var->varno == context->varno_target) + { + var->varno = context->varno_join; + var->varattno += context->offset_target; + return false; + } + else if(var->varno == context->varno_join) + return false; + else + elog(ERROR, "the vars in merge action tlist of qual should only belongs to the source table or target table"); + } + + return expression_tree_walker(node, push_up_merge_action_vars_walker, + (void *) context); +} /* * flatten_join_alias_vars diff --git a/src/backend/parser/analyze.c b/src/backend/parser/analyze.c index bb2bf04..73a4708 100644 --- a/src/backend/parser/analyze.c +++ b/src/backend/parser/analyze.c @@ -48,6 +48,7 @@ static Query *transformInsertStmt(ParseState *pstate, InsertStmt *stmt); static List *transformInsertRow(ParseState *pstate, List *exprlist, List *stmtcols, List *icolumns, List *attrnos); static int count_rowexpr_columns(ParseState *pstate, Node *expr); +static Query *transformMergeStmt(ParseState *pstate, MergeStmt *stmt); static Query *transformSelectStmt(ParseState *pstate, SelectStmt *stmt); static Query *transformValuesClause(ParseState *pstate, SelectStmt *stmt); static Query *transformSetOperationStmt(ParseState *pstate, SelectStmt *stmt); @@ -176,6 +177,10 @@ transformStmt(ParseState *pstate, Node *parseTree) result = transformUpdateStmt(pstate, (UpdateStmt *) parseTree); break; + case T_MergeStmt: + result = transformMergeStmt(pstate, (MergeStmt *)parseTree); + break; + case T_SelectStmt: { SelectStmt *n = (SelectStmt *) parseTree; @@ -246,6 +251,7 @@ analyze_requires_snapshot(Node *parseTree) case T_DeleteStmt: case T_UpdateStmt: case T_SelectStmt: + case T_MergeStmt: result = true; break; @@ -299,12 +305,24 @@ transformDeleteStmt(ParseState *pstate, DeleteStmt *stmt) qry->distinctClause = NIL; /* - * The USING clause is non-standard SQL syntax, and is equivalent in - * functionality to the FROM list that can be specified for UPDATE. The - * USING keyword is used rather than FROM because FROM is already a - * keyword in the DELETE syntax. + * The input stmt could be a MergeDelete node. + * In this case, we don't need the process on range table. */ - transformFromClause(pstate, stmt->usingClause); + if (!stmt->isMergeAction) + { + /* set up range table with just the result rel */ + qry->resultRelation = setTargetTable(pstate, stmt->relation, + interpretInhOption(stmt->relation->inhOpt), + true, + ACL_DELETE); + /* + * The USING clause is non-standard SQL syntax, and is equivalent in + * functionality to the FROM list that can be specified for UPDATE. The + * USING keyword is used rather than FROM because FROM is already a + * keyword in the DELETE syntax. + */ + transformFromClause(pstate, stmt->usingClause); + } qual = transformWhereClause(pstate, stmt->whereClause, "WHERE"); @@ -369,6 +387,8 @@ transformInsertStmt(ParseState *pstate, InsertStmt *stmt) * VALUES clause. If we have any of those, treat it as a general SELECT; * so it will work, but you can't use DEFAULT items together with those. */ + + /* a MergeInsert statement is always a VALUES clause*/ isGeneralSelect = (selectStmt && (selectStmt->valuesLists == NIL || selectStmt->sortClause != NIL || selectStmt->limitOffset != NULL || @@ -407,7 +427,8 @@ transformInsertStmt(ParseState *pstate, InsertStmt *stmt) * mentioned in the SELECT part. Note that the target table is not added * to the joinlist or namespace. */ - qry->resultRelation = setTargetTable(pstate, stmt->relation, + if (!stmt->isMergeAction) /* for MergeInsert, no need to do this */ + qry->resultRelation = setTargetTable(pstate, stmt->relation, false, false, ACL_INSERT); /* Validate stmt->cols list, or build default list if no list given */ @@ -1802,16 +1823,18 @@ transformUpdateStmt(ParseState *pstate, UpdateStmt *stmt) qry->cteList = transformWithClause(pstate, stmt->withClause); } - qry->resultRelation = setTargetTable(pstate, stmt->relation, - interpretInhOption(stmt->relation->inhOpt), - true, - ACL_UPDATE); - - /* - * the FROM clause is non-standard SQL syntax. We used to be able to do - * this with REPLACE in POSTQUEL so we keep the feature. - */ - transformFromClause(pstate, stmt->fromClause); + if (!stmt->isMergeAction) /* for MergeUpdate, no need to do this */ + { + qry->resultRelation = setTargetTable(pstate, stmt->relation, + interpretInhOption(stmt->relation->inhOpt), + true, + ACL_UPDATE); + /* + * the FROM clause is non-standard SQL syntax. We used to be able to do + * this with REPLACE in POSTQUEL so we keep the feature. + */ + transformFromClause(pstate, stmt->fromClause); + } qry->targetList = transformTargetList(pstate, stmt->targetList); @@ -2313,3 +2336,329 @@ applyLockingClause(Query *qry, Index rtindex, rc->pushedDown = pushedDown; qry->rowMarks = lappend(qry->rowMarks, rc); } + +/* + * transform an action of merge command into a query. + * No change of the pstate range table is allowed in this function. + */ +static Query * +transformMergeActions(ParseState *pstate, MergeStmt *stmt, + MergeConditionAction *condact) +{ + Query *actqry; + + /* + * First, we need to make sure that DELETE and UPDATE + * actions are only taken in MATCHED condition, + * and INSERTs are only taken when not MATCHED + */ + switch(condact->action->type) + { + case T_DeleteStmt:/*a delete action*/ + { + DeleteStmt *deleteact = (DeleteStmt *) condact->action; + Assert(deleteact->isMergeAction); + + if (!condact->match) + ereport(ERROR, + (errcode(ERRCODE_FEATURE_NOT_SUPPORTED), + errmsg("The DELETE action in MERGE command is not allowed when NOT MATCHED"))); + + /*put new right code to the result relation. + This line chages the RTE in range table directly*/ + pstate->p_target_rangetblentry->requiredPerms |= ACL_DELETE; + + deleteact->relation = stmt->relation; + deleteact->usingClause = list_make1(stmt->source); + deleteact->whereClause = condact->condition; + + /*parse the action query*/ + actqry = transformStmt(pstate, (Node *) deleteact); + + if (!IsA(actqry, Query) || + actqry->commandType != CMD_DELETE || + actqry->utilityStmt != NULL) + elog(ERROR, "improper DELETE action in merge stmt"); + + actqry->isMergeAction = true; + actqry->matched = condact->match; + + return actqry; + } + break; + case T_UpdateStmt:/*an update action*/ + { + UpdateStmt *updateact = (UpdateStmt *) condact->action; + Assert(updateact->isMergeAction); + + if (!condact->match) + ereport(ERROR, + (errcode(ERRCODE_FEATURE_NOT_SUPPORTED), + errmsg("The UPDATE action in MERGE command is not allowed when NOT MATCHED"))); + + pstate->p_target_rangetblentry->requiredPerms |= ACL_UPDATE; + + /*the "targetlist" of the updateact is filled in the parser */ + updateact->relation = stmt->relation; + updateact->fromClause = list_make1(stmt->source); + updateact->whereClause = condact->condition; + + /*parse the action query*/ + actqry = transformStmt(pstate, (Node *)updateact); + + if (!IsA(actqry, Query) || + actqry->commandType != CMD_UPDATE|| + actqry->utilityStmt != NULL) + elog(ERROR, "improper UPDATE action in merge stmt"); + + actqry->isMergeAction = true; + actqry->matched = condact->match; + + return actqry; + } + break; + case T_InsertStmt:/*an insert action*/ + { + InsertStmt *insertact = (InsertStmt *) condact->action; + Assert(insertact->isMergeAction); + + if(condact->match) + ereport(ERROR, + (errcode(ERRCODE_FEATURE_NOT_SUPPORTED), + errmsg("The INSERT action in MERGE command is not allowed when MATCHED"))); + + pstate->p_target_rangetblentry->requiredPerms |= ACL_INSERT; + + /*the "cols" and "selectStmt" of the insertact is filled in the parser */ + insertact->relation = stmt->relation; + + /* + the merge insert action has a strange feature. + In an ordinary INSERT, the VALUES list can only + contains constants and DEFAULT. (am I right??) + But in the INSERT action of MERGE command, + the VALUES list can have expressions with + variables(attributes of the target and source tables). + Besides, in the ordinary INSERT, a VALUES list can + never be followed by a WHERE clause. + But in MERGE INSERT action, there are matching conditions. + + Thus, the output qry of this function is an INSERT + query in the style of "INSERT...VALUES...", + except that we have other range tables and a WHERE clause. + Note that it is also different from the "INSERT ... SELECT..." + query, in which the whole SELECT is a subquery. + (We don't have subquery here). + + We construct this novel query structure in order + to keep consistency with other merge action types + (DELETE, UPDATE). In this way, all the merge action + queries are in fact share the very same Range Table, + They only differs in their target lists and join trees + */ + + /*parse the action query, this will call + transformInsertStmt() which analyzes the VALUES list.*/ + actqry = transformStmt(pstate, (Node *)insertact); + + /*do the WHERE clause here, Since the + transformInsertStmt() function only analyzes + the VALUES list but not the WHERE clause*/ + actqry->jointree = makeFromExpr(pstate->p_joinlist, + transformWhereClause(pstate, + condact->condition, + "WHERE")); + if(!IsA(actqry, Query) || + actqry->commandType != CMD_INSERT|| + actqry->utilityStmt != NULL) + elog(ERROR, "improper INSERT action in merge stmt"); + + actqry->isMergeAction = true; + actqry->matched = condact->match; + + return actqry; + } + break; + case T_MergeDoNothing: + { + MergeDoNothing *nothingact; + + nothingact = (MergeDoNothing *)(condact->action); + + Assert(IsA(nothingact,MergeDoNothing)); + + actqry = makeNode(Query); + + actqry->jointree = makeFromExpr(pstate->p_joinlist, + transformWhereClause(pstate, + condact->condition, + "WHERE")); + actqry->rtable = pstate->p_rtable; + + actqry->commandType = CMD_DONOTHING; + actqry->isMergeAction = true; + actqry->matched = condact->match; + + return actqry; + } + break; + default: + elog(ERROR, "unknown MERGE action type %d", condact->action->type); + break; + } + + /*never comes here*/ + return NULL; +} + +static Query * +transformMergeStmt(ParseState *pstate, MergeStmt *stmt) +{ + Query *qry; + + ColumnRef *starRef; + ResTarget *starResTarget; + ListCell *act; + ListCell *l; + JoinExpr *joinexp; + RangeTblEntry *resultRTE = NULL; + RangeTblEntry *topjoinRTE = NULL; + RangeTblEntry *sourceRTE = NULL; + + /*firstly, create the output node structure*/ + qry = makeNode(Query); + qry->commandType = CMD_MERGE; + + /* + * What we are doing here is to create a query like + * "SELECT * FROM <source_rel> LEFT JOIN <target_rel> ON <match_condition>;" + * + * Note: + * 1. we set the "match condition" as the join qualification. + * The left join will scan both the matched and non-matched tuples. + * + * 2. a normal SELECT query has no "target relation". + * But here we need to set the target relation in query, + * like the UPDATE/DELETE/INSERT queries. + * So this is a left join SELECT with a "target table" in its range table. + * + * 3. We don't have a specific ACL level for Merge, here we just use + * ACL_SELECT. + * But we will add other ACL levels when handle each merge actions. + */ + + /* + * Before transforming the FROM clause, acquire write lock on the + * target relation. We don't want to add it to the range table yet, + * so we use setTargetTableLock() instead of setTargetTable(). + */ + setTargetTableLock(pstate, stmt->relation); + + /* + * Make the join expression on source table and target table, + * as the only element in FROM list + */ + joinexp = makeNode(JoinExpr); + joinexp->jointype = JOIN_LEFT; + joinexp->isNatural = FALSE; + joinexp->larg = stmt->source; + joinexp->rarg = (Node *)stmt->relation; + joinexp->quals = stmt->matchCondition; + + /* + * transform the FROM clause. The target relation and + * source relation will be added to the range table here. + */ + transformFromClause(pstate, list_make1(joinexp)); + + /* the targetList of the main query is "*" */ + starRef = makeNode(ColumnRef); + starRef->fields = list_make1(makeNode(A_Star)); + starRef->location = 1; + + starResTarget = makeNode(ResTarget); + starResTarget->name = NULL; + starResTarget->indirection = NIL; + starResTarget->val = (Node *)starRef; + starResTarget->location = 1; + + qry->targetList = transformTargetList(pstate, list_make1(starResTarget)); + + /* we don't need a WHERE clause here. Set it null. */ + qry->jointree = makeFromExpr(pstate->p_joinlist, NULL); + + /* + *The range table of a MERGE command query has only 3 RTE. + *RTE 1 is the source table, RTE 2 is the target table, RTE 3 is the left join + *between source and target table. + */ + qry->rtable = pstate->p_rtable; + + /*Set RTE 2 as the result relation of query*/ + qry->resultRelation = 2; + resultRTE = rt_fetch(qry->resultRelation, pstate->p_rtable); + if(resultRTE->relid != pstate->p_target_relation->rd_id) + elog(ERROR, "The target relation entry should be the second one in range table"); + + resultRTE->requiredPerms = ACL_SELECT; + resultRTE->inh = interpretInhOption(stmt->relation->inhOpt); + pstate->p_target_rangetblentry = resultRTE; + + /* + *we also need to find out how many attributes are there in the source table. + *There are many ways to do this. Here, I choose to scan the join var list of the + *top join table entry. And count how many vars belong to source table. + */ + topjoinRTE = rt_fetch(Merge_TopJoinTableRTindex, pstate->p_rtable); + Assert(topjoinRTE->jointype == JOIN_LEFT); + + qry->sourceAttrNo = 0; + foreach(l, topjoinRTE->joinaliasvars) + { + Var *jv = (Var *)lfirst(l); + + if(jv->varno == Merge_SourceTableRTindex) + qry->sourceAttrNo++; + } + /*lets do a simple check here*/ + sourceRTE = rt_fetch(Merge_SourceTableRTindex, pstate->p_rtable); + Assert(qry->sourceAttrNo == list_length(sourceRTE->eref->colnames)); + + /* + * For each action, transform it to a separate query. + * The action queries share the range table with the main query. + * + * In other words, in the extra conditions of the sub actions, + * we don't allow involvement of new tables + */ + qry->mergeActQry = NIL; + + foreach(act,stmt->actions) + { + MergeConditionAction *mca = (MergeConditionAction *) lfirst(act); + Query *actqry; + + /* transform the act (and its condition) as a single query. */ + actqry = transformMergeActions(pstate, stmt, mca); + + /* + * since we don't invoke setTargetTable() in transformMergeActions(), + * we need to set actqry->resultRelation here + */ + actqry->resultRelation = qry->resultRelation; + + /*record the source attr number in each action query, for latter use*/ + actqry->sourceAttrNo = qry->sourceAttrNo; + + /* put it into the list */ + qry->mergeActQry = lappend(qry->mergeActQry, actqry); + } + + /* + * set the sublink mark at the last. + * Thus, the sublink in actions will be counted in. + */ + qry->hasSubLinks = pstate->p_hasSubLinks; + + return qry; +} diff --git a/src/backend/parser/gram.y b/src/backend/parser/gram.y index 609c472..f045abd 100644 --- a/src/backend/parser/gram.y +++ b/src/backend/parser/gram.y @@ -212,6 +212,11 @@ static RangeVar *makeRangeVarFromAnyName(List *names, int position, core_yyscan_ DeallocateStmt PrepareStmt ExecuteStmt DropOwnedStmt ReassignOwnedStmt AlterTSConfigurationStmt AlterTSDictionaryStmt + MergeStmt + +%type <node> opt_and_condition merge_condition_action merge_action +%type <boolean> opt_not +%type <list> merge_condition_action_list %type <node> select_no_parens select_with_parens select_clause simple_select values_clause @@ -483,7 +488,7 @@ static RangeVar *makeRangeVarFromAnyName(List *names, int position, core_yyscan_ DEFERRABLE DEFERRED DEFINER DELETE_P DELIMITER DELIMITERS DESC DICTIONARY DISABLE_P DISCARD DISTINCT DO DOCUMENT_P DOMAIN_P DOUBLE_P DROP - EACH ELSE ENABLE_P ENCODING ENCRYPTED END_P ENUM_P ESCAPE EXCEPT + EACH ELSE ENABLE_P ENCODING ENCRYPTED END_P ENUM_P ERROR_P ESCAPE EXCEPT EXCLUDE EXCLUDING EXCLUSIVE EXECUTE EXISTS EXPLAIN EXTERNAL EXTRACT FALSE_P FAMILY FETCH FIRST_P FLOAT_P FOLLOWING FOR FORCE FOREIGN FORWARD @@ -506,7 +511,7 @@ static RangeVar *makeRangeVarFromAnyName(List *names, int position, core_yyscan_ LEAST LEFT LEVEL LIKE LIMIT LISTEN LOAD LOCAL LOCALTIME LOCALTIMESTAMP LOCATION LOCK_P LOGIN_P - MAPPING MATCH MAXVALUE MINUTE_P MINVALUE MODE MONTH_P MOVE + MAPPING MATCH MATCHED MAXVALUE MERGE MINUTE_P MINVALUE MODE MONTH_P MOVE NAME_P NAMES NATIONAL NATURAL NCHAR NEXT NO NOCREATEDB NOCREATEROLE NOCREATEUSER NOINHERIT NOLOGIN_P NONE NOSUPERUSER @@ -521,7 +526,7 @@ static RangeVar *makeRangeVarFromAnyName(List *names, int position, core_yyscan_ QUOTE - RANGE READ REAL REASSIGN RECHECK RECURSIVE REF REFERENCES REINDEX + RAISE RANGE READ REAL REASSIGN RECHECK RECURSIVE REF REFERENCES REINDEX RELATIVE_P RELEASE RENAME REPEATABLE REPLACE REPLICA RESET RESTART RESTRICT RETURNING RETURNS REVOKE RIGHT ROLE ROLLBACK ROW ROWS RULE @@ -730,6 +735,7 @@ stmt : | ListenStmt | LoadStmt | LockStmt + | MergeStmt | NotifyStmt | PrepareStmt | ReassignOwnedStmt @@ -7128,6 +7134,7 @@ ExplainableStmt: | InsertStmt | UpdateStmt | DeleteStmt + | MergeStmt | DeclareCursorStmt | CreateAsStmt | ExecuteStmt /* by default all are $$=$1 */ @@ -7476,6 +7483,112 @@ set_target_list: /***************************************************************************** * * QUERY: + * MERGE STATEMENT + * + *****************************************************************************/ + + +MergeStmt: + MERGE INTO relation_expr_opt_alias + USING table_ref + ON a_expr + merge_condition_action_list + { + MergeStmt *m = makeNode(MergeStmt); + + m->relation = $3; + m->source = $5; + m->matchCondition = $7; + m->actions = $8; + + $$ = (Node *)m; + } + ; + +merge_condition_action_list: + merge_condition_action + { $$ = list_make1($1); } + | merge_condition_action_list merge_condition_action + { $$ = lappend($1,$2); } + ; + +merge_condition_action: + WHEN opt_not MATCHED opt_and_condition THEN merge_action + { + MergeConditionAction *m = makeNode(MergeConditionAction); + + m->match = $2; + m->condition = $4; + m->action = $6; + + $$ = (Node *)m; + } + ; + +opt_and_condition: + AND a_expr { $$ = $2; } + | /*EMPTY*/ { $$ = NULL; } + ; + +opt_not: + NOT { $$ = false; } + | /*EMPTY*/ { $$ = true; } + ; + +merge_action: + DELETE_P + { + DeleteStmt *n = makeNode(DeleteStmt); + n->isMergeAction = true; + $$ = (Node *) n; + } + | UPDATE SET set_clause_list + { + UpdateStmt *n = makeNode(UpdateStmt); + n->targetList = $3; + n->isMergeAction = true; + + $$ = (Node *) n; + } + | INSERT values_clause + { + InsertStmt *n = makeNode(InsertStmt); + n->cols = NIL; + n->selectStmt = $2; + n->isMergeAction = true; + + $$ = (Node *) n; + } + + | INSERT '(' insert_column_list ')' values_clause + { + InsertStmt *n = makeNode(InsertStmt); + n->cols = $3; + n->selectStmt = $5; + n->isMergeAction = true; + + $$ = (Node *) n; + } + | INSERT DEFAULT VALUES + { + InsertStmt *n = makeNode(InsertStmt); + n->cols = NIL; + n->selectStmt = NULL; + n->isMergeAction = true; + + $$ = (Node *) n; + } + | DO NOTHING + { + $$ = (Node *) makeNode(MergeDoNothing); + } + ; + + + +/***************************************************************************** + * + * QUERY: * CURSOR STATEMENTS * *****************************************************************************/ @@ -11108,6 +11221,7 @@ unreserved_keyword: | ENCODING | ENCRYPTED | ENUM_P + | ERROR_P | ESCAPE | EXCLUDE | EXCLUDING @@ -11162,7 +11276,9 @@ unreserved_keyword: | LOGIN_P | MAPPING | MATCH + | MATCHED | MAXVALUE + | MERGE | MINUTE_P | MINVALUE | MODE @@ -11205,6 +11321,7 @@ unreserved_keyword: | PROCEDURAL | PROCEDURE | QUOTE + | RAISE | RANGE | READ | REASSIGN diff --git a/src/backend/parser/parse_clause.c b/src/backend/parser/parse_clause.c index 5e13fa4..d149f5e 100644 --- a/src/backend/parser/parse_clause.c +++ b/src/backend/parser/parse_clause.c @@ -214,6 +214,25 @@ setTargetTable(ParseState *pstate, RangeVar *relation, return rtindex; } +void +setTargetTableLock(ParseState *pstate, RangeVar *relation) +{ + + /* Close old target; this could only happen for multi-action rules */ + if (pstate->p_target_relation != NULL) + heap_close(pstate->p_target_relation, NoLock); + + /* + * Open target rel and grab suitable lock (which we will hold till end of + * transaction). + * + * free_parsestate() will eventually do the corresponding heap_close(), + * but *not* release the lock. + */ + pstate->p_target_relation = parserOpenTable(pstate, relation, + RowExclusiveLock); +} + /* * Simplify InhOption (yes/no/default) into boolean yes/no. * diff --git a/src/backend/rewrite/rewriteHandler.c b/src/backend/rewrite/rewriteHandler.c index 31b2595..c6a34cc 100644 --- a/src/backend/rewrite/rewriteHandler.c +++ b/src/backend/rewrite/rewriteHandler.c @@ -2000,6 +2000,41 @@ RewriteQuery(Query *parsetree, List *rewrite_events) return rewritten; } +/*if the merge action type has already been processed by rewriter*/ +#define insert_rewrite (1<<0) +#define delete_rewrite (1<<1) +#define update_rewrite (1<<2) + +/*if the merge action type is fully replace by rules.*/ +#define insert_instead (1<<3) +#define delete_instead (1<<4) +#define update_instead (1<<5) + +#define merge_action_already_rewrite(acttype, flag) \ + ((acttype == CMD_INSERT && (flag & insert_rewrite)) || \ + (acttype == CMD_UPDATE && (flag & update_rewrite)) || \ + (acttype == CMD_DELETE && (flag & delete_rewrite))) + +#define set_action_rewrite(acttype, flag) \ + if(acttype == CMD_INSERT) \ + {flag |= insert_rewrite;}\ + else if(acttype == CMD_UPDATE) \ + {flag |= update_rewrite;}\ + else if(acttype == CMD_DELETE) \ + {flag |= delete_rewrite;} + +#define merge_action_instead(acttype, flag) \ + ((acttype == CMD_INSERT && (flag & insert_instead)) || \ + (acttype == CMD_UPDATE && (flag & update_instead)) || \ + (acttype == CMD_DELETE && (flag & delete_instead))) + +#define set_action_instead(acttype, flag)\ + if(acttype == CMD_INSERT) \ + {flag |= insert_instead;}\ + else if(acttype == CMD_UPDATE) \ + {flag |= update_instead;}\ + else if(acttype == CMD_DELETE) \ + {flag |= delete_instead;} /* * QueryRewrite - @@ -2025,7 +2060,148 @@ QueryRewrite(Query *parsetree) * * Apply all non-SELECT rules possibly getting 0 or many queries */ - querylist = RewriteQuery(parsetree, NIL); + if (parsetree->commandType == CMD_MERGE) + { + /* + * for MERGE, we have a set of action queries (not subquery). + * each of these action queries should rewritten with RewriteQuery(). + */ + ListCell *l; + int flag = 0; + List *pre_qry = NIL; + List *post_qry = NIL; + + querylist = NIL; + + /*rewrite the merge action queries one by one.*/ + foreach(l, parsetree->mergeActQry) + { + List *queryList4action = NIL; + Query *actionqry; + Query *q; + + actionqry = lfirst(l); + + /* no rewriting for DO NOTHING action*/ + if (actionqry->commandType == CMD_DONOTHING) + continue; + + /* + * if this kind of actions are fully replaced by rules, + * we change it into a DO NOTHING action + */ + if (merge_action_instead(actionqry->commandType, flag)) + { + /* + * Still need to call RewriteQuery(), + * since we need the process on target list and so on. + * BUT, the returned list is discarded + */ + RewriteQuery(actionqry, NIL); + actionqry->commandType = CMD_DONOTHING; + actionqry->targetList = NIL; + continue; + } + + /* if this kind of actions are already processed by rewriter, skip it.*/ + if (merge_action_already_rewrite(actionqry->commandType, flag)) + { + RewriteQuery(actionqry, NIL); + continue; + } + + /* ok this action has not been processed before, let's do it now. */ + queryList4action = RewriteQuery(actionqry, NIL); + + /* this kind of actions has been processed, set the flag */ + set_action_rewrite(actionqry->commandType, flag); + + /* + * if the returning list is empty, this merge action + * is replaced by a do-nothing rule + */ + if (queryList4action == NIL) + { + /* set the flag for other merge actions of the same type */ + set_action_instead(actionqry->commandType, flag); + actionqry->commandType = CMD_DONOTHING; + actionqry->targetList = NIL; + continue; + } + + /* + * if the rewriter return a non-NIL list, the merge action query + * could be one element in it. + * if so, it must be the head (for INSERT action) + * or tail (for UPDATE/DELETE action). + */ + + /* test the list head */ + q = (Query *) linitial(queryList4action); + if (q->querySource == QSRC_ORIGINAL) + { + /* + * the merge action is the head, the remaining part + * of the list are the queries generated by rules + * we put them in the post_qry list. + */ + if (querylist == NIL) + querylist = list_make1(parsetree); + + queryList4action = list_delete_first(queryList4action); + post_qry = list_concat(post_qry,queryList4action); + + continue; + } + + /*test the list tail*/ + q = (Query *) llast(queryList4action); + if (q->querySource == QSRC_ORIGINAL) + { + /* + * the merge action is the tail. + * Put the rule queries in pre_qry list + */ + if (querylist == NIL) + querylist = list_make1(parsetree); + + queryList4action = list_truncate(queryList4action, + list_length(queryList4action) - 1); + + pre_qry = list_concat(pre_qry,queryList4action); + continue; + } + + /* + * here, the merge action query is not in the rewritten query list, + * which means the action is replaced by INSTEAD rule(s). + * We need to change it into do noting action. + * + * For a INSERT action, we put the rule queries in the post list + * otherwise, in the pre list + */ + if(actionqry->commandType == CMD_INSERT) + post_qry = list_concat(post_qry,queryList4action); + else + pre_qry = list_concat(pre_qry,queryList4action); + + set_action_instead(actionqry->commandType, flag); + actionqry->commandType = CMD_DONOTHING; + actionqry->targetList = NIL; + } + + /* + * finally, put the 3 lists into one. + * If all the merge actions are replaced by rules, + * the original merge query + * will not be involved in the querylist. + */ + querylist = list_concat(pre_qry,querylist); + querylist = list_concat(querylist, post_qry); + + } + else + querylist = RewriteQuery(parsetree, NIL); /* * Step 2 diff --git a/src/backend/tcop/postgres.c b/src/backend/tcop/postgres.c index cba90a9..375923e 100644 --- a/src/backend/tcop/postgres.c +++ b/src/backend/tcop/postgres.c @@ -979,7 +979,7 @@ exec_simple_query(const char *query_string) NULL, 0); plantree_list = pg_plan_queries(querytree_list, 0, NULL); - +//printf("the plann is \n%s\n", nodeToString(plantree_list)); /* Done with the snapshot used for parsing/planning */ if (snapshot_set) PopActiveSnapshot(); diff --git a/src/backend/tcop/pquery.c b/src/backend/tcop/pquery.c index 8eb02da..52ed285 100644 --- a/src/backend/tcop/pquery.c +++ b/src/backend/tcop/pquery.c @@ -225,6 +225,10 @@ ProcessQuery(PlannedStmt *plan, snprintf(completionTag, COMPLETION_TAG_BUFSIZE, "DELETE %u", queryDesc->estate->es_processed); break; + case CMD_MERGE: + snprintf(completionTag, COMPLETION_TAG_BUFSIZE, + "MERGE %u", queryDesc->estate->es_processed); + break; default: strcpy(completionTag, "???"); break; diff --git a/src/backend/tcop/utility.c b/src/backend/tcop/utility.c index 75cb354..e583348 100644 --- a/src/backend/tcop/utility.c +++ b/src/backend/tcop/utility.c @@ -126,6 +126,7 @@ CommandIsReadOnly(Node *parsetree) case CMD_UPDATE: case CMD_INSERT: case CMD_DELETE: + case CMD_MERGE: return false; default: elog(WARNING, "unrecognized commandType: %d", @@ -1412,6 +1413,10 @@ CreateCommandTag(Node *parsetree) tag = "SELECT"; break; + case T_MergeStmt: + tag = "MERGE"; + break; + /* utility statements --- same whether raw or cooked */ case T_TransactionStmt: { @@ -2257,6 +2262,7 @@ GetCommandLogLevel(Node *parsetree) case T_InsertStmt: case T_DeleteStmt: case T_UpdateStmt: + case T_MergeStmt: lev = LOGSTMT_MOD; break; diff --git a/src/include/commands/trigger.h b/src/include/commands/trigger.h index 30dc314..a41a169 100644 --- a/src/include/commands/trigger.h +++ b/src/include/commands/trigger.h @@ -178,6 +178,8 @@ extern void ExecBSTruncateTriggers(EState *estate, ResultRelInfo *relinfo); extern void ExecASTruncateTriggers(EState *estate, ResultRelInfo *relinfo); +extern void ExecBSMergeTriggers(ModifyTableState *mt_state); +extern void ExecASMergeTriggers(ModifyTableState *mt_state); extern void AfterTriggerBeginXact(void); extern void AfterTriggerBeginQuery(void); diff --git a/src/include/executor/nodeModifyTable.h b/src/include/executor/nodeModifyTable.h index a8a28a4..96b46f0 100644 --- a/src/include/executor/nodeModifyTable.h +++ b/src/include/executor/nodeModifyTable.h @@ -16,6 +16,7 @@ #include "nodes/execnodes.h" extern ModifyTableState *ExecInitModifyTable(ModifyTable *node, EState *estate, int eflags); +extern MergeActionState *ExecInitMergeAction(MergeAction *node, EState *estate, int eflags); extern TupleTableSlot *ExecModifyTable(ModifyTableState *node); extern void ExecEndModifyTable(ModifyTableState *node); extern void ExecReScanModifyTable(ModifyTableState *node); diff --git a/src/include/nodes/execnodes.h b/src/include/nodes/execnodes.h index 78188a3..19c0d43 100644 --- a/src/include/nodes/execnodes.h +++ b/src/include/nodes/execnodes.h @@ -1032,9 +1032,23 @@ typedef struct ModifyTableState int mt_whichplan; /* which one is being executed (0..n-1) */ EPQState mt_epqstate; /* for evaluating EvalPlanQual rechecks */ bool fireBSTriggers; /* do we need to fire stmt triggers? */ + MergeActionSet **mt_mergeActPstates; /*the list of the planstate of merge command actions. + NULL if this is not a merge command. + The elements if it are still MergeActionSet nodes. + But the action list in these nodes are ModifyTableState */ } ModifyTableState; /* ---------------- + * MergeActionState information + * ---------------- + */ +typedef struct MergeActionState +{ + PlanState ps; /* its first field is NodeTag */ + CmdType operation; +} MergeActionState; + +/* ---------------- * AppendState information * * nplans how many plans are in the array diff --git a/src/include/nodes/nodes.h b/src/include/nodes/nodes.h index 15dabe3..293800c 100644 --- a/src/include/nodes/nodes.h +++ b/src/include/nodes/nodes.h @@ -44,6 +44,8 @@ typedef enum NodeTag T_Plan = 100, T_Result, T_ModifyTable, + T_MergeAction, + T_MergeActionSet, T_Append, T_MergeAppend, T_RecursiveUnion, @@ -87,6 +89,7 @@ typedef enum NodeTag T_PlanState = 200, T_ResultState, T_ModifyTableState, + T_MergeActionState, T_AppendState, T_MergeAppendState, T_RecursiveUnionState, @@ -350,6 +353,10 @@ typedef enum NodeTag T_AlterUserMappingStmt, T_DropUserMappingStmt, T_AlterTableSpaceOptionsStmt, + T_MergeStmt, + T_MergeConditionAction, + T_MergeDoNothing, + T_MergeError, T_SecLabelStmt, /* @@ -515,6 +522,8 @@ typedef enum CmdType CMD_UPDATE, /* update stmt */ CMD_INSERT, /* insert stmt */ CMD_DELETE, + CMD_MERGE, /* merge stmt */ + CMD_DONOTHING, /*DO NOTHING action in MERGE command*/ CMD_UTILITY, /* cmds like create, destroy, copy, vacuum, * etc. */ CMD_NOTHING /* dummy command for instead nothing rules diff --git a/src/include/nodes/parsenodes.h b/src/include/nodes/parsenodes.h index e0bdebd..3293b54 100644 --- a/src/include/nodes/parsenodes.h +++ b/src/include/nodes/parsenodes.h @@ -149,8 +149,24 @@ typedef struct Query List *constraintDeps; /* a list of pg_constraint OIDs that the query * depends on to be semantically valid */ + + /* fields for MERGE command */ + bool isMergeAction; /* if this query is a merge action. */ + bool matched; /* this is a MATCHED action or NOT */ + int sourceAttrNo; /*the number of attributes in source table*/ + List *mergeActQry; /* the list of all the merge actions. + * used only for merge query statement */ } Query; +/* +* In MERGE command, the initial MERGE query has only three range table entry +*The first one is the source table; The second one is the target table; and The +*third one is the left join entry of them. +*During the whole process of a MEREGE command, the rt_index for the source table +*entry and the join table entry will never change. We set them as constant here. +*/ +#define Merge_SourceTableRTindex 1 +#define Merge_TopJoinTableRTindex 3 /**************************************************************************** * Supporting data structures for Parse Trees @@ -892,6 +908,7 @@ typedef struct CommonTableExpr typedef struct InsertStmt { NodeTag type; + bool isMergeAction; /*if this is a merge insert action*/ RangeVar *relation; /* relation to insert into */ List *cols; /* optional: names of the target columns */ Node *selectStmt; /* the source SELECT/VALUES, or NULL */ @@ -906,6 +923,7 @@ typedef struct InsertStmt typedef struct DeleteStmt { NodeTag type; + bool isMergeAction; /*if this is a merge delete action*/ RangeVar *relation; /* relation to delete from */ List *usingClause; /* optional using clause for more tables */ Node *whereClause; /* qualifications */ @@ -920,6 +938,7 @@ typedef struct DeleteStmt typedef struct UpdateStmt { NodeTag type; + bool isMergeAction; /*if this is a merge delete action*/ RangeVar *relation; /* relation to update */ List *targetList; /* the target list (of ResTarget) */ Node *whereClause; /* qualifications */ @@ -996,6 +1015,54 @@ typedef struct SelectStmt /* Eventually add fields for CORRESPONDING spec here */ } SelectStmt; +/* ---------------------- + * Merge Statement + * ---------------------- + */ +typedef struct MergeStmt +{ + NodeTag type; + RangeVar *relation; /* target relation for merge */ + + /* + * source relations for the merge. + * Currently, we only allow single-source merge, + * so the length of this list should always be 1. + */ + Node *source; + Node *matchCondition; /* qualifications of the merge */ + + /* list of MergeConditionAction structure. + * It stores all the matched / not-matched + * conditions and the corresponding actions + * The elements of this list are MergeConditionAction + * nodes. + */ + List *actions; +} MergeStmt; + +/* + * the structure for the actions of MERGE command. + * Holds info of the clauses like + * "WHEN MATCHED AND ... THEN UPDATE/DELETE/INSERT" + */ +typedef struct MergeConditionAction +{ + NodeTag type; + bool match; /* WHEN MATCHED or WHEN NOT MATCHED? */ + Node *condition; /* the AND condition for this action */ + Node *action; /* the actions: delete, insert or update */ +} MergeConditionAction; + +typedef struct MergeDoNothing +{ + NodeTag type; +} MergeDoNothing; + +typedef struct MergeError +{ + NodeTag type; +} MergeError; /* ---------------------- * Set Operation node for post-analysis query trees diff --git a/src/include/nodes/plannodes.h b/src/include/nodes/plannodes.h index 81038f6..6791b34 100644 --- a/src/include/nodes/plannodes.h +++ b/src/include/nodes/plannodes.h @@ -169,9 +169,38 @@ typedef struct ModifyTable List *returningLists; /* per-target-table RETURNING tlists */ List *rowMarks; /* PlanRowMarks (non-locking only) */ int epqParam; /* ID of Param for EvalPlanQual re-eval */ + List *mergeActPlan; /* the plans for merge actions, + * which are MergeActionSet nodes. + * one set for one target relation */ } ModifyTable; /* ---------------- + * MergeActionSet node - + * The node contains all actions of MERGE command for one specific target relation + * ---------------- + */ +typedef struct MergeActionSet +{ + NodeTag type; + int result_relation; + List *actions; +}MergeActionSet; + +/* ---------------- + * MergeAction node - + * The plan node for the actions of MERGE command + * ---------------- + */ +typedef struct MergeAction +{ + Plan plan; + CmdType operation; /* INSERT, UPDATE, DELETE, DO_NOTHING or RAISE_ERR */ + bool matched; /* is this a MATCHED or NOT MATCHED rule? */ + List *flattenedqual; /* the flattened qual expression of action */ + List *flattenedtlist;/*the fattened target list*/ +} MergeAction; + +/* ---------------- * Append node - * Generate the concatenation of the results of sub-plans. * ---------------- diff --git a/src/include/optimizer/planmain.h b/src/include/optimizer/planmain.h index 5bb0e09..3e8bd0f 100644 --- a/src/include/optimizer/planmain.h +++ b/src/include/optimizer/planmain.h @@ -80,7 +80,7 @@ extern Result *make_result(PlannerInfo *root, List *tlist, Node *resconstantqual, Plan *subplan); extern ModifyTable *make_modifytable(CmdType operation, List *resultRelations, List *subplans, List *returningLists, - List *rowMarks, int epqParam); + List *rowMarks, List *mergeActPlans, int epqParam); extern bool is_projection_capable_plan(Plan *plan); /* diff --git a/src/include/optimizer/prep.h b/src/include/optimizer/prep.h index f8dd542..9ed2a60 100644 --- a/src/include/optimizer/prep.h +++ b/src/include/optimizer/prep.h @@ -53,4 +53,6 @@ extern void expand_inherited_tables(PlannerInfo *root); extern Node *adjust_appendrel_attrs(Node *node, AppendRelInfo *appinfo); +extern List *adjust_inherited_tlist(List * tlist, AppendRelInfo * context); + #endif /* PREP_H */ diff --git a/src/include/optimizer/var.h b/src/include/optimizer/var.h index 8fd9977..e9f4e17 100644 --- a/src/include/optimizer/var.h +++ b/src/include/optimizer/var.h @@ -15,6 +15,7 @@ #define VAR_H #include "nodes/relation.h" +#include "nodes/plannodes.h" typedef enum { @@ -32,5 +33,6 @@ extern int locate_var_of_relation(Node *node, int relid, int levelsup); extern int find_minimum_var_level(Node *node); extern List *pull_var_clause(Node *node, PVCPlaceHolderBehavior behavior); extern Node *flatten_join_alias_vars(PlannerInfo *root, Node *node); +extern void push_up_merge_action_vars(MergeAction *actplan, Query *actqry); #endif /* VAR_H */ diff --git a/src/include/parser/kwlist.h b/src/include/parser/kwlist.h index d3ea04b..0863234 100644 --- a/src/include/parser/kwlist.h +++ b/src/include/parser/kwlist.h @@ -142,6 +142,7 @@ PG_KEYWORD("encoding", ENCODING, UNRESERVED_KEYWORD) PG_KEYWORD("encrypted", ENCRYPTED, UNRESERVED_KEYWORD) PG_KEYWORD("end", END_P, RESERVED_KEYWORD) PG_KEYWORD("enum", ENUM_P, UNRESERVED_KEYWORD) +PG_KEYWORD("error", ERROR_P, UNRESERVED_KEYWORD) PG_KEYWORD("escape", ESCAPE, UNRESERVED_KEYWORD) PG_KEYWORD("except", EXCEPT, RESERVED_KEYWORD) PG_KEYWORD("exclude", EXCLUDE, UNRESERVED_KEYWORD) @@ -231,7 +232,9 @@ PG_KEYWORD("lock", LOCK_P, UNRESERVED_KEYWORD) PG_KEYWORD("login", LOGIN_P, UNRESERVED_KEYWORD) PG_KEYWORD("mapping", MAPPING, UNRESERVED_KEYWORD) PG_KEYWORD("match", MATCH, UNRESERVED_KEYWORD) +PG_KEYWORD("matched", MATCHED, UNRESERVED_KEYWORD) PG_KEYWORD("maxvalue", MAXVALUE, UNRESERVED_KEYWORD) +PG_KEYWORD("merge", MERGE, UNRESERVED_KEYWORD) PG_KEYWORD("minute", MINUTE_P, UNRESERVED_KEYWORD) PG_KEYWORD("minvalue", MINVALUE, UNRESERVED_KEYWORD) PG_KEYWORD("mode", MODE, UNRESERVED_KEYWORD) @@ -298,6 +301,7 @@ PG_KEYWORD("privileges", PRIVILEGES, UNRESERVED_KEYWORD) PG_KEYWORD("procedural", PROCEDURAL, UNRESERVED_KEYWORD) PG_KEYWORD("procedure", PROCEDURE, UNRESERVED_KEYWORD) PG_KEYWORD("quote", QUOTE, UNRESERVED_KEYWORD) +PG_KEYWORD("raise", RAISE, UNRESERVED_KEYWORD) PG_KEYWORD("range", RANGE, UNRESERVED_KEYWORD) PG_KEYWORD("read", READ, UNRESERVED_KEYWORD) PG_KEYWORD("real", REAL, COL_NAME_KEYWORD) diff --git a/src/include/parser/parse_clause.h b/src/include/parser/parse_clause.h index 0e8a80e..831421b 100644 --- a/src/include/parser/parse_clause.h +++ b/src/include/parser/parse_clause.h @@ -19,6 +19,7 @@ extern void transformFromClause(ParseState *pstate, List *frmList); extern int setTargetTable(ParseState *pstate, RangeVar *relation, bool inh, bool alsoSource, AclMode requiredPerms); +extern void setTargetTableLock(ParseState *pstate, RangeVar *relation); extern bool interpretInhOption(InhOption inhOpt); extern bool interpretOidsOption(List *defList); diff --git a/src/test/regress/expected/merge.out b/src/test/regress/expected/merge.out new file mode 100644 index 0000000..7ee155a --- /dev/null +++ b/src/test/regress/expected/merge.out @@ -0,0 +1,202 @@ +-- MERGE -- +-- +-- create basic tables for test, and insert some source rows to work from +-- +--The Stock table, which records the amount of each item on hand. +CREATE TABLE Stock(item_id int UNIQUE, balance int); +NOTICE: CREATE TABLE / UNIQUE will create implicit index "stock_item_id_key" for table "stock" +INSERT INTO Stock VALUES (10, 2200); +INSERT INTO Stock VALUES (20, 1900); +SELECT * FROM Stock; + item_id | balance +---------+--------- + 10 | 2200 + 20 | 1900 +(2 rows) + +--The Buy table, which records the amount we bought today for each item. +CREATE TABLE Buy(item_id int, volume int); +INSERT INTO Buy values(10, 1000); +INSERT INTO Buy values(30, 300); +SELECT * FROM Buy; + item_id | volume +---------+-------- + 10 | 1000 + 30 | 300 +(2 rows) + +--The Sale table, which records the amount we sold today for each item. +CREATE TABLE Sale(item_id int, volume int); +INSERT INTO Sale VALUES (10, 2200); +INSERT INTO Sale VALUES (20, 1000); +SELECT * FROM Sale; + item_id | volume +---------+-------- + 10 | 2200 + 20 | 1000 +(2 rows) + +-- +-- initial queries +-- +-- do a simple equivalent of an UPDATE join +BEGIN; +MERGE INTO Stock USING Buy ON Stock.item_id = Buy.item_id + WHEN MATCHED THEN UPDATE SET balance = balance + Buy.volume +; +SELECT * FROM Stock ORDER BY item_id; + item_id | balance +---------+--------- + 10 | 3200 + 20 | 1900 +(2 rows) + +ROLLBACK; +-- do a simple equivalent of an INSERT SELECT +BEGIN; +MERGE INTO Stock USING Buy ON Stock.item_id = Buy.item_id + WHEN NOT MATCHED THEN INSERT VALUES (Buy.item_id, Buy.volume) +; +SELECT * FROM Stock ORDER BY item_id; + item_id | balance +---------+--------- + 10 | 2200 + 20 | 1900 + 30 | 300 +(3 rows) + +ROLLBACK; +-- now the classic UPSERT +BEGIN; +MERGE INTO Stock USING Buy ON Stock.item_id = Buy.item_id + WHEN MATCHED THEN UPDATE SET balance = balance + Buy.volume + WHEN NOT MATCHED THEN INSERT VALUES (Buy.item_id, Buy.volume) +; +SELECT * FROM Stock ORDER BY item_id; + item_id | balance +---------+--------- + 10 | 3200 + 20 | 1900 + 30 | 300 +(3 rows) + +ROLLBACK; +-- +-- Non-standard functionality +-- +-- a MERGE with DELETE action, which is not allowed in Standard. +-- Extra qualifications are allowed in each WHEN clause. +BEGIN; +MERGE INTO Stock USING Sale ON Stock.item_id = Sale.item_id + WHEN MATCHED AND balance - volume > 0 THEN UPDATE SET balance = balance - volume + WHEN MATCHED THEN DELETE; +; +SELECT * FROM Stock ORDER BY item_id; + item_id | balance +---------+--------- + 20 | 900 +(1 row) + +ROLLBACK; +-- The DO NOTHING action is another extension for MERGE. +-- rows specified by DO NOTHING are ignored +BEGIN; +MERGE INTO Stock USING Sale ON Stock.item_id = Sale.item_id + WHEN MATCHED AND balance - volume > 0 THEN DO NOTHING + WHEN MATCHED THEN DELETE +; +SELECT * FROM Stock ORDER BY item_id; + item_id | balance +---------+--------- + 20 | 1900 +(1 row) + +ROLLBACK; +-- DO NOTHING is the default action for non-matching rows that do not +-- activate any WHEN clause. They are just ignored +BEGIN; +MERGE INTO Stock USING Sale ON Stock.item_id = Sale.item_id + WHEN MATCHED AND balance - volume > 100000 THEN UPDATE SET balance = balance - volume +; +SELECT * FROM Stock ORDER BY item_id; + item_id | balance +---------+--------- + 10 | 2200 + 20 | 1900 +(2 rows) + +ROLLBACK; +-- Prepare the test data to generate multiple matching rows for a single target +INSERT INTO Buy values(10, 400); +SELECT * FROM Buy ORDER BY item_id; + item_id | volume +---------+-------- + 10 | 1000 + 10 | 400 + 30 | 300 +(3 rows) + +-- we now have a duplicate key in Buy, so when we join to +-- Stock we will generate 2 matching rows, not one. +-- According to standard this command should fail. +-- But it succeeds in PostgreSQL implementation by simply ignoring the second +BEGIN; +MERGE INTO Stock USING Buy ON Stock.item_id = Buy.item_id + WHEN MATCHED THEN UPDATE SET balance = balance + Buy.volume +; +SELECT * FROM Stock ORDER BY item_id; + item_id | balance +---------+--------- + 10 | 3200 + 20 | 1900 +(2 rows) + +ROLLBACK; +-- +-- More complicated query +-- +-- The source table of MERGE could be a SELECT clause +BEGIN; +MERGE INTO Stock USING + (SELECT Buy.item_id, (Buy.volume - Sale.volume) as v + FROM Buy, Sale + WHERE Buy.item_id = Sale.item_id) + AS BS + ON Stock.item_id = BS.item_id + WHEN MATCHED THEN UPDATE SET balance = balance + BS.v +; +SELECT * FROM Stock ORDER BY item_id; + item_id | balance +---------+--------- + 10 | 1000 + 20 | 1900 +(2 rows) + +ROLLBACK; +-- Subplan/sublinks can be used in MERGE actions +-- Create a table for sublink. +CREATE TABLE Extra (item_id int, volume int); +INSERT INTO Extra VALUES (10, 20); +INSERT INTO Extra VALUES (10, -7); +INSERT INTO Extra VALUES (20, 16); +INSERT INTO Extra VALUES (20, 5); +INSERT INTO Extra VALUES (30, 9); +-- The following query sum-up the volumes in Extra table and upinsert the Stock. +BEGIN; +MERGE INTO Stock USING Buy ON Stock.item_id = Buy.item_id + WHEN MATCHED THEN UPDATE SET + balance = balance + Buy.volume + + (SELECT sum(volume) FROM Extra WHERE Extra.item_id = Buy.item_id) + WHEN NOT MATCHED THEN INSERT VALUES + (Buy.item_id, Buy.volume + + (SELECT sum(volume) FROM Extra WHERE Extra.item_id = Buy.item_id)) +; +SELECT * FROM Stock ORDER BY item_id; + item_id | balance +---------+--------- + 10 | 3213 + 20 | 1900 + 30 | 309 +(3 rows) + +ROLLBACK; diff --git a/src/test/regress/parallel_schedule b/src/test/regress/parallel_schedule index 3b99e86..9c2d298 100644 --- a/src/test/regress/parallel_schedule +++ b/src/test/regress/parallel_schedule @@ -91,7 +91,7 @@ test: select_views portals_p2 foreign_key cluster dependency guc bitmapops combo # NB: temp.sql does a reconnect which transiently uses 2 connections, # so keep this parallel group to at most 19 tests # ---------- -test: plancache limit plpgsql copy2 temp domain rangefuncs prepare without_oid conversion truncate alter_table sequencepolymorphism rowtypes returning largeobject with xml +test: plancache limit plpgsql copy2 temp domain rangefuncs prepare without_oid conversion truncate alter_table sequencepolymorphism rowtypes returning largeobject with xml merge # run stats by itself because its delay may be insufficient under heavy load test: stats diff --git a/src/test/regress/serial_schedule b/src/test/regress/serial_schedule index b348f0e..fde54d3 100644 --- a/src/test/regress/serial_schedule +++ b/src/test/regress/serial_schedule @@ -124,4 +124,5 @@ test: returning test: largeobject test: with test: xml +test: merge test: stats diff --git a/src/test/regress/sql/merge.sql b/src/test/regress/sql/merge.sql new file mode 100644 index 0000000..5cdb5db --- /dev/null +++ b/src/test/regress/sql/merge.sql @@ -0,0 +1,135 @@ +-- MERGE -- + +-- +-- create basic tables for test, and insert some source rows to work from +-- +--The Stock table, which records the amount of each item on hand. +CREATE TABLE Stock(item_id int UNIQUE, balance int); +INSERT INTO Stock VALUES (10, 2200); +INSERT INTO Stock VALUES (20, 1900); +SELECT * FROM Stock; + +--The Buy table, which records the amount we bought today for each item. +CREATE TABLE Buy(item_id int, volume int); +INSERT INTO Buy values(10, 1000); +INSERT INTO Buy values(30, 300); +SELECT * FROM Buy; + +--The Sale table, which records the amount we sold today for each item. +CREATE TABLE Sale(item_id int, volume int); +INSERT INTO Sale VALUES (10, 2200); +INSERT INTO Sale VALUES (20, 1000); +SELECT * FROM Sale; + +-- +-- initial queries +-- +-- do a simple equivalent of an UPDATE join +BEGIN; +MERGE INTO Stock USING Buy ON Stock.item_id = Buy.item_id + WHEN MATCHED THEN UPDATE SET balance = balance + Buy.volume +; +SELECT * FROM Stock ORDER BY item_id; +ROLLBACK; + +-- do a simple equivalent of an INSERT SELECT +BEGIN; +MERGE INTO Stock USING Buy ON Stock.item_id = Buy.item_id + WHEN NOT MATCHED THEN INSERT VALUES (Buy.item_id, Buy.volume) +; +SELECT * FROM Stock ORDER BY item_id; +ROLLBACK; + +-- now the classic UPSERT +BEGIN; +MERGE INTO Stock USING Buy ON Stock.item_id = Buy.item_id + WHEN MATCHED THEN UPDATE SET balance = balance + Buy.volume + WHEN NOT MATCHED THEN INSERT VALUES (Buy.item_id, Buy.volume) +; +SELECT * FROM Stock ORDER BY item_id; +ROLLBACK; + +-- +-- Non-standard functionality +-- +-- a MERGE with DELETE action, which is not allowed in Standard. +-- Extra qualifications are allowed in each WHEN clause. +BEGIN; +MERGE INTO Stock USING Sale ON Stock.item_id = Sale.item_id + WHEN MATCHED AND balance - volume > 0 THEN UPDATE SET balance = balance - volume + WHEN MATCHED THEN DELETE; +; +SELECT * FROM Stock ORDER BY item_id; +ROLLBACK; + +-- The DO NOTHING action is another extension for MERGE. +-- rows specified by DO NOTHING are ignored +BEGIN; +MERGE INTO Stock USING Sale ON Stock.item_id = Sale.item_id + WHEN MATCHED AND balance - volume > 0 THEN DO NOTHING + WHEN MATCHED THEN DELETE +; +SELECT * FROM Stock ORDER BY item_id; +ROLLBACK; + +-- DO NOTHING is the default action for non-matching rows that do not +-- activate any WHEN clause. They are just ignored +BEGIN; +MERGE INTO Stock USING Sale ON Stock.item_id = Sale.item_id + WHEN MATCHED AND balance - volume > 100000 THEN UPDATE SET balance = balance - volume +; +SELECT * FROM Stock ORDER BY item_id; +ROLLBACK; + +-- Prepare the test data to generate multiple matching rows for a single target +INSERT INTO Buy values(10, 400); +SELECT * FROM Buy ORDER BY item_id; + +-- we now have a duplicate key in Buy, so when we join to +-- Stock we will generate 2 matching rows, not one. +-- According to standard this command should fail. +-- But it succeeds in PostgreSQL implementation by simply ignoring the second +BEGIN; +MERGE INTO Stock USING Buy ON Stock.item_id = Buy.item_id + WHEN MATCHED THEN UPDATE SET balance = balance + Buy.volume +; +SELECT * FROM Stock ORDER BY item_id; +ROLLBACK; + +-- +-- More complicated query +-- +-- The source table of MERGE could be a SELECT clause +BEGIN; +MERGE INTO Stock USING + (SELECT Buy.item_id, (Buy.volume - Sale.volume) as v + FROM Buy, Sale + WHERE Buy.item_id = Sale.item_id) + AS BS + ON Stock.item_id = BS.item_id + WHEN MATCHED THEN UPDATE SET balance = balance + BS.v +; +SELECT * FROM Stock ORDER BY item_id; +ROLLBACK; + +-- Subplan/sublinks can be used in MERGE actions +-- Create a table for sublink. +CREATE TABLE Extra (item_id int, volume int); +INSERT INTO Extra VALUES (10, 20); +INSERT INTO Extra VALUES (10, -7); +INSERT INTO Extra VALUES (20, 16); +INSERT INTO Extra VALUES (20, 5); +INSERT INTO Extra VALUES (30, 9); + +-- The following query sum-up the volumes in Extra table and upinsert the Stock. +BEGIN; +MERGE INTO Stock USING Buy ON Stock.item_id = Buy.item_id + WHEN MATCHED THEN UPDATE SET + balance = balance + Buy.volume + + (SELECT sum(volume) FROM Extra WHERE Extra.item_id = Buy.item_id) + WHEN NOT MATCHED THEN INSERT VALUES + (Buy.item_id, Buy.volume + + (SELECT sum(volume) FROM Extra WHERE Extra.item_id = Buy.item_id)) +; +SELECT * FROM Stock ORDER BY item_id; +ROLLBACK;
I think that MERGE is supposed to trigger one rule for each row in the source data. So: On Sun, Oct 17, 2010 at 8:20 PM, Greg Smith <greg@2ndquadrant.com> wrote: > MERGE INTO Stock t > USING (SELECT * FROM Stock WHERE item_id=10) AS s > ON s.item_id=t.item_id > WHEN MATCHED THEN UPDATE SET balance=s.balance + 1 > WHEN NOT MATCHED THEN INSERT VALUES (10,1) > ; > > This works fine, and updates the matching row: > > item_id | balance > ---------+--------- > 20 | 1900 > 10 | 2201 Here you have one row of source data, and you got one action (the WHEN MATCHED case). > But if I give it a key that doesn't exist instead: > > MERGE INTO Stock t > USING (SELECT * FROM Stock WHERE item_id=30) AS s > ON s.item_id=t.item_id > WHEN MATCHED THEN UPDATE SET balance=s.balance + 1 > WHEN NOT MATCHED THEN INSERT VALUES (30,1) > ; > > This doesn't execute the NOT MATCHED case and INSERT the way I expected it > to. It just gives back "MERGE 0". Here you have no rows of source data (the USING (SELECT ...) doesn't return anything, since no rows exist) so nothing happens. > Since I wasn't sure if the whole "subquery in the USING clause" case was > really implemented fully, I then tried to do this with something more like > the working regression test examples. I expected this to do the same thing > as the first example: > > MERGE INTO Stock t > USING Stock s > ON s.item_id=10 AND s.item_id=t.item_id > WHEN MATCHED THEN UPDATE SET balance=s.balance + 1 > WHEN NOT MATCHED THEN INSERT VALUES (10,1) > ; > > But it gives back this: > > ERROR: duplicate key value violates unique constraint "stock_item_id_key" > DETAIL: Key (item_id)=(10) already exists. Here you have two rows of source data. The ON clause represents the join condition. The item_id=10 row matches - so you get an update, presumably, though we can't see that as things turn out - and the item_id=20 row doesn't match - so you try to insert (10, 1), which is a duplicate key, thus the error. > Can't tell from that whether it's hitting the MATCHED or NOT MATCHED side of > things to generate that. But it doesn't work any better if you give it an > example that doesn't exist: > > MERGE INTO Stock t > USING Stock s > ON s.item_id=30 AND s.item_id=t.item_id > WHEN MATCHED THEN UPDATE SET balance=s.balance + 1 > WHEN NOT MATCHED THEN INSERT VALUES (30,1) > ; > > ERROR: duplicate key value violates unique constraint "stock_item_id_key" > DETAIL: Key (item_id)=(30) already exists. In this case neither row of the source data matches the join condition (s.item_id=30 might as well be constant FALSE as far as the test data is concerned) so you attempt to execute the NOT MATCHED side twice. So this one also looks correct to me. > The other thing I noticed that may take some work to sort out is that I > haven't had any luck getting MERGE to execute from within a plpgsql > function. I was hoping I could use this to update the pgbench tables: Good catch. Considering the size of this patch, I have no problem leaving this to the eventual committer to fix, or to a subsequent commit. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
On Mon, Oct 18, 2010 at 9:54 PM, Robert Haas <robertmhaas@gmail.com> wrote:
leaving this to the eventual committer to fix, or to a subsequent
commit.
--
The Enterprise PostgreSQL Company
I think that MERGE is supposed to trigger one rule for each row in the
source data. So:Here you have one row of source data, and you got one action (the WHEN
On Sun, Oct 17, 2010 at 8:20 PM, Greg Smith <greg@2ndquadrant.com> wrote:
> MERGE INTO Stock t
> USING (SELECT * FROM Stock WHERE item_id=10) AS s
> ON s.item_id=t.item_id
> WHEN MATCHED THEN UPDATE SET balance=s.balance + 1
> WHEN NOT MATCHED THEN INSERT VALUES (10,1)
> ;
>
> This works fine, and updates the matching row:
>
> item_id | balance
> ---------+---------
> 20 | 1900
> 10 | 2201
MATCHED case).Here you have no rows of source data (the USING (SELECT ...) doesn't
> But if I give it a key that doesn't exist instead:
>
> MERGE INTO Stock t
> USING (SELECT * FROM Stock WHERE item_id=30) AS s
> ON s.item_id=t.item_id
> WHEN MATCHED THEN UPDATE SET balance=s.balance + 1
> WHEN NOT MATCHED THEN INSERT VALUES (30,1)
> ;
>
> This doesn't execute the NOT MATCHED case and INSERT the way I expected it
> to. It just gives back "MERGE 0".
return anything, since no rows exist) so nothing happens.
Yes.
The MERGE process is based on a left join between the source table and target table.
Since here the source table is empty, no join is carried, and thus no MERGE action is taken.
But, is it correct logically? I mean, should we insert some rows in the above example rather than do nothing?
> Since I wasn't sure if the whole "subquery in the USING clause" case wasHere you have two rows of source data. The ON clause represents the
> really implemented fully, I then tried to do this with something more like
> the working regression test examples. I expected this to do the same thing
> as the first example:
>
> MERGE INTO Stock t
> USING Stock s
> ON s.item_id=10 AND s.item_id=t.item_id
> WHEN MATCHED THEN UPDATE SET balance=s.balance + 1
> WHEN NOT MATCHED THEN INSERT VALUES (10,1)
> ;
>
> But it gives back this:
>
> ERROR: duplicate key value violates unique constraint "stock_item_id_key"
> DETAIL: Key (item_id)=(10) already exists.
join condition. The item_id=10 row matches - so you get an update,
presumably, though we can't see that as things turn out - and the
item_id=20 row doesn't match - so you try to insert (10, 1), which is
a duplicate key, thus the error.In this case neither row of the source data matches the join condition
> Can't tell from that whether it's hitting the MATCHED or NOT MATCHED side of
> things to generate that. But it doesn't work any better if you give it an
> example that doesn't exist:
>
> MERGE INTO Stock t
> USING Stock s
> ON s.item_id=30 AND s.item_id=t.item_id
> WHEN MATCHED THEN UPDATE SET balance=s.balance + 1
> WHEN NOT MATCHED THEN INSERT VALUES (30,1)
> ;
>
> ERROR: duplicate key value violates unique constraint "stock_item_id_key"
> DETAIL: Key (item_id)=(30) already exists.
(s.item_id=30 might as well be constant FALSE as far as the test data
is concerned) so you attempt to execute the NOT MATCHED side twice.
So this one also looks correct to me.
Yes, that is what happened in the above two examples.
> The other thing I noticed that may take some work to sort out is that I
> haven't had any luck getting MERGE to execute from within a plpgsql
> function. I was hoping I could use this to update the pgbench tables:
Good catch. Considering the size of this patch, I have no problem> haven't had any luck getting MERGE to execute from within a plpgsql
> function. I was hoping I could use this to update the pgbench tables:
leaving this to the eventual committer to fix, or to a subsequent
commit.
--
The Enterprise PostgreSQL Company
On Mon, Oct 18, 2010 at 10:09 AM, Boxuan Zhai <bxzhai2010@gmail.com> wrote: > > > On Mon, Oct 18, 2010 at 9:54 PM, Robert Haas <robertmhaas@gmail.com> wrote: >> >> I think that MERGE is supposed to trigger one rule for each row in the >> source data. So: >> >> On Sun, Oct 17, 2010 at 8:20 PM, Greg Smith <greg@2ndquadrant.com> wrote: >> > MERGE INTO Stock t >> > USING (SELECT * FROM Stock WHERE item_id=10) AS s >> > ON s.item_id=t.item_id >> > WHEN MATCHED THEN UPDATE SET balance=s.balance + 1 >> > WHEN NOT MATCHED THEN INSERT VALUES (10,1) >> > ; >> > >> > This works fine, and updates the matching row: >> > >> > item_id | balance >> > ---------+--------- >> > 20 | 1900 >> > 10 | 2201 >> >> Here you have one row of source data, and you got one action (the WHEN >> MATCHED case). >> >> > But if I give it a key that doesn't exist instead: >> > >> > MERGE INTO Stock t >> > USING (SELECT * FROM Stock WHERE item_id=30) AS s >> > ON s.item_id=t.item_id >> > WHEN MATCHED THEN UPDATE SET balance=s.balance + 1 >> > WHEN NOT MATCHED THEN INSERT VALUES (30,1) >> > ; >> > >> > This doesn't execute the NOT MATCHED case and INSERT the way I expected >> > it >> > to. It just gives back "MERGE 0". >> >> Here you have no rows of source data (the USING (SELECT ...) doesn't >> return anything, since no rows exist) so nothing happens. >> > > Yes. > The MERGE process is based on a left join between the source table and > target table. > Since here the source table is empty, no join is carried, and thus no MERGE > action is taken. > But, is it correct logically? I mean, should we insert some rows in the > above example rather than do nothing? I don't think so. I think the right way to write UPSERT is something along the lines of: MERGE INTO Stock t USING (VALUES (10, 1)) s(item_id, balance) ON s.item_id = t.item_id ... (untested) -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Hi,
I considered the empty source table situation again. I think it is correct to upsert nothing in this case.
Back to the original logic of MERGE command, it is main purpose is to add the supplementary data from the source table into the target table. Thus, an empty source table means no input data is available, so no upsert is needed in target table.
Regards,
Boxuan
Robert Haas wrote: > I think the right way to write UPSERT is something > along the lines of: > > MERGE INTO Stock t USING (VALUES (10, 1)) s(item_id, balance) ON > s.item_id = t.item_id ... > That led in the right direction, after a bit more fiddling I was finally able to get something that does what I wanted: a single table UPSERT implemented with this MERGE implementation. Here's a log of a test session, suitable for eventual inclusion in the regression tests: CREATE TABLE Stock(item_id int UNIQUE, balance int); INSERT INTO Stock VALUES (10, 2200); INSERT INTO Stock VALUES (20, 1900); SELECT * FROM Stock ORDER BY item_id; item_id | balance ---------+--------- 10 | 2200 20 | 1900 MERGE INTO Stock tUSING (VALUES(10,100)) AS s(item_id,balance)ON s.item_id=t.item_idWHEN MATCHED THEN UPDATE SET balance=t.balance+ s.balanceWHEN NOT MATCHED THEN INSERT VALUES(s.item_id,s.balance); MERGE 1 SELECT * FROM Stock ORDER BY item_id;item_id | balance ---------+--------- 10 | 2300 20 | 1900 MERGE INTO Stock tUSING (VALUES(30,2000)) AS s(item_id,balance)ON s.item_id=t.item_idWHEN MATCHED THEN UPDATE SET balance=t.balance+ s.balanceWHEN NOT MATCHED THEN INSERT VALUES(s.item_id,s.balance); MERGE 1 SELECT * FROM Stock ORDER BY item_id;item_id | balance ---------+--------- 10 | 2300 20 | 1900 30 | 2000 I'm still a little uncertain as to whether any of my other examples should have worked under the spec but just didn't work here, but I'll worry about that later. Here's what the query plan looks like on a MATCH: Merge (cost=0.00..8.29 rows=1 width=22) (actual time=0.166..0.166 rows=0 loops=1) Action 1: Update When Matched Action 2: Insert When Not Mactched MainPlan: -> Nested Loop Left Join (cost=0.00..8.29 rows=1 width=22) (actual time=0.050..0.061 rows=1 loops=1) -> Values Scan on "*VALUES*" (cost=0.00..0.01 rows=1 width=8) (actual time=0.009..0.010 rows=1 loops=1) -> Index Scan using stock_item_id_key on stock t (cost=0.00..8.27 rows=1 width=14) (actual time=0.026..0.030 rows=1 loops=1) Index Cond: ("*VALUES*".column1= item_id)Total runtime: 0.370 ms And here's a miss: Merge (cost=0.00..8.29 rows=1 width=22) (actual time=0.145..0.145 rows=0 loops=1) Action 1: Update When Matched Action 2: Insert When Not Mactched MainPlan: -> Nested Loop Left Join (cost=0.00..8.29 rows=1 width=22) (actual time=0.028..0.033 rows=1 loops=1) -> Values Scan on "*VALUES*" (cost=0.00..0.01 rows=1 width=8) (actual time=0.004..0.005 rows=1 loops=1) -> Index Scan using stock_item_id_key on stock t (cost=0.00..8.27 rows=1 width=14) (actual time=0.015..0.015 rows=0 loops=1) Index Cond: ("*VALUES*".column1= item_id)Total runtime: 0.255 ms Next steps here: 1) Performance/concurrency tests against trigger-based UPSERT approach. 2) Finish bit rot cleanup against HEAD. 3) Work out more complicated test cases to try and fine more unexpected behavior edge cases and general bugs. -- Greg Smith, 2ndQuadrant US greg@2ndQuadrant.com Baltimore, MD PostgreSQL Training, Services and Support www.2ndQuadrant.us
On 10/21/2010 08:36 PM, Greg Smith wrote: > Robert Haas wrote: >> I think the right way to write UPSERT is something >> along the lines of: >> >> MERGE INTO Stock t USING (VALUES (10, 1)) s(item_id, balance) ON >> s.item_id = t.item_id ... [...] > Here's what the query plan looks like on a MATCH: > > Merge (cost=0.00..8.29 rows=1 width=22) (actual time=0.166..0.166 rows=0 > loops=1) > Action 1: Update When Matched > Action 2: Insert When Not Mactched "Mactched"? - is this a c&p error or the actual output of EXPLAIN? :) lg Stefan
There are now two branches of MERGE code in review progress available. http://github.com/greg2ndQuadrant/postgres/tree/merge-unstable has the bit-rotted version that doesn't quite work against HEAD yet, while http://github.com/greg2ndQuadrant/postgres/tree/merge is what I'm still testing against until I get that sorted out. Attached is a tar file containing an initial concurrent MERGE test case. What it does is create is a pgbench database with a scale of 1. Then it runs a custom test that does an UPSERT using MERGE against pgbench_accounts, telling pgbench the database scale is actually 2. This means that approximately half of the statements executed will hit the MATCHED side of the merge and update existing rows, while half will hit the NOT MATCHED one and create new account records. Did some small tests using pgbench's debug mode where you see all the statements it executes, and all the output looked correct. Successive tests runs are not deterministically perfect performance comparisons, but with enough transactions I hope that averages out. For comparison sake, there's an almost identical test case that does the same thing using the pl/pgsql example function from the PostgreSQL manual for the UPSERT instead also in there. You just run test-merge.sh or test-function.sh and it runs the whole test, presuming you built and installed pgbench and don't mind that it will blow away any database named pgbench you already have. Since the current MERGE implementation is known not to be optimized for concurrency, my main goal here wasn't to see how fast it was. That number is interesting though. With the sole postgresql.conf change of: checkpoint_settings=32 And a debug/assertion build using 8 concurrent clients, I got 1607 TPS of UPSERT out of the trigger approach @ 6MB/s of writes to disk, while the MERGE one delivered 988 TPS @ 4.5MB/s of writes. Will explore this more later. This did seem to find a bug in the implementation after running for a while: TRAP: FailedAssertion("!(epqstate->origslot != ((void *)0))", File: "execMain.c", Line: 1732) Line number there is relative to what you can see at http://github.com/greg2ndQuadrant/postgres/blob/merge/src/backend/executor/execMain.c and that's a null pointer check at the beginning of EvalPlanQualFetchRowMarks. Haven't explored why this happened or how repeatable this is, but since Boxuan was looking for some bugs to chase I wanted to deliver him one to chew on. While the performance doesn't need to be great in V1, there needs to be at least some minimal protection against concurrency issues before this is commit quality. Will continue to shake this code out looking for them now that I have some basic testing that works for doing so. -- Greg Smith 2ndQuadrant US greg@2ndQuadrant.com Baltimore, MD PostgreSQL Training, Services and Support www.2ndQuadrant.us
Attachment
On 2010-10-23 8:34 AM +0300, Greg Smith wrote: > While the performance doesn't need to be great in V1, there needs to be > at least some minimal protection against concurrency issues before this > is commit quality. What's been bothering me is that so far there has not been an agreement on whether we need to protect against concurrency issues or not. In fact, there has been almost no discussion about the concurrency issues which AIUI have been the biggest single reason we don't have MERGE already. Right now, this patch fails in even the simplest scenario: =# create table foo(a int unique); NOTICE: CREATE TABLE / UNIQUE will create implicit index "foo_a_key" for table "foo" CREATE TABLE T1=# begin; BEGIN T1=# merge into foo using (values (1)) s(i) on s.i = foo.a when matched then update set a=a+1 when not matched then insert values (s.i); MERGE 1 T2=# merge into foo using (values (1)) s(i) on s.i = foo.a when matched then update set a=a+1 when not matched then insert values (s.i); -- blocks T1=# commit; COMMIT .. and T2 gets a unique constraint violation. As far as I'm concerned, the reason people want MERGE is to avoid these problems; the nicer syntax is just a bonus. Having to LOCK the target table makes this feature a lot less useful, even though there are a few use cases where locking the table would be OK. Regards, Marko Tiikkaja
Marko Tiikkaja wrote: > What's been bothering me is that so far there has not been an > agreement on whether we need to protect against concurrency issues or > not. In fact, there has been almost no discussion about the > concurrency issues which AIUI have been the biggest single reason we > don't have MERGE already. Sure there were, they just happened a long time ago. I tried to summarize the previous discussion at http://wiki.postgresql.org/wiki/SQL_MERGE with the following: "PostgreSQL doesn't have a good way to lock access to a key value that doesn't exist yet--what other databases call key range locking...Improvements to the index implementation are needed to allow this feature." You can sift through the other archive links in there, the discussion leading up to that conclusion is in there somewhere. And we had a discussion of this at the last developer's meeting where this was identified as a key issue to sort out before this was done; see the table at the end of http://wiki.postgresql.org/wiki/PgCon_2010_Developer_Meeting and note the warning associated with this feature. The deadlock on developing this feature has been that there's little benefit to building key range locking on its own, or even a good way to prove that it works, without a visible feature that uses it. It's much easier to justify development time on if the rest of MERGE is known to work, and just needs that plugged in to improve concurrency. And since Boxuan appeared at the right time to do the syntax and implementation part first as well, that's the order it's ended up happening in. We're only making the concurrency part a second priority right now in order to break the development into managable chunks. -- Greg Smith 2ndQuadrant US greg@2ndQuadrant.com Baltimore, MD PostgreSQL Training, Services and Support www.2ndQuadrant.us
Greg Smith <greg@2ndquadrant.com> writes: > Marko Tiikkaja wrote: >> What's been bothering me is that so far there has not been an >> agreement on whether we need to protect against concurrency issues or >> not. In fact, there has been almost no discussion about the >> concurrency issues which AIUI have been the biggest single reason we >> don't have MERGE already. > Sure there were, they just happened a long time ago. I tried to > summarize the previous discussion at > http://wiki.postgresql.org/wiki/SQL_MERGE with the following: > "PostgreSQL doesn't have a good way to lock access to a key value that > doesn't exist yet--what other databases call key range > locking...Improvements to the index implementation are needed to allow > this feature." This seems to be presuming there's a consensus that that's the way to implement it, which is news to me. Why wouldn't we try the INSERT first, and if we get a unique-key failure, do UPDATE instead? I am not at all in agreement that we ought to build key range locking, nor that it'd be particularly helpful for this problem even if we had it. regards, tom lane
This did seem to find a bug in the implementation after running for a while:
TRAP: FailedAssertion("!(epqstate->origslot != ((void *)0))", File: "execMain.c", Line: 1732)
Line number there is relative to what you can see at http://github.com/greg2ndQuadrant/postgres/blob/merge/src/backend/executor/execMain.c
and that's a null pointer check at the beginning of EvalPlanQualFetchRowMarks. Haven't explored why this happened or how repeatable this is, but since Boxuan was looking for some bugs to chase I wanted to deliver him one to chew on.
I am not sure how this bug comes. I will try to find out the reason for it.
On Sat, Oct 23, 2010 at 9:12 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote: >> "PostgreSQL doesn't have a good way to lock access to a key value that >> doesn't exist yet--what other databases call key range >> locking...Improvements to the index implementation are needed to allow >> this feature." > > This seems to be presuming there's a consensus that that's the way to > implement it, which is news to me. Why wouldn't we try the INSERT > first, and if we get a unique-key failure, do UPDATE instead? I am not > at all in agreement that we ought to build key range locking, nor that > it'd be particularly helpful for this problem even if we had it. Except we *do* have range locking for the special case of inserting equal values into a unique btree index. In that case the btree insert locks the leaf page containing the conflicting key value -- effectively locking out anyone else from inserting the same key value.That lock is held until the index entry pointing tothe newly inserted value is finished. That's what prevents two inserters from inserting the same key value. So the trick for MERGE on equality would be to refactor the btree api so that you could find the matching leaf page and *keep* that page locked. Then do an update against the conflicting row found there if any without ever releasing that page lock. Someone else can come along and delete the row (or have already deleted it) before the update locks it but if they do you can insert your row without worrying about anyone else inserting a conflicting entry since you're still holding a lock on the page they'll need to insert the btree entry on and have been holding it continuously for the whole operation. I'm not sure how the new exclusion constraints stuff affects this. I think they would work the same way except instead of holding a page lock on the leaf page they would store the value being edited in the in-memory array -- effectively another form of key range locking. I think the blocker with MERGE has always been that the standard is *so* general that it could apply to conditions that *aren't* covered by a unique constraint or exclusion constriant. Personally, I'm fine saying that those cases will perform poorly, locking the table, or aren't allowed and print a warning explaining exactly what exclusion constraint or btree unique index would be necessary to allow it. I think virtually every case where people will want to use it will be on a primary key anyways. -- greg
> So the trick for MERGE on equality would be to refactor the btree api > so that you could find the matching leaf page and *keep* that page > locked. Then do an update against the conflicting row found there if > any without ever releasing that page lock. Someone else can come along > and delete the row (or have already deleted it) before the update > locks it but if they do you can insert your row without worrying about > anyone else inserting a conflicting entry since you're still holding a > lock on the page they'll need to insert the btree entry on and have > been holding it continuously for the whole operation. I think that such a lock would also be useful for improving the FK deadlock issues we have. -- -- Josh Berkus PostgreSQL Experts Inc. http://www.pgexperts.com
On Sat, Oct 23, 2010 at 4:03 PM, Josh Berkus <josh@agliodbs.com> wrote: > I think that such a lock would also be useful for improving the FK deadlock > issues we have. I don't see how. I think the problem you're referring to occurs when different plans update rows in different orders and the resulting locks on foreign key targets are taken in different orders. In which case the problem isn't that we're unable to lock the resources -- they're locked using regular row locks -- but rather that there's nothing controlling the ordering of the locks. I don't think it would be acceptable to hold low level btree page locks across multiple independent row operations on different rows and I don't see how they would be any better than the row locks we have now. Worse, the resulting deadlock would no longer be detectable (one of the reasons it wouldn't be acceptable to hold the lock for so long). That does point out a problem with the logic I sketched. If you go to do an update and find there's an uncommitted update pending you have to wait on it. You can't do that while holding the index page lock. I assume then you release it and wait on the uncommitted transaction and when it's finished you start over doing the btree lookup and reacquiring the lock. I haven't thought it through in detail though. -- greg
On Sat, Oct 23, 2010 at 03:59:13PM -0700, Greg Stark wrote: > I think the blocker with MERGE has always been that the standard is > *so* general that it could apply to conditions that *aren't* covered > by a unique constraint or exclusion constriant. Personally, I'm fine > saying that those cases will perform poorly, locking the table, or > aren't allowed and print a warning explaining exactly what exclusion > constraint or btree unique index would be necessary to allow it. I > think virtually every case where people will want to use it will be on > a primary key anyways. Can we please not get MERGE hung up on trying to make it atomic. The standard requires no such thing and there are plenty of uses of MERGE that don't involve high concurrency updates of the same row by different processes. If we want an atomic UPSERT, make that a seperate command, MERGE without atomicity is useful enough already. Simply, if the row is visible in the current snapshot you do an update otherwise an insert. If you get an error, raise it. Icing can be added later. Have a nice day, -- Martijn van Oosterhout <kleptog@svana.org> http://svana.org/kleptog/ > Patriotism is when love of your own people comes first; nationalism, > when hate for people other than your own comes first. > - Charles de Gaulle
On Sun, Oct 24, 2010 at 5:50 AM, Martijn van Oosterhout <kleptog@svana.org> wrote: > On Sat, Oct 23, 2010 at 03:59:13PM -0700, Greg Stark wrote: >> I think the blocker with MERGE has always been that the standard is >> *so* general that it could apply to conditions that *aren't* covered >> by a unique constraint or exclusion constriant. Personally, I'm fine >> saying that those cases will perform poorly, locking the table, or >> aren't allowed and print a warning explaining exactly what exclusion >> constraint or btree unique index would be necessary to allow it. I >> think virtually every case where people will want to use it will be on >> a primary key anyways. > > Can we please not get MERGE hung up on trying to make it atomic. The > standard requires no such thing and there are plenty of uses of MERGE > that don't involve high concurrency updates of the same row by > different processes. If we want an atomic UPSERT, make that a seperate > command, MERGE without atomicity is useful enough already. > > Simply, if the row is visible in the current snapshot you do an update > otherwise an insert. If you get an error, raise it. Icing can be added > later. Long term, I'm wondering if the permanent fix for this problem might involve something similar to what EXCLUDE does - use a dirty snapshot for the scan of the target table, and block on in=d Yeah, I couldn't have said it better. It's an annoying implementation restriction but only that. I think it's a perfectly acceptable restriction for an initial commit. It's no secret that our process has trouble absorbing large patches. I am also wondering exactly what the semantics are supposed to be under concurrency. We can't assume that it's OK to treat WHEN NOT MATCHED as WHEN MATCHED if a unique-key violation is encountered. The join condition could be something else entirely. I guess we could scan the target table with a dirty snapshot and block on any in-doubt tuples, similar to what EXCLUDE constraints do internally, but throwing MVCC out the window doesn't seem right either. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Robert Haas wrote: > I am also wondering exactly what the semantics are supposed to be > under concurrency. We can't assume that it's OK to treat WHEN NOT > MATCHED as WHEN MATCHED if a unique-key violation is encountered. The > join condition could be something else entirely. I guess we could > scan the target table with a dirty snapshot and block on any in-doubt > tuples, similar to what EXCLUDE constraints do internally, but > throwing MVCC out the window doesn't seem right either. > You've answered Tom's question of why not to just INSERT and trap the error here. Presuming a unique key violation is what will kick back in this situation and to just follow the other side doesn't cover all the ways this can get used. I'm thinking of this problem now as being like the one that happens when a concurrent UPDATE happens. To quote from the docs here: "a target row might have already been updated (or deleted or locked) by another concurrent transaction by the time it is found. In this case, the would-be updater will wait for the first updating transaction to commit or roll back (if it is still in progress). If the first updater rolls back, then its effects are negated and the second updater can proceed with updating the originally found row. If the first updater commits, the second updater will ignore the row if the first updater deleted it, otherwise it will attempt to apply its operation to the updated version of the row. The search condition of the command (the WHERE clause) is re-evaluated to see if the updated version of the row still matches the search condition. If so, the second updater proceeds with its operation using the updated version of the row." What I think we can do with adding a key reservation is apply the same sort of logic to INSERTs too, given a way to lock an index entry before the row itself is complete. If MERGE hits a WHERE condition that finds a key lock entry in the index, it has to sit and wait for that other command to finish. There isn't any other sensible behavior I can see in that situation, just like there isn't one for UPDATE right now. If the transaction that was locking the index entry commits, it has to start over with re-evaluating the WHEN MATCHED part (the equivilent of the WHERE in the UPDATE case). But it shouldn't need to do a new JOIN again, because that condition was proven to be met already by the lock squatting on that index key, without even having the rest of the row there yet to check its data. If the other entry commits, it would then follow the WHEN MATCHED path and in the UPSERT case do an UPDATE. If the transaction who had the key reservervation rolls back, the re-evaluation of the MERGE WHEN MATCHED will fail, at which point it follows the WHERE FOUND INSERT path. As it will now be the locker of the key reservation it had been waiting on earlier at that point, it will be free to do the INSERT with no concerns about race conditions or retries. In the SERIALIZABLE case, again just try to follow the same slightly different rules that exist for concurrent UPDATE commands. There may be a tricky point here that we will need to warn people that UPSERT implementations need to make sure they only reference the columns of the primary key in the MERGE conditions for this to work as expected, because otherwise they might get back a unique key violation error anyway. I don't know how other databases deal with that particular problem. I have considered following the somewhat distasteful path of installing SQL Server or Oracle here just to see how they handle some of the pathological test cases I'm now thinking about for this command. This particular weak spot in MVCC is already there for UPDATE, and this approach seems to me to be the most logical way to extend the same basic implementation to also eliminate some concurrent MERGE race conditions, including the most common one the UPSERT case is stuck with. But I have no intention of halting work on the basic MERGE to wait for this problem to be solved even if I'm completely wrong about what I've outlined above. That style of thinking--don't even start until every a perfect solution for every possible problem is known--is something I consider a problem in this community's development model, one that has contributed to why no work has been done on this feature until now. This one splits nicely into "get the implemenation working" and "improve the concurrency" parts, and I haven't heard a good reason yet why those need to coupled together. -- Greg Smith 2ndQuadrant US greg@2ndQuadrant.com Baltimore, MD PostgreSQL Training, Services and Support www.2ndQuadrant.us
On Sun, Oct 24, 2010 at 12:15 PM, Greg Smith <greg@2ndquadrant.com> wrote: > Robert Haas wrote: >> I am also wondering exactly what the semantics are supposed to be >> under concurrency. We can't assume that it's OK to treat WHEN NOT >> MATCHED as WHEN MATCHED if a unique-key violation is encountered. The >> join condition could be something else entirely. I guess we could >> scan the target table with a dirty snapshot and block on any in-doubt >> tuples, similar to what EXCLUDE constraints do internally, but >> throwing MVCC out the window doesn't seem right either. > > [discussion of EPQ behavior for UPDATE statements] > > What I think we can do with adding a key reservation is apply the same sort > of logic to INSERTs too, given a way to lock an index entry before the row > itself is complete. If MERGE hits a WHERE condition that finds a key lock > entry in the index, it has to sit and wait for that other command to finish. > There isn't any other sensible behavior I can see in that situation, just > like there isn't one for UPDATE right now. Well, there's no guarantee that any index at all exists on the target table, so any solution based on index locks can't be fully general. But let's back up and talk about MVCC for a minute. Suppose we have three source tuples, (1), (2), and (3); and the target table contains tuples (1) and (2), of which only (1) is visible to our MVCC snapshot; suppose also an equijoin. Clearly, source tuple (1) should fire the MATCHED rule and source tuple (3) should fire the NOT MATCHED rule, but what in the world should source tuple (2) do? AFAICS, the only sensible behavior is to throw a serialization error, because no matter what you do the results won't be equivalent to a serial execution of the transaction that committed target tuple (2) and the transaction that contains the MERGE. Even with predicate locks, it's not obvious how to me how solve this problem. Target tuple (2) may already be there, and its transaction already committed, by the time the MERGE statement gets around to looking at the source data. In fact, even taking an AccessExclusiveLock on the target table doesn't fix this, because the snapshot would be taken before the lock. > been done on this feature until now. This one splits nicely into "get the > implemenation working" and "improve the concurrency" parts, and I haven't > heard a good reason yet why those need to coupled together. Sounds like we're all on the same page. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Robert Haas wrote: > Well, there's no guarantee that any index at all exists on the target > table, so any solution based on index locks can't be fully general. > Sure, but in the most common use case I think we're targeting at first, no indexes means there's also no unique constraints either. I don't think people can reasonable expect to MERGE or anything else to guarantee they won't accidentally create two rows that conflict via the terms of some non-database enforced key. I am too brain-dead this particular Sunday to think of exactly how to deal with the 3-tuple case you described, except to say that I think it's OK for complicated situations to give up and throw a serialization error. I'm already collecting a list of pathological tests and will try to add something based on your problem case, then see what we can do with it later. -- Greg Smith, 2ndQuadrant US greg@2ndQuadrant.com Baltimore, MD PostgreSQL Training, Services and Support www.2ndQuadrant.us
On Sun, Oct 24, 2010 at 2:50 AM, Martijn van Oosterhout <kleptog@svana.org> wrote: > Can we please not get MERGE hung up on trying to make it atomic. The > standard requires no such thing and there are plenty of uses of MERGE > that don't involve high concurrency updates of the same row by > different processes. If we want an atomic UPSERT, make that a seperate > command, MERGE without atomicity is useful enough already. Really? I don't really see any point in a non-atomic MERGE. Nor in a non-atomic UPDATE or INSERT or any other operation. The A in ACID is as important as any other letter. For that matter if you don't care about atomicity then this is a problem already solvable easily solvable in the application. Why bother providing a special command for it. The whole reason to want a special command is precisely because the database can implement it atomically more easily and more efficiently than any application implementation. Moreover having a case which is non-atomic and allows inconsistent results or errors in the face of concurrent updates is a foot-gun. Someone will come along and try to use it and it will appear to work in their application but introduce nasty hidden race conditions. -- greg
On Sun, Oct 24, 2010 at 2:39 PM, Greg Smith <greg@2ndquadrant.com> wrote: > Sure, but in the most common use case I think we're targeting at first, no > indexes means there's also no unique constraints either. I don't think > people can reasonable expect to MERGE or anything else to guarantee they > won't accidentally create two rows that conflict via the terms of some > non-database enforced key. I'm fine with either a) having the merge constraint be required to match exactly either a exclusion constraint or unique index or throw an error or b) lock the table if you perform a merge against a table on a non-indexed condition like foreign keys do if you're missing the relevant index. Either way I expect this case to be a rare use case where users are either doing data loads and locking the table against concurrent updates is fine or they will immediately realize the error of their ways and create the corresponding unique or exclusion constraint index. Other than bulk data loads I don't see any useful use case that doesn't have the corresponding exclusion constraint or unique index as a hard requirement anyways. It'll be necessary for both correctness and performance. -- greg
On Sun, Oct 24, 2010 at 7:11 PM, Greg Stark <gsstark@mit.edu> wrote: > On Sun, Oct 24, 2010 at 2:50 AM, Martijn van Oosterhout > <kleptog@svana.org> wrote: >> Can we please not get MERGE hung up on trying to make it atomic. The >> standard requires no such thing and there are plenty of uses of MERGE >> that don't involve high concurrency updates of the same row by >> different processes. If we want an atomic UPSERT, make that a seperate >> command, MERGE without atomicity is useful enough already. > > Really? I don't really see any point in a non-atomic MERGE. Nor in a > non-atomic UPDATE or INSERT or any other operation. The A in ACID is > as important as any other letter. I think we're really discussing the "I" in ACID, not the "A". There's no question that the MERGE transaction will either commit or fail. What we're talking about is what happens when there are concurrent table modifications in progress; and the answer is that you might get serialization anomalies. But we have serialization anomalies without MERGE, too - see the discussions around Kevin Grittner's SSI patch which, come to think of it, might be useful for this case, too. I posted an example upthread which I think demonstrates very clearly that MERGE will result in unavoidable serialization failures - so if the standard is that we mustn't have any serialization failures then the standard can never be met. The best we can hope for is that we'll detect them and roll back if they occur, rather than going on to commit but perhaps with some anomaly. And I'm pretty sure that's what KG's SSI patch is going to give us. So I'm not sure there's really anything to get worked up about here in terms of concurrency issues. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Robert Haas wrote: > What we're talking about is what happens when there are concurrent > table modifications in progress; and the answer is that you might > get serialization anomalies. But we have serialization anomalies > without MERGE, too - see the discussions around Kevin Grittner's > SSI patch which, come to think of it, might be useful for this > case, too. I've been thinking about that as I read the discussion. If both transactions are serializable, there would tend to be a default behavior of preventing anomalies. ("Tend to be" because it might actually require the addition of a few calls to predicate locking functions from new MERGE code to be consistent with other code under the patch.) On the other hand, where there is a more targeted way of protecting integrity, I've tried to keep SSI out of the way and let it function as it has. For example, foreign key enforcement already manages this, so the SSI implementation intentionally ignores those reads and writes. From the discussion on MERGE I've been thinking that where there is an appropriate unique index the SSI code might want to stay out of the way, similar to foreign keys; but it might be used to handle the cases where there is no appropriate index. Or possibly the predicate locking portion of it could be used in a novel way by MERGE code to implement the MERGE logic. The API for SSI basically involves three types of functions:- Acquire a predicate lock on an object.- Check whether a given object is predicate locked.-Check for rw-conflict. To be useful for MERGE, that second category would probably need to be expanded, and we might need to upgrade btree index locking to support range locking rather than stopping at index page locks. Dan is planning to try this once he has sufficient benchmarks as a base to confirm the performance impact. > I posted an example upthread which I think demonstrates very > clearly that MERGE will result in unavoidable serialization > failures - so if the standard is that we mustn't have any > serialization failures then the standard can never be met. The best > we can hope for is that we'll detect them and roll back if they > occur, rather than going on to commit but perhaps with some > anomaly. And I'm pretty sure that's what KG's SSI patch is going to > give us. So I'm not sure there's really anything to get worked up > about here in terms of concurrency issues. Given the standard's emphasis on data integrity and the known concurrency issues with relational theory, I find it hard to believe that the standard requires "no serialization failures duing merge" -- but I haven't had time to dig into the standard's specifications yet. -Kevin
On Sun, Oct 24, 2010 at 10:43 AM, Robert Haas <robertmhaas@gmail.com> wrote: > But let's back up and talk about MVCC for a minute. Suppose we have > three source tuples, (1), (2), and (3); and the target table contains > tuples (1) and (2), of which only (1) is visible to our MVCC snapshot; > suppose also an equijoin. Clearly, source tuple (1) should fire the > MATCHED rule and source tuple (3) should fire the NOT MATCHED rule, > but what in the world should source tuple (2) do? AFAICS, the only > sensible behavior is to throw a serialization error, because no matter > what you do the results won't be equivalent to a serial execution of > the transaction that committed target tuple (2) and the transaction > that contains the MERGE. So the behaviour we get with UPDATE in this situation is that we update (2) so I would expect this to execute the MATCHED rule. The key distinction is that since we're not returning the data to the user the user sees we want to update the most recent version and it's "almost" as if we ran "after" all the other transactions. It's not really serializable and I think in serializable mode we throw a serialization failure instead but in most usage patterns it's precisely what the user wants. Here "bbb" contained two records when we began with values "1" and "2" but the "2" was inserted in a transaction which hadn't committed yet. It commited after the update. postgres=> begin; BEGIN postgres=> select * from bbb;i ---1 (1 row) postgres=> update bbb set i = i+1; UPDATE 2 postgres=> commit; COMMIT postgres=> select * from bbb;i ---23 (2 rows) -- greg
Greg Stark <gsstark@mit.edu> wrote: > Robert Haas <robertmhaas@gmail.com> wrote: >> But let's back up and talk about MVCC for a minute. Suppose we >> have three source tuples, (1), (2), and (3); and the target table >> contains tuples (1) and (2), of which only (1) is visible to our >> MVCC snapshot; suppose also an equijoin. Clearly, source tuple >> (1) should fire the MATCHED rule and source tuple (3) should fire >> the NOT MATCHED rule, but what in the world should source tuple >> (2) do? AFAICS, the only sensible behavior is to throw a >> serialization error, because no matter what you do the results >> won't be equivalent to a serial execution of the transaction that >> committed target tuple (2) and the transaction that contains the >> MERGE. > > So the behaviour we get with UPDATE in this situation is that we > update (2) so I would expect this to execute the MATCHED rule. Certainly that would be consistent with the behavior of READ COMMITTED -- wait for commit or rollback of the concurrent transaction, and then proceed with whatever data is there after completion of the other transaction. With REPEATABLE READ or SERIALIZABLE you would block until commit of the other transaction and terminate with a write conflict -- a form of serialization failure. If the other transaction rolls back you INSERT. At least, that would be the least surprising behavior to me. -Kevin
On Mon, Oct 25, 2010 at 1:42 PM, Greg Stark <gsstark@mit.edu> wrote: > On Sun, Oct 24, 2010 at 10:43 AM, Robert Haas <robertmhaas@gmail.com> wrote: >> But let's back up and talk about MVCC for a minute. Suppose we have >> three source tuples, (1), (2), and (3); and the target table contains >> tuples (1) and (2), of which only (1) is visible to our MVCC snapshot; >> suppose also an equijoin. Clearly, source tuple (1) should fire the >> MATCHED rule and source tuple (3) should fire the NOT MATCHED rule, >> but what in the world should source tuple (2) do? AFAICS, the only >> sensible behavior is to throw a serialization error, because no matter >> what you do the results won't be equivalent to a serial execution of >> the transaction that committed target tuple (2) and the transaction >> that contains the MERGE. > > So the behaviour we get with UPDATE in this situation is that we > update (2) so I would expect this to execute the MATCHED rule. Not exactly. Consider this example: rhaas=# create table concurrent (x integer primary key); NOTICE: CREATE TABLE / PRIMARY KEY will create implicit index "concurrent_pkey" for table "concurrent" CREATE TABLE rhaas=# insert into x values (1); rhaas=# begin; BEGIN rhaas=# insert into concurrent values (2); INSERT 0 1 <switch to a different window> rhaas=# update concurrent set x=x where x=2; UPDATE 0 > The key > distinction is that since we're not returning the data to the user the > user sees we want to update the most recent version and it's "almost" > as if we ran "after" all the other transactions. It's not really > serializable and I think in serializable mode we throw a serialization > failure instead but in most usage patterns it's precisely what the > user wants. I think it would be perfectly reasonable to have a transaction isolation level that does not use a snapshot at all and instead runs everything relative to SnapshotNow, and people could use it with MERGE if they were so inclined. I think this would correspond more or less to the READ COMMITTED isolation level specified in the standard; what we now call READ COMMITTED is actually better than READ COMMITTED but not quite as good as REPEATABLE READ. That, combined with an exclusive lock on the table (or, perhaps, some kind of predicate locking mechanism) would be sufficient to prevent serialization anomalies. However, I don't think that implementing those semantics for just this one command (or part of it) makes a whole lot of sense. The EPQ behavior of our current default isolation level is really pretty strange, and adding a random wart that the target table (but not the source table) in a MERGE query gets scanned using SnapshotNow would be one more piece of strangeness atop the strangeness we already have. And, as we just saw with the enum stuff, SnapshotNow can lead to some awfully strange behavior - you could end up processing half of the data from a concurrent transaction and missing the other half. > Here "bbb" contained two records when we began with values "1" and "2" > but the "2" was inserted in a transaction which hadn't committed yet. > It commited after the update. > > postgres=> begin; > BEGIN > postgres=> select * from bbb; > i > --- > 1 > (1 row) > > postgres=> update bbb set i = i+1; > UPDATE 2 > postgres=> commit; > COMMIT > postgres=> select * from bbb; > i > --- > 2 > 3 > (2 rows) Well, at least on my system, if the transaction inserting (2) hasn't committed yet, that UPDATE statement will block until it does, because trying to change i from 1 to 2 causes the update of the unique index to block, since there's an in-doubt tuple with (2) already. Then it will continue on as you've shown here, due to EPQ. But if you do the same statement with i = i + 10 instead of + 1, then it doesn't block, and only updates the one row that's visible. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Robert Haas <robertmhaas@gmail.com> wrote: > rhaas=# create table concurrent (x integer primary key); > NOTICE: CREATE TABLE / PRIMARY KEY will create implicit index > "concurrent_pkey" for table "concurrent" > CREATE TABLE > rhaas=# insert into x values (1); > rhaas=# begin; > BEGIN > rhaas=# insert into concurrent values (2); > INSERT 0 1 > > <switch to a different window> > > rhaas=# update concurrent set x=x where x=2; > UPDATE 0 That surprised me. I would have thought that the INSERT would have created an "in doubt" tuple which would block the UPDATE. What is the reason for not doing so? FWIW I did a quick test and REPEATABLE READ also lets this pass but with the SSI patch SERIALIZABLE seems to cover this correctly, generating a serialization failure where such access is involved in write skew: test=# create table concurrent (x integer primary key); NOTICE: CREATE TABLE / PRIMARY KEY will create implicit index "concurrent_pkey" for table "concurrent" CREATE TABLE test=# insert into concurrent select generate_series(1, 20000); INSERT 0 20000 test=# begin isolation level serializable; BEGIN test=# insert into concurrent values (0); INSERT 0 1 test=# update concurrent set x = 30001 where x = 30000; UPDATE 0 <different session> test=# begin isolation level serializable; BEGIN test=# insert into concurrent values (30000); INSERT 0 1 test=# update concurrent set x = -1 where x = 0; UPDATE 0 test=# commit; ERROR: could not serialize access due to read/write dependencies among transactions HINT: The transaction might succeed if retried. I'll need to add a test to cover this, because it might have broken with one of the optimizations on my list, had you not point out this behavior. On the other hand: <session 1> test=# drop table concurrent ; DROP TABLE test=# create table concurrent (x integer primary key); NOTICE: CREATE TABLE / PRIMARY KEY will create implicit index "concurrent_pkey" for table "concurrent" CREATE TABLE test=# insert into concurrent select generate_series(1, 20000); INSERT 0 20000 test=# begin isolation level serializable; BEGIN test=# insert into concurrent values (0); INSERT 0 1 <session 2> test=# begin isolation level serializable; BEGIN test=# select * from concurrent where x = 0;x --- (0 rows) test=# insert into concurrent values (0); <blocks> <session 1> test=# commit; COMMIT <session 2> ERROR: duplicate key value violates unique constraint "concurrent_pkey" DETAIL: Key (x)=(0) already exists. Anyway, I thought this might be of interest in terms of the MERGE patch concurrency issues, since the SSI patch has been mentioned. -Kevin
On Mon, Oct 25, 2010 at 3:15 PM, Kevin Grittner <Kevin.Grittner@wicourts.gov> wrote: > Robert Haas <robertmhaas@gmail.com> wrote: > >> rhaas=# create table concurrent (x integer primary key); >> NOTICE: CREATE TABLE / PRIMARY KEY will create implicit index >> "concurrent_pkey" for table "concurrent" >> CREATE TABLE >> rhaas=# insert into x values (1); >> rhaas=# begin; >> BEGIN >> rhaas=# insert into concurrent values (2); >> INSERT 0 1 >> >> <switch to a different window> >> >> rhaas=# update concurrent set x=x where x=2; >> UPDATE 0 > > That surprised me. I would have thought that the INSERT would have > created an "in doubt" tuple which would block the UPDATE. What is > the reason for not doing so? This is just standard MVCC - readers don't block writers, nor writers readers. You might also think about what would happen if the UPDATE were run before the INSERT of (2). There's no serialization anomaly here, because either concurrent case is equivalent to the serial schedule where the update precedes the insert. In the case of a MERGE that matches a just-inserted invisible tuple but no visible tuple, things are a bit stickier. Let's suppose we're trying to use MERGE to get UPSERT semantics. If the MERGE command has the obvious behavior of ignoring the invisible tuple just as UPDATE or DELETE would do, then clearly any equivalent serial schedule must run the MERGE before the INSERT (because if it were run after the INSERT, it would fire the MATCHED action rather than the NOT MATCHED action). But if the merge were run before the INSERT, then the INSERT would have failed with a unique key violation; instead, the merge fails with a unique key violation. On the other hand, if the MERGE sees the invisible tuple, essentially using SnapshotNow semantics, as Greg Stark proposed, you get a different (and probably worse) class of serialization anomalies. For example, suppose the table has rows 1-100 and you do an update adding 1000 to each value concurrently with merging in the values 51-100. You might get something like this: - MERGE scans rows 1-75, firing MATCHED for rows 51-75. - UPDATE commits. - MERGE scans rows 76-100, firing NOT MATCHED for each. Now, as Greg says, that might be what some people want, but it's certainly monumentally unserializable. In a serial execution schedule, the MERGE will either run before the UPDATE, in which case MATCHED will fire for rows 51-100, or else the UPDATE will run before the MERGE, in which case NOT MATCHED will fire for rows 51-100. No serial schedule is going to fire MATCHED for some rows and NOT MATCHED for others. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
On Mon, Oct 25, 2010 at 12:40 PM, Robert Haas <robertmhaas@gmail.com> wrote: > Now, as Greg says, that might be what some people want, but it's > certainly monumentally unserializable. To be clear when I said it's what people want what I meant was that in the common cases it's doing exactly what people want. As opposed to getting closer to what people want in general but not quite hitting the mark in the common cases. Just as an example I think it's important that in the simplest case, upsert of a single record, it be 100% guaranteed to do the naive upsert. If two users are doing the merge of a single key at the same time one of them had better insert and one of them had better update or else users are going to be monumentally surprised. I guess I hadn't considered all the cases and I agree it's important that our behaviour make some kind of sense and be consistent with how we handle updates and of existing in-doubt tuples. I wasn't trying to introduce a whole new mode of operation, just work from analogy from the way update works. It's clear that even with our existing semantics there are strange corner cases once you get to multiple updates happening in a single transaction. But we get the simple cases right and even in the more complex cases, while it's not truly serializable we should be able to come up with some basic smell tests that we pass. My understanding is that currently we generally treat DML in one of two ways depending on whether it's returning data to the user or updating data in the table (include select for share). If it's returning data to the user we use a snapshot to give the user a consistent view of the database. If it's altering data in the database we use the snapshot to get a consistent set of records and then apply the updates to the most recent version. The anomaly you showed with update and the problem with MERGE are both because the operation was simultaneously doing a "read" -- the WHERE clause and the uniqueness check in the MERGE -- and a write. This is already the kind of case where we do weird things -- what kind of behaviour would be consistent with our existing, somewhat weird, behaviour? -- greg
Robert Haas <robertmhaas@gmail.com> wrote: > Kevin Grittner <Kevin.Grittner@wicourts.gov> wrote: >> I would have thought that the INSERT would have >> created an "in doubt" tuple which would block the UPDATE. > This is just standard MVCC - readers don't block writers, nor > writers readers. Sure, but I tend to think of both INSERT and UPDATE as writes. ;-) > You might also think about what would happen if the UPDATE > were run before the INSERT > either concurrent case is equivalent to the serial > schedule where the update precedes the insert. I guess that's persuasive enough. It feels funny, but the argument looks sound, so I guess it's just a case of my intuition being faulty. > In the case of a MERGE that matches a just-inserted invisible > tuple but no visible tuple, things are a bit stickier. Well, more generally it can lead to anomalies in a more complex combination of actions, since it creates, as you imply above, a rw-dependency from the transaction doing the UPDATE to the transaction doing the INSERT, so the combination can form part of a cycle in apparent order of execution which can produce an anomaly. The more I look at it, the more clear it is that current behavior is correct and what the implications are. I've just missed that detail until now, wrongly assuming that it would be a write conflict. > [example of MERGE which can not serialize with a concurrent > transaction, and possible outcomes if there is no serialization > failure] > Now, as Greg says, that might be what some people want, but it's > certainly monumentally unserializable. Yeah. MERGE should probably be sensitive to the transaction isolation level, at least to the extent that MERGE in a SERIALIZABLE transaction plays nice with other SERIALIZABLE transactions. That would be necessary to allow business rules enforced through triggers to be able to guarantee data integrity. It would mean that a MERGE involving tables which were the target of modifications from concurrent SERIALIZABLE transactions would be likely to fail and/or to cause other transactions to fail. -Kevin
On Mon, Oct 25, 2010 at 4:10 PM, Greg Stark <gsstark@mit.edu> wrote: > On Mon, Oct 25, 2010 at 12:40 PM, Robert Haas <robertmhaas@gmail.com> wrote: >> Now, as Greg says, that might be what some people want, but it's >> certainly monumentally unserializable. > > To be clear when I said it's what people want what I meant was that in > the common cases it's doing exactly what people want. As opposed to > getting closer to what people want in general but not quite hitting > the mark in the common cases. > > Just as an example I think it's important that in the simplest case, > upsert of a single record, it be 100% guaranteed to do the naive > upsert. If two users are doing the merge of a single key at the same > time one of them had better insert and one of them had better update > or else users are going to be monumentally surprised. Hmm, so let's think about that case. The first merge comes along and finds no match so it fires the NOT MATCHED rule, which inserts a tuple. The second merge comes along and finds no match, so it also fires the NOT MATCHED rule and tries to insert a tuple. But upon consulting the PRIMARY KEY index it finds that an in-doubt tuple exists so it goes to sleep waiting for the first transaction to commit or abort. If the first transaction commits it then decides that the jig is up and fails. We could (maybe) fix this by doing something similar to what EPQ does for updates: when the first transaction commits, instead of redoing the insert, we back up and recheck whether the new tuple would have matched the join clause and, if so, we instead fire the MATCHED action on the updated tuple. If not, we fire NOT MATCHED anyway. I'm not sure how hard that would be, or whether it would introduce any other nasty anomalies in more complex cases. Alternatively, we could introduce an UPSERT or REPLACE statement intended to handle exactly this case and leave MERGE for more complex situations. It's pretty easy to imagine what the coding of that should look like: if we encounter an in-doubt tuple in we wait on its xmin. If the transaction aborts, we insert. If it commits, and we're in READ COMMITTED mode, we update it; but if we're in REPEATABLE READ or SERIALIZABLE mode, we abort with a serialization error. That's a lot simpler to understand and reason about than MERGE in its full generality. I think it's pretty much hopeless to think that MERGE is going to work in complex concurrent scenarios without creating serialization anomalies, or at least rollbacks. I think that's baked into the nature of what the statement does. To simulate MERGE, you need to read from the target table and then do writes that depend on what you read. If you do that with the commands that are available today, you're going to get serialization anomalies and/or rollbacks under concurrency. The mere fact of that logic being inside the database rather than outside isn't going to make that go away. Now sometimes, as with exclusion constraints, you can play games with dirty snapshots to get the semantics you want, but whether that's possible in a particular case depends on the details of the operation being performed, and here I think it can't be done. Some operations are *fundamentally* unserializable. A very simple example of this is a sequence that is guaranteed not to have gaps (a feature we've occasionally been requested to provide). If N processes request a sequence number simultaneously, you have to hand out a value to the first guy and wait and see whether he commits or aborts before deciding what number to give the second guy. That sucks, so usually we just design our applications not to require that sequences be gap-free. Similarly here. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
On Mon, 2010-10-25 at 16:58 -0400, Robert Haas wrote: > On Mon, Oct 25, 2010 at 4:10 PM, Greg Stark <gsstark@mit.edu> wrote: > > On Mon, Oct 25, 2010 at 12:40 PM, Robert Haas <robertmhaas@gmail.com> wrote: > >> Now, as Greg says, that might be what some people want, but it's > >> certainly monumentally unserializable. > > > > To be clear when I said it's what people want what I meant was that in > > the common cases it's doing exactly what people want. As opposed to > > getting closer to what people want in general but not quite hitting > > the mark in the common cases. > > > > Just as an example I think it's important that in the simplest case, > > upsert of a single record, it be 100% guaranteed to do the naive > > upsert. If two users are doing the merge of a single key at the same > > time one of them had better insert and one of them had better update > > or else users are going to be monumentally surprised. > > Hmm, so let's think about that case. > > The first merge comes along and finds no match so it fires the NOT > MATCHED rule, which inserts a tuple. The second merge comes along and > finds no match, so it also fires the NOT MATCHED rule and tries to > insert a tuple. But upon consulting the PRIMARY KEY index it finds > that an in-doubt tuple exists so it goes to sleep waiting for the > first transaction to commit or abort. If the first transaction > commits it then decides that the jig is up and fails. We could > (maybe) fix this by doing something similar to what EPQ does for > updates: when the first transaction commits, instead of redoing the > insert, we back up and recheck whether the new tuple would have > matched the join clause and, if so, we instead fire the MATCHED action > on the updated tuple. If not, we fire NOT MATCHED anyway. I'm not > sure how hard that would be, or whether it would introduce any other > nasty anomalies in more complex cases. > > Alternatively, we could introduce an UPSERT or REPLACE statement > intended to handle exactly this case and leave MERGE for more complex > situations. It's pretty easy to imagine what the coding of that > should look like: if we encounter an in-doubt tuple in we wait on its > xmin. If the transaction aborts, we insert. If it commits, and we're > in READ COMMITTED mode, we update it; but if we're in REPEATABLE READ > or SERIALIZABLE mode, we abort with a serialization error. That's a > lot simpler to understand and reason about than MERGE in its full > generality. > > I think it's pretty much hopeless to think that MERGE is going to work > in complex concurrent scenarios without creating serialization > anomalies, or at least rollbacks. I think that's baked into the > nature of what the statement does. To simulate MERGE, you need to > read from the target table and then do writes that depend on what you > read. If you do that with the commands that are available today, > you're going to get serialization anomalies and/or rollbacks under > concurrency. The mere fact of that logic being inside the database > rather than outside isn't going to make that go away. Now sometimes, > as with exclusion constraints, you can play games with dirty snapshots > to get the semantics you want, but whether that's possible in a > particular case depends on the details of the operation being > performed, and here I think it can't be done. Some operations are > *fundamentally* unserializable. I agree with your analysis. Let me review... There is a case that locking alone won't resolve, however that locking works. The transaction history for that is: T1: takes snapshot T2: INSERT row1 T2: COMMIT; T1: attempts to determine if MATCHED or NOT MATCHED. The answer is neither of those two answers. If we say it is NOT MATCHED then we will just fail on any INSERT, or if we say it is MATCHED then technically we can't see the row so the UPDATE should fail. The COMMIT of T2 releases any locks that would have helped resolve this, and note that even if T1 attempts to lock that row as early as possible, only a table level lock would prevent T2 from doing that action. Two options for resolution are 1) Throw SERIALIZABLE error 2) Implement something similar to EvalPlanQual As you say, we already resolve this situation for concurrent updates by following the update chain from a row that is visible to the latest row. For MERGE the semantics need to be subtely different here: we need to follow the update chain to the latest row, but from a row that we *can't* see. MERGE is still very useful without the need for 2), though fails in some cases for concurrent use. The failure rate would increase as the number of concurrent MERGErs and/or number of rows in source table. Those errors are no more serious than are possible now. So IMHO we should implement 1) now and come back later to implement 2). Given right now there may be other issues with 2) it seems unsafe to rely on the assumption that we'll fix them by end of release. -- Simon Riggs http://www.2ndQuadrant.com/books/PostgreSQL Development, 24x7 Support, Training and Services
On Tue, Oct 26, 2010 at 8:10 AM, Simon Riggs <simon@2ndquadrant.com> wrote: > I agree with your analysis. Let me review... > [review] Sounds like we're on the same page. > Two options for resolution are > > 1) Throw SERIALIZABLE error > > 2) Implement something similar to EvalPlanQual > As you say, we already resolve this situation for concurrent updates by > following the update chain from a row that is visible to the latest row. > For MERGE the semantics need to be subtely different here: we need to > follow the update chain to the latest row, but from a row that we > *can't* see. > > MERGE is still very useful without the need for 2), though fails in some > cases for concurrent use. The failure rate would increase as the number > of concurrent MERGErs and/or number of rows in source table. Those > errors are no more serious than are possible now. > > So IMHO we should implement 1) now and come back later to implement 2). > Given right now there may be other issues with 2) it seems unsafe to > rely on the assumption that we'll fix them by end of release. Yeah. In fact, I'm not sure we're ever going to want to implement #2 - I think that needs more study to determine whether there's even something there that makes sense to implement at all. But certainly I wouldn't count on it happening for 9.1. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
On Tue, 2010-10-26 at 16:08 -0400, Robert Haas wrote: > On Tue, Oct 26, 2010 at 8:10 AM, Simon Riggs <simon@2ndquadrant.com> wrote: > > I agree with your analysis. Let me review... > > [review] > > Sounds like we're on the same page. > > > Two options for resolution are > > > > 1) Throw SERIALIZABLE error > > > > 2) Implement something similar to EvalPlanQual > > As you say, we already resolve this situation for concurrent updates by > > following the update chain from a row that is visible to the latest row. > > For MERGE the semantics need to be subtely different here: we need to > > follow the update chain to the latest row, but from a row that we > > *can't* see. > > > > MERGE is still very useful without the need for 2), though fails in some > > cases for concurrent use. The failure rate would increase as the number > > of concurrent MERGErs and/or number of rows in source table. Those > > errors are no more serious than are possible now. > > > > So IMHO we should implement 1) now and come back later to implement 2). > > Given right now there may be other issues with 2) it seems unsafe to > > rely on the assumption that we'll fix them by end of release. > > Yeah. In fact, I'm not sure we're ever going to want to implement #2 > - I think that needs more study to determine whether there's even > something there that makes sense to implement at all. But certainly I > wouldn't count on it happening for 9.1. 2) sounds weird, until you realise it is exactly how you would need to code a PL/pgSQL procedure to do the equivalent of MERGE. Not doing it just makes no sense in the longer term. I agree it will take a while to think it through in sufficient detail. In the meantime it's a good argument for the ELSE construct at the end of the WHEN clauses, so we can do something other than skip a row or throw an ERROR. -- Simon Riggs http://www.2ndQuadrant.com/books/PostgreSQL Development, 24x7 Support, Training and Services
Kevin Grittner wrote: > Robert Haas <robertmhaas@gmail.com> wrote: > > > rhaas=# create table concurrent (x integer primary key); > > NOTICE: CREATE TABLE / PRIMARY KEY will create implicit index > > "concurrent_pkey" for table "concurrent" > > CREATE TABLE > > rhaas=# insert into x values (1); > > rhaas=# begin; > > BEGIN > > rhaas=# insert into concurrent values (2); > > INSERT 0 1 > > > > <switch to a different window> > > > > rhaas=# update concurrent set x=x where x=2; > > UPDATE 0 > > That surprised me. I would have thought that the INSERT would have > created an "in doubt" tuple which would block the UPDATE. What is > the reason for not doing so? When Kevin gets surprised, I get worried. LOL -- Bruce Momjian <bruce@momjian.us> http://momjian.us EnterpriseDB http://enterprisedb.com + It's impossible for everything to be true. +