Re: MERGE Specification - Mailing list pgsql-hackers

From Boxuan Zhai
Subject Re: MERGE Specification
Date
Msg-id AANLkTi=6QTheKcBPuMzPm2KHMeH6pxP6xmY6PdTSzAsd@mail.gmail.com
Whole thread Raw
In response to Re: MERGE Specification  (Heikki Linnakangas <heikki.linnakangas@enterprisedb.com>)
Responses Re: MERGE Specification
List pgsql-hackers


On Sun, Aug 15, 2010 at 4:05 AM, Heikki Linnakangas <heikki.linnakangas@enterprisedb.com> wrote:
On 11/08/10 11:11, Boxuan Zhai wrote:
The new patch is done. I named it as merge_v102. (1 means it is the
non-inheritance merge command, 02 means this is the second time of fixing
reported bugs)

Thanks. I went through this, fixing all the spurious end-of-line whitespace first with "git apply --whitespace=fix", and then manually fixing comment and whitespace style, typos, and doing some other small comment editing. Resulting patch attached. It is also available at my git repository at git://git.postgresql.org/git/users/heikki/postgres.git, branch 'mergestmt'. Please base any further patch versions on this patch, so that we don't need to redo this cleanup.

 
Thanks for the cleanup. The codes looks better now. My problem is that I have done some more modifications after merge_v102. Is there any way to apply the cleanup patch without erasing my new codes?
 
I'll continue reviewing this sometime next week, but here's few miscellaneous issues for starters;

* Explain output of actions needs some work:

>  Merge  (cost=246.37..272.96 rows=1770 width=38)
>    ACTION: UPDATE WHEN NOT MATCHED
>    ACTION: INSERT WHEN NOT MATCHED
>    ->  Merge Left Join  (cost=246.37..272.96 rows=1770 width=38)

Should print out more details of the action, like for normal updates/inserts/deletes.
 
Well, I think, in normal UPDATE/INSERT/DELETE explanation, there are no more details than what we have here except the costs, rows and width, which is print out at the MERGE command line. 
 
For example:
 
Explain
update buy set volume = volume + 1;
                          QUERY PLAN
--------------------------------------------------------------
 Update  (cost=0.00..36.75 rows=2140 width=14)
   ->  Seq Scan on buy  (cost=0.00..36.75 rows=2140 width=14)
(2 rows)
 
For the explanation of an UPDATE command, we only have a Update title followed by the costs, then, after the arrow -> there comes the plan tree.
In a MERGE command, no action has its real private plan. They all share the main plan.  Thus, the costs for a merge command is that of the main plan. What is useful for a merge action is only the target list and quals. So my design is to show the merge command costs in first line. Then print out the actions and their qualifications in a list, followed by the main plan tree.  
 
Is there any other thing you suggest to print out for each action?
 
And all uppercase doesn't fit the surrounding style.
 
This  will be fixed.
 
* Is the behavior now SQL-compliant? We had long discussions about the default action being insert or do nothing or raise error, but I never got a clear picture of what the SQL spec says and whether we're compliant.

* What does the "one tuple is error" notice mean? We'll have to do something about that.. I don't think we've properly thought through the meaning of RAISE ERROR. Let's cut it out from the patch until then. It's only a few dozen lines to put back when we know what to do about it.
I find that we have not reached an agreement on MERGE's syntax yet. Personally, I support Simon's idea, that is the default action should be RAISE ERROR. However, I am no sure what RAISE ERROR should do when we get a missing tuple. Here I just use a simple elog(NOTICE, "a tuple is error"); for this situation.
 
I leave this for further extension when a more specific design for RAISE ERROR is available.
 
Well, I have to say the current RAISE ERROR elog is a little bit ugly. Do you want me to chage the default action back to DO NOTHING? Or any other suggetions? (In fact, my personal thinking is to add a non-omissible "ELSE" clause as the end of the action list which forces the user to specify the default action for merge).
 
* Do you need the MergeInsert/MergeUpdate/MergeDelete alias nodes? Would it be simpler to just use InsertStmt/UpdateStmt/DeleteStmt directly?

 
 
I need one flag in these statement to differentiate them from normal InsertStmt/UpdateStmt/DeleteStmt. There are slight difference for the process of these two kinds of statements in the transfromStmt() function. I define a set of alias nodes which have different nodetags. This can make the code simpler. 
 
* Do you need the out-function for DeleteStmt? Why not for UpdateStmt and InsertStmt?
 
Ah, I add this function because I want to have a look of the content of DeleteStmt. I did this long ago, just after I started the project. If you don't want this function, I will remove it.
 
* I wonder if it would be simpler to check the fact that you can only have UPDATE/DELETE as a WHEN MATCHED action and only INSERT as a WHEN NOT MATCHED action directly in the grammar?

 
We can do this, but, I think it is better to keep it as it is now.
 
 
* Regarding this speculation in the MergeStmt grammar rule:
>       /*although we have only one USING table,
>       we still make it a list, maybe in future
>       we will allow multiple USING tables.*/

I wonder what the semantics of having multiple USING tables would be? If it would be like "USING (SELECT * FROM a UNION ALL SELECT * FROM b)", then we don't ever need it because you can already achieve it with that subquery. If it's something like "USING (SELECT * FROM a,b WHERE ...)", then we again don't need it because you can write that instead. So I think we should give up on the notion that source can be a list of tables, and simplify the code everywhere accordingly.

 
Yes, we should give up it. I was considering to allow the user input multiple source tables as a short way of "USING (SELECT * FROM a,b WHERE ...)". Now, this idea seems not interesting at all.  
 
 
* Instead of scanning the list of actions in ExecBS/ExecASMergeTriggers every time, should set flags at plan time to mark what kind of actions there is in the statement.
 
First of all, I need to say that the functions of ExecBSMergeTriggers() and ExecASMergeTriggers() are for per-statement triggers, and are invoked only once in the whole process of a MERGE command. Setting flags at plan time can save the scanning. I may add it to next patch.  But don't expect it save much execution time.
 
Or should we defer firing the statement triggers until we hit the first matching row and execute the action for the first time?
 
No, we cannot do this. These are Per-statement triggers. They should be fired before/after the main plan is executed. The statement triggers are fired even no tuple is really matched with the actions.
 
* Do we need the 'replaced' field? Could you just replace the action with a DO NOTHING action instead.

 
Yes, I think it is a good idea to replace them with DO NOTHING. The replaced field was there before DO NOTHING is added to the system. But since we have DO NOTING action now, we can turn all the actions replaced by INSTEAD rules into DO NOTINGs. 
 
* This crashes:

postgres=# CREATE TABLE target AS (SELECT 1 AS id);
SELECT 1
postgres=# MERGE into target t
USING (select 1 AS id) AS s

ON t.id = s.id
WHEN MATCHED THEN
       UPDATE SET id = (SELECT COUNT(*) FROM generate_series(1,10))
;
server closed the connection unexpectedly
       This probably means the server terminated abnormally
       before or while processing the request.
The connection to the server was lost. Attempting reset: Failed.


 
 
Oh, a really serious bug. I have not considered this case (user series generator in actions) before. I will see what I can do for it.
 
 

--
 Heikki Linnakangas
 EnterpriseDB   http://www.enterprisedb.com

pgsql-hackers by date:

Previous
From: Alvaro Herrera
Date:
Subject: Re: MERGE Specification
Next
From: Heikki Linnakangas
Date:
Subject: Re: MERGE Specification