SQL-Invoked Procedures for 8.1 - Mailing list pgsql-hackers
From | Gavin Sherry |
---|---|
Subject | SQL-Invoked Procedures for 8.1 |
Date | |
Msg-id | 20040923130000.A9971@linuxworld.com.au Whole thread Raw |
Responses |
Re: SQL-Invoked Procedures for 8.1
Re: SQL-Invoked Procedures for 8.1 Re: SQL-Invoked Procedures for 8.1 |
List | pgsql-hackers |
Hi all, Following is a proposal to implement what SQL2003 calls 'SQL-Invoked Procedures' and what most people refer to as stored procedures. Fujitsu will be funding Neil Conway and I to work on this feature. Procedures ---------- Procedures are nearly identical to functions. From my reading of SQL2003 (see sections 4.27 and 11.50) procedures are different from functions in three ways: 1) Unlike functions, procedures cannot return values. 2) Procedures support parameter modes which specify whether an argument is IN, OUT or IN OUT 3) Procedures can be run in the same savepoint level as the caller when OLD SAVEPOINT LEVEL is specified at creation time. According to SQL2003, functions must be run on a new savepoint level. From my understanding, we do not do this currently. Work will focus on 1 and 2 until we have the concept of savepoint levels with functions. Its possible that we will implement this too if there is demand. This makes the difference between procedures and functions quite superficial: procedures are functions which return void and have parameter modes. As such, I think we can largely wrap around the existing function creation code using this rule. That is, CREATE PROCEDURE could and I think should be syntactic sugar. If we go down the route of saying that procedures are a type of function, we have the option of allowing users access to OUT and INOUT in functions. This would make procedures simply a subset of functions. What do people think? Grammar modifications --------------------- The grammar for creating a procedure, which is a subset of the SQL 2003 grammar, is as follows: CREATE [ OR REPLACE ] PROCEDURE <name> ( [ [ IN | OUT | IN OUT ] [ name ] type, ... ] )<routine characteristics><routinebody> <routine characteristics> are LANGUAGE, null-call and security clauses, all of which are already implemented. <routine body> is the body, same as with the existing functions implementation. DROP PROCEDURE and CALL will also need to be added. System catalog changes ----------------------- There will be cases when we need to identify whether a routine is a function or a procedure. This could be done two ways. We could say that any proc in pg_proc which returns void is a procedure or we could store this in some 'protype' column. Thoughts? To handle parameter modes, we will also need to add an array of "char"s so that we can determine the mode of an element in proargnames[]. Invocation ---------- Since procedures do not return a value they are invoked differently than functions are (at least according to SQL2003). This is because if they were used in a SELECT, UPDATE, DELETE or INSERT clause, their effect would be ambiguous at best. ie, SELECT * FROM foo WHERE bar(); Doesn't make much sense. In SQL2003, procedures are invoked using: CALL foo(); One of our original goals was to allow easier invocation of stored procedures from PL/PgSQL, by allowing "foo()" rather than "CALL foo()" or "PERFORM foo()". Neil submitted some preliminary patches implementing this; following discussion with Tom, it was agreed to make CALL optional at the SQL level, which provides the same effect: http://archives.postgresql.org/pgsql-hackers/2004-09/msg00412.php In this sense, invoking a procedure follows the same rules as invoking a function which returns void. Procedure arguments ------------------- >From what I can tell, there are only 3 ways to pass data to a procedure: 1) As a literal value: CALL foo('Then one day....'); CALL bar(1, 2, 3); CALL baz(current_timestamp) Note that this only works for parameters which are of mode IN (see more below). 2) As a variable Other databases, and SQL2003, support a few different implementations what could be called variables. In SQL2003, there are a few types of 'variables': i) Host parameters ii) Column references to transition variables (OLD and NEW in the case of triggers and rules) iii) Dynamic parameters (ie, ? in the case of prepare) iv) Embedded variables (i) and (iv) are basically bindings to the client side variables. They provide a mechanism for interface designers to allow users to associate variables with a procedure, execute that procedure, and have the new value of the variables available to the language once the execution call returns. JDBC 3.0 focuses fairly heavily on this kind of thing, for example. I think we only need implement one of these and I discuss it below under 'Host Variables'. 3) A transition variable These are the NEW and OLD variables in trigger and rule specifications. If we are calling a procedure, we will need to make sure that every argument is one of these. Only (2) and (3) can be passed as parameters which are marked OUT or INOUT. SQL2003 makes the following distinction between parameter modes: IN - A 'read only' argument. It must be initialised. OUT - An uninitialised argument which, from memory, SQL2003 says *must* be initialised by the function itself. INOUT - A combination of the two above. That is, an initialised argument which can be written to. Host Variables -------------- Host variables are required to make INOUT and OUT parameters useful, so this is an important part of the implementation of procedures. The parser will need to be modified to recognise the host parameter syntax, which is ':<varname>'. We could restrict this to something like CALL or expand it so that normal queries could use variables too. I cannot think of any reason to restrict use but I'm open to what people think. We'll need a host variable node just like we have Const, etc, so that the variable can be propagated through the system. The variable should be substituted within the executor. This ensures that EXPLAIN wont break. Then again, if we don't allow host parameters in normal queries, its not a problem. If we do, I think we could get away with piggy backing on the PREPARE/EXECUTE code with the possibility of allowing the planner to take a look at the current value of a given host variable so that it can generate a better plan. Then again, using variables in standard queries may just be a very bad idea. As for the creation and modification of host parameters, we have two options: an SQL language construct, which MySQL uses for example, or a protocol level mechanism which Oracle appears to use. As far as I can understand from SQL2003, host parameters are predefine there at procedure creation time. I don't really get this and I haven't seen it in another database. If someone could explain it and thinks it would serve us better than that covered below, please let me know :-). If we go down the grammar route, I'd imagine that we do something like: SET VARIABLE <varname> = <val> SHOW VARIABLE <varname> I've used SET and SHOW because they're intuitive. Obviously these are used by GUC and I'm not proposing we use GUC for host parameters as we want a session-local implementation. It would be useful if we could also add some type information to this so make procedure candidate selection easier. Ideas? The other option is that we do it at the protocol level and modify libpq (and psql) to add support. This would allow us something like: template1=# \vc x int -- create a variable 'x' of type int template1=# \vs x 2 -- set it to '2' template1=# CALL proc_with_inout_param(:x); CALL template1=# \vp x -- print 'x' 120 That might need some massaging but you get the idea. The commands to psql translate to lower level protocol level commands which are: create variable, set (ie, modify the value of) variable and describe variable. Neil thinks we should have create and set in one step. This is great, since it covers most cases, but I'm not sure how we have uninitialised values. Perhaps we set to NULL? I think interacting with variables as above is actually quite ugly as well but that has nothing to do with implementation at the protocol level rather my own lack of UI skills :-). So, what does doing this at the protocol level give us? Well, it keeps it out of the grammar and decreases overhead for interfaces. Still, those reasons aren't great when we're look at a protocol modification. I'd love to hear other people's ideas here: maybe I've overlooked a much more attractive alternative. Getting back to the host variables themselves. Most implementations I've seen only support a very limited range of types, such as varchar, numeric and int. I don't see any reason why we can't support the full range of types available in the system. The visibility of variables is restricted to the session which created them. That is, they're just like our implementation of temporary tables. The only other question (that I can think of now) with respect to variables is how they are affected by transactions. My gut feel is that they should have transactional semantics. For example: template1=# \vc x int template1=# \vs x 2 template1=# BEGIN; BEGIN template1=# CALL proc_which_sets_arg_to_100(:x); CALL template1=# ABORT; template1=# \vp x What should 'x' be? Following transactional semantics, it should be 2. The only problem is, implementing the transaction semantics might be painful. Bruce made a suggestion to me a while ago that we could just put the variable in a temp table. But that limits the scope for types and it also may slow performance in a long running transaction which modifies a variable a lot. That is, its a lot of overhead. The other option is a hash (or some dynamic structure) in which we attach some transaction information to each variable. We will need to process this from CommitTransaction()/AbortTransaction(). We'll also need to be subtransaction aware. Because we don't need to manage concurrent access to variables, the logic needed to implement transactional semantics by hand shouldn't be too difficult. PL/PgSQL -------- Its in PL/PgSQL that procedures will be most intensively used, I believe. Since variables are already supported in the language we will need to modify that system to make OUT and INOUT parameters work. Currently, we copy each the argument before we call a function so that any (potential) write to it will not be reflected in the caller. We're also protected here by the executor which doesn't allow us to update arguments since they are marked as constants. We'll also need to tweak this. The other thing we will need to look out for if we're executing a procedure is if the user attempts to RETURN a non-void value. This would be an error, of course. Simply using RETURN to return control to the caller shouldn't be a problem in my opinion, however. Neil has also pointed out to me that this is also the exact behaviour required of functions returning void. Permissions ----------- We can use the same permissions as for functions. ---- I will be away for the next few days (until Monday). I hope to have Internet access but I'm not sure. Neil and I have worked through this together so he can also answer any questions too. Gavin
pgsql-hackers by date: