SQL-Invoked Procedures for 8.1 - Mailing list pgsql-hackers

From Gavin Sherry
Subject SQL-Invoked Procedures for 8.1
Date
Msg-id 20040923130000.A9971@linuxworld.com.au
Whole thread Raw
Responses Re: SQL-Invoked Procedures for 8.1
Re: SQL-Invoked Procedures for 8.1
Re: SQL-Invoked Procedures for 8.1
List pgsql-hackers
Hi all,

Following is a proposal to implement what SQL2003 calls 'SQL-Invoked
Procedures' and what most people refer to as stored procedures. Fujitsu
will be funding Neil Conway and I to work on this feature.

Procedures
----------

Procedures are nearly identical to functions. From my reading of
SQL2003 (see sections 4.27 and 11.50) procedures are different from
functions in three ways:

1) Unlike functions, procedures cannot return values.

2) Procedures support parameter modes which specify whether an argument is
IN, OUT or IN OUT

3) Procedures can be run in the same savepoint level as the caller when
OLD SAVEPOINT LEVEL is specified at creation time. According to SQL2003,
functions must be run on a new savepoint level. From my understanding, we
do not do this currently.

Work will focus on 1 and 2 until we have the concept of savepoint levels
with functions. Its possible that we will implement this too if there is
demand.

This makes the difference between procedures and functions quite
superficial: procedures are functions which return void and have parameter
modes. As such, I think we can largely wrap around the existing function 
creation code using this rule. That is, CREATE PROCEDURE could and I think
should be syntactic sugar.

If we go down the route of saying that procedures are a type of function,
we have the option of allowing users access to OUT and INOUT in functions.
This would make procedures simply a subset of functions. What do people think? 

Grammar modifications
---------------------

The grammar for creating a procedure, which is a subset of the SQL 2003
grammar, is as follows:

CREATE [ OR REPLACE ] PROCEDURE <name> ( [ [ IN | OUT | IN OUT ] [ name ] type, ... ] )<routine
characteristics><routinebody>
 

<routine characteristics> are LANGUAGE, null-call and security clauses,
all of which are already implemented. <routine body> is the body, same as
with the existing functions implementation.

DROP PROCEDURE and CALL will also need to be added.

System catalog changes
-----------------------

There will be cases when we need to identify whether a routine is a
function or a procedure. This could be done two ways. We could say that
any proc in pg_proc which returns void is a procedure or we could store this
in some 'protype' column. Thoughts?

To handle parameter modes, we will also need to add an array of "char"s so
that we can determine the mode of an element in proargnames[].

Invocation
----------

Since procedures do not return a value they are invoked differently than
functions are (at least according to SQL2003). This is because if they were 
used in a SELECT, UPDATE, DELETE or INSERT clause, their effect would be 
ambiguous at best. ie,

SELECT * FROM foo WHERE bar();

Doesn't make much sense.

In SQL2003, procedures are invoked using:

CALL foo();

One of our original goals was to allow easier invocation of stored
procedures from PL/PgSQL, by allowing "foo()" rather than "CALL foo()"
or "PERFORM foo()". Neil submitted some preliminary patches
implementing this; following discussion with Tom, it was agreed to
make CALL optional at the SQL level, which provides the same effect:

http://archives.postgresql.org/pgsql-hackers/2004-09/msg00412.php

In this sense, invoking a procedure follows the same rules as invoking a
function which returns void.

Procedure arguments
-------------------

>From what I can tell, there are only 3 ways to pass data to a
procedure:

1) As a literal value:

CALL foo('Then one day....');
CALL bar(1, 2, 3);
CALL baz(current_timestamp)

Note that this only works for parameters which are of mode IN (see
more below).

2) As a variable

Other databases, and SQL2003, support a few different implementations what
could be called variables. In SQL2003, there are a few types of
'variables':

i) Host parameters
ii) Column references to transition variables (OLD and NEW in the case of
triggers and rules)
iii) Dynamic parameters (ie, ? in the case of prepare)
iv) Embedded variables

(i) and (iv) are basically bindings to the client side variables. They
provide a mechanism for interface designers to allow users to associate
variables with a procedure, execute that procedure, and have the new value
of the variables available to the language once the execution call
returns. JDBC 3.0 focuses fairly heavily on this kind of thing, for
example.

I think we only need implement one of these and I discuss it below under
'Host Variables'.

3) A transition variable

These are the NEW and OLD variables in trigger and rule specifications.

If we are calling a procedure, we will need to make sure that every
argument is one of these.

Only (2) and (3) can be passed as parameters which are marked OUT or
INOUT. SQL2003 makes the following distinction between parameter modes:

IN - A 'read only' argument. It must be initialised.
OUT - An uninitialised argument which, from memory, SQL2003 says *must* be
initialised by the function itself.
INOUT - A combination of the two above. That is, an initialised argument
which can be written to.

Host Variables
--------------

Host variables are required to make INOUT and OUT parameters useful,
so this is an important part of the implementation of procedures.

The parser will need to be modified to recognise the host parameter syntax,
which is ':<varname>'. We could restrict this to something like CALL or
expand it so that normal queries could use variables too. I cannot
think of any reason to restrict use but I'm open to what people think.

We'll need a host variable node just like we have Const, etc, so that
the variable can be propagated through the system. The variable should be
substituted within the executor. This ensures that EXPLAIN wont break.
Then again, if we don't allow host parameters in normal queries, its not
a problem.

If we do, I think we could get away with piggy backing on the PREPARE/EXECUTE 
code with the possibility of allowing the planner to take a look at the current
value of a given host variable so that it can generate a better plan.
Then again, using variables in standard queries may just be a very bad
idea.

As for the creation and modification of host parameters, we have two options:
an SQL language construct, which MySQL uses for example, or a protocol level
mechanism which Oracle appears to use.

As far as I can understand from SQL2003, host parameters are predefine there
at procedure creation time. I don't really get this and I haven't seen it in
another database. If someone could explain it and thinks it would serve us
better than that covered below, please let me know :-).

If we go down the grammar route, I'd imagine that we do something like:

SET VARIABLE <varname> = <val>
SHOW VARIABLE <varname>

I've used SET and SHOW because they're intuitive. Obviously these are used
by GUC and I'm not proposing we use GUC for host parameters as we want a
session-local implementation. It would be useful if we could also add some
type information to this so make procedure candidate selection easier. Ideas?

The other option is that we do it at the protocol level and modify libpq 
(and psql) to add support. This would allow us something like:

template1=# \vc x int -- create a variable 'x' of type int
template1=# \vs x 2 -- set it to '2'
template1=# CALL proc_with_inout_param(:x);
CALL
template1=# \vp x -- print 'x'
120

That might need some massaging but you get the idea. The commands to psql
translate to lower level protocol level commands which are: create
variable, set (ie, modify the value of) variable and describe variable. Neil
thinks we should have create and set in one step. This is great, since it
covers most cases, but I'm not sure how we have uninitialised values. Perhaps
we set to NULL?

I think interacting with variables as above is actually quite ugly as well
but that has nothing to do with implementation at the protocol level rather
my own lack of UI skills :-).

So, what does doing this at the protocol level give us? Well, it keeps it out
of the grammar and decreases overhead for interfaces. Still, those reasons
aren't great when we're look at a protocol modification. I'd love to hear
other people's ideas here: maybe I've overlooked a much more attractive
alternative.

Getting back to the host variables themselves.

Most implementations I've seen only support a very limited range of types,
such as varchar, numeric and int. I don't see any reason why we can't
support the full range of types available in the system.

The visibility of variables is restricted to the session which created
them. That is, they're just like our implementation of temporary tables.

The only other question (that I can think of now) with respect to
variables is how they are affected by transactions. My gut feel is
that they should have transactional semantics.  For example:

template1=# \vc x int
template1=# \vs x 2
template1=# BEGIN;
BEGIN
template1=# CALL proc_which_sets_arg_to_100(:x);
CALL
template1=# ABORT;
template1=# \vp x

What should 'x' be? Following transactional semantics, it should be 2.

The only problem is, implementing the transaction semantics might be
painful. Bruce made a suggestion to me a while ago that we could just put
the variable in a temp table. But that limits the scope for types and it
also may slow performance in a long running transaction which modifies a
variable a lot. That is, its a lot of overhead.

The other option is a hash (or some dynamic structure) in which we
attach some transaction information to each variable. We will need to
process this from CommitTransaction()/AbortTransaction(). We'll also
need to be subtransaction aware. Because we don't need to manage
concurrent access to variables, the logic needed to implement
transactional semantics by hand shouldn't be too difficult.

PL/PgSQL
--------

Its in PL/PgSQL that procedures will be most intensively used, I
believe.  Since variables are already supported in the language we
will need to modify that system to make OUT and INOUT parameters
work. Currently, we copy each the argument before we call a function
so that any (potential) write to it will not be reflected in the
caller. We're also protected here by the executor which doesn't allow
us to update arguments since they are marked as constants. We'll also
need to tweak this.

The other thing we will need to look out for if we're executing a
procedure is if the user attempts to RETURN a non-void value. This would be an
error, of course. Simply using RETURN to return control to the caller shouldn't
be a problem in my opinion, however. Neil has also pointed out to me that
this is also the exact behaviour required of functions returning void.

Permissions
-----------

We can use the same permissions as for functions.

----

I will be away for the next few days (until Monday). I hope to have Internet
access but I'm not sure. Neil and I have worked through this together so
he can also answer any questions too.

Gavin



pgsql-hackers by date:

Previous
From: Neil Conway
Date:
Subject: Re: CVS configure failure
Next
From: Tom Lane
Date:
Subject: Re: BUG: possible busy loop when connection is closed while trying to establish SSL connection