Package support for Postgres - Mailing list pgsql-hackers

From Bill Studenmund
Subject Package support for Postgres
Date
Msg-id Pine.NEB.4.33.0110110558030.15927-400000@vespasia.home-net.internetconnect.net
Whole thread Raw
Responses Re: Package support for Postgres
Re: Package support for Postgres
List pgsql-hackers
Zembu has decided to release the result of a recent Postgres developement
project to the Postgres project. This project (for which I was the lead
developer) adds Oracle-like package support to Postgres. I'm in the
process of making a version of the patch which is relative to the current
cvs tree. The change is fairly encompasing, weighing in at around
800k of unified diffs, of which about 200k are the real meat. Before I
send it in, though, I thought I'd see what people think of the idea. Oh,
this feature would definitly be a 7.3 feature, 7.2 is too close to the
door for this to go in. :-)

This message is rather long. I've divided it into sections which start
with "**".

** What are Packages

So what are packages? In Oracle, they are a feature which helps developers
make stored procedures and functions. They provide a name space for
functions local to the package, session-specific package variables, and
initialization routines which are run before any other routine in the
package. Also, all parts of a package are loaded and unloaded at once -
you can't have it partially installed.

All of these features make life much easier for stored-procedure
developers. The name space feature means that you can name the routines in
your package whatever you want, and they won't conflict with the names
either in other packages or with functions not in a package. All you need
to do is ensure that no other package has the same name as yours.

** What did I do, and what does a package declaration look like?

What I've done is impliment Oracle packages with a Postgres flair. There
is a new command, CREATE PACKAGE <name> AS which defines a package. For
those of you up on Oracle packages, this command duplicates the Oracle
CREATE PACKAGE BODY command - there is no Postgres equivalent of the
Oracle CREATE PACKAGE command.

Packages are listed in a new system table, pg_package, and are referenced
in other tables by the oid of the row in pg_package.

There are seven different components which can be present in a package,
and so a CREATE PACKAGE command contains seven stanza types. A package can
be made up of functions, types, operators, aggregates, package-global
variables, initialization routines, and functions usable for type
declarations. Four of the stanzas are easy to understand;  to create a
function, a type, an aggregate, or an operator, you include a stanza which
is the relevant CREATE command without the CREATE keyword. Thus the
FUNCTION stanza creates a function, the TYPE stanza creates a type,
AGGREGATE => an aggregate, and OPERATOR => an operator.

The initializer routines and package-global variables are done a bit
differently than in Oracle, reflecting Postgres's strength at adding
languages. Postgres supports six procedural languages (plpgsql, pltcl,
pltclu, plperl, plperlu, and plpython) whereas I think Oracle only
supports two (PL/SQL and I herad they added a java PL). The main
difference is that the variables and the initializer routines are language
specific. So you can have different variables for plpgsql than for pltcl.
Likewise for initializers.

Package-global variables are defined as:
DECLARE <variable name> '<variable type>' [, <next name> '<next type>' ]
LANGUAGE 'langname'

The type is delimited by single quotes so that the postgres parser didn't
have to learn the syntax of each procedural language's variable types.

Initializer routines are declared like normal functions, except the
function name and signature (number & type of arguements and return type)
are not given. The name is automatically generated (it is __packinit_
followed by the language name) and the function signature should not be
depended on. It is to take no parameters and return an int4 for now, but
that should probably change whenever PG supports true procedures.
Initializer routines are declared as:
BODY AS 'function body' LANGUAGE 'lanname' [with <with options>]

I'm attaching a sample showing a package initialization routine and global
variable declaration. There's a syntax error in it, which I asked about in
another EMail.

The last component of a package are the functions usable for type
declarations. They are declared as:
BEFORE TYPE FUNCTION <standard package function declaration>

They are useful as the normal functions in a package are declared after
the types are declared, so that they can use a type newly-defined in a
package. Which is fine, except that to define a type, you have to give an
input and an output function. BEFORE TYPE FUNCTIONs are used to define
those functions. Other than exactly when they are created in package
loading, they are just like other functions in the package.

I'm attaching an example which defines the type 'myint4' (using the
internal int4 routines) and proceeds to declare routines using the new
type.

** So how do I use things in a package?

You don't have to do anything special to use a type or an operator defined
in a package - you just use it. Getting technical, operators and types in
packages are in the same name space as are types and operators not in
packages. To follow along with the example I attached above, the 'myint4'
type is usable in the typetest package, in tables, in other packages, and
in "normal" functions.

For functions and aggregates, things are a little more complicated. First
off, there is a package called "standard" which contains all types,
aggregates, operators, and functions which aren't in a specific package.
This includes all of the standard Postgres routines, and anything created
with CREATE FUNCTION, CREATE AGGREGATE, CREATE OPERATOR, and CREATE TYPE.

Secondly, parsing is always done in terms of a specified package context.
If we are parsing an equation in a routine inside of a package, then the
package context is that package. If we are just typing along in psql, then
the package context is "standard".

When you specify a function or aggregate, you have two choices. One is to
specify a package, and a function in that package, like
"nametest.process" to specify the "process" function in the "nametest"
package.

The other choice is to just give the function's name. The first place
Postgres will look is in the package context used for parsing. If it's not
there (and that context wasn't "standard"), then it will look in
"standard". So for example in the type declaration example attached, the
type stanza uses "myint4in" and "myint4out" as the input and output
routines, and finds the ones declared as part of the package.

I've attached a sample showing off namespaces. It has two non-package
routines, and one package named "nametest".

Here's a sample session:

testing=# select standard.process(4);    process
------------------I am in standard
(1 row)
testing=# select nametest.process(4);      process
---------------------I am in the package
(1 row)
testing=# select nametest.docheck();      docheck
---------------------I am in the package
(1 row)

First we see that the standard.process() routine says it is in the
"standard" package, and that the nametest.process() routine says it is in
the package. Then we call the nametest.docheck() routine.

It evaluates "process(changer(4));" in the context of the nametest
package. We find the process() routine in the package, and use it.

The changer routine is there to test how typecasting works. It verifies
that Postgres would typecast the return of changer into a different
integer and call the process() routine in the package rather than call the
process() routine in standard. This behavior matches Oracle's.

The other routines in the package show of some examples of how sql will
parse according to the above rules.

Initialization routines:

There is only one recomended way to use them: call a function written in
the same PL in the package. That will cause the initialization routine to
be run. Assuming there are no errors, the routine you call won't be
executed until after the initialization routine finishes.

Of course the non-recomended way is to manually call __packinit_<langname>
directly. The problem with that is that you are depending on
implimentation details which might change. Like exactly how the name is
generated (which probably won't change) and the calling convention (which
hopefully will if procedures are ever suported).

Package-global variables:

Just use them. Assuming that the procedural language supports global
variables, they just work. Note that as with Oracle, each backend will get
its own set of variables. No effort is made to coordinate values across
backends. But chances are you don't want to do that, and if you did, just
make a table. :-)

** So what is the state of the diffs?

The diffs contain changes to last week's current (I'll cvs update before
sending out) which add package support to the backend, plpgsql, the SPI
interface, initdb, and pg_dump. The changes also include modifying the
system schema to support packages (pg_package which lists packages,
pg_packglobal which list global variables, and adding a package identifier
to pg_aggretage, pg_operator, pg_proc and pg_type).

The big things missing are documentation, and regression tests which
explicitly test packages.

Also, plpgsql is the only PL with package support. Adding package support
doesn't make sense for the 'C' and 'internal' languages, as you can
manually add "global" variables and initialization routines yourself. It
also doesn't make sense for 'sql' as sql doesn't support variables. The
other languages need to gain package support, and I'll appreciate help
from their authors. :-)

So I'd better wrap up here. Monday I'll send the diffs to the patches
list, and also send a message talking about more of the details of the
changes.

What do folks think?

Take care,

Bill

pgsql-hackers by date:

Previous
From: Tom Lane
Date:
Subject: Re: New contrib/tsearch module for 7.2
Next
From: Bruce Momjian
Date:
Subject: Re: FAQ error