Package support for Postgres - Mailing list pgsql-hackers
From | Bill Studenmund |
---|---|
Subject | Package support for Postgres |
Date | |
Msg-id | Pine.NEB.4.33.0110110558030.15927-400000@vespasia.home-net.internetconnect.net Whole thread Raw |
Responses |
Re: Package support for Postgres
Re: Package support for Postgres |
List | pgsql-hackers |
Zembu has decided to release the result of a recent Postgres developement project to the Postgres project. This project (for which I was the lead developer) adds Oracle-like package support to Postgres. I'm in the process of making a version of the patch which is relative to the current cvs tree. The change is fairly encompasing, weighing in at around 800k of unified diffs, of which about 200k are the real meat. Before I send it in, though, I thought I'd see what people think of the idea. Oh, this feature would definitly be a 7.3 feature, 7.2 is too close to the door for this to go in. :-) This message is rather long. I've divided it into sections which start with "**". ** What are Packages So what are packages? In Oracle, they are a feature which helps developers make stored procedures and functions. They provide a name space for functions local to the package, session-specific package variables, and initialization routines which are run before any other routine in the package. Also, all parts of a package are loaded and unloaded at once - you can't have it partially installed. All of these features make life much easier for stored-procedure developers. The name space feature means that you can name the routines in your package whatever you want, and they won't conflict with the names either in other packages or with functions not in a package. All you need to do is ensure that no other package has the same name as yours. ** What did I do, and what does a package declaration look like? What I've done is impliment Oracle packages with a Postgres flair. There is a new command, CREATE PACKAGE <name> AS which defines a package. For those of you up on Oracle packages, this command duplicates the Oracle CREATE PACKAGE BODY command - there is no Postgres equivalent of the Oracle CREATE PACKAGE command. Packages are listed in a new system table, pg_package, and are referenced in other tables by the oid of the row in pg_package. There are seven different components which can be present in a package, and so a CREATE PACKAGE command contains seven stanza types. A package can be made up of functions, types, operators, aggregates, package-global variables, initialization routines, and functions usable for type declarations. Four of the stanzas are easy to understand; to create a function, a type, an aggregate, or an operator, you include a stanza which is the relevant CREATE command without the CREATE keyword. Thus the FUNCTION stanza creates a function, the TYPE stanza creates a type, AGGREGATE => an aggregate, and OPERATOR => an operator. The initializer routines and package-global variables are done a bit differently than in Oracle, reflecting Postgres's strength at adding languages. Postgres supports six procedural languages (plpgsql, pltcl, pltclu, plperl, plperlu, and plpython) whereas I think Oracle only supports two (PL/SQL and I herad they added a java PL). The main difference is that the variables and the initializer routines are language specific. So you can have different variables for plpgsql than for pltcl. Likewise for initializers. Package-global variables are defined as: DECLARE <variable name> '<variable type>' [, <next name> '<next type>' ] LANGUAGE 'langname' The type is delimited by single quotes so that the postgres parser didn't have to learn the syntax of each procedural language's variable types. Initializer routines are declared like normal functions, except the function name and signature (number & type of arguements and return type) are not given. The name is automatically generated (it is __packinit_ followed by the language name) and the function signature should not be depended on. It is to take no parameters and return an int4 for now, but that should probably change whenever PG supports true procedures. Initializer routines are declared as: BODY AS 'function body' LANGUAGE 'lanname' [with <with options>] I'm attaching a sample showing a package initialization routine and global variable declaration. There's a syntax error in it, which I asked about in another EMail. The last component of a package are the functions usable for type declarations. They are declared as: BEFORE TYPE FUNCTION <standard package function declaration> They are useful as the normal functions in a package are declared after the types are declared, so that they can use a type newly-defined in a package. Which is fine, except that to define a type, you have to give an input and an output function. BEFORE TYPE FUNCTIONs are used to define those functions. Other than exactly when they are created in package loading, they are just like other functions in the package. I'm attaching an example which defines the type 'myint4' (using the internal int4 routines) and proceeds to declare routines using the new type. ** So how do I use things in a package? You don't have to do anything special to use a type or an operator defined in a package - you just use it. Getting technical, operators and types in packages are in the same name space as are types and operators not in packages. To follow along with the example I attached above, the 'myint4' type is usable in the typetest package, in tables, in other packages, and in "normal" functions. For functions and aggregates, things are a little more complicated. First off, there is a package called "standard" which contains all types, aggregates, operators, and functions which aren't in a specific package. This includes all of the standard Postgres routines, and anything created with CREATE FUNCTION, CREATE AGGREGATE, CREATE OPERATOR, and CREATE TYPE. Secondly, parsing is always done in terms of a specified package context. If we are parsing an equation in a routine inside of a package, then the package context is that package. If we are just typing along in psql, then the package context is "standard". When you specify a function or aggregate, you have two choices. One is to specify a package, and a function in that package, like "nametest.process" to specify the "process" function in the "nametest" package. The other choice is to just give the function's name. The first place Postgres will look is in the package context used for parsing. If it's not there (and that context wasn't "standard"), then it will look in "standard". So for example in the type declaration example attached, the type stanza uses "myint4in" and "myint4out" as the input and output routines, and finds the ones declared as part of the package. I've attached a sample showing off namespaces. It has two non-package routines, and one package named "nametest". Here's a sample session: testing=# select standard.process(4); process ------------------I am in standard (1 row) testing=# select nametest.process(4); process ---------------------I am in the package (1 row) testing=# select nametest.docheck(); docheck ---------------------I am in the package (1 row) First we see that the standard.process() routine says it is in the "standard" package, and that the nametest.process() routine says it is in the package. Then we call the nametest.docheck() routine. It evaluates "process(changer(4));" in the context of the nametest package. We find the process() routine in the package, and use it. The changer routine is there to test how typecasting works. It verifies that Postgres would typecast the return of changer into a different integer and call the process() routine in the package rather than call the process() routine in standard. This behavior matches Oracle's. The other routines in the package show of some examples of how sql will parse according to the above rules. Initialization routines: There is only one recomended way to use them: call a function written in the same PL in the package. That will cause the initialization routine to be run. Assuming there are no errors, the routine you call won't be executed until after the initialization routine finishes. Of course the non-recomended way is to manually call __packinit_<langname> directly. The problem with that is that you are depending on implimentation details which might change. Like exactly how the name is generated (which probably won't change) and the calling convention (which hopefully will if procedures are ever suported). Package-global variables: Just use them. Assuming that the procedural language supports global variables, they just work. Note that as with Oracle, each backend will get its own set of variables. No effort is made to coordinate values across backends. But chances are you don't want to do that, and if you did, just make a table. :-) ** So what is the state of the diffs? The diffs contain changes to last week's current (I'll cvs update before sending out) which add package support to the backend, plpgsql, the SPI interface, initdb, and pg_dump. The changes also include modifying the system schema to support packages (pg_package which lists packages, pg_packglobal which list global variables, and adding a package identifier to pg_aggretage, pg_operator, pg_proc and pg_type). The big things missing are documentation, and regression tests which explicitly test packages. Also, plpgsql is the only PL with package support. Adding package support doesn't make sense for the 'C' and 'internal' languages, as you can manually add "global" variables and initialization routines yourself. It also doesn't make sense for 'sql' as sql doesn't support variables. The other languages need to gain package support, and I'll appreciate help from their authors. :-) So I'd better wrap up here. Monday I'll send the diffs to the patches list, and also send a message talking about more of the details of the changes. What do folks think? Take care, Bill
pgsql-hackers by date: