Thread: Oracle Style packages on postgres
One simple benefit to packages is just organization of related code.
On 5/7/05, rmm@sqlisor.com wrote:
Oracle Style packages on postgres
OVERVIEW:
To emulate oracle server side development in postgres I required server
side packages. The following text demonstrates how to do this using
plpython on postgres 8 and suggests a language extension.
WHAT ARE ORACLE PACKAGES?
Looking back over the postgres discussion forums (particularly a discussion
in 2001 following a proposal by Bill Studenmund), there appears to be some
confusion over what oracle packages are. Here's a concise definition:
"A black box processing engine with one or more public access functions
that retains state across calls."
An oracle package is instantiated when first referenced. Its initialization
code is run once (e.g. costly queries to populate session-wide package
parameters) and the package dies at the end of the session.
An analogy with OOP is that it's like having a single class instance
available for the duration of a session.
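To make the "initialized once, state kept for the session" idea concrete, here is a rough
sketch of how it can be emulated today in plpython by caching into the session-wide GD
dictionary (pkg_config and the parameter names are invented for the example):

CREATE OR REPLACE FUNCTION pkg_get_param(text) RETURNS text AS $$
# sketch only: pkg_config is a hypothetical lookup table
if 'pkg_params' not in GD:
    # the costly initialization query runs only on the first call of the session
    l_rows = plpy.execute('select name, value from pkg_config')
    l_params = {}
    for i in range(l_rows.nrows()):
        l_params[l_rows[i]['name']] = l_rows[i]['value']
    GD['pkg_params'] = l_params
return GD['pkg_params'].get(args[0])
$$ LANGUAGE plpythonu;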
SOME POWERFUL USES OF PACKAGES:
1. Pipes - oracle dbms_pipe built-in allows asynchronous communication
between any number of producer/consumer database sessions on any number of
pipes
2. Logging - leave all logging/debug statements in the code; the decision on
logging output can be made when the logging package is initialised (eg by
querying lookup tables for user, on/off, level, and destination). Combine
logging with pipes and the output can be stored in tables separate from
the current transaction. Include timing info down to milliseconds and
live problems/bottlenecks can be identified more easily (a minimal sketch
follows this list).
3. Batch reporting - more suited to autonomous transactions than logging,
but useful to have the report package store start time, duration, running
totals of error/warning counts etc. and summarize automatically at
report end.
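A minimal sketch of the logging idea follows (log_config and log_lines are invented table
names; a real version would add level, destination and millisecond timings). The on/off
lookup is done once per session and cached in GD:

CREATE OR REPLACE FUNCTION log_debug(text) RETURNS text AS $$
# sketch only: log_config / log_lines are hypothetical tables
if 'log_on' not in GD:
    l_rows = plpy.execute("select 1 from log_config where username = current_user")
    GD['log_on'] = l_rows.nrows() > 0
if GD['log_on']:
    l_plan = plpy.prepare("insert into log_lines(logged_at, line) values (now(), $1)", ["text"])
    plpy.execute(l_plan, [args[0]])
return args[0]
$$ LANGUAGE plpythonu;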
See the example below on how to implement a version of the oracle
dbms_output package in plpython
EXTENSIONS TO POSTGRES:
Oracle style package creation syntax is split into header and body so that
the body (the code) can be recompiled without invalidating dependent objects.
Postgres syntax for the dbms_output example (in any postgres server side
language) would be along the lines of:
CREATE OR REPLACE PACKAGE HEADER dbms_output AS
FUNCTION dbms_output_put_line(text) RETURNS text,
FUNCTION dbms_output_get_lines() RETURNS text;
CREATE OR REPLACE PACKAGE BODY dbms_output AS $$ 'package code' $$
<language>;
Adding a pg_package catalog with a link from pg_proc is the only change required
to the data dictionary (a rough sketch follows).
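Purely to illustrate the shape of that change (these column names are invented here,
not a worked-out catalog design), pg_package might look roughly like:

-- illustrative sketch only, not a proposal for the exact catalog layout
CREATE TABLE pg_package (
    packname      name NOT NULL,   -- package name
    packnamespace oid  NOT NULL,   -- schema (pg_namespace) the package lives in
    packowner     oid  NOT NULL,   -- owner
    packheadersrc text,            -- header declaration
    packbodysrc   text             -- body source, compiled by the PL
);
-- pg_proc would gain one column (say, propackage oid) pointing at pg_package,
-- so a package's public functions remain ordinary pg_proc entries.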
It would be nice to have similar dotted syntax as oracle
(user.package.function) but would this mess up postgres namespaces?
The language in which the package was created would process the 'package
code', for example in python:
o create public functions linking header declaration to package body code
(see dbms_output example)
o process embedded sql, e.g. l_curs=select * from dual ->
l_curs=self.execute('select * from dual')
o the extracted sql can be 'prepared' by postgres and syntax exceptions
reported as compilation errors (see the sketch after this list)
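To illustrate the last two points, the pre-processor could push each extracted statement
through plpy.prepare when the package is first loaded, so a typo in embedded sql surfaces
as a load-time (compilation) error instead of a runtime one. A sketch of the code it might
emit (the statement itself is arbitrary):

CREATE OR REPLACE FUNCTION package_check_sql() RETURNS text AS $$
# sketch: one prepare call per embedded statement extracted from the package body
l_plan = plpy.prepare('select relname from pg_class where relname = $1', ['name'])
l_rows = plpy.execute(l_plan, ['pg_proc'])
return 'prepared ok, %d row(s)' % l_rows.nrows()
$$ LANGUAGE plpythonu;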
SUMMARY:
Packages would be an important addition to postgres. Some of the server side
languages have the potential to emulate them now. It would be useful to
agree on a common high level syntax before the various language
implementations start developing their own solutions.
I'm currently testing dbms_pipe on postgres; let me know if anyone is
interested. I replaced xml-rpc (5 messages/second) with sockets (600x
faster!), and may test CORBA.
Ronnie Mackay
-----------------------------------------------------------------------------
-----------------------------------------------------------------------------
EXAMPLE: CONVERTING THE ORACLE BUILT-IN PACKAGE DBMS_OUTPUT
[Oracle syntax is: exec dbms_output.put_line('line1');]
Postgres>select dbms_output_put_line('line 1');
Postgres>select test_call_dbms_output_from_within_plpgsql('line 2
(plpgsql)');
Postgres>select test_call_dbms_output_from_within_plpython('line 3
(plpython)');
Postgres>select dbms_output_put_line('line 4');
Postgres>select dbms_output_get_lines();
--- DBMS_OUTPUT DEMO ---
line 1
line 2 (plpgsql)
line 3 (plpython)
line 4
--- DBMS_OUTPUT DEMO ---
So, using current postgres syntax, the only difference from oracle is that
dbms_output.put_line('line 1'); becomes
dbms_output_put_line('line 1');
The source code that implements the package body is returned by the postgres
function dbms_output().
POSTGRES CREATE STATEMENTS FOR EXAMPLE:
-----------------------------------------------------------------------------
CREATE or replace FUNCTION dbms_output_put_line(text) RETURNS text AS $$
from plpython import getPackage
return getPackage(GD, plpy, 'dbms_output').putLine(args[0])
$$ LANGUAGE plpythonu;
CREATE or replace FUNCTION dbms_output_get_lines() RETURNS text AS $$
from plpython import getPackage
return getPackage(GD, plpy, 'dbms_output').getLines()
$$ LANGUAGE plpythonu;
-- package body
CREATE OR REPLACE FUNCTION dbms_output() RETURNS text AS $$
return """
from plpython import PlPythonPackage
class Package(PlPythonPackage):
def __init__(self, in_plpy):
PlPythonPackage.__init__(self, in_plpy)
self.lines=[]
def putLine(self, in_text):
self.lines.append(in_text)
def getLines(self):
l_lines=self._title()
l_lines+=self.lines
l_lines+=self._title()
self.lines=[]
return chr(10).join(l_lines)
def _title(self):
return ['--- DBMS_OUTPUT DEMO ---']
"""
$$ LANGUAGE plpythonu;
CREATE OR REPLACE FUNCTION
test_call_dbms_output_from_within_plpython(in_text text) RETURNS text AS
$$
from plpython import getPackage
dbms_output = getPackage(GD, plpy, 'dbms_output')
print dbms_output
print dir(dbms_output)
dbms_output.putLine(args[0])
$$ LANGUAGE plpythonu;
CREATE or replace FUNCTION
test_call_dbms_output_from_within_plpgsql(in_text text) RETURNS text AS $$
declare
dummy text;
BEGIN
dummy := dbms_output_put_line(in_text);
return '';
END;
$$ LANGUAGE plpgsql;
-----------------------------------------------------------------------------
PYTHON MODULE (plpython.PlPythonPackage):
-----------------------------------------------------------------------------
import imp, sys

class PlPythonPackage:
    """ Base class for postgres emulation of oracle package structure in
    PlPython """

    def __init__(self, in_plpy):
        self.plpy=in_plpy
        l_row=self.plpy.execute('select current_user as user, current_database() as database')[0]
        self.user=l_row["user"]
        self.database=l_row["database"]

    def execute(self, in_sql):
        # return the result so callers can write l_curs=self.execute(...)
        return self.plpy.execute(in_sql)

def getPackage(in_gd, in_plpy, in_package):
    """ Dynamically load plpython package"""
    try:
        return in_gd[in_package]
    except KeyError:
        # the package body source is returned by the function named after the package
        l_result=in_plpy.execute('select %s()'%in_package)
        # strip one level of tab indentation if the embedded body used tabs
        l_code=l_result[0].popitem()[1].replace('\n\t','\n')
        l_module = imp.new_module(in_package)
        exec l_code in l_module.__dict__
        l_package=l_module.Package(in_plpy)
        in_gd[in_package]=l_package
        return l_package
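Note that the 'from plpython import getPackage' lines in the functions above assume the
module is saved as plpython.py somewhere the server's Python interpreter can import it
(i.e. on the backend's Python path); the exact location depends on your installation.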
Bob wrote:
> One simple benefit to packages is just organization of related code.

And the package-scoped variables or constant values, similar to the global
variables. It will be very useful for application programmers if one variable
can be shared from several functions. I needed some tricks when I tried to
port such PL/SQL to PL/pgSQL.

--
NAGAYASU Satoshi <nagayasus@nttdata.co.jp>
OpenSource Development Center, NTT DATA Corp. http://www.nttdata.co.jp/
Satoshi Nagayasu wrote: >> An oracle package is created when first referenced. Its initialization >> code is run once (ie costly queries to populate session wide package >> params) and the package dies at the end of the session >> An analogy with OOP is that it's like having a single class instance >> available for the duration of a session. >> PL/Java has an object called "Session" that does exactly this. It is not available from other languages at present. Are Packages supposed to be cross-language? Regards, Thomas Hallgren
On E, 2005-05-09 at 07:36 +0200, Thomas Hallgren wrote: > Satoshi Nagayasu wrote: > > >> An oracle package is created when first referenced. Its initialization > >> code is run once (ie costly queries to populate session wide package > >> params) and the package dies at the end of the session > >> An analogy with OOP is that it's like having a single class instance > >> available for the duration of a session. > >> > PL/Java has an object called "Session" that does exactly this. And pl/python has a global dictionary SD for the same purpose. > It is not > available from other languages at present. Are Packages supposed to be > cross-language? Probably not, as they already have most of the needed features. Maybe we can set up some lighter version of package for cross-language features (like installing removing a group of functions) but this are much less needed for more advanced languages. -- Hannu Krosing <hannu@skype.net>
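For anyone who hasn't used it, a trivial sketch of the SD dictionary Hannu mentions
(SD is private to the function, GD is shared by all plpython functions in the session):

CREATE OR REPLACE FUNCTION call_counter() RETURNS integer AS $$
# per-session, per-function state: survives across calls, dies with the session
SD['calls'] = SD.get('calls', 0) + 1
return SD['calls']
$$ LANGUAGE plpythonu;

-- within one session: SELECT call_counter();  returns 1, then 2, then 3 ...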
On Sun, May 08, 2005 at 10:38:41PM -0500, Bob wrote: > One simple benefit to packages is just organization of related code. Which, IMHO, is greatly diminished by the lack of schema.package.function notation. BTW, the original post referred to this as user.package.function, but I believe that technically it's actually schema.package.function (Oracle tends to mix schemas and users). In any case, schema.package.function is what would make sense in PostgreSQL. Personally, I think the biggest win here would be adding package support and syntax to plpgsql. Not only would it make porting from Oracle easier, it would also make plpgsql much, much more powerful. -- Jim C. Nasby, Database Consultant decibel@decibel.org Give your computer some brain candy! www.distributed.net Team #1828 Windows: "Where do you want to go today?" Linux: "Where do you want to go tomorrow?" FreeBSD: "Are you guys coming, or what?"
Rmm, > "A black box processing engine with one or more public access functions > that retains state across calls" In other words, an Object. <grin> > Oracle style package creation syntax is split into header and body so that > the body(code) can be re-compiled without invalidating dependent objects. > Postgres syntax for the dbms_output example (in any postgres server side > language) would be along the lines of: > CREATE OR REPLACE PACKAGE HEADER dbms_output AS > FUNCTION dbms_output_put_line(text) RETURNS text, > FUNCTION dbms_output_get_lines() RETURNS text; > CREATE OR REPLACE PACKAGE BODY dbms_output AS $$ 'package code' $$ > <language>; Hmmm. What about package variables? For me, this is one of the most valuable parts of packages. I've also never much liked Oracle's seperate package_header and package_body declaration structure: if the two are intrinsically tied, why not make it one declaration? Is syntactical compatibility important enough that we need to imitate their design errors? > Adding pg_package with a link from pg_proc are the only changes required > to the data dictionary. > It would be nice to have similar dotted syntax as oracle > (user.package.function) but would this mess up postgres namespaces? Yes, actually. If you look at the discussion, this is what killed the 2001 proposal; packages were proposed as orthagonal to schema which was not acceptable. However, now that schema are well established, it seems like this namespace issue is limited. The problem would be that you'd have to make sure that no two schema and packages had the same name, or that there would be an automatic precedence of shema, package established. So, given a shema named "dataloader" and a package named "dataloader" and a function named "copy_it(filename)", what would happen is: dataloader.dataloader.copy_it('/tmp/somefile') ... would be absolutely clear dataloader.copy_it('/tmp/somefile') ... would attempt to call the copy_it function in the dataloader *schema*, not the dataloader *package*. The above seems inevitable, and not really a problem to me. We simply warn people in the docs of the behavior, and to avoid duplicate naming. I think there are more important questions: 1) how do you prevent users from executing the package functions outside of the package? 2) Have you taken care of package variables? If so, are they only per-session, or global? If they are global, how do you accomplish this? 3) For that matter, is initialization per session or global? -- Josh Berkus Aglio Database Solutions San Francisco
I agree wholeheartedly and was actually just thinking of this yesterday. Back when I was working on NEXTGRES I implemented package support into plpgsql including scopes. While my time is pretty tight right now, I'd be more than willing to work with whoever the plpgsql master is. Jim C. Nasby wrote: >On Sun, May 08, 2005 at 10:38:41PM -0500, Bob wrote: > > >>One simple benefit to packages is just organization of related code. >> >> > >Which, IMHO, is greatly diminished by the lack of >schema.package.function notation. BTW, the original post referred to >this as user.package.function, but I believe that technically it's >actually schema.package.function (Oracle tends to mix schemas and >users). In any case, schema.package.function is what would make sense in >PostgreSQL. > >Personally, I think the biggest win here would be adding package support >and syntax to plpgsql. Not only would it make porting from Oracle >easier, it would also make plpgsql much, much more powerful. > >
Hey everyone, In addition to package support in plpgsql, it would be really handy to have inline plpgsql. Likewise, I think there are others who feel this way as-well. Years ago, Oracle merged PL/SQL with their normal SQL parser which allowed for inline PL/SQL. They did this because it was difficult to maintain two separate parsers. While this worked great for Oracle, it probably wouldn't really work as well for PostgreSQL because pgsql supports multiple procedural languages. As for implementation, I think it would obviously be best to leave plpgsql on its own as a PL but maybe change BEGIN and DECLARE in the normal parser and have the system generate/execute a function on the fly. Or, maybe it would be better to integrate plpgsql. Or, I may just be crazy. Would anyone else ever benefit from inline functions? Does anyone have any ideas about implementation? Please shoot your opinions this way. Thanks. -Jonah
> > As for implementation, I think it would obviously be best to leave > plpgsql on its own as a PL but maybe change BEGIN and DECLARE in the > normal parser and have the system generate/execute a function on the > fly. Or, maybe it would be better to integrate plpgsql. Or, I may just > be crazy. > > Would anyone else ever benefit from inline functions? Well I could see inline functions being useful for debugging a function during development but I don't think I would want a bunch of plPGSQL mucking up my pretty SQL :) Sincerely, Joshua D. Drake -- Your PostgreSQL solutions company - Command Prompt, Inc. 1.800.492.2240 PostgreSQL Replication, Consulting, Custom Programming, 24x7 support Managed Services, Shared and Dedication Hosting Co-Authors: plPHP, plPerlNG - http://www.commandprompt.com/
Jonah, > In addition to package support in plpgsql, it would be really handy to > have inline plpgsql. Likewise, I think there are others who feel this > way as-well. I think a number of people would be interested in this. However, your biggest development issue, as I've been told, is that the Pl/pgSQL parser isn't nearly as mature as the SQL parser. So an overhaul of the PL/pgSQL backend code would need to precede any merger of the two parsers. Also, take a look at the "pl/pgsql enabled by default" thread on this mailing list for security concerns. These security concerns would be much more significant if plpgsql were automatically available on the command line. As such, you'd need to make building it in to the SQL command line a compile-time option so that security-conscious admins could disable it if they want to. -- Josh Berkus Aglio Database Solutions San Francisco
Joshua D. Drake wrote:
>> As for implementation, I think it would obviously be best to leave
>> plpgsql on its own as a PL but maybe change BEGIN and DECLARE in the
>> normal parser and have the system generate/execute a function on the
>> fly. Or, maybe it would be better to integrate plpgsql. Or, I may
>> just be crazy.
>>
>> Would anyone else ever benefit from inline functions?
>
> Well I could see inline functions being useful for debugging a
> function during development but I don't think I would want a bunch of
> plPGSQL mucking up my pretty SQL :)

Then don't put it there ;-) I think you'd need to do something like this:

PERFORM language plpgsql $$
  -- some plpgsql stuff here
$$;

cheers

andrew
On Mon, May 09, 2005 at 10:05:38AM -0700, Josh Berkus wrote: > I've also never much liked Oracle's seperate package_header and package_body > declaration structure: if the two are intrinsically tied, why not make it one > declaration? Is syntactical compatibility important enough that we need to > imitate their design errors? Actually, there is a notable difference between the two. Replacing the body of a package has a minimal impact on the database, but replacing the header requires more work to invalidate cached stuff. I think there's also a few other side effects. This isn't to say that this is a good way to handle this, but I believe it's why Oracle does it. -- Jim C. Nasby, Database Consultant decibel@decibel.org Give your computer some brain candy! www.distributed.net Team #1828 Windows: "Where do you want to go today?" Linux: "Where do you want to go tomorrow?" FreeBSD: "Are you guys coming, or what?"
I would be interested in hearing how we can implement Oracle packages in a way that seamlessly integrates into what we have. Is it like functions that are automatically called when a schema is accessed? And the result put into a per-session temporary schema? I think it is unlikely we would implement Oracle packages exactly like Oracle but I think there is interest in adding that functionality to PostgreSQL. If we can work up a list I can add it to the TODO list. --------------------------------------------------------------------------- Josh Berkus wrote: > Rmm, > > > "A black box processing engine with one or more public access functions > > that retains state across calls" > > In other words, an Object. <grin> > > > Oracle style package creation syntax is split into header and body so that > > the body(code) can be re-compiled without invalidating dependent objects. > > Postgres syntax for the dbms_output example (in any postgres server side > > language) would be along the lines of: > > CREATE OR REPLACE PACKAGE HEADER dbms_output AS > > FUNCTION dbms_output_put_line(text) RETURNS text, > > FUNCTION dbms_output_get_lines() RETURNS text; > > CREATE OR REPLACE PACKAGE BODY dbms_output AS $$ 'package code' $$ > > <language>; > > Hmmm. What about package variables? For me, this is one of the most > valuable parts of packages. > > I've also never much liked Oracle's seperate package_header and package_body > declaration structure: if the two are intrinsically tied, why not make it one > declaration? Is syntactical compatibility important enough that we need to > imitate their design errors? > > > Adding pg_package with a link from pg_proc are the only changes required > > to the data dictionary. > > It would be nice to have similar dotted syntax as oracle > > (user.package.function) but would this mess up postgres namespaces? > > Yes, actually. If you look at the discussion, this is what killed the 2001 > proposal; packages were proposed as orthagonal to schema which was not > acceptable. > > However, now that schema are well established, it seems like this namespace > issue is limited. The problem would be that you'd have to make sure that no > two schema and packages had the same name, or that there would be an > automatic precedence of shema, package established. > > So, given a shema named "dataloader" and a package named "dataloader" and a > function named "copy_it(filename)", what would happen is: > > dataloader.dataloader.copy_it('/tmp/somefile') > ... would be absolutely clear > dataloader.copy_it('/tmp/somefile') > ... would attempt to call the copy_it function in the dataloader > *schema*, not the dataloader *package*. > > The above seems inevitable, and not really a problem to me. We simply warn > people in the docs of the behavior, and to avoid duplicate naming. > > I think there are more important questions: > > 1) how do you prevent users from executing the package functions outside of > the package? > 2) Have you taken care of package variables? If so, are they only > per-session, or global? If they are global, how do you accomplish this? > 3) For that matter, is initialization per session or global? 
>
> --
> Josh Berkus
> Aglio Database Solutions
> San Francisco

--
Bruce Momjian | http://candle.pha.pa.us
pgman@candle.pha.pa.us | (610) 359-1001
+ If your life is a hard drive, | 13 Roberts Road
+ Christ can be your backup. | Newtown Square, Pennsylvania 19073
Josh Berkus <josh@agliodbs.com> writes: > Yes, actually. If you look at the discussion, this is what killed the 2001 > proposal; packages were proposed as orthagonal to schema which was not > acceptable. I think what actually killed that proposal was that it was not made clear what it did that wouldn't be done as well (and in a more standard fashion) by providing schemas. What I read in this thread is that the only truly missing feature is package variables (ie, session-local variables); is that an accurate statement? If so, it would seem simplest to add such a feature to plpgsql and be done with it. Several people already pointed out that most of the other PLs support that feature today. regards, tom lane
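For reference, the workaround many people use today for session-local variables is a
temporary table wrapped in a pair of plpgsql accessors; a rough sketch (names are
arbitrary), which also shows why a built-in feature would be nicer:

CREATE OR REPLACE FUNCTION set_session_var(p_name text, p_value text) RETURNS void AS $$
BEGIN
    BEGIN
        -- created on first use, lives until the end of the session
        CREATE TEMP TABLE session_vars (name text PRIMARY KEY, value text);
    EXCEPTION WHEN duplicate_table THEN
        NULL;  -- already created earlier in this session
    END;
    DELETE FROM session_vars WHERE name = p_name;
    INSERT INTO session_vars VALUES (p_name, p_value);
END;
$$ LANGUAGE plpgsql;

CREATE OR REPLACE FUNCTION get_session_var(p_name text) RETURNS text AS $$
BEGIN
    RETURN (SELECT value FROM session_vars WHERE name = p_name);
EXCEPTION WHEN undefined_table THEN
    RETURN NULL;  -- nothing has been set yet in this session
END;
$$ LANGUAGE plpgsql;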
Tom, > What I read in this thread is that the only truly missing feature is > package variables (ie, session-local variables); is that an accurate > statement? If so, it would seem simplest to add such a feature to > plpgsql and be done with it. Several people already pointed out that > most of the other PLs support that feature today. Also initialization, namespacing, and security. The ability to "package" bunches of functions, and only allow their calling in the context of a package, is quite valuable in installations which support 1,000's of procedures. -- --Josh Josh Berkus Aglio Database Solutions San Francisco
Josh Berkus <josh@agliodbs.com> writes: >> What I read in this thread is that the only truly missing feature is >> package variables (ie, session-local variables); is that an accurate >> statement? If so, it would seem simplest to add such a feature to >> plpgsql and be done with it. Several people already pointed out that >> most of the other PLs support that feature today. > Also initialization, namespacing, and security. The ability to "package" > bunches of functions, and only allow their calling in the context of a > package, is quite valuable in installations which support 1,000's of > procedures. This is exactly the sort of argumentation that got the last proposal shot down ;-). I see no reason that you can't do the namespacing and security as well or better using the existing (and more standard) schema feature. If there's something there that's not covered, what is it? (The initialization bit goes along with the variables, AFAICS.) regards, tom lane
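To make that concrete, a sketch of the schema-as-package approach (schema and function
names are invented): the "private" helper has EXECUTE revoked, and the public entry point
is SECURITY DEFINER so it can still reach the helper with the owner's rights.

CREATE SCHEMA dataloader;

-- 'private' helper: ordinary users cannot call it directly
CREATE FUNCTION dataloader.check_access() RETURNS boolean AS $$
BEGIN
    RETURN true;   -- real checks would go here
END;
$$ LANGUAGE plpgsql;
REVOKE EXECUTE ON FUNCTION dataloader.check_access() FROM PUBLIC;

-- public entry point: runs with the owner's rights, so it may call the helper
CREATE FUNCTION dataloader.copy_it(p_filename text) RETURNS void AS $$
BEGIN
    IF NOT dataloader.check_access() THEN
        RAISE EXCEPTION 'access denied';
    END IF;
    -- ... do the actual work here ...
END;
$$ LANGUAGE plpgsql SECURITY DEFINER;
GRANT EXECUTE ON FUNCTION dataloader.copy_it(text) TO PUBLIC;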
On E, 2005-05-09 at 11:44 -0600, Jonah H. Harris wrote: > Hey everyone, > > In addition to package support in plpgsql, it would be really handy to > have inline plpgsql. Likewise, I think there are others who feel this > way as-well. Session variables is what I miss most. > Years ago, Oracle merged PL/SQL with their normal SQL parser which > allowed for inline PL/SQL. Was that really that much time ago ? IIRC this was fanfared as one of big advancements of Oracle 10. > They did this because it was difficult to maintain two separate parsers. Also they claimed that this cleared away some subtle differences in the languages supported by SQL and pl/SQL. > While this worked great for Oracle, it > probably wouldn't really work as well for PostgreSQL because pgsql > supports multiple procedural languages. AFAIK Oracle also supports at least java, using a syntax somewhat similar to ours. > As for implementation, I think it would obviously be best to leave > plpgsql on its own as a PL but maybe change BEGIN and DECLARE in the > normal parser and have the system generate/execute a function on the > fly. Or, maybe it would be better to integrate plpgsql. Or, I may just > be crazy. Just having $$ quoting and named arguments does most of what I need for using functions from a command line. If pl/pgsql and perhaps even plain sql get session variables, preferrably usable by both (and in future accessible from other) pl-s that would cover most of my needs. > Would anyone else ever benefit from inline functions? Does anyone have > any ideas about implementation? Please shoot your opinions this way. While I can imagine how to use a declarative language from procedural one (pl/pgsql using sql) I have much harder time to imagine how to do the opposite in a convevient way. something like this ? : select $$ inline scalar plpgsql func here $$, count(*) from $$ inline set function here $$ subgroup by 1; perhaps just supporting TEMP funcions should be enough ? -- Hannu Krosing <hannu@tm.ee>
Tom,

> This is exactly the sort of argumentation that got the last proposal
> shot down ;-). I see no reason that you can't do the namespacing and
> security as well or better using the existing (and more standard) schema
> feature. If there's something there that's not covered, what is it?

a) When you have 1000's of procedures, it becomes very useful to have more
than one level of namespacing. This is not an exaggeration; one project I
looked at that decided not to convert from Oracle to PostgreSQL had over
100,000 procedures and functions. Lack of packages was their main reason
for not switching. Schemas provide only *one* level of namespacing, unless
we want to "improve" on the SQL standard and allow nested schemas.

b) Schemas do not provide us with any way of limiting the scope of functions
and persistent variables. With packages, you would want:
1. functions which can only be called internally to the package
2. variables which are only visible inside the package
3. functions which can only be called as part of the package (thus utilizing
the initialization and internal variables) and not on their own.

--
--Josh

Josh Berkus
Aglio Database Solutions
San Francisco
On Mon, May 09, 2005 at 11:44:23AM -0600, Jonah H. Harris wrote: > Hey everyone, > > In addition to package support in plpgsql, it would be really handy > to have inline plpgsql. Likewise, I think there are others who feel > this way as-well. Why yes, there are. :) > Years ago, Oracle merged PL/SQL with their normal SQL parser which > allowed for inline PL/SQL. They did this because it was difficult > to maintain two separate parsers. While this worked great for > Oracle, it probably wouldn't really work as well for PostgreSQL > because pgsql supports multiple procedural languages. I proposed a syntax for this awhile back. I haven't found it in the archives, but it goes like this: EXECUTE IMMEDIATE $$ function body here $$ LANGUAGE plfoo; Similarly, a CREATE TEMPORARY FUNCTION could be quite handy. Cheers, D -- David Fetter david@fetter.org http://fetter.org/ phone: +1 510 893 6100 mobile: +1 415 235 3778 Remember to vote!
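Until something along those lines exists, the usual workaround for a one-off procedural
block is a throwaway function; a sketch (scratch_fn is an arbitrary name):

CREATE FUNCTION scratch_fn() RETURNS void AS $$
BEGIN
    -- ad-hoc procedural code goes here
    RAISE NOTICE 'inline block emulated at %', now();
END;
$$ LANGUAGE plpgsql;

SELECT scratch_fn();
DROP FUNCTION scratch_fn();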
Josh Berkus wrote: > Tom, > > >>This is exactly the sort of argumentation that got the last proposal >>shot down ;-). I see no reason that you can't do the namespacing and >>security as well or better using the existing (and more standard) schema >>feature. If there's something there that's not covered, what is it? > > > a) When you have 1000's of procedures, it becomes very useful to have more > than one level of namespacing. This is not an exaggeration; one project I > looked at who decided not to convert from Oracle to PostgreSQL had over > 100,000 procedures and functions. Lack of packages was their main reason > for not switching. Schemas provide only *one* level of namespacing, unless > we want to "improve" on the SQL standard and allow nested schemas. > > b) Schemas do not provide us with any way of limiting the scope of functions > and persistent variables. With packages, you would want: > 1. functions which can only be called internally to the package > 2. variables which are only visible inside the package > 3. functions which can only be called as part of the package (thus utilizing > the initialization and internal variables) and not on their own. > What Josh describes here are excellent features but IMHO, the Oracle PACKAGE concept is an abomination that should have been left out. The reason I say this is that Oracle also provide the ability to create user defined types that have methods. Both instance and static methods can be created. In Oracle you can use the syntax: <schema>.<package>.<function>() but you can just as well use the syntax: <schema>.<type>.<static method>() Why do you need both? If PostgreSQL is going to add new nice features that enables better namespace handling and global variables, take a look at Oracles UDT's with static and instance methods. Only thing that I'm not sure is there is static variables. If it's missing, we could add that easilly and give them the same life-span as the session. A UDT can be exchanged seamlessly across PL's so it would become a really elegant solution for session variables. Regards, Thomas Hallgren
Thomas Hallgren <thhal@mailblocks.com> writes: > In Oracle you can use the syntax: > <schema>.<package>.<function>() > but you can just as well use the syntax: > <schema>.<type>.<static method>() Hmm. I think there is also something pretty close to that in SQL2003. It would be a lot easier to talk us into accepting something that's in the spec than something that isn't. regards, tom lane
David, I agree with your idea. That seems like it would be somewhat easy to implement and would do exactly what I would need. Anyone else have ideas or thoughts along this line? David Fetter wrote: >On Mon, May 09, 2005 at 11:44:23AM -0600, Jonah H. Harris wrote: > > >>Hey everyone, >> >>In addition to package support in plpgsql, it would be really handy >>to have inline plpgsql. Likewise, I think there are others who feel >>this way as-well. >> >> > >Why yes, there are. :) > > > >>Years ago, Oracle merged PL/SQL with their normal SQL parser which >>allowed for inline PL/SQL. They did this because it was difficult >>to maintain two separate parsers. While this worked great for >>Oracle, it probably wouldn't really work as well for PostgreSQL >>because pgsql supports multiple procedural languages. >> >> > >I proposed a syntax for this awhile back. I haven't found it in the >archives, but it goes like this: > >EXECUTE IMMEDIATE $$ > function body here >$$ >LANGUAGE plfoo; > >Similarly, a CREATE TEMPORARY FUNCTION could be quite handy. > >Cheers, >D > >
On Mon, May 09, 2005 at 05:28:42PM -0600, Jonah H. Harris wrote: > David, > > I agree with your idea. That seems like it would be somewhat easy to > implement and would do exactly what I would need. Anyone else have > ideas or thoughts along this line? Seeing as EXECUTE IMMEDIATE is already used, so how about PERFORM IMMEDIATE? I also like the idea of functions whose scope is settable. Something like this: CREATE [OR REPLACE] [ TRANSACTION | SESSION ] FUNCTION ... Cheers, D > > David Fetter wrote: > > >On Mon, May 09, 2005 at 11:44:23AM -0600, Jonah H. Harris wrote: > > > > > >>Hey everyone, > >> > >>In addition to package support in plpgsql, it would be really handy > >>to have inline plpgsql. Likewise, I think there are others who feel > >>this way as-well. > >> > >> > > > >Why yes, there are. :) > > > > > > > >>Years ago, Oracle merged PL/SQL with their normal SQL parser which > >>allowed for inline PL/SQL. They did this because it was difficult > >>to maintain two separate parsers. While this worked great for > >>Oracle, it probably wouldn't really work as well for PostgreSQL > >>because pgsql supports multiple procedural languages. > >> > >> > > > >I proposed a syntax for this awhile back. I haven't found it in the > >archives, but it goes like this: > > > >EXECUTE IMMEDIATE $$ > > function body here > >$$ > >LANGUAGE plfoo; > > > >Similarly, a CREATE TEMPORARY FUNCTION could be quite handy. > > > >Cheers, > >D > > > > > > > ---------------------------(end of broadcast)--------------------------- > TIP 8: explain analyze is your friend -- David Fetter david@fetter.org http://fetter.org/ phone: +1 510 893 6100 mobile: +1 415 235 3778 Remember to vote!
David Fetter wrote: > EXECUTE IMMEDIATE $$ > function body here > $$ > LANGUAGE plfoo; Seems like a lot of unnecessary syntax for something that would be manually used by a lot of DBAs. Also, this is unrelated to normal EXECUTE, or the EXECUTE IMMEDIATE defined by the standard, so I'm not sure it's a good idea to use similar syntax. BTW, this is a little off-the-wall, but one interesting idea to help SQL and PL/foo integration would be to replace the bison grammar for SQL with a hand-written recursive descent parser. If written carefully, this would allow other procedural languages to "call into" the SQL parser from their own parsers when appropriate, and integrate the resulting parse nodes into their own parse tree. PL/PgSQL in HEAD does something a bit similar for syntax checking, but it's pretty ugly (we need to manually invoke raw_parser(), and then we throw away the result). -Neil
Josh Berkus wrote: > Tom, > > > This is exactly the sort of argumentation that got the last proposal > > shot down ;-). I see no reason that you can't do the namespacing and > > security as well or better using the existing (and more standard) schema > > feature. If there's something there that's not covered, what is it? > > a) When you have 1000's of procedures, it becomes very useful to have more > than one level of namespacing. This is not an exaggeration; one project I > looked at who decided not to convert from Oracle to PostgreSQL had over > 100,000 procedures and functions. Lack of packages was their main reason > for not switching. Schemas provide only *one* level of namespacing, unless > we want to "improve" on the SQL standard and allow nested schemas. > > b) Schemas do not provide us with any way of limiting the scope of functions > and persistent variables. With packages, you would want: > 1. functions which can only be called internally to the package > 2. variables which are only visible inside the package > 3. functions which can only be called as part of the package (thus utilizing > the initialization and internal variables) and not on their own. What if we defined functions to look in their own schemas for functions they call, then use the search_path, rather than using the search path first? -- Bruce Momjian | http://candle.pha.pa.us pgman@candle.pha.pa.us | (610) 359-1001+ If your life is a hard drive, | 13 Roberts Road + Christ can be your backup. | Newtown Square, Pennsylvania19073
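As a sketch of the difference (reusing the hypothetical dataloader names from the earlier
sketch in this thread): today a package-style function has to qualify its internal calls
explicitly, which is what the proposed own-schema-first lookup would make implicit.

CREATE FUNCTION dataloader.copy_all() RETURNS void AS $$
BEGIN
    -- explicit qualification: works regardless of the caller's search_path
    IF NOT dataloader.check_access() THEN
        RAISE EXCEPTION 'access denied';
    END IF;
    PERFORM dataloader.copy_it('/tmp/somefile');
    -- with own-schema-first resolution, plain check_access() / copy_it()
    -- would find the siblings in the same schema automatically
END;
$$ LANGUAGE plpgsql;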
Bruce, > > b) Schemas do not provide us with any way of limiting the scope of > > functions and persistent variables. With packages, you would want: > > 1. functions which can only be called internally to the package > > 2. variables which are only visible inside the package > > 3. functions which can only be called as part of the package (thus > > utilizing the initialization and internal variables) and not on their > > own. > > What if we defined functions to look in their own schemas for functions > they call, then use the search_path, rather than using the search path > first? That really doesn't address the desired functionality. For example, I could have a package whose initialization function involves some security checks, and then the package's "methods" (internal functions) would access the variables set by the security check function ... but those variables would NOT be available to the user or modifiable by them. I know the need for this is probably hypothetical to a lot of -hackers, but it's pretty common programming in the Oracle PL/SQL world. Of course, if there's something in SQL2003 that supports this, it would be really keen to know it ... -- --Josh Josh Berkus Aglio Database Solutions San Francisco
Josh Berkus wrote: > Bruce, > > > > b) Schemas do not provide us with any way of limiting the scope of > > > functions and persistent variables. With packages, you would want: > > > 1. functions which can only be called internally to the package > > > 2. variables which are only visible inside the package > > > 3. functions which can only be called as part of the package (thus > > > utilizing the initialization and internal variables) and not on their > > > own. > > > > What if we defined functions to look in their own schemas for functions > > they call, then use the search_path, rather than using the search path > > first? > > That really doesn't address the desired functionality. For example, I could > have a package whose initialization function involves some security checks, > and then the package's "methods" (internal functions) would access the > variables set by the security check function ... but those variables would > NOT be available to the user or modifiable by them. > > I know the need for this is probably hypothetical to a lot of -hackers, but > it's pretty common programming in the Oracle PL/SQL world. > > Of course, if there's something in SQL2003 that supports this, it would be > really keen to know it ... Agreed, but saying we are going to just go out and implement everything Oracle packages have just because they have them isn't likely to happen for PostgreSQL. We need a list of things that need to be added, and how our existing functionality will be modified to make them available. Just saying "we need Oracle packages" doesn't make it happen. I have followed the discussion and I still don't have a clear idea of the exact additions that people want, and without that, nothing is likely to happen. I don't even have something for the TODO list at this point. -- Bruce Momjian | http://candle.pha.pa.us pgman@candle.pha.pa.us | (610) 359-1001+ If your life is a hard drive, | 13 Roberts Road + Christ can be your backup. | Newtown Square, Pennsylvania19073
B- > Just saying "we need Oracle packages" doesn't make it happen. I have > followed the discussion and I still don't have a clear idea of the exact > additions that people want, and without that, nothing is likely to > happen. I don't even have something for the TODO list at this point. That's what I'm trying to help define. I think that private variables and private functions need to be part of the definition. -- --Josh Josh Berkus Aglio Database Solutions San Francisco
Josh Berkus wrote:
> B-
>
> > Just saying "we need Oracle packages" doesn't make it happen. I have
> > followed the discussion and I still don't have a clear idea of the exact
> > additions that people want, and without that, nothing is likely to
> > happen. I don't even have something for the TODO list at this point.
>
> That's what I'm trying to help define.
>
> I think that private variables and private functions need to be part of the
> definition.

OK, so it seems we need:

    C static/private functions for schemas
    C static/private variables for schemas

    Are private variables implemented via the temporary per-session schema?

    nested schemas

What does the standard say?

Is that it?

--
Bruce Momjian | http://candle.pha.pa.us
pgman@candle.pha.pa.us | (610) 359-1001
+ If your life is a hard drive, | 13 Roberts Road
+ Christ can be your backup. | Newtown Square, Pennsylvania 19073
Neil Conway <neilc@samurai.com> writes: > BTW, this is a little off-the-wall, but one interesting idea to help SQL > and PL/foo integration would be to replace the bison grammar for SQL > with a hand-written recursive descent parser. Ick. I gave up hand-written RD parsers twenty-five years ago. They are tedious to write, error-prone (due to the utter lack of any checking that what you wrote embodies the syntax you meant), less flexible than LALR(1), and generally have few redeeming social advantages. Which is not to say that plpgsql in its current implementation isn't mighty ugly too. I'd dearly love to find a better solution ... but "throw away the parser generator" isn't a better solution. regards, tom lane
Bruce, > OK, so it seems we need: > > C static/private functions for schemas > C static/private variables for schemas > > Are private variables implemented via the temporary per-session schema? > > nested schemas > > What does the standard say? > > Is that it? Hmmm. That's an interesting approach. I, personally, would buy that. -- Josh Berkus Aglio Database Solutions San Francisco
Bruce Momjian wrote:
> Josh Berkus wrote:
>
>> I think that private variables and private functions need to be part of the
>> definition.
>
> OK, so it seems we need:
>
> C static/private functions for schemas
> C static/private variables for schemas
>
> Are private variables implemented via the temporary per-session schema?
>
> nested schemas
>
> What does the standard say?
>
The standard says that rather than using nested schemas or packages in
conjunction with functions in order to group functions with data, use
user defined types with attributes and methods. Methods can be STATIC,
INSTANCE or CONSTRUCTOR.

AFAICS there's nothing in the standard that alters the visibility of methods
and attributes, so "private" is not covered (and I don't recall that Oracle
has that kind of visibility control either). Normal access restrictions apply
of course.

I can't find any mention of schema variables. I think all life-cycle
management of data is reduced to table storage. And why not? A temporary
table can be viewed as session data, right?

Using a KISS approach, the easiest thing to do that would also bring us
closer to the standard is to extend the notion of user defined types to
include methods, and conclude that storing session data in other ways than
using temporary tables should be PL specific.

Regards,
Thomas Hallgren
---------- Forwarded message ---------- From: Adrian Maier <adrian.maier@gmail.com> Date: May 10, 2005 12:01 PM Subject: Re: [HACKERS] Oracle Style packages on postgres To: "Jim C. Nasby" <decibel@decibel.org> On 5/9/05, Jim C. Nasby <decibel@decibel.org> wrote: > On Sun, May 08, 2005 at 10:38:41PM -0500, Bob wrote: > > One simple benefit to packages is just organization of related code. > > Which, IMHO, is greatly diminished by the lack of > schema.package.function notation. BTW, the original post referred to > this as user.package.function, but I believe that technically it's > actually schema.package.function (Oracle tends to mix schemas and > users). In any case, schema.package.function is what would make sense in > PostgreSQL. > > Personally, I think the biggest win here would be adding package support > and syntax to plpgsql. Not only would it make porting from Oracle > easier, it would also make plpgsql much, much more powerful. Hello, What do you think about having some kind of language-independent packages ? I'm thinking that it could be handy to implement some functions in plpgsql, some functions in plpython and so . And then bundle them together into the same package. Cheers, Adrian Maier
Thomas Hallgren wrote: > Bruce Momjian wrote: > > Josh Berkus wrote: > > > >>I think that private variables and private functions need to be part of the > >>definition. > > > > > > OK, so it seems we need: > > > > C static/private functions for schemas > > C static/private variables for schemas > > > > Are private variables implemented via the temporary per-session schema? > > > > nested schemas > > > > What does the standard say? > > > The standard says that rather then using nested schemas or packages in > conjunction with functions in order to group functions with data, use > user defined types with attributes and methods. Methods can be STATIC, > INSTANCE or CONSTRUCTOR. So it sounds like you group the functions into user-defined types, rather than nested schemas. So you assocate functions with a table? > AFAICS there's nothing in the standard that alters the visibility of > methods and attributes, so "private" is not covered (and don't recall > that Oracle has that kind of visibility control either). Normal access > restrictions apply of course. > > I can't find any mention of schema variables. I think all life-cycle > management of data is reduced to table storage. And why not? A temporary > table can be viewed as session data right? > > Using a KISS approach, the easiest thing to do that also would bring us > closer to the standard, is to extend the notion of user defined types to > include methods and conclude that storing session data in other ways > than using temporary tables should be PL specific. I suppose. I think we should focus on the use cases for Oracle packages, rather than the specific functionality it provides. What things do people need PostgreSQL to do that it already doesn't do? -- Bruce Momjian | http://candle.pha.pa.us pgman@candle.pha.pa.us | (610) 359-1001+ If your life is a hard drive, | 13 Roberts Road + Christ can be your backup. | Newtown Square, Pennsylvania19073
Bruce Momjian wrote: >Thomas Hallgren wrote: > > >>Bruce Momjian wrote: >> >> >>>Josh Berkus wrote: >>> >>> >>> >>>>I think that private variables and private functions need to be part of the >>>>definition. >>>> >>>> >>>OK, so it seems we need: >>> >>> C static/private functions for schemas >>> C static/private variables for schemas >>> >>>Are private variables implemented via the temporary per-session schema? >>> >>> nested schemas >>> >>>What does the standard say? >>> >>> >>> >>The standard says that rather then using nested schemas or packages in >>conjunction with functions in order to group functions with data, use >>user defined types with attributes and methods. Methods can be STATIC, >>INSTANCE or CONSTRUCTOR. >> >> > >So it sounds like you group the functions into user-defined types, >rather than nested schemas. > Yes, you'd get <schema>.<type>.<method> as the fully qualified name. > So you assocate functions with a table? > > Not necessarily a table. A type is just a type. A type containing just static methods is not very different from an Oracle package. A table created from a type may of course have methods associated with it. That gets really interesting when you use INSTANCE methods. They act on a per row basis so that you can do things like: SELECT x.someMethod() FROM someTable x; rather than as today. SELECT someFunction(x) FROM someTable x; Regards, Thomas Hallgren
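Something close to the INSTANCE-method call already works, since a function over a table's row type can be invoked with attribute notation. A small sketch with invented names:

CREATE TABLE rectangle (w numeric, h numeric);
INSERT INTO rectangle VALUES (2, 3);
INSERT INTO rectangle VALUES (4, 5);

-- A function whose single argument is the table's row type ...
CREATE FUNCTION area(rectangle) RETURNS numeric AS $$
    SELECT $1.w * $1.h;
$$ LANGUAGE sql;

-- ... can be called either way:
SELECT area(x) FROM rectangle x;   -- functional notation
SELECT x.area  FROM rectangle x;   -- attribute ("method-like") notation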
Bruce Momjian wrote: >OK, so it seems we need: > > C static/private functions for schemas > C static/private variables for schemas > >Are private variables implemented via the temporary per-session schema? > > nested schemas > >What does the standard say? > >Is that it? > > > Yeah, that would be great. And don't forget global variables for pl/pgsql. Daniel
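One stop-gap for session variables that exists today is custom configuration settings. A minimal sketch (the myapp prefix is invented, and older releases require custom_variable_classes to be configured for it):

-- Store a session-scoped value; 'false' makes it last until session end
-- rather than only to the end of the current transaction.
SELECT set_config('myapp.batch_id', '42', false);

-- Read it back anywhere later in the same session.
SELECT current_setting('myapp.batch_id');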
> -----Original Message----- > From: Bruce Momjian [mailto:pgman@candle.pha.pa.us] > Sent: Tuesday, May 10, 2005 8:43 AM > To: Thomas Hallgren > Cc: Tom Lane; rmm@sqlisor.com; pgsql-hackers@postgresql.org > Subject: Re: [HACKERS] Oracle Style packages on postgres > > [...] > I suppose. I think we should focus on the use cases for Oracle > packages, rather than the specific functionality it provides. > What things do people need PostgreSQL to do that it already > doesn't do? Is that really the best way to go about things? Already RDBMSes are patchwork quilts of functionality. Is merely adding another patch the most elegant way to evolve the database? The problem is that Oracle et al are trying to be ORDBMSes and aren't exactly sure what the best way to go is. Instead of trying to formulate a rational plan for what an ORDBMS should even look like, they simply look at what would work with their existing infrastructure and tack on features. Then Postgres plays the copycat game. Instead of trying to play catch-up with Oracle, why not beat them at their own game? What packages provide is encapsulation. Hiding the data from the user and forcing him/her to use the public interface (methods). That is an important and admirable OO feature. Some people think that using the DB's security model can achieve the same thing. It can't, exactly, but there's an important lesson to be learned from the suggestion. The problem is that OOP is a *programming* paradigm, and a database is not a *programming language*. In a programming language, there really is no such thing as "security". There is only "visibility" and "accessibility". Private methods in an OOP language do not provide *security*; they only limit *accessibility*. Like so many other differences between the relational model and the OOP model, there is an impedance mismatch here. However, there is also opportunity. In an OOPL, you can say: "Users can call this method from here, but not from there." What you *can't* say is: "User X can call this method, but User Y cannot." As you can see, these are orthogonal concepts. You could call the first "accessibility by location" and the second "accessibility by authentication". An ORDBMS should support both. "Private" does not respect your identity, only your calling location. An ACL does not respect your calling scope, only your identity. A system that has both is clearly more flexible than one that only has one or the other. Now what you need to keep in mind is that each visibility model serves a different purpose. The purpose of a security model is to limit *who* can see/touch certain data because the data has intrinsic value. The purpose of an accessibility model is to limit *where* and *how* data can be seen/touched in order to preserve *program invariants*. So if you have an object (or tuple!) that records the start and stop time of some process, it is probably a logical invariant that the stop time is greater than or equal to the start time. For this reason, in a PL, you would encapsulate these fields (attributes) and only provide controlled access to update them that checks and preserves the invariant, *no matter who you are*. You don't want a superuser violating this invariant any more than Sue User. Now you might object that constraints allow you to preserve invariants as well, and indeed they do. But constraints do not respect calling scope. Suppose there is a process that needs to update the timestamps in a way that temporarily breaks the invariant but restores it afterwards. 
The only way to effect this in a constraint environment is to drop the constraint, perform the operation, and restore it. However, dropping a constraint is not an ideal solution because there may be other unprivileged processes operating on the relation that still need the constraint to be enforced. There is no way to say: "There is a priviledged class of methods that is allowed to violate this constraint because they are trusted to restore it upon completion." Note that this is different from saying "There is a priviledged class of users that is allowed to violate this constraint." If you try to do something like give read-only access to everybody and only write access to one user and define that user to be the owner of the methods that update the data, you have to follow the convention that that user only operates through the defined interface, and doesn't hack the data directly. That's because user-level accessibility is not the same as scope- level accessibility. Whereas, if you define something like a package, and say: "Package X is allowed full and complete access to relation Y", and stick the interface methods in X, you still have all the user-level security you want while preserving the invariants in the most elegant way. So you can think of a package as a scope in a programming language. It's like a user, but it is not a user. A user has privileges that cut across scopes. Now, whether packages should be different from schemas is a whole different ballgame. The purpose of a schema in Postgres is not entirely clear to me. There's lots of different ways to use schemas, and there is no obvious best way to use them. In order to implement the accessibility features of packages, schemas would have to be changed considerably. Probably a lot of users would be unhappy if schemas were changed in that way. My guess is that this would not be a good idea. I think we can get some guidance from PLs. C++ is what you call a "multi-paradigm language". You can do everything from assembly to metaprogramming in C++. As such, it is very loose and open in some respects. C++ has two kinds of scopes: it has classes and namespaces. Members of a class are encapsulated and support data hiding. Members of a namespace are only loosely grouped and do not support data hiding explicitly. Namespaces exist primarily to avoid name collisions. Java, on the other hand, decided that for OOP purity, everything must be a class. That would be like making schemas into packages and imposing accessibility rules on them. At the end of the day, I think many PL design experts agree that making everything a class is not necessarily the best way to go. So schemas can be like C++ namespaces - they provide a means to loosely group related objects and help avoid name collisions. So the package could be like a class - they provide OOP-like encapsulation via accessibility rules. However, that doesn't mean that nested schemas wouldn't also be a good thing. In C++, nested namespaces are extremely useful when one layer of scoping does not sufficiently partition the namespace to avoid frequent name collisions. I think the same is true of Postgres. I certainly would like to be able to use nested schema names in several contexts. Instead, I have to make a choice between making different schemas, or making different name prefixes. I wouldn't even mind if nested schemas were only allowed to contain schemas except at the leaves of the tree. Another feature that is very useful is the "using clause". 
Combined with nested namespaces, this is a very powerful way to give programmers/dbas control over names. You can give everything the most natural name, and just put it in the appropriate namespace, and use the namespace that is relevant to the given task at hand. So consider this example: Tables: etl.import.record etl.export.record As you can imagine, I don't really want to make an 'import' and 'export' schema at the top level. There's several tables in each schema, but that should illustrate the point. Then, when constructing queries, it would be nice to be able to do this: USING etl.import ; SELECT * FROM record JOIN header ON ... JOIN file ON ... ; The effect of a USING clause would be to import the schema names into the public namespace for the duration of the transaction. If that leads to ambiguous names, then the parser/planner should emit an error. __ David B. Held Software Engineer/Array Services Group 200 14th Ave. East, Sartell, MN 56377 320.534.3637 320.253.7800 800.752.8129
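The closest thing available today to that USING clause is probably search_path, with flat schema names standing in for the nested ones. A sketch, where etl_import and the join columns are invented stand-ins for the example above:

BEGIN;
SET LOCAL search_path TO etl_import, public;   -- reverts automatically at COMMIT/ROLLBACK

SELECT *
  FROM record
  JOIN header ON header.record_id = record.id
  JOIN file   ON file.id = header.file_id;

COMMIT;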
On Tue, May 10, 2005 at 12:01:54PM +0300, Adrian Maier wrote: > > Personally, I think the biggest win here would be adding package support > > and syntax to plpgsql. Not only would it make porting from Oracle > > easier, it would also make plpgsql much, much more powerful. > > Hello, > > What do you think about having some kind of language-independent > packages? > I'm thinking that it could be handy to implement some functions in > plpgsql, some functions in plpython and so on. And then bundle them > together into the same package. Personally, I basically only use plpgsql, but I can certainly see where there would be value in being able to include functions and procedures from multiple languages in one package. But I suspect this will also make some things more difficult, such as global static variables. -- Jim C. Nasby, Database Consultant decibel@decibel.org Give your computer some brain candy! www.distributed.net Team #1828 Windows: "Where do you want to go today?" Linux: "Where do you want to go tomorrow?" FreeBSD: "Are you guys coming, or what?"
On Mon, May 09, 2005 at 11:24:45PM +0200, Thomas Hallgren wrote: > In Oracle you can use the syntax: > > <schema>.<package>.<function>() > > but you can just as well use the syntax: > > <schema>.<type>.<static method>() > > Why do you need both? If PostgreSQL is going to add new nice features > that enables better namespace handling and global variables, take a look > at Oracles UDT's with static and instance methods. Only thing that I'm > not sure is there is static variables. If it's missing, we could add > that easilly and give them the same life-span as the session. It's been a while since I used types, but here's some issues I can think of: I don't believe types allow for internal-only methods. I seem to recall other limitations on what types could do as opposed to packages. Of course, we need not restrict ourselves in such a manner. Types are not used nearly as much as packages (this is an issue if we care about enabling Oracle users to migrate). Types generally force you to use them in relation to some database object. Packages have no such restriction. Don't get me wrong, I think supporting more powerful types would be a welcome addition, but I don't think they can be as flexable as packages. The good news is that they should both be able to use the same underlying framework. Types are afterall just a specialized implementation of packages. -- Jim C. Nasby, Database Consultant decibel@decibel.org Give your computer some brain candy! www.distributed.net Team #1828 Windows: "Where do you want to go today?" Linux: "Where do you want to go tomorrow?" FreeBSD: "Are you guys coming, or what?"
On Mon, May 09, 2005 at 09:56:53PM -0400, Bruce Momjian wrote: > OK, so it seems we need: > > C static/private functions for schemas > C static/private variables for schemas > > Are private variables implemented via the temporary per-session schema? > > nested schemas > > What does the standard say? > > Is that it? I think a big part of the usefulness of packages is in supplying an additional level of grouping common things together. Of course, nested schemas with public/private functions (and procedures, lest we forget them) is a much better way to do this, since a schema can encompass everything you'd need; tables, views, types, etc. Having said that, I would say that private variables need to be exposed via the same nested schema interface as everything else. If the implementation under the covers is via the temporary schema, that's fine. As for using temporary tables as session storage, that has a huge performance penalty associated with it. Part of the advantage to package variables is that you can use them to cache information your code will need to access frequently. That access then becomes a simple variable or array read, which is obviously much faster than parsing a query to hit a temp table. There is one feature not mentioned by Bruce's design, and that's initialization (and teardown) code. I don't recall using that capability in Oracle, but I was wondering if others with more experience could comment on it. -- Jim C. Nasby, Database Consultant decibel@decibel.org Give your computer some brain candy! www.distributed.net Team #1828 Windows: "Where do you want to go today?" Linux: "Where do you want to go tomorrow?" FreeBSD: "Are you guys coming, or what?"
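On the initialization point: the usual workaround today is lazy initialization on first call, caching the expensive lookup in a session-private temp table. A sketch assuming an invented tax_rates lookup table:

CREATE OR REPLACE FUNCTION get_tax_rate(p_region text) RETURNS numeric AS $$
DECLARE
    v_rate numeric;
BEGIN
    SELECT rate INTO v_rate FROM session_tax_cache WHERE region = p_region;
    RETURN v_rate;
EXCEPTION WHEN undefined_table THEN
    -- "initialization code": runs only on the first call in each session
    CREATE TEMP TABLE session_tax_cache AS
        SELECT region, rate FROM tax_rates;    -- the costly query, done once
    SELECT rate INTO v_rate FROM session_tax_cache WHERE region = p_region;
    RETURN v_rate;
END;
$$ LANGUAGE plpgsql;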
Jim C. Nasby wrote: >I don't believe types allow for internal-only methods. I seem to recall >other limitations on what types could do as opposed to packages. Of >course, we need not restrict ourselves in such a manner. > > Do Oracle packages support internal only functions? If they do, then I agree, that's a feature that the SQL standard doesn't have. >Types are not used nearly as much as packages (this is an issue if we >care about enabling Oracle users to migrate). > >Types generally force you to use them in relation to some database >object. Packages have no such restriction. > > If used as a package, i.e. only containing static methods, you don't need to use the type in relation to anything. It's simply a namespace. If used with a temporary table, you get a very neat, standardized, cross-language way of managing session data. >Don't get me wrong, I think supporting more powerful types would be a >welcome addition, but I don't think they can be as flexable as packages. > > I see this differently. A full implementation of the SQL-standard for UDT's will allow abstract types, inheritance, method overloading, etc. It quickly becomes far more flexible then Oracle packages. A full implementation is of course beyond the scope for what's needed to accommodate the needs of those who use packages but a simple implementation is extendable within the scope of the standard. >The good news is that they should both be able to use the same >underlying framework. Types are afterall just a specialized >implementation of packages. > > Right. Given a good implementation of types, packages would be easy to implement. The other way around would not be possible. A package is a very restricted type that contains static methods only. Possibly with the extension of some kind of method/attribute visibility. So do we need "internal only" functions although they are not covered by the SQL-standard? If the answer is no, then IMO we should follow the standard and use types, not packages. If the answer is yes, then the SQL-standard is not enough. Should we then use packages or simply introduce the keyword PRIVATE on methods of a type? Personally, I'd go for the latter and then, if necessary, build packages on top of that in for the benefit of Oracle users who wants to migrate. A fully fledged type system will ease Oracle migration too since Oracle already has this. Regards, Thomas Hallgren
On Tue, May 10, 2005 at 08:40:16PM +0200, Thomas Hallgren wrote: > Jim C. Nasby wrote: > > >I don't believe types allow for internal-only methods. I seem to recall > >other limitations on what types could do as opposed to packages. Of > >course, we need not restrict ourselves in such a manner. > > > > > Do Oracle packages support internal only functions? If they do, then I > agree, that's a feature that the SQL standard doesn't have. AFAIK anything in an Oracle packages can be internal-only. You just exclude it from the header. > I see this differently. A full implementation of the SQL-standard for > UDT's will allow abstract types, inheritance, method overloading, etc. > It quickly becomes far more flexible then Oracle packages. A full > implementation is of course beyond the scope for what's needed to > accommodate the needs of those who use packages but a simple > implementation is extendable within the scope of the standard. Packages allow for abstract types and method overloading. They don't allow for inheritance. > >The good news is that they should both be able to use the same > >underlying framework. Types are afterall just a specialized > >implementation of packages. > > > > > Right. Given a good implementation of types, packages would be easy to > implement. The other way around would not be possible. A package is a > very restricted type that contains static methods only. Possibly with > the extension of some kind of method/attribute visibility. I guess maybe I'm not clear on what you mean by static methods. IIRC, in Oracle nomenclature, static means it will retain state between invocations in the same session. Of course, functions and procedures that don't do this are also allowed. Basically, before we assume that one implementation allows for the other I think some research needs to be done. Hopefully someone on the list is familiar with both. I think it would be a huge win if we could offer a compatability mechanism that makes it easy for Oracle packages to be used in PostgreSQL, making migration from Oracle much, much easier. > So do we need "internal only" functions although they are not covered by > the SQL-standard? If the answer is no, then IMO we should follow the > standard and use types, not packages. If the answer is yes, then the > SQL-standard is not enough. Should we then use packages or simply > introduce the keyword PRIVATE on methods of a type? Personally, I'd go > for the latter and then, if necessary, build packages on top of that in > for the benefit of Oracle users who wants to migrate. A fully fledged > type system will ease Oracle migration too since Oracle already has this. I think both should allow for private functions/procedures/methods. BTW, I'm also very keen on the idea of nested schemas, which is another possible means to the package ends. -- Jim C. Nasby, Database Consultant decibel@decibel.org Give your computer some brain candy! www.distributed.net Team #1828 Windows: "Where do you want to go today?" Linux: "Where do you want to go tomorrow?" FreeBSD: "Are you guys coming, or what?"
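Until something package-private exists, the nearest approximation is privilege-based rather than caller-location-based. A sketch with invented names:

CREATE SCHEMA acct;

-- An "internal" helper ...
CREATE FUNCTION acct.round_money(numeric) RETURNS numeric AS $$
    SELECT round($1, 2);
$$ LANGUAGE sql;

-- ... hidden from ordinary users,
REVOKE EXECUTE ON FUNCTION acct.round_money(numeric) FROM PUBLIC;

-- while the public entry point runs with the owner's rights
-- and so can still reach the helper.
CREATE FUNCTION acct.invoice_total(p_net numeric, p_vat numeric) RETURNS numeric AS $$
    SELECT acct.round_money($1 * (1 + $2));
$$ LANGUAGE sql SECURITY DEFINER;

As noted earlier in the thread, this controls who may call the helper, not where it may be called from, so it is only a partial substitute for a true private function.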
Jim C. Nasby wrote: >I guess maybe I'm not clear on what you mean by static methods. IIRC, in >Oracle nomenclature, static means it will retain state between >invocations in the same session. Of course, functions and procedures >that don't do this are also allowed. > > A STATIC prefix on a method simply means that it is not tied to a particular instance of the type where it is defined. You have the type Foo with the method bar(). If the method is STATIC, you can use: SELECT Foo.bar(); If it's an INSTANCE method, you can only call it when you have an instance available, so if FooTable is a table described by the type Foo and bar is non-static, you could write: SELECT x.bar() FROM FooTable x; >I think both should allow for private functions/procedures/methods. BTW, >I'm also very keen on the idea of nested schemas, which is another >possible means to the package ends. > > I'd like that too although I don't think it's included in the SQL-standard. Regards, Thomas Hallgren
Dave Held wrote: > > -----Original Message----- > > From: Bruce Momjian [mailto:pgman@candle.pha.pa.us] > > Sent: Tuesday, May 10, 2005 8:43 AM > > To: Thomas Hallgren > > Cc: Tom Lane; rmm@sqlisor.com; pgsql-hackers@postgresql.org > > Subject: Re: [HACKERS] Oracle Style packages on postgres > > > > [...] > > I suppose. I think we should focus on the use cases for Oracle > > packages, rather than the specific functionality it provides. > > What things do people need PostgreSQL to do that it already > > doesn't do? > > Is that really the best way to go about things? Already RDBMSes > are patchwork quilts of functionality. Is merely adding another > patch the most elegant way to evolve the database? The problem is > that Oracle et al are trying to be ORDBMSes and aren't exactly sure > what the best way to go is. Instead of trying to formulate a > rational plan for what an ORDBMS should even look like, they simply > look at what would work with their existing infrastructure and tack > on features. Then Postgres plays the copycat game. Instead of > trying to play catch-up with Oracle, why not beat them at their own > game? I was unclear. I was suggesting exactly what you posted, that we look at what functionality we _need_ from Oracle packages, rather than the functionality of Oracle packages themselves. My assumption is that Oracle does some things we need, and some things we don't, and does them in some ways we will like, and others we will not, so let's look at the actual use cases that we need to address. -- Bruce Momjian | http://candle.pha.pa.us pgman@candle.pha.pa.us | (610) 359-1001+ If your life is a hard drive, | 13 Roberts Road + Christ can be your backup. | Newtown Square, Pennsylvania 19073
OK, so it seems we need: o make private objects accessable only to objects in the same schema o Allow current_schema.objname to access current schema objects o session variables o nested schemas? --------------------------------------------------------------------------- Dave Held wrote: > > -----Original Message----- > > From: Bruce Momjian [mailto:pgman@candle.pha.pa.us] > > Sent: Tuesday, May 10, 2005 8:43 AM > > To: Thomas Hallgren > > Cc: Tom Lane; rmm@sqlisor.com; pgsql-hackers@postgresql.org > > Subject: Re: [HACKERS] Oracle Style packages on postgres > > > > [...] > > I suppose. I think we should focus on the use cases for Oracle > > packages, rather than the specific functionality it provides. > > What things do people need PostgreSQL to do that it already > > doesn't do? > > Is that really the best way to go about things? Already RDBMSes > are patchwork quilts of functionality. Is merely adding another > patch the most elegant way to evolve the database? The problem is > that Oracle et al are trying to be ORDBMSes and aren't exactly sure > what the best way to go is. Instead of trying to formulate a > rational plan for what an ORDBMS should even look like, they simply > look at what would work with their existing infrastructure and tack > on features. Then Postgres plays the copycat game. Instead of > trying to play catch-up with Oracle, why not beat them at their own > game? > > What packages provide is encapsulation. Hiding the data from the > user and forcing him/her to use the public interface (methods). > That is an important and admirable OO feature. Some people think > that using the DB's security model can achieve the same thing. It > can't, exactly, but there's an important lesson to be learned from > the suggestion. The problem is that OOP is a *programming* paradigm, > and a database is not a *programming language*. In a programming > language, there really is no such thing as "security". There is > only "visibility" and "accessibility". Private methods in an OOP > language do not provide *security*; they only limit *accessibility*. > Like so many other differences between the relational model and the > OOP model, there is an impedance mismatch here. However, there is > also opportunity. > > In an OOPL, you can say: "Users can call this method from here, but > not from there." What you *can't* say is: "User X can call this > method, but User Y cannot." As you can see, these are orthogonal > concepts. You could call the first "accessibility by location" and > the second "accessibility by authentication". An ORDBMS should > support both. "Private" does not respect your identity, only your > calling location. An ACL does not respect your calling scope, only > your identity. A system that has both is clearly more flexible than > one that only has one or the other. > > Now what you need to keep in mind is that each visibility model > serves a different purpose. The purpose of a security model is to > limit *who* can see/touch certain data because the data has intrinsic > value. The purpose of an accessibility model is to limit *where* and > *how* data can be seen/touched in order to preserve *program > invariants*. So if you have an object (or tuple!) that records the > start and stop time of some process, it is probably a logical > invariant that the stop time is greater than or equal to the start > time.
For this reason, in a PL, you would encapsulate these fields > (attributes) and only provide controlled access to update them that > checks and preserves the invariant, *no matter who you are*. You > don't want a superuser violating this invariant any more than Sue > User. > > Now you might object that constraints allow you to preserve > invariants as well, and indeed they do. But constraints do not > respect calling scope. Suppose there is a process that needs to > update the timestamps in a way that temporarily breaks the invariant > but restores it afterwards. The only way to effect this in a > constraint environment is to drop the constraint, perform the > operation, and restore it. However, dropping a constraint is not an > ideal solution because there may be other unprivileged processes > operating on the relation that still need the constraint to be > enforced. There is no way to say: "There is a priviledged class of > methods that is allowed to violate this constraint because they are > trusted to restore it upon completion." Note that this is different > from saying "There is a priviledged class of users that is allowed > to violate this constraint." If you try to do something like give > read-only access to everybody and only write access to one user and > define that user to be the owner of the methods that update the data, > you have to follow the convention that that user only operates > through the defined interface, and doesn't hack the data directly. > That's because user-level accessibility is not the same as scope- > level accessibility. Whereas, if you define something like a > package, and say: "Package X is allowed full and complete access > to relation Y", and stick the interface methods in X, you still have > all the user-level security you want while preserving the invariants > in the most elegant way. > > So you can think of a package as a scope in a programming language. > It's like a user, but it is not a user. A user has privileges that > cut across scopes. Now, whether packages should be different from > schemas is a whole different ballgame. The purpose of a schema in > Postgres is not entirely clear to me. There's lots of different ways > to use schemas, and there is no obvious best way to use them. In > order to implement the accessibility features of packages, schemas > would have to be changed considerably. Probably a lot of users would > be unhappy if schemas were changed in that way. My guess is that > this would not be a good idea. > > I think we can get some guidance from PLs. C++ is what you call a > "multi-paradigm language". You can do everything from assembly to > metaprogramming in C++. As such, it is very loose and open in some > respects. C++ has two kinds of scopes: it has classes and namespaces. > Members of a class are encapsulated and support data hiding. Members > of a namespace are only loosely grouped and do not support data hiding > explicitly. Namespaces exist primarily to avoid name collisions. > > Java, on the other hand, decided that for OOP purity, everything must > be a class. That would be like making schemas into packages and > imposing accessibility rules on them. At the end of the day, I think > many PL design experts agree that making everything a class is not > necessarily the best way to go. > > So schemas can be like C++ namespaces - they provide a means to > loosely group related objects and help avoid name collisions. So > the package could be like a class - they provide OOP-like > encapsulation via accessibility rules. 
However, that doesn't mean > that nested schemas wouldn't also be a good thing. In C++, nested > namespaces are extremely useful when one layer of scoping does not > sufficiently partition the namespace to avoid frequent name > collisions. I think the same is true of Postgres. I certainly would > like to be able to use nested schema names in several contexts. > Instead, I have to make a choice between making different schemas, > or making different name prefixes. I wouldn't even mind if nested > schemas were only allowed to contain schemas except at the leaves of > the tree. Another feature that is very useful is the "using clause". > Combined with nested namespaces, this is a very powerful way to give > programmers/dbas control over names. You can give everything the > most natural name, and just put it in the appropriate namespace, > and use the namespace that is relevant to the given task at hand. > > So consider this example: > > Tables: > etl.import.record > etl.export.record > > As you can imagine, I don't really want to make an 'import' and > 'export' schema at the top level. There's several tables in > each schema, but that should illustrate the point. Then, when > constructing queries, it would be nice to be able to do this: > > USING etl.import > ; > SELECT * > FROM record > JOIN header ON ... > JOIN file ON ... > ; > > The effect of a USING clause would be to import the schema names > into the public namespace for the duration of the transaction. If > that leads to ambiguous names, then the parser/planner should emit an > error. > > __ > David B. Held > Software Engineer/Array Services Group > 200 14th Ave. East, Sartell, MN 56377 > 320.534.3637 320.253.7800 800.752.8129 > > ---------------------------(end of broadcast)--------------------------- > TIP 1: subscribe and unsubscribe commands go to majordomo@postgresql.org > -- Bruce Momjian | http://candle.pha.pa.us pgman@candle.pha.pa.us | (610) 359-1001+ If your life is a hard drive, | 13 Roberts Road + Christ can be your backup. | Newtown Square, Pennsylvania19073
On Tue, May 10, 2005 at 06:55:39PM -0400, Bruce Momjian wrote: > > OK, so it seems we need: > > o make private objects accessable only to objects > in the same schema > o Allow current_schema.objname to access current > schema objects > o session variables > o nested schemas? Well, some kind of nestable namespace for objects, anyhow. I'll look over the SQL:2003 draft and see if I can find anything along that line in there. Cheers, D -- David Fetter david@fetter.org http://fetter.org/ phone: +1 510 893 6100 mobile: +1 415 235 3778 Remember to vote!
David Fetter wrote: > On Tue, May 10, 2005 at 06:55:39PM -0400, Bruce Momjian wrote: > > > > OK, so it seems we need: > > > > o make private objects accessable only to objects > > in the same schema > > o Allow current_schema.objname to access current > > schema objects > > o session variables > > o nested schemas? > > Well, some kind of nestable namespace for objects, anyhow. How would nested namespaces be different from nested schemas? I thought the two were the same. -- Bruce Momjian | http://candle.pha.pa.us pgman@candle.pha.pa.us | (610) 359-1001+ If your life is a hard drive, | 13 Roberts Road + Christ can be your backup. | Newtown Square, Pennsylvania19073
On Tue, May 10, 2005 at 09:49:13PM -0400, Bruce Momjian wrote: > David Fetter wrote: > > On Tue, May 10, 2005 at 06:55:39PM -0400, Bruce Momjian wrote: > > > > > > OK, so it seems we need: > > > > > > o make private objects accessable only to objects in the same > > > schema > > > o Allow current_schema.objname to access current > > > schema objects > > > o session variables > > > o nested schemas? > > > > Well, some kind of nestable namespace for objects, anyhow. > > How would nested namespaces be different from nested schemas? I > thought the two were the same. I was thinking of nested namespaces in the more limited sense of namespaces for bundles of functions/stored procedures rather than a full-on hierarchy where a table can have a schema which resides inside another schema which resides...unless people really want to have it that way. In a slightly related situation, at least in my mind, it seems like for full-on ORDBMS functionality, it should be possible to have a column of type schema or setof record, &c., and be able to take these things apart at each row. Cheers, D -- David Fetter david@fetter.org http://fetter.org/ phone: +1 510 893 6100 mobile: +1 415 235 3778 Remember to vote!
Bruce Momjian <pgman@candle.pha.pa.us> writes: > OK, so it seems we need: > o make private objects accessable only to objects > in the same schema > o Allow current_schema.objname to access current > schema objects > o session variables > o nested schemas? There's been a lot of handwaving about nested schemas in this thread, but no one has explained how they could actually *work* given the SQL syntax rules. In general, "a" is a column from the current table set, "a.b" is a column b in table/alias a from the current query, "a.b.c" is a column c from table b in schema a, "a.b.c.d" is a column d from table c in schema b in catalog a, and any more than that is a syntax error. I do not see how to add nested schemas without creating unworkable ambiguities, not to say outright violations of the spec. regards, tom lane
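To make those existing rules concrete (names invented; the four-part form is shown only as a comment because it depends on the current database's name):

CREATE SCHEMA a;
CREATE TABLE a.b (c int);
INSERT INTO a.b VALUES (1);

SELECT c     FROM a.b;    -- "a"     : column c from the current table set
SELECT b.c   FROM a.b;    -- "a.b"   : column c of table/alias b
SELECT a.b.c FROM a.b;    -- "a.b.c" : column c of table b in schema a
-- SELECT mydb.a.b.c FROM a.b;  -- "a.b.c.d": only accepted when mydb is the current database;
--                                 anything longer is a syntax error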
David Fetter wrote: > On Tue, May 10, 2005 at 09:49:13PM -0400, Bruce Momjian wrote: > > David Fetter wrote: > > > On Tue, May 10, 2005 at 06:55:39PM -0400, Bruce Momjian wrote: > > > > > > > > OK, so it seems we need: > > > > > > > > o make private objects accessable only to objects in the same > > > > schema > > > > o Allow current_schema.objname to access current > > > > schema objects > > > > o session variables > > > > o nested schemas? > > > > > > Well, some kind of nestable namespace for objects, anyhow. > > > > How would nested namespaces be different from nested schemas? I > > thought the two were the same. > > I was thinking of nested namespaces in the more limited sense of > namespaces for bundles of functions/stored procedures rather than a > full-on hierarchy where a table can have a schema which resides inside > another schema which resides...unless people really want to have it > that way. Oh, so allow only functions to sit in the sub-namespace? Yea, we could do that, but it seems sort of limiting. However, I am unclear how we would do sub-namespaces either. -- Bruce Momjian | http://candle.pha.pa.us pgman@candle.pha.pa.us | (610) 359-1001+ If your life is a hard drive, | 13 Roberts Road + Christ can be your backup. | Newtown Square, Pennsylvania19073
> -----Original Message----- > From: Tom Lane [mailto:tgl@sss.pgh.pa.us] > Sent: Tuesday, May 10, 2005 11:42 PM > To: Bruce Momjian > Cc: Dave Held; pgsql-hackers@postgresql.org > Subject: Re: [HACKERS] Oracle Style packages on postgres > > [...] > There's been a lot of handwaving about nested schemas in this thread, > but no one has explained how they could actually *work* given the SQL > syntax rules. In general, "a" is a column from the current table > set, "a.b" is a column b in table/alias a from the current query, > "a.b.c" is a column c from table b in schema a, "a.b.c.d" is a column > d from table c in schema b in catalog a, and any more than that is > a syntax error. I do not see how to add nested schemas > without creating unworkable ambiguities, not to say outright violations > of the spec. Clearly nested schemas would violate the SQL spec, as do the numerous missing features in Postgres. Obviously, they would have to be a sort of non-conforming extension. It's an opportunity for Postgres to take the lead and influence the next standard, I guess. Unless the community decides that it's not worth the hassle, which seems much more likely. I am curious to know what the "unworkable ambiguities" are. I propose that if there is any ambiguity at all, just fail the parse and leave it to the user to write something sensible. Otherwise, it's just a matter of defining a precise precedence for resolving name scopes, which doesn't seem very tricky at all. That is, if a.b is the name of a schema b nested within a schema a, then a.b.c.d refers to a column d of table c in schema b in schema a. If a is not the name of a schema, then check to see if it's the name of a database. If it is, then a.b.c.d has the meaning you define above. If it's not, then it's an error. The rule is simple: when the identifier has more than two parts, search for the first part among the schemas first, and then the catalogs. For the parts after the first and before the last two, just search the appropriate schemas. As far as I can tell, this syntax is completely backwards-compatible with existing SQL syntax. __ David B. Held Software Engineer/Array Services Group 200 14th Ave. East, Sartell, MN 56377 320.534.3637 320.253.7800 800.752.8129
"Dave Held" <dave.held@arraysg.com> writes: > The rule is simple: when the identifier has more than > two parts, search for the first part among the schemas first, and then > the catalogs. This doesn't actually work, because there is already ambiguity as to which level the first name is. See for instance the comments in transformColumnRef(). regards, tom lane
> -----Original Message----- > From: Tom Lane [mailto:tgl@sss.pgh.pa.us] > Sent: Wednesday, May 11, 2005 10:55 AM > To: Dave Held > Cc: pgsql-hackers@postgresql.org > Subject: Re: [HACKERS] Oracle Style packages on postgres > > > "Dave Held" <dave.held@arraysg.com> writes: > > The rule is simple: when the identifier has > > more than two parts, search for the first part among the schemas ^^^^^^^^^^^^^^^^^^^ > > first, and then the catalogs. > > This doesn't actually work, because there is already ambiguity as to > which level the first name is. See for instance the comments in > transformColumnRef(). I don't follow. switch (numnames) case 3 is unambiguous under either syntax. case 1 and 2 are unchanged under my proposed rules. It's really only case 4+ that is affected. And the change is as follows: if (numnames > MAX_SCHEMA_DEPTH + 3) { ereport(ERROR, (errcode(ERRCODE_SYNTAX_ERROR), errmsg("improper qualified name (too many dotted names): %s", NameListToString(cref->fields)))); return NULL; } switch (numnames) { case 1: ... case 2: ... case 3: ... default: { char* name[MAX_SCHEMA_DEPTH + 3]; char** i; char** end = name + numnames; char* colname = name + numnames - 1; for (i = name; i != end; ++i) { /* definition of lnth() should be easy enough to infer */ *i = strVal(lnth(cref->fields)); } /* * We check the catalog name and then ignore it. */ if (!isValidNamespace(name[0])) { if (strcmp(name[0], get_database_name(MyDatabaseId)) != 0) ereport(ERROR, (errcode(ERRCODE_FEATURE_NOT_SUPPORTED), errmsg("cross-database references are not implemented: %s", NameListToString(cref->fields)))); i = name + 1; numnames -= 3; } else { i = name; numnames -= 2; } /* * isValidNamespace() should work like LookupExplicitNamespace() * except that it should return false on failure instead of * raising an error */ /* Whole-row reference? */ if (strcmp(end[-1], "*") == 0) { node = transformWholeRowRef(pstate, i, numnames, end[-2]); break; } /* * Here I've changed the signature of transformWholeRowRef() to * accept a char** and an int for the schema names */ /* Try to identify as a twice-qualified column */ node = qualifiedNameToVar(pstate, i, numnames, end[-1], true); /* * And obviously we have to hack qualifiedNameToVar() similarly */ if (node == NULL) { /* Try it as a function call */ node = transformWholeRowRef(pstate, i, numnames, end[-2]); node = ParseFuncOrColumn(pstate, list_make1(makeString(end[-1])), list_make1(node), false, false, true); } break; } } What am I missing? __ David B. Held Software Engineer/Array Services Group 200 14th Ave. East, Sartell, MN 56377 320.534.3637 320.253.7800 800.752.8129
"Dave Held" <dave.held@arraysg.com> writes: > /* > * We check the catalog name and then ignore it. > */ > if (!isValidNamespace(name[0])) > { > if (strcmp(name[0], get_database_name(MyDatabaseId)) != 0) > ereport(ERROR, Which more or less proves my point: the syntax is fundamentally ambiguous. I suppose people would learn not to use schema names that match the database they are in, but that doesn't make it a good idea to have sensible behavior depend on non-overlap of those names. [ thinks for awhile ... ] OTOH, what if we pretended that two-level-nested schemas ARE catalogs in the sense that the SQL spec expects? Then we could get rid of the pro-forma special case here, which isn't ever likely to do anything more useful than throw an error anyway. Thus, we'd go back to the pre-7.3 notion that the current Postgres DB's name isn't part of the SQL naming scheme at all, and instead handle the spec's syntax requirements by setting up some conventions that make a schema act like what the spec says is a catalog. There are some nontrivial issues to be thought about here, like under what conditions "CREATE SCHEMA foo" ought to create a top-level schema versus creating a schema under some other schema that we are pretending is the active "catalog". But it seems on first glance like something could be worked out. regards, tom lane
> There are some nontrivial issues to be thought about here, like under > what conditions "CREATE SCHEMA foo" ought to create a top-level schema > versus creating a schema under some other schema that we are pretending > is the active "catalog". But it seems on first glance like something > could be worked out. Just go the extra info and call the top level catalogs in the commands as well: CREATE DATABASE mydb; \c mydb CREATE CATALOG foo; CREATE SCHEMA foo.bar CREATE TABLE foo.bar.baz (bif serial); --
On Wed, 2005-05-11 at 15:41 -0400, Rod Taylor wrote: > > There are some nontrivial issues to be thought about here, like under > > what conditions "CREATE SCHEMA foo" ought to create a top-level schema > > versus creating a schema under some other schema that we are pretending > > is the active "catalog". But it seems on first glance like something > > could be worked out. > > Just go the extra info and call the top level catalogs in the commands Extra inch, not info. > as well: > > CREATE DATABASE mydb; > \c mydb > > CREATE CATALOG foo; > CREATE SCHEMA foo.bar > CREATE TABLE foo.bar.baz (bif serial); --
Rod Taylor <pg@rbt.ca> writes: >> There are some nontrivial issues to be thought about here, like under >> what conditions "CREATE SCHEMA foo" ought to create a top-level schema >> versus creating a schema under some other schema that we are pretending >> is the active "catalog". But it seems on first glance like something >> could be worked out. > Just go the extra info and call the top level catalogs in the commands > as well: Nope, doesn't meet the spec requirements. One thing we can certainly say is that there would have to be a notion of an "active catalog" (which could be determined by outside-the-spec means, perhaps a GUC variable) because "CREATE SCHEMA foo" would have to create foo as a child of the active catalog. I'm also fairly unclear on what this implies for search_path searches. Currently, as soon as you have more than one dotted name, search_path is ignored ... but should it be used? Maybe "a.b" ought to be sought as "foo.a.b" for successive values of "foo" from the search path. regards, tom lane
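For reference, current behaviour, where search_path is consulted only for completely unqualified names (names invented):

CREATE SCHEMA app;
CREATE TABLE app.t (x int);

SET search_path TO app, public;

SELECT x FROM t;       -- unqualified: found via search_path
SELECT x FROM app.t;   -- one-level qualification: search_path is not consulted
-- A three-part relation name such as somedb.app.t is taken as database.schema.table;
-- there is no "search the path for a prefix" behaviour today.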
Tom Lane wrote: > Rod Taylor <pg@rbt.ca> writes: > >> There are some nontrivial issues to be thought about here, like under > >> what conditions "CREATE SCHEMA foo" ought to create a top-level schema > >> versus creating a schema under some other schema that we are pretending > >> is the active "catalog". But it seems on first glance like something > >> could be worked out. > > > Just go the extra info and call the top level catalogs in the commands > > as well: > > Nope, doesn't meet the spec requirements. One thing we can certainly > say is that there would have to be a notion of an "active catalog" > (which could be determined by outside-the-spec means, perhaps a GUC > variable) because "CREATE SCHEMA foo" would have to create foo as a > child of the active catalog. > > I'm also fairly unclear on what this implies for search_path searches. > Currently, as soon as you have more than one dotted name, search_path > is ignored ... but should it be used? Maybe "a.b" ought to be sought > as "foo.a.b" for successive values of "foo" from the search path. How is a catalog different from a schema? -- Bruce Momjian | http://candle.pha.pa.us pgman@candle.pha.pa.us | (610) 359-1001+ If your life is a hard drive, | 13 Roberts Road + Christ can be your backup. | Newtown Square, Pennsylvania19073
Bruce Momjian <pgman@candle.pha.pa.us> writes: > How is a catalog different from a schema? In the spec there's a hard-wired difference: catalogs contain schemas, schemas don't contain other schemas. The idea at hand here is to make our namespaces serve both purposes. (I knew there was a good reason not to use the word "schema" for namespaces ;-)) The spec behavior would be met by using exactly two levels of namespace, but there wouldn't be anything stopping people from using more, except that their queries wouldn't look like spec-compatible queries. There are a number of issues that would have to be solved to make this actually work, but on first glance it seems like a possibly attractive idea. Besides, I can't wait to hear the moans from the newsysviews crew when the implications of this sink in ;-) ;-) regards, tom lane
On Wed, May 11, 2005 at 04:49:52PM -0400, Tom Lane wrote: > Bruce Momjian <pgman@candle.pha.pa.us> writes: > > How is a catalog different from a schema? > > In the spec there's a hard-wired difference: catalogs contain schemas, > schemas don't contain other schemas. The idea at hand here is to make > our namespaces serve both purposes. (I knew there was a good reason > not to use the word "schema" for namespaces ;-)) The spec behavior > would be met by using exactly two levels of namespace, but there > wouldn't be anything stopping people from using more, except that their > queries wouldn't look like spec-compatible queries. So is the *only* difference in which contains the other? It sounds like they just use a different name to enforce that there's only 2 levels. > Besides, I can't wait to hear the moans from the newsysviews crew when > the implications of this sink in ;-) ;-) Oh no, not recursive function calls! :P Actually, for the performance we're trying to obtain on the more important views (ie tables, indexes), it might become an issue. It would probably force us to C functions which we've thus-far avoided. -- Jim C. Nasby, Database Consultant decibel@decibel.org Give your computer some brain candy! www.distributed.net Team #1828 Windows: "Where do you want to go today?" Linux: "Where do you want to go tomorrow?" FreeBSD: "Are you guys coming, or what?"
"Jim C. Nasby" <decibel@decibel.org> writes: > On Wed, May 11, 2005 at 04:49:52PM -0400, Tom Lane wrote: >> Besides, I can't wait to hear the moans from the newsysviews crew when >> the implications of this sink in ;-) ;-) > Oh no, not recursive function calls! :P No, actually, I was wondering where the potentially N levels of schema names would appear in the output ... regards, tom lane
On Wed, May 11, 2005 at 05:28:22PM -0400, Tom Lane wrote: > "Jim C. Nasby" <decibel@decibel.org> writes: > > On Wed, May 11, 2005 at 04:49:52PM -0400, Tom Lane wrote: > >> Besides, I can't wait to hear the moans from the newsysviews crew when > >> the implications of this sink in ;-) ;-) > > > Oh no, not recursive function calls! :P > > No, actually, I was wondering where the potentially N levels of schema > names would appear in the output ... My immediate thought is that they would be appended together in 'dot notation'; 'schema1.schema2.schema3', since that's the definitive way to refer to the schema in such a scheme. -- Jim C. Nasby, Database Consultant decibel@decibel.org Give your computer some brain candy! www.distributed.net Team #1828 Windows: "Where do you want to go today?" Linux: "Where do you want to go tomorrow?" FreeBSD: "Are you guys coming, or what?"
"Jim C. Nasby" <decibel@decibel.org> writes: > On Wed, May 11, 2005 at 05:28:22PM -0400, Tom Lane wrote: >> No, actually, I was wondering where the potentially N levels of schema >> names would appear in the output ... > My immediate thought is that they would be appended together in 'dot > notation'; 'schema1.schema2.schema3', since that's the definative way to > refer to the schema in such a scheme. That's OK for human consumption but I'm not so sure it'll be of any value to programs. At the very least you'd have to quotify the names, so that a.b can be told from "a.b". regards, tom lane
Adding to the ambiguity is the dot notation used for composite columns. Don't forget the other end ignoring those required parens. is foo.bar.zap a database.schema.table a schema.table.column a table.column.column --elein On Wed, May 11, 2005 at 03:21:42PM -0400, Tom Lane wrote: > "Dave Held" <dave.held@arraysg.com> writes: > > /* > > * We check the catalog name and then ignore it. > > */ > > if (!isValidNamespace(name[0])) > > { > > if (strcmp(name[0], get_database_name(MyDatabaseId)) != 0) > > ereport(ERROR, > > Which more or less proves my point: the syntax is fundamentally > ambiguous. I suppose people would learn not to use schema names that > match the database they are in, but that doesn't make it a good idea to > have sensible behavior depend on non-overlap of those names. > > [ thinks for awhile ... ] > > OTOH, what if we pretended that two-level-nested schemas ARE catalogs > in the sense that the SQL spec expects? Then we could get rid of the > pro-forma special case here, which isn't ever likely to do anything more > useful than throw an error anyway. Thus, we'd go back to the pre-7.3 > notion that the current Postgres DB's name isn't part of the SQL naming > scheme at all, and instead handle the spec's syntax requirements by > setting up some conventions that make a schema act like what the spec > says is a catalog. > > There are some nontrivial issues to be thought about here, like under > what conditions "CREATE SCHEMA foo" ought to create a top-level schema > versus creating a schema under some other schema that we are pretending > is the active "catalog". But it seems on first glance like something > could be worked out. > > regards, tom lane > > ---------------------------(end of broadcast)--------------------------- > TIP 7: don't forget to increase your free space map settings >
On Wed, May 11, 2005 at 05:43:32PM -0400, Tom Lane wrote: > "Jim C. Nasby" <decibel@decibel.org> writes: > > On Wed, May 11, 2005 at 05:28:22PM -0400, Tom Lane wrote: > >> No, actually, I was wondering where the potentially N levels of schema > >> names would appear in the output ... > > > My immediate thought is that they would be appended together in 'dot > > notation'; 'schema1.schema2.schema3', since that's the definative way to > > refer to the schema in such a scheme. > > That's OK for human consumption but I'm not so sure it'll be of any > value to programs. At the very least you'd have to quotify the names, > so that a.b can be told from "a.b". Very true. Ultimately the best way to handle this is probably to keep the views basically as they are (meaning you would only show the schema_name and oid of the schema that an object is in), and have a function that will provide you a full schema path given a schema_oid. On another note... is dbname.schema.table.column part of the standard? It seems like if we're ever going to allow native cross-database communication we'd want to preserve that. One thought is the use of a leading . to indicate you're starting at the database level. No leading . means you're in whatever database you're connected to. Another possibility is that 'remote' databases (which might be on the same server) get mapped into a fixed portion of the namespace hierarchy, such as pg_rdb. I don't like cryptic names, but I certainly don't want to type 'pg_remote_databas' everytime I refer to something remote. -- Jim C. Nasby, Database Consultant decibel@decibel.org Give your computer some brain candy! www.distributed.net Team #1828 Windows: "Where do you want to go today?" Linux: "Where do you want to go tomorrow?" FreeBSD: "Are you guys coming, or what?"
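On the quoting point raised just above, a small illustration of why it matters when a path is flattened into a single string (the schema name is invented):

-- A schema whose own name contains a dot is perfectly legal today:
CREATE SCHEMA "a.b";

-- So a program reading a flattened path needs the components quoted:
SELECT quote_ident('a.b');                           -- "a.b"  : one odd name
SELECT quote_ident('a') || '.' || quote_ident('b');  -- a.b    : a two-level path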
On Wed, May 11, 2005 at 02:41:43PM -0700, elein wrote: > Adding to the ambiguity is the dot notation used for > composite columns. Don't forget the other end ignoring > those required parens. > > is foo.bar.zap > a database.schema.table > a schema.table.column > a table.column.column Wouldn't that be handled by the FROM clause having to identify only tables and views? Is there anyplace where dot notation actually extends from database name down to columns? If that's the case, it seems reasonable to me to require the use of table aliases in cases where there's ambiguity. -- Jim C. Nasby, Database Consultant decibel@decibel.org Give your computer some brain candy! www.distributed.net Team #1828 Windows: "Where do you want to go today?" Linux: "Where do you want to go tomorrow?" FreeBSD: "Are you guys coming, or what?"
On 2005-05-11, Tom Lane <tgl@sss.pgh.pa.us> wrote: > There are a number of issues that would have to be solved to make this > actually work, but on first glance it seems like a possibly attractive > idea. > > Besides, I can't wait to hear the moans from the newsysviews crew when > the implications of this sink in ;-) ;-) I'm not too worried; how many other things assume that schema.tablename uniquely identifies a table? This is at least as large a change as adding schemas in the first place. Obvious strategies include: - if only one additional nesting level is defined, add a "catalog" column to match every "schema" column - if multiple levels are defined, add a "schema_path" column with an array of names to match every "schema" column. If schema.tablename becomes non-unique (because this feature was implemented _and_ someone creates the same schema in different catalogs) then anything that currently queries the catalogs, whether directly or via pg_tables (or even information_schema if you allow more than one additional level) is going to have issues. -- Andrew, Supernews http://www.supernews.com - individual and corporate NNTP services
> -----Original Message----- > From: Tom Lane [mailto:tgl@sss.pgh.pa.us] > Sent: Wednesday, May 11, 2005 2:22 PM > To: Dave Held > Cc: pgsql-hackers@postgresql.org > Subject: Re: [HACKERS] Oracle Style packages on postgres > > > "Dave Held" <dave.held@arraysg.com> writes: > > /* > > * We check the catalog name and then ignore it. > > */ > > if (!isValidNamespace(name[0])) > > { > > if (strcmp(name[0], > get_database_name(MyDatabaseId)) != 0) > > ereport(ERROR, > > Which more or less proves my point: the syntax is fundamentally > ambiguous. Not at all. Ambiguity means that there are two equally valid parses. Under the semantics I proposed, schema names take precedence. That is, given db foo, schema bar, and schema foo.bar, the expression foo.bar.rel.col refers to schema foo.bar, and not to db foo, schema bar. If by "fundamentally ambiguous", you mean "there is no a priori reason to choose one set of semantics over another", I would tend to disagree, but the syntax as I proposed it is not ambiguous. We use precedence to eliminate otherwise valid parses all the time. > I suppose people would learn not to use schema names that > match the database they are in, but that doesn't make it a > good idea to have sensible behavior depend on non-overlap of > those names. There's nothing wrong with using a schema name that matches the db. The only confusion comes when you put nested elements at both the db level and schema level having the same names. Since I presume most people don't specify db names in their queries, having schemas take precedence makes the most sense to me. > [ thinks for awhile ... ] > > OTOH, what if we pretended that two-level-nested schemas ARE > catalogs in the sense that the SQL spec expects? Then we could > get rid of the pro-forma special case here, which isn't ever > likely to do anything more useful than throw an error anyway. > Thus, we'd go back to the pre-7.3 notion that the current > Postgres DB's name isn't part of the SQL naming scheme at all, > and instead handle the spec's syntax requirements by setting up > some conventions that make a schema act like what the spec says > is a catalog. > [...] I think this would be worse than not having nested schemas at all. It looks, feels, and smells like a hack. I think there should be a reasonable limit on schema nesting depth, but it should be much larger than 2; 8 seems much more reasonable. One can argue that nested schemas are nothing more than syntactic sugar, and this is most definitely true. But as programming language design teaches us, syntactic sugar is everything. The better our tools can model our problem spaces, the better they can help us solve our problems. A way in which nested schemas are more than syntactic sugar is in the fact that they can provide a convenient means of additional security management. Rather than twiddling with the privileges on groups of objects within a schema, objects that should have similar privileges can be put in the same subschema. However, returning to the original topic of the thread, nested schemas are not nearly as interesting to me as the encapsulation provided by a package-like feature. To be honest, though, what tantalizes me is not the prospect of a package feature but an expansion of the Type system. As a reasonably popular production system, Postgres must necessarily be conservative. But its roots lay in experimentation, and vestiges of those roots can still be seen in its structure.
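To make that precedence rule concrete, a sketch in which all of the nested-schema syntax is hypothetical (none of it is accepted today):

-- Database foo contains a top-level schema bar and a nested schema foo.bar.
CREATE SCHEMA foo;
CREATE SCHEMA bar;
CREATE SCHEMA foo.bar;                      -- hypothetical nested schema
CREATE TABLE foo.bar.rel (col integer);

-- Under schema-takes-precedence semantics this resolves to nested schema
-- foo.bar, not to database foo, schema bar:
SELECT foo.bar.rel.col FROM foo.bar.rel;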
Because of its maturity, Postgres is well positioned to implement some rather advanced concepts, but perhaps the most radical of them should be implemented in a fork rather than the main system. Traditionally, a database is seen as a warehouse of raw data. ODBMSes position themselves as the next generation by viewing a database as a collection of persistent, richly structured objects. Both views have strengths and weaknesses. Postgres takes an interesting middle ground position within the ORDBMS space. It is heavily relational with strong support for standard SQL and numerous query tuning options. But it also features an interesting number of rather non-relational concepts, like custom operator definitions, operator classes, user-defined conversions and types. However, it seems to me that these features are probably very underutilized. This is probably due to two reasons: 1) most programmers aren't used to being able to define custom operators in their favorite programming language, so the concept isn't familiar enough to them to try it in their DBMS. 2) The other features which support this aren't designed or presented in a cohesive manner that impresses the programmer that this is a compelling and superior way to go about things. The fact is, operator overloading is a *very* powerful way to program. In particular, it is one of the key factors in supporting generic programming in a natural way. People who are unsure of this claim should examine the Spirit parser generator or Blitz++, among numerous other examples. It's something of a tragedy that such a powerful feature is sitting languishing in a database system, when most mainstream PLs have yet to catch up! But that is exactly the problem. Operator overloading is a concept that is ahead of its time, and was when it was invented way back in the early days of Lisp. Similarly, the user-defined conversions speak to a chimeric nature of Postgres, which is its attitude towards typing. In some areas, Postgres is strongly typed, and in others, it is weakly or dynamically typed. Yet in others it is polymorphically typed. But Postgres isn't alone in this conundrum. Languages that are very strongly typed tend to require casting between types. Languages that are weakly typed or dynamically typed are more prone to type errors (just ask Java programmers about ClassCastException). The trend in PL design is that strong typing is better. In fact, the type system is the essence of a PL. The grammar determines what is a legal program in a given language. The type system determines what is *not* a legal program. By eliminating nonsensical programs, a type system enforces good programming behavior. So the power of a type system is not in what it enables, but rather what it *disables* or disallows. That is why encapsulation is good. It does not add a powerful new form of computation to a programming language. Rather, it *restricts* what can be done so that dangerous operations can be eliminated or at least localized to an easily-inspected region of the code. Postgres has an opportunity to revolutionize database design by introducing a strongly typed DBMS that offers some of the features of a pure ODBMS while providing the traditional RDBMS capabilities. I don't know exactly what such a beast would look like, but I do know that I see a lot of the pieces already in Postgres, waiting to be chipped and molded and fit together into a more elegant whole. Let me give an illustration of how this might possibly come about. 
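As an aside on the operator support mentioned above, defining one is already this direct today (the function and the operator are just an invented example):

-- Approximate equality for float8, packaged as an operator.
CREATE OR REPLACE FUNCTION approx_eq(float8, float8) RETURNS boolean AS $$
    SELECT abs($1 - $2) < 1e-6;
$$ LANGUAGE sql IMMUTABLE;

CREATE OPERATOR ~= (
    LEFTARG    = float8,
    RIGHTARG   = float8,
    PROCEDURE  = approx_eq,
    COMMUTATOR = ~=
);

SELECT (0.1::float8 + 0.2::float8) ~= 0.3::float8;   -- true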
The canonical example is the Employee table. In a flat file world, this table would be a simple set of records, each containing possibly redundant information about a set of employees. Names, ssn, phones, position, salary, etc. To access the employees, we fetch the records of interest from the DB and manipulate them mostly in our client or middleware code. The DB does nothing more than load and save our data, and assist with searches. In the relational world, we factor out the common data and put them into different relations, linking to them with foreign keys. We basically break the record up into pieces, and reassemble the pieces as needed. For compiling reports that look at the data as an aggregate in different ways, this is a very powerful and efficient way to go about things. For dealing with individual employees as singular entities, this is not such an efficient way to do things. It is very space efficient, but it pays for that space efficiency with time. When we want our time back, we flatten the relations and materialize our views and basically tread back towards the flat file world. When a client wants a tuple, the DBMS has to resurrect the tuple from its component relations, as if it were breathing life into it from the clay of the earth. However, the DBMS can be clever and offer some server-side triggers or procedures to assist the client in manipulating the object. In the ODBMS world, we return partway to the flat file world by serializing objects in a more or less flat way. However, the objects can refer to each other in a relational way, and those relations are captured as well. But the normalization typical in the relational world is by no means typical of the OOP world, so the data factoring is much less aggressive, which has both drawbacks and benefits, depending on the application. Once again, the DBMS is just a data server, serializing and unserializing objects to disk. The process is very fast and clean for object-oriented data, but not so fast for flat data. The problem in the database world is the same as the problem in the PL world: people are conflating types and representations. A representation is a raw data format, like a 16-bit 2's complement signed integer, or an 8-bit ASCII character. A type is a set of values, an instance of which is stored in some representation. An Integer may be stored in the 16-bit signed int, or it may be represented by some 128-bit memory location. But really, most real-world problems do not have Integers as a natural type in the model. Going back to our Employee, the natural types will be more like SSN: the values are always 9 digits long and there are rules about allowed prefixes, etc. The set of values in the SSN type is smaller than the set of values in the 9-digit-Integers type. So 9-digit-Integer can be a representation of SSN, but an SSN *type* implies more. It doesn't make sense to do arithmetic on an SSN, so arithmetical operations should neither be defined nor allowed on SSNs. To do arithmetic on one, you should be forced to cast to another type, and the cast should check whether this is a sensible or valid operation in the current context. Now the thing about the relational model is that it is essentially about *representations*. So the things that we call "attribute types" are really "attribute representations", which is why there are so few of them by default. In reality, every problem domain brings a large host of richly defined types that map onto those representations to a greater or lesser degree.
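A domain gets part of the way to that SSN idea today: it pins down the set of values, although it does not stop arithmetic on the underlying representation (the prefix rule below is invented):

CREATE DOMAIN ssn AS char(9)
    CHECK (VALUE ~ '^[0-9]{9}$' AND substr(VALUE, 1, 3) <> '000');

CREATE TABLE employee_ids (emp_id integer PRIMARY KEY, emp_ssn ssn);

INSERT INTO employee_ids VALUES (1, '123456789');   -- accepted
INSERT INTO employee_ids VALUES (2, '12A456789');   -- rejected by the CHECK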
The reason people want packages is because functions in Postgres, and indeed, most RDBMSes, operate on representations, which allow you to do something like take the square root of an SSN, which makes perfect sense if the SSN is a numeric(9, 0). Programmers who want greater correctness see that type safety is the way to get there. And type safety can only be enforced in a strongly typed environment. The interesting thing is that domains are essentially the formal notion of a type in the type vs. representation distinction. In a strongly typed ORDBMS, tuples would be defined as sets of domains, rather than sets of "types" (representations). And operations on tuples would be strictly type checked. Note that the other features of OOP, inheritance and polymorphism, are really just ways to define exceptions or loopholes in the type system. Inheritance allows you to say that one type is substitutable for another, and polymorphism says that an operation knows how to act on more than one type, even when the types aren't related. So back to the future...in an ORDBMS world you should not ask for a collection of representations. You should ask for a collection of objects. You should not ask for transformation of tuples, but rather transformation of objects and sets of objects. And the user should be able to define a sufficiently rich interface to those objects that the ORDBMS can easily fulfill them. Say you want to increase the salary of every employee at pay grade 3. You should not be able to directly manipulate the salary field of the Employee relation. Rather, you should define a method like RaiseSalary() that operates on Employees (or better yet, is a member of the Employee type), and then define the set of Employees over which that method should be applied. To use a somewhat mathematical notation: RaiseSalary({Employee e | e.payGrade() == 3}); That's not to say that SQL should go away. But it should be a language of the representations, an implementation detail that is hidden by a more powerful type-safe facade that helps ensure program correctness. Of course, I'm not making any specific proposal to change Postgres at this time. I'm just giving the community some food for thought to chew on when considering future directions of DBMSes in general and Postgres in particular. I could go on about generic programming and its relation to the ideas above and the current features in Postgres, but I will save that discussion for a rainy day. __ David B. Held Software Engineer/Array Services Group 200 14th Ave. East, Sartell, MN 56377 320.534.3637 320.253.7800 800.752.8129
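Something like that RaiseSalary idea can at least be sketched with what exists now, by making a function the sanctioned write path (the employee table and its columns are invented):

CREATE OR REPLACE FUNCTION raise_salary(integer, numeric) RETURNS void AS $$
BEGIN
    -- $1 = pay grade, $2 = percentage increase
    UPDATE employee
       SET salary = salary * (1 + $2 / 100.0)
     WHERE pay_grade = $1;
END;
$$ LANGUAGE plpgsql;

-- RaiseSalary({Employee e | e.payGrade() == 3}) becomes:
SELECT raise_salary(3, 10);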
Added to TODO: * Add the features of packages o Make private objects accessible only to objects in the same schema o Allow current_schema.objname to access current schema objects o Add session variables o Allow nested schemas --------------------------------------------------------------------------- Bruce Momjian wrote: > > OK, so it seems we need: > > o make private objects accessable only to objects > in the same schema > o Allow current_schema.objname to access current > schema objects > o session variables > o nested schemas? > > --------------------------------------------------------------------------- > > Dave Held wrote: > > > -----Original Message----- > > > From: Bruce Momjian [mailto:pgman@candle.pha.pa.us] > > > Sent: Tuesday, May 10, 2005 8:43 AM > > > To: Thomas Hallgren > > > Cc: Tom Lane; rmm@sqlisor.com; pgsql-hackers@postgresql.org > > > Subject: Re: [HACKERS] Oracle Style packages on postgres > > > > > > [...] > > > I suppose. I think we should focus on the use cases for Oracle > > > packages, rather than the specific functionality it provides. > > > What things do people need PostgreSQL to do that it already > > > doesn't do? > > > > Is that really the best way to go about things? Already RDBMSes > > are patchwork quilts of functionality. Is merely adding another > > patch the most elegant way to evolve the database? The problem is > > that Oracle et al are trying to be ORDBMSes and aren't exactly sure > > what the best way to go is. Instead of trying to formulate a > > rational plan for what an ORDBMS should even look like, they simply > > look at what would work with their existing infrastructure and tack > > on features. Then Postgres plays the copycat game. Instead of > > trying to play catch-up with Oracle, why not beat them at their own > > game? > > > > What packages provide is encapsulation. Hiding the data from the > > user and forcing him/her to use the public interface (methods). > > That is an important and admirable OO feature. Some people think > > that using the DB's security model can achieve the same thing. It > > can't, exactly, but there's an important lesson to be learned from > > the suggestion. The problem is that OOP is a *programming* paradigm, > > and a database is not a *programming language*. In a programming > > language, there really is no such thing as "security". There is > > only "visibility" and "accessibility". Private methods in an OOP > > language do not provide *security*; they only limit *accessibility*. > > Like so many other differences between the relational model and the > > OOP model, there is an impedance mismatch here. However, there is > > also opportunity. > > > > In an OOPL, you can say: "Users can call this method from here, but > > not from there." What you *can't* say is: "User X can call this > > method, but User Y cannot." As you can see, these are orthogonal > > concepts. You could call the first "accessibility by location" and > > the second "accessibility by authentication". An ORDBMS should > > support both. "Private" does not respect your identity, only your > > calling location. An ACL does not respect your calling scope, only > > your identity. A system that has both is clearly more flexible than > > one that only has one or the other. > > > > Now what you need to keep in mind is that each visibility model > > serves a different purpose. The purpose of a security model is to > > limit *who* can see/touch certain data because the data has intrinsic > > value.
The purpose of an accessibility model is to limit *where* and > > *how* data can be seen/touched in order to preserve *program > > invariants*. So if you have an object (or tuple!) that records the > > start and stop time of some process, it is probably a logical > > invariant that the stop time is greater than or equal to the start > > time. For this reason, in a PL, you would encapsulate these fields > > (attributes) and only provide controlled access to update them that > > checks and preserves the invariant, *no matter who you are*. You > > don't want a superuser violating this invariant any more than Sue > > User. > > > > Now you might object that constraints allow you to preserve > > invariants as well, and indeed they do. But constraints do not > > respect calling scope. Suppose there is a process that needs to > > update the timestamps in a way that temporarily breaks the invariant > > but restores it afterwards. The only way to effect this in a > > constraint environment is to drop the constraint, perform the > > operation, and restore it. However, dropping a constraint is not an > > ideal solution because there may be other unprivileged processes > > operating on the relation that still need the constraint to be > > enforced. There is no way to say: "There is a priviledged class of > > methods that is allowed to violate this constraint because they are > > trusted to restore it upon completion." Note that this is different > > from saying "There is a priviledged class of users that is allowed > > to violate this constraint." If you try to do something like give > > read-only access to everybody and only write access to one user and > > define that user to be the owner of the methods that update the data, > > you have to follow the convention that that user only operates > > through the defined interface, and doesn't hack the data directly. > > That's because user-level accessibility is not the same as scope- > > level accessibility. Whereas, if you define something like a > > package, and say: "Package X is allowed full and complete access > > to relation Y", and stick the interface methods in X, you still have > > all the user-level security you want while preserving the invariants > > in the most elegant way. > > > > So you can think of a package as a scope in a programming language. > > It's like a user, but it is not a user. A user has privileges that > > cut across scopes. Now, whether packages should be different from > > schemas is a whole different ballgame. The purpose of a schema in > > Postgres is not entirely clear to me. There's lots of different ways > > to use schemas, and there is no obvious best way to use them. In > > order to implement the accessibility features of packages, schemas > > would have to be changed considerably. Probably a lot of users would > > be unhappy if schemas were changed in that way. My guess is that > > this would not be a good idea. > > > > I think we can get some guidance from PLs. C++ is what you call a > > "multi-paradigm language". You can do everything from assembly to > > metaprogramming in C++. As such, it is very loose and open in some > > respects. C++ has two kinds of scopes: it has classes and namespaces. > > Members of a class are encapsulated and support data hiding. Members > > of a namespace are only loosely grouped and do not support data hiding > > explicitly. Namespaces exist primarily to avoid name collisions. > > > > Java, on the other hand, decided that for OOP purity, everything must > > be a class. 
That would be like making schemas into packages and > > imposing accessibility rules on them. At the end of the day, I think > > many PL design experts agree that making everything a class is not > > necessarily the best way to go. > > > > So schemas can be like C++ namespaces - they provide a means to > > loosely group related objects and help avoid name collisions. So > > the package could be like a class - they provide OOP-like > > encapsulation via accessibility rules. However, that doesn't mean > > that nested schemas wouldn't also be a good thing. In C++, nested > > namespaces are extremely useful when one layer of scoping does not > > sufficiently partition the namespace to avoid frequent name > > collisions. I think the same is true of Postgres. I certainly would > > like to be able to use nested schema names in several contexts. > > Instead, I have to make a choice between making different schemas, > > or making different name prefixes. I wouldn't even mind if nested > > schemas were only allowed to contain schemas except at the leaves of > > the tree. Another feature that is very useful is the "using clause". > > Combined with nested namespaces, this is a very powerful way to give > > programmers/dbas control over names. You can give everything the > > most natural name, and just put it in the appropriate namespace, > > and use the namespace that is relevant to the given task at hand. > > > > So consider this example: > > > > Tables: > > etl.import.record > > etl.export.record > > > > As you can imagine, I don't really want to make an 'import' and > > 'export' schema at the top level. There's several tables in > > each schema, but that should illustrate the point. Then, when > > constructing queries, it would be nice to be able to do this: > > > > USING etl.import > > ; > > SELECT * > > FROM record > > JOIN header ON ... > > JOIN file ON ... > > ; > > > > The effect of a USING clause would be to import the schema names > > into the public namespace for the duration of the transaction. If > > that leads to ambiguous names, then the parser/planner should emit an > > error. > > > > __ > > David B. Held > > Software Engineer/Array Services Group > > 200 14th Ave. East, Sartell, MN 56377 > > 320.534.3637 320.253.7800 800.752.8129 > > > > ---------------------------(end of broadcast)--------------------------- > > TIP 1: subscribe and unsubscribe commands go to majordomo@postgresql.org > > > > -- > Bruce Momjian | http://candle.pha.pa.us > pgman@candle.pha.pa.us | (610) 359-1001 > + If your life is a hard drive, | 13 Roberts Road > + Christ can be your backup. | Newtown Square, Pennsylvania 19073 > > ---------------------------(end of broadcast)--------------------------- > TIP 2: you can get off all lists at once with the unregister command > (send "unregister YourEmailAddressHere" to majordomo@postgresql.org) > -- Bruce Momjian | http://candle.pha.pa.us pgman@candle.pha.pa.us | (610) 359-1001+ If your life is a hard drive, | 13 Roberts Road + Christ can be your backup. | Newtown Square, Pennsylvania19073
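Two of the points quoted above can at least be approximated with what exists today. First, the convention-based workaround for the start/stop-time invariant, where a single owner-privileged function is the only granted write path (all names invented):

CREATE TABLE job_run (
    job_id     integer PRIMARY KEY,
    start_time timestamptz NOT NULL,
    stop_time  timestamptz,
    CHECK (stop_time IS NULL OR stop_time >= start_time)
);

GRANT SELECT ON job_run TO PUBLIC;   -- everyone else is read-only

-- Runs with the owner's rights; callers get only this interface, but the
-- owner is still trusted by convention not to bypass it.
CREATE OR REPLACE FUNCTION finish_job(integer, timestamptz) RETURNS void AS $$
BEGIN
    UPDATE job_run SET stop_time = $2 WHERE job_id = $1;
END;
$$ LANGUAGE plpgsql SECURITY DEFINER;

GRANT EXECUTE ON FUNCTION finish_job(integer, timestamptz) TO PUBLIC;

Second, the USING etl.import idea: with one level of nesting flattened into schema names, the schema search path gives a similar per-transaction effect (the schemas and tables follow the quoted example but are still invented):

CREATE SCHEMA etl_import;
CREATE SCHEMA etl_export;

BEGIN;
SET LOCAL search_path TO etl_import, public;   -- reverts at transaction end
SELECT *
  FROM record r
  JOIN header h ON h.header_id = r.header_id
  JOIN file f ON f.file_id = h.file_id;
COMMIT;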
Thanks for adding this Bruce! Is anyone going to be working on this immediately? If so, I'd be glad to work with someone. Unfortunately, I don't have the time to devote to taking something this big on, but I think it would be a really great thing to have. Just let me know jharris@tvi.edu OR jonah.harris@gmail.com. Thanks! Bruce Momjian wrote: >Added to TODO: > >* Add the features of packages > o Make private objects accessable only to objects in the same schema > o Allow current_schema.objname to access current schema objects > o Add session variables > o Allow nested schemas > > [...]
* Jonah H. Harris (jharris@tvi.edu) wrote: > Is anyone going to be working on this immediately? If so, I'd be glad > to work with someone. Unfortunately, I don't have the time to devote to > taking something this big on, but I think it would be a really great > thing to have. Just let me know jharris@tvi.edu OR > jonah.harris@gmail.com. Thanks! It strikes me as slightly unlikely that anyone will start working on this immediately, but I can tell you it's something that some of my users have been asking for and so once I finish off my current work on roles I'll probably be interested in working on this. Stephen > Bruce Momjian wrote: > > >Added to TODO: > > > >* Add the features of packages > > o Make private objects accessable only to objects in the same schema > > o Allow current_schema.objname to access current schema objects > > o Add session variables > > o Allow nested schemas > > [...]
Bruce, > > >Added to TODO: > > > > > >* Add the features of packages > > > o Make private objects accessable only to objects in the same > > > schema > > > o Allow current_schema.objname to access current schema objects > > > o Add session variables > > > o Allow nested schemas Hmmm ... was there a reason we decided not to just make this explicitly tied to SQL2003 TYPES? -- Josh Berkus Aglio Database Solutions San Francisco
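For reference, the composite part of this already exists as CREATE TYPE ... AS; what SQL2003 structured types add on top is methods and type inheritance, which is roughly the territory the packages TODO overlaps with:

-- Accepted today: a bare composite type (attribute names invented).
CREATE TYPE employee_t AS (
    name   varchar(100),
    salary numeric(10,2)
);
-- SQL2003 structured types would additionally allow method specifications
-- (e.g. a raise_salary method) and subtyping on employee_t itself, neither
-- of which PostgreSQL accepts in CREATE TYPE at present.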
Josh Berkus wrote: > Bruce, > > > > >Added to TODO: > > > > > > > >* Add the features of packages > > > > o Make private objects accessable only to objects in the same > > > > schema > > > > o Allow current_schema.objname to access current schema objects > > > > o Add session variables > > > > o Allow nested schemas > > Hmmm ... was there a reason we decided not to just make this explicitly tied > to SQL2003 TYPES? I don't think anyone mentioned even knowing about TYPES. Do you have modifications to this? -- Bruce Momjian | http://candle.pha.pa.us pgman@candle.pha.pa.us | (610) 359-1001 + If your life is a hard drive, | 13 Roberts Road + Christ can be your backup. | Newtown Square, Pennsylvania 19073