Re: Oracle Style packages on postgres - Mailing list pgsql-hackers

From Dave Held
Subject Re: Oracle Style packages on postgres
Date
Msg-id 49E94D0CFCD4DB43AFBA928DDD20C8F902618505@asg002.asg.local
Whole thread Raw
In response to Oracle Style packages on postgres  (rmm@sqlisor.com)
Responses Re: Oracle Style packages on postgres  (Bruce Momjian <pgman@candle.pha.pa.us>)
Re: Oracle Style packages on postgres  (Bruce Momjian <pgman@candle.pha.pa.us>)
List pgsql-hackers
> -----Original Message-----
> From: Bruce Momjian [mailto:pgman@candle.pha.pa.us]
> Sent: Tuesday, May 10, 2005 8:43 AM
> To: Thomas Hallgren
> Cc: Tom Lane; rmm@sqlisor.com; pgsql-hackers@postgresql.org
> Subject: Re: [HACKERS] Oracle Style packages on postgres
>
> [...]
> I suppose.  I think we should focus on the use cases for Oracle
> packages, rather than the specific functionality it provides.
> What things do people need PostgreSQL to do that it already
> doesn't do?

Is that really the best way to go about things?  Already RDBMSes
are patchwork quilts of functionality.  Is merely adding another
patch the most elegant way to evolve the database?  The problem is
that Oracle et al are trying to be ORDBMSes and aren't exactly sure
what the best way to go is.  Instead of trying to formulate a
rational plan for what an ORDBMS should even look like, they simply
look at what would work with their existing infrastructure and tack
on features.  Then Postgres plays the copycat game.  Instead of
trying to play catch-up with Oracle, why not beat them at their own
game?

What packages provide is encapsulation.  Hiding the data from the
user and forcing him/her to use the public interface (methods).
That is an important and admirable OO feature.  Some people think
that using the DB's security model can achieve the same thing.  It
can't, exactly, but there's an important lesson to be learned from
the suggestion.  The problem is that OOP is a *programming* paradigm,
and a database is not a *programming language*.  In a programming
language, there really is no such thing as "security".  There is
only "visibility" and "accessibility".  Private methods in an OOP
language do not provide *security*; they only limit *accessibility*.
Like so many other differences between the relational model and the
OOP model, there is an impedance mismatch here.  However, there is
also opportunity.

In an OOPL, you can say: "Users can call this method from here, but
not from there."  What you *can't* say is: "User X can call this
method, but User Y cannot."  As you can see, these are orthogonal
concepts.  You could call the first "accessibility by location" and
the second "accessibility by authentication".  An ORDBMS should
support both.  "Private" does not respect your identity, only your
calling location.  An ACL does not respect your calling scope, only
your identity.  A system that has both is clearly more flexible than
one that only has one or the other.

Now what you need to keep in mind is that each visibility model
serves a different purpose.  The purpose of a security model is to
limit *who* can see/touch certain data because the data has intrinsic
value.  The purpose of an accessibility model is to limit *where* and
*how* data can be seen/touched in order to preserve *program
invariants*.  So if you have an object (or tuple!) that records the
start and stop time of some process, it is probably a logical
invariant that the stop time is greater than or equal to the start
time.  For this reason, in a PL, you would encapsulate these fields
(attributes) and only provide controlled access to update them that
checks and preserves the invariant, *no matter who you are*.  You
don't want a superuser violating this invariant any more than Sue
User.

Now you might object that constraints allow you to preserve
invariants as well, and indeed they do.  But constraints do not
respect calling scope.  Suppose there is a process that needs to
update the timestamps in a way that temporarily breaks the invariant
but restores it afterwards.  The only way to effect this in a
constraint environment is to drop the constraint, perform the
operation, and restore it.  However, dropping a constraint is not an
ideal solution because there may be other unprivileged processes
operating on the relation that still need the constraint to be
enforced.  There is no way to say: "There is a priviledged class of
methods that is allowed to violate this constraint because they are
trusted to restore it upon completion."  Note that this is different
from saying "There is a priviledged class of users that is allowed
to violate this constraint."  If you try to do something like give
read-only access to everybody and only write access to one user and
define that user to be the owner of the methods that update the data,
you have to follow the convention that that user only operates
through the defined interface, and doesn't hack the data directly.
That's because user-level accessibility is not the same as scope-
level accessibility.  Whereas, if you define something like a
package, and say: "Package X is allowed full and complete access
to relation Y", and stick the interface methods in X, you still have
all the user-level security you want while preserving the invariants
in the most elegant way.

So you can think of a package as a scope in a programming language.
It's like a user, but it is not a user.  A user has privileges that
cut across scopes.  Now, whether packages should be different from
schemas is a whole different ballgame.  The purpose of a schema in
Postgres is not entirely clear to me.  There's lots of different ways
to use schemas, and there is no obvious best way to use them.  In
order to implement the accessibility features of packages, schemas
would have to be changed considerably.  Probably a lot of users would
be unhappy if schemas were changed in that way.  My guess is that
this would not be a good idea.

I think we can get some guidance from PLs.  C++ is what you call a
"multi-paradigm language".  You can do everything from assembly to
metaprogramming in C++.  As such, it is very loose and open in some
respects.  C++ has two kinds of scopes: it has classes and namespaces.
Members of a class are encapsulated and support data hiding.  Members
of a namespace are only loosely grouped and do not support data hiding
explicitly.  Namespaces exist primarily to avoid name collisions.

Java, on the other hand, decided that for OOP purity, everything must
be a class.  That would be like making schemas into packages and
imposing accessibility rules on them.  At the end of the day, I think
many PL design experts agree that making everything a class is not
necessarily the best way to go.

So schemas can be like C++ namespaces - they provide a means to
loosely group related objects and help avoid name collisions.  So
the package could be like a class - they provide OOP-like
encapsulation via accessibility rules.  However, that doesn't mean
that nested schemas wouldn't also be a good thing.  In C++, nested
namespaces are extremely useful when one layer of scoping does not
sufficiently partition the namespace to avoid frequent name
collisions.  I think the same is true of Postgres.  I certainly would
like to be able to use nested schema names in several contexts.
Instead, I have to make a choice between making different schemas,
or making different name prefixes.  I wouldn't even mind if nested
schemas were only allowed to contain schemas except at the leaves of
the tree.  Another feature that is very useful is the "using clause".
Combined with nested namespaces, this is a very powerful way to give
programmers/dbas control over names.  You can give everything the
most natural name, and just put it in the appropriate namespace,
and use the namespace that is relevant to the given task at hand.

So consider this example:

Tables: etl.import.record etl.export.record

As you can imagine, I don't really want to make an 'import' and
'export' schema at the top level.  There's several tables in
each schema, but that should illustrate the point.  Then, when
constructing queries, it would be nice to be able to do this:
USING etl.import      ;
SELECT * FROM record JOIN header ON ... JOIN file ON ...      ;

The effect of a USING clause would be to import the schema names
into the public namespace for the duration of the transaction.  If
that leads to ambiguous names, then the parser/planner should emit an
error.

__
David B. Held
Software Engineer/Array Services Group
200 14th Ave. East,  Sartell, MN 56377
320.534.3637 320.253.7800 800.752.8129


pgsql-hackers by date:

Previous
From: Peter Eisentraut
Date:
Subject: Re: Views, views, views! (long)
Next
From: "Merlin Moncure"
Date:
Subject: Re: Views, views, views! (long)