Thread: Re: schema support, was Package support for Postgres

Re: schema support, was Package support for Postgres

From
Bill Studenmund
Date:
On Mon, 15 Oct 2001, Tom Lane wrote:

> Bill Studenmund <wrstuden@netbsd.org> writes:
> > For the most part, I think packages and schemas are orthogonal. I'm taking
> > a cue from Oracle here. Oracle considers packages to be a schema-specific
> > object.
>
> Nonetheless, it's not clear to me that we need two independent concepts.
> Given a name search path that can go through multiple schemas, it seems
> to me that you could get all the benefits of a package from a schema.
>
> I'm not necessarily averse to accepting Oracle's syntax for declaring
> packages --- if we can make it easier for Oracle users to port to Postgres,
> that's great.  But I'm uncomfortable with the notion of implementing two
> separate mechanisms that seem to do the exact same thing, ie, control
> name visibility.

I'm at a loss as to what to say. I think that what packages do and what
schemas do are different - they are different kinds of namespaces. That's
why they should have different mechanisms. Packages are for making it
easier to write stored procedures for large programming projects or for
code reuse.  Schemas, well, I need to learn more. But they strike me more
as a tool to partition entire chunks of a database.

Also, packages have a whole concept of initialization routines and global
variables, which strike me as having no place alongside tables and views.

Take care,

Bill



Re: schema support, was Package support for Postgres

From
Bill Studenmund
Date:
On Sun, 14 Oct 2001, Bill Studenmund wrote:

> On Mon, 15 Oct 2001, Tom Lane wrote:
>
> > Bill Studenmund <wrstuden@netbsd.org> writes:
> > > For the most part, I think packages and schemas are orthogonal. I'm taking
> > > a cue from Oracle here. Oracle considers packages to be a schema-specific
> > > object.
> >
> > Nonetheless, it's not clear to me that we need two independent concepts.
> > Given a name search path that can go through multiple schemas, it seems
> > to me that you could get all the benefits of a package from a schema.

I've been thinking about this. I've changed my mind. Well, I've come to
realize that you can have multiple schemas in one db, so that multiple
schema support != one db reaching into another.

I still think that schemas and packages are different, but I now think
they are interrelated. And that it shouldn't be too hard to leverage the
package work into schema support. Still a lot of work, but the package
work has shown how to go from one to two in a number of ways. :-)

First off, do you (Tom) have a spec for schema support? I think that would
definitly help things.

Second, can you help me with gram.y? I'm trying to get gram.y to deal with
figuring out if you've typed in packagename.function name, rather than
relying on the lexer to notice you've typed ${identifier}\.${identifier}
where the first identifier is a package name & send a terminal saying so.
Twelve r/r conflicts. They involve a conflict between ColId and something
else, and focus on not knowing what reduction to take when seeing a '[',
',', or ')'. Thoughts?

Take care,

Bill



Re: schema support, was Package support for Postgres

From
Bill Studenmund
Date:
On Tue, 16 Oct 2001, Bill Studenmund wrote:

> I still think that schemas and packages are different, but I now think
> they are interrelated. And that it shouldn't be too hard to leverage the
> package work into schema support. Still a lot of work, but the package
> work has shown how to go from one to two in a number of ways. :-)
>
> First off, do you (Tom) have a spec for schema support? I think that would
> definitly help things.

I found an on-line copy of the SQL92 spec, and I've been looking at it.

I think it wouldn't be _that_ much more work to add shema support to what
I've done for packages. Not trivial, but certainly not double the work.

But I have some questions.

The big one for now is how should you log into one schema or another?
psql database.schema ?

Here's a plan for schema support. But first let me review what packages
have.

Right now (in my implimentation), packages have added a "standard" package
(oid 10) which contains all of the built-in procedures, aggregates, types,
and operators.  Whenever you use the normal CREATE commands, you add a
procedure, aggregate, operator, or type in the "standard" package.

There is a new table, pg_package, which lists the name of each installed
package and its owner. "standard" is owned by PGUID. packages are
referenced by the oid of the row describing the package in this table.

Whenever you look up a function or aggregate, you give the oid of the
package to look in in addition to the name (and types). Having the package
id in the index provides the namespacing.

Whenever you look up a type or operator, you don't have to give a package
id.

Whenever you call the parser to parse a command, you pass it the package
context (oid) in which the parsing takes place. If you are typing in
commands in psql, that package id is 10, or "standard". Likewise for sql
or plpgsql routines not in a package. If you are in an sql or plpgsql
routine which is in a package, the package's oid is passed in. That's what
has package routines look in the package first.

The parser also notes if you gave a package id or not (package.foo vs
foo). If you were in a package context and were not exact (foo in a
procedure in a package for instance), then all of the places which look up
functions will try "standard" if they don't find a match.

There is a table, pg_packglobal, which contains package globals for the
different PLs. It contains 5 columns. The first three are the package oid,
the language oid, and a sequence number. They are indexed. The two others
are variable name and variable type (of PostgreSQL type name and text
respectively). PLs for which these variables don't make sense are free to
ignore them.

Extending this for schema support.

Executive summary: all of the above becomes the infrastructure to let
different schemas have schema-private functions and aggregates.

We add a new table, pg_schema, which lists the schemas in this database.
It would contain a name column, an owner column, something to indicate
character set (?), and other stuff I don't know of. Schemas are referenced
internally by the oid of the entry in this table.

There is a built-in schema, "master". It will have a fixed oid, probalby 9
or 11.

The "master" schema will own the "standard" package oid 10, which contains
all of the built-in functions, and ones added by create function/etc.

Each new schema starts life with a "standard" package of its own. This
package is the one which holds functions & aggregates made with normal
commands (create function, create aggregate) when you're logged into that
schema.

pg_package grows two more columns. One references the schema containing
the package. The other contains the oid of the "parent" package. The idea
is this oid is the next oid to look in when you are doing an inexact oid
search. It's vaguely like ".." on a file system.

For master.standard, this column is 0, indicating no further searching.
For say foo.standard (foo is a schema), it would be the oid of
master.standard (10). Likewise for a package baz in the master schema, it
would be master.standard. For a package in a schema, it would be the oid
of the "standard" package of the schema. As an example, say the foo schema
had a package named bup. For baz.bup, this column would have the oid of
baz.standard.

Right now I'm in the process of redoing the parser changes I made so that
the scanner doesn't need to recognize package names. When this is done,
the parser will be able to deal with schema.function and package.function.
Oh, also schema.table.attr too. schema.package.function won't be hard, but
it will be messy.

The only other part (which is no small one) is to add namespacing to the
rest of the backend. I expect that will mean adding a schema column to
pg_class, pg_type, and pg_operator.

Hmmm... We probably also need a command to create operator classes, and
the tables it touches would need a schema column too, and accesses will
need to be schema savy.

Well, that's a lot for now. Thoughts?

Take care,

Bill






Re: schema support, was Package support for Postgres

From
Peter Eisentraut
Date:
Bill Studenmund writes:

> The big one for now is how should you log into one schema or another?
> psql database.schema ?

Each user has a default schema, which is by default the schema with the
same name as the user name, or if no such schema exists, it's the DEFAULT
schema (which I believe is what Oracle calls it).  Then there should be
something like set schema path.  I don't think schemas should be a
connection parameter. -- That would be my ideas anyway.

> Whenever you look up a function or aggregate, you give the oid of the
> package to look in in addition to the name (and types). Having the package
> id in the index provides the namespacing.
>
> Whenever you look up a type or operator, you don't have to give a package
> id.

While I understand that package.+ is silly, anything that make operators
and functions work fundamentally differently is suspicious.  A common
search mechanism that works for everything in packages (or subschemas,
which I'd prefer) would/should/could allow you to do without those
prefixes.

> There is a built-in schema, "master". It will have a fixed oid, probalby 9
> or 11.

The built-in schemas is called DEFINITION_SCHEMA.

> The only other part (which is no small one) is to add namespacing to the
> rest of the backend. I expect that will mean adding a schema column to
> pg_class, pg_type, and pg_operator.

Yup.  But you can replace the owner package with the schema column,
because the owner property will be transferred to the schema.

> Hmmm... We probably also need a command to create operator classes, and
> the tables it touches would need a schema column too, and accesses will
> need to be schema savy.
>
> Well, that's a lot for now. Thoughts?

That "lot" was sort of the problem with tackling this until now. ;-)

-- 
Peter Eisentraut   peter_e@gmx.net   http://funkturm.homeip.net/~peter



Re: schema support, was Package support for Postgres

From
Bill Studenmund
Date:
On Sun, 21 Oct 2001, Peter Eisentraut wrote:

> Bill Studenmund writes:
>
> > The big one for now is how should you log into one schema or another?
> > psql database.schema ?
>
> Each user has a default schema, which is by default the schema with the
> same name as the user name, or if no such schema exists, it's the DEFAULT
> schema (which I believe is what Oracle calls it).  Then there should be
> something like set schema path.  I don't think schemas should be a
> connection parameter. -- That would be my ideas anyway.

I can see advantages for both; if you just connect to a database that has
schemas, you get a schema with your name if it's there, and a default
otherwise. But I can see definite advantages to being able to specify.

> > Whenever you look up a function or aggregate, you give the oid of the
> > package to look in in addition to the name (and types). Having the package
> > id in the index provides the namespacing.
> >
> > Whenever you look up a type or operator, you don't have to give a package
> > id.
>
> While I understand that package.+ is silly, anything that make operators
> and functions work fundamentally differently is suspicious.  A common
> search mechanism that works for everything in packages (or subschemas,
> which I'd prefer) would/should/could allow you to do without those
> prefixes.

Why? Operators are used differently than functions. That strikes me as a
good reason to namespace them differently.

Conceptually the main determiner of what function you want is the name, at
least as far as from what I can tell from talking with all the programmers
I know.  Yes, we make sure the types match (are part of the primary key),
but the name is the main concept. Operators, however, are more
intent-based. The '+' operator means I want these two things added
together. I don't care so much what types are involved, I want adding to
happen. That's a difference of intent. And that's the reason that I think
different namespacing rules make sense.

Part of it is that I only expect a package to add operators for types it
introduced. So to be considering them, you had to have done something that
ties in the type in the package. Like you had to make a column in a table
using it.

Another take on that is that I expect the main user of (direct) function
calls calling package functions will be other functions in that package,
while the main users of operators will be places which have used a type
from said package. Like queries pulling things out of tables using that
type. So the function namespacing is a concenience/tool primarily for the
package developer, while the operator and type namespacing is more a
convenience for the end application developer.

Also, you seem to be wanting a path-search ability that is something like
the PATH environment variable. This pathing is fundamentally different; to
use unix terms, it is ".:..". The fundamental difference is that there are
no "absolute" paths. The searching is totally location (of routine)
dependant.

To add something like an absolute path would totally break the whole
motivation for packages. The idea is to give a developer an area overwhich
s/he has total name control, but if s/he needs built-in routines, s/he
doesn't need to say "standard." to get at them.

If we allow something like "absolute paths" in the package namespacing,
then we totally destroy that. Because a package developer can't be sure
what pathing is going on, s/he really has no clue what packages will get
found in what order. So then you have to be explicit in the name of all
the functions you use (otherwise if a user essentially puts something
other than "." at the head of the path, then you don't get routines in
your own package), or run the risk of getting all sorts of run-time
errors. A feature designed to make writing packages easier now makes them
harder. That strikes me as a step backwards.

> > There is a built-in schema, "master". It will have a fixed oid, probalby 9
> > or 11.
>
> The built-in schemas is called DEFINITION_SCHEMA.

Why is it different from the "DEFAULT" you get when you log into a
database which doesn't have a schema whose name matches your username?

> > The only other part (which is no small one) is to add namespacing to the
> > rest of the backend. I expect that will mean adding a schema column to
> > pg_class, pg_type, and pg_operator.
>
> Yup.  But you can replace the owner package with the schema column,
> because the owner property will be transferred to the schema.

Not necessarily. A user other than the one who owns the schema can add a
package to it. It's the same thing as why we keep track of who added a
function. :-)

Take care,

Bill



Re: schema support, was Package support for Postgres

From
Thomas Lockhart
Date:
(I've been following the thread, at least casually ;)

> Why? Operators are used differently than functions. That strikes me as a
> good reason to namespace them differently.
> Conceptually the main determiner of what function you want is the name, at
> least as far as from what I can tell from talking with all the programmers
> I know.  Yes, we make sure the types match (are part of the primary key),
> but the name is the main concept. Operators, however, are more
> intent-based. The '+' operator means I want these two things added
> together. I don't care so much what types are involved, I want adding to
> happen. That's a difference of intent. And that's the reason that I think
> different namespacing rules make sense.

But operators *are* functions underneath the covers. So different
namespacing rules seem like a recipe for missed associations and
unexpected results.

> Part of it is that I only expect a package to add operators for types it
> introduced. So to be considering them, you had to have done something that
> ties in the type in the package. Like you had to make a column in a table
> using it.

I'd expect schemas/packages to have operators and functions for existing
types, not just new ones. That is certainly how our extensibility
features are used; we are extensible in several dimensions (types,
functions, operators) and they do not all travel together. We can't
guess at the future intent of a package developer, and placing
limitations or assumptions about what *must* be in a package just limits
future (unexpected or suprising) uses.

> Another take on that is that I expect the main user of (direct) function
> calls calling package functions will be other functions in that package,
> while the main users of operators will be places which have used a type
> from said package. Like queries pulling things out of tables using that
> type. So the function namespacing is a concenience/tool primarily for the
> package developer, while the operator and type namespacing is more a
> convenience for the end application developer.

We are probably drawing too fine a distinction here.

> Also, you seem to be wanting a path-search ability that is something like
> the PATH environment variable. This pathing is fundamentally different; to
> use unix terms, it is ".:..". The fundamental difference is that there are
> no "absolute" paths. The searching is totally location (of routine)
> dependant.
> To add something like an absolute path would totally break the whole
> motivation for packages. The idea is to give a developer an area overwhich
> s/he has total name control, but if s/he needs built-in routines, s/he
> doesn't need to say "standard." to get at them.
> If we allow something like "absolute paths" in the package namespacing,
> then we totally destroy that. Because a package developer can't be sure
> what pathing is going on, s/he really has no clue what packages will get
> found in what order. So then you have to be explicit in the name of all
> the functions you use (otherwise if a user essentially puts something
> other than "." at the head of the path, then you don't get routines in
> your own package), or run the risk of getting all sorts of run-time
> errors. A feature designed to make writing packages easier now makes them
> harder. That strikes me as a step backwards.

The "absolute path" scoping and lookup scheme is defined in SQL99. I'm
not sure I understand the issue in the last paragraph: you seem to be
making the point that absolute paths are Bad because package developers
don't know what those paths might be. But otoh allowing absolute paths
(and/or embedding them into a package) gives the developer *precise*
control over what their package calls and what the behaviors are. istm
that if a package developer needs to specify precisely the resources his
package requires, then he can do that. And if he wants to leave it
flexible and determined by scoping and pathing rules, then he can do
that too.

afaik relative pathing is not specified in the standard, but we might
want to consider how we would implement that as an extension and whether
that gives more power to the packager or developer.

> > > There is a built-in schema, "master". It will have a fixed oid, probalby 9
> > > or 11.
> > The built-in schemas is called DEFINITION_SCHEMA.
> Why is it different from the "DEFAULT" you get when you log into a
> database which doesn't have a schema whose name matches your username?

It may not be. But SQL99 specifies the name.
                - Thomas


Re: schema support, was Package support for Postgres

From
Peter Eisentraut
Date:
Bill Studenmund writes:

> Why? Operators are used differently than functions.

I don't think so.  Operators are a syntacticaly convenience for functions.
That's what they always have been and that's what they should stay.

> Conceptually the main determiner of what function you want is the name, at
> least as far as from what I can tell from talking with all the programmers
> I know.  Yes, we make sure the types match (are part of the primary key),
> but the name is the main concept. Operators, however, are more
> intent-based. The '+' operator means I want these two things added
> together. I don't care so much what types are involved, I want adding to
> happen. That's a difference of intent. And that's the reason that I think
> different namespacing rules make sense.

Naive developers all program by "intent".  If I invoke a + operator then I
expect it to add.  If I call a sqrt() function then I expect it to
calculate the square root.  If I execute an INSERT statement then I would
prefer that I did not delete anything.  Designing systems to work by
"intent" can be construed as an aspect of "user-friendliness".

But the more knowledgeable programmer is mildly aware of what's going on
behind the scenes: Both "+" and "sqrt" are just names for function code
that may or may not do what you think they do.  So this applies to both
functions and operators.

> Part of it is that I only expect a package to add operators for types it
> introduced.

This is an arbitrary restriction that you might find reasonable, but other
developers might not.

> Another take on that is that I expect the main user of (direct) function
> calls calling package functions will be other functions in that package,
> while the main users of operators will be places which have used a type
> from said package.

See above.

> To add something like an absolute path would totally break the whole
> motivation for packages.

Yes.  If we add something like subschemas then something more
sophisticated than a Unix-style path would have to be engineered.

> > The built-in schemas is called DEFINITION_SCHEMA.
>
> Why is it different from the "DEFAULT" you get when you log into a
> database which doesn't have a schema whose name matches your username?

Because SQL says so.

> > > The only other part (which is no small one) is to add namespacing to the
> > > rest of the backend. I expect that will mean adding a schema column to
> > > pg_class, pg_type, and pg_operator.
> >
> > Yup.  But you can replace the owner package with the schema column,
> > because the owner property will be transferred to the schema.
>
> Not necessarily. A user other than the one who owns the schema can add a
> package to it. It's the same thing as why we keep track of who added a
> function. :-)

Blech, I meant "you can replace the owner column with the schema column".

-- 
Peter Eisentraut   peter_e@gmx.net   http://funkturm.homeip.net/~peter



Re: schema support, was Package support for Postgres

From
Bill Studenmund
Date:
On Tue, 23 Oct 2001, Thomas Lockhart wrote:

> (I've been following the thread, at least casually ;)
>
> > intent-based. The '+' operator means I want these two things added
> > together. I don't care so much what types are involved, I want adding to
> > happen. That's a difference of intent. And that's the reason that I think
> > different namespacing rules make sense.
>
> But operators *are* functions underneath the covers. So different
> namespacing rules seem like a recipe for missed associations and
> unexpected results.

Underneath the covers, yes. But "those covers" make the difference in what
I'm thinking of. An operator isn't just one function call, it can be
multiple ones. And not just multiple itterations, but multiple different
ones depending on what the optimizer is doing. That's why you can give an
operator more than just the procedure operator. You also give it a join
proc and a restrict proc, and you tie it in with a commutator, negator,
and two sort operators. When you use an operator, you're specifying an
intent, and all of these parts of an operator's definition help make that
intent happen.

The problem though is that if operators are namespaced the same as
functions, then we destroy one of the benefits of packages - a seperate
namespace for functions.

Can you think of a specific example where this namespacing causes
problems? The functions and aggregates are namespaced off of the
containing schema, but the types and operators aren't. Inside the package,
you have access to everything in the package. In the enclosing schema, you
have immediate access to the types and operators, and can get at the
functions and aggregates by "packname.".

> > Part of it is that I only expect a package to add operators for types it
> > introduced. So to be considering them, you had to have done something that
> > ties in the type in the package. Like you had to make a column in a table
> > using it.
>
> I'd expect schemas/packages to have operators and functions for existing
> types, not just new ones. That is certainly how our extensibility
> features are used; we are extensible in several dimensions (types,
> functions, operators) and they do not all travel together. We can't
> guess at the future intent of a package developer, and placing
> limitations or assumptions about what *must* be in a package just limits
> future (unexpected or suprising) uses.

Please play with the patch and try it.

There is no restriction in the patch that operators (and functions &
aggregates) can only be for types new to the package. You can add
operators for built-in types, and you can even add operators for other
user-specified types too. And pg_dump will make sure that a user-defiend
type used in a package will get dumped before the package.

> The "absolute path" scoping and lookup scheme is defined in SQL99. I'm
> not sure I understand the issue in the last paragraph: you seem to be
> making the point that absolute paths are Bad because package developers
> don't know what those paths might be. But otoh allowing absolute paths
> (and/or embedding them into a package) gives the developer *precise*
> control over what their package calls and what the behaviors are. istm
> that if a package developer needs to specify precisely the resources his
> package requires, then he can do that. And if he wants to leave it
> flexible and determined by scoping and pathing rules, then he can do
> that too.
>
> afaik relative pathing is not specified in the standard, but we might
> want to consider how we would implement that as an extension and whether
> that gives more power to the packager or developer.

I've found the spec, and am still studying it. Though what I've found so
far is a schema search path. My main interest is for the package itself to
be the first thing searched. After that, whatever search path is
appropriate for the schema seems like the right thing to do. So, besides
the fact I think we should do schemas as per the spec, I think using the
schema search path is the right thing to do.

> > > The built-in schemas is called DEFINITION_SCHEMA.
> > Why is it different from the "DEFAULT" you get when you log into a
> > database which doesn't have a schema whose name matches your username?
>
> It may not be. But SQL99 specifies the name.

Actually, the most interesting thing I saw was in the start of chapter 21
(the chapter on the DEFINITION_SCHEMA) at the bottom of section 21.1.

"The specification provides only a model of the base tables that are
required, and does not imply that an SQL-implimentation shall provide the
functionality in the manner described in this clause."

As I understand that, we are free to impliment things as we wish. We just
need to have the pieces/functionality described therein. We *don't* have
to use the names or exact formats used in the spec.

As a concrete example of what I mean, I believe that we are free to
pg_attribute as it is and still comply with section 21.7 ATTRIBUTES base
table. UDT_NAME we impliment with attrelid, ATTRIBUTE_NAME we do with
attname, ORDINAL_POSITION -> attnum, ATTRIBUTE_DEFAULT we do, IS_NULLABLE
-> attnotnull (we twist the sense, but the functionality is there),
ATTRIBUTES_PRIMARY_KEY we agree (to the extent we don't support schemas
or catalogs yet).

The main thing is that we eventually have an INFORMATION_SCHEMA full of
views which turn the system tables into what the standards want.

Take care,

Bill