Thread: Generalized concept of modules

Generalized concept of modules

From
Tom Lane
Date:
[ moving this thread to -hackers ]

Martijn van Oosterhout <kleptog@svana.org> writes:
> While you do have a good point about non-binary modules, our module
> handling need some help IMHO. For example, the current hack for CREATE
> LANGUAGE to fix things caused by old pg_dumps. I think that's the
> totally wrong approach long term, I think the pg_dump shouldn't be
> including the CREATE LANGUAGE statement at all, but should be saying
> something like "INSTALL plpgsql" and pg_restore works out what is
> needed for that module.

There's a lot to be said for this, but I keep having the nagging
feeling that people are equating "module" with "shared library", which
seems far from sufficiently general.  I'd like to see "module" mean
"an arbitrary collection of SQL objects".  So I think the raw definition
sought by your "INSTALL" would always be a SQL script, and any shared
libs that might come along with that are secondary.  The idea of using
pg_depend to manage UNINSTALL is an excellent one.

> 1. When given the name of an external module, you need to be able to
> find the SQL commands needed to make it work.

No problem, the name is the name of a SQL script file stored in a specific
installation directory.

> 2. You need to be able to tell if something is installed already or
> not.

pg_module system catalog.  You'd need this anyway since there has to be
some representation of the "module object" in the catalogs for its
component objects to have pg_depend dependencies on.

> With these in place, upgrades across versions of postgres could become
> a lot easier. People using tsearch2 now would get only "INSTALL
> tsearch2" in their dumps and when they upgrade to 8.2 they get the new
> definitions for tsearch using GIN. No old definitions to confuse people
> or the database. (Note: I'm not sure if tsearch would be compatable at
> the query level, but that's not relevent to the point I'm making).

Let's see, I guess pg_dump would have to be taught to ignore any objects
that it can see are directly dependent on a module object.  What about
indirect dependencies though?  The exact semantics don't seem clear to me.

Also, this seems to be getting into territory that Oracle has already
trod --- someone should study exactly what they do for PL/SQL modules
and whether we want to be compatible or not.  Perhaps there's even
something in SQL2003 about it?
        regards, tom lane


Re: Generalized concept of modules

From
Martijn van Oosterhout
Date:
On Wed, May 31, 2006 at 05:33:44PM -0400, Tom Lane wrote:
> Martijn van Oosterhout <kleptog@svana.org> writes:
> > While you do have a good point about non-binary modules, our module
> > handling need some help IMHO. For example, the current hack for CREATE
> > LANGUAGE to fix things caused by old pg_dumps. I think that's the
> > totally wrong approach long term, I think the pg_dump shouldn't be
> > including the CREATE LANGUAGE statement at all, but should be saying
> > something like "INSTALL plpgsql" and pg_restore works out what is
> > needed for that module.
>
> There's a lot to be said for this, but I keep having the nagging
> feeling that people are equating "module" with "shared library", which
> seems far from sufficiently general.  I'd like to see "module" mean
> "an arbitrary collection of SQL objects".

I agree that module is often used interchangably with shared library.
We need to handle the other case too. It would be a lot easier if we
had an example of an SQL only module, since contrib doesn't appear to
have one (at first glance anyway).

> So I think the raw definition
> sought by your "INSTALL" would always be a SQL script, and any shared
> libs that might come along with that are secondary.  The idea of using
> pg_depend to manage UNINSTALL is an excellent one.

Well, in that case I'd like to give some concrete suggestions:

1. The $libdir in future may be used to find SQL scripts as well as
shared libraries. They'll have different extensions so no possibility
of conflict.

2. Create something like "BEGIN MODULE xxx" which starts a transaction
and marks any objects created within it as owned by module "xxx". I
think it should be tied to a transaction level to avoid half installed
things, but maybe people would prefer it to work more like schemas.

> pg_module system catalog.  You'd need this anyway since there has to be
> some representation of the "module object" in the catalogs for its
> component objects to have pg_depend dependencies on.

Ack. "Owned by" in the above sense means that the object depends on the
module. You could do it the other way round (module depends on object)
but that makes it harder to change things manually. DROP MODULE would
work easier too.

> Let's see, I guess pg_dump would have to be taught to ignore any objects
> that it can see are directly dependent on a module object.  What about
> indirect dependencies though?  The exact semantics don't seem clear to me.

At a base level, you could definitly drop the functions. Dropping types
is harder because columns might be using them. Normally we use CASCADE
to specify that.

> Also, this seems to be getting into territory that Oracle has already
> trod --- someone should study exactly what they do for PL/SQL modules
> and whether we want to be compatible or not.  Perhaps there's even
> something in SQL2003 about it?

Probably a good idea...

Have a nice day,
--
Martijn van Oosterhout   <kleptog@svana.org>   http://svana.org/kleptog/
> From each according to his ability. To each according to his ability to litigate.

Re: Generalized concept of modules

From
Tom Lane
Date:
Martijn van Oosterhout <kleptog@svana.org> writes:
> Well, in that case I'd like to give some concrete suggestions:

> 1. The $libdir in future may be used to find SQL scripts as well as
> shared libraries. They'll have different extensions so no possibility
> of conflict.

No, it needs to be a separate directory, and the reason is that SQL
scripts are architecture-independent and belong under the share/
directory not the lib/ directory.  This is a minor point, but the
packagers will scream at us if we get it wrong.

> 2. Create something like "BEGIN MODULE xxx" which starts a transaction
> and marks any objects created within it as owned by module "xxx". I
> think it should be tied to a transaction level to avoid half installed
> things, but maybe people would prefer it to work more like schemas.

I think I'd sooner keep it decoupled from transactions, which means you
need both a BEGIN MODULE xxx and an END MODULE xxx.  Also, a module
developer might want to go back and add more stuff to an existing
module.  So there should be separate commands:CREATE MODULE xxx;BEGIN MODULE xxx;... anything created here belongs to
themoduleEND MODULE xxx;
 
An alternative possibility is to make it work like schemas: you set
a GUC variable to indicate that created objects should be associated
with that module.  This might be the best choice since it avoids having
two very different ways of doing very similar things.

(Come to think of it, what is the relation between modules and schemas
anyway?  Should a module's objects be required to be in schemas also
owned by the module?  If not, what happens when you try to INSTALL a
module whose objects lived in a schema you don't have?  This gets back
to the fact that we don't have a very nice answer for installing
existing contrib modules into user-chosen schemas.)

>> Let's see, I guess pg_dump would have to be taught to ignore any objects
>> that it can see are directly dependent on a module object.  What about
>> indirect dependencies though?  The exact semantics don't seem clear to me.

> At a base level, you could definitly drop the functions. Dropping types
> is harder because columns might be using them. Normally we use CASCADE
> to specify that.

I think we're talking at cross purposes.  It sounds like you're thinking
about "how do I remove one object in a module?"  AFAICS you just drop it.
What I was wondering was what is pg_dump's behavior.  ISTM we want two
modes:

* normal behavior: dump module objects as "INSTALL foo" commands.  Do
not dump any objects that are owned by a module; assume they will be
created by the INSTALL.  Use the dependency chains to make sure that
INSTALL commands are ordered properly relative to everything else.

* "dump module foo": dump the module object as a CREATE MODULE command,
and then dump creation commands for all the objects that are owned by
it.  Ignore all else.  This is an easy way to generate an updated module
definition script.

Something that's not at all clear to me is object ownership.  Should
objects belonging to a module be marked as being owned by the person
executing INSTALL, or should module dump/install try to preserve
original ownership?  I think in most scenarios trying to preserve the
original ownership is wrong, since commonly INSTALL will be used to
transfer objects to new databases where the original owner might not
exist at all.  But it's a bit non-orthogonal.

Also, what privileges are needed to execute either CREATE MODULE or
INSTALL?  Conservative design would say superuser-only, but I can't put
my finger on any specific reason for that, at least if none of the
contained objects need superuser privs to create.
        regards, tom lane


Re: Generalized concept of modules

From
PFC
Date:
Think about version API compatibility.
Suppose you have a working database on server A which uses module foo  
version 1.Some time passes, you buy another server B and install postgres on it.  
Meanwhile the module foo has evolved into version 2 which is cooler, but  
has some minor API incompatibilities.You dump the database on server A and reload it on server B. pg_dump  
issues an INSTALL MODULE which installs foo version 2 on the new server.Due to the "minor API incompatibilities", your
databasebreaks.
 
It's really cool not to pollute the dumps (and the global namespace...)  
with all the module functions, however implementing module functionality  
can be tricky.
So don't forget about versions and possible incompatibilities ; also  
versions means you might need an UPGRADE MODULE which does more than  
uninstall + reinstall. Suppose a module has created some tables for its  
use, these shouldn't be dumped when upgrading to a new version ; however  
maybe the new version will want to add a column...
Think gentoo portage, for instance.This excellent package system is a lot more evolved than the module  
system needs to be, but having a look at the feature list would be a good  
inspiration maybe.