Thread: Generalized concept of modules
[ moving this thread to -hackers ] Martijn van Oosterhout <kleptog@svana.org> writes: > While you do have a good point about non-binary modules, our module > handling need some help IMHO. For example, the current hack for CREATE > LANGUAGE to fix things caused by old pg_dumps. I think that's the > totally wrong approach long term, I think the pg_dump shouldn't be > including the CREATE LANGUAGE statement at all, but should be saying > something like "INSTALL plpgsql" and pg_restore works out what is > needed for that module. There's a lot to be said for this, but I keep having the nagging feeling that people are equating "module" with "shared library", which seems far from sufficiently general. I'd like to see "module" mean "an arbitrary collection of SQL objects". So I think the raw definition sought by your "INSTALL" would always be a SQL script, and any shared libs that might come along with that are secondary. The idea of using pg_depend to manage UNINSTALL is an excellent one. > 1. When given the name of an external module, you need to be able to > find the SQL commands needed to make it work. No problem, the name is the name of a SQL script file stored in a specific installation directory. > 2. You need to be able to tell if something is installed already or > not. pg_module system catalog. You'd need this anyway since there has to be some representation of the "module object" in the catalogs for its component objects to have pg_depend dependencies on. > With these in place, upgrades across versions of postgres could become > a lot easier. People using tsearch2 now would get only "INSTALL > tsearch2" in their dumps and when they upgrade to 8.2 they get the new > definitions for tsearch using GIN. No old definitions to confuse people > or the database. (Note: I'm not sure if tsearch would be compatable at > the query level, but that's not relevent to the point I'm making). Let's see, I guess pg_dump would have to be taught to ignore any objects that it can see are directly dependent on a module object. What about indirect dependencies though? The exact semantics don't seem clear to me. Also, this seems to be getting into territory that Oracle has already trod --- someone should study exactly what they do for PL/SQL modules and whether we want to be compatible or not. Perhaps there's even something in SQL2003 about it? regards, tom lane
On Wed, May 31, 2006 at 05:33:44PM -0400, Tom Lane wrote: > Martijn van Oosterhout <kleptog@svana.org> writes: > > While you do have a good point about non-binary modules, our module > > handling need some help IMHO. For example, the current hack for CREATE > > LANGUAGE to fix things caused by old pg_dumps. I think that's the > > totally wrong approach long term, I think the pg_dump shouldn't be > > including the CREATE LANGUAGE statement at all, but should be saying > > something like "INSTALL plpgsql" and pg_restore works out what is > > needed for that module. > > There's a lot to be said for this, but I keep having the nagging > feeling that people are equating "module" with "shared library", which > seems far from sufficiently general. I'd like to see "module" mean > "an arbitrary collection of SQL objects". I agree that module is often used interchangably with shared library. We need to handle the other case too. It would be a lot easier if we had an example of an SQL only module, since contrib doesn't appear to have one (at first glance anyway). > So I think the raw definition > sought by your "INSTALL" would always be a SQL script, and any shared > libs that might come along with that are secondary. The idea of using > pg_depend to manage UNINSTALL is an excellent one. Well, in that case I'd like to give some concrete suggestions: 1. The $libdir in future may be used to find SQL scripts as well as shared libraries. They'll have different extensions so no possibility of conflict. 2. Create something like "BEGIN MODULE xxx" which starts a transaction and marks any objects created within it as owned by module "xxx". I think it should be tied to a transaction level to avoid half installed things, but maybe people would prefer it to work more like schemas. > pg_module system catalog. You'd need this anyway since there has to be > some representation of the "module object" in the catalogs for its > component objects to have pg_depend dependencies on. Ack. "Owned by" in the above sense means that the object depends on the module. You could do it the other way round (module depends on object) but that makes it harder to change things manually. DROP MODULE would work easier too. > Let's see, I guess pg_dump would have to be taught to ignore any objects > that it can see are directly dependent on a module object. What about > indirect dependencies though? The exact semantics don't seem clear to me. At a base level, you could definitly drop the functions. Dropping types is harder because columns might be using them. Normally we use CASCADE to specify that. > Also, this seems to be getting into territory that Oracle has already > trod --- someone should study exactly what they do for PL/SQL modules > and whether we want to be compatible or not. Perhaps there's even > something in SQL2003 about it? Probably a good idea... Have a nice day, -- Martijn van Oosterhout <kleptog@svana.org> http://svana.org/kleptog/ > From each according to his ability. To each according to his ability to litigate.
Martijn van Oosterhout <kleptog@svana.org> writes: > Well, in that case I'd like to give some concrete suggestions: > 1. The $libdir in future may be used to find SQL scripts as well as > shared libraries. They'll have different extensions so no possibility > of conflict. No, it needs to be a separate directory, and the reason is that SQL scripts are architecture-independent and belong under the share/ directory not the lib/ directory. This is a minor point, but the packagers will scream at us if we get it wrong. > 2. Create something like "BEGIN MODULE xxx" which starts a transaction > and marks any objects created within it as owned by module "xxx". I > think it should be tied to a transaction level to avoid half installed > things, but maybe people would prefer it to work more like schemas. I think I'd sooner keep it decoupled from transactions, which means you need both a BEGIN MODULE xxx and an END MODULE xxx. Also, a module developer might want to go back and add more stuff to an existing module. So there should be separate commands:CREATE MODULE xxx;BEGIN MODULE xxx;... anything created here belongs to themoduleEND MODULE xxx; An alternative possibility is to make it work like schemas: you set a GUC variable to indicate that created objects should be associated with that module. This might be the best choice since it avoids having two very different ways of doing very similar things. (Come to think of it, what is the relation between modules and schemas anyway? Should a module's objects be required to be in schemas also owned by the module? If not, what happens when you try to INSTALL a module whose objects lived in a schema you don't have? This gets back to the fact that we don't have a very nice answer for installing existing contrib modules into user-chosen schemas.) >> Let's see, I guess pg_dump would have to be taught to ignore any objects >> that it can see are directly dependent on a module object. What about >> indirect dependencies though? The exact semantics don't seem clear to me. > At a base level, you could definitly drop the functions. Dropping types > is harder because columns might be using them. Normally we use CASCADE > to specify that. I think we're talking at cross purposes. It sounds like you're thinking about "how do I remove one object in a module?" AFAICS you just drop it. What I was wondering was what is pg_dump's behavior. ISTM we want two modes: * normal behavior: dump module objects as "INSTALL foo" commands. Do not dump any objects that are owned by a module; assume they will be created by the INSTALL. Use the dependency chains to make sure that INSTALL commands are ordered properly relative to everything else. * "dump module foo": dump the module object as a CREATE MODULE command, and then dump creation commands for all the objects that are owned by it. Ignore all else. This is an easy way to generate an updated module definition script. Something that's not at all clear to me is object ownership. Should objects belonging to a module be marked as being owned by the person executing INSTALL, or should module dump/install try to preserve original ownership? I think in most scenarios trying to preserve the original ownership is wrong, since commonly INSTALL will be used to transfer objects to new databases where the original owner might not exist at all. But it's a bit non-orthogonal. Also, what privileges are needed to execute either CREATE MODULE or INSTALL? Conservative design would say superuser-only, but I can't put my finger on any specific reason for that, at least if none of the contained objects need superuser privs to create. regards, tom lane
Think about version API compatibility. Suppose you have a working database on server A which uses module foo version 1.Some time passes, you buy another server B and install postgres on it. Meanwhile the module foo has evolved into version 2 which is cooler, but has some minor API incompatibilities.You dump the database on server A and reload it on server B. pg_dump issues an INSTALL MODULE which installs foo version 2 on the new server.Due to the "minor API incompatibilities", your databasebreaks. It's really cool not to pollute the dumps (and the global namespace...) with all the module functions, however implementing module functionality can be tricky. So don't forget about versions and possible incompatibilities ; also versions means you might need an UPGRADE MODULE which does more than uninstall + reinstall. Suppose a module has created some tables for its use, these shouldn't be dumped when upgrading to a new version ; however maybe the new version will want to add a column... Think gentoo portage, for instance.This excellent package system is a lot more evolved than the module system needs to be, but having a look at the feature list would be a good inspiration maybe.