Thread: PostgreSQL extensions packaging
-----BEGIN PGP SIGNED MESSAGE----- Hash: SHA1 Hi, I promised to have an in-depth look at the archives before to spend time on my ideas for $subject, but failed to do so. Here are the ideas (user level design if you will) :) As a PostgreSQL extension developer (see http://pgfoundry.org/projects/prefix) I'd like to benefit from a dump and restorefacility. We don't offer any easy way for the DBA to restore a dump which happen to depends on external modules. This proposal tries to solve this by providing a notion of package. A module is currently, as far as I understand it, a .so file installed into some superuser (postgres) owned filesystem place. I'm unsure if the related .sql file (needed to expose the module functions) is a part of the module notion, but I don't think so. A package is a namespace much comparable to a schema, to be found at the same level (in a database can live many packages), and allowed to have any SQL object under it. A package can also host variables, which visibility are package global: any SQL into the package can refer directly to package variables. And a package can host modules dependancies (not the .so code itself). Let's try with an example of an imaginary package declaration: create or replace package prefix_range with (version = 0.3, parameter = value, ...) as $pkg$ declare prefix_range_global_vartext := 'init value'; prefix_range_syntax_error exception; module prefix.so; begin create schemaprefix; create or replace function prefix_range_in(cstring) ...; create or replace function prefix_range_out(prefix_range) ...; create type prefix_range; create function ... create operator ... create operator class ... -- private hidden things? create role prefix nologin; create schema prefix_private owner to prefix; create tableprefix_private.relname ...; revoke all privileges on prefix_private to public; -- private stuff ... -- create table, create index, etc ... -- -- need some though as to how to allow this fromSQL objects -- declared into the package *only* end; $pkg$; The parameters in the with clause are visible inside the package body and allow package installer to tune the installation: we could use this for tablespace creation needs, e.g., and version at least should be displayed from \dP associated command (is this one free?). This package creation SQL command would fail if any contained SQL is wrong, of course, but also if one of the declared modules were not registered/known/found by the server. We would certainly want to add a package scope construct to existing CREATE commands, in order to be able to add a function to a package without re-running the entire create package command, but this could come at a later date: CREATE FUNCTION ... PACKAGE prefix ... Given this infrastructure, pg_dump would (have to) be able to dump the SQL and pg_restore to complain when the module dependancies are not met, error'ing out a list of modules to install. Now, what would be really good to have would be this pg_pkg command I was dreaming about in another -hacker mail: pg_pkg add-mirror http://packages.postgresql.org/ pg-pkg list [remote | available]pg_pkg add plproxy prefix citext pg_pkg install plproxy mydatabase pg_pkg uninstall [--force] plproxy mydatabasepg_pkg remove <package> ... ... First, we could have a packages.postgresql.org infrastructure where to provide source code packages depending on PostgreSQL major version. Those packages would be known to have received code review and -core acceptance, so would be as trustworthy as PostgreSQL itself is. And ideally, any developer could prepare his own PostgreSQL packaging facility where to propose his own packages, this time out of -core acceptance, but still integrated into the extension system. pg_pkg add <package> ... would fetch a source code archive (last version available, or maybe given version with pg_pkg add prefix=0.3 if we really want this feature) and compile and install it with PGXS. So you would need to have installed server development support to benefit from package distribution... Then pg_pkg install would install given package into given database, running its CREATE OR REPLACE PACKAGE sql script, responsible of package object creation and variable, tables, etc initialisation. The uninstall command would get rid of the package, only to produce errors if some object existing in the target database had some dependancy to the package, the -f would force a DROP PACKAGE pkgname CASCADE; The remove would get rid of the installed files (modules and .sql), only when the package is no more in use in any database of the cluster. With this command set and pg_restore giving a list of missing modules for each package of a given dump file, it would become easy to restore a database containing extensions. $ pg_restore ... ERROR: failed to create package prefix, missing module prefix.so $pg_pkg add prefix $ pg_restore ... $ psql -U myuser mydb && enjoy :) Of course, in case of a remote pg_restore call, the pg_pkg command line has to be done locally on the target server. Maybe this is a problem for share hosting facilities, but I don't see pg_restore going to compile and install stuff on the filesystem by itself. Anyone willing to share some comments on this dream? - -- Dimitri Fontaine -----BEGIN PGP SIGNATURE----- Version: GnuPG v1.4.9 (Darwin) iEYEARECAAYFAkiHncQACgkQlBXRlnbh1bnreQCcCYMfln8BqDAcGzs1pTBe9RVa aN8AnjZ3viA9xbVg4Ka2lS0eIrbOJFpV =FGYQ -----END PGP SIGNATURE-----
Hi! On Wed, Jul 23, 2008 at 5:08 PM, Dimitri Fontaine <dfontaine@hi-media.com> wrote: > I promised to have an in-depth look at the archives before to spend time on > my ideas for $subject, but failed to do so. I guess that means you missed both the original discussion at http://archives.postgresql.org/pgsql-hackers/2008-04/msg00132.php and my initial patch in that direction and subsequent discussion at http://archives.postgresql.org/pgsql-hackers/2008-04/msg00132.php then :(. There were two core components to my idea of modules/packages:- Separation of installation at an OS level (RPM/yum, deb/dpkg,MSI installer etc) and installation into a database. The intention was a) to standardize package installation generally so that users didn't have to read n different sets of installation instructions for n different packages, and b) so that a db owner could install into their own database any module that had been installed on the system, even if that might include e.g. C functions that they otherwise would not be able to install without being a superuser. - Have dependency tracking so that pg_dump could emit e.g. "LOAD MODULE foo;" rather than all the different instructions to recreate the module. So the proposed installation procedure would be more along the lines of: yum install postgresql-module-postgis echo "load module postgis" | psql mydb My intention was to use whatever native package manager was appropriate for your distro rather than trying to recreate CPAN, although some people in the original discussion wanted to go down that route. The patch that I provided didn't do any of the dependency stuff yet - I had been investigating various ways to do it automagically, although I haven't worked on it for a little while. It may be that the straight forward explicit declaration that you have here is a better way to do it. I didn't have versioning and interdependencies between modules yet, although it's an obvious extension to the idea. > A package can also host variables, which visibility are > package global: any SQL into the package can refer directly to package > variables. That was way out of scope for my more modest suggestion - I certainly wasn't going to change pl/pgsql semantics. For example, how do those variables behave upon a transaction rollback? > Now, what would be really good to have would be this pg_pkg command I was > dreaming about in another -hacker mail: This turns into recreating CPAN. I like the idea of a "blessed" set of packages, but would rather not require all postgresql users to have a full build environment (especially on windows) and have to replace their native packaging solution. It seems that you agree that fetching/installing should be separate from loading/installing into the database. Good. Some posters on the original thread were suggesting that the fetch/install step should somehow do the database installation as well, which sounded like a huge can of worms. I think that we can come up with a package/module format that allows installation at the OS level without demanding a whole set of download / build machinery. If someone then wants to build that and have it install packages, then fine, but we definitely should not require it to be able to install stuff. Look forward to your comments Cheers Tom
Oops, sent with wrong from header... ---------- Forwarded message ---------- From: "Tom Dunstan" <tomdcc@gmail.com> To: "Dimitri Fontaine" <dfontaine@hi-media.com> Date: Wed, 23 Jul 2008 19:40:30 -0400 Subject: Re: [HACKERS] PostgreSQL extensions packaging Hi! On Wed, Jul 23, 2008 at 5:08 PM, Dimitri Fontaine <dfontaine@hi-media.com> wrote: > I promised to have an in-depth look at the archives before to spend time on > my ideas for $subject, but failed to do so. I guess that means you missed both the original discussion at http://archives.postgresql.org/pgsql-hackers/2008-04/msg00132.php and my initial patch in that direction and subsequent discussion at http://archives.postgresql.org/pgsql-hackers/2008-04/msg00132.php then :(. There were two core components to my idea of modules/packages:- Separation of installation at an OS level (RPM/yum, deb/dpkg,MSI installer etc) and installation into a database. The intention was a) to standardize package installation generally so that users didn't have to read n different sets of installation instructions for n different packages, and b) so that a db owner could install into their own database any module that had been installed on the system, even if that might include e.g. C functions that they otherwise would not be able to install without being a superuser. - Have dependency tracking so that pg_dump could emit e.g. "LOAD MODULE foo;" rather than all the different instructions to recreate the module. So the proposed installation procedure would be more along the lines of: yum install postgresql-module-postgis echo "load module postgis" | psql mydb My intention was to use whatever native package manager was appropriate for your distro rather than trying to recreate CPAN, although some people in the original discussion wanted to go down that route. The patch that I provided didn't do any of the dependency stuff yet - I had been investigating various ways to do it automagically, although I haven't worked on it for a little while. It may be that the straight forward explicit declaration that you have here is a better way to do it. I didn't have versioning and interdependencies between modules yet, although it's an obvious extension to the idea. > A package can also host variables, which visibility are > package global: any SQL into the package can refer directly to package > variables. That was way out of scope for my more modest suggestion - I certainly wasn't going to change pl/pgsql semantics. For example, how do those variables behave upon a transaction rollback? > Now, what would be really good to have would be this pg_pkg command I was > dreaming about in another -hacker mail: This turns into recreating CPAN. I like the idea of a "blessed" set of packages, but would rather not require all postgresql users to have a full build environment (especially on windows) and have to replace their native packaging solution. It seems that you agree that fetching/installing should be separate from loading/installing into the database. Good. Some posters on the original thread were suggesting that the fetch/install step should somehow do the database installation as well, which sounded like a huge can of worms. I think that we can come up with a package/module format that allows installation at the OS level without demanding a whole set of download / build machinery. If someone then wants to build that and have it install packages, then fine, but we definitely should not require it to be able to install stuff. Look forward to your comments Cheers Tom
Hi, Le jeudi 24 juillet 2008, Tom Dunstan a écrit : > I guess that means you missed both the original discussion at > http://archives.postgresql.org/pgsql-hackers/2008-04/msg00132.php and > my initial patch in that direction and subsequent discussion at > http://archives.postgresql.org/pgsql-hackers/2008-04/msg00132.php then Thanks for the links, I've read a little down there now :) > There were two core components to my idea of modules/packages: First reading makes me think your proposal is all about having a user-visible management of modules, which in my proposal are a part of packages, and a much needed one. So it seems to me both proposals are complementary, in that I didn't go in any detail about how to manage this module part of a package declaration, and it looks like your work is all about this. Where we're trying to solve the same issue(s) is on the OS level packaging. > - Separation of installation at an OS level (RPM/yum, deb/dpkg, MSI > installer etc) and installation into a database. The intention was a) > to standardize package installation generally so that users didn't > have to read n different sets of installation instructions for n > different packages, and b) so that a db owner could install into their > own database any module that had been installed on the system, even if > that might include e.g. C functions that they otherwise would not be > able to install without being a superuser. I'm proposing that PostgreSQL includes a source level package management toolset, and the OS distributions take advantage of it to release binary packages easy to install, as it's done now with make && make install (using PGXS) at PG level. As you're saying, OS install means the same thing as PGXS make install, that is having the .so and .sql files at the right place and in the right format. So even if PostgreSQL was to propose a source level integration with pg_pkg add <package>, distributions would still be left with the binary packaging work. As for the database level installation, I think this is best done by PostgreSQL itself this time, I'd much prefer the distributions not to bother with pg_pkg install <package> <database>. Of course, debian wrapper scripts would certainly repackage this in order for the user to choose which cluster to target here. > - Have dependency tracking so that pg_dump could emit e.g. "LOAD > MODULE foo;" rather than all the different instructions to recreate > the module. That could be what the module section of create package means internally. I don't foresee a need for separating module only management stuff out of package, but I'm all ears :) > So the proposed installation procedure would be more along the lines of: > > yum install postgresql-module-postgis > echo "load module postgis" | psql mydb Agreed, but with those little differences:- PostgreSQL provides pg_pkg add to distribution to ease binary packaging- apt-getinstall postgresql-module-8.3-prefix- and either $ pg_pkg install prefix mydb or $ psql -c "INSTALL PACKAGEprefix" mydb > My intention was to use whatever native package manager was > appropriate for your distro rather than trying to recreate CPAN, > although some people in the original discussion wanted to go down that > route. I know nothing about CPAN, but I hope offering tools for packagers to ease their work is a good idea. Plus this allows for the PostgreSQL project approved extensions, -core level quality, reviewed code at an easy to grasp place. And it allows advanced user, who compile their PostgreSQL theirself, to still benefit from a level of OS integration for packages. > The patch that I provided didn't do any of the dependency stuff yet - > I had been investigating various ways to do it automagically, although > I haven't worked on it for a little while. It may be that the straight > forward explicit declaration that you have here is a better way to do > it. It seems to me that your patch would certainly be a step towards implementing my idea of a package. > I didn't have versioning and interdependencies between modules yet, > although it's an obvious extension to the idea. And a much necessary one. As soon as we have a SQL level object for modules, with oids in the catalog and all, we surely are able to add entries in pg_depend about this? > > A package can also host variables, which visibility are > > package global: any SQL into the package can refer directly to package > > variables. > > That was way out of scope for my more modest suggestion - I certainly > wasn't going to change pl/pgsql semantics. For example, how do those > variables behave upon a transaction rollback? No idea yet, I just saw that Oracle packages host package level global variables, and I guess it would work the same as a SET [LOCAL] GUC, except you could only see the variable from objects within the package. > This turns into recreating CPAN. I like the idea of a "blessed" set of > packages, but would rather not require all postgresql users to have a > full build environment (especially on windows) and have to replace > their native packaging solution. As said before, I'm thinking about providing this pg_pkg add as a packager facility, not replacing binary distributions, but still available for advanced user. > I think that we can come up with a package/module format that allows > installation at the OS level without demanding a whole set of download > / build machinery. I think this part is up to distributions. Oh, and... well, I don't think I'm in a position to write the code to make this packaging idea a reality. I'm willing to contribute as a non-hacker here by helping to define a user-level specification which would need to find a developer. Hope this helps, regards, -- dim