Thread: Support for CREATE MODULE?

Support for CREATE MODULE?

From
Jim Mlodgenski
Date:
Working with users over the years, some have large libraries of server
side code sometimes consisting of 100k+ lines of code over 1000+ functions
and procedures. This usually comes from a migration of a commercial
database like Oracle where it was best practice to put all of your
business logic into stored procedures. In these types of apps, just
managing the code is a challenge. To help classify objects, schemas
are used, but you are at the mercy of a naming convention to show
association. For example, a frequent naming convention would be having
related schemas with the names of foo_bar and foo_baz. For devs, that's
akin to keeping a file like xlog.c in a directory structure like
backend_access_transam instead of backend/access/transam. IMHO, having
a hierarchy makes it simpler to reason about related code bits.

The SQL spec does have a concept of modules that help address this. It's
defined as a persistent object within a schema that contains one or more
routines. It also defines other things like local temporary tables and
path specifications. There are other databases like DB2 that have
implemented module support each with their own way of defining the
routines within the module. The spec doesn't really give guidance on
how to manipulate the objects within the module.

Attached is a POC patch for modules. I modeled it as a sub-schema because
that is more what it seems like to me. It adds additional columns to
pg_namespace and allows for 3-part (or 4 with the database name) naming
of objects within the module. This simple example works with the patch.

CREATE SCHEMA foo;
CREATE MODULE foo.bar
  CREATE FUNCTION hello() RETURNS text
     LANGUAGE sql
     RETURN 'hello'
  CREATE FUNCTION world() RETURNS text
     LANGUAGE sql
     RETURN 'world';
SELECT foo.bar.hello();

Questions
- Do we want to add module support?

- If we do, should it be implemented as a type of namespace or should it
  be its own object type that lives in something like pg_module?

- How should users interact with objects within a module? They could be
  mostly independent like the current POC or we can introduce a path like
  ALTER MODULE foo ADD FUNCTION blah

--Jim

Attachment

Re: Support for CREATE MODULE?

From
Tom Lane
Date:
Jim Mlodgenski <jimmy76@gmail.com> writes:
> Questions
> - Do we want to add module support?

Certainly many people have asked for that, or things like that.

> - If we do, should it be implemented as a type of namespace or should it
>   be its own object type that lives in something like pg_module?

While I didn't read the actual patch, your sketch just above this makes
me want to run away screaming.  In the first place, what do you think
the primary key of pg_namespace is now?  But the bigger problem is that
sub-namespaces just do not work in SQL syntax.  Back when we first added
schema support, I had some ambitions towards allowing nested schemas,
which is a big part of the reason why pg_namespace is named that and not
pg_schema.  But the idea fell apart after I understood the syntactic
ambiguities it'd introduce.  It's already quite hard to tell which part
of a multiply.qualified.name is which, given that SQL says that you can
optionally put a "catalog" (database) name in front of the others.
I really doubt there is a way to shoehorn sub-schemas in there without
creating terrible ambiguities.  Is "a.b.c" a reference to object c in
schema b in database a, or is it a reference to object c in sub-schema b
in schema a?  This is why we've ended up with bastard syntax like
(table.column).subcolumn.

> - How should users interact with objects within a module? They could be
>   mostly independent like the current POC or we can introduce a path like
>   ALTER MODULE foo ADD FUNCTION blah

I wonder whether it'd be better to consider modules as a kind of
extension, or at least things with the same sort of ownership relations
as extensions have.

            regards, tom lane



Re: Support for CREATE MODULE?

From
Jim Mlodgenski
Date:
On Wed, Jun 2, 2021 at 9:58 AM Tom Lane <tgl@sss.pgh.pa.us> wrote:
>
> In the first place, what do you think the primary key of pg_namespace is now?

In the patch the unique constraint is (nspname, nspnamespace) which is
certainly awkward. I initially went down the pg_module route to avoid
adding another catalog, but in retrospect, that may be a cleaner way.


> It's already quite hard to tell which part
> of a multiply.qualified.name is which, given that SQL says that you can
> optionally put a "catalog" (database) name in front of the others.
> I really doubt there is a way to shoehorn sub-schemas in there without
> creating terrible ambiguities.  Is "a.b.c" a reference to object c in
> schema b in database a, or is it a reference to object c in sub-schema b
> in schema a?

That was the area I had the most difficult part to reason about. I tried to make
some simplifying assumptions by checking if "a" was the current database.
Since we don't support cross database access, if it was not, I assumed "a"
was a schema. I not sure if that would be valid, but it did scope things
to a more manageable problem.

> I wonder whether it'd be better to consider modules as a kind of
> extension, or at least things with the same sort of ownership relations
> as extensions have.

That would solve the problem of associating objects which is the larger
problem for users today. The objects can all live in their respective
schemas with the module tying them all together.



Re: Support for CREATE MODULE?

From
Bruce Momjian
Date:
On Wed, Jun  2, 2021 at 10:43:10AM -0400, Jim Mlodgenski wrote:
> On Wed, Jun 2, 2021 at 9:58 AM Tom Lane <tgl@sss.pgh.pa.us> wrote:
> >
> > In the first place, what do you think the primary key of pg_namespace is now?
> 
> In the patch the unique constraint is (nspname, nspnamespace) which is
> certainly awkward. I initially went down the pg_module route to avoid
> adding another catalog, but in retrospect, that may be a cleaner way.
> 
> 
> > It's already quite hard to tell which part
> > of a multiply.qualified.name is which, given that SQL says that you can
> > optionally put a "catalog" (database) name in front of the others.
> > I really doubt there is a way to shoehorn sub-schemas in there without
> > creating terrible ambiguities.  Is "a.b.c" a reference to object c in
> > schema b in database a, or is it a reference to object c in sub-schema b
> > in schema a?
> 
> That was the area I had the most difficult part to reason about. I tried to make
> some simplifying assumptions by checking if "a" was the current database.
> Since we don't support cross database access, if it was not, I assumed "a"
> was a schema. I not sure if that would be valid, but it did scope things
> to a more manageable problem.

If we go in this direction, I assume we would just disallow a schema
name matching the database name.  CREATE DATABASE with TEMPLATE would
have to check that.  Also the common case where you create a database
name to match the user name, and also a schema inside to match the
username, would have to be disallowed, e.g. creating a 'postgres' schema
to match the 'postgres' user in the 'postgres' database.

-- 
  Bruce Momjian  <bruce@momjian.us>        https://momjian.us
  EDB                                      https://enterprisedb.com

  If only the physical world exists, free will is an illusion.




Re: Support for CREATE MODULE?

From
Joe Conway
Date:
On 6/2/21 10:43 AM, Jim Mlodgenski wrote:
> On Wed, Jun 2, 2021 at 9:58 AM Tom Lane <tgl@sss.pgh.pa.us> wrote:
>> I wonder whether it'd be better to consider modules as a kind of
>> extension, or at least things with the same sort of ownership relations
>> as extensions have.
> 
> That would solve the problem of associating objects which is the larger
> problem for users today. The objects can all live in their respective
> schemas with the module tying them all together.


Maybe something similar to "CREATE EXTENSION ... FROM unpackaged"?

Something like:
CREATE EXTENSION myfoo; /* shell extension */
ALTER EXTENSION myfoo ADD type ...;
ALTER EXTENSION myfoo ADD function ...;
...

Joe

-- 
Crunchy Data - http://crunchydata.com
PostgreSQL Support for Secure Enterprises
Consulting, Training, & Open Source Development



Re: Support for CREATE MODULE?

From
Tom Lane
Date:
Bruce Momjian <bruce@momjian.us> writes:
> If we go in this direction, I assume we would just disallow a schema
> name matching the database name.

That seems quite impossible to enforce.

regression=# create database d1;
CREATE DATABASE
regression=# alter database d1 rename to d2;
ALTER DATABASE

The system had no way to know that d1 doesn't contain a schema named d2.
And you can't fix that by restricting the ALTER to be done on the
current database:

regression=# \c d2
You are now connected to database "d2" as user "postgres".
d2=# alter database d2 rename to d3;
ERROR:  current database cannot be renamed

Between that and the point that this restriction would certainly break
existing installations, this is a non-starter.

            regards, tom lane



Re: Support for CREATE MODULE?

From
"Joel Jacobson"
Date:
On Wed, Jun 2, 2021, at 16:43, Jim Mlodgenski wrote:
On Wed, Jun 2, 2021 at 9:58 AM Tom Lane <tgl@sss.pgh.pa.us> wrote:
> I wonder whether it'd be better to consider modules as a kind of
> extension, or at least things with the same sort of ownership relations
> as extensions have.

That would solve the problem of associating objects which is the larger
problem for users today. The objects can all live in their respective
schemas with the module tying them all together.

I like the idea of somehow using extensions.

Right now, extensions can only be added from the command-line, via `make install`.

But maybe a new extension could be packaged from the SQL prompt, out of existing database objects that are not already part of an extension?

Maybe the interface could be:

init_new_extension(extension_name text) function, to register a new empty extension.
add_object_to_extension(extension_name text, type text, object_names text[], object_args text[])

Then, if dropping the extension, all objects would be dropped, and if creating the extension, all objects would be restored.

I don't have an idea on how to handle update scripts, but since it's not mandatory to provide extension update scripts, maybe that's not a problem.

/Joel

Re: Support for CREATE MODULE?

From
Alvaro Herrera
Date:
On 2021-Jun-02, Jim Mlodgenski wrote:

> Attached is a POC patch for modules. I modeled it as a sub-schema because
> that is more what it seems like to me. It adds additional columns to
> pg_namespace and allows for 3-part (or 4 with the database name) naming
> of objects within the module. This simple example works with the patch.

Given the downthread discussion, this idea doesn't seem workable.
People are now discussing "what if the module is some kind of
extension".  But to me that seems to go against the grain; you'd have to
implement a ton of stuff in order to let "extension-modules" be
installed without on-disk foo.control files.

But what if the module is just a particular kind of *namespace*?  I
mean, what if CREATE MODULE is implemented by creating a row in
pg_namespace with nspkind='m'?  So a pg_namespace row can refer to
either a regular schema (nspkind='s') or a module.  In a schema you can
create objects of any kind just like today, but in a module you're
restricted to having only functions (and maybe also operators? other
types of objects?).

Then, a qualified object name foo.bar() can refer to either the routine
bar() in schema foo, or routine bar in module foo.  To the low-level
code it's pretty much the same thing (look the namespace in pg_namespace
just as today).

What other properties do you want modules to have?  Are there "private"
functions?  (What *is* a private function in this context? I mean, how
does "being in a module" interact with object lookup rules?  Does
plpgsql have to be aware that a routine is in a module?)
Are there module-scoped variables?  (If so, you probably want Pavel
Stehule's variable patch pushed ahead of time).

-- 
Álvaro Herrera                            39°49'30"S 73°17'W



Re: Support for CREATE MODULE?

From
Peter Eisentraut
Date:
On 02.06.21 16:43, Jim Mlodgenski wrote:
>> It's already quite hard to tell which part
>> of a multiply.qualified.name is which, given that SQL says that you can
>> optionally put a "catalog" (database) name in front of the others.
>> I really doubt there is a way to shoehorn sub-schemas in there without
>> creating terrible ambiguities.  Is "a.b.c" a reference to object c in
>> schema b in database a, or is it a reference to object c in sub-schema b
>> in schema a?
> That was the area I had the most difficult part to reason about. I tried to make
> some simplifying assumptions by checking if "a" was the current database.
> Since we don't support cross database access, if it was not, I assumed "a"
> was a schema. I not sure if that would be valid, but it did scope things
> to a more manageable problem.

Given that, as you said, the concept of modules is in the SQL standard, 
there is surely some guidance in there about how this is supposed to 
affect name resolution.  So let's start with that.  Maybe we won't like 
it in the end or whatever, but we should surely look there first.



Re: Support for CREATE MODULE?

From
Jim Mlodgenski
Date:
On Thu, Jun 3, 2021 at 8:49 AM Peter Eisentraut
<peter.eisentraut@enterprisedb.com> wrote:
> Given that, as you said, the concept of modules is in the SQL standard,
> there is surely some guidance in there about how this is supposed to
> affect name resolution.  So let's start with that.  Maybe we won't like
> it in the end or whatever, but we should surely look there first.

Studying the spec further, catalog/schema/module are all used to
identify a module-level routine. I don't see it spelled out that
is needs to be in that format of catalog.schema.module.routine to
fully qualify the routine, but it would likely be awkward for users
to come up with an alternative syntax like
(catalog.schema.module).routine or catalog.scheme.module->routine

The way the spec is worded, I read it as that schemas take precedence
over modules regarding path resolution. So for example with 2-level
naming if there is a schema 'foo' and a module 'public.foo' both with
functions 'bar' 'foo.bar' would refer to the schema-level function not
the module-level function. I've not found guidance on throwing catalog
into the mix and 3-level naming. Say we had a catalog 'postgres' with a
schema 'foo' with a function 'bar' and a schema 'postgres' with a module
'foo' with a function 'bar'. What would 'postgres.foo.bar' refer to? If
the SQL was executed from a catalog other than 'postgres', we'd have no
way of knowing if 'foo.bar' existed there. So if it's implementation
dependent, saying schemas take precedence over catalogs may make sense
and 'postgres.foo.bar' refers to the module-level function in the
'postgres' schema.



Re: Support for CREATE MODULE?

From
Pavel Stehule
Date:
Hi

st 2. 6. 2021 v 15:39 odesílatel Jim Mlodgenski <jimmy76@gmail.com> napsal:
Working with users over the years, some have large libraries of server
side code sometimes consisting of 100k+ lines of code over 1000+ functions
and procedures. This usually comes from a migration of a commercial
database like Oracle where it was best practice to put all of your
business logic into stored procedures. In these types of apps, just
managing the code is a challenge. To help classify objects, schemas
are used, but you are at the mercy of a naming convention to show
association. For example, a frequent naming convention would be having
related schemas with the names of foo_bar and foo_baz. For devs, that's
akin to keeping a file like xlog.c in a directory structure like
backend_access_transam instead of backend/access/transam. IMHO, having
a hierarchy makes it simpler to reason about related code bits.

The SQL spec does have a concept of modules that help address this. It's
defined as a persistent object within a schema that contains one or more
routines. It also defines other things like local temporary tables and
path specifications. There are other databases like DB2 that have
implemented module support each with their own way of defining the
routines within the module. The spec doesn't really give guidance on
how to manipulate the objects within the module.

Attached is a POC patch for modules. I modeled it as a sub-schema because
that is more what it seems like to me. It adds additional columns to
pg_namespace and allows for 3-part (or 4 with the database name) naming
of objects within the module. This simple example works with the patch.

CREATE SCHEMA foo;
CREATE MODULE foo.bar
  CREATE FUNCTION hello() RETURNS text
     LANGUAGE sql
     RETURN 'hello'
  CREATE FUNCTION world() RETURNS text
     LANGUAGE sql
     RETURN 'world';
SELECT foo.bar.hello();

Questions
- Do we want to add module support?

- If we do, should it be implemented as a type of namespace or should it
  be its own object type that lives in something like pg_module?

- How should users interact with objects within a module? They could be
  mostly independent like the current POC or we can introduce a path like
  ALTER MODULE foo ADD FUNCTION blah

I never liked the SQL/PSM concept of modules. The possibility to assign database objects to schema or to modules looks like schizophrenia.

There are only two advantages of modules - a) possibility to define private objects, b) local scope - the objects from modules shadows external objects without dependency of search_path.

But both these features are pretty hard to implement in PL/pgSQL - where expression executor is SQL executor.

Without these features I don't see strong benefits for modules.

Regards

Pavel

 

--Jim