Modules - Mailing list pgsql-hackers

From Mattias Kregert
Subject Modules
Date
Msg-id 351CF586.2FA40383@algonet.se
In response to Re: [HACKERS] Data type removal  (dg@illustra.com (David Gould))
Responses Re: [HACKERS] Modules
List pgsql-hackers
David Gould wrote:
>
> To load a module into a kernel all you need to do is read the code in,
> resolve the symbols, and maybe call an initialization routine. This is
> merely a variation on loading a shared object (.so) file into a program.
>
> To add a type and related stuff to a database is really a much harder problem.

I don't agree.

> You need to be able to
>   - add one or more type descriptions                   types table
>   - add input and output functions                      types, functions tables
>   - add cast functions                                  casts, functions tables
>   - add any datatype specific behavior functions        functions table
>   - add access method operators (maybe)                 amops, functions tables
>   - add aggregate operators                             aggregates, functions
>   - add operators                                       operators, functions
>   - provide statistics functions
>   - provide destroy operators
>   - provide .so files for C functions, SQL for sql functions
>     (note this is the part needed for a unix kernel module)
>   - do all the above within a particular schema
>
> You may also need to create and populate data tables, rules, defaults, etc
> required by the implementation of the new type.

All this would be done by the init function in the module you load.
What we need is a set of functions callable by modules, like

    module_register_type(name, descr, func*, textin*, textout*, whatever ...)
    module_register_smgr(name, descr, .....)
    module_register_command(....

Casts would be done by converting to a common format (text) and then to
the desired type. Use textin/textout. No special cast functions would
have to exist. Why doesn't it work this way already??? Would not that
solve all casting problems?
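
A minimal sketch of what such a registration interface could look like in C. Everything here is hypothetical and for illustration only: the signatures of module_register_type() and cast_via_text(), the fixed-size table, and the toy int4/float8 converters are assumptions, not an existing PostgreSQL API. It only shows the shape of "register textin/textout, then cast through text".

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical hooks: every type converts to and from text. Return 0 on success. */
typedef int (*textin_fn)(const char *text, void *out);
typedef int (*textout_fn)(const void *value, char *buf, size_t n);

typedef struct {
    const char *name;
    const char *descr;
    textin_fn   textin;
    textout_fn  textout;
} type_entry;

#define MAX_TYPES 32
static type_entry type_table[MAX_TYPES];
static int type_count = 0;

/* Hypothetical registry call that a module's init function would use. */
int module_register_type(const char *name, const char *descr,
                         textin_fn in, textout_fn out)
{
    assert(name != NULL && in != NULL && out != NULL);
    if (type_count == MAX_TYPES)
        return -1;
    type_table[type_count++] = (type_entry){ name, descr, in, out };
    return 0;
}

static const type_entry *find_type(const char *name)
{
    for (int i = 0; i < type_count; i++)
        if (strcmp(type_table[i].name, name) == 0)
            return &type_table[i];
    return NULL;
}

/* Cast by going through text: source textout, then destination textin. */
int cast_via_text(const char *from, const void *value,
                  const char *to, void *out)
{
    const type_entry *src = find_type(from);
    const type_entry *dst = find_type(to);
    char buf[64];

    if (src == NULL || dst == NULL)
        return -1;
    if (src->textout(value, buf, sizeof buf) != 0)
        return -1;
    return dst->textin(buf, out);
}

/* Toy converters standing in for a real module's functions. */
static int int4_in(const char *t, void *out)
{
    char *end;
    long v = strtol(t, &end, 10);
    if (end == t || *end != '\0')
        return -1;
    *(int *)out = (int)v;
    return 0;
}

static int int4_out(const void *v, char *buf, size_t n)
{
    int w = snprintf(buf, n, "%d", *(const int *)v);
    return (w < 0 || (size_t)w >= n) ? -1 : 0;
}

static int float8_in(const char *t, void *out)
{
    char *end;
    double v = strtod(t, &end);
    if (end == t || *end != '\0')
        return -1;
    *(double *)out = v;
    return 0;
}

static int float8_out(const void *v, char *buf, size_t n)
{
    int w = snprintf(buf, n, "%g", *(const double *)v);
    return (w < 0 || (size_t)w >= n) ? -1 : 0;
}

/* What a module's init function might do when it is loaded. */
int example_module_init(void)
{
    if (module_register_type("int4", "4-byte integer", int4_in, int4_out) != 0)
        return -1;
    return module_register_type("float8", "8-byte float", float8_in, float8_out);
}
```

After example_module_init() has run, cast_via_text("int4", &i, "float8", &d) converts an int to a double with no dedicated cast function. The sketch does not address precision loss or speed when going through text.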


> To unload a type requires undoing all the above. But there is a wrinkle: first
> you have to check if there are any dependencies. That is, if the user has
> created a table with one of the new types, you have to drop that table
> (including column defs, indexes, rules, triggers, defaults etc) before
> you can drop the type. Of course the user may not want to drop their tables
> which brings us to the next problem.

Dependencies are checked by the OS kernel when you try to unload modules.
You cannot unload slhc without first unloading ppp, for example. What's
the difference?
If you have Mod4X running with /dev/dsp opened, then you can't unload
the sound driver, because it is in use, and you cannot unload the a.out
module if you have a non-ELF program running, and you can see the refcount
on all modules and so on... This would not be different in a SQL server.
If you have a cursor open, accessing IP types, then you cannot unload
the IP-types module. Close the cursor, and you can unload the module if
you want to.
You don't have to drop tables containing new types just because you
unload the module. If you want to SELECT from such a table, then the
module would be loaded automagically when it is needed.
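
The refcount rule described above can be sketched in a few lines of C. The struct and the function names (module_get, module_put, module_unload) are hypothetical, loosely modelled on how kernels count module users. Actually loading a shared object is reduced to setting a flag.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical bookkeeping for one loadable type module. */
typedef struct module {
    const char *name;
    bool        loaded;
    int         refcount;   /* open users, e.g. cursors touching its types */
} module;

/* Stand-in for reading the .so and running its init function. */
static void module_load(module *m)
{
    m->loaded = true;
}

/* Take a reference; load the module on demand if it is not resident. */
void module_get(module *m)
{
    assert(m != NULL);
    if (!m->loaded)
        module_load(m);
    m->refcount++;
}

/* Drop a reference, e.g. when a cursor is closed. */
void module_put(module *m)
{
    assert(m != NULL && m->refcount > 0);
    m->refcount--;
}

/* Refuse to unload while anything still uses the module. */
int module_unload(module *m)
{
    assert(m != NULL);
    if (m->refcount > 0)
        return -1;          /* busy: close the cursor first */
    m->loaded = false;
    return 0;
}
```

Opening a cursor on an IP column would call module_get(), closing it would call module_put(), and module_unload() fails with -1 in between. Tables on disk are untouched by an unload; the next module_get() simply loads the module again.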


> When this gets really hard is when it is time to upgrade an existing database
> to a new version. Suppose you add a new column to a type in the new version.
> How does a user with lots of data in dozens of tables using the old type
> install the new module?
>
> What about restoring a dump from an old version into a system with the new
> version installed?

Suppose you change TIMESTAMP to 64 bits time and 16 bits userid... how
do you solve that problem? You would probably have to make the
textin/textout functions for the type recognize the old format and make
the appropriate conversions.
Perhaps add zero userid, or default to postmaster userid?
This would not be any different if TIMESTAMP was in a separate module.
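
A sketch of a textin function that accepts both the old and the new text form. The text syntax is invented for this example: an old value is a bare integer, a new one is "time/userid". The type name timestamp2 and the zero default userid are likewise assumptions, not the real TIMESTAMP format.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical new layout: 64-bit time plus 16-bit userid. */
typedef struct {
    int64_t  time;
    uint16_t userid;
} timestamp2;

#define DEFAULT_USERID 0    /* assumed default for values dumped in the old format */

/*
 * Parse "time" (old format) or "time/userid" (new format).
 * Returns 0 on success and fills *out; returns -1 and leaves *out alone on error.
 */
int timestamp2_in(const char *text, timestamp2 *out)
{
    char *end;
    long long t;
    unsigned long uid = DEFAULT_USERID;

    assert(text != NULL && out != NULL);

    errno = 0;
    t = strtoll(text, &end, 10);
    if (end == text || errno != 0)
        return -1;

    if (*end == '/') {
        /* New format: a userid follows the slash. */
        const char *p = end + 1;
        uid = strtoul(p, &end, 10);
        if (end == p || *end != '\0' || uid > UINT16_MAX)
            return -1;
    } else if (*end != '\0') {
        return -1;          /* trailing garbage */
    }

    out->time = (int64_t)t;
    out->userid = (uint16_t)uid;
    return 0;
}
```

With this, a dump written by the old version ("891000000") and one written by the new version ("891000000/42") both load, and the old values get the default userid.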

For the internal storage format, every type could have its own way
of recognizing different versions of the data. For example, say you have
an IPv4 module and insert millions of IP addresses, then you upgrade
to the IPv6 module. It would then be able to look at the data and see if
it is an IPv4 or IPv6 address. Of course, you would have problems if you
tried to downgrade and had lots of IPv6 addresses inserted.
MyOwnType could use the first few bits of the data to decide which
version it is, and later releases of MyOwnType-module would be able
to recognize the older formats.
This way, types could be upgraded without dump-and-load procedure.
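
A sketch of such a self-describing storage format. The layout is an assumption made for illustration: one leading tag byte (rather than a few bits) followed by 4 or 16 address bytes. Old IPv4 values are upgraded on read to the standard IPv4-mapped IPv6 form (::ffff:a.b.c.d).

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical version tags stored in the first byte of each value. */
enum { ADDR_V4 = 1, ADDR_V6 = 2 };

/* In-memory form used by the newer module: always 16 bytes. */
typedef struct {
    uint8_t version;
    uint8_t bytes[16];
} ip_addr;

/*
 * Decode a stored value, whichever module version wrote it.
 * Returns 0 on success, -1 for an unknown tag or a truncated value.
 */
int ip_decode(const uint8_t *stored, size_t len, ip_addr *out)
{
    assert(stored != NULL && out != NULL);
    if (len < 1)
        return -1;

    memset(out, 0, sizeof *out);
    switch (stored[0]) {
    case ADDR_V4:
        if (len < 1 + 4)
            return -1;
        /* Old format: widen to an IPv4-mapped IPv6 address. */
        out->version = ADDR_V6;
        out->bytes[10] = 0xff;
        out->bytes[11] = 0xff;
        memcpy(&out->bytes[12], stored + 1, 4);
        return 0;
    case ADDR_V6:
        if (len < 1 + 16)
            return -1;
        out->version = ADDR_V6;
        memcpy(out->bytes, stored + 1, 16);
        return 0;
    default:
        return -1;          /* written by a newer module than this one */
    }
}
```

The default branch is the downgrade problem mentioned above: an older module meeting a tag it does not know can only report an error.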


> Or how about migrating to a different platform? Can we move data from
> a little endian platform (x86) to a big endian platform (sparc)? Obviously
> the .so files will be different, but what about copying the data out and
> reloading it?

Is this a problem right now? Dump and reload, how can it fail?


> Just to belabor this, it is perfectly reasonable to add a set of types and
> functions that have no 'C' implementation. The 'loadable module' analogy
> misses a lot of the real requirements.

Why would someone want a type without implementation?
Ok, let the module's init function register a type marked as
"nonexistent"? Null function?

/* m */
