Thread: Re: [HACKERS] Data type removal

Re: [HACKERS] Data type removal

From
darrenk@insightdist.com (Darren King)
Date:
> > If any restructuring happens which removes, or makes optional, some of
> > the fundamental types, it should be accomplished so that the types can
> > be added in transparently, from a single set of source code, during
> > build time or after. OIDs would have to be assigned, presumably, and the
> > hardcoding of the function lookups for builtin types must somehow be
> > done incrementally. Probably needs more than this to be done right, and
> > without careful planning and implementation we will be taking a big step
> > backwards.
>
> Exactly. Right now modules get installed by building the .so files and
> then creating all the types, functions, rules, tables, indexes etc. This
> is a bit more complicated than the Linux kernel 'insmod' operation. We could
> easily make the situation worse through careless "whacking".

Geez, Louise.  What I'm proposing will _SHOWCASE_ the extensibility. I'm not
looking to remove it and hardcode everything.

> > Seems to me that Postgres' niche is at the high end of size and
> > capability, not at the lightweight end competing for design wins against
> > systems which don't even have transactions.
>
> And, there are already a couple of perfectly good 'toy' database systems.
> What is the point of having another one? Postgres should move toward
> becoming an "industrial strength" solution.

Making some of the _mostly_unused_ data types loadable instead of always
compiled in will NOT make postgres into a 'toy'.  Does "industrial strength"
imply having every possible data type compiled in?  Regardless of use?

I think the opposite is true. Putting some of these extra types into modules
will show people the greatest feature that separates us from the 'toy's.

I realize there might not be a performance hit _now_, but if someone doesn't
start this "loadable module" initiative, every Tom, Dick and Harry will want
their types in the backend and eventually there _will_ be a performance hit.
Then the problem would be big enough to be a major chore to convert the many,
many types to loadable instead of only doing a couple now.

I'm not trying to cry "Wolf" or proposing to do this to just push around some
code.  I really think there are benefits to it, if not now, in the future.

And I know there are other areas that are broken or could be written better.
We all do what we can...I'm not real familiar with the workings of the cache,
indices, etc., but working on AIX has given me a great understanding of how
to make/load modules.

There, my spleen feels _much_ better now. :)

darrenk



Re: [HACKERS] Data type removal

From
The Hermit Hacker
Date:
On Fri, 27 Mar 1998, Darren King wrote:

> And I know there are other areas that are broken or could be written better.
> We all do what we can...I'm not real familiar with the workings of the cache,
> indices, etc., but working on AIX has given me a great understanding of how
> to make/load modules.

This whole discussion has, IMHO, gone dry...Darren, if you can cleanly and
easily build a module for the ip_and_mac contrib types to use as a model
to work from, please do so...

I think the concept of modularization for types is a good idea, and agree
with your perspective that it *proves* our extensibility...

But, this has to be added in *perfectly* cleanly, such that there is no
extra work on anyone's part in order to make use of those types we already
have existing.

FreeBSD uses something called 'LKM's (loadable kernel modules) for doing
this in the kernel, and Linux does something similar, with the benefit
being, in most cases, that you can unload an older version and load in a
newer one relatively seamlessly...

Until a demo is produced as to how this can work, *please* kill this
thread...it's gotten into a circular loop, and, quite frankly, isn't moving
anywhere...

If this is to be workable, the module has to be built when the system is
compiled, initially, for the base types, and installed into a directory
that the server can see and load from...the *base* modules have to be
transparent to the end user/administrator...




Re: [HACKERS] Data type removal

From
dg@illustra.com (David Gould)
Date:
Darren writes:
> > > If any restructuring happens which removes, or makes optional, some of
> > > the fundamental types, it should be accomplished so that the types can
> > > be added in transparently, from a single set of source code, during
> > > build time or after. OIDs would have to be assigned, presumably, and the
> > > hardcoding of the function lookups for builtin types must somehow be
> > > done incrementally. Probably needs more than this to be done right, and
> > > without careful planning and implementation we will be taking a big step
> > > backwards.
> >
> > Exactly. Right now modules get installed by building the .so files and
> > then creating all the types, functions, rules, tables, indexes etc. This
> > is a bit more complicated than the Linux kernel 'insmod' operation. We could
> > easily make the situation worse through careless "whacking".
>
> Geez, Louise.  What I'm proposing will _SHOWCASE_ the extensibility. I'm not
> looking to remove it and hardcode everything.

Apparently I was not clear. I am sorry that you find my comment upsetting, it
was not meant to be.

What I meant is that adding a type and related functions to a database is a
much more complicated job than loading a module into a Unix kernel.

To load a module into a kernel all you need to do is read the code in,
resolve the symbols, and maybe call an initialization routine. This is
merely a variation on loading a shared object (.so) file into a program.

To add a type and related stuff to a database is really a much harder problem.

You need to be able to
  - add one or more type descriptions                   types table
  - add input and output functions                      types, functions tables
  - add cast functions                                  casts, functions tables
  - add any datatype specific behavior functions        functions table
  - add access method operators (maybe)                 amops, functions tables
  - add aggregate operators                             aggregates, functions
  - add operators                                       operators, functions
  - provide statistics functions
  - provide destroy operators
  - provide .so files for C functions, SQL for sql functions
    (note this is the part needed for a unix kernel module)
  - do all the above within a particular schema

You may also need to create and populate data tables, rules, defaults, etc
required by the implementation of the new type.

And of course, a "module" may really implement dozens of types and hundreds
of functions.

To unload a type requires undoing all the above. But there is a wrinkle: first
you have to check if there are any dependencies. That is, if the user has
created a table with one of the new types, you have to drop that table
(including column defs, indexes, rules, triggers, defaults etc) before
you can drop the type. Of course the user may not want to drop their tables
which brings us to the next problem.
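
A minimal sketch of the dependency check described above, written in Python
for brevity. The `Catalog` class, its method names, and the `ipaddr` type are
all hypothetical stand-ins, not PostgreSQL code.

```python
class DependencyError(Exception):
    """Raised when a type is still referenced by a table column."""


class Catalog:
    """Toy stand-in for the system catalogs (hypothetical, not the real API)."""

    def __init__(self):
        self.types = set()
        self.tables = {}  # table name -> {column name: type name}

    def create_type(self, type_name):
        self.types.add(type_name)

    def create_table(self, table_name, columns):
        for type_name in columns.values():
            if type_name not in self.types:
                raise KeyError(f"unknown type: {type_name}")
        self.tables[table_name] = dict(columns)

    def drop_table(self, table_name):
        del self.tables[table_name]

    def dependents(self, type_name):
        """Return the sorted names of tables that have a column of this type."""
        return sorted(
            name for name, cols in self.tables.items()
            if type_name in cols.values()
        )

    def drop_type(self, type_name):
        """Drop a type only if no table depends on it."""
        users = self.dependents(type_name)
        if users:
            raise DependencyError(f"{type_name} is used by: {', '.join(users)}")
        self.types.remove(type_name)


catalog = Catalog()
catalog.create_type("int4")
catalog.create_type("ipaddr")
catalog.create_table("hosts", {"id": "int4", "addr": "ipaddr"})
```

Here `catalog.drop_type("ipaddr")` raises `DependencyError` until the `hosts`
table is dropped, which is exactly the choice the user may not want to make.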

When this gets really hard is when it is time to upgrade an existing database
to a new version. Suppose you add a new column to a type in the new version.
How does a user with lots of data in dozens of tables using the old type
install the new module?

What about restoring a dump from an old version into a system with the new
version installed?

Or how about migrating to a different platform? Can we move data from
a little endian platform (x86) to a big endian platform (sparc)? Obviously
the .so files will be different, but what about copying the data out and
reloading it?
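
The byte-order question can be illustrated with a small Python sketch using
the standard `struct` module. The 4-byte integer layout is an assumption
chosen for the example; it says nothing about how any particular type is
actually stored.

```python
import struct

value = 1000

little = struct.pack("<i", value)  # 4-byte int in little endian (x86) order
big = struct.pack(">i", value)     # 4-byte int in big endian (sparc) order

# The raw bytes differ, so reading little endian storage as if it were
# big endian yields a different number.
misread = struct.unpack(">i", little)[0]

# A text dump does not depend on byte order: the input function on the
# target platform rebuilds its own native representation.
dumped = str(value)
reloaded = struct.pack(">i", int(dumped))
```

Copying the raw bytes across platforms misreads the value, while going
through the text form reproduces it.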

This is really the same problem as "schema evolution" which is (in the general
case) an open research topic.

> I realize there might not be a performance hit _now_, but if someone doesn't
> start this "loadable module" initiative, every Tom, Dick and Harry will want
> their types in the backend and eventually there _will_ be a performance hit.
> Then the problem would be big enough to be a major chore to convert the many,
> many types to loadable instead of only doing a couple now.

I agree that we might want to work on making installing new functionality
easier. I have no objection to this, I just don't want to see the problem
approached without some understanding of the real issues.

> I'm not trying to cry "Wolf" or proposing to do this to just push around some
> code.  I really think there are benefits to it, if not now, in the future.
>
> And I know there are other areas that are broken or could be written better.
> We all do what we can...I'm not real familiar with the workings of the cache,
> indices, etc., but working on AIX has given me a great understanding of how
> to make/load modules.

Just to belabor this, it is perfectly reasonable to add a set of types and
functions that have no 'C' implementation. The 'loadable module' analogy
misses a lot of the real requirements.

> There, my spleen feels _much_ better now. :)

It looks better too... ;-)

> darrenk

-dg

David Gould            dg@illustra.com           510.628.3783 or 510.305.9468
Informix Software  (No, really)         300 Lakeside Drive  Oakland, CA 94612
 - Linux. Not because it is free. Because it is better.


Modules

From
Mattias Kregert
Date:
David Gould wrote:
>
> To load a module into a kernel all you need to do is read the code in,
> resolve the symbols, and maybe call an initialization routine. This is
> merely a variation on loading a shared object (.so) file into a program.
>
> To add a type and related stuff to a database is really a much harder problem.

I don't agree.

> You need to be able to
>   - add one or more type descriptions                   types table
>   - add input and output functions                      types, functions tables
>   - add cast functions                                  casts, functions tables
>   - add any datatype specific behavior functions        functions table
>   - add access method operators (maybe)                 amops, functions tables
>   - add aggregate operators                             aggregates, functions
>   - add operators                                       operators, functions
>   - provide statistics functions
>   - provide destroy operators
>   - provide .so files for C functions, SQL for sql functions
>     (note this is the part needed for a unix kernel module)
>   - do all the above within a particular schema
>
> You may also need to create and populate data tables, rules, defaults, etc
> required by the implementation of the new type.

All this would be done by the init function in the module you load.
What we need is a set of functions callable by modules, like
module_register_type(name, descr, func*, textin*, textout*, whatever
...)
module_register_smgr(name, descr, .....)
module_register_command(....
Casts would be done by converting to a common format (text) and then to
the desired type. Use textin/textout. No special cast functions would
have to exist. Why doesn't it work this way already??? Would not that
solve all casting problems?
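
A rough sketch of what such a registration interface might look like, in
Python rather than backend C for brevity. `module_register_type` is the name
proposed above; its exact signature, the `TYPE_REGISTRY` dictionary, and the
`ipaddr` example type are assumptions made for illustration only.

```python
# Hypothetical registry; nothing here is an existing PostgreSQL interface.
TYPE_REGISTRY = {}


def module_register_type(name, descr, textin, textout):
    """Record a type's description and its text input/output functions."""
    if name in TYPE_REGISTRY:
        raise ValueError(f"type already registered: {name}")
    TYPE_REGISTRY[name] = {"descr": descr, "textin": textin, "textout": textout}


def ip_textin(text):
    """Parse dotted-quad text into a tuple of four octets."""
    parts = tuple(int(p) for p in text.split("."))
    if len(parts) != 4 or any(not 0 <= p <= 255 for p in parts):
        raise ValueError(f"bad IP address: {text!r}")
    return parts


def ip_textout(value):
    """Format a tuple of four octets as dotted-quad text."""
    return ".".join(str(octet) for octet in value)


def ip_module_init():
    """The module's init function registers everything the module provides."""
    module_register_type("ipaddr", "IPv4 address", ip_textin, ip_textout)


ip_module_init()
entry = TYPE_REGISTRY["ipaddr"]
print(entry["textout"](entry["textin"]("192.168.0.1")))
```

The sketch covers only the type entry and its text functions; operators,
aggregates and the rest of the list would need their own registration calls.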


> To unload a type requires undoing all the above. But there is a wrinkle: first
> you have to check if there are any dependencies. That is, if the user has
> created a table with one of the new types, you have to drop that table
> (including column defs, indexes, rules, triggers, defaults etc) before
> you can drop the type. Of course the user may not want to drop their tables
> which brings us to the next problem.

Dependencies are checked by the OS kernel when you try to unload modules.
You cannot unload slhc without first unloading ppp, for example. What's
the difference?
If you have Mod4X running with /dev/dsp opened, then you can't unload
the sound driver, because it is in use, and you cannot unload a.out module
if you have a non-ELF program running, and you can see the refcount on
all modules and so on... This would not be different in a SQL server.
If you have a cursor open, accessing IP types, then you cannot unload
the IP-types module. Close the cursor, and you can unload the module if
you want to.
You don't have to drop tables containing new types just because you
unload the module. If you want to SELECT from it, then that module would
be loaded automagically when it is needed.
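
The refcount and load-on-demand behaviour described here can be sketched as
follows, in Python for brevity. `ModuleManager` and its method names are
hypothetical, and "loading" is reduced to bookkeeping.

```python
class ModuleInUse(Exception):
    """Raised when unloading a module that still has active users."""


class ModuleManager:
    """Toy reference-counted loader (hypothetical; not PostgreSQL code)."""

    def __init__(self):
        self.refcounts = {}  # loaded module name -> number of active users

    def acquire(self, name):
        """Take a reference, loading the module automatically on first use."""
        self.refcounts[name] = self.refcounts.get(name, 0) + 1

    def release(self, name):
        """Drop one reference, for example when a cursor is closed."""
        if self.refcounts.get(name, 0) == 0:
            raise ValueError(f"{name} has no active users")
        self.refcounts[name] -= 1

    def unload(self, name):
        """Unload the module only if nothing is currently using it."""
        if self.refcounts.get(name, 0) > 0:
            raise ModuleInUse(name)
        self.refcounts.pop(name, None)

    def is_loaded(self, name):
        return name in self.refcounts
```

Opening a cursor on an IP column would call `acquire("ip_types")`, closing it
would call `release("ip_types")`, and `unload("ip_types")` succeeds only in
between. A later `acquire` loads the module again without any table having
been dropped.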


> When this gets really hard is when it is time to upgrade an existing database
> to a new version. Suppose you add a new column to a type in the new version.
> How does a user with lots of data in dozens of tables using the old type
> install the new module?
>
> What about restoring a dump from an old version into a system with the new
> version installed?

Suppose you change TIMESTAMP to 64 bits time and 16 bits userid... how
do you solve that problem? You would probably have to make the
textin/textout functions for the type recognize the old format and make
the appropriate conversions.
Perhaps add zero userid, or default to postmaster userid?
This would not be any different if TIMESTAMP was in a separate module.

For the internal storage format, every type could have its own way
of recognizing different versions of the data. For example, say you have
an IPv4 module and insert millions of IP addresses, then you upgrade
to an IPv6 module. It would then be able to look at the data and see if
it is an IPv4 or IPv6 address. Of course, you would have problems if you
tried to downgrade and had lots of IPv6 addresses inserted.
MyOwnType could use the first few bits of the data to decide which
version it is, and later releases of the MyOwnType module would be able
to recognize the older formats.
This way, types could be upgraded without a dump-and-load procedure.
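
A minimal sketch of such a self-describing storage format, in Python using
the standard `ipaddress` module. The layout (one leading version byte rather
than a few bits) and the `encode`/`decode` names are assumptions for the
example.

```python
import ipaddress

# Hypothetical stored layout: one version byte, then the packed address.
VERSION_IPV4 = 4
VERSION_IPV6 = 6


def encode(address):
    """Store an address as a version byte followed by its packed bytes."""
    addr = ipaddress.ip_address(address)
    return bytes([addr.version]) + addr.packed


def decode(stored):
    """Read a stored value, dispatching on its leading version byte."""
    version, payload = stored[0], stored[1:]
    if version == VERSION_IPV4 and len(payload) == 4:
        return ipaddress.IPv4Address(payload)
    if version == VERSION_IPV6 and len(payload) == 16:
        return ipaddress.IPv6Address(payload)
    raise ValueError(f"unrecognized stored format (version byte {version})")
```

A newer `decode` keeps reading values written by the older release. An older
release that knows only version 4 would reject version 6 values, which is the
downgrade problem mentioned above.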


> Or how about migrating to a different platform? Can we move data from
> a little endian platform (x86) to a big endian platform (sparc)? Obviously
> the .so files will be different, but what about the copying the data out and
> reloading it?

Is this a problem right now? Dump and reload, how can it fail?


> Just to belabor this, it is perfectly reasonable to add a set of types and
> functions that have no 'C' implementation. The 'loadable module' analogy
> misses a lot of the real requirements.

Why would someone want a type without implementation?
Ok, let the module's init function register a type marked as
"non-existent"? Null function?

/* m */

Re: [HACKERS] Modules

From
dg@illustra.com (David Gould)
Date:
Mattias Kregert writes:
> David Gould wrote:
...
> > You need to be able to
> >   - add one or more type descriptions                   types table
 [big list of stuff to do deleted ]
> >   - do all the above within a particular schema
> >
> > You may also need to create and populate data tables, rules, defaults, etc
> > required by the implementation of the new type.
>
> All this would be done by the init function in the module you load.
> What we need is a set of functions callable by modules, like
> module_register_type(name, descr, func*, textin*, textout*, whatever
> ...)
> module_register_smgr(name, descr, .....)
> module_register_command(....

Ok, now you are requiring the module to handle all this in C. How does it
register a type, a default, a rule, a column, functions, etc?

Having thought about that, consider that currently all this can already be
done using SQL. postgreSQL is a relational database system. One of the
prime attributes of relational systems is that they are reflexive. That is,
you can use SQL to query and update the system catalogs that define the
characteristics of the system.

By and large, all the tasks I mentioned previously can be done using SQL and
taking advantage of the high semantic level and power of the complete SQL
system. Given that we already have a perfectly good high level mechanism, I
just don't see any advantage to adding a bunch of low level APIs to
duplicate existing functionality.

> Casts would be done by converting to a common format (text) and then to
> the desired type. Use textin/textout. No special cast functions would
> have to exist. Why doesn't it work this way already??? Would not that
> solve all casting problems?

No. It is usable in some cases as an implementation of a defined cast, but
you still need defined casts. Think about these problems:

First, there is a significant performance penalty. Think about a query
like:

  select * from huge_table where account_balance > 1000.

The textout -> textin approach would be far slower than the current direct
int to float cast.
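
The extra work in the text route can be seen in a small Python sketch.
`cast_direct` and `cast_via_text` are hypothetical stand-ins for a binary
cast and a textout/textin round trip; the timings only illustrate the extra
formatting and parsing step and say nothing about actual backend costs.

```python
import timeit


def cast_direct(value):
    """Direct int -> float conversion."""
    return float(value)


def cast_via_text(value):
    """int -> text -> float, standing in for textout followed by textin."""
    return float(str(value))


balances = list(range(10_000))

direct_time = timeit.timeit(lambda: [cast_direct(b) for b in balances], number=10)
text_time = timeit.timeit(lambda: [cast_via_text(b) for b in balances], number=10)

print(f"direct: {direct_time:.4f}s  via text: {text_time:.4f}s")
```

Both routes give the same answer for every row; the text route simply does
more work per value, and that cost is paid once per row of huge_table.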

Second, how do you restrict the system to sensible casts or enforce a
meaningful order of attempted casts?

   create type yen based float;
   create type centigrade based float;

Would you allow?

   select yen * centigrade from market_data, weather_data
      where market_data.date = weather_data.date;

Even though the types 'yen' and 'centigrade' are implemented by float this
leaves open a few important questions:

 - what is the type of the result?
 - what could the result possibly mean?
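
One way to picture the restriction is an explicit operator table, sketched
here in Python. The `Typed` class, the `OPERATORS` table and its single entry
are hypothetical; the point is only that two types sharing a float
representation do not automatically share operators.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Typed:
    """A float value tagged with the name of its user-defined type."""
    type_name: str
    value: float


# Hypothetical operator table: (left type, operator, right type) -> result type.
# Only the combinations listed here are allowed.
OPERATORS = {
    ("yen", "*", "float"): "yen",
}


def multiply(left, right):
    """Multiply two typed values if an operator is defined for their types."""
    key = (left.type_name, "*", right.type_name)
    if key not in OPERATORS:
        raise TypeError(f"no operator defined for {key[0]} * {key[2]}")
    return Typed(OPERATORS[key], left.value * right.value)
```

With this table, `multiply(Typed("yen", 100.0), Typed("float", 2.0))` yields a
yen value, while multiplying yen by centigrade raises `TypeError` because
nobody has said what the result type would be.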

Third you still can't do casts for many types:

  create type motion_picture (arrayof jpeg) ...

  select motion_picture * 10 from films...

There is no useful cast possible here.

> > To unload a type requires undoing all the above. But there is a wrinkle: first
> > you have to check if there are any dependencies. That is, if the user has
> > created a table with one of the new types, you have to drop that table
> > (including column defs, indexes, rules, triggers, defaults etc) before
> > you can drop the type. Of course the user may not want to drop their tables
> > which brings us to the next problem.
>
> Dependencies are checked by the OS kernel when you try to unload modules.
> You cannot unload slhc without first unloading ppp, for example. What's
> the difference?

I could have several million objects that might use that type. I cannot
do anything with them without the type definition. Not even delete them.

> If you have Mod4X running with /dev/dsp opened, then you can't unload
> the sound driver, because it is in use, and you cannot unload a.out module
> if you have a non-ELF program running, and you can see the refcount on
> all modules and so on... This would not be different in a SQL server.

But it is very different. SQL servers are much more complex than OS kernels.
Having spent a number of years maintaining the OS kernel in a SQL engine
that was originally intended to run on bare hardware, I can tell you that
that kernel was less than 10% of the complete SQL engine.

> If you have a cursor open, accessing IP types, then you cannot unload
> the IP-types module. Close the cursor, and you can unload the module if
> you want to.
> You don't have to drop tables containing new types just because you
> unload the module. If you want to SELECT from it, then that module would
> be loaded automagically when it is needed.

Ahha, I start to understand. You are saying 'module' and meaning 'loadable
object file of functions'. Given that this is what you mean, we already
handle this.

What I took you to mean by 'module' was the SCHEMA defined to make the
functions useful, and the functions.

> > Just to belabor this, it is perfectly reasonable to add a set of types and
> > functions that have no 'C' implementation. The 'loadable module' analogy
> > misses a lot of the real requirements.
>
> Why would someone want a type without implementation?

Why should a type with no C functions fail to have an implementation? Right
now every table is also a type.

Many types are based on extending an existing type, or are composites. Is
there some reason not to define the implementation (if any) in SQL?

--

I understand that modularity is good. I am asserting that postgreSQL is a
very modular and extendable system right now. There are mechanisms to add
just about any sort of extension you want. Very little is hard coded in
the core.

I think this discussion got started because someone wanted to remove the
ip and mac and money types. This is a mere matter of the current packaging,
as there is no reason for them to be in or out except that historically
the system used some of these types before the extendibility was finished
so they went in the core code.

I don't think it matters much whether any particular type is part of the
core or not, so feel free to pull them out. Do package them up to
install in the normal way that extensions are supposed to install and
retest everything. Don't _add_ a whole new way to do the same kinds
of extensibility that we _already_ do. Just use the mechanisms that already
exist.


This discussion has gone on too long as others are starting to point out,
so I am happy to take it to private mail if you wish to continue.

-dg

David Gould            dg@illustra.com           510.628.3783 or 510.305.9468
Informix Software  (No, really)         300 Lakeside Drive  Oakland, CA 94612
"Of course, someone who knows more about this will correct me if I'm wrong,
 and someone who knows less will correct me if I'm right."
               --David Palmer (palmer@tybalt.caltech.edu)

Re: [HACKERS] Modules

From
"Thomas G. Lockhart"
Date:
> All this would be done by the init function in the module you load.
> What we need is a set of functions callable by modules, like
> module_register_type(name, descr, func*, textin*, textout*, whatever
> ...)
> module_register_smgr(name, descr, .....)
> module_register_command(....
> Casts would be done by converting to a common format (text) and then to
> the desired type. Use textin/textout. No special cast functions would
> have to exist. Why doesn't it work this way already??? Would not that
> solve all casting problems?

It does work this way already, at least in some cases. It definitely
does not solve casting problems, for several reasons, two of which are:
- textout->textin is inefficient compared to binary conversions.
- type conversion also may require value and format manipulation far
beyond what you would accept for an input function for a specific type.
That is, to convert a type to another type may require a conversion
which would not be acceptable in any case other than a conversion from
that specific type to the target type. You need to call a specialized
routine to do this.
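
A small sketch of the second point, in Python for brevity. The `CASTS`,
`TEXTOUT` and `TEXTIN` tables and the rounding rule are assumptions for the
example: a dedicated routine is looked up by (source, target) pair, and the
text route is only a fallback.

```python
def float_to_int(value):
    """Specialized cast: round to the nearest integer."""
    return round(value)


# Hypothetical cast table keyed by (source type, target type).
CASTS = {("float", "int"): float_to_int}

# Stand-ins for each type's text output and text input functions.
TEXTOUT = {"float": repr, "int": str}
TEXTIN = {"float": float, "int": int}


def cast(value, source, target):
    """Use a dedicated cast routine if one exists, else go through text."""
    routine = CASTS.get((source, target))
    if routine is not None:
        return routine(value)
    return TEXTIN[target](TEXTOUT[source](value))
```

Here `cast(3.7, "float", "int")` works through the dedicated routine, whereas
the text route alone would hand "3.7" to the integer input function, which
rightly refuses it as integer input.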

> Dependencies are checked by the OS kernel when you try to unload
> modules. You cannot unload slhc without first unloading ppp, for
> example. What's the difference?

Granularity. If, for example, we had a package of 6 types and 250
functions, you would need to check each of these for dependencies. David
was just pointing out that it isn't as easy, not that it is impossible.
I thought his list of issues was fairly complete, and any solution would
address these somehow...

                         - Tom