Re: [HACKERS] Data type removal - Mailing list pgsql-hackers

From dg@illustra.com (David Gould)
Subject Re: [HACKERS] Data type removal
Date
Msg-id 9803272015.AA27545@hawk.illustra.com
In response to Re: [HACKERS] Data type removal  (darrenk@insightdist.com (Darren King))
List pgsql-hackers
Darren writes:
> > > If any restructuring happens which removes, or makes optional, some of
> > > the fundamental types, it should be accomplished so that the types can
> > > be added in transparently, from a single set of source code, during
> > > build time or after. OIDs would have to be assigned, presumably, and the
> > > hardcoding of the function lookups for builtin types must somehow be
> > > done incrementally. Probably needs more than this to be done right, and
> > > without careful planning and implementation we will be taking a big step
> > > backwards.
> >
> > Exactly. Right now modules get installed by building the .so files and
> > then creating all the types, functions, rules, tables, indexes etc. This
> > is a bit more complicated than the Linux kernel 'insmod' operation. We could
> > easily make the situation worse through careless "whacking".
>
> Geez, Louise.  What I'm proposing will _SHOWCASE_ the extensibility. I'm not
> looking to remove it and hardcode everything.

Apparently I was not clear. I am sorry that you found my comment upsetting;
it was not meant to be.

What I meant is that adding a type and related functions to a database is a
much more complicated job than loading a module into a Unix kernel.

To load a module into a kernel all you need to do is read the code in,
resolve the symbols, and maybe call an initialization routine. This is
merely a variation on loading a shared object (.so) file into a program.
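
As a rough illustration of how little is involved, here is a minimal sketch
of those three steps using Python's ctypes. It assumes a POSIX system, and
it uses the C library's strlen as a stand-in for a module's initialization
routine; neither choice comes from the discussion above.

```python
import ctypes
import ctypes.util

# Step 1: read the code in. find_library may return None, in which case
# CDLL(None) opens the running process itself (POSIX behavior).
libc = ctypes.CDLL(ctypes.util.find_library("c"))

# Step 2: resolve a symbol by name, as the dynamic linker does for a module.
strlen = libc.strlen
strlen.argtypes = [ctypes.c_char_p]
strlen.restype = ctypes.c_size_t

# Step 3: call into the loaded code (standing in for an init routine).
length = strlen(b"insmod")
```

Nothing here touches persistent state, which is the point of the contrast
with adding a type to a database.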

To add a type and related stuff to a database is really a much harder problem.

You need to be able to
  - add one or more type descriptions               types table
  - add input and output functions                  types, functions tables
  - add cast functions                              casts, functions tables
  - add any datatype specific behavior functions    functions table
  - add access method operators (maybe)             amops, functions tables
  - add aggregate operators                         aggregates, functions
  - add operators                                   operators, functions
  - provide statistics functions
  - provide destroy operators
  - provide .so files for C functions, SQL for SQL functions
    (note this is the only part needed for a Unix kernel module)
  - do all the above within a particular schema
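
The list above can be sketched as a manifest of catalog rows that must all
be registered together. This is only a toy model: the table names follow
the list, but the object names (complex, complex_in, and so on), the
MANIFEST layout, and the install() helper are hypothetical.

```python
from collections import defaultdict

# Hypothetical manifest for one module: (catalog table, object name) pairs.
# Table names follow the list above; object names are made up.
MANIFEST = [
    ("types", "complex"),
    ("functions", "complex_in"),
    ("functions", "complex_out"),
    ("casts", "float8_to_complex"),
    ("functions", "float8_to_complex"),
    ("operators", "+"),
    ("functions", "complex_add"),
    ("aggregates", "sum"),
    ("functions", "complex_sum_step"),
]


def install(catalog, schema, manifest):
    """Register every object in the manifest, all or nothing."""
    staged = defaultdict(set)
    for table, name in manifest:
        key = (schema, name)
        if key in catalog[table] or key in staged[table]:
            raise ValueError(f"{table}: {schema}.{name} already exists")
        staged[table].add(key)
    # Only touch the real catalog once every entry has been validated.
    for table, keys in staged.items():
        catalog[table] |= keys


catalog = defaultdict(set)
install(catalog, "public", MANIFEST)
rows_per_table = {table: len(keys) for table, keys in catalog.items()}
```

Even this one small type touches five catalog tables and needs five
function entries, before any statistics or destroy support is considered.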

You may also need to create and populate data tables, rules, defaults, etc
required by the implementation of the new type.

And of course, a "module" may really implement dozens of types and hundreds
of functions.

To unload a type requires undoing all the above. But there is a wrinkle: first
you have to check if there are any dependencies. That is, if the user has
created a table with one of the new types, you have to drop that table
(including column defs, indexes, rules, triggers, defaults etc) before
you can drop the type. Of course the user may not want to drop their tables,
which brings us to the next problem.
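
The dependency check can be sketched as follows. The in-memory catalog, the
type name money2, and the helpers dependents() and drop_type() are all
hypothetical, chosen only to show the shape of the check.

```python
# Hypothetical catalog: table name -> {column name: type name}.
tables = {
    "orders": {"id": "int4", "total": "money2"},
    "audit": {"id": "int4", "note": "text"},
}


def dependents(type_name, tables):
    """Return sorted (table, column) pairs that still use type_name."""
    return sorted(
        (table, column)
        for table, columns in tables.items()
        for column, col_type in columns.items()
        if col_type == type_name
    )


def drop_type(type_name, types, tables):
    """Drop a type only if nothing depends on it."""
    users = dependents(type_name, tables)
    if users:
        raise RuntimeError(f"cannot drop {type_name}: still used by {users}")
    types.discard(type_name)


types = {"int4", "text", "money2"}
try:
    drop_type("money2", types, tables)
    blocked = False
except RuntimeError:
    blocked = True
```

The drop is refused while the orders table exists. A real check would also
have to follow indexes, rules, triggers, and defaults, not just columns.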

Where this gets really hard is when it is time to upgrade an existing database
to a new version. Suppose you add a new column to a type in the new version.
How does a user with lots of data in dozens of tables using the old type
install the new module?

What about restoring a dump from an old version into a system with the new
version installed?

Or how about migrating to a different platform? Can we move data from
a little endian platform (x86) to a big endian platform (sparc)? Obviously
the .so files will be different, but what about copying the data out and
reloading it?
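
A small sketch of why byte order matters for the data itself, using
Python's struct module. The value 1998 and the 32-bit integer layout are
arbitrary choices for illustration, and the text dump shown is only one
possible way around the problem.

```python
import struct

value = 1998

# Raw in-memory layout of a 32-bit integer on each kind of platform.
little = struct.pack("<i", value)  # x86 byte order
big = struct.pack(">i", value)     # sparc byte order

# Reading little-endian bytes as if they were big-endian gives garbage.
misread = struct.unpack(">i", little)[0]

# A text dump sidesteps byte order: write digits, parse them on reload.
dumped = str(value)
reloaded = int(dumped)
```

This only works for a user-defined type if its input and output functions
produce a platform-independent text form.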

This is really the same problem as "schema evolution" which is (in the general
case) an open research topic.

> I realize there might not be a performance hit _now_, but if someone doesn't
> start this "loadable module" initiative, every Tom, Dick and Harry will want
> their types in the backend and eventually there _will_ be a performance hit.
> Then the problem would be big enough to be a major chore to convert the many,
> many types to loadable instead of only doing a couple now.

I agree that we might want to work on making installing new functionality
easier. I have no objection to this, I just don't want to see the problem
approached without some understanding of the real issues.

> I'm not trying to cry "Wolf" or proposing to do this to just push around some
> code.  I really think there are benefits to it, if not now, in the future.
>
> And I know there are other areas that are broken or could be written better.
> We all do what we can...I'm not real familiar with the workings of the cache,
> indices, etc., but working on AIX has given me a great understanding of how
> to make/load modules.

Just to belabor this, it is perfectly reasonable to add a set of types and
functions that have no 'C' implementation. The 'loadable module' analogy
misses a lot of the real requirements.

> There, my spleen feels _much_ better now. :)

It looks better too... ;-)

> darrenk

-dg

David Gould            dg@illustra.com           510.628.3783 or 510.305.9468
Informix Software  (No, really)         300 Lakeside Drive  Oakland, CA 94612
 - Linux. Not because it is free. Because it is better.

