Thread: Re: [HACKERS] Data type removal

> > If any restructuring happens which removes, or makes optional, some of
> > the fundamental types, it should be accomplished so that the types can
> > be added in transparently, from a single set of source code, during
> > build time or after. OIDs would have to be assigned, presumably, and the
> > hardcoding of the function lookups for builtin types must somehow be
> > done incrementally. Probably needs more than this to be done right, and
> > without careful planning and implementation we will be taking a big step
> > backwards.
>
> Exactly. Right now modules get installed by building the .so files and
> then creating all the types, functions, rules, tables, indexes etc. This
> is a bit more complicated than the Linux kernel 'insmod' operation. We could
> easily make the situation worse through careless "whacking".

Geez, Louise. What I'm proposing will _SHOWCASE_ the extensibility. I'm not
looking to remove it and hardcode everything.

> > Seems to me that Postgres' niche is at the high end of size and
> > capability, not at the lightweight end competing for design wins against
> > systems which don't even have transactions.
>
> And, there are already a couple of perfectly good 'toy' database systems.
> What is the point of having another one? Postgres should move toward
> becoming an "industrial strength" solution.

Making some of the _mostly_unused_ data types loadable instead of always
compiled in will NOT make postgres into a 'toy'. Does "industrial strength"
imply having every possible data type compiled in, regardless of use?

I think the opposite is true. Putting some of these extra types into modules
will show people the greatest feature that separates us from the 'toy's.

I realize there might not be a performance hit _now_, but if someone doesn't
start this "loadable module" initiative, every Tom, Dick and Harry will want
their types in the backend and eventually there _will_ be a performance hit.
Then the problem would be big enough to make it a major chore to convert the
many, many types to loadable, instead of only doing a couple now.

I'm not trying to cry "Wolf" or proposing to do this just to push around some
code. I really think there are benefits to it, if not now, then in the future.

And I know there are other areas that are broken or could be written better.
We all do what we can... I'm not real familiar with the workings of the cache,
indices, etc., but working on AIX has given me a great understanding of how
to make/load modules.

There, my spleen feels _much_ better now. :)

darrenk

On Fri, 27 Mar 1998, Darren King wrote:

> And I know there are other areas that are broken or could be written better.
> We all do what we can... I'm not real familiar with the workings of the cache,
> indices, etc., but working on AIX has given me a great understanding of how
> to make/load modules.

This whole discussion has, IMHO, gone dry... Darren, if you can cleanly and
easily build a module for the ip_and_mac contrib types to use as a model to
work from, please do so...

I think the concept of modularization for types is a good idea, and agree
with your perspective that it *proves* our extensibility... But this has to
be added in *perfectly* cleanly, such that there is no extra work on anyone's
part in order to make use of those types we already have existing.

FreeBSD uses something called 'LKM's (loadable kernel modules) for doing this
in the kernel, and Linux does something similar, with the benefit being, in
most cases, that you can unload an older version and load in a newer one
relatively seamlessly...

Until a demo is produced as to how this can work, *please* kill this
thread... it has gotten into a circular loop and, quite frankly, isn't moving
anywhere...

If this is to be workable, the module has to be built when the system is
compiled, initially, for the base types, and installed into a directory that
the server can see and load from... the *base* modules have to be transparent
to the end user/administrator...

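For context, a contrib package of this sort is installed today by building
the shared object and then running an SQL script against the database. A
minimal sketch, with hypothetical function names and library path (not the
actual ip_and_mac contents), looks roughly like:

    -- Register the C input/output functions from the shared object,
    -- then the type itself.  Names and path are illustrative only.
    CREATE FUNCTION ipaddr_in(opaque) RETURNS opaque
        AS '/usr/local/pgsql/modules/ip_and_mac.so' LANGUAGE 'c';

    CREATE FUNCTION ipaddr_out(opaque) RETURNS opaque
        AS '/usr/local/pgsql/modules/ip_and_mac.so' LANGUAGE 'c';

    CREATE TYPE ipaddr (
        internallength = 16,
        input  = ipaddr_in,
        output = ipaddr_out
    );
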
Darren writes:
> > > If any restructuring happens which removes, or makes optional, some of
> > > the fundamental types, it should be accomplished so that the types can
> > > be added in transparently, from a single set of source code, during
> > > build time or after. OIDs would have to be assigned, presumably, and the
> > > hardcoding of the function lookups for builtin types must somehow be
> > > done incrementally. Probably needs more than this to be done right, and
> > > without careful planning and implementation we will be taking a big step
> > > backwards.
> >
> > Exactly. Right now modules get installed by building the .so files and
> > then creating all the types, functions, rules, tables, indexes etc. This
> > is a bit more complicated than the Linux kernel 'insmod' operation. We could
> > easily make the situation worse through careless "whacking".
>
> Geez, Louise. What I'm proposing will _SHOWCASE_ the extensibility. I'm not
> looking to remove it and hardcode everything.

Apparently I was not clear. I am sorry that you find my comment upsetting; it
was not meant to be. What I meant is that adding a type and its related
functions to a database is a much more complicated job than loading a module
into a Unix kernel.

To load a module into a kernel, all you need to do is read the code in,
resolve the symbols, and maybe call an initialization routine. This is merely
a variation on loading a shared object (.so) file into a program.

To add a type and related machinery to a database is a much harder problem.
You need to be able to:

  - add one or more type descriptions              (types table)
  - add input and output functions                 (types, functions tables)
  - add cast functions                             (casts, functions tables)
  - add any datatype-specific behavior functions   (functions table)
  - add access method operators (maybe)            (amops, functions tables)
  - add aggregate operators                        (aggregates, functions)
  - add operators                                  (operators, functions)
  - provide statistics functions
  - provide destroy operators
  - provide .so files for C functions, SQL for SQL functions
    (note this is the part needed for a Unix kernel module)
  - do all the above within a particular schema

You may also need to create and populate data tables, rules, defaults, etc.
required by the implementation of the new type. And of course, a "module" may
really implement dozens of types and hundreds of functions.

To unload a type requires undoing all the above. But there is a wrinkle:
first you have to check whether there are any dependencies. That is, if the
user has created a table with one of the new types, you have to drop that
table (including column defs, indexes, rules, triggers, defaults etc.) before
you can drop the type. Of course the user may not want to drop their tables,
which brings us to the next problem.

When this gets really hard is when it is time to upgrade an existing database
to a new version. Suppose you add a new column to a type in the new version.
How does a user with lots of data in dozens of tables using the old type
install the new module?

What about restoring a dump from an old version into a system with the new
version installed?

Or how about migrating to a different platform? Can we move data from a
little-endian platform (x86) to a big-endian platform (sparc)? Obviously the
.so files will be different, but what about copying the data out and
reloading it?

This is really the same problem as "schema evolution", which is (in the
general case) an open research topic.

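Several of the items on that list are already expressed in plain SQL today.
As a hedged illustration of the "operators" and "aggregates" entries, along
the lines of the standard tutorial's complex-number example (the complex type
and its complex_add support function are assumed to exist already):

    -- Tie a previously created support function to an infix operator.
    CREATE OPERATOR + (
        leftarg    = complex,
        rightarg   = complex,
        procedure  = complex_add,
        commutator = +
    );

    -- Build an aggregate on top of the same support function.
    CREATE AGGREGATE complex_sum (
        sfunc1    = complex_add,
        basetype  = complex,
        stype1    = complex,
        initcond1 = '(0,0)'
    );
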
> I realize there might not be a performance hit _now_, but if someone doesn't
> start this "loadable module" initiative, every Tom, Dick and Harry will want
> their types in the backend and eventually there _will_ be a performance hit.
> Then the problem would be big enough to be a major chore to convert the many,
> many types to loadable instead of only doing a couple now.

I agree that we might want to work on making it easier to install new
functionality. I have no objection to this; I just don't want to see the
problem approached without some understanding of the real issues.

> I'm not trying to cry "Wolf" or proposing to do this just to push around some
> code. I really think there are benefits to it, if not now, then in the future.
>
> And I know there are other areas that are broken or could be written better.
> We all do what we can... I'm not real familiar with the workings of the cache,
> indices, etc., but working on AIX has given me a great understanding of how
> to make/load modules.

Just to belabor this, it is perfectly reasonable to add a set of types and
functions that have no 'C' implementation at all. The 'loadable module'
analogy misses a lot of the real requirements.

> There, my spleen feels _much_ better now. :)

It looks better too... ;-)

> darrenk

-dg

David Gould            dg@illustra.com            510.628.3783 or 510.305.9468
Informix Software (No, really)           300 Lakeside Drive, Oakland, CA 94612
 - Linux. Not because it is free. Because it is better.

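An extension with no C code at all can be written entirely at the SQL level.
A minimal sketch, closely following the documented EMP/double_salary style
(the table and function here are invented for illustration):

    -- A table automatically doubles as a composite type of the same name.
    CREATE TABLE emp (name text, salary int4);

    -- A function over that type written purely in SQL; no .so file involved.
    CREATE FUNCTION double_salary(emp) RETURNS int4
        AS 'SELECT $1.salary * 2' LANGUAGE 'sql';

    SELECT name, double_salary(emp) AS dream FROM emp;
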
David Gould wrote:
>
> To load a module into a kernel, all you need to do is read the code in,
> resolve the symbols, and maybe call an initialization routine. This is
> merely a variation on loading a shared object (.so) file into a program.
>
> To add a type and related stuff to a database is really a much harder problem.

I don't agree.

> You need to be able to:
>   - add one or more type descriptions              (types table)
>   - add input and output functions                 (types, functions tables)
>   - add cast functions                             (casts, functions tables)
>   - add any datatype-specific behavior functions   (functions table)
>   - add access method operators (maybe)            (amops, functions tables)
>   - add aggregate operators                        (aggregates, functions)
>   - add operators                                  (operators, functions)
>   - provide statistics functions
>   - provide destroy operators
>   - provide .so files for C functions, SQL for SQL functions
>     (note this is the part needed for a Unix kernel module)
>   - do all the above within a particular schema
>
> You may also need to create and populate data tables, rules, defaults, etc.
> required by the implementation of the new type.

All this would be done by the init function in the module you load. What we
need is a set of functions callable by modules, like

  module_register_type(name, descr, func*, textin*, textout*, whatever ...)
  module_register_smgr(name, descr, .....)
  module_register_command(....

Casts would be done by converting to a common format (text) and then to the
desired type. Use textin/textout. No special cast functions would have to
exist. Why doesn't it work this way already??? Would not that solve all
casting problems?

> To unload a type requires undoing all the above. But there is a wrinkle:
> first you have to check whether there are any dependencies. That is, if the
> user has created a table with one of the new types, you have to drop that
> table (including column defs, indexes, rules, triggers, defaults etc.)
> before you can drop the type. Of course the user may not want to drop their
> tables, which brings us to the next problem.

Dependencies are checked by the OS kernel when you try to unload modules. You
cannot unload slhc without first unloading ppp, for example. What's the
difference?

If you have Mod4X running with /dev/dsp opened, then you can't unload the
sound driver, because it is in use, and you cannot unload the a.out module if
you have a non-ELF program running, and you can see the refcount on all
modules and so on... This would not be different in a SQL server. If you have
a cursor open, accessing IP types, then you cannot unload the IP-types
module. Close the cursor, and you can unload the module if you want to.

You don't have to drop tables containing new types just because you unload
the module. If you want to SELECT from it, then that module would be loaded
automagically when it is needed.

> When this gets really hard is when it is time to upgrade an existing
> database to a new version. Suppose you add a new column to a type in the
> new version. How does a user with lots of data in dozens of tables using
> the old type install the new module?
>
> What about restoring a dump from an old version into a system with the new
> version installed?

Suppose you change TIMESTAMP to 64 bits of time and 16 bits of userid... how
do you solve that problem? You would probably have to make the textin/textout
functions for the type recognize the old format and make the appropriate
conversions. Perhaps add a zero userid, or default to the postmaster userid?
This would not be any different if TIMESTAMP were in a separate module.

For the internal storage format, every type could have its own way of
recognizing different versions of the data. For example, say you have an IPv4
module and insert millions of IP addresses, then you upgrade to an IPv6
module. It would then be able to look at the data and see whether it is an
IPv4 or an IPv6 address. Of course, you would have problems if you tried to
downgrade after lots of IPv6 addresses had been inserted.

MyOwnType could use the first few bits of the data to decide which version it
is, and later releases of the MyOwnType module would be able to recognize the
older formats. This way, types could be upgraded without a dump-and-reload
procedure.

> Or how about migrating to a different platform? Can we move data from a
> little-endian platform (x86) to a big-endian platform (sparc)? Obviously
> the .so files will be different, but what about copying the data out and
> reloading it?

Is this a problem right now? Dump and reload; how can it fail?

> Just to belabor this, it is perfectly reasonable to add a set of types and
> functions that have no 'C' implementation. The 'loadable module' analogy
> misses a lot of the real requirements.

Why would someone want a type without an implementation? Ok, let the module's
init function register a type marked as "non-existent"? Null function?

/* m */

Mattias Kregert writes:
> David Gould wrote:
...
> > You need to be able to:
> >   - add one or more type descriptions              (types table)
    [big list of stuff to do deleted]
> >   - do all the above within a particular schema
> >
> > You may also need to create and populate data tables, rules, defaults, etc.
> > required by the implementation of the new type.
>
> All this would be done by the init function in the module you load.
> What we need is a set of functions callable by modules, like
>   module_register_type(name, descr, func*, textin*, textout*, whatever ...)
>   module_register_smgr(name, descr, .....)
>   module_register_command(....

Ok, now you are requiring the module to handle all this in C. How does it
register a type, a default, a rule, a column, functions, etc.?

Having thought about that, consider that currently all this can already be
done using SQL. PostgreSQL is a relational database system. One of the prime
attributes of relational systems is that they are reflexive: you can use SQL
to query and update the system catalogs that define the characteristics of
the system.

By and large, all the tasks I mentioned previously can be done using SQL,
taking advantage of the high semantic level and power of the complete SQL
system. Given that we already have a perfectly good high-level mechanism, I
just don't see any advantage to adding a bunch of low-level APIs to duplicate
existing functionality.

> Casts would be done by converting to a common format (text) and then to
> the desired type. Use textin/textout. No special cast functions would
> have to exist. Why doesn't it work this way already??? Would not that
> solve all casting problems?

No. It is usable in some cases as an implementation of a defined cast, but
you still need defined casts. Think about these problems:

First, there is a significant performance penalty. Think about a query like:

    select * from huge_table where account_balance > 1000;

The textout -> textin approach would be far slower than the current direct
int-to-float cast.

Second, how do you restrict the system to sensible casts, or enforce a
meaningful order of attempted casts?

    create type yen based float;
    create type centigrade based float;

Would you allow?

    select yen * centigrade
      from market_data, weather_data
     where market_data.date = weather_data.date;

Even though the types 'yen' and 'centigrade' are implemented by float, this
leaves open a few important questions:
  - what is the type of the result?
  - what could the result possibly mean?

Third, you still can't do casts for many types:

    create type motion_picture (arrayof jpeg) ...
    select motion_picture * 10 from films ...

There is no useful cast possible here.

> > To unload a type requires undoing all the above. But there is a wrinkle:
> > first you have to check whether there are any dependencies. That is, if
> > the user has created a table with one of the new types, you have to drop
> > that table (including column defs, indexes, rules, triggers, defaults
> > etc.) before you can drop the type. Of course the user may not want to
> > drop their tables, which brings us to the next problem.
>
> Dependencies are checked by the OS kernel when you try to unload modules.
> You cannot unload slhc without first unloading ppp, for example. What's
> the difference?

I could have several million objects that might use that type. I cannot do
anything with them without the type definition. Not even delete them.

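The "reflexive" point is concrete: the catalogs that record types and
functions are themselves ordinary tables you can query. A small illustration
(the exact column set varies a little by release):

    -- Types and functions live in pg_type and pg_proc.
    SELECT typname, typlen  FROM pg_type WHERE typname = 'float8';
    SELECT proname, pronargs FROM pg_proc WHERE proname = 'float8mul';
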
> If you have Mod4X running with /dev/dsp opened, then you can't unload the
> sound driver, because it is in use, and you cannot unload the a.out module
> if you have a non-ELF program running, and you can see the refcount on all
> modules and so on... This would not be different in a SQL server.

But it is very different. SQL servers are much more complex than OS kernels.
Having spent a number of years maintaining the OS kernel in a SQL engine that
was originally intended to run on bare hardware, I can tell you that that
kernel was less than 10% of the complete SQL engine.

> If you have a cursor open, accessing IP types, then you cannot unload the
> IP-types module. Close the cursor, and you can unload the module if you
> want to.
> You don't have to drop tables containing new types just because you unload
> the module. If you want to SELECT from it, then that module would be loaded
> automagically when it is needed.

Ahha, I start to understand. You are saying 'module' and meaning 'loadable
object file of functions'. Given that this is what you mean, we already
handle this. What I took you to mean by 'module' was the SCHEMA defined to
make the functions useful, plus the functions themselves.

> > Just to belabor this, it is perfectly reasonable to add a set of types and
> > functions that have no 'C' implementation. The 'loadable module' analogy
> > misses a lot of the real requirements.
>
> Why would someone want a type without an implementation?

Why should a type with no C functions fail to have an implementation? Right
now every table is also a type. Many types are based on extending an existing
type, or are composites. Is there some reason not to define the
implementation (if any) in SQL?

--

I understand that modularity is good. I am asserting that PostgreSQL is a
very modular and extendable system right now. There are mechanisms to add
just about any sort of extension you want. Very little is hard coded in the
core.

I think this discussion got started because someone wanted to remove the ip
and mac and money types. This is a mere matter of the current packaging, as
there is no reason for them to be in or out except that historically the
system used some of these types before the extensibility was finished, so
they went into the core code.

I don't think it matters much whether any particular type is part of the core
or not, so feel free to pull them out. Do package them up to install in the
normal way that extensions are supposed to install, and retest everything.
Don't _add_ a whole new way to do the same kinds of extensibility that we
_already_ do. Just use the mechanisms that already exist.

This discussion has gone on too long, as others are starting to point out, so
I am happy to take it to private mail if you wish to continue.

-dg

David Gould            dg@illustra.com            510.628.3783 or 510.305.9468
Informix Software (No, really)           300 Lakeside Drive, Oakland, CA 94612
"Of course, someone who knows more about this will correct me if I'm wrong,
 and someone who knows less will correct me if I'm right."
       --David Palmer (palmer@tybalt.caltech.edu)

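Packaging "in the normal way" amounts to shipping the .so plus an install
script and a matching removal script. A hedged sketch of the removal side for
the hypothetical ipaddr package from the earlier sketch (names are
illustrative; old releases do not track dependencies, so the order is the
packager's responsibility):

    -- Undo the package roughly in reverse order of creation.
    DROP OPERATOR = (ipaddr, ipaddr);
    DROP TYPE ipaddr;
    DROP FUNCTION ipaddr_in(opaque);
    DROP FUNCTION ipaddr_out(opaque);
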
> All this would be done by the init function in the module you load.
> What we need is a set of functions callable by modules, like
>   module_register_type(name, descr, func*, textin*, textout*, whatever ...)
>   module_register_smgr(name, descr, .....)
>   module_register_command(....
>
> Casts would be done by converting to a common format (text) and then to
> the desired type. Use textin/textout. No special cast functions would
> have to exist. Why doesn't it work this way already??? Would not that
> solve all casting problems?

It does work this way already, at least in some cases. It definitely does not
solve casting problems, for several reasons, two of which are:

  - textout->textin is inefficient compared to binary conversions.
  - type conversion may also require value and format manipulation far beyond
    what you would accept in an input function for a specific type. That is,
    converting one type to another may require a conversion which would not
    be acceptable in any case other than a conversion from that specific type
    to the target type. You need to call a specialized routine to do this.

> Dependencies are checked by the OS kernel when you try to unload modules.
> You cannot unload slhc without first unloading ppp, for example. What's
> the difference?

Granularity. If, for example, we had a package of 6 types and 250 functions,
you would need to check each of these for dependencies. David was just
pointing out that it isn't as easy, not that it is impossible. I thought his
list of issues was fairly complete, and any solution would need to address
these somehow...

 - Tom

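To make the granularity point concrete, even finding which tables would be
affected by dropping a single type already means walking the catalogs by
hand; a hedged sketch for one hypothetical type name:

    -- Which table columns use the (hypothetical) type 'ipaddr'?
    SELECT c.relname, a.attname
      FROM pg_class c, pg_attribute a, pg_type t
     WHERE a.attrelid = c.oid
       AND a.atttypid = t.oid
       AND t.typname  = 'ipaddr';

A real unload would have to repeat this kind of check for every type,
function, and operator in the package before anything could be dropped.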