Thread: Data type removal
I, for one, am in favor of converting some of the type packages to loadable modules. Having those in the backend when they aren't being used is much like compiling extra modules into my Apache web server because they're "kinda neat", even though they won't be used. Also, if we follow the idea that we should have as many unique features in the backend as possible, we could end up with all sorts of features that are only used by a subset of users. For instance, I don't use the geometric types, but I do use a soundex type which I created. Why isn't the soundex type a standard part of the backend? I, personally, am glad it's not, because I like the version of this type that I created a lot better than the one that's in contrib. As far as the whole performance-improvement issue, I can say that if the backend is, say, 50K smaller due to the removal of those functions, that's just that much less swapping and that much more memory that's available for the OS buffer cache. Isn't that an improvement worth considering? How about this as a compromise. We make these packages available in the contrib or other such area as loadable modules as well as making them available right in the main backend code, but setup configure options to enable/disable them, so when I compile, I can say "--without-geometry" to compile without those types and functions. If I want to add them back in later, I can compile the loadable module version and add them in. -Brandon :)
On Tue, 24 Mar 1998, Brandon Ibach wrote: > I, for one, am in favor of converting some of the type packages to > loadable modules. Having those in the backend when they aren't being > used is much like compiling extra modules into my Apache web server > because they're "kinda neat", even though they won't be used. Also, I don't know about Apache, but is there any noticeable performance difference between having extra modules installed or not installed? It makes the binary slightly larger, but does it change performance? > if we follow the idea that we should have as many unique features in the > backend as possible, we could end up with all sorts of features that are > only used by a subset of users. For instance, I don't use the geometric > types, but I do use a soundex type which I created. Why isn't the > soundex type a standard part of the backend? I, personally, am glad > it's not, because I like the version of this type that I created a lot > better than the one that's in contrib. If yours is an improvement over what we have in contrib, why not submit it? > As far as the whole performance-improvement issue, I can say that > if the backend is, say, 50K smaller due to the removal of those > functions, that's just that much less swapping and that much more > memory that's available for the OS buffer cache. Isn't that an > improvement worth considering? Not if it removes the Postgres from PostgreSQL...I don't have the ip_and_mac contrib stuff loaded, because I never think of it. I know I can use it, mind you, just never think of adding it in... > How about this as a compromise. We make these packages available in > the contrib or other such area as loadable modules as well as making > them available right in the main backend code, but setup configure > options to enable/disable them, so when I compile, I can say > "--without-geometry" to compile without those types and functions. 
If I > want to add them back in later, I can compile the loadable module > version and add them in. As I stated earlier, if someone wants to add a '--without-geometry' option to configure that removes it, I have no problem with that...but it will only be to remove the feature, not add it in. Hell, I'm probably one that would even make use of it, since, right now, I don't use the geometric types either...but the default is to have everything included. I don't want to have to think about it someday when I decide I want to use those geometric types...
Speaking of data type removal, I was wondering if there were a better way to handle arrays of types. From looking in the catalog, it appears that for each type, there is also declared a similar type, which is the array version. It seems that arrays should be considered more flags on a field, than a field type in themselves. Does this make sense to anybody else? -- ===================================================================== | JAVA must have been developed in the wilds of West Virginia. | | After all, why else would it support only single inheritance?? | ===================================================================== | Finger geek@andrew.cmu.edu for my public key. | =====================================================================
The Hermit Hacker said: > > On Tue, 24 Mar 1998, Brandon Ibach wrote: > > I, for one, am in favor of converting some of the type packages to > > loadable modules. Having those in the backend when they aren't being > > used is much like compiling extra modules into my Apache web server > > because they're "kinda neat", even though they won't be used. Also, > > I don't know about Apache, but is there any noticeable performance > difference between having extra modules installed or not installed? It > makes the binary slightly larger, but does it change performance? > I can't say for sure what effect this would have on performance. However, what example can you give of a monolithic piece of software that is superior to a slim, well-designed core with an architecture for expansion? In fact, if you want to talk about what puts the Postgres in PostgreSQL, I think the ability to dynamically add types and functions is a big part of that. I would say that a large part of the reason that packages like MySQL exist is so that people have an option of a light-weight, simple SQL database. Why can't we serve that same purpose? Never consider unneeded resource consumption lightly, as it can be a very important factor in someone's choice of a database package. > > if we follow the idea that we should have as many unique features in the > > backend as possible, we could end up with all sorts of features that are > > only used by a subset of users. For instance, I don't use the geometric > > types, but I do use a soundex type which I created. Why isn't the > > soundex type a standard part of the backend? I, personally, am glad > > it's not, because I like the version of this type that I created a lot > > better than the one that's in contrib. > > If yours is an improvement over what we have in contrib, why not > submit it? > Perhaps I will, but my point is that I had a *choice* to do my own implementation, rather than being forced to use one that didn't suit my needs as well. 
> > As far as the whole performance-improvement issue, I can say that > > if the backend is, say, 50K smaller due to the removal of those > > functions, that's just that much less swapping and that much more > > memory that's available for the OS buffer cache. Isn't that an > > improvement worth considering? > > Not if it removes the Postgres from PostgreSQL...I don't have the > ip_and_mac contrib stuff loaded, because I never think of it. I know I > can use it, mind you, just never think of adding it in... > Since when do geometric types put the Postgres in PostgreSQL? Or IP and MAC types, for that matter? I can use the improvements in 6.3, but I haven't upgraded yet because it will be a bit of a job. Is there something we can throw in to solve that, or is it maybe something I have to do if I want the benefits? > > How about this as a compromise. We make these packages available in > > the contrib or other such area as loadable modules as well as making > > them available right in the main backend code, but setup configure > > options to enable/disable them, so when I compile, I can say > > "--without-geometry" to compile without those types and functions. If I > > want to add them back in later, I can compile the loadable module > > version and add them in. > > As I stated earlier, if someone wants to add a > '--without-geometry' option to configure that removes it, I have no > problem with that...but it will only be to remove the feature, not add it > in. Hell, I'm probably one that would even make use of it, since, right > now, I don't use the geometric types either...but the default is to have > everything included. I don't want to have to think about it someday when > I decide I want to use those geometric types... > You may notice that the option I suggested was one that could be invoked to remove the feature. In other words, the default is to have the feature included unless the user asks for it to not be. 
All I'm saying is, don't force the users to run a Postgres which has unneeded baggage. -Brandon :)
Brandon writes: > I, for one, am in favor of converting some of the type packages to > loadable modules. Having those in the backend when they aren't being ... > As far as the whole performance-improvement issue, I can say that > if the backend is, say, 50K smaller due to the removal of those > functions, that's just that much less swapping and that much more > memory that's available for the OS buffer cache. Isn't that an > improvement worth considering? No. If the functions are not used, the pages of the executable that contain them are never loaded. So there is no effective memory impact from having these where they are. As far as disk space, disk costs $0.03 per megabyte these days so saving 50Kb is completely insignificant. It may be a good idea to make the system more modular, but there are thousands of good ideas for improving it. I think we would be better served by spending our time on something that makes a difference, not just mindlessly pushing code around for no particular benefit. > How about this as a compromise. We make these packages available > in the contrib or other such area as loadable modules as well as > making them available right in the main backend code, but setup > configure options to enable/disable them, so when I compile, I can say > "--without-geometry" to compile without those types and functions. If > I want to add them back in later, I can compile the loadable module > version and add them in. I don't particularly like this model of adding extensions either. It implies that you have to rerun configure and build the whole system to add a module. In fact, all that is needed is to build the module and run the setup scripts and the running database picks up the new functionality. -dg David Gould dg@illustra.com 510.628.3783 or 510.305.9468 Informix Software (No, really) 300 Lakeside Drive Oakland, CA 94612 - Linux. Not because it is free. Because it is better.
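The disk-cost argument is easy to check. Taking the figures from the message above (a 50 KB saving, disk at $0.03 per megabyte) and assuming 1 MB = 1024 KB, the saving per installation comes to a fraction of a cent:

```latex
50\,\text{KB} = \frac{50}{1024}\,\text{MB} \approx 0.049\,\text{MB},
\qquad
0.049\,\text{MB} \times \$0.03/\text{MB} \approx \$0.0015
```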
> > Speaking of data type removal, I was wondering if there were a better > way to handle arrays of types. From looking in the catalog, it > appears that for each type, there is also declared a similar type, > which is the array version. It seems that arrays should be considered > more flags on a field, than a field type in themselves. Does this > make sense to anybody else? Is an array the same thing as a scalar? What would be the benefit of making arrays "flags on a field" instead of a "field type in themselves"? Seriously, how would this improve _anything_? The postgres type system is very flexible and powerful as is. What is the problem this is trying to solve? What is the motivation for data type removal? -dg David Gould dg@illustra.com 510.628.3783 or 510.305.9468 Informix Software (No, really) 300 Lakeside Drive Oakland, CA 94612 - Linux. Not because it is free. Because it is better.
> From: dg@illustra.com (David Gould) > > Speaking of data type removal, I was wondering if there were a better > > way to handle arrays of types. From looking in the catalog, it > > appears that for each type, there is also declared a similar type, > > which is the array version. It seems that arrays should be considered > > more flags on a field, than a field type in themselves. Does this > > make sense to anybody else? > > Is an array the same thing as a scalar? Uhm, I don't believe so. > What would be the benefit of making arrays "flags on a field" instead of > a "field type in themselves". Seriously, how would this improve _anything_? In particular, I was thinking of the PostgreSQL module for Python. It has a nice interface, but needs intimate knowledge of the data types. Somehow, it seems to be excessively kludgy to have to have e.g. an int4 type, and an int4[] type. When I query a table, if one of the fields is an array of integers, then I either want to know that it's an array, or that it's an integer. If the returned type is "array," then I have to magically know that the array is filled with integers. If it's an integer, then I just have to recognize that it's actually a series of integers. With the current setup, there have to be separate type handlers for char(2), char(2)[], int4, int4[], int2, int2[], float8, float8[], etcetera. I'd rather have handlers for the base type, and an iterator that's used for arrays. > What is the motivation for data type removal? In this case, I don't want to *remove* the data type, just change the identification method. An array of int4 should really be recognized as both (type)int4 and (type)array. Urhm, since this is PostgreSQL, I guess I'm arguing for type composition (how appropriate for an object-relational database). -- ===================================================================== | JAVA must have been developed in the wilds of West Virginia.
| | After all, why else would it support only single inheritance?? | ===================================================================== | Finger geek@andrew.cmu.edu for my public key. | =====================================================================
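The "handlers for the base type, and an iterator that's used for arrays" idea can be sketched on the client side. This is a minimal Python sketch, not the API of any existing PostgreSQL module for Python. It assumes the catalog convention of naming an array type after its element type with a leading underscore (e.g. `_int4` for `int4[]`), and it only handles flat array literals such as `{1,2,3}` with no quoting, nesting, or NULLs. The handler table and all function names are hypothetical.

```python
from dataclasses import dataclass
from typing import Any, Callable, Dict, Iterator

# Hypothetical table of scalar decoders: one handler per *base* type only.
BASE_HANDLERS: Dict[str, Callable[[str], Any]] = {
    "int2": int,
    "int4": int,
    "float8": float,
    "text": str,
}


@dataclass(frozen=True)
class FieldType:
    base: str       # element type name, e.g. "int4"
    is_array: bool  # the "flag on a field" instead of a separate type


def parse_type_name(name: str) -> FieldType:
    """Split a catalog type name into (base type, array flag).

    Assumes array types are named with a leading underscore, e.g. "_int4".
    """
    if name.startswith("_"):
        return FieldType(name[1:], True)
    return FieldType(name, False)


def iter_array_elements(text: str) -> Iterator[str]:
    """Yield raw element strings from a flat literal such as "{1,2,3}".

    Simplified: no nesting, quoting, escapes, or NULLs.
    """
    inner = text.strip()
    if not (inner.startswith("{") and inner.endswith("}")):
        raise ValueError(f"not an array literal: {text!r}")
    inner = inner[1:-1]
    if inner:
        for item in inner.split(","):
            yield item.strip()


def decode(type_name: str, text: str) -> Any:
    ft = parse_type_name(type_name)
    handler = BASE_HANDLERS[ft.base]
    if not ft.is_array:
        return handler(text)
    # The same scalar handler is reused for every element of the array.
    return [handler(item) for item in iter_array_elements(text)]
```

With this layout, `decode("int4", "42")` returns `42` and `decode("_int4", "{1,2,3}")` returns `[1, 2, 3]`. Adding a new base type means adding one entry to the table, and its array form comes along without a second handler.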
On Tue, 24 Mar 1998, Brandon Ibach wrote: > The Hermit Hacker said: > > > > On Tue, 24 Mar 1998, Brandon Ibach wrote: > > > I, for one, am in favor of converting some of the type packages to > > > loadable modules. Having those in the backend when they aren't being > > > used is much like compiling extra modules into my Apache web server > > > because they're "kinda neat", even though they won't be used. Also, > > > > I don't know about Apache, but is there any noticeable performance > > difference between having extra modules installed or not installed? It > > makes the binary slightly larger, but does it change performance? > > > I can't say for sure what effect this would have on performance. > However, what example can you give of a monolithic piece of software > that is superior to a slim, well-designed core with an architecture > for expansion? PostgreSQL? Marc G. Fournier Systems Administrator @ hub.org primary: scrappy@hub.org secondary: scrappy@{freebsd|postgresql}.org
On Tue, 24 Mar 1998, David Gould wrote: > What is the motivation for data type removal? There appear to be two "motivations" from what I'm hearing: 1. reduced memory footprint; 2. reduced "disk" footprint. From what I've been able to gather, I believe that people are expecting that the result is a faster server, based on less "perceived" fat...nobody has yet been able to prove that though, which I believe Darren is working on. Part of this does have me thinking though, in that one of the *big* features that I perceived PostgreSQL to have was the fact that it was/is very easy to extend our datatypes and functions without having to recompile the server. So, right now, I'm kinda sitting on the fence in that I'm curious if we *are* taking anything away by moving the geometrics from inside of the server to a loadable module, given that the fact that we *are* able to do that appears to be one of our major points of "uniqueness"... Right now, there has been a lot of speculation as to whether moving the geometrics into a loadable module would gain (or lose, for that matter) us anything... Let's say we have a $prefix/modules directory, into which we move the geometrics stuff. And, let's say we have a 'loadmod' command added so that, if someone did 'loadmod geometrics;' from psql, it loaded up the geometrics module...would we have the *same* functionality as if it were compiled in, like it is now? Or would we lose something? To me, this is one of the *biggest* features of PostgreSQL, the fact that we can easily add in new features without recompiling...but without seeing actual numbers (performance as well as "disk"), we can discuss this till we are blue in the face. Darren...if you want to provide us with a working model of this, using the geometrics, so be it...I won't say that it will or won't be integrated without seeing this in operation, and without seeing some *actual* benefit to it...as the saying goes "Talk is Cheap..."
So, I say, let's leave this as a "to be researched" and wait for something concrete to look at... Marc G. Fournier Systems Administrator @ hub.org primary: scrappy@hub.org secondary: scrappy@{freebsd|postgresql}.org
> As far as the whole performance-improvement issue, I can say that > if the backend is, say, 50K smaller due to the removal of those > functions, that's just that much less swapping and that much more > memory that's available for the OS buffer cache. Isn't that an > improvement worth considering? Well, it is really only 50k per installation, not 50k per backend because they all share the same executable in the buffer cache. The problem with breaking it up, as I see it, is that all of a sudden the regression tests and the nice descriptions of \do, \df, etc are much harder to implement. Maybe in six months, when we will have resolved all of the performance/SQL92/feature issues, this will be a good use of our time, but at this point, trying to keep all that straight while working on much more important issues to end-users is hard to justify. -- Bruce Momjian | 830 Blythe Avenue maillist@candle.pha.pa.us | Drexel Hill, Pennsylvania 19026 + If your life is a hard drive, | (610) 353-9879(w) + Christ can be your backup. | (610) 853-3000(h)
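Bruce's sharing argument can be written out. Assume N concurrently running backends (N is an illustrative parameter, not a figure from the thread). Because, as he describes, all backends share the same executable pages in the buffer cache, a 50 KB reduction in code size saves 50 KB once, not once per backend:

```latex
\text{saving}_{\text{shared}} = 50\,\text{KB}
\quad\text{vs.}\quad
\text{saving}_{\text{per-copy}} = 50N\,\text{KB},
\qquad\text{e.g. } N = 20:\ 50\,\text{KB} \text{ vs. } 1000\,\text{KB}
```

The second figure is what the original suggestion implicitly assumes; it would only apply if every backend held a private copy of the code.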
> The postgres type system is very flexible and powerful as is. What is > the problem this is trying to solve? > > What is the motivation for data type removal? There are many motivations involved here. I brought it up originally because the char2-16 types are not supported and do not provide any functionality over the char(),varchar(),text string types. Others suggested that since they do not care about the geometric types that those should be removed too. I regret bringing it up. Postgres has many unique features, and stripping it to become a plain vanilla SQL92 machine is a waste of time imo. If any restructuring happens which removes, or makes optional, some of the fundamental types, it should be accomplished so that the types can be added in transparently, from a single set of source code, during build time or after. OIDs would have to be assigned, presumably, and the hardcoding of the function lookups for builtin types must somehow be done incrementally. Probably needs more than this to be done right, and without careful planning and implementation we will be taking a big step backwards. With the amount of time being spent in discussion, _still without any quantitative estimates for performance improvement_, it seems like someone should do some measurements. Even without them though it is pretty clear that it won't benefit a large database, since the fraction of time spent constructing a query will be small compared to the time it takes to traverse the tables in the query. Seems to me that Postgres' niche is at the high end of size and capability, not at the lightweight end competing for design wins against systems which don't even have transactions. - Tom
> If any restructuring happens which removes, or makes optional, some of > the fundamental types, it should be accomplished so that the types can > be added in transparently, from a single set of source code, during > build time or after. OIDs would have to be assigned, presumably, and the > hardcoding of the function lookups for builtin types must somehow be > done incrementally. Probably needs more than this to be done right, and > without careful planning and implementation we will be taking a big step > backwards. This whole discussion reminds me of someone who thinks he is getting low gas mileage because his tire pressure is low, when in fact, he has a hole in his gas tank. We have some major holes to plug, but they are hard to see. People see the tire pressure/pg_class table, and think the database can be improved. You could double the size of the system tables, or the binary, and see no performance change. Make them 10 times bigger, and see if you find a difference. Aren't there enough serious issues on the TODO list for people? These are actual complaints/bugs or requests from users, not pie-in-the-sky wouldn't-it-be-nice-if-we-had ideas. I had a private e-mail conversation with someone, and they mentioned that we seem to be mired in micro-optimizations that really are not going to do us any significant good. I see his point. Look at the TCL/TK mess I got into just trying to get that to work, and removing the stuff that checked for specific tcl/tk version numbers. Can you imagine the effort we will go through ripping out types? Another thing. It is the type extensibility that makes us more complex, not the types themselves. -- Bruce Momjian | 830 Blythe Avenue maillist@candle.pha.pa.us | Drexel Hill, Pennsylvania 19026 + If your life is a hard drive, | (610) 353-9879(w) + Christ can be your backup. | (610) 853-3000(h)
David Gould wrote: > > > How about this as a compromise. We make these packages available > > in the contrib or other such area as loadable modules as well as > > making them available right in the main backend code, but setup > > configure options to enable/disable them, so when I compile, I can say > > "--without-geometry" to compile without those types and functions. If > > I want to add them back in later, I can compile the loadable module > > version and add them in. > > I don't particularly like this model of adding extensions either. It implies > that you have to rerun configure and build the whole system to add a module. > In fact, all that is needed is to build the module and run the setup > scripts and the running database picks up the new functionality. That's the nice thing with Linux. You can throw out the sound driver any time and insert a new one whenever you want to. If I decide to use the MIDI port, I can rmmod the old driver and insmod the new one with MIDI support... If I connect a Wingdogs PC to my network, I can insmod smbfs and then I can mount the Windows file resources from my Linux box. No need to upgrade the kernel or reboot. If someone comes up with a new super-NFS, I just unmount my nfs's, insmod super-nfs, and mount -a -t nfs... With loadable modules in PostgreSQL, you could upgrade the ORACLE compatibility module by typing "make modules modules-install". No need to kill all backends and postmaster, deal with angry users, make backups, install new binaries, dump-and-restore, track down bugs, and so on. It would also be easier to contribute to PostgreSQL by creating modules than by submitting patches to the server core... Let's say I have made a storage manager for tapes. It would be easier to make a module which used a standard set of "modules" functions to register the new manager in the core than to submit patches to be included in the core... And if something causes trouble, just remove that module!
Then it would not be a big problem if a new feature was buggy. Perhaps it would be easier to find bugs then (binary search - remove module by module). Just imagine Linux if people had to submit patches to the kernel instead of just writing their drivers as modules!!! /* m */
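The "standard set of modules functions to register" idea can be illustrated with a toy catalog. This is a minimal Python sketch of the bookkeeping only: `Catalog`, `register`, `unload`, `load_geometry`, and `point_distance` are hypothetical names, and nothing here reflects how PostgreSQL actually loads shared objects. A module's entry point registers its functions with the core, callers look them up the same way regardless of origin, and unloading removes exactly what that module registered.

```python
import math
from typing import Any, Callable, Dict


class Catalog:
    """Toy stand-in for the core's function catalog (hypothetical)."""

    def __init__(self) -> None:
        self.functions: Dict[str, Callable[..., Any]] = {}
        self.owner: Dict[str, str] = {}  # function name -> module that registered it

    def register(self, module: str, name: str, fn: Callable[..., Any]) -> None:
        # Refuse silent overrides so two modules cannot clash unnoticed.
        if name in self.functions:
            raise ValueError(f"{name!r} already registered by {self.owner[name]!r}")
        self.functions[name] = fn
        self.owner[name] = module

    def unload(self, module: str) -> None:
        # Remove only the entries this module registered (the "rmmod" step).
        for name in [n for n, m in self.owner.items() if m == module]:
            del self.functions[name]
            del self.owner[name]

    def call(self, name: str, *args: Any) -> Any:
        # Lookup is identical whether the entry was built in or loaded later.
        return self.functions[name](*args)


def load_geometry(catalog: Catalog) -> None:
    """Hypothetical module entry point (the "insmod" / "loadmod" step)."""
    catalog.register(
        "geometry",
        "point_distance",
        lambda p, q: math.hypot(p[0] - q[0], p[1] - q[1]),
    )
```

After `cat = Catalog(); load_geometry(cat)`, the call `cat.call("point_distance", (0, 0), (3, 4))` gives `5.0`, and `cat.unload("geometry")` removes the function again. A real module would also have to create types, operators and other catalog entries, which is where most of the extra complexity raised elsewhere in the thread lies.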
Thomas G. Lockhart wrote: > > With the amount of time being spent in discussion, _still without any > quantitative estimates for performance improvement_, it seems like > someone should do some measurements. Even without them though it is > pretty clear that it won't benefit a large database, since the fraction > of time spent constructing a query will be small compared to the time it > takes to traverse the tables in the query. I don't think the main reason for moving things out from the core would be performance, but for making it easier to manage extensions. /* m */
On Wed, 25 Mar 1998, Mattias Kregert wrote: > Thomas G. Lockhart wrote: > > > > With the amount of time being spent in discussion, _still without any > > quantitative estimates for performance improvement_, it seems like > > someone should do some measurements. Even without them though it is > > pretty clear that it won't benefit a large database, since the fraction > > of time spent constructing a query will be small compared to the time it > > takes to traverse the tables in the query. > > I don't think the main reason for moving things out from the > core would be performance, but for making it easier to manage > extensions. "Easier to manage extensions"??
> I don't think the main reason for moving things out from the > core would be performance, but for making it easier to manage > extensions. ??
> > > The postgres type system is very flexible and powerful as is. What is > > the problem this is trying to solve? > > > > What is the motivation for data type removal? > > There are many motivations involved here. I brought it up originally > because the char2-16 types are not supported and do not provide any > functionality over the char(),varchar(),text string types. > > Others suggested that since they do not care about the geometric types > that those should be removed too. > > I regret bringing it up. Postgres has many unique features, and > stripping it to become a plain vanilla SQL92 machine is a waste of time > imo. I agree completely. This was the point I was trying to make by asking for the motivation. If there were a clear-cut, proven performance gain to be had, I would support it. But as it is just speculation, it seems kinda pointless to push a bunch of code around (risking breaking it) for no very good reason. > If any restructuring happens which removes, or makes optional, some of > the fundamental types, it should be accomplished so that the types can > be added in transparently, from a single set of source code, during > build time or after. OIDs would have to be assigned, presumably, and the > hardcoding of the function lookups for builtin types must somehow be > done incrementally. Probably needs more than this to be done right, and > without careful planning and implementation we will be taking a big step > backwards. Exactly. Right now modules get installed by building the .so files and then creating all the types, functions, rules, tables, indexes etc. This is a bit more complicated than the Linux kernel 'insmod' operation. We could easily make the situation worse through careless "whacking". > Seems to me that Postgres' niche is at the high end of size and > capability, not at the lightweight end competing for design wins against > systems which don't even have transactions.
And, there are already a couple of perfectly good 'toy' database systems. What is the point of having another one? Postgres should move toward becoming an "industrial strength" solution. > - Tom Thank you for bringing some sense to this discussion. -dg David Gould dg@illustra.com 510.628.3783 or 510.305.9468 Informix Software (No, really) 300 Lakeside Drive Oakland, CA 94612 - Linux. Not because it is free. Because it is better.
Bruce Momjian: > Aren't there enough serious issues on the TODO list for people? These > are actual complaints/bugs or requests from users, not pie-in-the-sky > wouldn't-it-be-nice-if-we-had ideas. > > I had a private e-mail conversation with someone, and they mentioned > that we seem to be mired in micro-optimizations, that really are not > going to do us any significant good. I see his point. Please feel free to quote me if I am the "someone" (I suspect I was). > Another thing. It is the type extensibility that makes us more complex, > not the types themselves. It is also the type extensibility that makes us more valuable. There are lots of "plain old SQL" systems out there. There are very few extendable systems. Perhaps we might want to start encouraging people to use the extensibility and write a few useful application modules. I know of a few that have been pretty successful in the commercial world: - HTML generation from inside the database. Store your pages in the db with embedded <SQL ... /SQL> tags to dynamically generate web content based on queries. - Integrated full text search engines. Store documents and be able to query "select abstract from papers where contains(fulltext, 'database', 'transaction', 'recovery')". And so on. -dg David Gould dg@illustra.com 510.628.3783 or 510.305.9468 Informix Software (No, really) 300 Lakeside Drive Oakland, CA 94612 - Linux. Not because it is free. Because it is better.
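The `contains(fulltext, ...)` query in the full-text example implies some kind of index over document words. As a rough sketch, assuming `contains()` means "every listed term appears" and that documents are tokenized by simple lowercase word splitting, an inverted index in Python could answer it as follows. `InvertedIndex`, `contains`, and `select_abstracts` are illustrative names, not an existing API, and a real module would add stemming, ranking and on-disk storage.

```python
import re
from collections import defaultdict
from typing import DefaultDict, Dict, List, Set


def tokenize(text: str) -> Set[str]:
    # Naive tokenizer: lowercase runs of letters/digits; no stemming or stop words.
    return set(re.findall(r"[a-z0-9]+", text.lower()))


class InvertedIndex:
    def __init__(self) -> None:
        self.postings: DefaultDict[str, Set[int]] = defaultdict(set)  # word -> doc ids
        self.abstracts: Dict[int, str] = {}

    def add(self, doc_id: int, abstract: str, fulltext: str) -> None:
        self.abstracts[doc_id] = abstract
        for word in tokenize(fulltext):
            self.postings[word].add(doc_id)

    def contains(self, *terms: str) -> Set[int]:
        """Ids of documents containing every term (AND semantics assumed)."""
        if not terms:
            return set()
        # .get() avoids inserting empty entries into the defaultdict.
        sets = [self.postings.get(t.lower(), set()) for t in terms]
        return set.intersection(*sets)

    def select_abstracts(self, *terms: str) -> List[str]:
        # Rough analogue of: select abstract from papers where contains(fulltext, ...)
        return [self.abstracts[i] for i in sorted(self.contains(*terms))]
```

For example, after indexing one paper about transaction recovery and one about query planning, `select_abstracts("database", "transaction", "recovery")` returns only the first paper's abstract, because the per-term posting sets are intersected.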