Thread: An introduction and a plea ...
Dear pgsql-hackers list, First a few words of introduction : I'm 43 and, while I have been introduced to computing for a long time (my first exposure was a small Fortran exercise I wrote in '74 (!) for a timesharing system on an hardcopy terminal ...), my coding abilities are somewhat rusty. I am mainly a user by now, no longer a coder, and my interests in computers is now in making my life simpler (I'm a biostatistician, among other things).I probably won't be contributing any code to PostgreSQL. Some bug reports, maybe ... However, I've lurked on some of the PostgreSQL lists for 2 to 3 months (through the Web interface), and I feel that I might offer some advice, based on my past experience of seeing a lot of projects growing (or dying, due to feeping creaturism(TM) ...). So I will shamelessly pull my first plea, related to the proposed change to the default behaviour of PostgreSQL in querying classes with subclasses. I *strongly* suggest not to change anything in the default behaviour, which is what is expected from an SQL-compliant system, even if the database in question uses inheritance internally. The reason for that plea is that a modification would crash any program not explicitly written for inheritance features : such features might be used by, say, the administrator and coere programmers of a database, who are not necessarily publish this internal use of inheritance to end-users. Furthermore, such a change would forbid evolution of a database from a pure-relational to an object-orien,ted one : the two representations would be incompatible. It should also pointed out that most interface programs (such as ODBC or JDBC drivers) are not and will not in a foreseeable future be designed for use of these features. Modifying the default behaviour would break them. Apart from that, I am, after 17 years of exposure to the concepts of object-oriented programming, still to be convinced of the value of this paradigm. This is *not* to suggest that these developments should be left over ! However, I *feel* that the real issues behind this concept are not yet fully understood, and that some deep theoretical work remains to be done (in logic, for example : while the well-understood relational theory directly relates to set theory, I think that a mathematically correct objects-and-types theory shoud emanate from category theory but remains to be created ...). Your thoughs ? Emmanuel Charpentier
Emmanuel Charpentier wrote: > I *strongly* suggest not to change anything in the default behaviour, > which is what is expected from an SQL-compliant system, even if the > database in question uses inheritance internally. Can I assure you that these changes have NO EFFECT on anybody who does not use inheritance. i.e. Postgres will remain as SQL compliant as it was before. > The reason for that plea is that a modification would crash any > program not explicitly written for inheritance features. No it won't. If you don't use inheritance, you will not be effected in any way. > : such > features might be used by, say, the administrator and coere > programmers of a database, who are not necessarily publish this > internal use of inheritance to end-users. Furthermore, such a change > would forbid evolution of a database from a pure-relational to an > object-orien,ted one : the two representations would be incompatible. > > It should also pointed out that most interface programs (such as ODBC > or JDBC drivers) are not and will not in a foreseeable future be > designed for use of these features. Modifying the default behaviour > would break them. In my opinion, this change will give users of ODBC and such tools MORE useful defaults. Of course if you are using a non-OO interface to an OO database there will always be things you can't do. But IMHO, this gives a more useful set of defaults as a trasition phase. For example, currently if I have student and employee inheriting from person, ODBC query of SELECT * from person will return... NOTHING! After these changes the query will return all the persons (which happen to be students and employees). > Apart from that, I am, after 17 years of exposure to the concepts of > object-oriented programming, still to be convinced of the value of > this paradigm. This is *not* to suggest that these developments should > be left over ! However, I *feel* that the real issues behind this > concept are not yet fully understood, and that some deep theoretical > work remains to be done (in logic, for example : while the > well-understood relational theory directly relates to set theory, I > think that a mathematically correct objects-and-types theory shoud > emanate from category theory but remains to be created ...). Well, the fact is people are using OO now, and it's hard for me to explain the development advantages of an OO database to someone who is not coding. But if you really want to find out why an OO database is good, head on over to versant.com or odi.com, download the database and write a small application. Apart from anything else, some people need the improved performance NOW, and can't wait for the academics to give their stamp of approval. And OO database coding simplicity is saving millions of $$$ NOW.
Emmanuel Charpentier wrote: > > However, I've lurked on some of the PostgreSQL lists for 2 to 3 months > (through the Web interface), and I feel that I might offer some > advice, based on my past experience of seeing a lot of projects > growing (or dying, due to feeping creaturism(TM) ...). > > So I will shamelessly pull my first plea, related to the proposed > change to the default behaviour of PostgreSQL in querying classes with > subclasses. > > I *strongly* suggest not to change anything in the default behaviour, > which is what is expected from an SQL-compliant system, even if the > database in question uses inheritance internally. I agree wrt the * returning different types of tuples from different subtypes. I somewhata disagree about default selects/updates/deletes working on inherited tables by default - If we want PostgreSQL to evolve back to an ORDBMS. We should not change the defaul _yet_, but we should not exclude the change in future. rather we should acknowledge the current state of affairs wrt inheritance and declare it deprecated (dont use in new projects, start fixing your old ones) > The reason for that plea is that a modification would crash any > program not explicitly written for inheritance features : such > features might be used by, say, the administrator and coere > programmers of a database, who are not necessarily publish this > internal use of inheritance to end-users. I saw something similar when going from python 1.5.1 to 1.5.2 - suddenly some broken usage became a show-stopping bug instead of just ignering it with some hidden default usage. It did not byte me directly, but several of our developers had never read the introductory parts of docs, or had not understood what was said. Currently inheritance features can be used in a very limited way - 1. for defining a table that shares some columns with some other table(s) this usage is actually broken, as it currently results in tables that can't be dumped properly after columns are added, and thus should be discouraged anyway until it is fixed. 2. for selecting (and not updating/deleting) from a group of said broken tables, using a non-ansi syntax. The performanceis also most likely suboptimal, as indexes are not inherited. Therefore I would propose the following, more radical approach - * officially acknowledge the current lacking OO support of PostgreSQL and declare the current usages deprecated and soon-to-be-removedin 7.0 * not remove the support for them in the backend, but instead start to investigate ways to fix the buga and add the missingfeatures. * hide the OO development behind "set ORDBMS to 'ON'", which case would behave in the new way for the current two OO features(create .. inherits .., and select), if it is set to 'off' (the default) spit out a warning on each use but behavecompatibly. (maybe make psql check if it is invoked as osql and send the set command automatically) * for migrating databases provide a way to dump inherited tables as standalone so that it would be easy for people to clearup the inherits-as-macro usage * The OO development should solve the following problems (independent of which syntax will be eventually used) 1. if a table inherits another table, it has to (at least) inherit the following by default 1.1 columns - in a way that allows add/delete column (requires changes to storage manager, probably introductionof deleted/missing columns) 1.2 indexes, both unique and ordinary, where unique indexes should be unique _over_all_tables_ involved 1.3 constraints, including being the foreign end of foreign key constraint 2. a way to go from OID to tuple The must efficient solution seems to be a file with a simple structure that has records of (TUPLE_OID,TABLE_OID) wher a record is added at each insert. As this file is ordered wrt. TULE_OID and has fixed size records, it can be efficiently searche with binary search.As it is append-only it is also quite (probably most) efficient on inserts. I can't think of any solutions usingcurrent structures which would be nearly as efficient. If we sacrifice space for lookup speed we may write all oids and never shrink that file and have a computed lookup whic would require at most one disk access per oid lookup. We could use some kind of weightedbinary search in any case. The same kind of file could be used for re_introducing time-travel in an efficient way. 3. a way to get full tuples(tuple type + all columns) from inherited tables. This would require minimal changes to wire protocol, but more changes to client API's. 4.possibly a bit unrelated to OO, but still a must-do - Start working on a binary cross-platform protocol, that couldbe used for _both_ insert/update/delete and select (instead of current single-platform select only binary protocol) It would mean adding PREPARE to the backend (already exists in SPI) as well as smarter client libraries that would exposeit and that could marshal binary data given to BIND over wire. Having PREPARE-d queries can also speed up ourperformance on standard benchmarks, as much of prepare/optimise can be skipped. From there on it gets a bit foggy as it is really a distant future (possibly more than 1 year ;) 5. become even more object-oriented and add methods to tables that can do different things depending on which table theyoperate on. 6. allow writing these mathods in a platform-independent language (java/python/tcl/perl/...) and also passed from backendto frontend. > Furthermore, such a change > would forbid evolution of a database from a pure-relational to an > object-orien,ted one : the two representations would be incompatible. Do you propose the two-separate-parsers way of doing things ? > It should also pointed out that most interface programs (such as ODBC > or JDBC drivers) are not and will not in a foreseeable future be > designed for use of these features. Modifying the default behaviour > would break them. Standard SQL queries should give standard SQL responses. OTOH, there is an evolving API for interfacing ObjectDatabases with Java > Apart from that, I am, after 17 years of exposure to the concepts of > object-oriented programming, still to be convinced of the value of > this paradigm. My experience is exactly the opposite - after zenning the concept I'm unable to write anything longer than 15 lines that is not OO, (with the possible exclusion of SQL scripts, which do not fit nicely to that concept ;) It does _not_ mean writing in an "OO language", but just a way of thinking about problems and expressing these thoughts. > This is *not* to suggest that these developments should > be left over ! However, I *feel* that the real issues behind this > concept are not yet fully understood, and that some deep theoretical > work remains to be done There will _always_ remain theoretical work to be done, at least for any live concept. > (in logic, for example : while the > well-understood relational theory directly relates to set theory, I > think that a mathematically correct objects-and-types theory shoud > emanate from category theory but remains to be created ...). > > Your thoughs ? I suspect that OO programming as a whole could be complex enough that Goedels theorem forbids any complete"mathematically correct objects-and-types theory" ---------------- Hannu
Hannu Krosing wrote: > 2. a way to go from OID to tuple > > The must efficient solution seems to be a file with a simple structure > that > has records of (TUPLE_OID,TABLE_OID) wher a record is added at each > insert. > As this file is ordered wrt. TULE_OID and has fixed size records, it can > be efficiently searche with binary search. As it is append-only it is also > quite (probably most) efficient on inserts. I can't think of any solutions > using current structures which would be nearly as efficient. If you have your suggested indexes that apply over multiple relations, I can't see why that can't be used for this too. It just means that if you use ODBMS it is recommended that you do a CREATE INDEX oid_idx ON object (oid), where "object" is a conceptual super-class of all other objects. Your append-only file would grow without limit, which I think is a bit of a problem for some apps. Also the way ODBMS will work is an application will ask for a chunk of oids from the database, some of which may be later "wasted".(This is how Versant works and it is also a technique documented by Stonebraker in his postgres papers). This technique is so that applications don't have to talk to the backend to create objects in the front end that need oids. This means objects may not be created with oids in order. So you have to store space for oids in your file that may not be used. I think we need first more conventional style index that works well. Then we can experiment with more radical ideas. > The same kind of file could be used for re_introducing time-travel in an > efficient way. How? > 5. become even more object-oriented and add methods to tables that can do > different things depending on which table they operate on. Does this definitely not work now?
Chris Bitmead wrote: > > Hannu Krosing wrote: > > > 2. a way to go from OID to tuple > > > > The must efficient solution seems to be a file with a simple structure > > that > > has records of (TUPLE_OID,TABLE_OID) wher a record is added at each > > insert. > > As this file is ordered wrt. TULE_OID and has fixed size records, it can > > be efficiently searche with binary search. As it is append-only it is also > > quite (probably most) efficient on inserts. I can't think of any solutions > > using current structures which would be nearly as efficient. > > If you have your suggested indexes that apply over multiple relations, I > can't see why that can't be used for this too. The insert performance would be much worse for indexes than for append-only file. > It just means that if you use ODBMS it is recommended that you do a > CREATE INDEX oid_idx ON object (oid), where "object" > is a conceptual super-class of all other objects. > > Your append-only file would grow without limit, which I think is a bit > of a problem for some apps. I meant vacuum to compress it (which AFAIK it does not do for indexes currently) > Also the way ODBMS will work is an application will ask for a chunk > of oids from the database, some of which may be later "wasted".(This is > how Versant works and it is also a technique documented by Stonebraker in > his postgres papers). This technique is so that applications don't have to > talk to the backend to create objects in the front end that need oids. > This means objects may not be created with oids in order. > So you have to store space for oids in your file that may not be used. Yes, it needs some more book-keeping than I thought (keep the oid-file pages that could possibly be updated in memory until the front-end which requested the oids disconnects), or just assume all oids will be used and compress the unused ones below watermark out in VACUUM. > I think we need first more conventional style index that works well. > Then we can experiment with more radical ideas. An index spanning multiple tables is quite radical anyway. Initially we could get by with multiple indexes and extra (but slow) check for uniqueness (when index is unique). > > > The same kind of file could be used for re_introducing time-travel in an > > efficient way. > > How? By writing (TID,TIMESTAMP) tuples there and using that info to retrieve tuples active at specified time by examinimg TIDs in "deleted" tuples. As bot TID and TIMESTAMP should be monotonuously growing again binary search can be used on retrieve and inserts are append-only (meaning fast) Both cases assume that we are oriented on fast inserts, as b-tree would probably be faster than binary search on retrieves, but is much slower on inserts. > > > 5. become even more object-oriented and add methods to tables that can do > > different things depending on which table they operate on. > > Does this definitely not work now? AFAIK functions are selected based on their arguments which can be either a full tuple or several simple types, but not both. So the first kind _may_ actually work, we must ask someone more familiar on when the actual function is selected for "SELECT T.func() from TAB* T" queries. -------------- Hannu
Hannu Krosing wrote: > > > The same kind of file could be used for re_introducing time-travel in an > > > efficient way. > > > > How? > > By writing (TID,TIMESTAMP) tuples there and using that info to retrieve tuples > active at specified time by examinimg TIDs in "deleted" tuples. > As bot TID and TIMESTAMP should be monotonuously growing again binary search > can be used on retrieve and inserts are append-only (meaning fast) But since we are already storing all the time travel stuff already in the storage pages do we need this to reinstate time travel? Also if you reinstate time travel this way it will only work for people using this odbms feature. Wouldn't it be better to reinstate the old timetravel so it works for everyone?
Chris Bitmead wrote: > > Hannu Krosing wrote: > > > > > The same kind of file could be used for re_introducing time-travel in an > > > > efficient way. > > > > > > How? > > > > By writing (TID,TIMESTAMP) tuples there and using that info to retrieve tuples > > active at specified time by examinimg TIDs in "deleted" tuples. > > As bot TID and TIMESTAMP should be monotonuously growing again binary search > > can be used on retrieve and inserts are append-only (meaning fast) > > But since we are already storing all the time travel stuff already in > the storage pages do we need this to reinstate time travel? If we want to query for old tuples by wallclock time (which is not stored) and not only by transaction-id (which are) we need something to go from wc-time to tid and back. > Also if you reinstate time travel this way it will only work for people using > this odbms feature. > Wouldn't it be better to reinstate the old timetravel so it works for > everyone? It would be probably better to do it under another set, probably at dbinit (or createdb) time. so maybe set TIME_TRAVEL to 'on'; CREATE DATABASE TIME_TRAVELLERS_DB; would create a database that can use the time-travel features. It could of course be included in the db create statement: CREATE DATABASE TIME_TRAVELLERS_DB WITH TIME_TRAVEL='ON'; BTW, have you considered making OO a per-database feature or at least the default being settable when creating the database. ----------------------- Hannu