Thread: Proposed Changes to PostgreSQL
Hi, I've been spending a lot of time lately with gdb and tracing the back-end seeing if I can understand it enough to make some changes. I'm starting to actually understand a lot of stuff, so in order to have some possibility of having my changes accepted, I want to discuss them here first. Based on that, I'm going to hopefully make an attempt at implementation. I have a patch for one of these changes already if I get the go ahead. THESE CHANGES DON'T AFFECT YOU IF YOU DON'T USE INHERITANCE. Speak now about these changes or please, forever hold your peace. Of course you can comment later if I screw up implementation. The proposed changes are.... 1) An imaginary field in every tuple that tells you the class it came from. This is useful when you select from table* and want to know which relation the object actually came from. It wouldn't be stored on disk, and like oid it wouldn't be displayed when you do SELECT *. The field would be called classname. So you could have... SELECT p.classname, p.name FROM person p; person | Fred student | Bill employee | Jim person | Chris If you want to know the exact behaviour it is as if every table in the database had done to it... ALTER TABLE foo ADD COLUMN classname TEXT; UPDATE foo SET classname='foo'; Of course this is not how it would be implemented. It is just reference for how it will appear to work. BTW, this idea was also in the original berkeley design notes. 2) Changing the sense of the default for getting inherited tuples. Currently you only get inherited tuples if you specify "tablename*". This would be changed so that you get all sub-class tuples too by default unless you specify "ONLY tablename". There are several rationale for this. Firstly this is what Illustra/Informix have implemented. Secondly, I believe it is more logical from an OO perspective as well as giving a more useful default. If a politician IS a person and I say SELECT * from person, then logically I should see all the politicians because they are people too (so they claim :). Thirdly, there are a whole range of SQL statements that should probably be disallowed without including sub-classes. e.g. an ALTER TABLE ADD COLUMN that does not include sub-classes is almost certainly undesirable. It seems ashame to have to resort to non-standard SQL with the "*" syntax in this case when it is really your only choice. Basicly, wanting ONLY a classname is a far more unusual choice, and leaving off the "*" is a common error. Fourthly, it seems out of character for the SQL language to have this single character operator. The SQL style is to use wordy descriptions of the operators meaning. "ONLY" fits well here because it describes its own meaning perfectly whereas to the unitiated, "*" is harder to guess at. While this change is an incompatibility I hope for those few people using inheritance they can accept the need to move forward without over-burden of backwards compatibility. 3) The ability to return different types of rows from a SELECT. This is to allow implementation of ODBMS functionality where a query could be required to instantiate objects of differing types with differing attributes. I would propose that that anytime you do a SELECT * from a base table that you would get back the full rows from those sub tables. Since the current PQ interface which doesn't support this notion would remain unchanged this wouldn't affect current users. It's probably also desirable to have a syntax for getting just the columns of the base table when this is desired. Say perhaps SELECT % from table. This would be a performance hack for users of libpq and a functionality difference for users of psql. The reason I think the "*" syntax should take on the new functionality is because it would be more consistent with what the OQL (object query language) standard specifies, and also because it seems the more useful default. Also there is no compatibility reason not to do it. In addition it would be legal to specify columns that only exist in sub-classes. For example, if we had CREATE TABLE person (name TEXT); CREATE TABLE student (studentid TEXT, faculty TEXT) INHERITS (person); it would be legal to say... > SELECT * FROM person; NAME ---- Fred Bill NAME | STUDENTID | FACULTY -------------------------- Jim | 23455 | Science Chris| 45666 | Arts > SELECT *, studentid FROM person; NAME ---- Fred Bill NAME | STUDENTID ---------------- Jim | 23455 Chris| 45666 > SELECT *, studentid FROM ONLY person; ERROR: person does not contain studentid. > SELECT % FROM person; NAME ---- Fred Bill Jim Chris As you can see, it is desirable that psql be modified to be able to print these differing tuple types. Presumably new column headings will be printed when a tuple is differing to the previous one. Likely it will be often desirable to do a SELECT * FROM person p ORDER BY p.classname; in order to have all the tuples of a particular type grouped together. In addition some extenions will be done to the PQ interface to support these differing return types. The current PQ interface will be left unchanged and backwards compatible for retrieving rows of a single type. Also there should be an settable option that specifies that "*" should also return the normally ignored columns of oid and classname. This is so that OO programs that embed SQL into them also get back the oid and classname which are required for the behind the scenes implementation of an ODMG client. Something like... SET SHOW_OID TRUE; SHOW_CLASSNAME TRUE; SELECT * FROM person; OID CLASSNAME NAME ------------------- 2344 person Fred 3445 person Bill OID CLASSNAME NAME | STUDENTID | FACULTY ----------------------------------------- 2355 student Jim | 23455 | Science 5655 student Chris| 45666 | Arts
> THESE CHANGES DON'T AFFECT YOU IF YOU DON'T USE INHERITANCE. > > Speak now about these changes or please, forever hold your peace. Of > course you can comment later if I screw up implementation. > > The proposed changes are.... > > 1) An imaginary field in every tuple that tells you the class it came > from. > This is useful when you select from table* and want to know which > relation the object actually came from. It wouldn't be stored on disk, > and like oid it wouldn't be displayed when you do SELECT *. The field > would be called classname. So you could have... > SELECT p.classname, p.name FROM person p; > person | Fred > student | Bill > employee | Jim > person | Chris So the field is created on the fly to show what table it came from. Seems like a good idea, though implementing another usually-invisible column will be tough. However, because it is not really a column like the oid is a column, it should be ok. Of course, internally it is relid. > 2) Changing the sense of the default for getting inherited tuples. > Currently you only get inherited tuples if you specify "tablename*". > This would be changed so that you get all sub-class tuples too by > default unless you specify "ONLY tablename". There are several > rationale for this. Firstly this is what Illustra/Informix have > implemented. Secondly, I believe it is more logical from an OO > perspective as well as giving a more useful default. If a politician > IS a person and I say SELECT * from person, then logically I should > see all the politicians because they are people too (so they claim > :). Thirdly, there are a whole range of SQL statements that should > probably be disallowed without including sub-classes. e.g. an ALTER > TABLE ADD COLUMN that does not include sub-classes is almost certainly > undesirable. It seems ashame to have to resort to non-standard SQL > with the "*" syntax in this case when it is really your only > choice. Basicly, wanting ONLY a classname is a far more unusual > choice, and leaving off the "*" is a common error. Fourthly, it seems > out of character for the SQL language to have this single character > operator. The SQL style is to use wordy descriptions of the operators > meaning. "ONLY" fits well here because it describes its own meaning > perfectly whereas to the unitiated, "*" is harder to guess at. While > this change is an incompatibility I hope for those few people using > inheritance they can accept the need to move forward without > over-burden of backwards compatibility. Sounds fine to me. Just realize you are taking on a long-overdue but big job here. > > 3) The ability to return different types of rows from a SELECT. This > is to allow implementation of ODBMS functionality where a query could > be required to instantiate objects of differing types with differing > attributes. This bothers me. We return relational data, showing the same number of columns and types for every query. I don't think we want to change that, even for OO. How are you going to return that info the the client side? -- Bruce Momjian | http://www.op.net/~candle pgman@candle.pha.pa.us | (610) 853-3000 + If your life is a hard drive, | 830 Blythe Avenue + Christ can be your backup. | Drexel Hill, Pennsylvania 19026
Bruce Momjian wrote: > So the field is created on the fly to show what table it came from. > Seems like a good idea, though implementing another usually-invisible > column will be tough. What problems do you forsee? > However, because it is not really a column like > the oid is a column, it should be ok. Of course, internally it is > relid. > > > 2) Changing the sense of the default for getting inherited tuples. > > Currently you only get inherited tuples if you specify "tablename*". > > Sounds fine to me. Just realize you are taking on a long-overdue but > big job here. I already have a patch for this one. The change is a few pretty simple changes to gram.y. > > 3) The ability to return different types of rows from a SELECT. This > > is to allow implementation of ODBMS functionality where a query could > > be required to instantiate objects of differing types with differing > > attributes. > > This bothers me. We return relational data, showing the same number of > columns and types for every query. I don't think we want to change > that, even for OO. What aspects bother you? This is the fundamental important thing about object databases. It's also something that I'm always wanting to do when generating web pages. I have web links like http://foo.com/page?id=123. I want to retrieve the webpage object (which is an inheritance hierarchy) of id=123 which may represent a web page of different types. Then process appropriately for different objects. i.e. typical OO polymorphism. > How are you going to return that info the the client side? Well the backend <-> frontend protocol that used to be able to return tuples of different types would be put back in. Also the berkerly postgres docs had other scenarios where different tuples could be returned. One is you could have a field of type postquel called say EMP.hobbies which had a value of "retrieve HOBBIES.all where...", and then "retrieve EMP.hobbies would return tuples of different types of hobbies.
[ I trimmed the cc list a bit ] Chris Bitmead <chrisb@nimrod.itg.telstra.com.au> writes: > The proposed changes are.... > 1) An imaginary field in every tuple that tells you the class it came > from. > This is useful when you select from table* and want to know which > relation the object actually came from. It wouldn't be stored on disk, > and like oid it wouldn't be displayed when you do SELECT *. The field > would be called classname. So you could have... > SELECT p.classname, p.name FROM person p; This is a good idea, but it seems to me that it'd fit into the system traditions better if the pseudo-field gave the OID of the source relation. If you wanted the actual name of the relation, you'd need to join against pg_class. You could argue it either way I suppose; a name would be more convenient for simple interactive uses, but an OID would probably be more convenient and efficient for applications using this feature. I tend to lean towards the programmatic convenience side --- far more SQL queries are issued by programs than humans. > 2) Changing the sense of the default for getting inherited tuples. > Currently you only get inherited tuples if you specify "tablename*". > This would be changed so that you get all sub-class tuples too by > default unless you specify "ONLY tablename". There are several > rationale for this. Firstly this is what Illustra/Informix have > implemented. Secondly, I believe it is more logical from an OO > perspective as well as giving a more useful default. Well, mumble. That would be the cleanest choice if we were designing in a green field, but we aren't. You're talking about breaking every single extant Postgres application that uses inheritance, and possibly some that don't use it except as a shorthand for making their schemas more compact. (That's not a hypothetical case; I have DBs that use schema inheritance but never do SELECT FROM table*.) I think that's a mighty high price to pay for achieving a little more logical cleanliness. There is also a nontrivial performance penalty that would be paid for reversing this default, because then every ordinary SQL query would suffer the overhead of looking to see whether there are child tables for each table named in the query. That *really* doesn't strike me as a good idea. If Illustra were popular enough to have defined an industry standard about inheritance, I might think we should follow their lead --- but who else has followed their lead? In short, I vote for leaving well enough alone. It's not so badly wrong as to be intolerable, and the pain of changing looks high. > Thirdly, there are a whole range of SQL statements that should > probably be disallowed without including sub-classes. e.g. an ALTER > TABLE ADD COLUMN that does not include sub-classes is almost certainly > undesirable. This is true. We could either silently add *, or reject it ("hey bozo, have you forgotten that this table has subclasses?"). The reject option would be more conservative, just in case the admin *has* forgotten that the table has subclasses --- as a crude analogy, Unix "rm" doesn't assume "-r" by default ;-). I agree that allowing an ALTER to make a parent table inconsistent with its children is very bad news and should be prevented. (Dropping an inherited column is another example of something we shouldn't allow.) > I would propose that that anytime you do a SELECT * from a base table > that you would get back the full rows from those sub tables. Frankly: ugh. This doesn't square with *my* ideas of object inheritance. When you are dealing with something that ISA person, you do not really want to hear about any additional properties it may have; you are dealing with it as a person and not at any finer grain of detail. That goes double for dealing with whole collections of persons. If you want to examine a particular member of the collection and dynamically downcast it to some more-specific type, the proposed classname/classoid feature will give you the ability to do that; but I think it's a mistake to assume that this should happen by default. > Since the current PQ interface which doesn't support this notion would > remain unchanged this wouldn't affect current users. How would you implement this without actually breaking the current PQ interface? > It's probably also desirable to have a syntax for getting just the > columns of the base table when this is desired. Say perhaps SELECT % > from table. This would be a performance hack for users of libpq and a > functionality difference for users of psql. Again, I think you've got the default backwards. I remind you also of something we've been beating on Peter about: psql is an application scripting tool, so you don't get to redefine its behavior at whim, anymore than you can change libpq's API at whim. > In addition it would be legal to specify columns that only exist in > sub-classes. For example, > it would be legal to say... >> SELECT *, studentid FROM person; Yipes. I really, really, really DON'T like that one. At the level of table person, studentid is unequivocally an invalid column name. If you do this, you couldn't even guarantee that different subtables that had studentid columns would have compatible datatypes for those columns. > SELECT * FROM person; > OID CLASSNAME NAME > ------------------- > 2344 person Fred > 3445 person Bill > OID CLASSNAME NAME | STUDENTID | FACULTY > ----------------------------------------- > 2355 student Jim | 23455 | Science > 5655 student Chris| 45666 | Arts This is not too hard for a person to make sense of, but I think that it'd be mighty unwieldy for a program to deal with. What would the libpq-like interface look like, and what would a typical client routine look like? regards, tom lane
> Bruce Momjian wrote: > > > So the field is created on the fly to show what table it came from. > > Seems like a good idea, though implementing another usually-invisible > > column will be tough. > > What problems do you forsee? Well, it is usually pretty strange to carry around a column that doesn't exist through all the code and finally contruct it at the end. I would suspect something in the rewrite system could do that pretty easily, though. That is the direction I would go with that. > > > However, because it is not really a column like > > the oid is a column, it should be ok. Of course, internally it is > > relid. > > > > > 2) Changing the sense of the default for getting inherited tuples. > > > Currently you only get inherited tuples if you specify "tablename*". > > > > Sounds fine to me. Just realize you are taking on a long-overdue but > > big job here. > > I already have a patch for this one. The change is a few pretty simple > changes > to gram.y. OK, you will have to canvas the general list to make sure this does not break things for people, though our inheritance system needs an overhaul badly. > > > > 3) The ability to return different types of rows from a SELECT. This > > > is to allow implementation of ODBMS functionality where a query could > > > be required to instantiate objects of differing types with differing > > > attributes. > > > > This bothers me. We return relational data, showing the same number of > > columns and types for every query. I don't think we want to change > > that, even for OO. > > What aspects bother you? This is the fundamental important thing about > object databases. I fear it is totally against the way our API works. How does someone see how many columns in the returned row? > > How are you going to return that info the the client side? > > Well the backend <-> frontend protocol that used to be able to return > tuples of different types would be put back in. > > Also the berkerly postgres docs had other scenarios where different > tuples > could be returned. One is you could have a field of type postquel called > say > EMP.hobbies which had a value of "retrieve HOBBIES.all where...", and > then "retrieve > EMP.hobbies would return tuples of different types of hobbies. Yikes. Strange. Can we just return nulls for the empty fields? How many new API calls are required? -- Bruce Momjian | http://www.op.net/~candle pgman@candle.pha.pa.us | (610) 853-3000 + If your life is a hard drive, | 830 Blythe Avenue + Christ can be your backup. | Drexel Hill, Pennsylvania 19026
Bruce Momjian <pgman@candle.pha.pa.us> writes: >>>> So the field is created on the fly to show what table it came from. >>>> Seems like a good idea, though implementing another usually-invisible >>>> column will be tough. >> >> What problems do you forsee? > Well, it is usually pretty strange to carry around a column that doesn't > exist through all the code and finally contruct it at the end. I would > suspect something in the rewrite system could do that pretty easily, > though. That is the direction I would go with that. Yeah. In fact, since the field is not required except on specific user request (explicit SELECT, or if you like Chris' SET SHOW_CLASSNAME idea, that'd still get translated into a SELECT target item at some pretty early stage), I don't see any need for it to get added to the HeapTupleHeader fields. That makes the implementation a *lot* cleaner because you wouldn't need in-memory HeapTupleHeader to be different from on-disk headers. I'm visualizing this as a parameterless function (or maybe a new primitive expression node type) that gets evaluated during ExecProject's construction of the output tuple for a a bottom-level seqscan or indexscan plan node. The only trick is to persuade the planner to push it down to the bottom level; normally anything that isn't a Var gets evaluated at the top of the plan tree. >>>> This bothers me. We return relational data, showing the same number of >>>> columns and types for every query. I don't think we want to change >>>> that, even for OO. My thought also. If we had a *real* object orientation, then a returned column would have an abstract data type that might correspond to an object supertype. Of course that just pushes the problem down a level: how does the application know what methods the returned object has? How can it even invoke those methods --- whatever code might exist for them would live on the server, presumably, not get shipped around in query results. regards, tom lane
On Wed, 2 Feb 2000, Tom Lane wrote: > > 2) Changing the sense of the default for getting inherited tuples. > > Currently you only get inherited tuples if you specify "tablename*". > > This would be changed so that you get all sub-class tuples too by > > default unless you specify "ONLY tablename". There are several > > rationale for this. Firstly this is what Illustra/Informix have > > implemented. Secondly, I believe it is more logical from an OO > > perspective as well as giving a more useful default. > > Well, mumble. That would be the cleanest choice if we were designing > in a green field, but we aren't. You're talking about breaking every > single extant Postgres application that uses inheritance, and possibly > some that don't use it except as a shorthand for making their schemas > more compact. (That's not a hypothetical case; I have DBs that use > schema inheritance but never do SELECT FROM table*.) I think that's > a mighty high price to pay for achieving a little more logical > cleanliness. > > There is also a nontrivial performance penalty that would be paid > for reversing this default, because then every ordinary SQL query > would suffer the overhead of looking to see whether there are > child tables for each table named in the query. That *really* > doesn't strike me as a good idea. > > If Illustra were popular enough to have defined an industry standard > about inheritance, I might think we should follow their lead --- but > who else has followed their lead? > > In short, I vote for leaving well enough alone. It's not so badly > wrong as to be intolerable, and the pain of changing looks high. Could this be implemented/patched in using #ifdef's, so that you could configure using --old-style-inheritance so that those that require it still have it, giving applications a chance to catch up? Marc G. Fournier ICQ#7615664 IRC Nick: Scrappy Systems Administrator @ hub.org primary: scrappy@hub.org secondary: scrappy@{freebsd|postgresql}.org
Tom Lane wrote: > > 1) An imaginary field in every tuple that tells you the class it came > This is a good idea, but it seems to me that it'd fit into the system > traditions better if the pseudo-field gave the OID of the source > relation. This was my initial thought too, but then it occured to me that SQL doesn't normally deal in oids. For example you don't do a DROP TABLE oid; OTOH, oids are probably programmatically useful for things like ODBMSs. What do you think about having both? I know you can go from one to the other by joining with pg_class, but that's too inconvenient, and I can't make up my mind which is the better "system tradition" either. I'm not overly fussed on this point though. > Well, mumble. That would be the cleanest choice if we were designing > in a green field, but we aren't. You're talking about breaking every > single extant Postgres application that uses inheritance, and possibly > some that don't use it except as a shorthand for making their schemas > more compact. (That's not a hypothetical case; I have DBs that use > schema inheritance but never do SELECT FROM table*.) I think that's > a mighty high price to pay for achieving a little more logical > cleanliness. Ok, well compatibility is always a contentious thing. But in your case you are mis-using the inheritance feature. The question is, are you willing to do the (simple) changes to your code to cater for the common good? I'm wanting to make postgresql into a REAL odbms, and this is a stumbling point that will eventually affect 100x as many users as it does now (I hope :). We can also leave the old gram.y for people who want to retain compatibility for longer. > There is also a nontrivial performance penalty that would be paid > for reversing this default, because then every ordinary SQL query > would suffer the overhead of looking to see whether there are > child tables for each table named in the query. That *really* > doesn't strike me as a good idea. I can't comment on what the current performance penalty would be, but I'm sure this can be optimised to be a completely trivial overhead. > If Illustra were popular enough to have defined an industry standard > about inheritance, I might think we should follow their lead --- but > who else has followed their lead? Well Informix of course, which is not small potatoes. > > I would propose that that anytime you do a SELECT * from a base table > > that you would get back the full rows from those sub tables. > > Frankly: ugh. This doesn't square with *my* ideas of object > inheritance. When you are dealing with something that ISA person, > you do not really want to hear about any additional properties it may > have; you are dealing with it as a person and not at any finer grain of > detail. That goes double for dealing with whole collections of persons. > If you want to examine a particular member of the collection and > dynamically downcast it to some more-specific type, the proposed > classname/classoid feature will give you the ability to do that; > but I think it's a mistake to assume that this should happen by default. This would be the case if the database were the whole world. But it is not, it is a repository for applications written in other languages. How can you "dynamically downcast to a more specific type" if the database hasn't returned the columns of the more specific type? How can I instantiate a C++ object of type "Student" if the database has only returned to me the data members of type "Person"? > > Since the current PQ interface which doesn't support this notion would > > remain unchanged this wouldn't affect current users. > > How would you implement this without actually breaking the current > PQ interface? By adding new functions for use when you need to access the extra columns. > > It's probably also desirable to have a syntax for getting just the > > columns of the base table when this is desired. Say perhaps SELECT % > > from table. This would be a performance hack for users of libpq and a > > functionality difference for users of psql. > > Again, I think you've got the default backwards. I remind you also > of something we've been beating on Peter about: psql is an application > scripting tool, so you don't get to redefine its behavior at whim, > anymore than you can change libpq's API at whim. I am less adamant about the default in this scenario than in the "ONLY table" scenario. I'm a bit concerned about the fact that this would break compatibility with OQL standards, but I can live with this. > > In addition it would be legal to specify columns that only exist in > > sub-classes. For example, > > it would be legal to say... > >> SELECT *, studentid FROM person; > > Yipes. I really, really, really DON'T like that one. At the level > of table person, studentid is unequivocally an invalid column name. The reason for this is you need some kind of compromise between seeing every single column (which overwhelms you in psql) and not seeing any sub-type columns at all. > If you do this, you couldn't even guarantee that different subtables > that had studentid columns would have compatible datatypes for those > columns. I think you can because postgres won't let you create sub-types with column of the same name with incompatible data types. In fact it is this very fact about postgres that makes this feature feasible. > > SELECT * FROM person; > > > OID CLASSNAME NAME > > ------------------- > > 2344 person Fred > > 3445 person Bill > > > OID CLASSNAME NAME | STUDENTID | FACULTY > > ----------------------------------------- > > 2355 student Jim | 23455 | Science > > 5655 student Chris| 45666 | Arts > > This is not too hard for a person to make sense of, but I think that > it'd be mighty unwieldy for a program to deal with. What would the > libpq-like interface look like, and what would a typical client > routine look like? The PQ interface would have a new function something like PQnfieldsv(PQresult,tuplenum), so it returns a different number for each tuple. But the real benefit is not writing "unwieldy" code in C, but ODBMS style code where you can go... List<Shape> l = query("SELECT * FROM shape"); Shape *s; for (l.begin(); s = l.get(); l.next()) s.display(); Because if the dbms returns ALL the columns, a C++ runtime system can properly instantiate subtypes and use polymorphism.
Bruce Momjian wrote: > > I already have a patch for this one. The change is a few pretty simple > > changes > > to gram.y. > > OK, you will have to canvas the general list to make sure this does not > break things for people, though our inheritance system needs an overhaul > badly. This is already CCed to the general list. > I fear it is totally against the way our API works. How does someone > see how many columns in the returned row? A new API PQnfieldsv(PQresult, tupnum) or some such. > Yikes. Strange. Strange for C code perhaps. Very useful for constructing real objects in OO application code framework. > Can we just return nulls for the empty fields? Well, I think we should probably distinguish between a field that is null, and a field that simply doesn't exist. > How many new API calls are required? Perhaps just the one. (above).
> Yeah. In fact, since the field is not required except on specific > user request (explicit SELECT, or if you like Chris' SET SHOW_CLASSNAME > idea, that'd still get translated into a SELECT target item at some > pretty early stage), I don't see any need for it to get added to the > HeapTupleHeader fields. That makes the implementation a *lot* cleaner > because you wouldn't need in-memory HeapTupleHeader to be different from > on-disk headers. I'm visualizing this as a parameterless function (or > maybe a new primitive expression node type) that gets evaluated during > ExecProject's construction of the output tuple for a a bottom-level > seqscan or indexscan plan node. The only trick is to persuade the > planner to push it down to the bottom level; normally anything that > isn't a Var gets evaluated at the top of the plan tree. Yes, I agree this is a good way to do it. > >>>> This bothers me. We return relational data, showing the same number of > >>>> columns and types for every query. I don't think we want to change > >>>> that, even for OO. > > My thought also. If we had a *real* object orientation, then a returned > column would have an abstract data type that might correspond to an > object supertype. Of course that just pushes the problem down a level: > how does the application know what methods the returned object has? > How can it even invoke those methods --- whatever code might exist > for them would live on the server, presumably, not get shipped around > in query results. Agreed. -- Bruce Momjian | http://www.op.net/~candle pgman@candle.pha.pa.us | (610) 853-3000+ If your life is a hard drive, | 830 Blythe Avenue + Christ can be your backup. | Drexel Hill, Pennsylvania19026
Tom Lane wrote: > >>>> This bothers me. We return relational data, showing the same number of > >>>> columns and types for every query. I don't think we want to change > >>>> that, even for OO. > > My thought also. If we had a *real* object orientation, then a returned > column would have an abstract data type that might correspond to an > object supertype. Of course that just pushes the problem down a level: > how does the application know what methods the returned object has? > How can it even invoke those methods --- whatever code might exist > for them would live on the server, presumably, not get shipped around > in query results. In (most) ODBMSes, the code for a class does NOT live in the database server. (How would you store a C++ binary in a database?). What happens is when a query returns an object, some magic behind the scenes checks the type of the returned object (thus the need for the "classname" column or similar.) The magic behind the scenes then instantiates a C++ object of the correct class and populates all the data members from the query results. The application code is then free to make polymorphic calls on the object because ALL the fields are populated, not just those of the base class.
The Hermit Hacker wrote: > Could this be implemented/patched in using #ifdef's, so that you could > configure using --old-style-inheritance so that those that require it > still have it, giving applications a chance to catch up? Sounds like an excellent idea, although I'm not sure how to ifdef a .y bison file.
I can live with this. Thanks. > Bruce Momjian wrote: > > > > I already have a patch for this one. The change is a few pretty simple > > > changes > > > to gram.y. > > > > OK, you will have to canvas the general list to make sure this does not > > break things for people, though our inheritance system needs an overhaul > > badly. > > This is already CCed to the general list. > > > I fear it is totally against the way our API works. How does someone > > see how many columns in the returned row? > > A new API PQnfieldsv(PQresult, tupnum) or some such. > > > Yikes. Strange. > > Strange for C code perhaps. Very useful for constructing real objects in > OO application code framework. > > > Can we just return nulls for the empty fields? > > Well, I think we should probably distinguish between a field that is > null, > and a field that simply doesn't exist. > > > How many new API calls are required? > > Perhaps just the one. (above). > -- Bruce Momjian | http://www.op.net/~candle pgman@candle.pha.pa.us | (610) 853-3000 + If your life is a hard drive, | 830 Blythe Avenue + Christ can be your backup. | Drexel Hill, Pennsylvania 19026
> Tom Lane wrote: > > > > 1) An imaginary field in every tuple that tells you the class it came > > This is a good idea, but it seems to me that it'd fit into the system > > traditions better if the pseudo-field gave the OID of the source > > relation. > > This was my initial thought too, but then it occured to me that SQL > doesn't normally deal in oids. For example you don't do a DROP TABLE > oid; > > OTOH, oids are probably programmatically useful for things like ODBMSs. > > What do you think about having both? I know you can go from one to the > other by joining with pg_class, but that's too inconvenient, and I can't > make up my mind which is the better "system tradition" either. Sure, let them have both. Why not, or you could force them to join to pg_class for the name. That would work too. > Ok, well compatibility is always a contentious thing. But in your case > you are mis-using the inheritance feature. > > The question is, are you willing to do the (simple) changes to your > code to cater for the common good? I'm wanting to make postgresql into a > REAL odbms, and this is a stumbling point that will eventually affect > 100x > as many users as it does now (I hope :). > > We can also leave the old gram.y for people who want to retain > compatibility > for longer. I would canvas the list to find out how many people object, and if there are few, you may be able to get away with something in config.h.in that they can change if they want the old behavour. > > > Since the current PQ interface which doesn't support this notion would > > > remain unchanged this wouldn't affect current users. > > > > How would you implement this without actually breaking the current > > PQ interface? > > By adding new functions for use when you need to access the extra > columns. Whatever it is, the API has to be lean and clean. I saw your PQnfieldsv, and that looks fine to me. -- Bruce Momjian | http://www.op.net/~candle pgman@candle.pha.pa.us | (610) 853-3000+ If your life is a hard drive, | 830 Blythe Avenue + Christ can be your backup. | Drexel Hill, Pennsylvania19026
Tom Lane wrote: > Again, I think you've got the default backwards. I remind you also > of something we've been beating on Peter about: psql is an application > scripting tool, so you don't get to redefine its behavior at whim, > anymore than you can change libpq's API at whim. If this is the only objection, we could make the old behaviour available by a SET command, as well as a command-line switch, as well as a ./configure option. I hope we can get the best design here possible without over-emphasis on compatibility.
> Tom Lane wrote: > > > Again, I think you've got the default backwards. I remind you also > > of something we've been beating on Peter about: psql is an application > > scripting tool, so you don't get to redefine its behavior at whim, > > anymore than you can change libpq's API at whim. > > If this is the only objection, we could make the old behaviour available > by a SET command, as well as a command-line switch, as well as a > ./configure option. > > I hope we can get the best design here possible without over-emphasis > on compatibility. SET command is probably the best. -- Bruce Momjian | http://www.op.net/~candle pgman@candle.pha.pa.us | (610) 853-3000+ If your life is a hard drive, | 830 Blythe Avenue + Christ can be your backup. | Drexel Hill, Pennsylvania19026
At 09:55 PM 2/2/00 -0500, Tom Lane wrote: >There is also a nontrivial performance penalty that would be paid >for reversing this default, because then every ordinary SQL query >would suffer the overhead of looking to see whether there are >child tables for each table named in the query. That *really* >doesn't strike me as a good idea. Thank you for pointing this out, because my first reaction to the proposal was "what's the overhead for SQL users"? Given the stated goals of becoming a fast, efficient, reliable SQL engine, this has to be a crucial consideration. On the other hand, as someone who once made his living off his designed and implemented optimizing multi-language, multi-platform compiler technology...is it entirely out of the question to consider more greatly abstracting the language (gram.y/analyze.c) and backend (optimizer and executor) interfaces so more than one front-end could exist (even if only in experimental and research environments)? Along with front-end specific versions of libpq? These front-ends wouldn't necessarily need to be supported by the mainstream PG development group, except to support a defined and sufficiently abstract interface to the optimization/planning and executing guts of the system so that folks could mess around to their heart's content. And bear the burden of doing so if they pick up users :) Just a thought... >> I would propose that that anytime you do a SELECT * from a base table >> that you would get back the full rows from those sub tables. > >Frankly: ugh. This doesn't square with *my* ideas of object >inheritance. Nor mine, in fact the stuff I've seen about primitive OO in databases make me thing the folks just don't get it. Not to mention that I'm not convinced that "getting it" is worth it. OO fits some paradigms, not others, when programming in the large. And most database stuff is really programming in the small (the query parts, the data is often huge, of course). The notion of asking a query, as in (say) psql is more related to the notion of typing a few lines at BASIC than the notion of writing a few million lines of integrated code. In database design, even more so than in conventional programming, it is the data model that reigns supreme and the actual size tends to be manageable, though the models themselves can be very complex. I offer this as a reason why commercial DB users are more concerned with things like performance, scalability, and the like than with reworking of the RDBMS paradigm. Complaints about queries seem to place heavy emphasis on "why they are slow", and the OO paradigm doesn't help here. I'm not certain that psuedo-OO features help. One reason I raise the issue of possible multiple front-ends (or making it easy for folks to make there own by making the parser->optimizer/backend interface more general) is that this whole area would seem to be one that begs for RESEARCH and experimentalism. The reality, AFAIK, is that in the crucible of commercial use, real OO databases and thinking simply haven't penetrated. Nor is Postgres written in C++ :) (GOOD decision to abandon that thought, IMO, though at the moment I'm working on C++ tools for my current client). >Again, I think you've got the default backwards. I remind you also >of something we've been beating on Peter about: psql is an application >scripting tool, so you don't get to redefine its behavior at whim, >anymore than you can change libpq's API at whim. Yeah, this is VERY important. - Don Baccus, Portland OR <dhogaza@pacifier.com> Nature photos, on-line guides, Pacific Northwest Rare Bird Alert Serviceand other goodies at http://donb.photo.net.
Don Baccus wrote: > Given the stated goals of becoming a fast, efficient, reliable > SQL engine, this has to be a crucial consideration. I'm sure this can be made fast. > On the other hand, as someone who once made his living off his > designed and implemented optimizing multi-language, multi-platform > compiler technology...is it entirely out of the question to > consider more greatly abstracting the language (gram.y/analyze.c) > and backend (optimizer and executor) interfaces so more than one > front-end could exist (even if only in experimental and research > environments)? Along with front-end specific versions of libpq? A good thought, but we still need one good front end that supports all the features. > >> I would propose that that anytime you do a SELECT * from a base table > >> that you would get back the full rows from those sub tables. > > > >Frankly: ugh. This doesn't square with *my* ideas of object > >inheritance. > > Nor mine, in fact the stuff I've seen about primitive OO in databases > make me thing the folks just don't get it. > > Not to mention that I'm not convinced that "getting it" is worth it. OO > fits some paradigms, not others, when programming in the large. Well, the features I'm talking about don't affect you unless you want OO. > And > most database stuff is really programming in the small (the query parts, > the data is often huge, of course). The notion of asking a query, as > in (say) psql is more related to the notion of typing a few lines at > BASIC than the notion of writing a few million lines of integrated > code. In database design, even more so than in conventional programming, > it is the data model that reigns supreme and the actual size tends to > be manageable, though the models themselves can be very complex. And as those models become so complex it is crucial that the data-model that "reigns supreme" is properly integrated with the programming language. For example, in an IBM Java project I'm working on there is 15000 lines of code that converts about 10 or so SQL tables into Java objects. Insane stuff. > I offer this as a reason why commercial DB users are more concerned > with things like performance, scalability, and the like than with > reworking of the RDBMS paradigm. Actually developers are very interested in supporting the ODBMS paradigm as you can see from the Sun proposed standard for RDBMS interface which is an exact copy of the ODMG ODBMS interface standard. In fact I think about 90% of "stuff" is best solved with an ODBMS style of interaction. The trouble is that most ODBMS don't do the other 10% very well (i.e. wierd and wonderful queries), which is where postgresql _could_ be the ultimate at solving both. > Complaints about queries seem to > place heavy emphasis on "why they are slow", and the OO paradigm > doesn't help here. Huh? The OO paradigm helps heaps here because you can model something with a far smaller number of tables. > I'm not certain that psuedo-OO features help. Don't know what a pseudo-OO feature is. > One reason I raise the issue of possible multiple front-ends (or making > it easy for folks to make there own by making the parser->optimizer/backend > interface more general) is that this whole area would seem to be one > that begs for RESEARCH and experimentalism. No research is required. I simply want to implement the ODMG STANDARD for ODBMS databases on PostgreSQL. There are no great design issues here, just a matter of nailing down the details so that everyone can live with them. > The reality, AFAIK, is that in the crucible of commercial use, real > OO databases and thinking simply haven't penetrated. Not really true. In certain areas ODBMSes are pervasive. For example many Telco companies use ODBMSes for the majority of their stuff. It's necessary to get the performance they need. Also of course CAD apps can only use an ODBMS. No offence, but you havn't actually used one have you?
Don Baccus wrote: > > At 09:55 PM 2/2/00 -0500, Tom Lane wrote: > > >There is also a nontrivial performance penalty that would be paid > >for reversing this default, because then every ordinary SQL query > >would suffer the overhead of looking to see whether there are > >child tables for each table named in the query. That *really* > >doesn't strike me as a good idea. > > Thank you for pointing this out, because my first reaction to > the proposal was "what's the overhead for SQL users"? I just did a performance check on this. I found that the overhead is one tenth of a millisecond on a Sun desktop workstation. Pretty trivial, and I'm sure it can be improved.
On Wed, Feb 02, 2000 at 09:57:48PM -0500, Bruce Momjian allegedly wrote: > > > > 3) The ability to return different types of rows from a SELECT. This > > > > is to allow implementation of ODBMS functionality where a query could > > > > be required to instantiate objects of differing types with differing > > > > attributes. > > > > > > This bothers me. We return relational data, showing the same number of > > > columns and types for every query. I don't think we want to change > > > that, even for OO. > > > > What aspects bother you? This is the fundamental important thing about > > object databases. > > I fear it is totally against the way our API works. How does someone > see how many columns in the returned row? This would probably break applications written in PHP and Perl (and possibly others) that have their queryresults returned to them in a numerically indexed array (index by offset). If this behaviour could be turned off, than it shouldn't be a problem. Mathijs
Mathijs Brands wrote: > > On Wed, Feb 02, 2000 at 09:57:48PM -0500, Bruce Momjian allegedly wrote: > > > > > 3) The ability to return different types of rows from a SELECT. This > > > > > is to allow implementation of ODBMS functionality where a query could > > > > > be required to instantiate objects of differing types with differing > > > > > attributes. > > > > > > > > This bothers me. We return relational data, showing the same number of > > > > columns and types for every query. I don't think we want to change > > > > that, even for OO. > > > > > > What aspects bother you? This is the fundamental important thing about > > > object databases. > > > > I fear it is totally against the way our API works. How does someone > > see how many columns in the returned row? > > This would probably break applications written in PHP and Perl (and > possibly others) that have their queryresults returned to them in a > numerically indexed array (index by offset). If this behaviour could > be turned off, than it shouldn't be a problem. It wouldn't affect them because the current APIs would continue to return the same base-level columns. You would only get access to the extra columns with a new API.
Chris Bitmead <chrisb@nimrod.itg.telstra.com.au> writes: > Tom Lane wrote: >>>> 1) An imaginary field in every tuple that tells you the class it came >> This is a good idea, but it seems to me that it'd fit into the system >> traditions better if the pseudo-field gave the OID of the source >> relation. > What do you think about having both? I know you can go from one to the > other by joining with pg_class, but that's too inconvenient, and I can't > make up my mind which is the better "system tradition" either. If we can implement it as I sketched before, there's no reason not to offer both, since either one would create zero overhead for any query not using the feature. I'll comment on the other issues later ... but I will say that I don't think it's acceptable to add *any* overhead to standard-SQL queries in order to support inheritance better. The vast majority of our users want SQL performance and don't give a damn about inheritance. We have to pay attention to that. regards, tom lane
Tom Lane wrote: > I'll comment on the other issues later ... but I will say that I don't > think it's acceptable to add *any* overhead to standard-SQL queries > in order to support inheritance better. The vast majority of our users > want SQL performance and don't give a damn about inheritance. We have > to pay attention to that. Well I see that pg_class has columns like "relhasindex". If we added a "relhassubclass", the overhead should be unmeasureable.
Don Baccus wrote: > > > The reality, AFAIK, is that in the crucible of commercial use, real > OO databases and thinking simply haven't penetrated. AFAIK Informix integrated most OO features from Illustra into their UDB and also latest versions of Oracle have moved a lot in that direction too. > Nor is Postgres written in C++ :) what does C++ have to do with OO ;) ---------------------- Hannu
Bruce Momjian wrote: > > > Bruce Momjian wrote: > > > > > So the field is created on the fly to show what table it came from. > > > Seems like a good idea, though implementing another usually-invisible > > > column will be tough. > > > > What problems do you forsee? > > Well, it is usually pretty strange to carry around a column that doesn't > exist through all the code and finally contruct it at the end. I would > suspect something in the rewrite system could do that pretty easily, > though. That is the direction I would go with that. > Oracle has a ROWNR (IIRC) pseudo-column that is added in th every end of query and is a convienient way to put numbers on report rows (among other things). ------------ Hannu
Bruce Momjian wrote: > > > Yikes. Strange. Can we just return nulls for the empty fields? I think more natural way would be to define a new type (NAF - NotAFiled), like we have NAN for floats (do we ?, at least IEEE has) ----------------- Hannu
Chris Bitmead wrote: > > Tom Lane wrote: > > > > 1) An imaginary field in every tuple that tells you the class it came > > This is a good idea, but it seems to me that it'd fit into the system > > traditions better if the pseudo-field gave the OID of the source > > relation. > > This was my initial thought too, but then it occured to me that SQL > doesn't normally deal in oids. For example you don't do a DROP TABLE > oid; DROP TABLE (SELECT relname FROM pg_class WHERE oid=the_oid); would be cool ;) > > > I would propose that that anytime you do a SELECT * from a base table > > > that you would get back the full rows from those sub tables. Maybe SELECT ** FROM BASE would be more flexible as it leaves the standard SQL with its "standard" meaning ? > > Frankly: ugh. This doesn't square with *my* ideas of object > > inheritance. When you are dealing with something that ISA person, > > you do not really want to hear about any additional properties it may > > have; you are dealing with it as a person and not at any finer grain of > > detail. That goes double for dealing with whole collections of persons. > > If you want to examine a particular member of the collection and > > dynamically downcast it to some more-specific type, the proposed > > classname/classoid feature will give you the ability to do that; > > but I think it's a mistake to assume that this should happen by default. > > This would be the case if the database were the whole world. But it is > not, > it is a repository for applications written in other languages. How can > you > "dynamically downcast to a more specific type" if the database hasn't > returned > the columns of the more specific type? How can I instantiate a C++ > object of > type "Student" if the database has only returned to me the data members > of type > "Person"? You could do as some DB's (IIRC Oracle) do with large objects - return the whole row if doing a select that has many rows. return just a handle when going over a cursor with FETCH 1 and then have calls to get the rest. We will have to change the API sometime not too distant anyway, the current api is unable to deal with anything that does not have a nice textual representation (like an image or sound) in spite of all the talks about easy extensibility - the extensibility is all in the backend, ther is no easy way to get new datatypes in/out. --------------- Hannu
Hannu Krosing wrote: > Maybe SELECT ** FROM BASE would be more flexible as it leaves the standard > SQL with its "standard" meaning ? That was my first thought and it's definitely a possibility. My argument against it is that SQL doesn't have a "standard meaning" in the case of inheritance, and ** is an incompatibility with OQL. I suspect we need both. Something like SET GET_INHERITED_COLUMNS true; etc. > We will have to change the API sometime not too distant anyway, the current > api is unable to deal with anything that does not have a nice textual > representation (like an image or sound) in spite of all the talks about > easy extensibility - the extensibility is all in the backend, ther is no > easy way to get new datatypes in/out. What about PQbinaryTuples() and friends?
> Don Baccus wrote: > > > > At 09:55 PM 2/2/00 -0500, Tom Lane wrote: > > > > >There is also a nontrivial performance penalty that would be paid > > >for reversing this default, because then every ordinary SQL query > > >would suffer the overhead of looking to see whether there are > > >child tables for each table named in the query. That *really* > > >doesn't strike me as a good idea. > > > > Thank you for pointing this out, because my first reaction to > > the proposal was "what's the overhead for SQL users"? > > > I just did a performance check on this. I found that the overhead > is one tenth of a millisecond on a Sun desktop workstation. Pretty > trivial, and I'm sure it can be improved. Good point. Has to be non-mearurable performance penalty because most people don't use it. Maybe you will need a system cache entry for this. -- Bruce Momjian | http://www.op.net/~candle pgman@candle.pha.pa.us | (610) 853-3000+ If your life is a hard drive, | 830 Blythe Avenue + Christ can be your backup. | Drexel Hill, Pennsylvania19026
> Tom Lane wrote: > > > I'll comment on the other issues later ... but I will say that I don't > > think it's acceptable to add *any* overhead to standard-SQL queries > > in order to support inheritance better. The vast majority of our users > > want SQL performance and don't give a damn about inheritance. We have > > to pay attention to that. > > Well I see that pg_class has columns like "relhasindex". If we added a > "relhassubclass", the overhead should be unmeasureable. Yes, but how do you keep that accurate? If I add indexes, then drop them, does relhasindex go to false. Could you do that for relhassubclass? -- Bruce Momjian | http://www.op.net/~candle pgman@candle.pha.pa.us | (610) 853-3000+ If your life is a hard drive, | 830 Blythe Avenue + Christ can be your backup. | Drexel Hill, Pennsylvania19026
Bruce Momjian wrote: > > Well I see that pg_class has columns like "relhasindex". If we added a > > "relhassubclass", the overhead should be unmeasureable. > > Yes, but how do you keep that accurate? If I add indexes, then drop > them, does relhasindex go to false. I don't know. Does it? >Could you do that for relhassubclass? If we made it relnumsubclasses and incremented/decremented on CREATE/DROP, it seems easy in theory. -- Chris Bitmead mailto:chris@bitmead.com http://www.techphoto.org - Photography News, Stuff that Matters
> Bruce Momjian wrote: > > > > Well I see that pg_class has columns like "relhasindex". If we added a > > > "relhassubclass", the overhead should be unmeasureable. > > > > Yes, but how do you keep that accurate? If I add indexes, then drop > > them, does relhasindex go to false. > > I don't know. Does it? Oops:test=> create table test(x int);CREATEtest=> create index i_test on test(x);CREATEtest=> select relhasindex from pg_classwhere relname = 'test'; relhasindex ------------- t(1 row) test=> drop index i_test;DROPtest=> select relhasindex from pg_class where relname = 'test'; relhasindex ------------- t(1row) Let me add that to the TODO list. > > >Could you do that for relhassubclass? > > If we made it relnumsubclasses and incremented/decremented on > CREATE/DROP, it seems easy in theory. Yes, that would work. Seems hasindex has problems. -- Bruce Momjian | http://www.op.net/~candle pgman@candle.pha.pa.us | (610) 853-3000+ If your life is a hard drive, | 830 Blythe Avenue + Christ can be your backup. | Drexel Hill, Pennsylvania19026
Bruce Momjian wrote: > Oops: > test=> drop index i_test; > DROP > test=> select relhasindex from pg_class where relname = 'test'; > relhasindex > ------------- > t > (1 row) > > Let me add that to the TODO list. Why not change that to a relnumindexes as well? Easier to maintain and more useful information. > > >Could you do that for relhassubclass? > > > > If we made it relnumsubclasses and incremented/decremented on > > CREATE/DROP, it seems easy in theory. > > Yes, that would work. Seems hasindex has problems.
> > Let me add that to the TODO list. > > Why not change that to a relnumindexes as well? Easier to maintain and > more useful information. Yes, we probably should do that, but I bet some interfaces us it. Comments? Actually, looks like only pg_dump uses it, so maybe we would be OK. Maybe 7.0 is a good time to fix this. > > > > >Could you do that for relhassubclass? > > > > > > If we made it relnumsubclasses and incremented/decremented on > > > CREATE/DROP, it seems easy in theory. > > > > Yes, that would work. Seems hasindex has problems. > > ************ > > -- Bruce Momjian | http://www.op.net/~candle pgman@candle.pha.pa.us | (610) 853-3000+ If your life is a hard drive, | 830 Blythe Avenue + Christ can be your backup. | Drexel Hill, Pennsylvania19026
> [ discussion on changing the default to getting subclasses ] I object. How about a set variable? SET GETSUBCLASSES = true With the '*' and ONLY being explicit overrides to the setting of the variable. The default would be 'false'. I would not object to a configuration switch that would change the default. -- Mark Hollomon mhh@nortelnetworks.com ESN 451-9008 (302)454-9008
At 12:00 PM 2/3/00 +0200, Hannu Krosing wrote: >Don Baccus wrote: >what does C++ have to do with OO ;) Nothing, but don't tell them :) Having worked on C++ compilers, don't get me started on THAT subject! - Don Baccus, Portland OR <dhogaza@pacifier.com> Nature photos, on-line guides, Pacific Northwest Rare Bird Alert Serviceand other goodies at http://donb.photo.net.
At 04:38 PM 2/3/00 +1100, Chris Bitmead wrote: >Don Baccus wrote: >> On the other hand, as someone who once made his living off his >> designed and implemented optimizing multi-language, multi-platform >> compiler technology...is it entirely out of the question to >> consider more greatly abstracting the language (gram.y/analyze.c) >> and backend (optimizer and executor) interfaces so more than one >> front-end could exist (even if only in experimental and research >> environments)? Along with front-end specific versions of libpq? > >A good thought, but we still need one good front end that supports >all the features. I wasn't think in terms of this being mutually exclusive with your desires. Merely raising up the notion that the possibility exists of creating a sandbox, so to speak, for people to play in, a tool for the exploration of such concepts. >> Nor mine, in fact the stuff I've seen about primitive OO in databases >> make me thing the folks just don't get it. >> >> Not to mention that I'm not convinced that "getting it" is worth it. OO >> fits some paradigms, not others, when programming in the large. > >Well, the features I'm talking about don't affect you unless you want >OO. No, and I wasn't arguing that you shouldn't move forward, either. I was just stating my personal opinion regarding the utility of simple OO-ish features, that's all. >> One reason I raise the issue of possible multiple front-ends (or making >> it easy for folks to make there own by making the parser->optimizer/backend >> interface more general) is that this whole area would seem to be one >> that begs for RESEARCH and experimentalism. > >No research is required. I simply want to implement the ODMG STANDARD >for ODBMS databases on PostgreSQL. There are no great design issues >here, >just a matter of nailing down the details so that everyone can live >with them. Well...that's sorta like saying no research into procedural language design is needed 'cause now we've got C++. Whether or not the existing standard for ODBMS is the greatest thing since sliced bread, I find it hard to believe that no research is required or design issues raised by the fundamental problems of database technology. Maybe I'm wrong, though, maybe the problem's been solved. - Don Baccus, Portland OR <dhogaza@pacifier.com> Nature photos, on-line guides, Pacific Northwest Rare Bird Alert Serviceand other goodies at http://donb.photo.net.
Chris <chris@bitmead.com> writes: > Why not change that to a relnumindexes as well? Easier to maintain and > more useful information. Maintaining an accurate count of descendants (or indexes for that matter) would be expensive; in particular, it'd create severe concurrency problems. If one transaction is in the middle of creating or dropping a child C of table P, then all other transactions would be blocked from creating or dropping any other children of P until the C transaction commits or aborts. They'd have to wait or they wouldn't know what to set relnumchildren to. For the purpose at hand, I think it would be OK to have a "relhaschildren" field that is set true when the first child is created and then never changed. If you have a table that once had children but has none at the moment, then you pay the price of looking through pg_inherits; but the case that we're really concerned about (a pure SQL, no-inheritance table) would still win. Not sure whether we can concurrently create/delete indexes on a rel, but I'd be inclined to leave relhasindexes alone: again its main function in life is to let you short-circuit looking for indexes on a table that's never had and never will have any. regards, tom lane
While I think that these kinds of changes are a No Go because they'd break a lot of applications (including mine), IF (big if) you really want to make major changes to the inheritance scheme, I got a few ideas. First let me say that I like the conceptual simplicity of relational databases. Some or all of the ideas thrown around here break with simplicity and consistency, by suggesting, e.g., that some commands be allowed only on entire inheritance structures, while others be allowed on individual tables, and attached to it a discussion which ones those should be. That doesn't strike me as too promising. A lot of people use inheritance to create "consistent schemas", that is, they empty create base tables, such as "address" which are inherited by tables such as customer, vendor, office, etc. That is probably not what inheritance is for, perhaps it should be some sort of a macro-like concept, such as create table vendor (name text, copy_schema_from(address), more fields), expanded by the parser. This is pretty much what it does now, only this scheme wouldn't have to actually store the (useless) inheritance link. Anyway, an idea I had would be to reimplement inheritance based on joins, since this is what the "pure relational" solution would be anyway. When I create a table B that is based on A, all the system does is create the table B as usual and store a note "I inherit from A". Any row you insert into B also creates a row in A, and the row in B contains an oid pointer to it. Thus a select on B performs a join on A.oid and B.row_in_A_pointer. A select on A just returns all the rows in A, no extras needed. A delete on B deletes the row in B and in A. A delete in A would cascade to B. Both of this can be gotten for free with foreign keys. Adding a column to A just adds the column to A, all other tables get the new column magically and in the right order. Same with dropping columns, etc. In short, this approach solves all inheritance problems at once and does so without adding any extra kludges besides the "I inherited from" field, which is static, plus the necessary transformations necessary in the parser. The drawback is of course that a select from an inherited table would always incur a join, perhaps some optimizing could be done in this direction. But the bottom line is that the compatibility issue looms big. -Peter On Thu, 3 Feb 2000, Chris Bitmead wrote: > Hi, > > I've been spending a lot of time lately with gdb and tracing the > back-end seeing if I can understand it enough to make some changes. > I'm starting to actually understand a lot of stuff, so in order > to have some possibility of having my changes accepted, I want to > discuss > them here first. Based on that, I'm going to hopefully make an attempt > at implementation. I have a patch for one of these changes already > if I get the go ahead. > > THESE CHANGES DON'T AFFECT YOU IF YOU DON'T USE INHERITANCE. > > Speak now about these changes or please, forever hold your peace. Of > course you can comment later if I screw up implementation. > > The proposed changes are.... > > 1) An imaginary field in every tuple that tells you the class it came > from. > This is useful when you select from table* and want to know which > relation the object actually came from. It wouldn't be stored on disk, > and like oid it wouldn't be displayed when you do SELECT *. The field > would be called classname. So you could have... > SELECT p.classname, p.name FROM person p; > person | Fred > student | Bill > employee | Jim > person | Chris > > If you want to know the exact behaviour it is as if every table in the > database had done to it... > ALTER TABLE foo ADD COLUMN classname TEXT; > UPDATE foo SET classname='foo'; > > Of course this is not how it would be implemented. It is just > reference for how it will appear to work. BTW, this idea was also > in the original berkeley design notes. > > 2) Changing the sense of the default for getting inherited tuples. > Currently you only get inherited tuples if you specify "tablename*". > This would be changed so that you get all sub-class tuples too by > default unless you specify "ONLY tablename". There are several > rationale for this. Firstly this is what Illustra/Informix have > implemented. Secondly, I believe it is more logical from an OO > perspective as well as giving a more useful default. If a politician > IS a person and I say SELECT * from person, then logically I should > see all the politicians because they are people too (so they claim > :). Thirdly, there are a whole range of SQL statements that should > probably be disallowed without including sub-classes. e.g. an ALTER > TABLE ADD COLUMN that does not include sub-classes is almost certainly > undesirable. It seems ashame to have to resort to non-standard SQL > with the "*" syntax in this case when it is really your only > choice. Basicly, wanting ONLY a classname is a far more unusual > choice, and leaving off the "*" is a common error. Fourthly, it seems > out of character for the SQL language to have this single character > operator. The SQL style is to use wordy descriptions of the operators > meaning. "ONLY" fits well here because it describes its own meaning > perfectly whereas to the unitiated, "*" is harder to guess at. While > this change is an incompatibility I hope for those few people using > inheritance they can accept the need to move forward without > over-burden of backwards compatibility. > > 3) The ability to return different types of rows from a SELECT. This > is to allow implementation of ODBMS functionality where a query could > be required to instantiate objects of differing types with differing > attributes. > > I would propose that that anytime you do a SELECT * from a base table > that you would get back the full rows from those sub tables. Since the > current PQ interface which doesn't support this notion would remain > unchanged this wouldn't affect current users. > > It's probably also desirable to have a syntax for getting just the > columns of the base table when this is desired. Say perhaps SELECT % > from table. This would be a performance hack for users of libpq and a > functionality difference for users of psql. > > The reason I think the "*" syntax should take on the new functionality > is because it would be more consistent with what the OQL (object query > language) standard specifies, and also because it seems the more > useful default. Also there is no compatibility reason not to do it. > > In addition it would be legal to specify columns that only exist in > sub-classes. For example, if we had > > CREATE TABLE person (name TEXT); > CREATE TABLE student (studentid TEXT, faculty TEXT) INHERITS (person); > > it would be legal to say... > > SELECT * FROM person; > NAME > ---- > Fred > Bill > > NAME | STUDENTID | FACULTY > -------------------------- > Jim | 23455 | Science > Chris| 45666 | Arts > > > SELECT *, studentid FROM person; > NAME > ---- > Fred > Bill > > NAME | STUDENTID > ---------------- > Jim | 23455 > Chris| 45666 > > > SELECT *, studentid FROM ONLY person; > ERROR: person does not contain studentid. > > > SELECT % FROM person; > NAME > ---- > Fred > Bill > Jim > Chris > > As you can see, it is desirable that psql be modified to be able to > print these differing tuple types. Presumably new column headings will > be printed when a tuple is differing to the previous one. Likely it > will be often desirable to do a > SELECT * FROM person p ORDER BY p.classname; > in order to have all the tuples of a particular type grouped together. > > In addition some extenions will be done to the PQ interface to support > these differing return types. The current PQ interface will be left > unchanged and backwards compatible for retrieving rows of a single > type. > > Also there should be an settable option that specifies that "*" should > also return the normally ignored columns of oid and classname. This is > so that OO programs that embed SQL into them also get back the oid and > classname which are required for the behind the scenes implementation > of an ODMG client. Something like... > > SET SHOW_OID TRUE; > SHOW_CLASSNAME TRUE; > > SELECT * FROM person; > > OID CLASSNAME NAME > ------------------- > 2344 person Fred > 3445 person Bill > > OID CLASSNAME NAME | STUDENTID | FACULTY > ----------------------------------------- > 2355 student Jim | 23455 | Science > 5655 student Chris| 45666 | Arts > > ************ > > -- Peter Eisentraut Sernanders vaeg 10:115 peter_e@gmx.net 75262 Uppsala http://yi.org/peter-e/ Sweden
On Thu, 3 Feb 2000, Tom Lane wrote: > Maintaining an accurate count of descendants (or indexes for that > matter) would be expensive; in particular, it'd create severe > concurrency problems. What about fixing these things on VACUUM then? Taral
"Mark Hollomon" <mhh@nortelnetworks.com> writes: > How about a set variable? > SET GETSUBCLASSES = true > With the '*' and ONLY being explicit overrides to the setting > of the variable. The default would be 'false'. I like that a lot. Clean, flexible, doesn't break any existing applications. Perhaps the business of whether to fetch extra columns from subclasses could be done similarly. I am beginning to understand why Chris wants to do that, and I see that it would support a particular style of database programming very nicely. But I really fail to see why it's necessary to change the default behavior to cater to those apps rather than existing ones. Let the new apps use a variant syntax; don't expect people to change existing code in order to avoid getting tripped up by a new feature. Note that "oh they won't see the extra columns if they're using an old API" doesn't answer my objection. I'm concerned about the performance hit from fetching those columns and transferring them to the client, as well as the memory hit of storing them in query results on the client side. We should *not* set things up in such a way that that happens by default when the client didn't ask for it and isn't even using an API that can support it. That's why it'd be a mistake to redefine the existing query syntax to act this way. The suggestion of "SELECT ** FROM ..." sounds pretty good to me, actually. I don't really see any need for changing the behavior of anything that looks like a standard SQL query. Applications that need this feature will know that they need it and can issue a query that specifically requests it. > I would not object to a configuration switch that would change the > default. Mmm, I think that would probably not be such a hot idea. That would introduce a pretty fundamental semantics incompatibility between different installations, which would hurt script portability, complicate debugging and support, yadda yadda. I think a SET variable is enough... regards, tom lane
Taral <taral@taral.net> writes: > On Thu, 3 Feb 2000, Tom Lane wrote: >> Maintaining an accurate count of descendants (or indexes for that >> matter) would be expensive; in particular, it'd create severe >> concurrency problems. > What about fixing these things on VACUUM then? Could probably do that ... not sure if it's worth the trouble ... regards, tom lane
Tom Lane wrote: > > Chris <chris@bitmead.com> writes: > > Why not change that to a relnumindexes as well? Easier to maintain and > > more useful information. > > Maintaining an accurate count of descendants (or indexes for that > matter) would be expensive; in particular, it'd create severe > concurrency problems. If one transaction is in the middle of creating > or dropping a child C of table P, then all other transactions would be > blocked from creating or dropping any other children of P until the C > transaction commits or aborts. They'd have to wait or they wouldn't > know what to set relnumchildren to. > > For the purpose at hand, I think it would be OK to have a > "relhaschildren" field that is set true when the first child is created > and then never changed. If you have a table that once had children but > has none at the moment, then you pay the price of looking through > pg_inherits; but the case that we're really concerned about (a pure SQL, > no-inheritance table) would still win. > > Not sure whether we can concurrently create/delete indexes on a rel, > but I'd be inclined to leave relhasindexes alone: again its main > function in life is to let you short-circuit looking for indexes on > a table that's never had and never will have any. > WOuld it be possible to consider this a 'statistic' and let vacuum update it? In other words, creating an index (or subtable) sets relhasindex (relhaschild) but vacuum will set it to false if it finds no children or indexes. or would this run into concurrency problems as well? -- Mark Hollomon mhh@nortelnetworks.com ESN 451-9008 (302)454-9008
Chris wrote: > > Hannu Krosing wrote: > > > Maybe SELECT ** FROM BASE would be more flexible as it leaves the standard > > SQL with its "standard" meaning ? > > That was my first thought and it's definitely a possibility. My argument > against it is that SQL doesn't have a "standard meaning" in the case of > inheritance, and ** is an incompatibility with OQL. > > I suspect we need both. Something like > SET GET_INHERITED_COLUMNS true; etc. > > > We will have to change the API sometime not too distant anyway, the current > > api is unable to deal with anything that does not have a nice textual > > representation (like an image or sound) in spite of all the talks about > > easy extensibility - the extensibility is all in the backend, ther is no > > easy way to get new datatypes in/out. > > What about PQbinaryTuples() and friends? They don't help you at all when doing inserts and are by definition in native byte order on queries. Something like [ PREPARE query; BIND arguments ; EXEC ] which knows about binary formats would be needed here. One could use LOs except that the current ineffective implementation. ------------- Hannu
Mark Hollomon wrote: > > > [ discussion on changing the default to getting subclasses ] > > I object. Tell me why you object. Performance concerns? Compatibility? A SET might be a good idea, but to decide whether and also a default, it's good to know what the objections are. > > How about a set variable? > > SET GETSUBCLASSES = true > > With the '*' and ONLY being explicit overrides to the setting > of the variable. The default would be 'false'. I would not > object to a configuration switch that would change the > default. > -- > > Mark Hollomon > mhh@nortelnetworks.com > ESN 451-9008 (302)454-9008
Don Baccus wrote: > > > > >A good thought, but we still need one good front end that supports > >all the features. > > I wasn't think in terms of this being mutually exclusive with your > desires. Merely raising up the notion that the possibility exists > of creating a sandbox, so to speak, for people to play in, a tool > for the exploration of such concepts. So we would be returning to roots. The original Postgres was exactly that - a tool for the exploration of such concepts. > No, and I wasn't arguing that you shouldn't move forward, either. I > was just stating my personal opinion regarding the utility of simple > OO-ish features, that's all. Yes, it needs quite much discussion/design befor going forth, lest we will be in the next level of the current situation where some peoples usage of the current limited inheritance is an obstacle to moving forward to a more developed one. > >> One reason I raise the issue of possible multiple front-ends (or making > >> it easy for folks to make there own by making the parser->optimizer/backend > >> interface more general) is that this whole area would seem to be one > >> that begs for RESEARCH and experimentalism. > > > >No research is required. I simply want to implement the ODMG STANDARD > >for ODBMS databases on PostgreSQL. There are no great design issues > >here, just a matter of nailing down the details so that everyone can > >live with them. > > Well...that's sorta like saying no research into procedural language > design is needed 'cause now we've got C++. > > Whether or not the existing standard for ODBMS is the greatest thing > since sliced bread, I find it hard to believe that no research is > required or design issues raised by the fundamental problems of > database technology. > > Maybe I'm wrong, though, maybe the problem's been solved. > My wife has forbidden me to buy any sliced bread, because the slices are of wrong thickness. Hardly the situation can be any better in OODB design. The ODMG standard may be a good starting point for discussion, but one can't run any programs on a standard - one needs a real db. And IIRC the standard is only semi-public (not freely available/distributable). ------------------ Hannu
Don Baccus wrote: > >No research is required. I simply want to implement the ODMG STANDARD > >for ODBMS databases on PostgreSQL. There are no great design issues > >here, > >just a matter of nailing down the details so that everyone can live > >with them. > > Well...that's sorta like saying no research into procedural language > design is needed 'cause now we've got C++. > > Whether or not the existing standard for ODBMS is the greatest thing > since sliced bread, I find it hard to believe that no research is > required or design issues raised by the fundamental problems of > database technology. > > Maybe I'm wrong, though, maybe the problem's been solved. No research is required _for what I want to do_. (or if there is research required, I think I've just done it over the last 5 years :). i.e. I don't want to explore some new style database, only implement a current ODMG standard on postgresql. This style of database is fairly well understood now for good or bad. Once the RDBMS and ODBMS features exist in one database, maybe then research can be done to move forward. That's my opinion anyway.
Peter Eisentraut wrote: > A lot of people use inheritance to create "consistent schemas", that is, > they empty create base tables, such as "address" which are inherited by > tables such as customer, vendor, office, etc. This is a really bad idea. You could never have both a postal address AND a home address for example. I thought the original postgres supported this by having CREATE TABLE ADDRESS (...) CREATE TABLE PERSON(add ADDRESS). Anyway, this is what Oracle and others can do these days, and this is the right thing. > Anyway, an idea I had would be to reimplement inheritance based on joins, > since this is what the "pure relational" solution would be anyway. When I > create a table B that is based on A, all the system does is create the > table B as usual and store a note "I inherit from A". Any row you insert > into B also creates a row in A, and the row in B contains an oid pointer > to it. This is a really stu^H^H^H bad idea. I have hierarchies 5 levels deep with multiple inheritance, and I don't want to do a 10 way join just to retrieve an object. This is why RDBMS's performance sucks so incredibly badly on some applications. an ODBMS can perform 100x as fast in these cases just because of what you are proposing. > Thus a select on B performs a join on A.oid and B.row_in_A_pointer. > A select on A just returns all the rows in A, no extras needed. A delete > on B deletes the row in B and in A. A delete in A would cascade to B. Both > of this can be gotten for free with foreign keys. Adding a column to A > just adds the column to A, all other tables get the new column magically > and in the right order. Same with dropping columns, etc. > > In short, this approach solves all inheritance problems at once and does > so without adding any extra kludges besides the "I inherited from" field, > which is static, plus the necessary transformations necessary in the > parser. The drawback is of course that a select from an inherited table > would always incur a join, perhaps some optimizing could be done in this > direction. But the bottom line is that the compatibility issue looms big. > > -Peter > > On Thu, 3 Feb 2000, Chris Bitmead wrote: > > > Hi, > > > > I've been spending a lot of time lately with gdb and tracing the > > back-end seeing if I can understand it enough to make some changes. > > I'm starting to actually understand a lot of stuff, so in order > > to have some possibility of having my changes accepted, I want to > > discuss > > them here first. Based on that, I'm going to hopefully make an attempt > > at implementation. I have a patch for one of these changes already > > if I get the go ahead. > > > > THESE CHANGES DON'T AFFECT YOU IF YOU DON'T USE INHERITANCE. > > > > Speak now about these changes or please, forever hold your peace. Of > > course you can comment later if I screw up implementation. > > > > The proposed changes are.... > > > > 1) An imaginary field in every tuple that tells you the class it came > > from. > > This is useful when you select from table* and want to know which > > relation the object actually came from. It wouldn't be stored on disk, > > and like oid it wouldn't be displayed when you do SELECT *. The field > > would be called classname. So you could have... > > SELECT p.classname, p.name FROM person p; > > person | Fred > > student | Bill > > employee | Jim > > person | Chris > > > > If you want to know the exact behaviour it is as if every table in the > > database had done to it... > > ALTER TABLE foo ADD COLUMN classname TEXT; > > UPDATE foo SET classname='foo'; > > > > Of course this is not how it would be implemented. It is just > > reference for how it will appear to work. BTW, this idea was also > > in the original berkeley design notes. > > > > 2) Changing the sense of the default for getting inherited tuples. > > Currently you only get inherited tuples if you specify "tablename*". > > This would be changed so that you get all sub-class tuples too by > > default unless you specify "ONLY tablename". There are several > > rationale for this. Firstly this is what Illustra/Informix have > > implemented. Secondly, I believe it is more logical from an OO > > perspective as well as giving a more useful default. If a politician > > IS a person and I say SELECT * from person, then logically I should > > see all the politicians because they are people too (so they claim > > :). Thirdly, there are a whole range of SQL statements that should > > probably be disallowed without including sub-classes. e.g. an ALTER > > TABLE ADD COLUMN that does not include sub-classes is almost certainly > > undesirable. It seems ashame to have to resort to non-standard SQL > > with the "*" syntax in this case when it is really your only > > choice. Basicly, wanting ONLY a classname is a far more unusual > > choice, and leaving off the "*" is a common error. Fourthly, it seems > > out of character for the SQL language to have this single character > > operator. The SQL style is to use wordy descriptions of the operators > > meaning. "ONLY" fits well here because it describes its own meaning > > perfectly whereas to the unitiated, "*" is harder to guess at. While > > this change is an incompatibility I hope for those few people using > > inheritance they can accept the need to move forward without > > over-burden of backwards compatibility. > > > > 3) The ability to return different types of rows from a SELECT. This > > is to allow implementation of ODBMS functionality where a query could > > be required to instantiate objects of differing types with differing > > attributes. > > > > I would propose that that anytime you do a SELECT * from a base table > > that you would get back the full rows from those sub tables. Since the > > current PQ interface which doesn't support this notion would remain > > unchanged this wouldn't affect current users. > > > > It's probably also desirable to have a syntax for getting just the > > columns of the base table when this is desired. Say perhaps SELECT % > > from table. This would be a performance hack for users of libpq and a > > functionality difference for users of psql. > > > > The reason I think the "*" syntax should take on the new functionality > > is because it would be more consistent with what the OQL (object query > > language) standard specifies, and also because it seems the more > > useful default. Also there is no compatibility reason not to do it. > > > > In addition it would be legal to specify columns that only exist in > > sub-classes. For example, if we had > > > > CREATE TABLE person (name TEXT); > > CREATE TABLE student (studentid TEXT, faculty TEXT) INHERITS (person); > > > > it would be legal to say... > > > SELECT * FROM person; > > NAME > > ---- > > Fred > > Bill > > > > NAME | STUDENTID | FACULTY > > -------------------------- > > Jim | 23455 | Science > > Chris| 45666 | Arts > > > > > SELECT *, studentid FROM person; > > NAME > > ---- > > Fred > > Bill > > > > NAME | STUDENTID > > ---------------- > > Jim | 23455 > > Chris| 45666 > > > > > SELECT *, studentid FROM ONLY person; > > ERROR: person does not contain studentid. > > > > > SELECT % FROM person; > > NAME > > ---- > > Fred > > Bill > > Jim > > Chris > > > > As you can see, it is desirable that psql be modified to be able to > > print these differing tuple types. Presumably new column headings will > > be printed when a tuple is differing to the previous one. Likely it > > will be often desirable to do a > > SELECT * FROM person p ORDER BY p.classname; > > in order to have all the tuples of a particular type grouped together. > > > > In addition some extenions will be done to the PQ interface to support > > these differing return types. The current PQ interface will be left > > unchanged and backwards compatible for retrieving rows of a single > > type. > > > > Also there should be an settable option that specifies that "*" should > > also return the normally ignored columns of oid and classname. This is > > so that OO programs that embed SQL into them also get back the oid and > > classname which are required for the behind the scenes implementation > > of an ODMG client. Something like... > > > > SET SHOW_OID TRUE; > > SHOW_CLASSNAME TRUE; > > > > SELECT * FROM person; > > > > OID CLASSNAME NAME > > ------------------- > > 2344 person Fred > > 3445 person Bill > > > > OID CLASSNAME NAME | STUDENTID | FACULTY > > ----------------------------------------- > > 2355 student Jim | 23455 | Science > > 5655 student Chris| 45666 | Arts > > > > ************ > > > > > > -- > Peter Eisentraut Sernanders vaeg 10:115 > peter_e@gmx.net 75262 Uppsala > http://yi.org/peter-e/ Sweden
Taral wrote: > > On Thu, 3 Feb 2000, Tom Lane wrote: > > > Maintaining an accurate count of descendants (or indexes for that > > matter) would be expensive; in particular, it'd create severe > > concurrency problems. > > What about fixing these things on VACUUM then? It could produce wrong results to queries if the data is wrong.
Tom, I agree with most of what you say. If we want to have ** be the default syntax for getting sub-columns I can live with that (for suggestion (3)) But for (2), I do feel very strongly that getting sub-tuples should be the "default default", and a SET GETSUBCLASSES=true should be the default setting. I've been using the postgres inheritance for a real system and I can say with certainty that this is a massive source of errors. Not wanting sub-class tuples seems rarely needed, and leaving off the "*" is something that too often seems forgotten. I often can trawl through code and realise that some query is missing the "*" but it hasn't been discovered yet. In fact I find that almost all queries require the "*" when you have a proper OO model, and not using "*" is usually laziness. Also when adding a sub-class where there previously was none, one usually has to trawl through the queries and add "*" to all of them because as I said, there are almost never occasions where "*" is not required in real life OO models. So I understand the compatibility issue here, but I really feel strongly that this should be changed now before there really are a lot of people using it. Sure, have as many compatibility modes as you like, but I think this is a broken enough design that the default should be changed. Apparently Illustra/Informix agreed. Tom Lane wrote: > > "Mark Hollomon" <mhh@nortelnetworks.com> writes: > > How about a set variable? > > > SET GETSUBCLASSES = true > > > With the '*' and ONLY being explicit overrides to the setting > > of the variable. The default would be 'false'. > > I like that a lot. Clean, flexible, doesn't break any existing > applications. > > Perhaps the business of whether to fetch extra columns from subclasses > could be done similarly. I am beginning to understand why Chris wants > to do that, and I see that it would support a particular style of > database programming very nicely. But I really fail to see why it's > necessary to change the default behavior to cater to those apps rather > than existing ones. Let the new apps use a variant syntax; don't > expect people to change existing code in order to avoid getting tripped > up by a new feature. > > Note that "oh they won't see the extra columns if they're using an > old API" doesn't answer my objection. I'm concerned about the > performance hit from fetching those columns and transferring them to > the client, as well as the memory hit of storing them in query results > on the client side. We should *not* set things up in such a way that > that happens by default when the client didn't ask for it and isn't > even using an API that can support it. That's why it'd be a mistake > to redefine the existing query syntax to act this way. > > The suggestion of "SELECT ** FROM ..." sounds pretty good to me, > actually. I don't really see any need for changing the behavior of > anything that looks like a standard SQL query. Applications that > need this feature will know that they need it and can issue a query > that specifically requests it. > > > I would not object to a configuration switch that would change the > > default. > > Mmm, I think that would probably not be such a hot idea. That would > introduce a pretty fundamental semantics incompatibility between > different installations, which would hurt script portability, complicate > debugging and support, yadda yadda. I think a SET variable is enough... > > regards, tom lane
> -----Original Message----- > From: owner-pgsql-hackers@postgreSQL.org > [mailto:owner-pgsql-hackers@postgreSQL.org]On Behalf Of Bruce Momjian > > > Bruce Momjian wrote: > > > > > > Well I see that pg_class has columns like "relhasindex". If > we added a > > > > "relhassubclass", the overhead should be unmeasureable. > > > > > > Yes, but how do you keep that accurate? If I add indexes, then drop > > > them, does relhasindex go to false. > > > > I don't know. Does it? > > Let me add that to the TODO list. > > > > > >Could you do that for relhassubclass? > > > > If we made it relnumsubclasses and incremented/decremented on > > CREATE/DROP, it seems easy in theory. > > Yes, that would work. Seems hasindex has problems. > This posting may be off the point,sorry. Isn't relhasindex a kind of item that we can live without it ? I proposed to change the use of this item in [[HACKERS] Index recreation in vacuum]. Though I have heard no clear objection, I want to confirm again. My proposal is as follows. 1) DDL commands don't rely on relhasindex. 2) DML commands don't take indexes into account if relhasindex is set to false. 3) REINDEX command and vacuum with REINDEX option sets this flag to false at the beginning and sets it to true when recreationof all indexes completed. Comments ? Regards. Hiroshi Inoue Inoue@tpf.co.jp
On Fri, Feb 04, 2000 at 10:55:39AM +1100, Chris Bitmead wrote: <snipped it all!> Just wanted to chime in on this thread with the sugestion that Chris clearly has been thinking about this a lot, and has some strong opinions about the 'right way to do things'. How about an offical, postgresql.org hosted, CVS branch for ORDBMS development? Let Chris and whomever is interested take a crack at doing it however they want, and _prove_ that the performance is as good, or much better, and is compatible, etc. Clearly, details of implementation can be discussed to death, until Chris gets fed up and goes away: not good. So, what do the core developers think? Sound feasable? As to problems of keeping in sync with HEAD, etc., that'd be up to Chris and his crew. Does postgresql.org have the extra 20-30 MB of disk? Ross -- Ross J. Reedstrom, Ph.D., <reedstrm@rice.edu> NSBRI Research Scientist/Programmer Computer and Information Technology Institute Rice University, 6100 S. Main St., Houston, TX 77005
> Chris Bitmead <chrisb@nimrod.itg.telstra.com.au> writes: > > I'll comment on the other issues later ... but I will say that I don't > think it's acceptable to add *any* overhead to standard-SQL queries > in order to support inheritance better. The vast majority of our users > want SQL performance and don't give a damn about inheritance. We have > to pay attention to that. > Well said ! Actually I'm a little bit uncertain what ORDBMS really improves ? After writing a full mapper and wrapper for PostgreSQL and a Smalltalk dialect I see really no usage for these additional inheritance features databases like PostgreSQL offer. Some points about this: - all these additional features are very specific to PostgreSQL and are not compatible with other databases. Writing anapplication based on these features results in non-portable systems. - Speed is still a very, very important featurefor a database. A single query, which uses about 5 seconds because the optimizer is not very clever to use severalindices to improove the query execution is much more worse and can change the structure of the whole applicationprogram. - when creating automatic sql-queries through a mapper one can get very complicated sql queries which tests the parservery hard and the limits of PostgreSQL has been seen very quickly during the development of the wrapper above. What I'm missing from these new database are structural changes to the query system: the possibility to execute complicated concatenated queries on the server .. perhaps with different parameters. Just some ideas about all these nice features Marten
Marten Feldtmann wrote: > Actually I'm a little bit uncertain what ORDBMS really improves ? After > writing a full mapper and wrapper for PostgreSQL and a Smalltalk dialect > I see really no usage for these additional inheritance features databases > like PostgreSQL offer. > > Some points about this: > > - all these additional features are very specific to PostgreSQL and > are not compatible with other databases. Writing an application > based on these features results in non-portable systems. Not true, because if the wrapper conforms to the ODMG standard, it will be compatible with ObjectStore, Versant, the new Sun RDBS standard, Gemstone, and many others. > - Speed is still a very, very important feature for a database. A > single query, which uses about 5 seconds because the optimizer > is not very clever to use several indices to improove the > query execution is much more worse and can change the structure > of the whole application program. The biggest thing you can do for speed is to have less objects/tuples in the database. Inheritance and the array feature of postgresql can improve things here by orders of magnitude. The problem is that these two features are not viable to use at present. With an ODMG interface, and TOAST to allow tuples of unlimited size this will then be a viable feature. In some situations this will improve queries by 100x even with the most brain-dead optimizer. ODBMS doesn't care a great deal about wonderful optimizers because joins are less necessary. > - when creating automatic sql-queries through a mapper one can get > very complicated sql queries which tests the parser very hard and > the limits of PostgreSQL has been seen very quickly during > the development of the wrapper above. Exactly, so stop mapping things and creating complicated joins. ODBMSes do not do ANY joins to re-create objects. That's why mappers suck so hard. > What I'm missing from these new database are structural changes to > the query system: the possibility to execute complicated > concatenated queries on the server .. perhaps with different > parameters. What is a concatenated query? I'm all in favour of more powerful queries, but that is not what this proposal is about. This is about AVOIDING queries. Mappers and so forth are great query generators because the database representation is different from the in-memory object representation. This proposal is all about making the in-memory object representation the same as in the database. If you still don't get it take an example.. class CarPart {int volume; } class Wheel : CarPart {int diameter; } class SteeringWheel : Wheel { boolean horn; } class RoadWheel : Wheel { int airpressure; } class Car { List<CarPart> parts; } Now with an ODBMS, a Car with 4 wheels and a steering wheel we'll have 6 objects in the database - 1 Car, 4 RoadWheels and 1 SteeringWheel. With a relational mapper, depending on how you map it you'll have 21 objects - 5 CarPart objects, 5 wheel objects, 4 road wheel, 1 steering wheel, 1 car and 5 car_carpart relation entities. And when you join it all together you'll have to join against 6 tables instead of 3.
<I trimmed the CC list a bit> Chris Bitmead wrote: > > Mark Hollomon wrote: > > > > > [ discussion on changing the default to getting subclasses ] > > > > I object. > > Tell me why you object. Performance concerns? Compatibility? Definitely compatibility. The load I see (200 - 300 queries a DAY) isn't enough for me to be concerned about an extra millisecond or two per query. But I certainly understand others concerns in this area. One of my responsibilities at work is the maintenance of a homegrown document indexing and retrieval system. It is about 100K of Perl that calls into a custom Perl wrapper around libpq. The system is an escaped 'proof-of-concept'. I wrote it using inheritance features of Postgres95. The upshot is, that this proposed change would require me to examine almost every line of this system in order to make sure that I put ONLY in just the right spots. Yes, this would be where ever there _isn't_ a '*', but how do I grep for the lack of a asterisk? Since it is a "prototype", The code feels very free to pass around small snippets of SQL, a disembodied FROM clause, a portion of a VALUES clause. I simply would not be allowed the time to do the rewrite necessary to accomodate this change. And if I _did_ have the time, I would probably rewrite it for Oracle because then DB Admin would be someone _else's_ job. Now, one of the days, I will find a good excuse (eg new feature) to do a complete rewrite. And _then_ your proposal will actually be a help. And that is why I suggest a SET variable. When I'm ready to use the new feature, I can. But no work is necessary until that day arrives. Thanks for listening. -- Mark Hollomon mhh@nortelnetworks.com ESN 451-9008 (302)454-9008
At 02:42 PM 2/4/00 +1100, Chris Bitmead wrote: >Not true, because if the wrapper conforms to the ODMG standard, it will >be compatible with ObjectStore, Versant, the new Sun RDBS standard, >Gemstone, and many others. Without prejudice, I'd be interested in some order-of-magnitude market share for these technologies vs., say, Oracle. >The biggest thing you can do for speed is to have less objects/tuples >in the database. Inheritance and the array feature of postgresql >can improve things here by orders of magnitude. There's no doubt of this, for applications that can make use of the paradigms. >The problem is that >these >two features are not viable to use at present. With an ODMG interface, >and TOAST to allow tuples of unlimited size this will then be a viable >feature. In some situations this will improve queries by 100x even >with the most brain-dead optimizer. ODBMS doesn't care a great deal >about wonderful optimizers because joins are less necessary. And this last statement I really have to wonder about. For restricted application spaces, yeah, no doubt. But in general, no way. >Exactly, so stop mapping things and creating complicated joins. ODBMSes >do not do ANY joins to re-create objects. That's why mappers suck so >hard. If they don't do joins, then presumably they map many-to-one relations by copying data into each of the "many" table rows. TANSTAAFL, no? Though this strategy is a very viable one in today's big-memory, big-disk environment. It's not clear to me that a extremely smart RDBMS system couldn't decide to add redundancy itself and gain much of the efficiency, but, heck, that's just my weak, uncreative compiler-writer mind at work again. (and clearly, of course, PG isn't on any threshold of doing it, I'm thinking in theoretical space here). >Now with an ODBMS, a Car with 4 wheels and a steering wheel we'll have 6 >objects in the database - 1 Car, 4 RoadWheels and 1 SteeringWheel. With >a relational mapper, depending on how you map it you'll have 21 objects >- 5 CarPart objects, 5 wheel objects, 4 road wheel, 1 steering wheel, 1 >car and 5 car_carpart relation entities. And when you join it all >together you'll have to join against 6 tables instead of 3. Not really. You'd probably denormalize and not worry about it, in practice. Would the result be as beautiful? I don't know - do most car designers think that SteeringMechanism and PavementInterface are the same? It's true for a variety of reasons in today's cars that aren't actually related, and high-end race cars are exploring joystick control. So one could claim that your hierarchy is merely limiting creative expression... - Don Baccus, Portland OR <dhogaza@pacifier.com> Nature photos, on-line guides, Pacific Northwest Rare Bird Alert Serviceand other goodies at http://donb.photo.net.
> Peter Eisentraut wrote: > > > This is a really stu^H^H^H bad idea. I have hierarchies 5 levels deep > with > multiple inheritance, and I > don't want to do a 10 way join just to retrieve an object. > > This is why RDBMS's performance sucks so incredibly badly on some > applications. > an ODBMS can perform 100x as fast in these cases just because of what > you > are proposing. > Hmm, and yes one may find problems where the pure relational system is 100x faster than your ODBMS. After doing a project with VERSANT and VisualWorks (election projection system for the first television sender here in Germany) I like the idea of OODBMS, but I've also noticed, that they are not the solution to all problems. Clever database desing leeds to good performance on both systems, but one should consider, that the designs of the database layout will be different. There are cases, where a pure relational system is very fast and an ODBMS never get it, but there are the examples you mentioned. Joins per se are not that bad .. it depends on when and how they are used and how good the analyzer of the database is and how good he uses the indices to get the job done. One very good point is the query language of the rdbms systems. On the odbms side no standard is really available, which can be seen as the sql of the odbms. Marten
Chris Bitmead wrote: > > Tom, I agree with most of what you say. If we want to have ** be the > default > syntax for getting sub-columns I can live with that (for suggestion (3)) > > But for (2), I do feel very strongly that getting sub-tuples should be > the > "default default", and a SET GETSUBCLASSES=true should be the default > setting. Then maybe we need a way to "break off" inheritance, i.e. make the inherited table independent but retain the columns as they are at the time of breakage. At least it could be given as an option in pg_dump. (--dump_flat_creates or smth.) > I've been using the postgres inheritance for a real system and I can > say with certainty that this is a massive source of errors. Not > wanting sub-class tuples seems rarely needed, and leaving off the "*" is > something that too often seems forgotten. I often can trawl through > code and realise that some query is missing the "*" but it hasn't been > discovered yet. In fact I find that almost all queries require the "*" > when you have a proper OO model, and not using "*" is usually laziness. True. I also think that people who used inheritance as a create table shortcut can most easily ensure compatibility by dumping their not-really-inherited tables as independent. They will have to dump-relaod anyway. > Also when adding a sub-class where there previously was none, one > usually has to trawl through the queries and add "*" to all of them > because as I said, there are almost never occasions where "*" is not > required in real life OO models. > > So I understand the compatibility issue here, but I really feel strongly > that this should be changed now before there really are a lot of people > using it. Sure, have as many compatibility modes as you like, but I > think > this is a broken enough design that the default should be changed. > Apparently Illustra/Informix agreed. And they are probably the only external DB we can aim to be compatible with, or what does SQL3 say? ---------------------- Hannu
Don Baccus wrote: > Without prejudice, I'd be interested in some order-of-magnitude > market share for these technologies vs., say, Oracle. Would you be interested in the market share of Win98 compared to Linux? Nobody uses an ODBMS if they can get it to work with Oracle. They go to an ODBMS when they realise that's the only way they can get it to work. However, as I said, Sun is defining for Java a standard interface for RDBMS which is exactly the same as ODMG. So expect a lot of people using Oracle to be writing code that ports to an ODBMS. Maybe when they realise they can slot a real ODBMS under their app and greatly increase performance, it might be good for the ODBMS market. > There's no doubt of this, for applications that can > make use of the paradigms. To my mind that is like saying OO is useful for programs that can make use of the paradigms. In fact I think nearly all programs can make use of OO. > And this last statement I really have to wonder > about. For restricted > application spaces, yeah, no doubt. But in general, > no way. It's only when you need a great deal of ad-hoc queries that you really need a RDBMS. But a very great proportion of apps have only very specific querying needs, and an ODBMS can do those queries MUCH faster. And if postgresql has *both*, then it should be the best of both worlds. I'm not going to go around claiming RDBMS is obsolete, but I do know that ODBMS is much more convenient to use for programming. Once you've done your app and you want to spew off a few reports, that's when you wish you had RDBMS. > >Exactly, so stop mapping things and creating complicated joins. ODBMSes > >do not do ANY joins to re-create objects. That's why mappers suck so > >hard. > > If they don't do joins, then presumably they map many-to-one relations > by copying data into each of the "many" table rows. TANSTAAFL, no? ^^^ ? They have a similar layout on disk to what you might have in memory. So if you store a 1:M in memory as an array of pointers, that's how you might do it on disk too. > Though this strategy is a very viable one in today's > big-memory, big-disk > environment. It's not clear to me that a extremely > smart RDBMS system > couldn't decide to add redundancy itself and gain > much of the efficiency, > but, heck, that's just my weak, uncreative > compiler-writer mind at work again. Do you mean an RDBMS might try and be smart and store it the same way? Well if it did that, we might call it an ODBMS. But the other main benefit of an ODBMS is that retrieving records for many cases (non-ad-hoc) is very simple to program for because you don't have to map say a join table into say a C++ List<type>. In other words it's not just the performance of ODBMS that is good, but also the interface. Also if an RDBMS maps an object to a table and then maps it back to an array on disk, well you've done an unnecessary conversion. > >Now with an ODBMS, a Car with 4 wheels and a steering wheel we'll have 6 > >objects in the database - 1 Car, 4 RoadWheels and 1 SteeringWheel. With > >a relational mapper, depending on how you map it you'll have 21 objects > >- 5 CarPart objects, 5 wheel objects, 4 road wheel, 1 steering wheel, 1 > >car and 5 car_carpart relation entities. And when you join it all > >together you'll have to join against 6 tables instead of 3. > > Not really. You'd probably denormalize and not > worry about it, in practice. Then what happens to your RDBMSes wonderful ad-hoc query facility if you de-normalise? Will you have to do a UNION with about 5000 clauses to retrieve the volume and price of each type of car part? > Would the result be as beautiful? I don't know - do > most car designers > think that SteeringMechanism and PavementInterface >are the same? It's > true for a variety of reasons in today's cars that > aren't actually > related, and high-end race cars are exploring > joystick control. > > So one could claim that your hierarchy is merely > limiting creative expression... My hierarchy? The point is that you can _have_ a hierarchy. It's well accepted that OO hierarchies are good. The good thing here is being able to directly store it in the database.
Marten Feldtmann wrote: > Hmm, and yes one may find problems where the pure > relational system is 100x faster than your ODBMS. > > After doing a project with VERSANT and VisualWorks > (election projection system for the first television > sender here in Germany) I like the idea of OODBMS, > but I've also noticed, that they are not the > solution to all problems. Give me a clear application spec and VERSANT, and I will ALWAYS flog Oracle into the dust. But... Where SQL comes into it's own is _conveniently_ doing queries that I never thought of when I first designed my app. Of course many ODBMSes have SQL or similar too. > Joins per se are not that bad .. it depends on when > and how they are used and how good the analyzer of > the database is and how good he uses the indices to > get the job done. Take the simple SUPPLIER, PART and SUPPLIER_PART situation. The very fact that you've got an extra table here means you've got to touch many more disk pages and transfer more data. An RDBMS just can't win when the ODBMS data model is designed right. > One very good point is the query language of the > rdbms systems. On the odbms side no standard is > really available, which can be seen as the sql of > the odbms. There is a standard called OQL which is very similar to SQL. It's just rather poorly supported. -- Chris Bitmead mailto:chris@bitmead.com
At 12:41 PM 2/5/00 +1100, Chris wrote: >Don Baccus wrote: > >> Without prejudice, I'd be interested in some order-of-magnitude >> market share for these technologies vs., say, Oracle. > >Would you be interested in the market share of Win98 compared to Linux? Postgres isn't in competition with either of those software products. It is probably worth pointing out that at least some of the folks in the Linux community would like to derail Win98 to some degree. And I, at least, would love to see Postgres derail Oracle to some degree. ... >> There's no doubt of this, for applications that can > make use of the paradigms. > >To my mind that is like saying OO is useful for programs that can make >use of the paradigms. In fact I think nearly all programs can make use >of OO. This really isn't the place for a religious fight. Personally, I believe the OO paradigm is well-suited to the decomposition of some problems, not particularly well-suited to others. I've only been a professional software engineer for 29 years, though, so I don't pretend to have all the answers. I'd humbly suggest that OO methodologists don't, either. But, that's just my opinion. Far more important to me is that SQL queries not suffer performance hits as a result of whatever changes to OO support make it into the standard version of PG. Let's just leave it at that, OK? - Don Baccus, Portland OR <dhogaza@pacifier.com> Nature photos, on-line guides, Pacific Northwest Rare Bird Alert Serviceand other goodies at http://donb.photo.net.
Chris wrote: > > > One very good point is the query language of the > > rdbms systems. On the odbms side no standard is > > really available, which can be seen as the sql of > > the odbms. > > There is a standard called OQL which is very similar to SQL. It's just > rather poorly supported. > I think the operative word here is "available". I know that SQL specs are'nt freely available either, but due to SQL being already widely supported one can get the general idea from many freely available sources, like the bunch of freely downloadable DB's currently available for linux. Most of them have some docs included. It is still quite a job to reconstruct SQL92 from them ;) I know now description (except a BNF syntax available from some ODBMS website) that I could use to get some idea about OQL. ---------------------- Hannu
Hannu Krosing wrote: > > > It is still quite a job to reconstruct SQL92 from them ;) > > I know now description (except a BNF syntax available from some ODBMS website) SHould be "I know no description ..." > that I could use to get some idea about OQL. > > ---------------------- > Hannu > > ************
Hannu Krosing wrote: > I think the operative word here is "available". I know that SQL specs > are'nt freely available either, but due to SQL being already widely > supported one can get the general idea from many freely available sources, > like the bunch of freely downloadable DB's currently available for linux. > Most of them have some docs included. > > It is still quite a job to reconstruct SQL92 from them ;) > > I know now description (except a BNF syntax available from some ODBMS website) > that I could use to get some idea about OQL. Poet at http://www.poet.com have their doco online including OQL. -- Chris Bitmead mailto:chris@bitmead.com
Chris wrote: > > Hannu Krosing wrote: > > > I think the operative word here is "available". I know that SQL specs > > are'nt freely available either, but due to SQL being already widely > > supported one can get the general idea from many freely available sources, > > like the bunch of freely downloadable DB's currently available for linux. > > Most of them have some docs included. > > > > It is still quite a job to reconstruct SQL92 from them ;) > > > > I know now description (except a BNF syntax available from some ODBMS website) > > that I could use to get some idea about OQL. > > Poet at http://www.poet.com have their doco online including OQL. > Thanks, I'll check that. Btw, has anyone compared PostgreSQL's object features with SQL3 (draft) features. For example they seem to use UNDER instead of INHERITS and no parentheses. They also have a special privilege also called UNDER for being able to define a subtype (as they call it). A lot of other features seem to be considered too - no wonder it is 2.3 MB text file. available at: ftp://ftp.digital.com/pub/standards/sql/ or http://gatekeeper.dec.com/pub/standards/sql/ the main file is sql-foundation-aug94.txt ------------------------ Hannu
1) free is freedom, not free bear ;-) also, there are some sites has oql draft -- like sql draft. sorry, can not remember. 2) good books, like " C++ object databases" (David Jordan) has a lot material. 3) a lot of OODBM evaluation copy there. Not totally complied, but together with 1 and 2, still can see what is going on. so, no excuse for not knowing oodbm/oql :-) -- I'm waiting for trying them on pg . On Sat, 5 Feb 2000, Hannu Krosing wrote: > Chris wrote: > > > > > One very good point is the query language of the > > > rdbms systems. On the odbms side no standard is > > > really available, which can be seen as the sql of > > > the odbms. > > > > There is a standard called OQL which is very similar to SQL. It's just > > rather poorly supported. > > > > I think the operative word here is "available". I know that SQL specs > are'nt freely available either, but due to SQL being already widely > supported one can get the general idea from many freely available sources, > like the bunch of freely downloadable DB's currently available for linux. > Most of them have some docs included. > > It is still quite a job to reconstruct SQL92 from them ;) > > I know now description (except a BNF syntax available from some ODBMS website) > that I could use to get some idea about OQL. > > ---------------------- > Hannu > > ************ >
> 2) good books, like " C++ object databases" (David Jordan) has > a lot material. As an example: Cattel, "The Object Database Standard ODMG 2.0" Morgan Kaufmann, ISBN 1 - 55860 - 463 -4 Marten Feldtmann