Thread: Proposed Changes to PostgreSQL
Hi, I've been spending a lot of time lately with gdb and tracing the back-end seeing if I can understand it enough to make some changes. I'm starting to actually understand a lot of stuff, so in order to have some possibility of having my changes accepted, I want to discuss them here first. Based on that, I'm going to hopefully make an attempt at implementation. I have a patch for one of these changes already if I get the go ahead. THESE CHANGES DON'T AFFECT YOU IF YOU DON'T USE INHERITANCE. Speak now about these changes or please, forever hold your peace. Of course you can comment later if I screw up implementation. The proposed changes are.... 1) An imaginary field in every tuple that tells you the class it came from. This is useful when you select from table* and want to know which relation the object actually came from. It wouldn't be stored on disk, and like oid it wouldn't be displayed when you do SELECT *. The field would be called classname. So you could have... SELECT p.classname, p.name FROM person p; person | Fred student | Bill employee | Jim person | Chris If you want to know the exact behaviour it is as if every table in the database had done to it... ALTER TABLE foo ADD COLUMN classname TEXT; UPDATE foo SET classname='foo'; Of course this is not how it would be implemented. It is just reference for how it will appear to work. BTW, this idea was also in the original berkeley design notes. 2) Changing the sense of the default for getting inherited tuples. Currently you only get inherited tuples if you specify "tablename*". This would be changed so that you get all sub-class tuples too by default unless you specify "ONLY tablename". There are several rationale for this. Firstly this is what Illustra/Informix have implemented. Secondly, I believe it is more logical from an OO perspective as well as giving a more useful default. If a politician IS a person and I say SELECT * from person, then logically I should see all the politicians because they are people too (so they claim :). Thirdly, there are a whole range of SQL statements that should probably be disallowed without including sub-classes. e.g. an ALTER TABLE ADD COLUMN that does not include sub-classes is almost certainly undesirable. It seems ashame to have to resort to non-standard SQL with the "*" syntax in this case when it is really your only choice. Basicly, wanting ONLY a classname is a far more unusual choice, and leaving off the "*" is a common error. Fourthly, it seems out of character for the SQL language to have this single character operator. The SQL style is to use wordy descriptions of the operators meaning. "ONLY" fits well here because it describes its own meaning perfectly whereas to the unitiated, "*" is harder to guess at. While this change is an incompatibility I hope for those few people using inheritance they can accept the need to move forward without over-burden of backwards compatibility. 3) The ability to return different types of rows from a SELECT. This is to allow implementation of ODBMS functionality where a query could be required to instantiate objects of differing types with differing attributes. I would propose that that anytime you do a SELECT * from a base table that you would get back the full rows from those sub tables. Since the current PQ interface which doesn't support this notion would remain unchanged this wouldn't affect current users. It's probably also desirable to have a syntax for getting just the columns of the base table when this is desired. Say perhaps SELECT % from table. This would be a performance hack for users of libpq and a functionality difference for users of psql. The reason I think the "*" syntax should take on the new functionality is because it would be more consistent with what the OQL (object query language) standard specifies, and also because it seems the more useful default. Also there is no compatibility reason not to do it. In addition it would be legal to specify columns that only exist in sub-classes. For example, if we had CREATE TABLE person (name TEXT); CREATE TABLE student (studentid TEXT, faculty TEXT) INHERITS (person); it would be legal to say... > SELECT * FROM person; NAME ---- Fred Bill NAME | STUDENTID | FACULTY -------------------------- Jim | 23455 | Science Chris| 45666 | Arts > SELECT *, studentid FROM person; NAME ---- Fred Bill NAME | STUDENTID ---------------- Jim | 23455 Chris| 45666 > SELECT *, studentid FROM ONLY person; ERROR: person does not contain studentid. > SELECT % FROM person; NAME ---- Fred Bill Jim Chris As you can see, it is desirable that psql be modified to be able to print these differing tuple types. Presumably new column headings will be printed when a tuple is differing to the previous one. Likely it will be often desirable to do a SELECT * FROM person p ORDER BY p.classname; in order to have all the tuples of a particular type grouped together. In addition some extenions will be done to the PQ interface to support these differing return types. The current PQ interface will be left unchanged and backwards compatible for retrieving rows of a single type. Also there should be an settable option that specifies that "*" should also return the normally ignored columns of oid and classname. This is so that OO programs that embed SQL into them also get back the oid and classname which are required for the behind the scenes implementation of an ODMG client. Something like... SET SHOW_OID TRUE; SHOW_CLASSNAME TRUE; SELECT * FROM person; OID CLASSNAME NAME ------------------- 2344 person Fred 3445 person Bill OID CLASSNAME NAME | STUDENTID | FACULTY ----------------------------------------- 2355 student Jim | 23455 | Science 5655 student Chris| 45666 | Arts
> THESE CHANGES DON'T AFFECT YOU IF YOU DON'T USE INHERITANCE. > > Speak now about these changes or please, forever hold your peace. Of > course you can comment later if I screw up implementation. > > The proposed changes are.... > > 1) An imaginary field in every tuple that tells you the class it came > from. > This is useful when you select from table* and want to know which > relation the object actually came from. It wouldn't be stored on disk, > and like oid it wouldn't be displayed when you do SELECT *. The field > would be called classname. So you could have... > SELECT p.classname, p.name FROM person p; > person | Fred > student | Bill > employee | Jim > person | Chris So the field is created on the fly to show what table it came from. Seems like a good idea, though implementing another usually-invisible column will be tough. However, because it is not really a column like the oid is a column, it should be ok. Of course, internally it is relid. > 2) Changing the sense of the default for getting inherited tuples. > Currently you only get inherited tuples if you specify "tablename*". > This would be changed so that you get all sub-class tuples too by > default unless you specify "ONLY tablename". There are several > rationale for this. Firstly this is what Illustra/Informix have > implemented. Secondly, I believe it is more logical from an OO > perspective as well as giving a more useful default. If a politician > IS a person and I say SELECT * from person, then logically I should > see all the politicians because they are people too (so they claim > :). Thirdly, there are a whole range of SQL statements that should > probably be disallowed without including sub-classes. e.g. an ALTER > TABLE ADD COLUMN that does not include sub-classes is almost certainly > undesirable. It seems ashame to have to resort to non-standard SQL > with the "*" syntax in this case when it is really your only > choice. Basicly, wanting ONLY a classname is a far more unusual > choice, and leaving off the "*" is a common error. Fourthly, it seems > out of character for the SQL language to have this single character > operator. The SQL style is to use wordy descriptions of the operators > meaning. "ONLY" fits well here because it describes its own meaning > perfectly whereas to the unitiated, "*" is harder to guess at. While > this change is an incompatibility I hope for those few people using > inheritance they can accept the need to move forward without > over-burden of backwards compatibility. Sounds fine to me. Just realize you are taking on a long-overdue but big job here. > > 3) The ability to return different types of rows from a SELECT. This > is to allow implementation of ODBMS functionality where a query could > be required to instantiate objects of differing types with differing > attributes. This bothers me. We return relational data, showing the same number of columns and types for every query. I don't think we want to change that, even for OO. How are you going to return that info the the client side? -- Bruce Momjian | http://www.op.net/~candle pgman@candle.pha.pa.us | (610) 853-3000 + If your life is a hard drive, | 830 Blythe Avenue + Christ can be your backup. | Drexel Hill, Pennsylvania 19026
Bruce Momjian wrote: > So the field is created on the fly to show what table it came from. > Seems like a good idea, though implementing another usually-invisible > column will be tough. What problems do you forsee? > However, because it is not really a column like > the oid is a column, it should be ok. Of course, internally it is > relid. > > > 2) Changing the sense of the default for getting inherited tuples. > > Currently you only get inherited tuples if you specify "tablename*". > > Sounds fine to me. Just realize you are taking on a long-overdue but > big job here. I already have a patch for this one. The change is a few pretty simple changes to gram.y. > > 3) The ability to return different types of rows from a SELECT. This > > is to allow implementation of ODBMS functionality where a query could > > be required to instantiate objects of differing types with differing > > attributes. > > This bothers me. We return relational data, showing the same number of > columns and types for every query. I don't think we want to change > that, even for OO. What aspects bother you? This is the fundamental important thing about object databases. It's also something that I'm always wanting to do when generating web pages. I have web links like http://foo.com/page?id=123. I want to retrieve the webpage object (which is an inheritance hierarchy) of id=123 which may represent a web page of different types. Then process appropriately for different objects. i.e. typical OO polymorphism. > How are you going to return that info the the client side? Well the backend <-> frontend protocol that used to be able to return tuples of different types would be put back in. Also the berkerly postgres docs had other scenarios where different tuples could be returned. One is you could have a field of type postquel called say EMP.hobbies which had a value of "retrieve HOBBIES.all where...", and then "retrieve EMP.hobbies would return tuples of different types of hobbies.
> Bruce Momjian wrote: > > > So the field is created on the fly to show what table it came from. > > Seems like a good idea, though implementing another usually-invisible > > column will be tough. > > What problems do you forsee? Well, it is usually pretty strange to carry around a column that doesn't exist through all the code and finally contruct it at the end. I would suspect something in the rewrite system could do that pretty easily, though. That is the direction I would go with that. > > > However, because it is not really a column like > > the oid is a column, it should be ok. Of course, internally it is > > relid. > > > > > 2) Changing the sense of the default for getting inherited tuples. > > > Currently you only get inherited tuples if you specify "tablename*". > > > > Sounds fine to me. Just realize you are taking on a long-overdue but > > big job here. > > I already have a patch for this one. The change is a few pretty simple > changes > to gram.y. OK, you will have to canvas the general list to make sure this does not break things for people, though our inheritance system needs an overhaul badly. > > > > 3) The ability to return different types of rows from a SELECT. This > > > is to allow implementation of ODBMS functionality where a query could > > > be required to instantiate objects of differing types with differing > > > attributes. > > > > This bothers me. We return relational data, showing the same number of > > columns and types for every query. I don't think we want to change > > that, even for OO. > > What aspects bother you? This is the fundamental important thing about > object databases. I fear it is totally against the way our API works. How does someone see how many columns in the returned row? > > How are you going to return that info the the client side? > > Well the backend <-> frontend protocol that used to be able to return > tuples of different types would be put back in. > > Also the berkerly postgres docs had other scenarios where different > tuples > could be returned. One is you could have a field of type postquel called > say > EMP.hobbies which had a value of "retrieve HOBBIES.all where...", and > then "retrieve > EMP.hobbies would return tuples of different types of hobbies. Yikes. Strange. Can we just return nulls for the empty fields? How many new API calls are required? -- Bruce Momjian | http://www.op.net/~candle pgman@candle.pha.pa.us | (610) 853-3000 + If your life is a hard drive, | 830 Blythe Avenue + Christ can be your backup. | Drexel Hill, Pennsylvania 19026
Bruce Momjian wrote: > > I already have a patch for this one. The change is a few pretty simple > > changes > > to gram.y. > > OK, you will have to canvas the general list to make sure this does not > break things for people, though our inheritance system needs an overhaul > badly. This is already CCed to the general list. > I fear it is totally against the way our API works. How does someone > see how many columns in the returned row? A new API PQnfieldsv(PQresult, tupnum) or some such. > Yikes. Strange. Strange for C code perhaps. Very useful for constructing real objects in OO application code framework. > Can we just return nulls for the empty fields? Well, I think we should probably distinguish between a field that is null, and a field that simply doesn't exist. > How many new API calls are required? Perhaps just the one. (above).
I can live with this. Thanks. > Bruce Momjian wrote: > > > > I already have a patch for this one. The change is a few pretty simple > > > changes > > > to gram.y. > > > > OK, you will have to canvas the general list to make sure this does not > > break things for people, though our inheritance system needs an overhaul > > badly. > > This is already CCed to the general list. > > > I fear it is totally against the way our API works. How does someone > > see how many columns in the returned row? > > A new API PQnfieldsv(PQresult, tupnum) or some such. > > > Yikes. Strange. > > Strange for C code perhaps. Very useful for constructing real objects in > OO application code framework. > > > Can we just return nulls for the empty fields? > > Well, I think we should probably distinguish between a field that is > null, > and a field that simply doesn't exist. > > > How many new API calls are required? > > Perhaps just the one. (above). > -- Bruce Momjian | http://www.op.net/~candle pgman@candle.pha.pa.us | (610) 853-3000 + If your life is a hard drive, | 830 Blythe Avenue + Christ can be your backup. | Drexel Hill, Pennsylvania 19026
On Wed, Feb 02, 2000 at 09:57:48PM -0500, Bruce Momjian allegedly wrote: > > > > 3) The ability to return different types of rows from a SELECT. This > > > > is to allow implementation of ODBMS functionality where a query could > > > > be required to instantiate objects of differing types with differing > > > > attributes. > > > > > > This bothers me. We return relational data, showing the same number of > > > columns and types for every query. I don't think we want to change > > > that, even for OO. > > > > What aspects bother you? This is the fundamental important thing about > > object databases. > > I fear it is totally against the way our API works. How does someone > see how many columns in the returned row? This would probably break applications written in PHP and Perl (and possibly others) that have their queryresults returned to them in a numerically indexed array (index by offset). If this behaviour could be turned off, than it shouldn't be a problem. Mathijs
Mathijs Brands wrote: > > On Wed, Feb 02, 2000 at 09:57:48PM -0500, Bruce Momjian allegedly wrote: > > > > > 3) The ability to return different types of rows from a SELECT. This > > > > > is to allow implementation of ODBMS functionality where a query could > > > > > be required to instantiate objects of differing types with differing > > > > > attributes. > > > > > > > > This bothers me. We return relational data, showing the same number of > > > > columns and types for every query. I don't think we want to change > > > > that, even for OO. > > > > > > What aspects bother you? This is the fundamental important thing about > > > object databases. > > > > I fear it is totally against the way our API works. How does someone > > see how many columns in the returned row? > > This would probably break applications written in PHP and Perl (and > possibly others) that have their queryresults returned to them in a > numerically indexed array (index by offset). If this behaviour could > be turned off, than it shouldn't be a problem. It wouldn't affect them because the current APIs would continue to return the same base-level columns. You would only get access to the extra columns with a new API.
Bruce Momjian wrote: > > > Bruce Momjian wrote: > > > > > So the field is created on the fly to show what table it came from. > > > Seems like a good idea, though implementing another usually-invisible > > > column will be tough. > > > > What problems do you forsee? > > Well, it is usually pretty strange to carry around a column that doesn't > exist through all the code and finally contruct it at the end. I would > suspect something in the rewrite system could do that pretty easily, > though. That is the direction I would go with that. > Oracle has a ROWNR (IIRC) pseudo-column that is added in th every end of query and is a convienient way to put numbers on report rows (among other things). ------------ Hannu
Bruce Momjian wrote: > > > Yikes. Strange. Can we just return nulls for the empty fields? I think more natural way would be to define a new type (NAF - NotAFiled), like we have NAN for floats (do we ?, at least IEEE has) ----------------- Hannu
Hello Chris, Once, Thursday, February 03, 2000, 6:30:26 AM, you wrote: CB> 1) An imaginary field in every tuple that tells you the class it came CB> from. CB> This is useful when you select from table* and want to know which CB> relation the object actually came from. It wouldn't be stored on disk, CB> and like oid it wouldn't be displayed when you do SELECT *. The field CB> would be called classname. So you could have... CB> SELECT p.classname, p.name FROM person p; CB> person | Fred CB> student | Bill CB> employee | Jim CB> person | Chris I am voting for this by both hands. Now we forced to use an additional column classname in every table and rule to fill this column. CB> 2) Changing the sense of the default for getting inherited tuples. CB> Currently you only get inherited tuples if you specify "tablename*". CB> This would be changed so that you get all sub-class tuples too by CB> default unless you specify "ONLY tablename". There are several CB> rationale for this. Firstly this is what Illustra/Informix have CB> implemented. Secondly, I believe it is more logical from an OO CB> perspective as well as giving a more useful default. If a politician CB> IS a person and I say SELECT * from person, then logically I should CB> see all the politicians because they are people too (so they claim CB> :). Thirdly, there are a whole range of SQL statements that should CB> probably be disallowed without including sub-classes. e.g. an ALTER CB> TABLE ADD COLUMN that does not include sub-classes is almost certainly CB> undesirable. It seems ashame to have to resort to non-standard SQL CB> with the "*" syntax in this case when it is really your only CB> choice. Basicly, wanting ONLY a classname is a far more unusual CB> choice, and leaving off the "*" is a common error. Fourthly, it seems CB> out of character for the SQL language to have this single character CB> operator. The SQL style is to use wordy descriptions of the operators CB> meaning. "ONLY" fits well here because it describes its own meaning CB> perfectly whereas to the unitiated, "*" is harder to guess at. While CB> this change is an incompatibility I hope for those few people using CB> inheritance they can accept the need to move forward without CB> over-burden of backwards compatibility. Sounds very logically. -- Best regards, Yury ICQ 11831432 mailto:yura@vpcit.ru
While I think that these kinds of changes are a No Go because they'd break a lot of applications (including mine), IF (big if) you really want to make major changes to the inheritance scheme, I got a few ideas. First let me say that I like the conceptual simplicity of relational databases. Some or all of the ideas thrown around here break with simplicity and consistency, by suggesting, e.g., that some commands be allowed only on entire inheritance structures, while others be allowed on individual tables, and attached to it a discussion which ones those should be. That doesn't strike me as too promising. A lot of people use inheritance to create "consistent schemas", that is, they empty create base tables, such as "address" which are inherited by tables such as customer, vendor, office, etc. That is probably not what inheritance is for, perhaps it should be some sort of a macro-like concept, such as create table vendor (name text, copy_schema_from(address), more fields), expanded by the parser. This is pretty much what it does now, only this scheme wouldn't have to actually store the (useless) inheritance link. Anyway, an idea I had would be to reimplement inheritance based on joins, since this is what the "pure relational" solution would be anyway. When I create a table B that is based on A, all the system does is create the table B as usual and store a note "I inherit from A". Any row you insert into B also creates a row in A, and the row in B contains an oid pointer to it. Thus a select on B performs a join on A.oid and B.row_in_A_pointer. A select on A just returns all the rows in A, no extras needed. A delete on B deletes the row in B and in A. A delete in A would cascade to B. Both of this can be gotten for free with foreign keys. Adding a column to A just adds the column to A, all other tables get the new column magically and in the right order. Same with dropping columns, etc. In short, this approach solves all inheritance problems at once and does so without adding any extra kludges besides the "I inherited from" field, which is static, plus the necessary transformations necessary in the parser. The drawback is of course that a select from an inherited table would always incur a join, perhaps some optimizing could be done in this direction. But the bottom line is that the compatibility issue looms big. -Peter On Thu, 3 Feb 2000, Chris Bitmead wrote: > Hi, > > I've been spending a lot of time lately with gdb and tracing the > back-end seeing if I can understand it enough to make some changes. > I'm starting to actually understand a lot of stuff, so in order > to have some possibility of having my changes accepted, I want to > discuss > them here first. Based on that, I'm going to hopefully make an attempt > at implementation. I have a patch for one of these changes already > if I get the go ahead. > > THESE CHANGES DON'T AFFECT YOU IF YOU DON'T USE INHERITANCE. > > Speak now about these changes or please, forever hold your peace. Of > course you can comment later if I screw up implementation. > > The proposed changes are.... > > 1) An imaginary field in every tuple that tells you the class it came > from. > This is useful when you select from table* and want to know which > relation the object actually came from. It wouldn't be stored on disk, > and like oid it wouldn't be displayed when you do SELECT *. The field > would be called classname. So you could have... > SELECT p.classname, p.name FROM person p; > person | Fred > student | Bill > employee | Jim > person | Chris > > If you want to know the exact behaviour it is as if every table in the > database had done to it... > ALTER TABLE foo ADD COLUMN classname TEXT; > UPDATE foo SET classname='foo'; > > Of course this is not how it would be implemented. It is just > reference for how it will appear to work. BTW, this idea was also > in the original berkeley design notes. > > 2) Changing the sense of the default for getting inherited tuples. > Currently you only get inherited tuples if you specify "tablename*". > This would be changed so that you get all sub-class tuples too by > default unless you specify "ONLY tablename". There are several > rationale for this. Firstly this is what Illustra/Informix have > implemented. Secondly, I believe it is more logical from an OO > perspective as well as giving a more useful default. If a politician > IS a person and I say SELECT * from person, then logically I should > see all the politicians because they are people too (so they claim > :). Thirdly, there are a whole range of SQL statements that should > probably be disallowed without including sub-classes. e.g. an ALTER > TABLE ADD COLUMN that does not include sub-classes is almost certainly > undesirable. It seems ashame to have to resort to non-standard SQL > with the "*" syntax in this case when it is really your only > choice. Basicly, wanting ONLY a classname is a far more unusual > choice, and leaving off the "*" is a common error. Fourthly, it seems > out of character for the SQL language to have this single character > operator. The SQL style is to use wordy descriptions of the operators > meaning. "ONLY" fits well here because it describes its own meaning > perfectly whereas to the unitiated, "*" is harder to guess at. While > this change is an incompatibility I hope for those few people using > inheritance they can accept the need to move forward without > over-burden of backwards compatibility. > > 3) The ability to return different types of rows from a SELECT. This > is to allow implementation of ODBMS functionality where a query could > be required to instantiate objects of differing types with differing > attributes. > > I would propose that that anytime you do a SELECT * from a base table > that you would get back the full rows from those sub tables. Since the > current PQ interface which doesn't support this notion would remain > unchanged this wouldn't affect current users. > > It's probably also desirable to have a syntax for getting just the > columns of the base table when this is desired. Say perhaps SELECT % > from table. This would be a performance hack for users of libpq and a > functionality difference for users of psql. > > The reason I think the "*" syntax should take on the new functionality > is because it would be more consistent with what the OQL (object query > language) standard specifies, and also because it seems the more > useful default. Also there is no compatibility reason not to do it. > > In addition it would be legal to specify columns that only exist in > sub-classes. For example, if we had > > CREATE TABLE person (name TEXT); > CREATE TABLE student (studentid TEXT, faculty TEXT) INHERITS (person); > > it would be legal to say... > > SELECT * FROM person; > NAME > ---- > Fred > Bill > > NAME | STUDENTID | FACULTY > -------------------------- > Jim | 23455 | Science > Chris| 45666 | Arts > > > SELECT *, studentid FROM person; > NAME > ---- > Fred > Bill > > NAME | STUDENTID > ---------------- > Jim | 23455 > Chris| 45666 > > > SELECT *, studentid FROM ONLY person; > ERROR: person does not contain studentid. > > > SELECT % FROM person; > NAME > ---- > Fred > Bill > Jim > Chris > > As you can see, it is desirable that psql be modified to be able to > print these differing tuple types. Presumably new column headings will > be printed when a tuple is differing to the previous one. Likely it > will be often desirable to do a > SELECT * FROM person p ORDER BY p.classname; > in order to have all the tuples of a particular type grouped together. > > In addition some extenions will be done to the PQ interface to support > these differing return types. The current PQ interface will be left > unchanged and backwards compatible for retrieving rows of a single > type. > > Also there should be an settable option that specifies that "*" should > also return the normally ignored columns of oid and classname. This is > so that OO programs that embed SQL into them also get back the oid and > classname which are required for the behind the scenes implementation > of an ODMG client. Something like... > > SET SHOW_OID TRUE; > SHOW_CLASSNAME TRUE; > > SELECT * FROM person; > > OID CLASSNAME NAME > ------------------- > 2344 person Fred > 3445 person Bill > > OID CLASSNAME NAME | STUDENTID | FACULTY > ----------------------------------------- > 2355 student Jim | 23455 | Science > 5655 student Chris| 45666 | Arts > > ************ > > -- Peter Eisentraut Sernanders vaeg 10:115 peter_e@gmx.net 75262 Uppsala http://yi.org/peter-e/ Sweden
Peter Eisentraut wrote: > A lot of people use inheritance to create "consistent schemas", that is, > they empty create base tables, such as "address" which are inherited by > tables such as customer, vendor, office, etc. This is a really bad idea. You could never have both a postal address AND a home address for example. I thought the original postgres supported this by having CREATE TABLE ADDRESS (...) CREATE TABLE PERSON(add ADDRESS). Anyway, this is what Oracle and others can do these days, and this is the right thing. > Anyway, an idea I had would be to reimplement inheritance based on joins, > since this is what the "pure relational" solution would be anyway. When I > create a table B that is based on A, all the system does is create the > table B as usual and store a note "I inherit from A". Any row you insert > into B also creates a row in A, and the row in B contains an oid pointer > to it. This is a really stu^H^H^H bad idea. I have hierarchies 5 levels deep with multiple inheritance, and I don't want to do a 10 way join just to retrieve an object. This is why RDBMS's performance sucks so incredibly badly on some applications. an ODBMS can perform 100x as fast in these cases just because of what you are proposing. > Thus a select on B performs a join on A.oid and B.row_in_A_pointer. > A select on A just returns all the rows in A, no extras needed. A delete > on B deletes the row in B and in A. A delete in A would cascade to B. Both > of this can be gotten for free with foreign keys. Adding a column to A > just adds the column to A, all other tables get the new column magically > and in the right order. Same with dropping columns, etc. > > In short, this approach solves all inheritance problems at once and does > so without adding any extra kludges besides the "I inherited from" field, > which is static, plus the necessary transformations necessary in the > parser. The drawback is of course that a select from an inherited table > would always incur a join, perhaps some optimizing could be done in this > direction. But the bottom line is that the compatibility issue looms big. > > -Peter > > On Thu, 3 Feb 2000, Chris Bitmead wrote: > > > Hi, > > > > I've been spending a lot of time lately with gdb and tracing the > > back-end seeing if I can understand it enough to make some changes. > > I'm starting to actually understand a lot of stuff, so in order > > to have some possibility of having my changes accepted, I want to > > discuss > > them here first. Based on that, I'm going to hopefully make an attempt > > at implementation. I have a patch for one of these changes already > > if I get the go ahead. > > > > THESE CHANGES DON'T AFFECT YOU IF YOU DON'T USE INHERITANCE. > > > > Speak now about these changes or please, forever hold your peace. Of > > course you can comment later if I screw up implementation. > > > > The proposed changes are.... > > > > 1) An imaginary field in every tuple that tells you the class it came > > from. > > This is useful when you select from table* and want to know which > > relation the object actually came from. It wouldn't be stored on disk, > > and like oid it wouldn't be displayed when you do SELECT *. The field > > would be called classname. So you could have... > > SELECT p.classname, p.name FROM person p; > > person | Fred > > student | Bill > > employee | Jim > > person | Chris > > > > If you want to know the exact behaviour it is as if every table in the > > database had done to it... > > ALTER TABLE foo ADD COLUMN classname TEXT; > > UPDATE foo SET classname='foo'; > > > > Of course this is not how it would be implemented. It is just > > reference for how it will appear to work. BTW, this idea was also > > in the original berkeley design notes. > > > > 2) Changing the sense of the default for getting inherited tuples. > > Currently you only get inherited tuples if you specify "tablename*". > > This would be changed so that you get all sub-class tuples too by > > default unless you specify "ONLY tablename". There are several > > rationale for this. Firstly this is what Illustra/Informix have > > implemented. Secondly, I believe it is more logical from an OO > > perspective as well as giving a more useful default. If a politician > > IS a person and I say SELECT * from person, then logically I should > > see all the politicians because they are people too (so they claim > > :). Thirdly, there are a whole range of SQL statements that should > > probably be disallowed without including sub-classes. e.g. an ALTER > > TABLE ADD COLUMN that does not include sub-classes is almost certainly > > undesirable. It seems ashame to have to resort to non-standard SQL > > with the "*" syntax in this case when it is really your only > > choice. Basicly, wanting ONLY a classname is a far more unusual > > choice, and leaving off the "*" is a common error. Fourthly, it seems > > out of character for the SQL language to have this single character > > operator. The SQL style is to use wordy descriptions of the operators > > meaning. "ONLY" fits well here because it describes its own meaning > > perfectly whereas to the unitiated, "*" is harder to guess at. While > > this change is an incompatibility I hope for those few people using > > inheritance they can accept the need to move forward without > > over-burden of backwards compatibility. > > > > 3) The ability to return different types of rows from a SELECT. This > > is to allow implementation of ODBMS functionality where a query could > > be required to instantiate objects of differing types with differing > > attributes. > > > > I would propose that that anytime you do a SELECT * from a base table > > that you would get back the full rows from those sub tables. Since the > > current PQ interface which doesn't support this notion would remain > > unchanged this wouldn't affect current users. > > > > It's probably also desirable to have a syntax for getting just the > > columns of the base table when this is desired. Say perhaps SELECT % > > from table. This would be a performance hack for users of libpq and a > > functionality difference for users of psql. > > > > The reason I think the "*" syntax should take on the new functionality > > is because it would be more consistent with what the OQL (object query > > language) standard specifies, and also because it seems the more > > useful default. Also there is no compatibility reason not to do it. > > > > In addition it would be legal to specify columns that only exist in > > sub-classes. For example, if we had > > > > CREATE TABLE person (name TEXT); > > CREATE TABLE student (studentid TEXT, faculty TEXT) INHERITS (person); > > > > it would be legal to say... > > > SELECT * FROM person; > > NAME > > ---- > > Fred > > Bill > > > > NAME | STUDENTID | FACULTY > > -------------------------- > > Jim | 23455 | Science > > Chris| 45666 | Arts > > > > > SELECT *, studentid FROM person; > > NAME > > ---- > > Fred > > Bill > > > > NAME | STUDENTID > > ---------------- > > Jim | 23455 > > Chris| 45666 > > > > > SELECT *, studentid FROM ONLY person; > > ERROR: person does not contain studentid. > > > > > SELECT % FROM person; > > NAME > > ---- > > Fred > > Bill > > Jim > > Chris > > > > As you can see, it is desirable that psql be modified to be able to > > print these differing tuple types. Presumably new column headings will > > be printed when a tuple is differing to the previous one. Likely it > > will be often desirable to do a > > SELECT * FROM person p ORDER BY p.classname; > > in order to have all the tuples of a particular type grouped together. > > > > In addition some extenions will be done to the PQ interface to support > > these differing return types. The current PQ interface will be left > > unchanged and backwards compatible for retrieving rows of a single > > type. > > > > Also there should be an settable option that specifies that "*" should > > also return the normally ignored columns of oid and classname. This is > > so that OO programs that embed SQL into them also get back the oid and > > classname which are required for the behind the scenes implementation > > of an ODMG client. Something like... > > > > SET SHOW_OID TRUE; > > SHOW_CLASSNAME TRUE; > > > > SELECT * FROM person; > > > > OID CLASSNAME NAME > > ------------------- > > 2344 person Fred > > 3445 person Bill > > > > OID CLASSNAME NAME | STUDENTID | FACULTY > > ----------------------------------------- > > 2355 student Jim | 23455 | Science > > 5655 student Chris| 45666 | Arts > > > > ************ > > > > > > -- > Peter Eisentraut Sernanders vaeg 10:115 > peter_e@gmx.net 75262 Uppsala > http://yi.org/peter-e/ Sweden
> Peter Eisentraut wrote: > > > This is a really stu^H^H^H bad idea. I have hierarchies 5 levels deep > with > multiple inheritance, and I > don't want to do a 10 way join just to retrieve an object. > > This is why RDBMS's performance sucks so incredibly badly on some > applications. > an ODBMS can perform 100x as fast in these cases just because of what > you > are proposing. > Hmm, and yes one may find problems where the pure relational system is 100x faster than your ODBMS. After doing a project with VERSANT and VisualWorks (election projection system for the first television sender here in Germany) I like the idea of OODBMS, but I've also noticed, that they are not the solution to all problems. Clever database desing leeds to good performance on both systems, but one should consider, that the designs of the database layout will be different. There are cases, where a pure relational system is very fast and an ODBMS never get it, but there are the examples you mentioned. Joins per se are not that bad .. it depends on when and how they are used and how good the analyzer of the database is and how good he uses the indices to get the job done. One very good point is the query language of the rdbms systems. On the odbms side no standard is really available, which can be seen as the sql of the odbms. Marten
Marten Feldtmann wrote: > Hmm, and yes one may find problems where the pure > relational system is 100x faster than your ODBMS. > > After doing a project with VERSANT and VisualWorks > (election projection system for the first television > sender here in Germany) I like the idea of OODBMS, > but I've also noticed, that they are not the > solution to all problems. Give me a clear application spec and VERSANT, and I will ALWAYS flog Oracle into the dust. But... Where SQL comes into it's own is _conveniently_ doing queries that I never thought of when I first designed my app. Of course many ODBMSes have SQL or similar too. > Joins per se are not that bad .. it depends on when > and how they are used and how good the analyzer of > the database is and how good he uses the indices to > get the job done. Take the simple SUPPLIER, PART and SUPPLIER_PART situation. The very fact that you've got an extra table here means you've got to touch many more disk pages and transfer more data. An RDBMS just can't win when the ODBMS data model is designed right. > One very good point is the query language of the > rdbms systems. On the odbms side no standard is > really available, which can be seen as the sql of > the odbms. There is a standard called OQL which is very similar to SQL. It's just rather poorly supported. -- Chris Bitmead mailto:chris@bitmead.com
Chris wrote: > > > One very good point is the query language of the > > rdbms systems. On the odbms side no standard is > > really available, which can be seen as the sql of > > the odbms. > > There is a standard called OQL which is very similar to SQL. It's just > rather poorly supported. > I think the operative word here is "available". I know that SQL specs are'nt freely available either, but due to SQL being already widely supported one can get the general idea from many freely available sources, like the bunch of freely downloadable DB's currently available for linux. Most of them have some docs included. It is still quite a job to reconstruct SQL92 from them ;) I know now description (except a BNF syntax available from some ODBMS website) that I could use to get some idea about OQL. ---------------------- Hannu
Hannu Krosing wrote: > > > It is still quite a job to reconstruct SQL92 from them ;) > > I know now description (except a BNF syntax available from some ODBMS website) SHould be "I know no description ..." > that I could use to get some idea about OQL. > > ---------------------- > Hannu > > ************
1) free is freedom, not free bear ;-) also, there are some sites has oql draft -- like sql draft. sorry, can not remember. 2) good books, like " C++ object databases" (David Jordan) has a lot material. 3) a lot of OODBM evaluation copy there. Not totally complied, but together with 1 and 2, still can see what is going on. so, no excuse for not knowing oodbm/oql :-) -- I'm waiting for trying them on pg . On Sat, 5 Feb 2000, Hannu Krosing wrote: > Chris wrote: > > > > > One very good point is the query language of the > > > rdbms systems. On the odbms side no standard is > > > really available, which can be seen as the sql of > > > the odbms. > > > > There is a standard called OQL which is very similar to SQL. It's just > > rather poorly supported. > > > > I think the operative word here is "available". I know that SQL specs > are'nt freely available either, but due to SQL being already widely > supported one can get the general idea from many freely available sources, > like the bunch of freely downloadable DB's currently available for linux. > Most of them have some docs included. > > It is still quite a job to reconstruct SQL92 from them ;) > > I know now description (except a BNF syntax available from some ODBMS website) > that I could use to get some idea about OQL. > > ---------------------- > Hannu > > ************ >
> 2) good books, like " C++ object databases" (David Jordan) has > a lot material. As an example: Cattel, "The Object Database Standard ODMG 2.0" Morgan Kaufmann, ISBN 1 - 55860 - 463 -4 Marten Feldtmann