Thus spoke SQL3 (on OO) - Mailing list pgsql-hackers
From | Peter Eisentraut |
---|---|
Subject | Thus spoke SQL3 (on OO) |
Date | |
Msg-id | Pine.LNX.4.21.0005201507330.423-100000@localhost.localdomain Whole thread Raw |
Responses |
Re: Thus spoke SQL3 (on OO)
|
List | pgsql-hackers |
Okay, here's the ugly and dirty truth. Before you complain, please keep in mind that I didn't write the standard, although I tend to give it the benefit of the doubt after ignoring the truly silly stuff. :) * How to create inheritance hierarchies ** CREATE TABLE syntax Ignoring the parts that do no not pertain to inheritance, the CREATE TABLE syntax in SQL3 looks like this: CREATE TABLE <table name> { <table element list> | <subtable clause> } where <table element list> is the usual `(colname type, colname type, ...)' variety and <subtable clause> looks like this: UNDER <supertable>, <supertable>, ... or optionally like UNDER <supertable> WITH (<old colname> AS <new colname>, <old colname2> AS <new colname2>, ...), ... But notice how the syntax { <table element list> | <subtable clause> } would force you to either create new columns in the table or inherit all columns from one or more supertables. That evidently cannot be right. Reinforcing this believe is that the standard at several places talks about "inherited columns" vs "originally defined columns", which would of course not be possible under this scheme. Let's therefore assume that the syntax should really be more something like this: CREATE TABLE <table name> <table element list> <subtable clause> | CREATE TABLE <table name> <table element list> | CREATE TABLE <table name> <subtable clause> This is really not any different from the current INHERITS syntax; perhaps in a fit of purity someone is willing to complement it accordingly. One key element here that is new is the column renaming option. ** Column naming and ordering The ordering of the columns has apparently been a problem even for the authors of the standard. The rules for CREATE TABLE merely say that the columns are ordered according the order in which the supertables are listed in the the UNDER clause. It does not say anything about whether the originally defined columns come before or after the inherited ones. This does not make the issue of adding columns easier. The rules say: "If [the table being altered] is a supertable, then an <add column definition>, without further Access Rule checking, is effectively performed for each of its subtables, thereby adding the column as an inherited column in these subtables." This would make some sort of sense if the originally defined columns came before the inherited ones, but the way it stands it doesn't help a lot. The resolution of names is done as follows. First, a list of all inherited columns is created. If one column is "replicated", that is, more than one supertable inherited it from the same super-supertable, all but the first occurrence is dropped. Then the column renaming clauses are applied. The resulting list must not contain a duplicate column name. This scheme is quite different from the current PostgreSQL implementation, which merges columns of equal name and datatype. It has been mentioned before during OO discussions that the association of inherited columns by name alone should probably be dropped and replaced by pg_attribute.oid references. This would seem like a good thing to do because it would allow us to detect replicated columns reliably and give a chance to the column renaming option. ** OID, Identity, et al. "An object identifier OID is a value generated when an object is created, to give that object an immutable identity. It is unique in the known universe of objects that are instances of abstract data types, and is conceptually separate from the value, or state, of the instance." Since the way I understand it a table definition also defines an abstract data type in some way or other, and rows are instantiations of that data type, this definition of OID matches ours pretty good. "The OID value is materialized as a character string with an implementation-defined length and character set SQL_TEXT." ... or maybe not. :-) What exactly IDENTITY is is still a bit unclear to me but it is definitely not the proposed identification of the table a row came from. The implicit column IDENTITY contains a `row identifier'. "The value of a row identifier for a given base table row is equal to itself and it not equal to the value of a row identifier for any other base table row within the database." (Note: `base table' is the opposite of `derived table' (a view), and is unrelated to whether a table is a sub- or supertable.) There is no literal for row identifiers and they do not interact with other data types. The only manifestation is through the API, where they are mapped to unique row "handles" in a language specific fashion. Not every table has row identifiers, you must ask for them when creating the table. This all relates to inheritance because "A row identifier is implicitly defined for [the table to be created]. For every table ST named in the <subtable clause>, a row identifier is implicitly defined for ST." Row identifiers were ANSI-only at the time of the draft, ISO simply says that any supertable must have a primary key. I can't quite put my finger on either of these requirements, though. In any case I'd advise against overloading IDENTITY in the manner that was proposed. * Cloning One thing that often comes up in `various ways of looking at inheritance' threads is the idea of cloning the definition of a given table as part of a newly created table. There's a syntax for that as well in SQL3: CREATE TABLE name ( colname type, colname type, LIKE other_table, colname type, ... ); This effectively pastes whatever you wrote when you created the "other_table" at the place of the LIKE. After the create table is done, the new and the "old" table are completely unrelated. Of course if you want to clone the data as well in one shot you could use CREATE TABLE AS. In any case, this has really very little to do with the inheritance we're discussing here, other than that it `feels' the same. * Operating on data ** SELECT To make a long story short: subtables are automatically examined, unless ONLY is specified. To make the story longer, phrasing and viewing it like this is really quite incorrect. Instead: "Any row of a subtable must correspond to one and only one row of each direct supertable. Any row of a supertable corresponds to at most one row of a direct subtable." The key word I see here is `correspondence', namely that a given row is always a member of both the sub- and the supertable (probably having more columns in the subtable obviously) and doesn't belong to either of them more then to the other. In other words, the row is conceptually shared. Then what the ONLY table reference really does is select all rows of a supertable that do not have any corresponding row in any subtable. (This is the wording the standard chose.) Implementation-wise this would be the easier thing to do (which is probably why it's done this way now), but conceptually it is really the unnatural situation because it's similar to an `except' query. ** Insert, Update, Delete These commands have no special notion of inheritance. Since all rows are effectively shared between sub- and supertables you cannot update them in one of them "only" without some copy-on-write concept. Of course some rows in a supertable may have no corresponding rows in any subtable, but that's nothing the row knows about or cares about. In sophisticated inheritance hierarchies, rows and parts of rows may be shared in very involved ways, so I foresee some issues with Update Only. (This sounds stranger than it really is: There is no requirement that the `corresponding row' is physically stored at the supertable, it is only required that it effectively exists there as well, which is satisfied if SELECT retrieves it by default. In some sense the apparently `favoured' storage model here is that all the inherited attributes and their values are stored in the supertable heap and only the originally defined attributes of subtables are in the subtable heap. This method favours the SELECT semantics on supertables because it doesn't have to worry about subclasses at all. But queries on subtables effectively become joins.) -- Peter Eisentraut Sernanders väg 10:115 peter_e@gmx.net 75262 Uppsala http://yi.org/peter-e/ Sweden
pgsql-hackers by date: