Thus spoke SQL3 (on OO) - Mailing list pgsql-hackers

From Peter Eisentraut
Subject Thus spoke SQL3 (on OO)
Date
Msg-id Pine.LNX.4.21.0005201507330.423-100000@localhost.localdomain
Whole thread Raw
Responses Re: Thus spoke SQL3 (on OO)
List pgsql-hackers
Okay, here's the ugly and dirty truth. Before you complain, please keep in
mind that I didn't write the standard, although I tend to give it the
benefit of the doubt after ignoring the truly silly stuff. :)


* How to create inheritance hierarchies

** CREATE TABLE syntax

Ignoring the parts that do no not pertain to inheritance, the CREATE
TABLE syntax in SQL3 looks like this:

CREATE TABLE <table name> { <table element list> | <subtable clause> }

where <table element list> is the usual `(colname type, colname type,
...)' variety and <subtable clause> looks like this:

UNDER <supertable>, <supertable>, ...

or optionally like

UNDER <supertable> WITH (<old colname> AS <new colname>, <old
colname2> AS <new colname2>, ...), ...

But notice how the syntax { <table element list> | <subtable clause> }
would force you to either create new columns in the table or inherit
all columns from one or more supertables. That evidently cannot be
right. Reinforcing this believe is that the standard at several places
talks about "inherited columns" vs "originally defined columns", which
would of course not be possible under this scheme. Let's therefore
assume that the syntax should really be more something like this:

CREATE TABLE <table name> <table element list> <subtable clause>
| CREATE TABLE <table name> <table element list>
| CREATE TABLE <table name> <subtable clause>

This is really not any different from the current INHERITS syntax;
perhaps in a fit of purity someone is willing to complement it
accordingly. One key element here that is new is the column renaming
option.

** Column naming and ordering

The ordering of the columns has apparently been a problem even for the
authors of the standard. The rules for CREATE TABLE merely say that
the columns are ordered according the order in which the supertables
are listed in the the UNDER clause. It does not say anything about
whether the originally defined columns come before or after the
inherited ones.

This does not make the issue of adding columns easier. The rules say:

"If [the table being altered] is a supertable, then an <add column
definition>, without further Access Rule checking, is effectively
performed for each of its subtables, thereby adding the column as an
inherited column in these subtables."

This would make some sort of sense if the originally defined columns
came before the inherited ones, but the way it stands it doesn't help
a lot.

The resolution of names is done as follows. First, a list of all
inherited columns is created. If one column is "replicated", that is,
more than one supertable inherited it from the same super-supertable,
all but the first occurrence is dropped. Then the column renaming
clauses are applied. The resulting list must not contain a duplicate
column name.

This scheme is quite different from the current PostgreSQL
implementation, which merges columns of equal name and datatype. It
has been mentioned before during OO discussions that the association
of inherited columns by name alone should probably be dropped and
replaced by pg_attribute.oid references. This would seem like a good
thing to do because it would allow us to detect replicated columns
reliably and give a chance to the column renaming option.

** OID, Identity, et al.

"An object identifier OID is a value generated when an object is
created, to give that object an immutable identity. It is unique in
the known universe of objects that are instances of abstract data
types, and is conceptually separate from the value, or state, of the
instance."

Since the way I understand it a table definition also defines an
abstract data type in some way or other, and rows are instantiations
of that data type, this definition of OID matches ours pretty good.

"The OID value is materialized as a character string with an
implementation-defined length and character set SQL_TEXT."

... or maybe not. :-)

What exactly IDENTITY is is still a bit unclear to me but it is
definitely not the proposed identification of the table a row came
from. The implicit column IDENTITY contains a `row identifier'.

"The value of a row identifier for a given base table row is equal to
itself and it not equal to the value of a row identifier for any other
base table row within the database."

(Note: `base table' is the opposite of `derived table' (a view), and
is unrelated to whether a table is a sub- or supertable.)

There is no literal for row identifiers and they do not interact with
other data types. The only manifestation is through the API, where
they are mapped to unique row "handles" in a language specific
fashion.

Not every table has row identifiers, you must ask for them when
creating the table. This all relates to inheritance because

"A row identifier is implicitly defined for [the table to be created].
For every table ST named in the <subtable clause>, a row identifier is
implicitly defined for ST."

Row identifiers were ANSI-only at the time of the draft, ISO simply
says that any supertable must have a primary key. I can't quite put my
finger on either of these requirements, though.

In any case I'd advise against overloading IDENTITY in the manner that
was proposed.

* Cloning

One thing that often comes up in `various ways of looking at
inheritance' threads is the idea of cloning the definition of a given
table as part of a newly created table. There's a syntax for that as
well in SQL3:

CREATE TABLE name (   colname type,   colname type,   LIKE other_table,   colname type,   ...
);

This effectively pastes whatever you wrote when you created the
"other_table" at the place of the LIKE. After the create table is
done, the new and the "old" table are completely unrelated. Of course
if you want to clone the data as well in one shot you could use CREATE
TABLE AS. In any case, this has really very little to do with the
inheritance we're discussing here, other than that it `feels' the
same.

* Operating on data

** SELECT

To make a long story short: subtables are automatically examined,
unless ONLY is specified.

To make the story longer, phrasing and viewing it like this is really
quite incorrect. Instead:

"Any row of a subtable must correspond to one and only one row of each
direct supertable. Any row of a supertable corresponds to at most one
row of a direct subtable."

The key word I see here is `correspondence', namely that a given row
is always a member of both the sub- and the supertable (probably
having more columns in the subtable obviously) and doesn't belong to
either of them more then to the other. In other words, the row is
conceptually shared. Then what the ONLY table reference really does is
select all rows of a supertable that do not have any corresponding row
in any subtable. (This is the wording the standard chose.)
Implementation-wise this would be the easier thing to do (which is
probably why it's done this way now), but conceptually it is really
the unnatural situation because it's similar to an `except' query.

** Insert, Update, Delete

These commands have no special notion of inheritance. Since all rows
are effectively shared between sub- and supertables you cannot update
them in one of them "only" without some copy-on-write concept. Of
course some rows in a supertable may have no corresponding rows in any
subtable, but that's nothing the row knows about or cares about. In
sophisticated inheritance hierarchies, rows and parts of rows may be
shared in very involved ways, so I foresee some issues with Update
Only.

(This sounds stranger than it really is: There is no requirement that
the `corresponding row' is physically stored at the supertable, it is
only required that it effectively exists there as well, which is
satisfied if SELECT retrieves it by default. In some sense the
apparently `favoured' storage model here is that all the inherited
attributes and their values are stored in the supertable heap and only
the originally defined attributes of subtables are in the subtable
heap. This method favours the SELECT semantics on supertables because
it doesn't have to worry about subclasses at all. But queries on
subtables effectively become joins.)


-- 
Peter Eisentraut                  Sernanders väg 10:115
peter_e@gmx.net                   75262 Uppsala
http://yi.org/peter-e/            Sweden



pgsql-hackers by date:

Previous
From: Peter Eisentraut
Date:
Subject: Re: type conversion discussion
Next
From: Bruce Momjian
Date:
Subject: plpgsql chapter in docs