Thread: Inheritance vs. LIKE - need advice

Inheritance vs. LIKE - need advice

From

William Bug

Date:

05 August 2005, 03:19:23

Hi All,

Sorry to bring up the topic of PostgreSQL inheritance again, but
after going through the archives and Google results, I still don't
have a clear sense of whether my plan to implement a schema I'm
working on is the preferred way to go.

First, I'd like to find out if the way I'm thinking about Inheritance
vs. the SQL DDL CREATE TABLE modifier LIKE is correct.

The simplest analogy I can think of from OO techniques is PGSQL
Inheritance corresponds to Class Inheritance, while LIKE is more like
an inheritable Interface (Java) or Mixin (Ruby).  Inheritance
maintains strict hierarchical relationships propagating the "Class"
identity down through to all progeny. LIKE on the other hand simply
provides a means to re-use a set of fields in an unlimited number
tables without having to redefine those fields for each table you use
them in.

This view is incomplete and far from a perfect fit to the way PGSQL
Inheritance & LIKE work, but I think it's a helpful way of thinking
of these 2 related mechanisms, when trying to decide how and when to
use them in their current form.  As has been mentioned many times in
posts here, as well as in the PGSQL docs, PGSQL Inheritance is only
partial. Table fields are propagated as well as the group identity,
but no other RDBMS objects created on the parent ( INDEXES,
CONSTRAINTS & SEQUENCES primarily) are inherited.  As has been
endlessly stated in posts here and elsewhere, this is a significant
short-coming for the PGSQL Inheritance mechanism which those of us
desirous of using Inheritance would love to see fixed (I understand
it has been on the TODO list for many years, as this mechanism has
been in the PGSQL code base for over 15 years).

I don't agree this makes PGSQL Inheritance unusable.  There are
situations where I think it can still be useful, and I describe one
below.  I'd welcome feedback on that opinion, however, as I'd hate to
have my relative ignorance doom the data schema I'm about to fill
with a few million rows of data to serious problems later.

The following is an example of using both Inheritance and LIKE in the
context described above.

CREATE TABLE curation_info (
    created_by       TEXT    NOT NULL,
    create_date     TIMESTAMP WITH TIME ZONE,
    modified_by     TEXT    NOT NULL,
    mod_date          TIMESTAMP WITH TIME ZONE
);

CREATE TABLE book (
     id_pk                    SERIAL    PRIMARY KEY,
     title                        TEXT        NOT NULL,
     author_id_fk        INT            NOT NULL,
    publisher_id_fk    INT            NOT NULL,
     pub_year              DATE        NOT NULL,
     total_pages         INT            NOT NULL
     LIKE curation_info
);

CREATE TABLE novel (
     id_pk                    SERIAL    PRIMARY KEY,
     genre_id_fk        INT            NOT NULL
) INHERITS (book);

CREATE TABLE textbook (
     id_pk                    SERIAL    PRIMARY KEY,
     subject_id_fk    INT    NOT NULL
) INHERITS (book);


CREATE TABLE publisher (
     id_pk                    SERIAL    PRIMARY KEY,
     name                        TEXT        NOT NULL,
     address_id_fk        INT            NOT NULL,
     LIKE curation_info
);

CREATE TABLE author (
     id_pk                    SERIAL    PRIMARY KEY,
     last_name           TEXT        NOT NULL,
     first_name             TEXT        NOT NULL,
     middle_name         TEXT        NOT NULL,
     address_id_fk        INT            NOT NULL,
     LIKE curation_info
);

This is not the best way to model book info (for instance, books are
only allowed to have 1 author in this schema), but it will help me to
make my point.

Books, novels and textbooks will be considered equivalent in the
context of many queries.  At the same time, there will be other
queries where it will be important to consider novels & textbooks as
distinct entities.  The PGSQL Inheritance mechanism easily supports
both of these situations.

The curation fields listed in the 'curation_info' table are found
ubiquitously in tables throughout many data schema.  However, it is
not likely there would be a circumstance where you would want to
consider all tables containing these fields "curatable entities" to
be queried as a group.  That simply makes no sense.  In this case,
LIKE seems to be the best way to propagate these fields, since it
doesn't couple all tables containing them to the parent
'curation_info' table.

As I see it, there are at least 3 major problems with adopting such a
schema - despite the obvious efficiencies it offers (most of which
have been reported elsewhere):
     1) none of the parent table ('book') CONSTRAINTS or INDEXES are
propagated to the children.  This means if you want the children to
have the same CONSTRAINTS - as you probably will - you need to build
them yourself for each child table.
     2) the primary keys generated across the book, novel & textbook
tables are completely uncouple and will definitely collide.  In other
words, due to the fact that neither the SEQUENCE behind the
'book.id_pk' SERIAL field, not the PK CONSTRAINTS & INDEX that comes
with that field will automatically propagate to the child tables.
That is why the SQL DDL given above has an 'id_pk' SERIAL field in
all 3 tables.  There may be some conditions where you want those PKs
to be independent from one another, but those will be much less
frequent than the times when you will require they all derive from
the same SEQUENCE.
     3) The fields inherited from the 'curation_info' table via the
LIKE modifier are in no way linked back to the table from which they
originated, unlike a an Interface (in Java) or Mixin (in Ruby) would
be.  If the 'mod_date' field is remove from 'curation_info' it will
still remain in all the tables created using 'curation_info' prior to
making that change.  Same is true if a new field is added to
'curation_info'.  If you want that field to be represented in all
those tables that had previously been created using the LIKE
'curation_info' modifier, you will have to re-CREATE those tables
from scratch.

As I see it, '1' & '3' above are significant drawbacks with no
obvious work-around, but they are not deal breakers.  I would still
have reason to want to use both Inheritance and LIKE because of the
efficiencies they provide.

'2' above has a simple and obvious work around which I've been
surprised I've not been able to find posted anywhere (leading me to
believe I must be missing something).  PGSQL automatically builds a
SEQUENCE object on SERIAL fields.  The name of the SEQUENCE is simply
the concatenated name of the table + SERIAL field with '_seq'
appended to the end - e.g., for the 'book' table, it would be
'book_id_pk_seq'.  In order to guarantee the child tables use that
same sequence, you simply declare them as follows instead of using
the SERIAL type:

CREATE TABLE textbook (
     id_pk                    INT8     DEFAULT nextval
('book_id_pk_seq')         NOT NULL,
     subject_id_fk      INT        NOT NULL
) INHERITS (book);
ALTER TABLE textbook ADD CONSTRAINT textbook_pk_c PRIMARY KEY(id_pk);

Is it a pain to have to write - and maintain - this extra SQL DDL?
Yes.  Having said that, will it provide the desired behavior?  Most
definitely yes - again, unless I'm missing something.

So this is how I plan to use INHERIT & LIKE.

My main reason for posting this here, is to get a chance to draw on
the breadth of PostgreSQL experience out there to bring the major
pitfalls in taking this approach to my attention.  I think whatever
feedback folks have to offer will be very helpful to me and to others
in search of guidance on this issue.

I also thought it would be helpful to present this for archive
purposes, since, despite the fact much of what I say here is
mentioned elsewhere, I had to search far and wide to find it all and
consolidate my thinking on the topic, so it might save others some
time to see it all in one place.

It might also be worth adding some of this "advice" - if the
consensus is this view is reasonable given the current state of the
Inheritance mechanism in PostgreSQL - to one of the official
PostgreSQL docs - e.g., FAQ, etc..

Many thanks ahead of time for your feedback and patience in reading
this through to the end.

Cheers,
Bill Bug

Bill Bug
Senior Analyst/Ontological Engineer

Laboratory for Bioimaging  & Anatomical Informatics
www.neuroterrain.org
Department of Neurobiology & Anatomy
Drexel University College of Medicine
2900 Queen Lane
Philadelphia, PA    19129
215 991 8430 (ph)

Re: Inheritance vs. LIKE - need advice

From

Jeff Davis

Date:

06 August 2005, 18:15:33

William Bug wrote:
> Hi All,
>
> Sorry to bring up the topic of PostgreSQL inheritance again, but  after
> going through the archives and Google results, I still don't  have a
> clear sense of whether my plan to implement a schema I'm  working on is
> the preferred way to go.
>
> First, I'd like to find out if the way I'm thinking about Inheritance
> vs. the SQL DDL CREATE TABLE modifier LIKE is correct.
>
> The simplest analogy I can think of from OO techniques is PGSQL
> Inheritance corresponds to Class Inheritance, while LIKE is more like
> an inheritable Interface (Java) or Mixin (Ruby).  Inheritance  maintains
> strict hierarchical relationships propagating the "Class"  identity down
> through to all progeny. LIKE on the other hand simply  provides a means
> to re-use a set of fields in an unlimited number  tables without having
> to redefine those fields for each table you use  them in.

As far as the "database table = application object class" analogy goes,
I think you're correct.

> This view is incomplete and far from a perfect fit to the way PGSQL
> Inheritance & LIKE work, but I think it's a helpful way of thinking  of
> these 2 related mechanisms, when trying to decide how and when to  use
> them in their current form.  As has been mentioned many times in  posts
> here, as well as in the PGSQL docs, PGSQL Inheritance is only  partial.
> Table fields are propagated as well as the group identity,  but no other
> RDBMS objects created on the parent ( INDEXES,  CONSTRAINTS & SEQUENCES
> primarily) are inherited.  As has been  endlessly stated in posts here
> and elsewhere, this is a significant  short-coming for the PGSQL
> Inheritance mechanism which those of us  desirous of using Inheritance
> would love to see fixed (I understand  it has been on the TODO list for
> many years, as this mechanism has  been in the PGSQL code base for over
> 15 years).

For a variety of reasons, it seems Inheritance is basically on
life-support in PostgreSQL, only there for backwards-compatibility.
First, some of the problems like inheriting keys and constraints are the
exact same problems with table partitioning. When PG has a
mostly-complete table partitioning system, I would imagine those
features will appear in the inheritance facilities. Second, there is
limited demand, because inheritance isn't "needed" (more on that below).

> I don't agree this makes PGSQL Inheritance unusable.  There are
> situations where I think it can still be useful, and I describe one
> below.  I'd welcome feedback on that opinion, however, as I'd hate to
> have my relative ignorance doom the data schema I'm about to fill  with
> a few million rows of data to serious problems later.

It's not unusable, but because it's not all that well defined, and not
under active development and maintenance, you might be wary of using it.

> The following is an example of using both Inheritance and LIKE in the
> context described above.

<snip>

>
> This is not the best way to model book info (for instance, books are
> only allowed to have 1 author in this schema), but it will help me to
> make my point.
>
> Books, novels and textbooks will be considered equivalent in the
> context of many queries.  At the same time, there will be other  queries
> where it will be important to consider novels & textbooks as  distinct
> entities.  The PGSQL Inheritance mechanism easily supports  both of
> these situations.
>
> The curation fields listed in the 'curation_info' table are found
> ubiquitously in tables throughout many data schema.  However, it is  not
> likely there would be a circumstance where you would want to  consider
> all tables containing these fields "curatable entities" to  be queried
> as a group.  That simply makes no sense.  In this case,  LIKE seems to
> be the best way to propagate these fields, since it  doesn't couple all
> tables containing them to the parent  'curation_info' table.
>
> As I see it, there are at least 3 major problems with adopting such a
> schema - despite the obvious efficiencies it offers (most of which  have
> been reported elsewhere):
>     1) none of the parent table ('book') CONSTRAINTS or INDEXES are
> propagated to the children.  This means if you want the children to
> have the same CONSTRAINTS - as you probably will - you need to build
> them yourself for each child table.

That's because PostgreSQL hasn't really solved the table-partitioning
problem yet, as I mentioned above.

>     2) the primary keys generated across the book, novel & textbook
> tables are completely uncouple and will definitely collide.  In other
> words, due to the fact that neither the SEQUENCE behind the
> 'book.id_pk' SERIAL field, not the PK CONSTRAINTS & INDEX that comes
> with that field will automatically propagate to the child tables.   That
> is why the SQL DDL given above has an 'id_pk' SERIAL field in  all 3
> tables.  There may be some conditions where you want those PKs  to be
> independent from one another, but those will be much less  frequent than
> the times when you will require they all derive from  the same SEQUENCE.

You answered this one below.

>     3) The fields inherited from the 'curation_info' table via the  LIKE
> modifier are in no way linked back to the table from which they
> originated, unlike a an Interface (in Java) or Mixin (in Ruby) would
> be.  If the 'mod_date' field is remove from 'curation_info' it will
> still remain in all the tables created using 'curation_info' prior to
> making that change.  Same is true if a new field is added to
> 'curation_info'.  If you want that field to be represented in all  those
> tables that had previously been created using the LIKE  'curation_info'
> modifier, you will have to re-CREATE those tables  from scratch.

An interface can't be changed after compile-time, right? So that part of
the analogy is a moot point.

And also, PostgreSQL provides the ALTER TABLE ... ADD COLUMN ..., so you
don't have to recreate the tables from scratch. Of course if you have a
lot of tables that could be a problem.

> As I see it, '1' & '3' above are significant drawbacks with no  obvious
> work-around, but they are not deal breakers.  I would still  have reason
> to want to use both Inheritance and LIKE because of the  efficiencies
> they provide.
>

What efficiencies, specifically?

> '2' above has a simple and obvious work around which I've been
> surprised I've not been able to find posted anywhere (leading me to
> believe I must be missing something).  PGSQL automatically builds a
> SEQUENCE object on SERIAL fields.  The name of the SEQUENCE is simply
> the concatenated name of the table + SERIAL field with '_seq'  appended
> to the end - e.g., for the 'book' table, it would be  'book_id_pk_seq'.
> In order to guarantee the child tables use that  same sequence, you
> simply declare them as follows instead of using  the SERIAL type:
>
> CREATE TABLE textbook (
>     id_pk                    INT8     DEFAULT nextval
> ('book_id_pk_seq')         NOT NULL,
>     subject_id_fk      INT        NOT NULL
> ) INHERITS (book);
> ALTER TABLE textbook ADD CONSTRAINT textbook_pk_c PRIMARY KEY(id_pk);
>
> Is it a pain to have to write - and maintain - this extra SQL DDL?
> Yes.  Having said that, will it provide the desired behavior?  Most
> definitely yes - again, unless I'm missing something.
>
> So this is how I plan to use INHERIT & LIKE.
>

If there was no such thing as INHERITS, you could basically replace each
"INHERITS" with a foreign key.

To insert a book, just insert into "book". To insert a textbook, insert
into "book" and "textbook".

Now, anytime you want to select all books (textbook, novel, anything),
you select from "book", and you only get the set of attributes
associated with all books (title, pub_year, etc).

When you want to select all novels, you join "book" and "novel" and you
get all the novels and all of the attributes associated with a novel
(which include all attributes associated with a book).

If you want to exclude some inherited type, use WHERE NOT EXISTS.

This model seems to mostly agree with your inheritance model. To make it
more sytactically clean, you might want to use views and rules.

The book id is globally unique, solving problem 2. Problem 1 is solved.
And problem 3 is pretty much as you left it. You could solve problem 3
by using foreign keys for that as well, but problem 3 wasn't a big
problem to begin with, because LIKE is only shorthand for defining the
table to begin with.

The main new problem this might create is more joins, which could be a
performance problem. You can largely solve that problem by adjusting the
physical layout as you see fit, and providing views which look and act
like a sane design. Possible physical layouts might be:
(1) one giant table with enough information to distinguish a novel from
another book, for instance
(2) separate tables which have copied attributes

The great part about this is that the performance problem can be solved
after the application is already running, and only after the performance
problem has appeared. You can create views and rename tables in atomic
transactions, and the application would never miss a beat.

You might be careful of the assumption at the beginning, that "table
heading = object class", or the similar assumptions, that "tuple =
application object variable" and that "table contents = collection of
application object variables". If you go too far down that road, your
relational database ends up being nothing more than an object
persistance system. You might forego some of the great benefits of a
relational database.

One benefit of a relational system is that you radically change the way
a single application (or single application version) reads and writes
data to a common central database without changing the physical layout
of the data. And conversely, you can radically change the physical
layout of the data without affecting the way any of your applications
read or write data. PostgreSQL provides the best mechanisms to do this
around, because schema changes are atomic transactional changes (perhaps
some other databases do this, I don't know).

I advise you to think of application object variables as entities that
are represented in a group of relations*, that may be virtual relations
(views) or may be physical relations (tables).

The reason for this is because there may come a time when your
application object class hierarchy changes. For instance, you may
introduce an intermediate class between book and novel. If your layout
is too dependent on your class hierarchy, you will have trouble adapting
your data (keep in mind the new applications still want to access the
old data, and old versions of the application may still want to access
the new data).

If you adopt a relational model, and you use the application object
class hierarchy as merely a mental model or goal for the way you'd like
to read and write data, you'll be in good shape. You have a set of very
well-defined relational operators from which you can create virtual
relations to solve whatever needs your application has, and you don't
need to even consider the effects on other applications (because those
other applications can continue as they were before). Every relational
operator results in another valid relation (the set of relations is
closed wrt all relational operators).

I'll ignore LIKE because it's really just a syntactic shorthand. But
INHERITS could create severe problems if a new application does not
treat a novel as a direct descendent of a book. In fact, it will create
severe problems if all of your applications' class heirarchies aren't
identical to the database layout. When you use INHERITS, you're locked
in, or rather your data is locked in.

There are many similar pitfalls any time you take the analogy of
"database table = application object class" or similar analogies too far.

You'll notice I avoided using the bare word "object". It's a little
unclear, so I tried to specify what I was really talking about, e.g.
"object class" (a type/domain), "object variable" (a variable that holds
an "object value"), or "object value" (a value in the domain defined by
an "object class").

I hope this is helpful. My word is by no means definitive, so determine
what benefits inheritance can offer you, and determine if those benefits
outweigh any potential pitfalls that I mention.

Regards,
    Jeff Davis

*: I oversimplified in that description, but basically what I was
getting at was an E/R model of data. You can search online, but it's the
basic system of one-one, one-many, and many-many relationships that can
represent entities and their relationships to eachother.

PS: I'd be interested to hear some follow-up. If you think something
doesn't belong on -general, feel free to email me directly.

Re: Inheritance vs. LIKE - need advice

From

Jeff Davis

Date:

10 August 2005, 17:09:32

William Bug wrote:
> As you say, both LIKE & INHERIT are a bit anemic and unnecessary,  since
> what they provide can be implemented via VIEWs and RULEs/ TRIGGERs.  I'd

I'd like to point out that INHERITS is unique, although I'm not sure all
of the exact differences. The main difference that I see is that SELECT
reads from multiple tables, and to make it only read one you do SELECT
... ONLY.

LIKE is purely syntactic sugar. Use it whenever it saves time and/or
reduces confusion.

> VIEWs, at least).  Being able to create  MATERIALIZED VIEWs would
> probably help to remove any realtime  performance issues introduced by
> requiring additional JOINs to  harvest this view of the data.  I can do
> this myself via RULEs/ TRIGGERs or use the system being developed as an
> extension to Pg -  The matview Project
> (http://gborg.postgresql.org/project/matview/ projdisplay.php).

Agreed. Materialized views are a great way to get whatever performance
benefits you need from physical representation without imposing on the
logical layout.

For some reason a good document on the subject is on a server that's not
responding right now (hopefully temporarily!). Here's the google cache:

<http://66.102.7.104/search?q=cache:jaaXngt0hioJ:jonathangardner.net/PostgreSQL/materialized_views/matviews.html+materialized+view+postgresql&hl=en>

> I actually think the INHERITs & LIKE features of PostgreSQL - in  their
> simplicity - potentially offer a better means to providing  Object
> properties to SQL relations, than the more complex, fully  realized
> Object-Relational systems, which are inherently better  suited to the
> task of providing a seamless persistence mechanism for  OOP code
> objects.  I do hope its not true Pg's INHERIT & LIKE  features are
> vestigial and will continue to be developed into the  future.  in the

I think I overstated what I meant in my previous email... it should be
more like "it's in a stasis" rather than "on life support". When PG
solves some of the table partitioning issues in future releases, you can
bet that those features will help complete the inheritance model. After
a while, INHERITS will also be merely a syntax for capabilities that are
available otherwise.

> meantime, in light of what you say regarding the lack  of active
> development and maintenance they are receiving, I'll  probably lay off
> using them much for now.  As you say, heavy use of  INHERITs given the
> current support given to this feature is more a  liability than a
> convenience at this point.

Not too many people use INHERITS. I think it's fairly independent in the
code and probably not too many bugs appear, but if a bug creeps in, the
limited testing might not catch it. Maybe a brief scan of the -bugs list
might indicate whether inheritance is a source of problems, or benign
syntax that primarily depends on other database features which are
well-tested.

>
> I wasn't clear about this in that initial post, but my references to
> OOP techniques were just by way of analogy.  I wasn't really asking  the
> question with a thought toward creating a model to mirror my OOP  models
> - to be simply a persistence mechanism.  There are many  wonderful

It was mainly just a warning that sometimes application algorithms tend
to mix with the data model.

> So - the real point I was trying to make is:
>     1) INHERIT & OOP inheritance: each provide a relatively  efficient
> means to model real-world objects with inherent parent- child,
> hierarchical relations to one another;

True, many real-world entities have the famous "isa" relationship, and
that's valueable to take into account in the data model.

>     2) LIKE & Interfaces(Java)/Mixins(Ruby): each provide a means to
> share a set of pre-defined attributes/methods amongst a set of class
> that otherwise have no inherent hierarchical relation to one another.
>
> You are certainly correct - Interfaces (Java) are essentially a
> compile-time enforcement mechanism.  Changes you make to an Interface
> after it has been used to generate bytecodes (i.e., compile) for a  Java
> class will not have any effect on the class definition until you
> compile it again.  Interfaces provide a fairly simple - but effective  -
> means to an end - for Java to provide for "inheriting" aspects from
> more than one existing code entity without supporting multiple
> inheritance (which Pg actually does support).  Mixins in Ruby, on the
> other hand, don't really get added to a class until runtime (though
> they are used at compile time to resolve function & variable calls).
> If you add to a Mixin, you could actually use that new feature next
> time you use a class whose definition file included that Mixin.  This
> is largely due to the fact that Ruby is interpreted at runtime and is  a
> very loosely typed language.

Definitely a tangent, but I think most people would consider Ruby
strongly typed. Consider:

$ ruby -e 'puts 1+"1"'
-e:1:in `+': String can't be coerced into Fixnum (TypeError)
        from -e:1
$ perl -e 'print 1+"1","\n";'
2

However, ruby is, like python, late-binding. That means a variable can
take on a value of any type, but it gets the type with the value. I
break it down kind of like this: Ruby and Python are strongly typed and
late binding; perl and PHP are weakly typed and late binding; C is
weakly typed and early binding; Java is strongly typed and early binding
(that isn't entirely true... dynamic binding is sort of like late
binding). That's not official or anything, just my opinion of the
languages that I use.

>
> I don't think I was very clear about this, but I was asking these
> questions about Pg's schema reusability mechanisms solely from the
> vantage of how it might save me time in writing and maintaining the  DDL
> SQL for my core relational model.  I was also interested in how  the
> INHERITANCE mechanism might make certain SQL operations easier to
> implement in the case of where you have real-world objects you are
> modeling which possess an inherent hierarchical relation to one
> another.  Finally, I wanted to know whether my thinking regarding  when
> to use LIKE over INHERIT - and visa versa - was correct given  their
> current implementation in Pg.

I think you have the right idea about LIKE and INHERITS. I thought your
comment about how you would be unlikely to want to query "curatable
entities" was an insightful way of looking at it.

I think overall, the most important thing is flexibility. I always think
to myself: how much work will it take if I wanted to change, or add
applications accessing the same data set?

If your applications have "SELECT ... ONLY" in them, what effect will
that have on your ability to change the physical layout (I actually
don't know the answer to that. If you create a view on some relations,
whether they're parent relations or not, and you "SELECT ... ONLY" on
the view, is the ONLY ignored? Can views resemble a part of an
inheritance hierarchy?)?

I would recommend that if you use INHERITS, always have a view in
between so that your application aren't tied to the physical layout in
any way. Because a view is not treated like a physical table in
inheritance (you can't INHERIT a view), it's too closely tied to the
physical layout.

> When I say "LIKE" offers some efficiencies, I mean just what I say
> above.  For instance, with the example I gave previously, if I have a
> set of fields - curation_info (creation_curator TEXT, creation_date
> TIMESTAMP, mod_curator TEXT, mod_date TIMESTAMP) - I want represented
> in all 100 tables in my model, I save a great deal of SQL writing by
> creating a curation_info table by including it in these tables via
> LIKE.  The fact the LIKE included table is in no way linked to the
> tables that include it after execution of the initial CREATE TABLE
> statement makes this a bit less useful than it might otherwise be.   If
> a change is made to the fields in the underlying 'curation_info'  table,
> you are correct, I can always use ALTER TABLE to update the  100 tables
> that included the fields from 'curation_info' later.   Since I can't use
> LIKE in a ALTER TABLE statement, however, I'd have  to specify each
> field to be changed in the ALTER statement for each  of the 100 tables.
> This forces me to write a lot more SQL than I'd  like to write.  Of
> course, to ask this of the "LIKE" function is to  ask for more than you
> get from a Java Interface and maybe it would  add unwanted
> inefficiencies to the query planner/execution framework.

LIKE actually fills in the fields at table creation time exactly as if
you wrote the field names out yourself, so I don't follow the
planner/execution comment. But I see what you're saying about adding
attributes if you have a lot of relations involved. Originally I was
unsure whether you meant "performance efficiency", but it's apparent you
mean "efficiency of DBA's time".

> Thanks very much for your insight on how the recent implementation of
> TABLESPACEs might lay the groundwork for adding new efficiencies to  the
> INDEXing and inherited CONSTRAINT implementation for INHERITed  tables.
> This would be a big plus.

Actually, I was talking about table partitioning, which is different
than a tablespace. A table space is a named storage "area" that you can
assign whole tables to (already implemented in 8.0). If you partition a
table, you break a table into pieces and those pieces can go to a
different place in physical storage. Partitioning is closely related to
INHERITS, because if they can implement constraints or keys across parts
of a table in different locations, they can do it for INHERITS also.

Regards,
    Jeff Davis

Re: Inheritance vs. LIKE - need advice

From

William Bug

Date:

11 August 2005, 00:28:31

Once again, many many thanks Jeff for taking the time to think
through these issues and provide your well-informed comments & opinions!

On Aug 10, 2005, at 4:09 PM, Jeff Davis wrote:

> William Bug wrote:
>
>> As you say, both LIKE & INHERIT are a bit anemic and unnecessary,
>> since
>> what they provide can be implemented via VIEWs and RULEs/
>> TRIGGERs.  I'd
>>
>
> I'd like to point out that INHERITS is unique,

a point well taken.  I can remember how excited I was when Oracle
first started providing object extensions (which provides richer
capabilities than INHERITs currently does).  Of course, that was a
good decade after PostgreSQL/post-Ingres had added INHERITs (http://
en.wikipedia.org/wiki/PostgreSQL).

> although I'm not sure all
> of the exact differences. The main difference that I see is that
> SELECT
> reads from multiple tables, and to make it only read one you do SELECT
> ... ONLY.
>
> LIKE is purely syntactic sugar. Use it whenever it saves time and/or
> reduces confusion.
>
>
>> VIEWs, at least).  Being able to create  MATERIALIZED VIEWs would
>> probably help to remove any realtime  performance issues
>> introduced by
>> requiring additional JOINs to  harvest this view of the data.  I
>> can do
>> this myself via RULEs/ TRIGGERs or use the system being developed
>> as an
>> extension to Pg -  The matview Project
>> (http://gborg.postgresql.org/project/matview/ projdisplay.php).
>>
>
> Agreed. Materialized views are a great way to get whatever performance
> benefits you need from physical representation without imposing on the
> logical layout.
>
> For some reason a good document on the subject is on a server
> that's not
> responding right now (hopefully temporarily!). Here's the google
> cache:
> <http://66.102.7.104/search?
> q=cache:jaaXngt0hioJ:jonathangardner.net/PostgreSQL/
> materialized_views/matviews.html+materialized+view+postgresql&hl=en>

Many thanks for this link.  I had grown quite addicted to
Materialized Views in Oracle, especially when working on OLAP
applications.  Though I've been a very happy convert to PostgreSQL
(for about 2 years ago), I've really missed having MATERIALIZED VIEWs
integrated into the core DDL SQL of the RDBMS.

>
>
>> I actually think the INHERITs & LIKE features of PostgreSQL - in
>> their
>> simplicity - potentially offer a better means to providing  Object
>> properties to SQL relations, than the more complex, fully  realized
>> Object-Relational systems, which are inherently better  suited to the
>> task of providing a seamless persistence mechanism for  OOP code
>> objects.  I do hope its not true Pg's INHERIT & LIKE  features are
>> vestigial and will continue to be developed into the  future.  in the
>>
>
> I think I overstated what I meant in my previous email... it should be
> more like "it's in a stasis" rather than "on life support". When PG
> solves some of the table partitioning issues in future releases,
> you can
> bet that those features will help complete the inheritance model.
> After
> a while, INHERITS will also be merely a syntax for capabilities
> that are
> available otherwise.
>
>
>> meantime, in light of what you say regarding the lack  of active
>> development and maintenance they are receiving, I'll  probably lay
>> off
>> using them much for now.  As you say, heavy use of  INHERITs given
>> the
>> current support given to this feature is more a  liability than a
>> convenience at this point.
>>
>
> Not too many people use INHERITS. I think it's fairly independent
> in the
> code and probably not too many bugs appear, but if a bug creeps in,
> the
> limited testing might not catch it. Maybe a brief scan of the -bugs
> list
> might indicate whether inheritance is a source of problems, or benign
> syntax that primarily depends on other database features which are
> well-tested.

Both are excellent points.  I will certainly check the bug lists
before getting too committed to using INHERITs, though, as you say,
hopefully the implementation relies on other components in the system
getting heavy use (and testing).

>
>
>>
>> I wasn't clear about this in that initial post, but my references to
>> OOP techniques were just by way of analogy.  I wasn't really
>> asking  the
>> question with a thought toward creating a model to mirror my OOP
>> models
>> - to be simply a persistence mechanism.  There are many  wonderful
>>
>
> It was mainly just a warning that sometimes application algorithms
> tend
> to mix with the data model.

I'm a VERY STRONG believer in keeping application requirements out of
the logical data model - probably too much so sometimes.  In general,
it has rarely served me wrong, when I've needed to go in and write a
wholly separate application to the same underlying data.  I really
appreciate your placing INHERITs in this context.  It would have
completely slipped by me, though it should have been obvious.

>
>
>> So - the real point I was trying to make is:
>>     1) INHERIT & OOP inheritance: each provide a relatively
>> efficient
>> means to model real-world objects with inherent parent- child,
>> hierarchical relations to one another;
>>
>
> True, many real-world entities have the famous "isa" relationship, and
> that's valueable to take into account in the data model.
>
>
>>     2) LIKE & Interfaces(Java)/Mixins(Ruby): each provide a means to
>> share a set of pre-defined attributes/methods amongst a set of class
>> that otherwise have no inherent hierarchical relation to one another.
>>
>> You are certainly correct - Interfaces (Java) are essentially a
>> compile-time enforcement mechanism.  Changes you make to an Interface
>> after it has been used to generate bytecodes (i.e., compile) for
>> a  Java
>> class will not have any effect on the class definition until you
>> compile it again.  Interfaces provide a fairly simple - but
>> effective  -
>> means to an end - for Java to provide for "inheriting" aspects from
>> more than one existing code entity without supporting multiple
>> inheritance (which Pg actually does support).  Mixins in Ruby, on the
>> other hand, don't really get added to a class until runtime (though
>> they are used at compile time to resolve function & variable calls).
>> If you add to a Mixin, you could actually use that new feature next
>> time you use a class whose definition file included that Mixin.  This
>> is largely due to the fact that Ruby is interpreted at runtime and
>> is  a
>> very loosely typed language.
>>
>
> Definitely a tangent, but I think most people would consider Ruby
> strongly typed. Consider:
>
> $ ruby -e 'puts 1+"1"'
> -e:1:in `+': String can't be coerced into Fixnum (TypeError)
>         from -e:1
> $ perl -e 'print 1+"1","\n";'
> 2
>
> However, ruby is, like python, late-binding. That means a variable can
> take on a value of any type, but it gets the type with the value. I
> break it down kind of like this: Ruby and Python are strongly typed
> and
> late binding; perl and PHP are weakly typed and late binding; C is
> weakly typed and early binding; Java is strongly typed and early
> binding
> (that isn't entirely true... dynamic binding is sort of like late
> binding). That's not official or anything, just my opinion of the
> languages that I use.

I really like your breakdown here.  You are absolutely right.  I most
definitely meant "late binding", not "loosely typed".  As a very
heavy user of Ruby - having received many a compiler error just like
the one you site - I should have remembered that distinction.  In
fact, it's Ruby's class typing that has made it such a powerful tool
for the work I do.  It has also enabled me to take scripts I quickly
work up in Ruby and port them straight to Java (where the bulk of my
production code lives) with relative ease.

>
>
>>
>> I don't think I was very clear about this, but I was asking these
>> questions about Pg's schema reusability mechanisms solely from the
>> vantage of how it might save me time in writing and maintaining
>> the  DDL
>> SQL for my core relational model.  I was also interested in how  the
>> INHERITANCE mechanism might make certain SQL operations easier to
>> implement in the case of where you have real-world objects you are
>> modeling which possess an inherent hierarchical relation to one
>> another.  Finally, I wanted to know whether my thinking regarding
>> when
>> to use LIKE over INHERIT - and visa versa - was correct given  their
>> current implementation in Pg.
>>
>
> I think you have the right idea about LIKE and INHERITS. I thought
> your
> comment about how you would be unlikely to want to query "curatable
> entities" was an insightful way of looking at it.
>
> I think overall, the most important thing is flexibility. I always
> think
> to myself: how much work will it take if I wanted to change, or add
> applications accessing the same data set?
>
> If your applications have "SELECT ... ONLY" in them, what effect will
> that have on your ability to change the physical layout (I actually
> don't know the answer to that. If you create a view on some relations,
> whether they're parent relations or not, and you "SELECT ... ONLY" on
> the view, is the ONLY ignored? Can views resemble a part of an
> inheritance hierarchy?)?

All excellent, subtle questions that will greatly effect the
implementation flexibility INHERITs provides.

>
> I would recommend that if you use INHERITS, always have a view in
> between so that your application aren't tied to the physical layout in
> any way. Because a view is not treated like a physical table in
> inheritance (you can't INHERIT a view), it's too closely tied to the
> physical layout.

I'm not certain I understand what you mean here?  Are you
recommending all application layer interaction with tables using
INHERIT should be done via a VIEW intermediary?  If so, wouldn't the
VIEW (built from a "SELECT ... ONLY...") then be as dependent on the
fixed structure determined by the INHERITs relationship, as much as
the application code would be?

>
>
>> When I say "LIKE" offers some efficiencies, I mean just what I say
>> above.  For instance, with the example I gave previously, if I have a
>> set of fields - curation_info (creation_curator TEXT, creation_date
>> TIMESTAMP, mod_curator TEXT, mod_date TIMESTAMP) - I want represented
>> in all 100 tables in my model, I save a great deal of SQL writing by
>> creating a curation_info table by including it in these tables via
>> LIKE.  The fact the LIKE included table is in no way linked to the
>> tables that include it after execution of the initial CREATE TABLE
>> statement makes this a bit less useful than it might otherwise
>> be.   If
>> a change is made to the fields in the underlying 'curation_info'
>> table,
>> you are correct, I can always use ALTER TABLE to update the  100
>> tables
>> that included the fields from 'curation_info' later.   Since I
>> can't use
>> LIKE in a ALTER TABLE statement, however, I'd have  to specify each
>> field to be changed in the ALTER statement for each  of the 100
>> tables.
>> This forces me to write a lot more SQL than I'd  like to write.  Of
>> course, to ask this of the "LIKE" function is to  ask for more
>> than you
>> get from a Java Interface and maybe it would  add unwanted
>> inefficiencies to the query planner/execution framework.
>>
>
> LIKE actually fills in the fields at table creation time exactly as if
> you wrote the field names out yourself, so I don't follow the
> planner/execution comment. But I see what you're saying about adding
> attributes if you have a lot of relations involved. Originally I was
> unsure whether you meant "performance efficiency", but it's
> apparent you
> mean "efficiency of DBA's time".

Absolutely, yes.  Since I need to wear several hats - DBA, database
programmer, Java GUI App architect/implementer - anything I can do to
maximize my efficiency in any one role helps to ensure I have at
least a little of my weekend time to myself.  :-)

>
>
>> Thanks very much for your insight on how the recent implementation of
>> TABLESPACEs might lay the groundwork for adding new efficiencies
>> to  the
>> INDEXing and inherited CONSTRAINT implementation for INHERITed
>> tables.
>> This would be a big plus.
>>
>
> Actually, I was talking about table partitioning, which is different
> than a tablespace. A table space is a named storage "area" that you
> can
> assign whole tables to (already implemented in 8.0). If you
> partition a
> table, you break a table into pieces and those pieces can go to a
> different place in physical storage. Partitioning is closely
> related to
> INHERITS, because if they can implement constraints or keys across
> parts
> of a table in different locations, they can do it for INHERITS also.

Sorry - you are right, of course.  I'm used to the Oracle
implementation of TABLESPACEs, where these two separate issues are
somewhat convolved together.

>
> Regards,
>     Jeff Davis
>
> ---------------------------(end of
> broadcast)---------------------------
> TIP 9: In versions below 8.0, the planner will ignore your desire to
>        choose an index scan if your joining column's datatypes do not
>        match
>

Re: Inheritance vs. LIKE - need advice

From

Jeff Davis

Date:

15 August 2005, 16:54:18

William Bug wrote:
>
>
> I'm not certain I understand what you mean here?  Are you  recommending
> all application layer interaction with tables using  INHERIT should be
> done via a VIEW intermediary?  If so, wouldn't the  VIEW (built from a
> "SELECT ... ONLY...") then be as dependent on the  fixed structure
> determined by the INHERITs relationship, as much as  the application
> code would be?
>

Well, what I'm concerned about is this: you have an inheritance
hierarchy in PG, and some application has a "SELECT ... ONLY" in it. If
you want to change the inheritance hierarchy in PG around, you may not
be able to make it "look like" the old hierarchy to the application with
views.

If you use a view in between, maybe the view does the "SELECT ... ONLY".
That way, if you change the PG inheritance hierarchy, you can just
change the view without changing the application. The application would
never use "SELECT ... ONLY" so you would always have a way out if you
need it.

My basic philosophy here is that as long as you have a way out, it's not
 wrong.

Regards,
    Jeff Davis