Thread: Re: [HACKERS] On improving OO support in posgresql and relaxing oid bottleneck at the same time

-----Original Message-----
From: Thomas G. Lockhart <lockhart@alumni.caltech.edu>
To: Maurice Gittens <mgittens@gits.nl>
Cc: hackers@postgreSQL.org <hackers@postgreSQL.org>
Date: Sunday, 5 April 1998 23:56
Subject: Re: [HACKERS] On improving OO support in posgresql and relaxing oid
bottleneck at the same time


>> I'm currently under the impression that the following change in the
>> postgresql system would benefit the overall performance and quality
>> of the system.
>>
>> Tuples for a class and all its derived classes stored in one file.
>
>I hate to sound like a "small thinker" here, but I'd be concerned about
>some issues:
>
>1) true OO semantics are difficult/impossible to accomplish with SQL.
>This is one reason why Postgres is probably in the OR realm rather than
>true OO.

Ok, let's be more specific. We'll define OO semantics as
support for:
1. identity
We already have partial support; we just have to get the details right.
2. inheritance
We already have support. I'm suggesting an implementation which
is a better overall choice IMO, because it avoids the requirement
that oids be unique system-wide while also improving
the support for polymorphism.
3. polymorphism
Partially supported, but some necessary properties are not yet
inherited automatically. I believe overriding triggers
is likely to work automatically. There are however some
choices which we'll have to make.

As far as I see, these concepts can be implemented without
any changes to the current definition of the query language in
postgresql.

Encapsulation would seem to require new syntax. It also
does not seem to fit fully into the relational model, so
we leave it out.

>
>2) Supporting inheritance using one-file storage probably leads to
>larger overhead in _all_ file accesses, not just ones containing
>inherited tables. Tuples would now contain a variable number of fields,
>with variable definitions, with ... Ack! :)

Yes but this overhead is very small for tables without inheritance.
An extra statement like:

heap_getnext(ScanDesc scanDesc)
{
    ...
    while (!done)
    {
        tuple = readTuple(...);
        ...
        /* return only tuples of the scanned class or a class derived from it */
        if (IsInstanceOf(tuple->reloid, scanDesc->reloid))
        {
            return tuple;
        }
        ...
    }
}

The information on inheritance relations between classes can be precomputed
when a heap scandesc is created.

IMO this overhead is not significant when there is no inheritance.
When there is inheritance we simply use indices to speed things up,
if it's deemed necessary.
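
A minimal sketch of the precomputation described above, not the actual backend code. It assumes single inheritance, small integer class ids, and a hypothetical in-memory catalog; ClassCatalog, ScanSketch, scan_init and scan_accepts are illustrative names that appear nowhere in the source tree. The hierarchy is walked once when the scan is set up, so the per-tuple check is a single array lookup.

```c
#include <assert.h>
#include <stdbool.h>

typedef unsigned int Oid;

#define MAX_CLASSES 16   /* hypothetical bound; valid class ids are 1..MAX_CLASSES-1 */

/* Hypothetical catalog: parent_of[c] is the parent of class c, 0 for a root. */
typedef struct ClassCatalog {
    Oid parent_of[MAX_CLASSES];
} ClassCatalog;

/* Hypothetical scan descriptor carrying the precomputed membership table. */
typedef struct ScanSketch {
    Oid  reloid;                 /* class being scanned */
    bool accepts[MAX_CLASSES];   /* accepts[c]: c is reloid or derived from it */
} ScanSketch;

/* Walk up the parent chain; assumes the catalog contains no cycles. */
static bool descends_from(const ClassCatalog *cat, Oid cls, Oid ancestor)
{
    while (cls != 0 && cls < MAX_CLASSES) {
        if (cls == ancestor)
            return true;
        cls = cat->parent_of[cls];
    }
    return false;
}

/* Done once per scan: fill in the membership table. */
void scan_init(ScanSketch *scan, const ClassCatalog *cat, Oid reloid)
{
    assert(reloid != 0 && reloid < MAX_CLASSES);
    scan->reloid = reloid;
    scan->accepts[0] = false;
    for (Oid c = 1; c < MAX_CLASSES; c++)
        scan->accepts[c] = descends_from(cat, c, reloid);
}

/* Done once per tuple: a bounds check and one array lookup. */
bool scan_accepts(const ScanSketch *scan, Oid tuple_reloid)
{
    return tuple_reloid < MAX_CLASSES && scan->accepts[tuple_reloid];
}
```

Whether one lookup per tuple is cheap enough is exactly the point under dispute in this thread; the sketch only shows that the hierarchy walk itself can be kept out of the scan loop.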

>
>3) Indices are fundamentally present to speed up access, though we use
>them for other purposes too (such as enforcing uniqueness). Perhaps the
>topic of inheritance, uniqueness, and referential integrity (foreign
>keys, etc) should be solved (or at least discussed) independent of
>indices, though indices or index-like structures may be involved in the
>solution.

Let's consider the following mail to the questions list by Brett McCormick
<brett@work.chicken.org> (copied from the list archive):

> I've got a table that has a primary key with a default of
> nextval('seq').  I've got another table which inherits this one, but
> it fails to inherit the unique btree index.  It does inherit the
> default value.  So, I'm assuming that if I create a unique index for
> that field on the child table, it won't keep you from inserting values
> that exist in that field in the parent table (and since they both
> share the same sequence, that's what I want).
>
> So primary keys do not work in this situation.  Are there plans to
> enhance the inheritance?  I have no idea how it works, is it
> intelligent?  Seems more klunky than not, but I haven't really looked
> at the code.  Should I stop using inheritance altogether, considering
> its drawbacks (no idea what child class it is in when selecting from
> parent and all children, no shared indices/pkeys) when I don't select
> from them all at once?

This person identifies a number of problems with the current system.
- no idea what child class it is when selecting from parent and all children
- no shared indices/primary keys
- no inheritance of unique attribute etc.

I can also add similar points:
- triggers should also be inherited. This gives us polymorphism
  without introducing any new syntax.
- etc.
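
One way to picture trigger inheritance giving polymorphism is the sketch below. It is only an illustration under assumptions: single inheritance, one trigger slot per class, and hypothetical names throughout (ClassEntry, resolve_trigger, person_trigger, manager_trigger). A class that defines no trigger of its own inherits the nearest ancestor's; a class that defines one overrides it.

```c
#include <assert.h>
#include <stddef.h>

typedef unsigned int Oid;

#define MAX_CLASSES 16   /* hypothetical bound; valid class ids are 1..MAX_CLASSES-1 */

typedef const char *(*TriggerFn)(void);

/* Hypothetical per-class catalog entry. */
typedef struct ClassEntry {
    Oid       parent;     /* 0 means no parent */
    TriggerFn on_insert;  /* NULL means "not overridden, inherit" */
} ClassEntry;

/* Two example triggers used only for illustration. */
const char *person_trigger(void)  { return "person insert trigger"; }
const char *manager_trigger(void) { return "manager insert trigger"; }

/* Find the trigger that applies to cls: its own, else the nearest ancestor's. */
TriggerFn resolve_trigger(const ClassEntry *classes, Oid cls)
{
    assert(classes != NULL);
    while (cls != 0 && cls < MAX_CLASSES) {
        if (classes[cls].on_insert != NULL)
            return classes[cls].on_insert;
        cls = classes[cls].parent;
    }
    return NULL;  /* no trigger anywhere in the chain */
}
```

With class 2 derived from class 1 and class 3 derived from class 2, a trigger on class 1 is what resolve_trigger returns for class 2, while a trigger defined on class 3 takes precedence there.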

I agree that conceptually indices are present only for speed. But the
reality is that by inheriting them we give users that which they
expect. (There are more emails like this one to be found on
the questions lists).

I think that what Brett wants to do is legitimate.
Storing the tuples of the same class hierarchy in different files is IMO
an unfortunate design choice of the original implementors
of postgresql.

The suggestion I'm making solves all of Brett's problems.
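
To make the shared-index idea concrete, here is a small sketch in which one unique index covers a whole class hierarchy. It is an assumption-laden toy: a linear array stands in for a btree, keys are plain ints, and SharedUniqueIndex, insert_unique and class_of_key are hypothetical names. Because parent and child insert into the same structure, a key used by either is rejected for both, and each entry records which class its tuple belongs to.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

typedef unsigned int Oid;

#define MAX_KEYS 64   /* hypothetical capacity of the toy index */

/* Hypothetical unique index shared by a class and all its derived classes. */
typedef struct SharedUniqueIndex {
    int    keys[MAX_KEYS];
    Oid    owner[MAX_KEYS];  /* class of the tuple holding each key */
    size_t count;
} SharedUniqueIndex;

/* Insert key for class cls; fails if any class in the hierarchy already has it. */
bool insert_unique(SharedUniqueIndex *idx, Oid cls, int key)
{
    assert(idx != NULL);
    for (size_t i = 0; i < idx->count; i++)
        if (idx->keys[i] == key)
            return false;          /* duplicate across the whole hierarchy */
    if (idx->count == MAX_KEYS)
        return false;              /* toy index is full */
    idx->keys[idx->count] = key;
    idx->owner[idx->count] = cls;
    idx->count++;
    return true;
}

/* Report which class holds key, or 0 if the key is absent. */
Oid class_of_key(const SharedUniqueIndex *idx, int key)
{
    for (size_t i = 0; i < idx->count; i++)
        if (idx->keys[i] == key)
            return idx->owner[i];
    return 0;
}
```

Inserting key 100 for the parent and then again for a child fails on the second attempt, which is the behaviour Brett expected from an inherited primary key.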
>
>4) imho, the roughest areas of existing (or missing) capability in
>Postgres involve array types and types which require additional support
>information (such as exact numerics). Focusing on fixing/improving these
>areas may lead to cleaning up semantics, mechanisms, and capabilities in
>the backend, and make other (more derived?) features such as constraint
>inheritance and enforcement easier to implement. Well, it will help
>something anyway, even if not constraints :)

I see that we have similar ideas about where the system should eventually
be. I do however believe that we'll get there by means of cleaning up
the semantics and then using these cleaned semantics to
make the system as a whole more conceptually pure.

In my experience systems which are conceptually pure can be
made to be very efficient.

I think that removing the oid bottleneck, while also solving
a number of fundamental problems (from an OO perspective)
with one and the same change, is a Good Thing (tm) :-).

Thanks, with regards from Maurice.



OO resources

From: Brett McCormick
Are there some good, human-readable documents that outline these and
other basic OO concepts?  I've done some OO programming, but I'm fuzzy
on a lot of issues.  Sorry to be so off-topic.

--brett

On Mon, 6 April 1998, at 12:49:23, Maurice Gittens wrote:

> ...

Re: [HACKERS] On improving OO support in posgresql and relaxing oid bottleneck at the same time

From: dg@illustra.com (David Gould)
Maurice Gittens>
> >> I'm currently under the impression that the following change in the
> >> postgresql system would benefit the overall performance and quality
> >> of the system.
> >>
> >> Tuples for a class and all its derived classes stored in one file.
> >
> >I hate to sound like a "small thinker" here, but I'd be concerned about
> >some issues:
> >
...
> >2) Supporting inheritance using one-file storage probably leads to
> >larger overhead in _all_ file accesses, not just ones containing
> >inherited tables. Tuples would now contain a variable number of fields,
> >with variable definitions, with ... Ack! :)
>
> Yes but this overhead is very small for tables without inheritance.
> An extra statement like:

Anything that gets done for every row is on _the_ critical path. Any extra
code here will have a performance penalty. We are already an order of
magnitude too slow on scans. Think in terms of a few hundred instructions
per row.

I will also say that table inheritance is rarely used in real applications.
Partly, no doubt, this is because the implementation is not wonderful, but
I also think that it may be one of those ideas, like time travel, that
sound great but that in practice no one can figure out a use for.

> heap_getnext(ScanDesc scanDesc)
> {
>     ...
>     while (!done)
>     {
>         tuple = readTuple(...);
>         ...
>         /* return only tuples of the scanned class or a class derived from it */
>         if (IsInstanceOf(tuple->reloid, scanDesc->reloid))
>         {
>             return tuple;
>         }
>         ...
>     }
> }
>
> The information on inheritance relations between classes can be precomputed
> when a heap scandesc is created.
>
> IMO this overhead is not significant when there is no inheritance.
> When there is inheritance we simply use indices to speed things up,
> if it's deemed necessary.

I disagree; all per-row overhead is significant. The primary operation in
the system is sifting rows.

But this is just the start of the extra overhead. What about the expression
evaluator trying to determine if this tuple matches the where clause? Now it
has to determine column offset and type and "Equal" function etc. for
each row.
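
The concern can be sketched as follows, under assumptions the thread does not settle: that tuples of different classes in one file may place the same column at different byte offsets, and that the evaluator therefore consults a per-class layout for every row. ClassLayout, TupleSketch, layout_clear and int_column_equals are hypothetical names, and only an int comparison is shown.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

typedef unsigned int Oid;

#define MAX_CLASSES 8              /* hypothetical bounds for the sketch */
#define MAX_ATTRS   4
#define NO_ATTR     ((size_t)-1)   /* marks a column the class does not have */

/* Hypothetical per-class layout: byte offset of each attribute. */
typedef struct ClassLayout {
    size_t offset[MAX_ATTRS];
} ClassLayout;

/* Hypothetical tuple: its class id followed by raw column data. */
typedef struct TupleSketch {
    Oid           reloid;
    unsigned char data[32];
} TupleSketch;

/* Mark every attribute of every class as absent. */
void layout_clear(ClassLayout *layouts)
{
    for (int c = 0; c < MAX_CLASSES; c++)
        for (int a = 0; a < MAX_ATTRS; a++)
            layouts[c].offset[a] = NO_ATTR;
}

/* Per-row work: find the column for this tuple's class, then compare. */
int int_column_equals(const ClassLayout *layouts, const TupleSketch *tuple,
                      int attno, int wanted)
{
    assert(tuple->reloid < MAX_CLASSES && attno >= 0 && attno < MAX_ATTRS);
    size_t off = layouts[tuple->reloid].offset[attno];  /* extra lookup per row */
    if (off == NO_ATTR)
        return 0;
    assert(off + sizeof(int) <= sizeof tuple->data);
    int value;
    memcpy(&value, tuple->data + off, sizeof value);
    return value == wanted;
}
```

In a file holding a single class the offset could be resolved once before the scan starts; here it is looked up again for every tuple, which is the extra per-row cost being objected to.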

-dg

David Gould            dg@illustra.com           510.628.3783 or 510.305.9468
Informix Software  (No, really)         300 Lakeside Drive  Oakland, CA 94612
 - Linux. Not because it is free. Because it is better.