Thread: On improving OO support in posgresql and relaxing oid bottleneck at the same time

On improving OO support in posgresql and relaxing oid bottleneck at the same time

From
"Maurice Gittens"
Date:
Hi,

I'm currently under the impression that the following change in the
postgresql system would benefict the overall performance and quality
of the system.

    Tuples for a class and all it's derived classes are stored in one file.

Advantages:
- Since all tuples for a given class hierarchy are stored in the same
physical file,
   oids now need only be unique to a single inheritance hierarchy
   (instead of unique in  each  posgresql installation).

   So no longer is there any _necessity_ for a systemwide unique oid.
   (This necessity existed because all objects by definition (in OO
sematics)
   must posses the identity property ("this" in C++/Java sometimes also
called "self"),
   and because instances of the same hierachy were stored in different files
   it was necesary to provide the identity property in a "file independant"
way.

   The bottleneck formed by the systemwide unique oid is replaced by a
   bottleneck for each inheritance hierarchy within an installation.
   If one doesn't use inheritance then it translates to a per table
bottleneck.
   (Which is what we have now anyway isn't it?).

- Indices, triggers, contraints, etc. are automatically inherited.
  so that we can showcase classic OO semantics (including polymorphism).

- Makes easy implementation of referential integrity for oids possible.

- It becomes possible to store more than 4Giga tuples
  on 32 bit systems

- given an instance of a class identified by an oid it is easy to determine
  the most derived class it belongs to.
  (This feature has been requested by a number of poeple on the
    questions list.)

- It is the first step to support tables with no oids at all (not that this
  is particularly interesting to me though). I'd suggest that system
catalogues
  keep their oids though our we would be in for a major rewrite I think.


Disadvantages
- sequential heapscans for tables _with_ derived classes will be less
efficient
  in general,   because now some tuples may have to be skipped since they
may
  belong to the wrong class. This is easily solved using indices.

- slight space overhead for tuple when not using inheritance.
  The space is used to tag each tuple with the most derived class it
  belongs to.

To improve OO support the implementation plan is to:
1. Add a system attribute to each heap tuple which identifies the most
derived
    class the instance belongs to. (easy; I think)
2. Store instances of derived classes in the same physical file as the top
    most base class. I hope that hacking heapopen() to tell it in which file
    it should look for tuples of a particular relation will be enough.
    Maybe this might have implications for caching etc. which I don't
understand.
    (difficult?)
3. modify the heap_scanning functions to support the new sceem. (easy; I
think)

Now for my questions.
- Is implementing the above major surgery?
- Am I missing something important?
- What do you guys think of this?

With regards from Maurice.




> I'm currently under the impression that the following change in the
> postgresql system would benefict the overall performance and quality
> of the system.
>
> Tuples for a class and all it's derived classes stored in one file.

I hate to sound like a "small thinker" here, but I'd be concerned about
some issues:

1) true OO semantics are difficult/impossible to accomplish with SQL.
This is one reason why Postgres is probably in the OR realm rather than
true OO.

2) Supporting inheritance using one-file storage probably leads to
larger overhead in _all_ file accesses, not just ones containing
inherited tables. Tuples would now contain a variable number of fields,
with variable definitions, with ... Ack! :)

3) Indices are fundamentally present to speed up access, though we use
them for other purposes too (such as enforcing uniqueness). Perhaps the
topic of inheritance, uniqueness, and referential integrity (foreign
keys, etc) should be solved (or at least discussed) independent of
indices, though indices or index-like structures may be involved in the
solution.

4) imho, the roughest areas of existing (or missing) capability in
Postgres involve array types and types which require additional support
information (such as exact numerics). Focusing on fixing/improving these
areas may lead to cleaning up semantics, mechanisms, and capabilities in
the backend, and make other (more derived?) features such as constraint
inheritance and enforcement easier to implement. Well, it will help
something anyway, even if not constraints :)

                      - Tom