On improving OO support in posgresql and relaxing oid bottleneck at the same time - Mailing list pgsql-hackers

From Maurice Gittens
Subject On improving OO support in posgresql and relaxing oid bottleneck at the same time
Date
Msg-id 009e01bd6097$31a58060$fcf3b2c2@caleb..gits.nl
Whole thread Raw
List pgsql-hackers
Hi,

I'm currently under the impression that the following change in the
postgresql system would benefict the overall performance and quality
of the system.

    Tuples for a class and all it's derived classes are stored in one file.

Advantages:
- Since all tuples for a given class hierarchy are stored in the same
physical file,
   oids now need only be unique to a single inheritance hierarchy
   (instead of unique in  each  posgresql installation).

   So no longer is there any _necessity_ for a systemwide unique oid.
   (This necessity existed because all objects by definition (in OO
sematics)
   must posses the identity property ("this" in C++/Java sometimes also
called "self"),
   and because instances of the same hierachy were stored in different files
   it was necesary to provide the identity property in a "file independant"
way.

   The bottleneck formed by the systemwide unique oid is replaced by a
   bottleneck for each inheritance hierarchy within an installation.
   If one doesn't use inheritance then it translates to a per table
bottleneck.
   (Which is what we have now anyway isn't it?).

- Indices, triggers, contraints, etc. are automatically inherited.
  so that we can showcase classic OO semantics (including polymorphism).

- Makes easy implementation of referential integrity for oids possible.

- It becomes possible to store more than 4Giga tuples
  on 32 bit systems

- given an instance of a class identified by an oid it is easy to determine
  the most derived class it belongs to.
  (This feature has been requested by a number of poeple on the
    questions list.)

- It is the first step to support tables with no oids at all (not that this
  is particularly interesting to me though). I'd suggest that system
catalogues
  keep their oids though our we would be in for a major rewrite I think.


Disadvantages
- sequential heapscans for tables _with_ derived classes will be less
efficient
  in general,   because now some tuples may have to be skipped since they
may
  belong to the wrong class. This is easily solved using indices.

- slight space overhead for tuple when not using inheritance.
  The space is used to tag each tuple with the most derived class it
  belongs to.

To improve OO support the implementation plan is to:
1. Add a system attribute to each heap tuple which identifies the most
derived
    class the instance belongs to. (easy; I think)
2. Store instances of derived classes in the same physical file as the top
    most base class. I hope that hacking heapopen() to tell it in which file
    it should look for tuples of a particular relation will be enough.
    Maybe this might have implications for caching etc. which I don't
understand.
    (difficult?)
3. modify the heap_scanning functions to support the new sceem. (easy; I
think)

Now for my questions.
- Is implementing the above major surgery?
- Am I missing something important?
- What do you guys think of this?

With regards from Maurice.




pgsql-hackers by date:

Previous
From: "Vadim B. Mikheev"
Date:
Subject: Re: [HACKERS] Everything leaks; How it mm suppose to work?
Next
From: "Thomas G. Lockhart"
Date:
Subject: Re: [HACKERS] Open 6.3.1 issues