Thread: On improving OO support in posgresql and relaxing oid bottleneck at the same time
On improving OO support in posgresql and relaxing oid bottleneck at the same time
From
"Maurice Gittens"
Date:
Hi, I'm currently under the impression that the following change in the postgresql system would benefict the overall performance and quality of the system. Tuples for a class and all it's derived classes are stored in one file. Advantages: - Since all tuples for a given class hierarchy are stored in the same physical file, oids now need only be unique to a single inheritance hierarchy (instead of unique in each posgresql installation). So no longer is there any _necessity_ for a systemwide unique oid. (This necessity existed because all objects by definition (in OO sematics) must posses the identity property ("this" in C++/Java sometimes also called "self"), and because instances of the same hierachy were stored in different files it was necesary to provide the identity property in a "file independant" way. The bottleneck formed by the systemwide unique oid is replaced by a bottleneck for each inheritance hierarchy within an installation. If one doesn't use inheritance then it translates to a per table bottleneck. (Which is what we have now anyway isn't it?). - Indices, triggers, contraints, etc. are automatically inherited. so that we can showcase classic OO semantics (including polymorphism). - Makes easy implementation of referential integrity for oids possible. - It becomes possible to store more than 4Giga tuples on 32 bit systems - given an instance of a class identified by an oid it is easy to determine the most derived class it belongs to. (This feature has been requested by a number of poeple on the questions list.) - It is the first step to support tables with no oids at all (not that this is particularly interesting to me though). I'd suggest that system catalogues keep their oids though our we would be in for a major rewrite I think. Disadvantages - sequential heapscans for tables _with_ derived classes will be less efficient in general, because now some tuples may have to be skipped since they may belong to the wrong class. This is easily solved using indices. - slight space overhead for tuple when not using inheritance. The space is used to tag each tuple with the most derived class it belongs to. To improve OO support the implementation plan is to: 1. Add a system attribute to each heap tuple which identifies the most derived class the instance belongs to. (easy; I think) 2. Store instances of derived classes in the same physical file as the top most base class. I hope that hacking heapopen() to tell it in which file it should look for tuples of a particular relation will be enough. Maybe this might have implications for caching etc. which I don't understand. (difficult?) 3. modify the heap_scanning functions to support the new sceem. (easy; I think) Now for my questions. - Is implementing the above major surgery? - Am I missing something important? - What do you guys think of this? With regards from Maurice.
Re: [HACKERS] On improving OO support in posgresql and relaxing oid bottleneck at the same time
From
"Thomas G. Lockhart"
Date:
> I'm currently under the impression that the following change in the > postgresql system would benefict the overall performance and quality > of the system. > > Tuples for a class and all it's derived classes stored in one file. I hate to sound like a "small thinker" here, but I'd be concerned about some issues: 1) true OO semantics are difficult/impossible to accomplish with SQL. This is one reason why Postgres is probably in the OR realm rather than true OO. 2) Supporting inheritance using one-file storage probably leads to larger overhead in _all_ file accesses, not just ones containing inherited tables. Tuples would now contain a variable number of fields, with variable definitions, with ... Ack! :) 3) Indices are fundamentally present to speed up access, though we use them for other purposes too (such as enforcing uniqueness). Perhaps the topic of inheritance, uniqueness, and referential integrity (foreign keys, etc) should be solved (or at least discussed) independent of indices, though indices or index-like structures may be involved in the solution. 4) imho, the roughest areas of existing (or missing) capability in Postgres involve array types and types which require additional support information (such as exact numerics). Focusing on fixing/improving these areas may lead to cleaning up semantics, mechanisms, and capabilities in the backend, and make other (more derived?) features such as constraint inheritance and enforcement easier to implement. Well, it will help something anyway, even if not constraints :) - Tom