relational class vs partitioned table (was Inherited indexes) - Mailing list pgsql-hackers
From | Trent Shipley |
---|---|
Subject | relational class vs partitioned table (was Inherited indexes) |
Date | |
Msg-id | 200510101958.50435.tshipley@deru.com Whole thread Raw |
In response to | Re: Inherited indexes. (Simon Riggs <simon@2ndquadrant.com>) |
Responses |
Re: relational class vs partitioned table (was
|
List | pgsql-hackers |
On Tuesday 2005-10-04 13:54, Simon Riggs wrote: > On Tue, 2005-10-04 at 18:16 +0200, Zeugswetter Andreas DAZ SD wrote: > > > Another possibility is optimizing for the special case of > > > indexing on a partitioning key. In this case, index values > > > would be very localized to one table, so just storing the > > > table info on each index page (or something similar) would work well. > > > > If you have the partitioning key in the index and the partitions don't > > overlap, it is better to create separate [unique] indexes on the > > subtables. > > Building separate indexes per partition is usually preferred because of: > > 1. performance of dropping a partition > > 2. smaller index for CE > > ... <snip> merge node strategy. </snip> > > Usually you will only want the "one big unique index" when the > > partitioning is not > > reflectable in the index keys, and then (also in other db's) such an > > index is usually a pain ... > > Agreed^2. The idea of a global index is a non-starter for all of the > reasons that Tom gave and the main one: Its's unusably huge. There's no > point in partitioning a 1TB table if you then have to build a 500GB > index on it. The tree would be so deep that each insert would require > maybe 3 I/Os on index branch blocks before you got to the leaf. Insert > performance would suck real bad, which is a blocker since if you have a > large table you almost certainly have a lot of data to load. If you > don't have a big table you shouldn't be partitioning it anyway. It has taken me a while to organize my thoughts on this post to the thread, but I am struck by the fact that what started out as a discussion of relational inheritance and support for multi-relation uniqueness by indexes morphed into a discussion of partitioning table storage and how that might be supported by indexes. It is possible that the topical change was simply due to the usual meandering of threads but I fear that instead it may not have been random but caused by conflating the inheritance problem with partitioning. The two problems have seeming similarities inasmuch as both involve multiple internal tables. Unfortunately, they are rather distinct. Partitioning is a database engineering tool to enhance the performance of HUGE databases, most notably biogenetic databases. Partitioning is a classic divide and conquer strategy -- the goal is to chop something unmanageably large into things that are manageably small. Just as critical partitioning is in no way a relational problem. It has no effect on data representation and it should be irrelevant to a database developer or a report writer. Partitioning is strictly a storage and implementation problem. In general, one partitions by a hash rule (usually a simple modulus operation on a serial number or OID) or by a range rule -- typically on a date-time or possibly postal codes. Since the partition rule is "published" the database engine can "hash" to the proper index or table. Note that just as one can have multi-dimensional arrays, partitioning presents an analogous data storage problem and there is no logical reason that a relation could not be partitioned by any number of ranges or hashes. In practice, this serves no useful performance advantage, but simply divides the logical table (relation) into many small physical tables. In Oracle indexes of attributes that are used as partitioning criteria are simply partitioned in parallel to the (logical) table they reference. This seems to be the sort of solution that the thread was heading toward. --- In contrast relational inheritance is a design tool that would allow polymorphic access to modestly large relations. Note that an instance of relational inheritance explicitly or implicitly establishes a relational class. Furtheremore, a relational class is a type of multi-relation. Relational inheritance provides partial semantic unification over a taxonomic space. The semantic element in relational inheritance is critical. With a relation partitioned on _sequence_result_, given any _sequence_result_ one knows in what physical table(s) to look up the associated tuple (because by definition partitioned relations have partition rules). With a multi-relation partially unified on _sequence_result_, however, there is no way one can know what member table has _sequence_result_'s tuple. In the case of a relational class (or any other partially unified multi-relation) you have to use a layer of indirection. Class-wide uniqueness (partial unification of multi-relations) can be solved by using a global index for the relational class. It cannot be solved using partitioning. In many ways the problems are quite distinct. ===== Of course, there is no reason a relation in a relational class might not be huge. By way of inclusion, a relational class containing one or more huge relations would also be huge. It seems to me that rather that partitioning member relations on unifying attribute(s), implementation would be easier if one were required to partition entire semi-unified multi-relations as a whole. Orthoganal partion rules would be created for the class. The rules would be applied to each member relation. Finally, the rules would be applied to the relevant unifying (presumably unique) indexes. But inasmuch as Postgresql has implemented neither partitioning nor unique constraints for relational classes we are getting somewhat ahead of ourselves. ==== Partitioning is obviously dominated by partitioning rules. Oracle's SQL dialect provides a negative example of how to elegantly incorporate partitioning rules into SQL. Ideally partitioning rules should be first-class objects. A database engineer or the poor DBA who inherits his implementation should be able to query the meta-data to get a listing of all partitioned relations.
pgsql-hackers by date: