Table Partitioning, Part 1 - Mailing list pgsql-hackers
From | Simon Riggs |
---|---|
Subject | Table Partitioning, Part 1 |
Date | |
Msg-id | 1115677858.3830.131.camel@localhost.localdomain Whole thread Raw |
Responses |
Re: Table Partitioning, Part 1
Re: Table Partitioning, Part 1 |
List | pgsql-hackers |
A more in-depth consideration of the major design options and trade-offs available to us... this is an internals related discussion. Comments? 1. Embellish inheritance or separate infrastructure? Inheritance is a somewhat strange PostgreSQL feature, with a few gotchas. The "big one" is the lack of a multi-relation index that can be used against all of the inherited relations also. The full complexity of multiple inheritance, possible multi-level inheritance doesn't seem to solve any problems for us. The main argument in favour of improving inheritance as a route to partitioning is that many people have already developed applications using it to provide some of the desirable aspects of partitioning. The amount of code used to provide some of these features is actually quite low, and the code itself is already specialised. It seems prudent to avoid building on that foundation, even though we may decide to use some similar approaches. 2. Individual Relations explicitly in the plan or MultiRelation plan nodes? (i.e. is a SeqScan of a Partitioned Table one Node or many nodes?) Currently, Inheritance produces a plan for a SeqScan that looks like this... Append SeqScan SeqScan SeqScan ISTM fairly straightforward to produce a similar "static" plan along the same lines, using Result nodes to implement Partition Elimination. Append Result SeqScan Result SeqScan Result SeqScan Some may think that looks a little clunky, especially since the plan length would increase as the number of Partitions increased. An alternative, would be to have a single Scan node that performs a Multi- Partition Scan, which looks like this... PartitionSeqScan There'd always one node, no matter how many partitions. Again, easy to implement, since we just add a new type of Scan access method, or alter the existing ones iff we access partitioned tables. We'd be able to use physical tlists, and all the complexity is hidden. Both solve the basic case of Direct restriction of Partitioning Key (PPUC1) The multi-scan node stops us from doing a mixed Scan plan: Append Result IndexScan Result SeqScan Result SeqScan Why? The node acts at execution time, though the planner is the usual place to decide between scan types. But why do we need that? Why not just make all partitions the same size, give them all the same Statistics etc.. Well, there are Use Cases that strongly suggest that partition size could vary considerably (PPUC1) and also that there is some incentive to have tables that have indexes on the most recent partitions yet none on the older partitions (PPUC5.2) My current thinking is to implement explicitly partition plans. But what about join plans? There are Use Cases that depend almost entirely upon joins for Partition Elimination rather than direct restriction of the PPK. (PPUC2) How could a join plan be performed with the explicit plan shown above? The only option would seem to be a Sort/Merge, which could be a disaster if we only wanted a few rows returned from a very large table. If all partitions in the query had identical indexes on them, then we have another option. In that case, each index could be thought to form part of a larger index ordered initially on the Partitioning Key (PPK). If the first column was actually the PPK, then the set of indexes would be exactly equivalent to a Global Index. We can call this a Pseudo Global Index. The Pseudo Global Index could also be used to enforce uniqueness. If all of the composite indexes were defined unique and the index contained the PPK as one of its columns, this would work. The index enforces uniqueness within each partition and the PPK enforces uniqueness across partitions because the same PPK value cannot be in two partitions. We can also use the Pseudo Global Index as the object of a multi-index scan node, since the PGI would be ordered on the PPK. The plan node would use the partition boundary definitions to locate the starting partition, then seek within the partition to the correct part of the index then scan sideways, skipping to the next partition as required. This sounds a great idea, but it is exactly the technique I argued against only 4 paragraphs ago for use with SeqScans. My current thinking is that we need a mix of the two approaches, though I raise all of this here to gain everybody's input... 3. Partitions are relations? Or sub-relations? If we ever want to implement explicit plans, as discussed above, ISTM that we should consider partitions to be full relations. If partitions become sub-relations then we complicate many other parts of the machinery in the planner. This would also complicate many other things, since each rel currently has a single relfilenode for almost every aspect. We might think about doing this as a way of making "Global Indexes". If its all one relation, we already have them! But that would only work for indexes defined with the PPK as the leading column because otherwise we would have no way of adding or dropping new partitions without ripping the guts out of each index (and failing PPUC4.2 as a result). It turns out that we can have roughly the same kind of Pseudo Global Index whether we consider partitions to be relations or sub-relations. It seems much simpler to have partitions as relations... 4. Should all partitions have the same indexes? It also seems that we have somewhat opposing Use Cases: PPUC2 requires joins to achieve Partition Elimination, PPUC 5.2 requires us to have no indexes on older partitions (since they reside on tape). Some people will say that having all indexes defined the same eases maintenance. Others will say it gives us flexibility. I would argue in favour of flexibility now and automation later. 5. Constraints or specific Partitioning syntax? Partition Elimination relies upon being able to prove at execution time that two clauses in a query restriction are always false when taken together. Taken generically, extending the concept of Partitioning to the max, we might imagine that we could take any number of constraints, then AND them together with an arbitrarily complex restriction clause, solve it and then eliminate partitions. That seems too much work to solve that generically and provably. I aim for an easier subset, with a more bounded solution algorithm. If we know which constraints are to be used for Partition Elimination, and we make some restrictions on the types of restricion clauses that can take part in Partition Elimination, then the task gets easier. So, I think this requires explicit syntax, even though I am very attracted to minimal syntax implementations. Anyway, that was a bit of a ramble, so thanks for reading this far. Best regards, Simon Riggs
pgsql-hackers by date: