Thread: Join optimization for inheritance tables
Hi all,

We are working with Aster for the summer and we would like to bounce around some ideas we have for possible PostgreSQL extensions. To describe our ideas we will use the following example:

create table msg( msg_id int, msg text );
create table receiver( msg_id int, user_id int, ts timestamp );

create table msg_100( check ( 1 <= msg_id and msg_id < 100 ) ) inherits (msg);
create table msg_200( check ( 100 <= msg_id and msg_id < 200 ) ) inherits (msg);
create table msg_300( check ( 200 <= msg_id and msg_id < 300 ) ) inherits (msg);
create table receiver_100( check ( 1 <= msg_id and msg_id < 100 ) ) inherits (receiver);
create table receiver_200( check ( 100 <= msg_id and msg_id < 200 ) ) inherits (receiver);
create table receiver_300( check ( 200 <= msg_id and msg_id < 300 ) ) inherits (receiver);

When we issue a query on one of the parent tables, like

SELECT * FROM msg WHERE msg_id BETWEEN 50 AND 70;

PostgreSQL is smart enough to filter out child tables whose check constraints are refuted by the filter conditions. In this example, the optimizer will pick a plan that only considers the parent table 'msg' and one of the child tables, 'msg_100':

Result
  ->  Append
        ->  Seq Scan on msg
              Filter: ((msg_id >= 50) AND (msg_id <= 70))
        ->  Seq Scan on msg_100 msg
              Filter: ((msg_id >= 50) AND (msg_id <= 70))

(Plan costs are removed for simplicity of presentation.)

Now, if we issue a join query between the two parent tables, like

SELECT * FROM msg m JOIN receiver r ON m.msg_id = r.msg_id;

the execution plan will be:

Merge Join
  Merge Cond: (m.msg_id = r.msg_id)
  ->  Sort
        Sort Key: m.msg_id
        ->  Append
              ->  Seq Scan on msg m
              ->  Seq Scan on msg_100 m
              ->  Seq Scan on msg_200 m
              ->  Seq Scan on msg_300 m
  ->  Sort
        Sort Key: r.msg_id
        ->  Append
              ->  Seq Scan on receiver r
              ->  Seq Scan on receiver_100 r
              ->  Seq Scan on receiver_200 r
              ->  Seq Scan on receiver_300 r

During the planning phase, the optimizer treats an entire hierarchy as a single entity.
Hence, it first considers the most efficient way to create the append paths for the two hierarchies, and then the best way to join them. However, there are some optimizations possible here, similar to the table filtering described above. In particular, instead of joining the two appends, we could "push down" the join to the child relations - that is, create pairwise joins between the children and then append the join results together.

Based on the check conditions of the children and the join predicate, it is possible to filter out joins that cannot produce any results. For example, joining 'msg_100' with 'receiver_300' is redundant, since the check constraints of these two tables do not overlap: tuples in 'msg_100' have 'msg_id' in [1, 100), whereas tuples in 'receiver_300' have 'msg_id' in [200, 300). Therefore, no tuples can be produced by this join. A plan with such optimizations could be:

Result
  ->  Append
        ->  Hash Join
              Hash Cond: (msg.msg_id = receiver.msg_id)
              ->  Seq Scan on msg msg
              ->  Hash
                    ->  Seq Scan on receiver receiver
        ->  Hash Join
              Hash Cond: (msg.msg_id = receiver.msg_id)
              ->  Seq Scan on msg msg
              ->  Hash
                    ->  Seq Scan on receiver_100 receiver
        ->  Hash Join
              Hash Cond: (msg.msg_id = receiver.msg_id)
              ->  Seq Scan on msg msg
              ->  Hash
                    ->  Seq Scan on receiver_200 receiver
        ->  Hash Join
              Hash Cond: (msg.msg_id = receiver.msg_id)
              ->  Seq Scan on msg msg
              ->  Hash
                    ->  Seq Scan on receiver_300 receiver
        ->  Hash Join
              Hash Cond: (msg.msg_id = receiver.msg_id)
              ->  Seq Scan on msg_100 msg
              ->  Hash
                    ->  Seq Scan on receiver receiver
        ->  Hash Join
              Hash Cond: (msg.msg_id = receiver.msg_id)
              ->  Seq Scan on msg_100 msg
              ->  Hash
                    ->  Seq Scan on receiver_100 receiver
        ->  Hash Join
              Hash Cond: (msg.msg_id = receiver.msg_id)
              ->  Seq Scan on msg_200 msg
              ->  Hash
                    ->  Seq Scan on receiver receiver
        ->  Hash Join
              Hash Cond: (msg.msg_id = receiver.msg_id)
              ->  Seq Scan on msg_200 msg
              ->  Hash
                    ->  Seq Scan on receiver_200 receiver
        ->  Hash Join
              Hash Cond: (msg.msg_id = receiver.msg_id)
              ->  Seq Scan on msg_300 msg
              ->  Hash
                    ->  Seq Scan on receiver receiver
        ->  Hash Join
              Hash Cond: (msg.msg_id = receiver.msg_id)
              ->  Seq Scan on msg_300 msg
              ->  Hash
                    ->  Seq Scan on receiver_300 receiver

The parent tables don't have any check constraints in the above example and hence need to join with all the child relations. However, based on usage and feedback among Aster's customers, we believe it is common practice not to store any data in the parent tables. Therefore, we were thinking of creating an "Empty Check Constraint". This constraint would enforce that a particular table is empty and prevent the insertion of any rows into it. If the parent tables have the empty check constraint, then the optimizer will be able to further eliminate some of the child joins. A plan with all suggested optimizations and features could be:

Result
  ->  Append
        ->  Hash Join
              Hash Cond: (msg.msg_id = receiver.msg_id)
              ->  Seq Scan on msg_100 msg
              ->  Hash
                    ->  Seq Scan on receiver_100 receiver
        ->  Hash Join
              Hash Cond: (msg.msg_id = receiver.msg_id)
              ->  Seq Scan on msg_200 msg
              ->  Hash
                    ->  Seq Scan on receiver_200 receiver
        ->  Hash Join
              Hash Cond: (msg.msg_id = receiver.msg_id)
              ->  Seq Scan on msg_300 msg
              ->  Hash
                    ->  Seq Scan on receiver_300 receiver

In summary, we are making two suggestions:
1. Extend the optimizer to consider joins between child tables when hierarchies are joined together.
2. Add the "Empty Check Constraint", which would enforce that a particular table is to remain empty.

We've created a basic implementation, just to verify the benefits of these ideas. The initial results seem promising: we observed decreased execution times, and for large data sets query execution could be several times faster than before. We would greatly appreciate any comments, thoughts, or suggestions you might have regarding these ideas.

Regards,
Nedyalko Borisov and Herodotos Herodotou
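[Editorial note: on PostgreSQL releases well after this thread (9.2 onward), the effect of the proposed "Empty Check Constraint" can be approximated with a non-inherited always-false check constraint. This is a sketch for illustration, not part of the original proposal:]

```sql
-- A CHECK (false) constraint marked NO INHERIT (available from
-- PostgreSQL 9.2 onward) applies only to the parent: it rejects any
-- insert into the parent relation itself while leaving the children
-- unaffected, and lets constraint exclusion prune the parent scans.
ALTER TABLE msg      ADD CONSTRAINT msg_parent_empty      CHECK (false) NO INHERIT;
ALTER TABLE receiver ADD CONSTRAINT receiver_parent_empty CHECK (false) NO INHERIT;

-- With constraint exclusion enabled, the provably empty parent
-- relations can be dropped from plans over the hierarchy:
SET constraint_exclusion = on;
EXPLAIN SELECT * FROM msg m JOIN receiver r ON m.msg_id = r.msg_id;
```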
Nedyalko Borisov <nedyalko@asterdata.com> writes:
> In summary, we are making two suggestions:
> 1. Extend the optimizer to consider joins between child tables when hierarchies are joined together.

We already handle this for the case where the join is nestloop with inner index scan, and I'm not convinced that there's any real gain to be had for other join types.

> 2. Add the "Empty Check Constraint", which would enforce that a particular table is to remain empty.

The trouble with that is that a constraint that doesn't propagate to its child tables is a weird beast that I'd just as soon not invent.

We are currently thinking about inventing an explicit notion of partitioned tables. If we had that, it would be reasonable to have a special kind of "parent" table for a partitioned set and refuse to allow any data in that relation. But I'm not excited about contorting the general constraint mechanism in the way that would be necessary to express this as a constraint.

			regards, tom lane
Tom Lane wrote:
> Nedyalko Borisov <nedyalko@asterdata.com> writes:
>> In summary, we are making two suggestions:
>> 1. Extend the optimizer to consider joins between child tables when hierarchies are joined together.
>
> We already handle this for the case where the join is nestloop with
> inner index scan, and I'm not convinced that there's any real gain to be
> had for other join types.

From the OLTP perspective this proposal won't introduce any benefits, since those queries operate on small parts of the data, so we could add a flag to enable/disable the inherited join. However, OLAP queries process significant amounts of data, and to leverage this fact DB admins partition the data. We think the optimizer should take advantage of this partitioning and consider plans where the joins are performed on small parts of the data. For example, a typically observed scenario is that the optimizer chooses a Merge Join for two partitioned tables, like the plan below:

Merge Cond: (table1.id = table2.id)
  ->  Sort
        Sort Key: id
        ->  Append
              ->  Seq Scan on table1_part1
              ->  Seq Scan on table1_part2
              ->  ....
              ->  Seq Scan on table1_partN
  ->  Sort
        Sort Key: id
        ->  Append
              ->  Seq Scan on table2_part1
              ->  Seq Scan on table2_part2
              ->  ....
              ->  Seq Scan on table2_partM

This plan ignores the fact that there are indexes on the table2 partitions, and that pairwise partition joins (index nested loop or hash join) would be faster than scanning all the partitions and sorting them. To see the effect of the pairwise joins we performed some experiments with the initial implementation. The experiments consisted of joining two partitioned tables, where each table has around 200 children and two int columns, id and count.
We generated data of different sizes and measured the execution times; here are the results:

0.5 million records -> regular plan 0.69s  -> modified plan 0.51s
1 million records   -> regular plan 2.9s   -> modified plan 1.0s
2.5 million records -> regular plan 4.17s  -> modified plan 2.28s
5 million records   -> regular plan 11.25s -> modified plan 4.46s

Increasing the data size or adding more columns increases the difference between the current plan that the database picks and the proposed modification of the plans. Thus, we think it might be useful for the optimizer to consider plans with inherited joins.

>> 2. Add the "Empty Check Constraint", which would enforce that a particular table is to remain empty.
>
> The trouble with that is that a constraint that doesn't propagate to its
> child tables is a weird beast that I'd just as soon not invent.
>
> We are currently thinking about inventing an explicit notion of
> partitioned tables. If we had that, it would be reasonable to have
> a special kind of "parent" table for a partitioned set and refuse to
> allow any data in that relation. But I'm not excited about contorting
> the general constraint mechanism in the way that would be necessary to
> express this as a constraint.

OK, implementing a special "abstract"/"parent" table would make more sense. Along these lines, could you elaborate on the explicit notion of partitioned tables, or give us some references?

Thanks,
Nedyalko Borisov and Herodotos Herodotou
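[Editorial note: the experimental setup described above can be sketched as follows; the table and partition names here are our own illustration, not from the original experiments:]

```sql
-- Hypothetical reconstruction of the experiment: two hierarchies with
-- ~200 range-partitioned children each, over two int columns (id, count).
CREATE TABLE t1 (id int, count int);
CREATE TABLE t2 (id int, count int);

-- One pair of matching children per id range, e.g. for [0, 25000):
CREATE TABLE t1_part1 ( CHECK (0 <= id AND id < 25000) ) INHERITS (t1);
CREATE TABLE t2_part1 ( CHECK (0 <= id AND id < 25000) ) INHERITS (t2);
-- ... repeated for the remaining partitions of each table ...

-- The measured query: a join across the two hierarchies, which the
-- proposed patch decomposes into pairwise child-table joins.
SELECT * FROM t1 JOIN t2 ON t1.id = t2.id;
```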
On Mon, Jul 6, 2009 at 10:23 PM, Nedyalko Borisov <nedyalko@asterdata.com> wrote:
> For example, typical observed scenario is: optimizer chooses Merge Join for
> two partitioned tables like the plan below:
> Merge Cond: (table1.id = table2.id)
>   ->  Sort
>         Sort Key: id
>         ->  Append
>               ->  Seq Scan on table1_part1
>               ->  Seq Scan on table1_part2
>               ->  ....
>               ->  Seq Scan on table1_partN
>   ->  Sort
>         Sort Key: id
>         ->  Append
>               ->  Seq Scan on table2_part1
>               ->  Seq Scan on table2_part2
>               ->  ....
>               ->  Seq Scan on table2_partM
>
> This plan ignores the fact there are indexes on the table2 partitions and
> that the pairwise partitions joins (index nested loop or hash join) will be
> faster than scanning all the partitions and sorting them.

To some degree my merge-append patch would mitigate this case. It would allow the use of indexes on some or all the partitions to avoid the sorts. However it would still force all the partitions to be appended on each side and then merged.

If we could match up all the partitions then I think this plan would be faster with the Append on top and separate merge joins for each pair of partitions. Aside from skipping the cpu cost of the merge-append I think it would win for a few other reasons as well. Each join would be able to come up with much better statistics which would enable it to pick a better join when one is available. Even if the planner still picks a merge join it would be much more likely to finish early and skip the remainder of a partition on one side or the other.

> OK, implementing a special "abstract"/"parent" table would make more sense.

I had in mind to tackle this in a bit of a roundabout way. If we mark the parent table "read-only" and then notice that all tuples (all 0 of them) in the table are frozen, then we can discard that table from the plans.
Since changing the "read-only" attribute would have to be treated as a DDL operation, which would invalidate any cached plans, we can trust that it won't change as long as the plan lives, so no new tuples can be inserted.

The reason I wanted to take such a roundabout route, instead of having an "abstract" or "empty" property, is that I wanted to generalize this. Once we know a table is read-only there are lots of properties we could find useful in planning aside from emptiness. We could have statistics like the minimum and maximum value for a column, which the planner would be able to trust and use to exclude partitions without having to explicitly declare constraints on every column.

This is all just my musings, not any kind of consensus. Does it make sense to others or is it too baroque when a simple "abstract" flag would do?

--
greg
http://mit.edu/~gsstark/resume.pdf
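[Editorial note: the "read-only table" idea above might look something like the following sketch. None of this syntax exists in PostgreSQL; it is purely hypothetical illustration of the proposal:]

```sql
-- Hypothetical DDL: marking the parent read-only would invalidate
-- cached plans, after which the planner could trust that the table
-- stays empty (all 0 tuples frozen) and drop it from plans.
ALTER TABLE msg SET READ ONLY;      -- hypothetical, does not exist

-- More generally, for a read-only partition the planner could trust
-- frozen per-column min/max statistics and exclude it from a query
-- like the one below without any explicit CHECK constraints:
SELECT * FROM msg WHERE msg_id BETWEEN 50 AND 70;
```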
This patch extends the query optimizer to consider joins between child tables when hierarchies are joined together.

Short description: when the optimizer considers join paths between two tables with child tables, it only creates join paths over the two append paths. In this patch we extend the optimizer to consider joins between the child tables of the two parent relations, based on the child table constraints. The idea is that only child tables with overlapping constraints need to be joined, rather than the entire hierarchies with each other. This optimization can provide significant benefits in terms of execution time. A short motivating example as well as a description of the overall algorithm can be found in the attached presentation.

Herodotos & Nedyalko

________________________________________
From: gsstark@gmail.com [gsstark@gmail.com] On Behalf Of Greg Stark [gsstark@mit.edu]
Sent: Monday, July 06, 2009 3:14 PM
To: Nedyalko Borisov
Cc: Tom Lane; pgsql-hackers@postgresql.org; Herodotos Herodotou; Eric Friedman; John Cieslewicz; Dheeraj Pandey
Subject: Re: Join optimization for inheritance tables
Herodotos Herodotou wrote:
> This patch extends the query optimizer to consider joins between child tables when hierarchies are joined together.

I added it to the next commitfest [1].

[1] https://commitfest.postgresql.org/action/commitfest_view?id=3

--
Euler Taveira de Oliveira
http://www.timbira.com/
Herodotos Herodotou <Herodotos.Herodotou@asterdata.com> writes:
> This patch extends the query optimizer to consider joins between child tables when hierarchies are joined together.

I looked over this patch a bit. I am of the opinion that this is a lot of work towards a dead-end direction :-(. The patch contains a large amount of rather inefficient code that attempts to reverse-engineer knowledge about whether an inheritance tree forms a range-partitioned table. We already know that reasoning from first principles using a set of arbitrary per-table constraints doesn't scale very well, and this would make it a lot worse. pgsql-hackers have been talking for some time about implementing an explicit partitioning feature, which would give the planner direct knowledge about this type of table structure without expending nearly so much work (and then expending it all over again for the next query). I think the right way to go at it is to implement that first, and then see about teaching the planner to plan like this.

We could possibly extract a subset of the patch that just deals with pushing joins down through appends, without the constraint-exclusion logic that tries to eliminate provably empty join pairs. But I'm dubious that that would be worthwhile by itself. I think you need the constraint exclusion to eliminate a good deal of work before this is actually better than doing a single join above the append.

There are a number of other things I don't like, such as the restrictive and not-obviously-necessary requirement that all the join clauses be simple "Var op Var" clauses. But the main thing is that a large fraction of the patch is doing something I don't think we should try to do.

			regards, tom lane
Tom,

If the partitioning implementation does not make progress (and does not make it for 8.5), don't you think that this could be an interesting contribution to consider for this release? I have put the results obtained so far, and the use case where it is most useful, on the wiki (http://wiki.postgresql.org/wiki/Join_optimization_for_inheritance_tables).

As I understand it, partitioning will certainly lead to some significant changes/enhancements to the planner. Do you think it is realistic to get that for 8.5?

manu

> Herodotos Herodotou <Herodotos.Herodotou@asterdata.com> writes:
>> This patch extends the query optimizer to consider joins between child tables when hierarchies are joined together.
>
> I looked over this patch a bit. I am of the opinion that this is a lot
> of work towards a dead-end direction :-(. [...]

--
Emmanuel Cecchet
Aster Data Systems
Web: http://www.asterdata.com
On Tue, 2009-09-22 at 18:16 -0400, Emmanuel Cecchet wrote:
> If the partitioning implementation does not make progress (and does not
> make it for 8.5), don't you think that this could be an interesting
> contribution to consider for this release?
> I have put on the wiki
> (http://wiki.postgresql.org/wiki/Join_optimization_for_inheritance_tables)
> the results obtained so far and the use case where it is most used.

I think you mean that the planning time is in milliseconds, not seconds. Also, you use "data" in a few places where you meant "date".

The results seem good, and trading planning time for execution time seems like a reasonable idea in the case of partitioned tables. We already work harder planning when constraint_exclusion='partition', so there is some precedent (I don't know if that's a good precedent to follow or not).

How does it compare to using merge-append? I haven't looked at the actual patch yet.

Regards,
Jeff Davis
> As I understand it, partitioning will certainly lead to some significant
> changes/enhancements to the planner. Do you think it is realistic to get
> that for 8.5?

I don't think that waiting for our plans for a more robust partitioning implementation is a reason to put off improvements to the implementation we have. If Simon or Pavel or someone was going all-out on getting the new partitioning ready, that would be one thing. But to date, nobody has volunteered to work on it; we just know we need it.

Of course, that completely leaves aside Tom's critique of the implementation, which sounds like it needs some work. Trying to fit the target table into a range-partitioning mold would break a lot of real partitionings; for example, I have several client DBs which are partitioned active/inactive-by-date.

What about simply eliminating joins between partitioned tables by checking which columns' constraints match exactly or are subsets? A lot of partitioned DBs partition everything by month, and joining two tables which were partitioned by month would be useful by itself.

--
Josh Berkus
PostgreSQL Experts Inc.
www.pgexperts.com
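[Editorial note: the matching-constraints case suggested above can be illustrated as follows. The tables and names are our own example, not from the thread; both hierarchies are partitioned by month on the column they are joined on, so the CHECK constraints line up pairwise:]

```sql
-- Two hierarchies partitioned by month on the join column (day).
CREATE TABLE clicks      (day date, cnt int);
CREATE TABLE impressions (day date, cnt int);

CREATE TABLE clicks_2009_07 (
    CHECK (day >= DATE '2009-07-01' AND day < DATE '2009-08-01')
) INHERITS (clicks);
CREATE TABLE impressions_2009_07 (
    CHECK (day >= DATE '2009-07-01' AND day < DATE '2009-08-01')
) INHERITS (impressions);
-- ... one pair of children per month ...

-- Because the constraints on clicks_2009_07 and impressions_2009_07
-- match exactly, that child pair is the only join the planner would
-- need to consider between them:
SELECT c.day, c.cnt, i.cnt
FROM clicks c JOIN impressions i ON c.day = i.day;
```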
Josh Berkus <josh@agliodbs.com> writes:
> I don't think that waiting for our plans for a more robust partitioning
> implementation is a reason to put off improvements to the implementation
> we have.

The complaint I had was that this patch consisted largely of code that we'd want to throw away once a better partitioning implementation is in place. I don't object to partial patches that move us forward on the agreed-on path, but this one seemed to be going sideways.

> What about simply eliminating joins between partitioned tables by
> checking which columns' constraints match exactly or are subsets?

If it could be done with a <emphasis>small</> amount of throwaway code, maybe ...

			regards, tom lane
> If it could be done with a <emphasis>small</> amount of throwaway
> code, maybe ...

Well, that's a reasonable goal ...

--
Josh Berkus
PostgreSQL Experts Inc.
www.pgexperts.com
Hi Jeff,

On Tue, Sep 22, 2009 at 8:06 PM, Jeff Davis <pgsql@j-davis.com> wrote:
> I think you mean that the planning time is in milliseconds, not seconds.

The planning time is actually in seconds. Even without our feature, planning takes a few seconds, since the optimizer deals with hundreds or even thousands of child tables. With our feature, planning time increases by 2-3x, but then again, running time improves by 6-10x. I have added a paragraph under "Performance evaluation" on the wiki page ( http://wiki.postgresql.org/wiki/Join_optimization_for_inheritance_tables ) to provide better insight into what's happening under the covers.

> The results seem good, and trading planning time for execution time
> seems like a reasonable idea in the case of partitioned tables. We
> already work harder planning when constraint_exclusion='partition', so
> there is some precedent (I don't know if that's a good precedent to
> follow or not).

With constraint_exclusion=partition, individual child tables that cannot produce tuples (based on their check constraints and filter conditions) are filtered out. Our patch goes one step further by identifying which joins of child tables can or cannot produce tuples (based on their check constraints and join conditions).

> How does it compare to using merge-append?

Merge-append is essentially a special case of our patch. Greg Stark compared it with our patch and made some good comments in a previous email thread ( message id = 407d949e0907061514i5f1a044r691d1d74eaefb067@mail.gmail.com ).

Thank you,
~Hero

--
Herodotos Herodotou
Graduate Student
Department of Computer Science, Duke University
Email: hero@cs.duke.edu
On Tue, 2009-09-22 at 18:16 -0400, Emmanuel Cecchet wrote:
> If the partitioning implementation does not make progress (and does not
> make it for 8.5)

Manu, not sure if you are referring to Kedar's patch I reviewed earlier in July, but that patch didn't implement anything like the right internals for this to work.

As Tom says, what we need is a single internal data structure that describes all the partitions. That can then be used to automatically route inserts, perform the push-down merge joins, and speed up constraint exclusion. We should build this once and store it in the relcache for the parent relation, so we would also need some changes in relcache invalidation to make sure child DDL changes invalidate that.

--
Simon Riggs
www.2ndQuadrant.com
Herodotos Herodotou <hero@cs.duke.edu> writes:
> On Tue, Sep 22, 2009 at 8:06 PM, Jeff Davis <pgsql@j-davis.com> wrote:
>> I think you mean that the planning time is in milliseconds, not seconds.

> The planning time is actually in seconds.

This is exactly why I think this is a dead-end approach. Trying to do theorem proving from an unstructured collection of constraints simply cannot scale to hundreds of partitions, at least not if you want reasonable planning performance.

There are other aspects of our current partitioning approach that don't scale either, eg the complexity of the insert redirection triggers. We could handle standard cases where there's a simple partitioning rule with far less overhead than this, if we had an explicit model of the partitioning rule inside the system.

			regards, tom lane
Hi Simon,

Thanks for the insight. I might take that on as a long-term project; I have to discuss it with my colleagues at Aster. It would certainly help to put together a wiki page with the key insights on the design of such an implementation, to be able to better scope the project and agree on the right approach.

manu

> On Tue, 2009-09-22 at 18:16 -0400, Emmanuel Cecchet wrote:
>> If the partitioning implementation does not make progress (and does not
>> make it for 8.5)
>
> Manu, not sure if you are referring to Kedar's patch I reviewed earlier
> in July, but that patch didn't implement anything like the right
> internals for this to work.
>
> As Tom says, what we need is a single internal data structure that
> describes all the partitions. That can then be used to automatically
> route inserts, perform the push down merge joins and speed-up constraint
> exclusion. We should build this once and store it in the relcache for
> the parent relation and so would need some changes in relcache
> invalidation as well to make sure child ddl changes invalidate that.

--
Emmanuel Cecchet
Aster Data Systems
Web: http://www.asterdata.com
On 9/26/09 3:00 PM, Emmanuel Cecchet wrote:
> Hi Simon,
>
> Thanks for the insight. I might take that as a long term project. I have
> to discuss that with my colleagues at Aster. It would certainly help to
> put together a wiki page with the key insights on the design of such
> implementation to be able to better scope the project and agree on what
> is the right approach for this.

In the meantime, can you look at the possibility I suggested? If you could implement the most common and least complex case of partition joining - where the constraints on two partitions match exactly (or are mathematically provable subsets) - with a small patch, it would probably work for 8.5.

--
Josh Berkus
PostgreSQL Experts Inc.
www.pgexperts.com