Re: [HACKERS] Partition-wise join for join between (declaratively) partitioned tables - Mailing list pgsql-hackers

From: Robert Haas
Subject: Re: [HACKERS] Partition-wise join for join between (declaratively) partitioned tables
Date:
Msg-id: CA+Tgmob+cXfSJ_iUhQybRr3HjqF7YSWO4C5zVe621+enftjN0Q@mail.gmail.com
In response to: Re: [HACKERS] Partition-wise join for join between (declaratively) partitioned tables (Thomas Munro <thomas.munro@enterprisedb.com>)
List: pgsql-hackers
On Thu, Aug 10, 2017 at 5:43 AM, Thomas Munro
<thomas.munro@enterprisedb.com> wrote:
>> Do you think solving this problem is a prerequisite for
>> partition-wise join? Or should we propose that patch as a separate
>> enhancement?
>
> No, I'm not proposing anything yet.  For now I just wanted to share
> this observation about where hot CPU time goes in simple tests, and
> since it turned out to be a loop in a loop that I could see an easy
> way to fix for singleton sets and sets with a small range, I couldn't
> help trying it...  But I'm still trying to understand the bigger
> picture.  I'll be interested to compare profiles with the ordered
> append_rel_list version you have mentioned, to see how that moves
> the hot spots.

Perhaps this is stating the obvious, but it's often better to optimize
things like this at a higher level rather than by tinkering with stuff
like Bitmapset.  On the other hand, sometimes micro-optimizations are
the way to go, because optimizing find_ec_member_for_tle(), for
example, might involve a much broader rethink of the planner code than
we want to undertake right now.

> I guess one very practical question to ask is: can we plan queries
> with realistic numbers of partitioned tables and partitions in
> reasonable times?  Well, it certainly looks very good for hundreds
> of partitions so far...  My own experience of partitioning with
> other RDBMSs has been on that order, 'monthly partitions covering
> the past 10 years' and similar, but on the other hand it wouldn't be
> surprising to learn that people want to go to many thousands,
> especially for schemas which just keep adding partitions over time
> and don't want to drop them.

I've been thinking that it would be good if this feature - and other
new partitioning features - could scale to about 1000 partitions
without too much trouble.  Eventually, it might be nice to scale
higher, but there's not much point in making partition-wise join scale
to 100,000 partitions if we've got some other part of the system that
runs into trouble beyond 250.

> Curious: would you consider joins between partitioned tables and
> non-partitioned tables where the join is pushed down to be a kind of
> "partition-wise join", or something else?  If so, would that be a
> special case, or just the logical extreme case for
> 0014-WIP-Partition-wise-join-for-1-1-1-0-0-1-partition-ma.patch,
> where one single "partition" on the non-partitioned side maps to all
> the partitions on the partitioned side?

I think this is actually a really important case which we've just
excluded from the initial scope because the problem is hard enough
already.  But it's quite possible that if you are joining partitioned
tables A and B with unpartitioned table X, the right join order could
be A-X-B; the A-X join might knock out a lot of rows.  It's not great
to have to pick between doing the A-B join partition-wise and doing
the A-X join first; you want to get both things.  But we can't do
everything at once.

Further down the road, there's more than one way of doing the A-X
join.  You could join each partition of A to all of X, which is likely
optimal if, for example, you are doing a nested loop with an inner
index scan on X.  But you could also partition X on the fly using A's
partitioning scheme and then join partitions of A against the
on-the-fly-partitioned version of X.  That's likely to be a lot better
for a merge join with an underlying sort on X.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
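For readers following the Bitmapset tangent above, here is a minimal
standalone sketch of the kind of "loop in a loop" cost and singleton
fast path Thomas alludes to. The BitSet struct, set_overlap(), and
set_contains() below are simplified assumptions for illustration, not
PostgreSQL's actual Bitmapset API: a generic overlap test must walk
both word arrays, while a set known to hold a single member can be
probed with one shift and mask.

```c
/*
 * Sketch only (assumed layout and names, not PostgreSQL source):
 * word-array bitmap sets with a generic overlap loop and a
 * loop-free probe for the singleton case.
 */
#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>

#define BITS_PER_WORD 64
#define NWORDS 4                /* fixed size, for this sketch only */

typedef struct
{
    uint64_t    words[NWORDS];
} BitSet;

/* General path: word-by-word loop, O(NWORDS) per call. */
static bool
set_overlap(const BitSet *a, const BitSet *b)
{
    for (int i = 0; i < NWORDS; i++)
        if (a->words[i] & b->words[i])
            return true;
    return false;
}

/* Singleton fast path: a single shift-and-mask probe, no loop. */
static bool
set_contains(const BitSet *a, int member)
{
    return (a->words[member / BITS_PER_WORD] &
            ((uint64_t) 1 << (member % BITS_PER_WORD))) != 0;
}

int
main(void)
{
    BitSet      a = {{0x5, 0, 0, 0}};   /* members 0 and 2 */
    BitSet      b = {{0x4, 0, 0, 0}};   /* singleton: member 2 */

    printf("overlap:    %d\n", set_overlap(&a, &b));
    printf("contains 2: %d\n", set_contains(&a, 2));
    return 0;
}
```

When the caller sits inside another loop over relations, replacing the
inner word loop with a probe like set_contains() is exactly the sort
of micro-optimization that can move a profile without touching the
higher-level planner algorithm, which is the trade-off weighed above.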
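The second A-X strategy Robert describes can also be sketched
concretely: route each row of the unpartitioned table X into a bucket
chosen by A's partitioning function, so partition i of A joins only
against bucket i of X instead of all of X. The part_for_key() helper,
the modulo hash, and the fixed partition count below are assumptions
for a toy illustration; the real feature would reuse A's declared
partition bounds.

```c
/*
 * Toy sketch (assumed names, not PostgreSQL internals) of
 * partitioning the unpartitioned side X on the fly using A's
 * partitioning scheme.
 */
#include <stdio.h>

#define NPARTS 4
#define NROWS_X 8

/* Stand-in for A's partitioning function (hash partitioning by key). */
static int
part_for_key(int key)
{
    return key % NPARTS;
}

int
main(void)
{
    int         x_keys[NROWS_X] = {3, 7, 2, 5, 0, 6, 1, 4};
    int         bucket[NPARTS][NROWS_X];
    int         bucket_len[NPARTS] = {0};

    /* One pass over X: route each row to the bucket matching A. */
    for (int i = 0; i < NROWS_X; i++)
    {
        int         p = part_for_key(x_keys[i]);

        bucket[p][bucket_len[p]++] = x_keys[i];
    }

    /* Each partition of A now joins only its own bucket of X. */
    for (int p = 0; p < NPARTS; p++)
    {
        printf("A partition %d joins X bucket of %d row(s):",
               p, bucket_len[p]);
        for (int i = 0; i < bucket_len[p]; i++)
            printf(" %d", bucket[p][i]);
        printf("\n");
    }
    return 0;
}
```

The trade-off noted above falls out directly: the one-time routing
pass pays off when each partition of A would otherwise rescan or
re-sort all of X (as in the merge-join case), but is wasted work when
a nested loop can probe an index on X cheaply.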