Thread: a few crazy ideas about hash joins
While investigating some performance problems recently I've had cause to think about the way PostgreSQL uses hash joins. So here are a few thoughts. Some of these have been brought up before.

1. When the hash is not expected to spill to disk, it preserves the pathkeys of the outer side of the join. If the optimizer were allowed to assume that, it could produce significantly more efficient query plans in some cases. The problem is what to do if we start executing the query and find out that we have more stuff to hash than we expect, such that we need multiple batches? Now the results won't be sorted. I think we could handle this as follows: Don't count on the hash join to preserve pathkeys unless it helps, and only rely on it when it seems as if the hash table will still fit even if it turns out to be, say, three times as big as expected. But if you are counting on the hash join to preserve pathkeys, then pass that information to the executor. When the executor is asked to perform a hash join, it will first hash the inner side of the relation. At that point, we know whether we've successfully gotten everything into a single batch, or not. If we have, perform the join normally. If the worst has happened and we've gone multi-batch, then perform the join and sort the output before returning it. The performance will suck, but at least you'll get the right answer.

Previous in-passing reference to this idea here:
http://archives.postgresql.org/pgsql-hackers/2008-09/msg00806.php

2. Consider building the hash table lazily. I often see the query planner pick a hash join over a nested inner indexscan because it thinks that it'll save enough time making hash probes rather than index probes to justify the time spent building the hash table up front. But sometimes the relation that's being hashed has a couple thousand rows, only a tiny fraction of which will ever be retrieved from the hash table. We can predict when this is going to happen because n_distinct for the outer column will be much less than the size of the inner rel. In that case, we could consider starting with an empty hash table that effectively acts as a cache. Each time a value is probed, we look it up in the hash table. If there's no entry, we use an index scan to find the matching rows and insert them into the hash table. Negative results must also be cached.

3. Avoid building the exact same hash table twice in the same query. This happens more often than you'd think. For example, a table may have two columns creator_id and last_updater_id which both reference person (id). If you're considering a hash join between paths A and B, you could conceivably check whether what is essentially a duplicate of B has already been hashed somewhere within path A. If so, you can reuse that same hash table at zero startup cost.

4. As previously discussed, avoid hashing for DISTINCT and then hashing the results for a hash join on the same column with the same operators.
http://archives.postgresql.org/message-id/4136ffa0902191346g62081081v8607f0b92c206f0a@mail.gmail.com

Thoughts on the value and/or complexity of implementation of any of these?

...Robert
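To make idea 2 concrete, here is a minimal standalone C sketch of a hash table that starts empty and is filled lazily, acting as a cache in front of an index scan on the inner relation, with negative lookups remembered so a missing key is never re-probed. This is not PostgreSQL executor code: the fixed-size chained table, the stand-in index_scan_inner() function, and the integer keys are all illustrative assumptions.

    /*
     * Sketch of idea 2: a lazily built hash table acting as a cache over an
     * index scan on the inner relation.  Negative results are cached too.
     * Not PostgreSQL code; all names and structures here are stand-ins.
     */
    #include <stdio.h>
    #include <stdlib.h>
    #include <stdbool.h>

    #define NBUCKETS 256

    typedef struct CacheEntry
    {
        int                key;
        bool               found;    /* false means "cached negative result" */
        int                payload;  /* stand-in for the matching inner tuple */
        struct CacheEntry *next;
    } CacheEntry;

    static CacheEntry *buckets[NBUCKETS];

    /* Stand-in for an index scan on the inner relation. */
    static bool
    index_scan_inner(int key, int *payload)
    {
        *payload = 0;
        if (key % 7 == 0)            /* pretend only multiples of 7 exist */
        {
            *payload = key * 10;
            return true;
        }
        return false;
    }

    /* Probe the cache; on a miss, fall back to the index and remember the answer. */
    static bool
    probe_lazy(int key, int *payload)
    {
        unsigned    h = (unsigned) key % NBUCKETS;
        CacheEntry *e;

        for (e = buckets[h]; e != NULL; e = e->next)
        {
            if (e->key == key)
            {
                *payload = e->payload;
                return e->found;
            }
        }

        e = malloc(sizeof(CacheEntry));
        if (e == NULL)
            exit(1);
        e->key = key;
        e->found = index_scan_inner(key, &e->payload);
        e->next = buckets[h];
        buckets[h] = e;

        *payload = e->payload;
        return e->found;
    }

    int
    main(void)
    {
        int         outer_keys[] = {14, 3, 14, 21, 3};   /* repeats hit the cache */
        int         payload;

        for (int i = 0; i < 5; i++)
        {
            if (probe_lazy(outer_keys[i], &payload))
                printf("key %d -> %d\n", outer_keys[i], payload);
            else
                printf("key %d -> no match\n", outer_keys[i]);
        }
        return 0;
    }

The repeated outer values (14 and 3 above) hit the cache rather than the index, which is exactly the situation where n_distinct of the outer column is much smaller than the inner relation.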
Robert Haas wrote:
> While investigating some performance problems recently I've had cause to think about the way PostgreSQL uses hash joins. So here are a few thoughts. Some of these have been brought up before.
>
> 1. When the hash is not expected to spill to disk, it preserves the pathkeys of the outer side of the join. If the optimizer were allowed to assume that, it could produce significantly more efficient query plans in some cases. The problem is what to do if we start executing the query and find out that we have more stuff to hash than we expect, such that we need multiple batches? Now the results won't be sorted. I think we could handle this as follows: Don't count on the hash join to preserve pathkeys unless it helps, and only rely on it when it seems as if the hash table will still fit even if it turns out to be, say, three times as big as expected. But if you are counting on the hash join to preserve pathkeys, then pass that information to the executor. When the executor is asked to perform a hash join, it will first hash the inner side of the relation. At that point, we know whether we've successfully gotten everything into a single batch, or not. If we have, perform the join normally. If the worst has happened and we've gone multi-batch, then perform the join and sort the output before returning it. The performance will suck, but at least you'll get the right answer.
>
> Previous in-passing reference to this idea here:
> http://archives.postgresql.org/pgsql-hackers/2008-09/msg00806.php

Hmm, instead of sorting the output if the worst happens, a final merge step as in a merge sort would be enough.

> 2. Consider building the hash table lazily. I often see the query planner pick a hash join over a nested inner indexscan because it thinks that it'll save enough time making hash probes rather than index probes to justify the time spent building the hash table up front. But sometimes the relation that's being hashed has a couple thousand rows, only a tiny fraction of which will ever be retrieved from the hash table. We can predict when this is going to happen because n_distinct for the outer column will be much less than the size of the inner rel. In that case, we could consider starting with an empty hash table that effectively acts as a cache. Each time a value is probed, we look it up in the hash table. If there's no entry, we use an index scan to find the matching rows and insert them into the hash table. Negative results must also be cached.

Yeah, that would be quite nice. One problem is that our ndistinct estimates are not very accurate.

> 3. Avoid building the exact same hash table twice in the same query. This happens more often than you'd think. For example, a table may have two columns creator_id and last_updater_id which both reference person (id). If you're considering a hash join between paths A and B, you could conceivably check whether what is essentially a duplicate of B has already been hashed somewhere within path A. If so, you can reuse that same hash table at zero startup cost.

That seems like a quite simple thing to do. But would it work for a multi-batch hash table?

> 4. As previously discussed, avoid hashing for DISTINCT and then hashing the results for a hash join on the same column with the same operators.

This seems essentially an extension of idea 3.

--
Heikki Linnakangas
EnterpriseDB   http://www.enterprisedb.com
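Heikki's refinement can be pictured as a plain k-way merge: when the join unexpectedly goes multi-batch, each batch still emits its output in outer order, so the per-batch result streams are individually sorted and only need a final merge rather than a full sort. The sketch below merges sorted integer arrays standing in for batch output streams; it is a generic illustration under those assumptions, not executor code.

    /*
     * Sketch of the "merge, don't re-sort" fallback: each spilled batch's
     * output is assumed to be individually sorted, so a k-way merge of the
     * batch outputs restores the overall ordering.
     */
    #include <stdio.h>

    typedef struct
    {
        const int *vals;
        int         len;
        int         pos;
    } BatchOutput;

    /* Repeatedly emit the smallest head element among all batches. */
    static void
    merge_batches(BatchOutput *batches, int nbatches)
    {
        for (;;)
        {
            int         best = -1;

            for (int i = 0; i < nbatches; i++)
            {
                if (batches[i].pos < batches[i].len &&
                    (best < 0 ||
                     batches[i].vals[batches[i].pos] <
                     batches[best].vals[batches[best].pos]))
                    best = i;
            }

            if (best < 0)
                break;          /* every batch is exhausted */

            printf("%d ", batches[best].vals[batches[best].pos]);
            batches[best].pos++;
        }
        printf("\n");
    }

    int
    main(void)
    {
        static const int b0[] = {1, 4, 9};
        static const int b1[] = {2, 3, 10};
        static const int b2[] = {5, 6, 7};
        BatchOutput batches[] = {{b0, 3, 0}, {b1, 3, 0}, {b2, 3, 0}};

        merge_batches(batches, 3);    /* prints: 1 2 3 4 5 6 7 9 10 */
        return 0;
    }

The merge reads each batch's output exactly once, which is why it is so much cheaper than sorting the whole join result when the planner's single-batch guess turns out wrong.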
On Fri, Apr 3, 2009 at 1:48 AM, Heikki Linnakangas <heikki.linnakangas@enterprisedb.com> wrote:
> Robert Haas wrote:
>> While investigating some performance problems recently I've had cause to think about the way PostgreSQL uses hash joins. So here are a few thoughts. Some of these have been brought up before.
>>
>> 1. When the hash is not expected to spill to disk, it preserves the pathkeys of the outer side of the join. If the optimizer were allowed to assume that, it could produce significantly more efficient query plans in some cases. The problem is what to do if we start executing the query and find out that we have more stuff to hash than we expect, such that we need multiple batches? Now the results won't be sorted. I think we could handle this as follows: Don't count on the hash join to preserve pathkeys unless it helps, and only rely on it when it seems as if the hash table will still fit even if it turns out to be, say, three times as big as expected. But if you are counting on the hash join to preserve pathkeys, then pass that information to the executor. When the executor is asked to perform a hash join, it will first hash the inner side of the relation. At that point, we know whether we've successfully gotten everything into a single batch, or not. If we have, perform the join normally. If the worst has happened and we've gone multi-batch, then perform the join and sort the output before returning it. The performance will suck, but at least you'll get the right answer.
>>
>> Previous in-passing reference to this idea here:
>> http://archives.postgresql.org/pgsql-hackers/2008-09/msg00806.php
>
> Hmm, instead of sorting the output if the worst happens, a final merge step as in a merge sort would be enough.

Yeah - good point.

>> 2. Consider building the hash table lazily. I often see the query planner pick a hash join over a nested inner indexscan because it thinks that it'll save enough time making hash probes rather than index probes to justify the time spent building the hash table up front. But sometimes the relation that's being hashed has a couple thousand rows, only a tiny fraction of which will ever be retrieved from the hash table. We can predict when this is going to happen because n_distinct for the outer column will be much less than the size of the inner rel. In that case, we could consider starting with an empty hash table that effectively acts as a cache. Each time a value is probed, we look it up in the hash table. If there's no entry, we use an index scan to find the matching rows and insert them into the hash table. Negative results must also be cached.
>
> Yeah, that would be quite nice. One problem is that our ndistinct estimates are not very accurate.

Well, the right solution to that problem is to fix our ndistinct estimates. :-)

>> 3. Avoid building the exact same hash table twice in the same query. This happens more often than you'd think. For example, a table may have two columns creator_id and last_updater_id which both reference person (id). If you're considering a hash join between paths A and B, you could conceivably check whether what is essentially a duplicate of B has already been hashed somewhere within path A. If so, you can reuse that same hash table at zero startup cost.
>
> That seems like a quite simple thing to do. But would it work for a multi-batch hash table?

I think not.

>> 4. As previously discussed, avoid hashing for DISTINCT and then hashing the results for a hash join on the same column with the same operators.
>
> This seems essentially an extension of idea 3.

In theory, yes, but in practice, this case is easy to detect for an arbitrary inner rel, and the other case is not.

Off to JDCon...

...Robert
On Fri, Apr 03, 2009 at 08:03:33AM -0400, Robert Haas wrote:
> On Fri, Apr 3, 2009 at 1:48 AM, Heikki Linnakangas <heikki.linnakangas@enterprisedb.com> wrote:
> > Robert Haas wrote:
> >> While investigating some performance problems recently I've had cause to think about the way PostgreSQL uses hash joins. So here are a few thoughts. Some of these have been brought up before.
> >>
> >> 1. When the hash is not expected to spill to disk, it preserves the pathkeys of the outer side of the join. If the optimizer were allowed to assume that, it could produce significantly more efficient query plans in some cases. The problem is what to do if we start executing the query and find out that we have more stuff to hash than we expect, such that we need multiple batches? Now the results won't be sorted. I think we could handle this as follows: Don't count on the hash join to preserve pathkeys unless it helps, and only rely on it when it seems as if the hash table will still fit even if it turns out to be, say, three times as big as expected. But if you are counting on the hash join to preserve pathkeys, then pass that information to the executor. When the executor is asked to perform a hash join, it will first hash the inner side of the relation. At that point, we know whether we've successfully gotten everything into a single batch, or not. If we have, perform the join normally. If the worst has happened and we've gone multi-batch, then perform the join and sort the output before returning it. The performance will suck, but at least you'll get the right answer.
> >>
> >> Previous in-passing reference to this idea here:
> >> http://archives.postgresql.org/pgsql-hackers/2008-09/msg00806.php
> >
> > Hmm, instead of sorting the output if the worst happens, a final merge step as in a merge sort would be enough.
>
> Yeah - good point.
>
> >> 2. Consider building the hash table lazily. I often see the query planner pick a hash join over a nested inner indexscan because it thinks that it'll save enough time making hash probes rather than index probes to justify the time spent building the hash table up front. But sometimes the relation that's being hashed has a couple thousand rows, only a tiny fraction of which will ever be retrieved from the hash table. We can predict when this is going to happen because n_distinct for the outer column will be much less than the size of the inner rel. In that case, we could consider starting with an empty hash table that effectively acts as a cache. Each time a value is probed, we look it up in the hash table. If there's no entry, we use an index scan to find the matching rows and insert them into the hash table. Negative results must also be cached.
> >
> > Yeah, that would be quite nice. One problem is that our ndistinct estimates are not very accurate.
>
> Well, the right solution to that problem is to fix our ndistinct estimates. :-)

Also, as seen in the hash index build performance patch, it would be better to set the initial hash size bigger than needed to avoid the inter-page shuffle if the guess is wrong.

Ken
> While investigating some performance problems recently I've had cause to think about the way PostgreSQL uses hash joins. So here are a few thoughts. Some of these have been brought up before.
>
> 1. When the hash is not expected to spill to disk, it preserves the pathkeys of the outer side of the join. If the optimizer were allowed to assume that, it could produce significantly more efficient query plans in some cases.

This is definitely possible, but you will have to dynamically modify the execution path if the hash join ends up being more than one batch.

> 3. Avoid building the exact same hash table twice in the same query. This happens more often than you'd think. For example, a table may have two columns creator_id and last_updater_id which both reference person (id). If you're considering a hash join between paths A and B, you could conceivably check whether what is essentially a duplicate of B has already been hashed somewhere within path A. If so, you can reuse that same hash table at zero startup cost.
>
> 4. As previously discussed, avoid hashing for DISTINCT and then hashing the results for a hash join on the same column with the same operators.
>
> Thoughts on the value and/or complexity of implementation of any of these?

I would be interested in working with you on any of these changes to hash join if you decide to pursue them. I am especially interested in looking at the hash aggregation code and potentially improving its efficiency.

We have implemented a multi-way hash join (can join more than 2 tables at a time) which may help with cases #3 and #4. Performance results look very good, and we are planning on building a patch for this over the summer.

--
Ramon Lawrence
On Fri, Apr 3, 2009 at 11:24 AM, Lawrence, Ramon <ramon.lawrence@ubc.ca> wrote:
> I would be interested in working with you on any of these changes to hash join if you decide to pursue them. I am especially interested in looking at the hash aggregation code and potentially improving its efficiency.

Wow, that would be great!

> We have implemented a multi-way hash join (can join more than 2 tables at a time) which may help with cases #3 and #4. Performance results look very good, and we are planning on building a patch for this over the summer.

I'd be interested in hearing how you cope with this on the planner end of things.

...Robert
On Thu, 2009-04-02 at 22:08 -0400, Robert Haas wrote:
> 3. Avoid building the exact same hash table twice in the same query. This happens more often than you'd think. For example, a table may have two columns creator_id and last_updater_id which both reference person (id). If you're considering a hash join between paths A and B, you could conceivably check whether what is essentially a duplicate of B has already been hashed somewhere within path A. If so, you can reuse that same hash table at zero startup cost.

This is also interesting because there is potential to save memory through that approach, which allows us to allocate work_mem higher and avoid multi-batch altogether.

I would be especially interested in using a shared memory hash table that *all* backends can use - if the table is mostly read-only, as dimension tables often are in data warehouse applications. That would give zero startup cost and significantly reduced memory.

--
Simon Riggs           www.2ndQuadrant.com
PostgreSQL Training, Services and Support
>> 1. When the hash is not expected to spill to disk, it preserves the pathkeys of the outer side of the join. If the optimizer were allowed to assume that, it could produce significantly more efficient query plans in some cases.
>
> This is definitely possible, but you will have to dynamically modify the execution path if the hash join ends up being more than one batch.

Yeah, this item seems to be another case where having a "conditional" branch in the plan would be valuable. What's interesting is that it's branching based on whether the hash has spilled into batches rather than whether the number of rows is greater or less than some breakeven point. It looks like these branch nodes are going to need to know more than just the generic plan parameters. They're going to need to know about specifics like "sort has spilled to disk" or "hash has spilled into batches" etc.

I like the idea of coalescing hash tables. I'm not sure the order in which the planner decides on things is conducive to being able to make good decisions about it though. Even if we did it afterwards without adjusting the planner it might still be worthwhile though.

Incidentally a similar optimization is possible for materialize or even sorts. They may not come up nearly so often since you would normally want to go around adding indexes if you're repeatedly sorting on the same columns. But it might not be any harder to get them all in one swoop.

--
greg
On Fri, Apr 3, 2009 at 5:41 PM, Simon Riggs <simon@2ndquadrant.com> wrote:
> I would be especially interested in using a shared memory hash table that *all* backends can use - if the table is mostly read-only, as dimension tables often are in data warehouse applications. That would give zero startup cost and significantly reduced memory.

I think that's a non-starter due to visibility issues and handling inserts and updates. Even just reusing a hash from one execution in a later execution of the same plan would be tricky since we would have to expire it if the snapshot changes.

Alternately, you could say that what you describe is addressed by hash indexes. The fact that they're not great performers compared to in-memory hashes comes down to dealing with updates and vacuum, which is pretty much the same issue.

Hm. I wonder if we need a whole class of index algorithms to deal specifically with read-only tables. A hash table on a read-only table could spend a lot of time to generate a perfect or near-perfect hash function and then pack the hash table very densely without any bucket chains. That might make it a big winner over a dynamic structure which has to deal with handling inserts and so on. I'm assuming that if you mark the table read-write it just marks the index invalid and you have to rebuild it later once you've marked the table read-only again.

--
greg
On Fri, 2009-04-03 at 18:03 +0100, Greg Stark wrote:
> I wonder if we need a whole class of index algorithms to deal specifically with read-only tables

I think we can drop the word "index" from the sentence as well.

"Read-only" isn't an isolated case. Often you find many read-only tables alongside rapidly changing tables. So even the busiest of databases can benefit from read-only optimisations. So I want MVCC *and* read-only, not MVCC everywhere (or MVCC nowhere if the customer changes horses to get read-only benefits elsewhere).

Having changes to those tables cause much heavier additional work is OK, if judged on a cost/benefit basis. So the case I care about ought to be called "read-mostly", but we're talking read:write ratios of millions:1.

--
Simon Riggs           www.2ndQuadrant.com
PostgreSQL Training, Services and Support
Simon Riggs <simon@2ndQuadrant.com> wrote:
> "Read-only" isn't an isolated case. Often you find many read-only tables alongside rapidly changing tables. So even the busiest of databases can benefit from read-only optimisations.

> Having changes to those tables cause much heavier additional work is OK, if judged on a cost/benefit basis. So the case I care about ought to be called "read-mostly", but we're talking read:write ratios of millions:1.

We have tables which are frequently JOINed to other tables in complex SELECT statements, but which are only modified as part of a software release. It would be OK with us if switching a table between modifiable and read-only actually took as much time as, for example, a CLUSTER command, if it gave us a performance benefit when these complex queries are run in read-only mode.

-Kevin
> > I would be especially interested in using a shared memory hash table that *all* backends can use - if the table is mostly read-only, as dimension tables often are in data warehouse applications. That would give zero startup cost and significantly reduced memory.
>
> I think that's a non-starter due to visibility issues and handling inserts and updates. Even just reusing a hash from one execution in a later execution of the same plan would be tricky since we would have to expire it if the snapshot changes.

If your data set is nearly read-only, materialized views would be a better way to go and would require no hash join changes.

The idea of perfect hash functions for dimension tables is very interesting. If the data set is near-static, it is possible to compute them once in a few minutes' time for a million-tuple table and then re-use them until they change. The research has shown it is possible, but I do not know if anyone has actually implemented it in a real DBMS. An implementation could be something to try if there is interest.

--
Ramon Lawrence
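As a toy illustration of the perfect-hash idea for read-only data, the sketch below brute-forces a multiplicative seed under which a small fixed key set maps to distinct slots, giving chain-free lookups afterwards. Real minimal-perfect-hash construction (CHD, BDZ and friends) scales far better; the seed search, key set, and table size here are assumptions chosen only to keep the example tiny.

    /*
     * Toy perfect-hash sketch for a read-only key set: search for a seed
     * under which every key lands in its own slot, then look keys up with
     * no bucket chains at all.  Brute force is only practical for a
     * handful of keys; it stands in for a real construction algorithm.
     */
    #include <stdio.h>
    #include <string.h>
    #include <stdint.h>
    #include <stdbool.h>

    #define NKEYS   6
    #define NSLOTS  8               /* a bit larger than NKEYS: easier to find a seed */

    static const uint32_t keys[NKEYS] = {101, 205, 310, 447, 512, 999};
    static int slot_of[NSLOTS];     /* index into keys[], or -1 for empty */

    static inline uint32_t
    hash(uint32_t key, uint32_t seed)
    {
        return (key * seed) >> 16 & (NSLOTS - 1);
    }

    /* Try odd seeds until every key lands in a distinct slot. */
    static uint32_t
    find_perfect_seed(void)
    {
        for (uint32_t seed = 1;; seed += 2)
        {
            bool        ok = true;

            memset(slot_of, -1, sizeof(slot_of));
            for (int i = 0; i < NKEYS && ok; i++)
            {
                uint32_t    s = hash(keys[i], seed);

                if (slot_of[s] >= 0)
                    ok = false;     /* collision: reject this seed */
                else
                    slot_of[s] = i;
            }
            if (ok)
                return seed;
        }
    }

    int
    main(void)
    {
        uint32_t    seed = find_perfect_seed();

        printf("seed %u gives a collision-free table:\n", seed);
        for (int i = 0; i < NKEYS; i++)
            printf("  key %u -> slot %u\n", keys[i], hash(keys[i], seed));
        return 0;
    }

The expensive part, finding the seed, is paid once at build time, which is only acceptable because the underlying table is assumed not to change.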
On Fri, Apr 3, 2009 at 12:55 PM, Greg Stark <stark@enterprisedb.com> wrote:
>>> 1. When the hash is not expected to spill to disk, it preserves the pathkeys of the outer side of the join. If the optimizer were allowed to assume that, it could produce significantly more efficient query plans in some cases.
>>
>> This is definitely possible, but you will have to dynamically modify the execution path if the hash join ends up being more than one batch.
>
> Yeah, this item seems to be another case where having a "conditional" branch in the plan would be valuable. What's interesting is that it's branching based on whether the hash has spilled into batches rather than whether the number of rows is greater or less than some breakeven point. It looks like these branch nodes are going to need to know more than just the generic plan parameters. They're going to need to know about specifics like "sort has spilled to disk" or "hash has spilled into batches" etc.

In this particular case, the operation that has to be performed is more specific than "sort", it's "merge this set of sorted tapes". So it's more subtle than just inserting a sort node.

> I like the idea of coalescing hash tables. I'm not sure the order in which the planner decides on things is conducive to being able to make good decisions about it though. Even if we did it afterwards without adjusting the planner it might still be worthwhile though.

I don't see why hash_inner_and_outer can't walk the outer path looking for suitable hashes to reuse. I think the question is how aggressive we want to be in performing that search. If we limit the search to hashes on base tables without restriction conditions, we'll probably catch most of the interesting cases, but obviously not all of them. If we look for hashes on baserels with restriction conditions or hashes on joinrels, etc., we might pick up a few more cases, but at the expense of increased planning time. The advantage of searching in the executor, I suppose, is that the checks don't have to be as cheap, since you're only checking the plan that has already won, rather than lots and lots of potential plans. On the other hand, your costing will be less accurate, which could lead to bad decisions elsewhere.

> Incidentally a similar optimization is possible for materialize or even sorts. They may not come up nearly so often since you would normally want to go around adding indexes if you're repeatedly sorting on the same columns. But it might not be any harder to get them all in one swoop.

At least in my experience, sort and materialize nodes are pretty rare, so I'm not sure it would be worth the time it would take to search for them. In most cases it seems to be cheaper to hash the inner and outer paths than to sort even one of them and then merge-join the result. When I do get these sorts of paths, they tend to be in larger, more complex queries where there's less chance of useful reuse. But my experience might not be representative...

...Robert
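The "walk the outer path" check from idea 3 might look roughly like the sketch below: a recursive search of the outer side for a hash built over the same base relation and hash key. The PlanNode struct is a simplified stand-in, not PostgreSQL's Path/HashPath structures, and the creator_id/last_updater_id scenario from the original mail is only mimicked here with string names.

    /*
     * Sketch of searching an outer path for a reusable hash table.
     * Everything here is a simplified stand-in for planner data structures;
     * how aggressively the real planner could afford this search is the
     * open question in the thread.
     */
    #include <stdio.h>
    #include <string.h>

    typedef enum { NODE_SCAN, NODE_HASH, NODE_JOIN } NodeKind;

    typedef struct PlanNode
    {
        NodeKind    kind;
        const char *relname;        /* NODE_SCAN / NODE_HASH: base relation */
        const char *hashkey;        /* NODE_HASH: hashed column */
        struct PlanNode *left;
        struct PlanNode *right;
    } PlanNode;

    /* Recursively search a path for a hash over (relname, hashkey). */
    static const PlanNode *
    find_reusable_hash(const PlanNode *node, const char *relname, const char *hashkey)
    {
        const PlanNode *found;

        if (node == NULL)
            return NULL;
        if (node->kind == NODE_HASH &&
            strcmp(node->relname, relname) == 0 &&
            strcmp(node->hashkey, hashkey) == 0)
            return node;
        if ((found = find_reusable_hash(node->left, relname, hashkey)) != NULL)
            return found;
        return find_reusable_hash(node->right, relname, hashkey);
    }

    int
    main(void)
    {
        /* outer path: (orders JOIN hash(person.id)), e.g. via creator_id */
        PlanNode    person_hash = {NODE_HASH, "person", "id", NULL, NULL};
        PlanNode    orders_scan = {NODE_SCAN, "orders", NULL, NULL, NULL};
        PlanNode    outer = {NODE_JOIN, NULL, NULL, &orders_scan, &person_hash};

        /* now joining to person(id) again, via last_updater_id: can we reuse? */
        if (find_reusable_hash(&outer, "person", "id"))
            printf("reuse the existing hash of person(id) at zero startup cost\n");
        else
            printf("no reusable hash found; build a new one\n");
        return 0;
    }

Limiting the search to hashes on base tables without restriction conditions, as Robert suggests, just means pruning which NODE_HASH nodes are considered eligible during the walk.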
Robert Haas <robertmhaas@gmail.com> writes:
> I don't see why hash_inner_and_outer can't walk the outer path looking for suitable hashes to reuse. I think the question is how aggressive we want to be in performing that search.

Correct, but you've got the details all wrong. The real problem is that the planner might discard a join path hash(A,B) at level 2 because it loses compared to, say, merge(A,B). But when we get to level three, perhaps hash(hash(A,B),C) would've been the best plan due to synergy of the two hashes. We'll never find that out unless we keep the "inferior" hash path around. We can certainly do that; the question is what's it going to cost us to allow more paths to survive to the next join level. (And I'm afraid the answer may be "plenty"; it's a combinatorial explosion we're looking at here.)

>> Incidentally a similar optimization is possible for materialize or even sorts.

The planner already goes to great lengths to avoid extra sorts for multiple levels of mergejoin ... that's what all the "pathkey" thrashing is about. It's pretty expensive.

			regards, tom lane
On Fri, Apr 3, 2009 at 4:29 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Robert Haas <robertmhaas@gmail.com> writes:
>> I don't see why hash_inner_and_outer can't walk the outer path looking for suitable hashes to reuse. I think the question is how aggressive we want to be in performing that search.
>
> Correct, but you've got the details all wrong. The real problem is that the planner might discard a join path hash(A,B) at level 2 because it loses compared to, say, merge(A,B). But when we get to level three, perhaps hash(hash(A,B),C) would've been the best plan due to synergy of the two hashes. We'll never find that out unless we keep the "inferior" hash path around. We can certainly do that; the question is what's it going to cost us to allow more paths to survive to the next join level. (And I'm afraid the answer may be "plenty"; it's a combinatorial explosion we're looking at here.)

That would be crazy. I think doing it the way I suggested is correct, just not guaranteed to catch every case. The reality is that even if we took Greg Stark's suggestion of detecting this situation only in the executor, we'd still get some benefit out of this. If we take my intermediate approach, we'll catch more cases where this is a win. What you're suggesting here would catch every conceivable case, but at the expense of what I'm sure would be an unacceptable increase in planning time for very limited benefit.

...Robert
Robert Haas <robertmhaas@gmail.com> writes:
> On Fri, Apr 3, 2009 at 4:29 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>> Correct, but you've got the details all wrong. The real problem is that the planner might discard a join path hash(A,B) at level 2 because it loses compared to, say, merge(A,B). But when we get to level three, perhaps hash(hash(A,B),C) would've been the best plan due to synergy of the two hashes. We'll never find that out unless we keep the "inferior" hash path around. We can certainly do that; the question is what's it going to cost us to allow more paths to survive to the next join level. (And I'm afraid the answer may be "plenty"; it's a combinatorial explosion we're looking at here.)

> That would be crazy. I think doing it the way I suggested is correct, just not guaranteed to catch every case. The reality is that even if we took Greg Stark's suggestion of detecting this situation only in the executor, we'd still get some benefit out of this. If we take my intermediate approach, we'll catch more cases where this is a win. What you're suggesting here would catch every conceivable case, but at the expense of what I'm sure would be an unacceptable increase in planning time for very limited benefit.

Maybe, maybe not. I've seen plenty of plans that have several mergejoins stacked up on top of each other with no intervening sorts. There is 0 chance that the planner would have produced that if it thought that it had to re-sort at each level; something else would have looked cheaper. I think that your proposals will end up getting very little of the possible benefit, because the planner will fail to choose plan trees in which the optimization can be exploited.

			regards, tom lane
On 3 Apr 2009, at 19:44, Lawrence, Ramon wrote:
>>> I would be especially interested in using a shared memory hash table that *all* backends can use - if the table is mostly read-only, as dimension tables often are in data warehouse applications. That would give zero startup cost and significantly reduced memory.
>>
>> I think that's a non-starter due to visibility issues and handling inserts and updates. Even just reusing a hash from one execution in a later execution of the same plan would be tricky since we would have to expire it if the snapshot changes.
>
> If your data set is nearly read-only, materialized views would be a better way to go and would require no hash join changes.

I think what he means is that some of the tables in the join are effectively read-only, so materialized views are a no-no here - unless you mean just partial ones.

I have to say that, frankly, I have the same problem, and my schemas could plausibly benefit from the changes discussed here.
Hi,

On 3 Apr 2009, at 22:29, Tom Lane wrote:
> Correct, but you've got the details all wrong. The real problem is that the planner might discard a join path hash(A,B) at level 2 because it loses compared to, say, merge(A,B). But when we get to level three, perhaps hash(hash(A,B),C) would've been the best plan due to synergy of the two hashes. We'll never find that out unless we keep the "inferior" hash path around. We can certainly do that; the question is what's it going to cost us to allow more paths to survive to the next join level. (And I'm afraid the answer may be "plenty"; it's a combinatorial explosion we're looking at here.)

I remember doing a board game simulation project at school, using alpha-beta algorithms with cuts, and minimax as an optimisation too. Those are heuristics, but ones you can decide to run on the full set of possible trees when you want a global optimum rather than a local one.

Now, I don't know the specifics of the planner code, but would it be possible to use a minimax kind of heuristic? Then a planner effort GUC would allow users to choose whether they want to risk the "plenty" combinatorial explosion in some requests.

It could be that the planner is already smarter than this, of course, and I can't even say I'd be surprised about it, but still trying...

--
dim

http://en.wikipedia.org/wiki/Minimax
On Fri, Apr 3, 2009 at 5:10 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Robert Haas <robertmhaas@gmail.com> writes:
>> On Fri, Apr 3, 2009 at 4:29 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>>> Correct, but you've got the details all wrong. The real problem is that the planner might discard a join path hash(A,B) at level 2 because it loses compared to, say, merge(A,B). But when we get to level three, perhaps hash(hash(A,B),C) would've been the best plan due to synergy of the two hashes. We'll never find that out unless we keep the "inferior" hash path around. We can certainly do that; the question is what's it going to cost us to allow more paths to survive to the next join level. (And I'm afraid the answer may be "plenty"; it's a combinatorial explosion we're looking at here.)
>
>> That would be crazy. I think doing it the way I suggested is correct, just not guaranteed to catch every case. The reality is that even if we took Greg Stark's suggestion of detecting this situation only in the executor, we'd still get some benefit out of this. If we take my intermediate approach, we'll catch more cases where this is a win. What you're suggesting here would catch every conceivable case, but at the expense of what I'm sure would be an unacceptable increase in planning time for very limited benefit.
>
> Maybe, maybe not. I've seen plenty of plans that have several mergejoins stacked up on top of each other with no intervening sorts. There is 0 chance that the planner would have produced that if it thought that it had to re-sort at each level; something else would have looked cheaper. I think that your proposals will end up getting very little of the possible benefit, because the planner will fail to choose plan trees in which the optimization can be exploited.

Well, I'm all ears if you have suggestions for improvement. For sorts, we use PathKeys to represent the ordering of each path and keep the paths for each set of pathkeys. By analogy, we could maintain a list of PathHash structures for each path representing the tables that had already been hashed. add_path() would then have to consider both the PathHash structures and the PathKey structures before concluding that a path was definitely worse than some path previously found. At each level of the join tree, we'd need to truncate PathHash structures that provably have no further use (e.g. on a base table that does not appear again above the level of the join already planned) to avoid keeping around paths that appeared to be better only because we didn't know that the paths they have hashed are worthless in practice. Maybe that wouldn't even be that expensive, actually, because there will be lots of cases where you know the relevant table doesn't appear elsewhere in the query and so needn't save any extra paths. But I think we'd have to write the code and benchmark it to really know.

I guess the reason I'm not too worked up about this is because my experience is that the planner nearly always prefers hash joins on small tables, even when an index is present - the queries I'm worried about optimizing don't need any additional encouragement to use hash joins; they're doing it already. But certainly it doesn't hurt to see how many cases we can pick up.

...Robert
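Robert's PathKeys analogy boils down to a stricter dominance test in add_path(): a path may only be discarded if a competitor is no more expensive and also carries at least the same set of already-built hash tables. PathHash does not exist in PostgreSQL; the bitmask of hashed relations below is a hypothetical stand-in for it, sketched only to show the comparison.

    /*
     * Hypothetical add_path()-style dominance check: a path can only make a
     * competitor redundant if it is no more expensive and offers at least
     * the same already-built hashes.  The bitmask is a stand-in for the
     * proposed (non-existent) PathHash list.
     */
    #include <stdio.h>
    #include <stdbool.h>
    #include <stdint.h>

    typedef struct
    {
        double      total_cost;
        uint32_t    hashed_rels;    /* bit i set => a hash of base rel i is available */
    } CandidatePath;

    /* true if 'a' makes 'b' redundant: not costlier, and no hash in 'b' is missing from 'a' */
    static bool
    path_dominates(const CandidatePath *a, const CandidatePath *b)
    {
        if (a->total_cost > b->total_cost)
            return false;
        return (b->hashed_rels & ~a->hashed_rels) == 0;
    }

    int
    main(void)
    {
        CandidatePath merge_ab = {.total_cost = 100.0, .hashed_rels = 0};
        CandidatePath hash_ab  = {.total_cost = 110.0, .hashed_rels = 1 << 2};  /* hash of rel 2 */

        /* cheaper, but it cannot replace hash_ab: the extra hash might pay off later */
        printf("merge dominates hash? %s\n",
               path_dominates(&merge_ab, &hash_ab) ? "yes" : "no");
        return 0;
    }

Tom's worry upthread is precisely that this weaker pruning lets many more paths survive each join level, so the practical question is how cheaply the extra comparisons and surviving paths can be kept in check.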
Are there any TODOs here?

---------------------------------------------------------------------------

Robert Haas wrote:
> On Fri, Apr 3, 2009 at 5:10 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> > Robert Haas <robertmhaas@gmail.com> writes:
> >> On Fri, Apr 3, 2009 at 4:29 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> >>> Correct, but you've got the details all wrong. The real problem is that the planner might discard a join path hash(A,B) at level 2 because it loses compared to, say, merge(A,B). But when we get to level three, perhaps hash(hash(A,B),C) would've been the best plan due to synergy of the two hashes. We'll never find that out unless we keep the "inferior" hash path around. We can certainly do that; the question is what's it going to cost us to allow more paths to survive to the next join level. (And I'm afraid the answer may be "plenty"; it's a combinatorial explosion we're looking at here.)
> >
> >> That would be crazy. I think doing it the way I suggested is correct, just not guaranteed to catch every case. The reality is that even if we took Greg Stark's suggestion of detecting this situation only in the executor, we'd still get some benefit out of this. If we take my intermediate approach, we'll catch more cases where this is a win. What you're suggesting here would catch every conceivable case, but at the expense of what I'm sure would be an unacceptable increase in planning time for very limited benefit.
> >
> > Maybe, maybe not. I've seen plenty of plans that have several mergejoins stacked up on top of each other with no intervening sorts. There is 0 chance that the planner would have produced that if it thought that it had to re-sort at each level; something else would have looked cheaper. I think that your proposals will end up getting very little of the possible benefit, because the planner will fail to choose plan trees in which the optimization can be exploited.
>
> Well, I'm all ears if you have suggestions for improvement. For sorts, we use PathKeys to represent the ordering of each path and keep the paths for each set of pathkeys. By analogy, we could maintain a list of PathHash structures for each path representing the tables that had already been hashed. add_path() would then have to consider both the PathHash structures and the PathKey structures before concluding that a path was definitely worse than some path previously found. At each level of the join tree, we'd need to truncate PathHash structures that provably have no further use (e.g. on a base table that does not appear again above the level of the join already planned) to avoid keeping around paths that appeared to be better only because we didn't know that the paths they have hashed are worthless in practice. Maybe that wouldn't even be that expensive, actually, because there will be lots of cases where you know the relevant table doesn't appear elsewhere in the query and so needn't save any extra paths. But I think we'd have to write the code and benchmark it to really know.
>
> I guess the reason I'm not too worked up about this is because my experience is that the planner nearly always prefers hash joins on small tables, even when an index is present - the queries I'm worried about optimizing don't need any additional encouragement to use hash joins; they're doing it already. But certainly it doesn't hurt to see how many cases we can pick up.
>
> ...Robert

--
Bruce Momjian  <bruce@momjian.us>        http://momjian.us
EnterpriseDB                             http://enterprisedb.com

+ If your life is a hard drive, Christ can be your backup. +
On Tue, Apr 7, 2009 at 9:55 AM, Bruce Momjian <bruce@momjian.us> wrote:
> Are there any TODOs here?

I'd say that all of the items listed in my original email could be TODOs. I'm planning to work on as many of them as I have time for. Ramon Lawrence is also working on some related ideas, as discussed upthread. AFAICS no one has expressed the idea that anything that's been talked about is a bad idea, so it's just a question of finding enough round tuits.

...Robert
Robert Haas wrote:
> On Tue, Apr 7, 2009 at 9:55 AM, Bruce Momjian <bruce@momjian.us> wrote:
> > Are there any TODOs here?
>
> I'd say that all of the items listed in my original email could be TODOs. I'm planning to work on as many of them as I have time for. Ramon Lawrence is also working on some related ideas, as discussed upthread. AFAICS no one has expressed the idea that anything that's been talked about is a bad idea, so it's just a question of finding enough round tuits.

OK, would you please add them to the Index/Hash section of the TODO list; I am afraid I will not do them justice.

--
Bruce Momjian  <bruce@momjian.us>        http://momjian.us
EnterpriseDB                             http://enterprisedb.com

+ If your life is a hard drive, Christ can be your backup. +
On Tue, Apr 7, 2009 at 5:11 PM, Bruce Momjian <bruce@momjian.us> wrote:
> Robert Haas wrote:
>> On Tue, Apr 7, 2009 at 9:55 AM, Bruce Momjian <bruce@momjian.us> wrote:
>> > Are there any TODOs here?
>>
>> I'd say that all of the items listed in my original email could be TODOs. I'm planning to work on as many of them as I have time for. Ramon Lawrence is also working on some related ideas, as discussed upthread. AFAICS no one has expressed the idea that anything that's been talked about is a bad idea, so it's just a question of finding enough round tuits.
>
> OK, would you please add them to the Index/Hash section of the TODO list; I am afraid I will not do them justice.

I think perhaps Optimizer / Executor would be more appropriate, since these are not about hash indices but rather about hash joins. I will look at doing that.

Also I think the last item under Index / Hash is actually NOT done, and belongs in the main index section rather than Index / Hash.

The first item in the Index / Hash section doesn't really look like a TODO, or at the very least it's unclear what the action item is supposed to be.

...Robert
Robert Haas wrote:
> On Tue, Apr 7, 2009 at 5:11 PM, Bruce Momjian <bruce@momjian.us> wrote:
> > Robert Haas wrote:
> >> On Tue, Apr 7, 2009 at 9:55 AM, Bruce Momjian <bruce@momjian.us> wrote:
> >> > Are there any TODOs here?
> >>
> >> I'd say that all of the items listed in my original email could be TODOs. I'm planning to work on as many of them as I have time for. Ramon Lawrence is also working on some related ideas, as discussed upthread. AFAICS no one has expressed the idea that anything that's been talked about is a bad idea, so it's just a question of finding enough round tuits.
> >
> > OK, would you please add them to the Index/Hash section of the TODO list; I am afraid I will not do them justice.
>
> I think perhaps Optimizer / Executor would be more appropriate, since these are not about hash indices but rather about hash joins. I will look at doing that.

Yes, please.

> Also I think the last item under Index / Hash is actually NOT done, and belongs in the main index section rather than Index / Hash.

Yep, I didn't realize that editing "Index" also does the subsections, while editing the subsections doesn't edit the upper level.

> The first item in the Index / Hash section doesn't really look like a TODO, or at the very least it's unclear what the action item is supposed to be.

Yep, remove, thanks.

--
Bruce Momjian  <bruce@momjian.us>        http://momjian.us
EnterpriseDB                             http://enterprisedb.com

+ If your life is a hard drive, Christ can be your backup. +
On 2009-04-03, Simon Riggs <simon@2ndQuadrant.com> wrote:
> On Fri, 2009-04-03 at 18:03 +0100, Greg Stark wrote:
>> I wonder if we need a whole class of index algorithms to deal specifically with read-only tables
>
> I think we can drop the word "index" from the sentence as well.
>
> "Read-only" isn't an isolated case. Often you find many read-only tables alongside rapidly changing tables. So even the busiest of databases can benefit from read-only optimisations. So I want MVCC *and* read-only, not MVCC everywhere (or MVCC nowhere if the customer changes horses to get read-only benefits elsewhere).
>
> Having changes to those tables cause much heavier additional work is OK, if judged on a cost/benefit basis. So the case I care about ought to be called "read-mostly", but we're talking read:write ratios of millions:1.

For the record and in case anyone gets interested in following up the idea of "read only" tables, see also:

http://thread.gmane.org/gmane.comp.db.postgresql.devel.general/76366
On Tue, Apr 7, 2009 at 5:57 PM, Bruce Momjian <bruce@momjian.us> wrote:
>> I think perhaps Optimizer / Executor would be more appropriate, since these are not about hash indices but rather about hash joins. I will look at doing that.
>
> Yes, please.

Done. See what you think...

>> Also I think the last item under Index / Hash is actually NOT done, and belongs in the main index section rather than Index / Hash.
>
> Yep, I didn't realize that editing "Index" also does the subsections, while editing the subsections doesn't edit the upper level.

Heh. I'd write some webapps to do some of these things, but I haven't been able to interest anyone in providing me with a postgresql.org-based hosting arrangement.

>> The first item in the Index / Hash section doesn't really look like a TODO, or at the very least it's unclear what the action item is supposed to be.
>
> Yep, remove, thanks.

...Robert