Thread: Optimizer speed and GEQO (was: nested loops in joins)
I have been looking at optimizer runtime using Charles Hornberger's example case, and what I find is that the number of tables involved in the query is a very inadequate predictor of the optimizer's runtime. Thus, it's not surprising that we are getting poor results from using the number of tables as the GEQO threshold measure.

I started with Charles' test case as presented, then simplified it by removing indexes from the database. This didn't change the final plan, which was a straight nested loop over sequential scans anyway (because the tables were so small), but it drastically reduced the number of cases that the optimizer looks at.

Original test case: the query involves seven tables with a total of twelve indexes (six on the primary table and one on each other table).

7-index test case: I removed all but one of the indexes on the primary table, leaving one index per table.

No-index test case: I removed all indexes.

It took a certain amount of patience to run these cases with profiling turned on :-(, but here are the results:

                             7T+12I      7T+7I        7T

    Runtime, sec               1800        457         59
    equal() calls           585 mil    157 mil   21.7 mil
    better_path() calls       72546      37418      14025
    bp->path_is_cheaper         668        668        668
    create_hashjoin_path        198        198        198
    create_mergejoin_path      2358        723        198
    create_nestloop_path      38227      20297       7765

Next, I removed the last table from Charles' query, producing these cases:

                             6T+11I      6T+6I        6T

    Runtime, sec                 34         12        2.3
    equal() calls          14.3 mil    4.7 mil   0.65 mil
    better_path() calls       10721       6172       2443
    bp->path_is_cheaper         225        225        225
    create_hashjoin_path         85         85         85
    create_mergejoin_path       500        236         85
    create_nestloop_path       5684       3344       1354

The 5T+10I case is down to a couple of seconds of runtime, so I didn't bother to do profiles for five tables.

A fairly decent approximation is that the runtime varies as the square of the number of create_nestloop_path calls. That number in turn seems to vary as the factorial of the number of tables, with a weaker but still strong term involving the number of indexes. I understand the square and the factorial terms, but the form of the dependency on the index count isn't really clear.

It might be worthwhile to try a GEQO threshold based on the number of tables plus half the number of indexes on those tables. I have no idea where to find the number of indexes, so I'll just offer the idea for someone else to try.

The main thing that jumps out from the profiles is that on these complex searches, the optimizer spends practically *all* of its time down inside better_path, scanning the list of already-known access paths to see if a proposed new path is a duplicate of one already known. (Most of the time it is not, as shown by the small number of times path_is_cheaper gets called from better_path.) This scanning is O(N^2) in the number of paths considered, whereas it seems that all the other work is no worse than O(N) in the number of paths. The bottom of the duplicate check is equal(), which as you can see is getting called a horrendous number of times.

It would be worth our while to try to eliminate this mostly-unsuccessful comparison operation.

I wonder whether it would work to simply always add the new path to the list, and rely on the later "prune" step to get rid of duplicates? The prune code is evidently far more efficient at getting rid of useless entries than better_path is, because it's nowhere near the top of the profile even though it's processing nearly as many paths.

Anybody here know much about how the optimizer works? I'm hesitant to hack on it myself.

			regards, tom lane
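To make the access pattern concrete, here is a minimal, self-contained C sketch of the duplicate scan described above. It is illustrative only (the structure and function names are made up, not the real better_path()/equal() code), but it shows why N mostly-unique proposals cost on the order of N^2/2 equality tests:

    /*
     * Sketch of the duplicate scan: every proposed path is compared against
     * the whole list of paths already kept, so N proposals cost O(N^2)
     * equality tests even though almost none of them are duplicates.
     */
    #include <stdio.h>
    #include <stdlib.h>

    typedef struct Path
    {
        struct Path *next;
        int         signature;      /* stands in for the node tree that equal() walks */
        double      cost;
    } Path;

    static long equal_calls = 0;

    /* stand-in for equal(); the real one recursively compares whole path trees */
    static int
    paths_equal(const Path *a, const Path *b)
    {
        equal_calls++;
        return a->signature == b->signature;
    }

    /* add new_path unless an equal one is already known: one linear scan per call */
    static Path *
    add_unless_duplicate(Path *pathlist, Path *new_path)
    {
        for (Path *p = pathlist; p != NULL; p = p->next)
            if (paths_equal(p, new_path))
                return pathlist;    /* duplicate: discard the proposal */
        new_path->next = pathlist;  /* unique: keep it */
        return new_path;
    }

    int
    main(void)
    {
        Path   *list = NULL;
        int     n = 10000;

        for (int i = 0; i < n; i++) /* mostly-unique proposals, as in the profiles */
        {
            Path   *p = malloc(sizeof(Path));

            p->next = NULL;
            p->signature = i;
            p->cost = (double) i;
            list = add_unless_duplicate(list, p);
        }
        printf("%d paths proposed, %ld equal() calls\n", n, equal_calls);
        return 0;
    }

For 10000 mostly-unique paths this makes roughly 50 million equality tests, which is the same shape of blow-up the profile numbers show.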
> It took a certain amount of patience to run these cases with profiling
> turned on :-(, but here are the results:

Interesting numbers.

>                              7T+12I      7T+7I        7T
>
> Runtime, sec                   1800        457         59
> equal() calls               585 mil    157 mil   21.7 mil

> A fairly decent approximation is that the runtime varies as the
> square of the number of create_nestloop_path calls. That number
> in turn seems to vary as the factorial of the number of tables,
> with a weaker but still strong term involving the number of indexes.
> I understand the square and the factorial terms, but the form of
> the dependency on the index count isn't really clear.

This is exactly what I would expect. See optimizer/README. The factorial makes a lot of sense: the optimizer evaluates all the joins, removes the cheapest from the list, and goes off to find the next best one, so you get 8 compares, then 7 compares, then 6 compares, and so on.

Here is a good illustration of what you are seeing:

    6!      720
    8!    40320

That's why I reduced the GEQO threshold from 8 to 6.

> It might be worthwhile to try a GEQO threshold based on the number of
> tables plus half the number of indexes on those tables. I have no idea
> where to find the number of indexes, so I'll just offer the idea for
> someone else to try.

Sounds like a good idea. I can easily get that information; the optimizer computes it lower down in the code. Perhaps we can just move the GEQO test to after the index paths are created. I will be able to look at it after the TEMP stuff.

> The main thing that jumps out from the profiles is that on these complex
> searches, the optimizer spends practically *all* of its time down inside
> better_path, scanning the list of already known access paths to see if a
> proposed new path is a duplicate of one already known. (Most of the time
> it is not, as shown by the small number of times path_is_cheaper gets
> called from better_path.) This scanning is O(N^2) in the number of paths
> considered, whereas it seems that all the other work is no worse than O(N)
> in the number of paths. The bottom of the duplicate check is equal(),
> which as you can see is getting called a horrendous number of times.
>
> It would be worth our while to try to eliminate this mostly-unsuccessful
> comparison operation.
>
> I wonder whether it would work to simply always add the new path to the
> list, and rely on the later "prune" step to get rid of duplicates?
> The prune code is evidently far more efficient at getting rid of useless
> entries than better_path is, because it's nowhere near the top of the
> profile even though it's processing nearly as many paths.

I think I optimized prune.c quite a bit around 6.1. I broke it during the optimization, and Vadim fixed it for me.

One of my TODO items has been to understand the optimizer 100%. Haven't had time to do that yet; it's been on my list for a while.

> Anybody here know much about how the optimizer works? I'm hesitant to
> hack on it myself.

Let me help you.

--
  Bruce Momjian                        |  http://www.op.net/~candle
  maillist@candle.pha.pa.us            |  (610) 853-3000
  +  If your life is a hard drive,     |  830 Blythe Avenue
  +  Christ can be your backup.        |  Drexel Hill, Pennsylvania 19026
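The factorial blow-up is easy to see in isolation; the following few lines of C are plain arithmetic (nothing from the optimizer itself) and print the number of join orderings for each table count:

    /* join orderings of n tables grow as n!, hence the GEQO threshold move from 8 to 6 */
    #include <stdio.h>

    int
    main(void)
    {
        long    orderings = 1;

        for (int n = 1; n <= 8; n++)
        {
            orderings *= n;
            if (n >= 5)
                printf("%d tables: %ld join orderings\n", n, orderings);
        }
        return 0;
    }

This prints 120, 720, 5040, and 40320 for five through eight tables, matching the 6! and 8! figures above.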
OK, I have modified the optimizer to count not only the tables, but also the indexes. The patch is applied. The code is:

    {
        List       *temp;
        int         paths_to_consider = 0;

        foreach(temp, outer_rels)
        {
            RelOptInfo *rel = (RelOptInfo *) lfirst(temp);

            paths_to_consider += length(rel->pathlist);
        }

        if ((_use_geqo_) && paths_to_consider >= _use_geqo_rels_)
            /* returns _one_ RelOptInfo, so lcons it */
            return lcons(geqo(root), NIL);
    }

It is my understanding that it is the size of the pathlist that determines the complexity of the optimization. (Wow, I actually understood that sentence.)

Tom, can you tell me if this looks good, and also give a suggestion for a possible default value for the GEQO threshold? My guess is that 6 is now too low. Sounds like you have a good testbed to do this.

Old posting attached. I will look at the functions you mentioned for possible improvement.

> I have been looking at optimizer runtime using Charles Hornberger's
> example case, and what I find is that the number of tables involved in
> the query is a very inadequate predictor of the optimizer's runtime.
> Thus, it's not surprising that we are getting poor results from using
> number of tables as the GEQO threshold measure.
> ...

--
  Bruce Momjian  |  maillist@candle.pha.pa.us
> The main thing that jumps out from the profiles is that on these complex
> searches, the optimizer spends practically *all* of its time down inside
> better_path, scanning the list of already known access paths to see if a
> proposed new path is a duplicate of one already known. (Most of the time
> it is not, as shown by the small number of times path_is_cheaper gets
> called from better_path.) This scanning is O(N^2) in the number of paths
> considered, whereas it seems that all the other work is no worse than O(N)
> in the number of paths. The bottom of the duplicate check is equal(),
> which as you can see is getting called a horrendous number of times.
>
> It would be worth our while to try to eliminate this mostly-unsuccessful
> comparison operation.
>
> I wonder whether it would work to simply always add the new path to the
> list, and rely on the later "prune" step to get rid of duplicates?
> The prune code is evidently far more efficient at getting rid of useless
> entries than better_path is, because it's nowhere near the top of the
> profile even though it's processing nearly as many paths.

You are definitely on to something here. I have added comments and more to the optimizer README as I continue exploring options.

--
  Bruce Momjian  |  maillist@candle.pha.pa.us
> The main thing that jumps out from the profiles is that on these complex
> searches, the optimizer spends practically *all* of its time down inside
> better_path, scanning the list of already known access paths to see if a
> proposed new path is a duplicate of one already known. (Most of the time
> it is not, as shown by the small number of times path_is_cheaper gets
> called from better_path.) This scanning is O(N^2) in the number of paths
> considered, whereas it seems that all the other work is no worse than O(N)
> in the number of paths. The bottom of the duplicate check is equal(),
> which as you can see is getting called a horrendous number of times.

The ironic thing about add_pathlist() is that the slowness is due to the code using a nested loop to merge duplicate RelOptInfo paths, rather than some sort of mergejoin. I will keep studying it, but from your comments, I can see you understood the code much better than I had; I am only just now catching up to your conclusions.

--
  Bruce Momjian  |  maillist@candle.pha.pa.us
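Bruce mentions a mergejoin-style merge as the alternative to add_pathlist()'s nested loop; hashing the fields that equal() would compare is another way to get rid of the quadratic scan. Below is a rough standalone sketch of that idea, with made-up structure and function names rather than the optimizer's real API:

    /*
     * Alternative to the nested-loop duplicate merge: hash the fields that
     * equal() would compare and probe a hash set in O(1) expected time.
     * All names here are invented for the sketch.
     */
    #include <stdio.h>

    #define TABLE_SIZE 4096                 /* power of two, for cheap masking */

    typedef struct PathSig
    {
        unsigned int relids;                /* which relations the path joins */
        unsigned int ordering;              /* stand-in for the path's sort order */
    } PathSig;

    static PathSig  table[TABLE_SIZE];
    static int      used[TABLE_SIZE];

    static unsigned int
    sig_hash(const PathSig *s)
    {
        return (s->relids * 2654435761u) ^ (s->ordering * 40503u);
    }

    /* returns 1 if an equal signature was already present, else inserts it and returns 0 */
    static int
    seen_before(const PathSig *s)
    {
        unsigned int i = sig_hash(s) & (TABLE_SIZE - 1);

        while (used[i])
        {
            if (table[i].relids == s->relids && table[i].ordering == s->ordering)
                return 1;                   /* duplicate found without a list scan */
            i = (i + 1) & (TABLE_SIZE - 1); /* linear probing */
        }
        table[i] = *s;
        used[i] = 1;
        return 0;
    }

    int
    main(void)
    {
        PathSig     a = {0x03, 1};          /* join of rels 1+2, ordered on key 1 */
        PathSig     b = {0x03, 1};          /* the same path proposed a second time */

        printf("first time seen before?  %d\n", seen_before(&a));  /* 0 */
        printf("second time seen before? %d\n", seen_before(&b));  /* 1 */
        return 0;
    }

The point is only that the duplicate check need not cost a walk over every known path; whether hashing or sorting is the better fit for the real path structures is a separate question.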
> I have been looking at optimizer runtime using Charles Hornberger's
> example case, and what I find is that the number of tables involved in
> the query is a very inadequate predictor of the optimizer's runtime.
> Thus, it's not surprising that we are getting poor results from using
> number of tables as the GEQO threshold measure.

Still digging into the optimizer, but if you want some real eye-opening stuff, set OPTIMIZER_DEBUG and look in the postmaster log. A six-table join generates 55k lines of debug info, very nicely formatted. It shows what we are up against.

--
  Bruce Momjian  |  maillist@candle.pha.pa.us
> I have been looking at optimizer runtime using Charles Hornberger's
> example case, and what I find is that the number of tables involved in
> the query is a very inadequate predictor of the optimizer's runtime.
> Thus, it's not surprising that we are getting poor results from using
> number of tables as the GEQO threshold measure.

OK, now I have a specific optimizer question. I looked at all references to RelOptInfo.pathlist, and though it gets very long and is hard to check for uniqueness, the only thing I see it used for is finding the cheapest path.

Why are we maintaining this huge path list for every RelOptInfo structure if we only need the cheapest? Why not store only the cheapest plan, instead of generating all unique plans and then using only the cheapest?

--
  Bruce Momjian  |  maillist@candle.pha.pa.us
> Why are we maintaining this huge Path List for every RelOptInfo
> structure if we only need the cheapest? Why not store only the
> cheapest plan, instead of generating all unique plans and then using
> only the cheapest?

Just guessing here: does it use this same list to determine if a new plan is a duplicate of a previously generated plan? Of course, maybe that is not important, since any *cheaper* plan should be different from any existing plan, and any plan of the same cost or higher could be rejected.

Perhaps having the entire list of plans available makes it easier to debug, especially when the stuff was in Lisp (since in that environment it is easy to traverse and manipulate these lists interactively)...

                      - Tom
>> Why are we maintaining this huge Path List for every RelOptInfo
>> structure if we only need the cheapest?

I think Bruce is onto something here... keep only the best-so-far, not everything ever generated. That'd get rid of the O(N^2) comparison behavior.

The only thing I can think of that we might possibly *need* the whole list for is if it is used as a recursion stopper. ("Oh, I already investigated that alternative, no need to go down that path again.") It did not look to me like the optimizer does such a thing, but I don't claim to understand the code.

It seems to me that the search space of possible paths is well-structured and can be enumerated without generation of duplicates. You've got a known set of tables involved in a query, a fixed set of possible access methods for each table, and only so many ways to combine them. So the real question here is why the optimizer even needs to check for duplicates --- shouldn't it be possible to never generate any to begin with? The profiles I ran a few days ago show that most of the generated paths are indeed not duplicates, but a small fraction are. Why is that? I'd feel a lot closer to understanding what's going on if we could explain where the duplicates come from.

"Thomas G. Lockhart" <lockhart@alumni.caltech.edu> writes:
> Perhaps having the entire list of plans available makes it easier to
> debug, especially when the stuff was in Lisp (since in that environment
> it is easy to traverse and manipulate these lists interactively)...

The optimizer's Lisp heritage is pretty obvious, and in that culture building a list of everything you're interested in is just The Way To Do Things. I doubt anyone realized that keeping the whole list would turn out to be a performance bottleneck.

			regards, tom lane
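The keep-only-the-best-so-far idea can be sketched in a few lines of C. The names and structures below are illustrative, not the real RelOptInfo code, and (as the next message points out) it ignores paths that are worth keeping for their sort order:

    /*
     * Keep-only-the-cheapest sketch: instead of appending every surviving
     * path to rel->pathlist and pruning later, remember a single cheapest
     * path and replace it whenever a cheaper one shows up.  O(1) per
     * proposal, no equal() calls, no duplicate scan at all.
     */
    #include <stdio.h>

    typedef struct Path
    {
        double  cost;
        /* ordering keys, join tree, etc. would live here */
    } Path;

    typedef struct RelOptInfo
    {
        Path   *cheapest;               /* replaces the full pathlist in this sketch */
    } RelOptInfo;

    static void
    consider_path(RelOptInfo *rel, Path *new_path)
    {
        if (rel->cheapest == NULL || new_path->cost < rel->cheapest->cost)
            rel->cheapest = new_path;   /* new winner */
        /* else: discard; a duplicate can never beat the copy already kept */
    }

    int
    main(void)
    {
        RelOptInfo  rel = {NULL};
        Path        a = {10.0};
        Path        b = {5.0};
        Path        c = {7.0};

        consider_path(&rel, &a);
        consider_path(&rel, &b);
        consider_path(&rel, &c);
        printf("cheapest cost kept: %g\n", rel.cheapest->cost);     /* prints 5 */
        return 0;
    }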
> >> Why are we maintaining this huge Path List for every RelOptInfo
> >> structure if we only need the cheapest?
>
> I think Bruce is onto something here... keep only the best-so-far,
> not everything ever generated. That'd get rid of the O(N^2)
> comparison behavior.
>
> The only thing I can think of that we might possibly *need* the
> whole list for is if it is used as a recursion stopper.
> ("Oh, I already investigated that alternative, no need to go down
> that path again.") It did not look to me like the optimizer does
> such a thing, but I don't claim to understand the code.
>
> It seems to me that the search space of possible paths is
> well-structured and can be enumerated without generation of duplicates.
> You've got a known set of tables involved in a query, a fixed set of
> possible access methods for each table, and only so many ways to
> combine them. So the real question here is why the optimizer even
> needs to check for duplicates --- shouldn't it be possible to never
> generate any to begin with? The profiles I ran a few days ago show
> that most of the generated paths are indeed not duplicates, but a
> small fraction are. Why is that? I'd feel a lot closer to understanding
> what's going on if we could explain where the duplicates come from.

Here is my guess, and I think it is valid. There are cases where a join of two tables may be more expensive than another plan, but the ordering of its result may be cheaper to use for a later join than the cheaper plan's ordering. I suspect that is why better_path() does this:

    if (samekeys(path->keys, new_path->keys) &&
        equal_path_ordering(&path->p_ordering,
                            &new_path->p_ordering))
    {
        old_path = path;
        break;
    }

So I don't think we can grab the cheapest path right away; I think the system is keeping around non-cheapest paths that produce the same result ordering. I would also like to throw away plans that are so expensive that sorting another plan into the desired ordering would be cheaper than using them.

I am only guessing on this. How often does this "keep non-cheapest paths around" behavior really help?

--
  Bruce Momjian  |  maillist@candle.pha.pa.us
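Bruce's guess suggests a middle ground between the full pathlist and a single cheapest path: keep one cheapest path per distinct result ordering. A rough sketch with illustrative names (not the actual optimizer structures):

    /*
     * Cheapest-per-ordering sketch: keep one cheapest path for each distinct
     * result ordering, so orderings that may help a later join survive, but
     * everything else is discarded immediately.  The scan below is over
     * distinct orderings only, which stays short.
     */
    #include <stdio.h>

    typedef struct Path
    {
        struct Path *next;
        int         ordering;           /* stand-in for samekeys()/p_ordering identity */
        double      cost;
    } Path;

    typedef struct RelOptInfo
    {
        Path   *pathlist;               /* at most one entry per ordering */
    } RelOptInfo;

    static void
    keep_cheapest_per_ordering(RelOptInfo *rel, Path *new_path)
    {
        Path  **link;

        for (link = &rel->pathlist; *link != NULL; link = &(*link)->next)
        {
            Path   *old = *link;

            if (old->ordering == new_path->ordering)
            {
                if (new_path->cost < old->cost)
                {
                    new_path->next = old->next;     /* cheaper: replace in place */
                    *link = new_path;
                }
                return;                             /* same ordering, not cheaper: discard */
            }
        }
        new_path->next = NULL;                      /* genuinely new ordering: keep it */
        *link = new_path;
    }

    int
    main(void)
    {
        RelOptInfo  rel = {NULL};
        Path        a = {NULL, 1, 10.0};    /* ordering 1, cost 10 */
        Path        b = {NULL, 1, 6.0};     /* same ordering, cheaper: replaces a */
        Path        c = {NULL, 2, 20.0};    /* new ordering: kept despite higher cost */

        keep_cheapest_per_ordering(&rel, &a);
        keep_cheapest_per_ordering(&rel, &b);
        keep_cheapest_per_ordering(&rel, &c);

        for (Path *p = rel.pathlist; p != NULL; p = p->next)
            printf("ordering %d, cost %g\n", p->ordering, p->cost);
        return 0;
    }

Bruce's further idea (drop any path whose cost exceeds the cheapest path plus the cost of sorting it into the desired ordering) would bound the list even more tightly, but that needs the sort-cost estimate at hand when the path is proposed.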