Optimizer cleanup to avoid redundant work on joins - Mailing list pgsql-hackers
From | Tom Lane |
---|---|
Subject | Optimizer cleanup to avoid redundant work on joins |
Date | |
Msg-id | 18496.949803725@sss.pgh.pa.us Whole thread Raw |
Responses |
Re: [HACKERS] Optimizer cleanup to avoid redundant work on joins
|
List | pgsql-hackers |
(This is mostly directed at Bruce, but anyone else who's looked at the planner/optimizer is welcome to chime in.) On the way to implementing estimates of WHERE-clause costs, I was forced to notice that the estimated sizes of join relations are set *after* all the planning work is done for a rel, instead of before, which makes it hard to use the info for cost estimation :-(. In looking at whether the sizes couldn't be set earlier, I saw that trying to set them early enough to be used in planning would result in duplicate effort; in fact there is a lot of duplicated effort already. This is a proposal to rearrange the optimizer to clean that up. Right now, the sequence of events in constructing a join tree is that at each join level, make_one_rel_by_joins invokes make_rels_by_joins to prepare a list of the joins to be considered at this level (where a "join" means a particular outer rel and inner rel, and each "rel" might consist of several already-joined base relations). Each join is represented by a RelOptInfo node constructed by make_join_rel (see joinrels.c for these routines). Then update_rels_pathlist_for_joins is called to determine, for each of these joins, the best implementation or "path". Finally, since different "joins" in this sense may represent the same set of joined base relations, merge_rels_with_same_relids is called to match equivalent joinrels together and keep just the best path for each equivalence class of joinrels. My beef with this is that we should never be generating distinct RelOptInfos in the first place for different ways of producing the same join relation. make_join_rel spends a fair amount of time and memory space to produce the RelOptInfo and its substructure, and for an N-component joinrel this price will be paid (at least) N times over, after which we throw away all but one of the copies. Even more critically, once one joinrel has been completed for a particular set of base rels, the implementation paths for other equivalent joinrels will be considered in a vacuum --- we may have already discovered a path that will dominate many of the paths for other ways of building the same join relation, but because that path isn't available to add_pathlist when we are looking at another joinrel, we will have to keep paths that could have been discarded instantly. It seems to me that join RelOptInfos should be managed in the same way as base-relation RelOptInfos: there ought never be more than one of them for a given set of Relids. When we are considering a new pair of outer and inner rels that can produce an already-considered join relation, we should find the existing RelOptInfo for that join relation. Then add_pathlist will keep proposed paths only if they survive comparison against paths already found from the earlier ways of generating the same join relation. This looks like it should be a fairly straightforward change: we should meld make_join_rel and get_join_rel so that a requested join rel is first searched for in root->join_rel_list, and only built if not present (like get_base_rel). The join rel would have a flat relids list from the beginning, since it would be agnostic about which inner and outer subset rels should be used to produce it. Then update_rels_pathlist_for_joins would be called *immediately* to process just that one join rel, passing the outer and inner subset relid lists as separate parameters. It would generate paths using this pair of outer and inner rels, and would add them to the join rel's path list only if they survive comparison against all the already-found paths for that join rel. On return from make_rels_by_joins, all the work is done, so make_one_rel_by_joins doesn't need to invoke either update_rels_pathlist_for_joins or merge_rels_with_same_relids (the latter routine disappears entirely). We have but to invoke rels_set_cheapest and then advance to the next level of joining. With this change, I could add more processing to make_join_rel to set the estimated output rel size, without fear that it would be repeated uselessly. Anyone see a problem with this, or have a better idea about how to do it? regards, tom lane
pgsql-hackers by date: