Optimizer cleanup to avoid redundant work on joins - Mailing list pgsql-hackers

From Tom Lane
Subject Optimizer cleanup to avoid redundant work on joins
Date
Msg-id 18496.949803725@sss.pgh.pa.us
Whole thread Raw
Responses Re: [HACKERS] Optimizer cleanup to avoid redundant work on joins
List pgsql-hackers
(This is mostly directed at Bruce, but anyone else who's looked at the
planner/optimizer is welcome to chime in.)

On the way to implementing estimates of WHERE-clause costs, I was
forced to notice that the estimated sizes of join relations are set
*after* all the planning work is done for a rel, instead of before,
which makes it hard to use the info for cost estimation :-(.
In looking at whether the sizes couldn't be set earlier, I saw that
trying to set them early enough to be used in planning would result in
duplicate effort; in fact there is a lot of duplicated effort already.
This is a proposal to rearrange the optimizer to clean that up.

Right now, the sequence of events in constructing a join tree is that
at each join level, make_one_rel_by_joins invokes make_rels_by_joins
to prepare a list of the joins to be considered at this level (where
a "join" means a particular outer rel and inner rel, and each "rel"
might consist of several already-joined base relations).  Each join
is represented by a RelOptInfo node constructed by make_join_rel (see
joinrels.c for these routines).  Then update_rels_pathlist_for_joins is
called to determine, for each of these joins, the best implementation or
"path".  Finally, since different "joins" in this sense may represent
the same set of joined base relations, merge_rels_with_same_relids is
called to match equivalent joinrels together and keep just the best path
for each equivalence class of joinrels.

My beef with this is that we should never be generating distinct
RelOptInfos in the first place for different ways of producing the
same join relation.  make_join_rel spends a fair amount of time and
memory space to produce the RelOptInfo and its substructure, and
for an N-component joinrel this price will be paid (at least) N times
over, after which we throw away all but one of the copies.   Even more
critically, once one joinrel has been completed for a particular set
of base rels, the implementation paths for other equivalent joinrels
will be considered in a vacuum --- we may have already discovered a
path that will dominate many of the paths for other ways of building
the same join relation, but because that path isn't available to
add_pathlist when we are looking at another joinrel, we will have to
keep paths that could have been discarded instantly.

It seems to me that join RelOptInfos should be managed in the same way
as base-relation RelOptInfos: there ought never be more than one of them
for a given set of Relids.  When we are considering a new pair of outer
and inner rels that can produce an already-considered join relation,
we should find the existing RelOptInfo for that join relation.  Then
add_pathlist will keep proposed paths only if they survive comparison
against paths already found from the earlier ways of generating the same
join relation.

This looks like it should be a fairly straightforward change: we should
meld make_join_rel and get_join_rel so that a requested join rel is
first searched for in root->join_rel_list, and only built if not present
(like get_base_rel).  The join rel would have a flat relids list from the
beginning, since it would be agnostic about which inner and outer subset
rels should be used to produce it.  Then update_rels_pathlist_for_joins
would be called *immediately* to process just that one join rel, passing
the outer and inner subset relid lists as separate parameters.  It would
generate paths using this pair of outer and inner rels, and would add
them to the join rel's path list only if they survive comparison against
all the already-found paths for that join rel.  On return from
make_rels_by_joins, all the work is done, so make_one_rel_by_joins
doesn't need to invoke either update_rels_pathlist_for_joins or
merge_rels_with_same_relids (the latter routine disappears entirely).
We have but to invoke rels_set_cheapest and then advance to the next
level of joining.

With this change, I could add more processing to make_join_rel to set
the estimated output rel size, without fear that it would be repeated
uselessly.

Anyone see a problem with this, or have a better idea about how to do
it?
        regards, tom lane


pgsql-hackers by date:

Previous
From: clyde jones
Date:
Subject: Re: Test
Next
From: Tatsuo Ishii
Date:
Subject: pg_ctl man page