Re: AW: Missing constant propagation in planner on hash quals causes join slowdown - Mailing list pgsql-hackers
From | Tom Lane |
---|---|
Subject | Re: AW: Missing constant propagation in planner on hash quals causes join slowdown |
Date | |
Msg-id | 7771.1576452845@sss.pgh.pa.us Whole thread Raw |
In response to | AW: Missing constant propagation in planner on hash quals causes joinslowdown (Hans Buschmann <buschmann@nidsa.net>) |
List | pgsql-hackers |
Hans Buschmann <buschmann@nidsa.net> writes: > The planner correctly sets the index conditions (knows that the xx_season columns are constant), but fails to apply thisconstantness to the hash conditions by discarding a constant column in a hash table. Yeah. The reason for this behavior is that, after reconsider_outer_join_clauses has decided that it can derive tfact.t2_season = 3 from the given conditions, it still "throws back" the original join condition tmaster.t1_season = tfact.t2_season into the set of ordinary join clauses, so that that condition will still get applied at the time the join is computed. The comment about it claims * If we don't find any match for a set-aside outer join clause, we must * throw it back into the regular joinclause processing by passing it to * distribute_restrictinfo_to_rels(). If we do generate a derived clause, * however, the outer-join clause is redundant. We still throw it back, * because otherwise the join will be seen as a clauseless join and avoided * during join order searching; but we mark it as redundant to keep from * messing up the joinrel's size estimate. However, this seems to be a lie, or at least not the whole truth. If you try diking out the throw-back logic, which is simple enough to do, you'll immediately find that some test cases in join.sql give the wrong answers. The reason is that once we've derived tfact.t2_season = 3 and asserted that that's an equivalence condition, the eclass logic believes that tmaster.t1_season = tfact.t2_season must hold everywhere (that's more or less the definition of an eclass). But *it doesn't hold* above the left join, because tfact.t2_season could be null instead. In particular this can break any higher joins involving tfact.t2_season. By treating tmaster.t1_season = tfact.t2_season as an ordinary join clause we force the necessary tests to be made anyway, independently of the eclass logic. (There's no bug in Hans' example because there's only one join; the problem is not really with this particular clause, but with other columns that might also be thought equal to tfact.t2_season. It's those upper join clauses that can't safely be thrown away.) I've had a bee in my bonnet for a long time about redesigning all this to be less klugy. Fundamentally what we lack is an honest representation that a given value might be NULL instead of the original value from its table; this results in assorted compromises both here and elsewhere in the planner. The rough sketch that's been lurking in my hindbrain is 1) Early in the planner where we flatten join alias variables, do so only when they actually are formally equivalent to the input variables, ie only for the non-nullable side of any outer join. This gives us the desired representation distinction between original and possibly-nulled values: the former are base-table Vars, the latter are join Vars. 2) tmaster.t1_season = tfact.t2_season could be treated as an honest equivalence condition *between those variables*, but it would not imply anything about the join output variable "j.t2_season". I think reconsider_outer_join_clauses goes away entirely. 3) Places where we're trying to estimate variable values would have to be taught to look through unflattened join alias variables to see what they're based on. For extra credit they could assign some higher-than-normal probability that the value is null (though that could be done later, since there's certainly nothing accounting for that today). 4) The final flattening of these alias variables would probably not happen till setrefs.c. 5) I have not thought through what this implies for full joins. The weird COALESCE business might go away entirely, or it might be something that setrefs.c still has to insert. 6) There are lots of other, related kluges that could stand to be revisited at the same time. The business with "outer-join delayed" clauses is a big example. Maybe that all just magically goes away once we have a unique understanding of what value a Var represents, but I've not thought hard about it. We'd certainly need some new understanding of how to schedule outer-join clauses, since their Var membership would no longer correspond directly to sets of baserels. There might be connections to the "PlaceHolderVar" mess as well. This is a pretty major change and will doubtless break stuff throughout the planner. I'm also not clear on the planner performance implications; both point (3) and the need for two rounds of alias flattening seem like they'd slow things down. But maybe we could buy some speed back by eliminating kluges, and there is a hope that we'd end up with better plans in a useful number of cases. Anyway, the large amount of work involved and the rather small benefits we'd probably get have discouraged me from working on this. But maybe someday it'll get to the top of the queue. Basically this is all technical debt left over from the way we bolted outer joins onto the original planner design, so we really oughta fix it sometime. regards, tom lane
pgsql-hackers by date: