Re: AW: Missing constant propagation in planner on hash quals causes join slowdown - Mailing list pgsql-hackers

From Tom Lane
Subject Re: AW: Missing constant propagation in planner on hash quals causes join slowdown
Date
Msg-id 7771.1576452845@sss.pgh.pa.us
Whole thread Raw
In response to AW: Missing constant propagation in planner on hash quals causes joinslowdown  (Hans Buschmann <buschmann@nidsa.net>)
List pgsql-hackers
Hans Buschmann <buschmann@nidsa.net> writes:
> The planner correctly sets the index conditions (knows that the xx_season columns are constant), but fails to apply
thisconstantness to the hash conditions by discarding a constant column in a hash table. 

Yeah.  The reason for this behavior is that, after
reconsider_outer_join_clauses has decided that it can derive
tfact.t2_season = 3 from the given conditions, it still "throws back"
the original join condition tmaster.t1_season = tfact.t2_season into
the set of ordinary join clauses, so that that condition will still get
applied at the time the join is computed.  The comment about it claims

 * If we don't find any match for a set-aside outer join clause, we must
 * throw it back into the regular joinclause processing by passing it to
 * distribute_restrictinfo_to_rels().  If we do generate a derived clause,
 * however, the outer-join clause is redundant.  We still throw it back,
 * because otherwise the join will be seen as a clauseless join and avoided
 * during join order searching; but we mark it as redundant to keep from
 * messing up the joinrel's size estimate.

However, this seems to be a lie, or at least not the whole truth.  If you
try diking out the throw-back logic, which is simple enough to do, you'll
immediately find that some test cases in join.sql give the wrong answers.
The reason is that once we've derived tfact.t2_season = 3 and asserted
that that's an equivalence condition, the eclass logic believes that
tmaster.t1_season = tfact.t2_season must hold everywhere (that's more or
less the definition of an eclass).  But *it doesn't hold* above the left
join, because tfact.t2_season could be null instead.  In particular this
can break any higher joins involving tfact.t2_season.  By treating
tmaster.t1_season = tfact.t2_season as an ordinary join clause we force
the necessary tests to be made anyway, independently of the eclass logic.
(There's no bug in Hans' example because there's only one join; the
problem is not really with this particular clause, but with other columns
that might also be thought equal to tfact.t2_season.  It's those upper
join clauses that can't safely be thrown away.)

I've had a bee in my bonnet for a long time about redesigning all this
to be less klugy.  Fundamentally what we lack is an honest representation
that a given value might be NULL instead of the original value from its
table; this results in assorted compromises both here and elsewhere in
the planner.  The rough sketch that's been lurking in my hindbrain is

1) Early in the planner where we flatten join alias variables, do so
only when they actually are formally equivalent to the input variables,
ie only for the non-nullable side of any outer join.  This gives us
the desired representation distinction between original and
possibly-nulled values: the former are base-table Vars, the latter
are join Vars.

2) tmaster.t1_season = tfact.t2_season could be treated as an honest
equivalence condition *between those variables*, but it would not
imply anything about the join output variable "j.t2_season".
I think reconsider_outer_join_clauses goes away entirely.

3) Places where we're trying to estimate variable values would have
to be taught to look through unflattened join alias variables to see
what they're based on.  For extra credit they could assign some
higher-than-normal probability that the value is null (though that
could be done later, since there's certainly nothing accounting
for that today).

4) The final flattening of these alias variables would probably not
happen till setrefs.c.

5) I have not thought through what this implies for full joins.
The weird COALESCE business might go away entirely, or it might
be something that setrefs.c still has to insert.

6) There are lots of other, related kluges that could stand to be revisited
at the same time.  The business with "outer-join delayed" clauses is
a big example.  Maybe that all just magically goes away once we have
a unique understanding of what value a Var represents, but I've not
thought hard about it.  We'd certainly need some new understanding of
how to schedule outer-join clauses, since their Var membership would
no longer correspond directly to sets of baserels.  There might be
connections to the "PlaceHolderVar" mess as well.

This is a pretty major change and will doubtless break stuff throughout
the planner.  I'm also not clear on the planner performance implications;
both point (3) and the need for two rounds of alias flattening seem like
they'd slow things down.  But maybe we could buy some speed back by
eliminating kluges, and there is a hope that we'd end up with better
plans in a useful number of cases.

Anyway, the large amount of work involved and the rather small benefits
we'd probably get have discouraged me from working on this.  But maybe
someday it'll get to the top of the queue.  Basically this is all
technical debt left over from the way we bolted outer joins onto the
original planner design, so we really oughta fix it sometime.

            regards, tom lane



pgsql-hackers by date:

Previous
From: Thomas Munro
Date:
Subject: What's the best way to get flex and bison on Windows?
Next
From: Thomas Munro
Date:
Subject: Re: Collation versions on Windows (help wanted, apply within)