Re: Condition pushdown: why (=) is pushed down into join, but BETWEEN or >= is not? - Mailing list pgsql-hackers

From Tom Lane
Subject Re: Condition pushdown: why (=) is pushed down into join, but BETWEEN or >= is not?
Date
Msg-id 1099349.1620834846@sss.pgh.pa.us
Whole thread Raw
In response to Condition pushdown: why (=) is pushed down into join, but BETWEEN or >= is not?  (Dmitry Astapov <dastapov@gmail.com>)
Responses Re: Condition pushdown: why (=) is pushed down into join, but BETWEEN or >= is not?
List pgsql-hackers
Dmitry Astapov <dastapov@gmail.com> writes:
> I am trying to understand the behaviour of the query planner regarding the
> push-down of the conditions "through" the join.

I think your mental model is wrong.  What's actually happening here is
that the planner uses equivalence classes to deduce implied conditions.
That is, we have the join condition a.adate = b.bdate and then you've
added the where condition a.adate = '2021-05-12'.  Transitivity implies
that b.bdate = '2021-05-12', so we deduce that condition and are able
to apply it at the relation scan of b.  Furthermore, having restricted
both a.adate and b.bdate to the same constant value at the scan level,
we no longer need to apply the join condition a.adate = b.bdate at all.
This is important not only to avoid the (probably minor) inefficiency
of rechecking the join condition, but because if we believed that all
three conditions were independently applicable, we'd come out with a
serious underestimate of the size of the join result.

> In my experiments, I was never able to get an execution plan that "pushes
> down" any condition apart from (=) through to the right side of the join,

None of the argument sketched above works for non-equality conditions.
There are some situations where you could probably figure out how to
use transitivity to deduce some implied condition, but cleaning things
up so that you don't have redundant conditions fouling up the join
size estimates seems like a hard problem.

Another issue is that we could easily expend a lot of cycles on deductions
that lead nowhere, because once you try to open up the mechanism to
consider operators other than equality, there will be a lot of things that
it looks at and then fails to do anything with.  The equivalence class
mechanism is tied into the same logic that considers merge and hash joins,
so we are expending lots of cycles anytime we see an equality operator,
and not so much for other operators.

> Equally surprising is that I was unable to find documentation or past
> mailing list discussions of this or similar topic, which leads me to
> believe that I am just not familiar with the proper terminology and can't
> come up with the right search terms.

src/backend/optimizer/README has a discussion of equivalence classes.

            regards, tom lane



pgsql-hackers by date:

Previous
From: Aleksander Alekseev
Date:
Subject: Re: Extending amcheck to check toast size and compression
Next
From: Mark Dilger
Date:
Subject: Re: Granting control of SUSET gucs to non-superusers