Re: Condition pushdown: why (=) is pushed down into join, but BETWEEN or >= is not? - Mailing list pgsql-hackers

From Tomas Vondra
Subject Re: Condition pushdown: why (=) is pushed down into join, but BETWEEN or >= is not?
Date
Msg-id e9c62bf4-95cd-ad83-944d-b68ea5b717d8@enterprisedb.com
Whole thread Raw
In response to Re: Condition pushdown: why (=) is pushed down into join, but BETWEEN or >= is not?  (Robert Haas <robertmhaas@gmail.com>)
Responses Re: Condition pushdown: why (=) is pushed down into join, but BETWEEN or >= is not?  (Andy Fan <zhihui.fan1213@gmail.com>)
List pgsql-hackers

On 2/17/22 23:16, Robert Haas wrote:
> On Thu, Feb 17, 2022 at 4:17 PM Tomas Vondra
> <tomas.vondra@enterprisedb.com> wrote:
>> IMHO the whole problem is we're unable to estimate the join clause as a
>> conditional probability, i.e.
>>
>>    P(A.x = B.x | (A.x < 42) & (B.x < 42))
>>
>> so maybe instead of trying to generate additional RelOptInfo items we
>> should think about improving that. The extra RelOptInfos don't really
>> solve this, because even if you decide to join A|x<42 to B|x<42 it does
>> nothing to improve the join clause estimate.
> 
> I guess I hadn't considered that angle. I think the extra RelOptInfos
> (or whatever) actually do solve a problem, because enforcing a
> high-selectivity join qual against both sides is potentially quite
> wasteful, and you need some way to decide whether to do it on one
> side, the other, or both. But it's also true that I was wrong to
> assume independence ... and if we could avoid assuming that, then the
> join selectivity would work itself out without any of the machinery
> that I just proposed.
> 

True. We kinda already have this issue for the equality clauses, and
having paths with the condition pushed down (or not) seems like a
natural approach.

>> It actually deals with a more general form of this case, because the
>> clauses don't need to reference the same attribute - so for example this
>> would work too, assuming there is extended stats object on the columns
>> on each side:
>>
>>   P(A.c = B.d | (A.e < 42) & (B.f < 42))
> 
> That'd be cool.
> 

Yeah, but the patch implementing this still needs more work.

>> Not sure. In my experience queries with both a join clause and other
>> clauses referencing the same attribute are pretty rare. But I agree if
>> we can do the expensive stuff only when actually needed, with no cost in
>> the 99.999% other cases, I don't see why not. Of course, code complexity
>> is a cost too.
> 
> Right. I mean, we could have a planner GUC to control whether the
> optimization is used even in cases where we see that it's possible.
> But Tom keeps arguing that it is possible in many queries and would
> benefit few queries, and I'm not seeing why that should be so. I think
> it's likely to benefit many of the queries to which it applies.
> 

Maybe. Although the example I linked some time ago shows a pretty
dramatic improvement, due to picking merge join + index scan, and not
realizing we'll have to skip a lot of data. But that's just one
anecdotal example.

Anyway, I think the best way to deal with these (perfectly legitimate)
concerns is to show how expensive it is for queries not not having such
join/restriction clauses, with the cost being close to 0. And then for
queries with such clauses but not benefiting from the change (a bit like
a worst case).


regards


[1]
https://www.postgresql.org/message-id/CA%2B1Wm9U_sP9237f7OH7O%3D-UTab71DWOO4Qc-vnC78DfsJQBCwQ%40mail.gmail.com

-- 
Tomas Vondra
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



pgsql-hackers by date:

Previous
From: Peter Geoghegan
Date:
Subject: Re: Removing more vacuumlazy.c special cases, relfrozenxid optimizations
Next
From: Peter Geoghegan
Date:
Subject: Re: Removing more vacuumlazy.c special cases, relfrozenxid optimizations