Home > mailing lists

Re: Condition pushdown: why (=) is pushed down into join, but BETWEEN or >= is not? - Mailing list pgsql-hackers

From	Andy Fan
Subject	Re: Condition pushdown: why (=) is pushed down into join, but BETWEEN or >= is not?
Date	February 11, 2022 13:37:17
Msg-id	CAKU4AWpOqbhOwnxZS80DZWrc7daqAmMV7aKr7EDg3jkZcCUPOw@mail.gmail.com Whole thread Raw
In response to	Re: Condition pushdown: why (=) is pushed down into join, but BETWEEN or >= is not? (Andy Fan <zhihui.fan1213@gmail.com>)
Responses	Re: Condition pushdown: why (=) is pushed down into join, but BETWEEN or >= is not? ("Finnerty, Jim" <jfinnert@amazon.com>)
List	pgsql-hackers

Tree view

So I think knowing what bad it is to have this feature is the key point to discussion now.

I re-read the discussion at 2015 [1] and the below topic is added for the above

question. Here is the summary for easy discussion.

====

From planner aspect:

> While I've only read your description of the patch not the patch itself,

> the search methodology you propose seems pretty brute-force and

> unlikely to solve that issue. It's particularly important to avoid O(N^2)

> behaviors when there are N expressions ...

The patch has 3 steps in general. 1). Gather the filter_qual_list during

the deconstruct_jointree. only unmergeable qual is gathered here.

2). After the root->eq_classes is built, scan each of the above quals to

find out if there is a EC match, if yes, add it to the EC. There are

some fast paths here. like ec->relids, em->em_relids. 3). compose

the qual in ec_filter and members in ec_members, then distribute it to

the relations. This step take the most cycles of this feature, and it is

the most important part for this feature as well.

Fortunately, thousands of partitions of a table would not make it worse

since they are not generated at that stage. So I'd believe the number of

ECs or EMs in an EC would be pretty small in common cases.

> time would be spent on searches for matching subexpressions whether

> or not anything was learned (and often nothing would be learned).

This is about some cases like "SELECT * FROM t1, t2 WHERE t1.a = t2.a

and t1.b > 3". In this case, we still need to go through steps 1 & 2, all the fast

paths don't work and the equal() is unavoidable. However step 3 can be ignored.

If we want to improve this, could we maintain an attr_eq_indexes in RelOptInfos

which indicates if the given attribute appears in any one of EC members?

=====

From executor aspects:

> The reason why the answer isn't automatically "all of them"
> is because, first of all, it's possible that enforcing the condition
> at a particular table costs more to filter out the rows that we save
> in execution time at higher levels of the plan tree. For example,
> consider A JOIN B ON A.X = B.X WHERE A.X > 1000000. It might be that
> the range of A.X is [0,1000001] but the range of B.X is
> [1000000,2000000]; so enforcing the inequality against A is very
> selective but enforcing it against B filters out basically nothing.

I think we can classify this as we push down / execute an qual, the

qual takes lots of cycles, but it doesn't filter many rows.

> A first cut might be to enforce the inequality against the relation
> where it's believed to be most selective, equivalence-class column

> mentioned in the inequality provided that the
> selectivity is thought to be above some threshold ... but I'm not sure
> this is very principled,

I can only input +1 after some deep thoughts.

>> Furthermore, there are some cases involving parameterized paths where
>> enforcing the inequality multiple times is definitely bad: for
>> example, if we've got a nested loop where the outer side is a seq scan
>> that enforces the condition and the inner side is an index probe, it
>> is just a waste to retest it on the inner side. We already know that
>> the outer row passes the inequality, so the inner row will necessarily
>> pass also. This doesn't apply to merge or hash joins, and it also
>> doesn't apply to all nested loops: scans that aren't paramaterized by
>> the equivalence-class column can still benefit from separate
>> enforcement of the inequality.
>>
> I guess that could be fixed by somehow marking these pushed quals as
> optional and having parameterised scans ignore optional quals.

This has been done by committing 4.

> Now, all that having been said, I think this is a pretty important
> optimization. Lots of people have asked for it, and I think it would
> be worth expending some brainpower to try to figure out a way to be
> smarter than we are now, which is, in a nutshell, as dumb as possible.

+1. I asked custom to add the derivable quals manually for 10+ of table

each query last year and gained great results.

Anyone still have interest in this? Or is a better solution really possible?

Or is the current method too bad to rescue?

Best Regards

Andy Fan

pgsql-hackers by date:

From: Dilip Kumar
Date: 11 February 2022, 13:02:45
Subject: Assertion failure in WaitForWALToBecomeAvailable state machine

From: Amit Kapila
Date: 11 February 2022, 13:57:31
Subject: Re: [BUG]Update Toast data failure in logical replication

Re: Condition pushdown: why (=) is pushed down into join, but BETWEEN or >= is not? - Mailing list pgsql-hackers

Previous

Next