Home > mailing lists

Re: Condition pushdown: why (=) is pushed down into join, but BETWEEN or >= is not? - Mailing list pgsql-hackers

From	Andy Fan
Subject	Re: Condition pushdown: why (=) is pushed down into join, but BETWEEN or >= is not?
Date	February 6, 2022 05:23:28
Msg-id	CAKU4AWrPSZHHEaAi9FBhMFLCX+tkD7JF=f8n75yANatK3vZiwg@mail.gmail.com Whole thread Raw
In response to	Re: Condition pushdown: why (=) is pushed down into join, but BETWEEN or >= is not? (Tomas Vondra <tomas.vondra@enterprisedb.com>)
Responses	Re: Condition pushdown: why (=) is pushed down into join, but BETWEEN or >= is not? (Andy Fan <zhihui.fan1213@gmail.com>)
List	pgsql-hackers

Tree view

Hi,

On Sat, Feb 5, 2022 at 9:32 PM Tomas Vondra <tomas.vondra@enterprisedb.com> wrote:

I'm also not claiming this is 100% worth it - queries with a suitable
combination of clauses (conditions on the join keys) seems rather
uncommon.

Thanks for showing interest in this. I want to add some other user cases

which seem not very uncommon. a). When we join the key on a foregin

table, in which case, push down a qual to foregin key would be pretty

good to reduce the data transformed from the network. b). If the people

join many partitioned table on partitioned key, but they want to query

more than 1 partitions (which means the qual on partition key is not a

simple "partitionKey = Const"), then we have to do a run-time partition

prune (lose the chance for initial partition prune). We have big difference

on the performance aspect as well.

I guess some of the people who think we may need this feature are not very

clear about what bad it would be if we add this feature (Of course Including

me). I summarized the discussion before and hacked the solution at [1], the

current state looks reasonable to me. I'm not sure if I missed any point.

> Of course, this breaks the estimates in the faster query, because we now
> apply the condition twice - once for the index scan, one as the join
> clause. So instead of ~100k rows the join is estimated as ~1000 rows.

I think my patch has addressed this. Here is the example:

postgres=# set geqo to off; -- disable this feature, we have an estimation error.

-- using geqo guc in patch is just for easy testing.
SET
postgres=# explain analyze SELECT t1.a, t2.a FROM t1 JOIN t2 USING (a)
WHERE (t1.a > 99000) and t2.a > 99000;
QUERY PLAN
------------------------------------------------------------------------------------------------------------------------------------
Merge Join (cost=0.73..2408.37 rows=990 width=8)
(actual time=0.032..21.350 rows=99900 loops=1)
Merge Cond: (t1.a = t2.a)
-> Index Only Scan using t1_a_idx on t1 (cost=0.29..29.64 rows=991 width=4)
(actual time=0.014..0.121 rows=1000 loops=1)
Index Cond: (a > 99000)
Heap Fetches: 0
-> Index Only Scan using t2_a_idx on t2 (cost=0.43..2113.20 rows=101301 width=4)
(actual time=0.013..9.854 rows=99900 loops=1)
Index Cond: (a > 99000)
Heap Fetches: 0
Planning Time: 0.282 ms
Execution Time: 24.823 ms
(10 rows)

postgres=# set geqo to on; -- enable this feature and let planner derive the qual by itself, the estimation

-- is good.
SET
postgres=# explain analyze SELECT t1.a, t2.a FROM t1 JOIN t2 USING (a)
WHERE (t1.a > 99000) ;
QUERY PLAN
------------------------------------------------------------------------------------------------------------------------------------
Merge Join (cost=0.73..2408.37 rows=97680 width=8)
(actual time=0.031..21.296 rows=99900 loops=1)
Merge Cond: (t1.a = t2.a)
-> Index Only Scan using t1_a_idx on t1 (cost=0.29..29.64 rows=991 width=4)
(actual time=0.014..0.116 rows=1000 loops=1)
Index Cond: (a > 99000)
Heap Fetches: 0
-> Index Only Scan using t2_a_idx on t2 (cost=0.43..2113.20 rows=101301 width=4)
(actual time=0.012..9.751 rows=99900 loops=1)
Index Cond: (a > 99000)
Heap Fetches: 0
Planning Time: 0.269 ms
Execution Time: 24.749 ms
(10 rows)

So I think knowing what bad it is to have this feature is the key point to discussion now.

[1] https://www.postgresql.org/message-id/CAKU4AWpo9z0hMHDWUKuce4Z-NpcybV0J2UVu5%2BDVwyP-CrHCQg%40mail.gmail.com

Best Regards

Andy Fan

pgsql-hackers by date:

From: Noah Misch
Date: 06 February 2022, 02:58:06
Subject: Re: Unclear problem reports

From: Andres Freund
Date: 06 February 2022, 06:23:39
Subject: Re: Adding CI to our tree

Re: Condition pushdown: why (=) is pushed down into join, but BETWEEN or >= is not? - Mailing list pgsql-hackers

Previous

Next