Re: Condition pushdown: why (=) is pushed down into join, but BETWEEN or >= is not? - Mailing list pgsql-hackers
From: Tomas Vondra
Subject: Re: Condition pushdown: why (=) is pushed down into join, but BETWEEN or >= is not?
Date:
Msg-id: 1642f540-badc-ef36-5d24-465f3ccb3343@enterprisedb.com
In response to: Re: Condition pushdown: why (=) is pushed down into join, but BETWEEN or >= is not? (Andy Fan <zhihui.fan1213@gmail.com>)
Responses: Re: Condition pushdown: why (=) is pushed down into join, but BETWEEN or >= is not?
List: pgsql-hackers
Hi,

there's been an interesting case [1] of a slow query on pgsql-general, related to the topic discussed in this thread. It causes the query to run slower by multiple orders of magnitude, and I think it's interesting, so let me present a simple example demonstrating it.

------------------------------------------------------------------------
create table t1 (a int);
create table t2 (a int);

insert into t1 select i from generate_series(1,100000) s(i);
insert into t2 select mod(i,100000) from generate_series(1,10000000) s(i);

create index on t1(a);
create index on t2(a);

vacuum analyze t1, t2;

-- we need to force mergejoin
set enable_nestloop = off;
------------------------------------------------------------------------

Now, let's run a simple query:

SELECT t1.a, t2.a FROM t1 JOIN t2 USING (a)
 WHERE (t1.a > 99000) ORDER BY t1.a LIMIT 100;

                               QUERY PLAN
------------------------------------------------------------------------
 Limit  (cost=4.63..224.57 rows=100 width=8) (actual time=8999.487..8999.707 rows=100 loops=1)
   ->  Merge Join  (cost=4.63..209447.97 rows=95226 width=8) (actual time=8999.485..8999.620 rows=100 loops=1)
         Merge Cond: (t1.a = t2.a)
         ->  Index Only Scan using t1_a_idx on t1  (cost=0.29..29.25 rows=969 width=4) (actual time=0.010..0.011 rows=1 loops=1)
               Index Cond: (a > 99000)
               Heap Fetches: 0
         ->  Index Only Scan using t2_a_idx on t2  (cost=0.43..183464.09 rows=9999977 width=4) (actual time=0.026..4594.757 rows=9900200 loops=1)
               Heap Fetches: 0
 Planning Time: 0.338 ms
 Execution Time: 8999.768 ms
(10 rows)

Now, let's do a simple trick and add a condition on t2.a, "implied" by the join condition (t1.a = t2.a) and the inequality (t1.a > 99000).

SELECT t1.a, t2.a FROM t1 JOIN t2 USING (a)
 WHERE (t1.a > 99000) AND (t2.a > 99000) ORDER BY t1.a LIMIT 100;

                               QUERY PLAN
------------------------------------------------------------------------
 Limit  (cost=0.77..250.39 rows=100 width=8) (actual time=0.040..0.294 rows=100 loops=1)
   ->  Merge Join  (cost=0.77..2297.23 rows=920 width=8) (actual time=0.039..0.172 rows=100 loops=1)
         Merge Cond: (t1.a = t2.a)
         ->  Index Only Scan using t1_a_idx on t1  (cost=0.29..29.25 rows=969 width=4) (actual time=0.031..0.031 rows=1 loops=1)
               Index Cond: (a > 99000)
               Heap Fetches: 0
         ->  Index Only Scan using t2_a_idx on t2  (cost=0.43..2014.87 rows=96596 width=4) (actual time=0.005..0.052 rows=100 loops=1)
               Index Cond: (a > 99000)
               Heap Fetches: 0
 Planning Time: 0.222 ms
 Execution Time: 0.414 ms
(11 rows)

Well, that's quite a difference! From 9000ms to 1ms, pretty good.

What is happening in the first plan is that the merge join needs t2 sorted by t2.a, and the index-only scan looks like a great way to do that, as it has low startup cost (because LIMIT likes that). But this completely misses that (t1.a > 99000) implies (t2.a > 99000) through the equality in the join condition. So we start scanning t2_a_idx, only to throw the first 99% of tuples away.

In the original report this is particularly egregious, because the index only scan looks like this:

   ->  Index Only Scan using data_class_pkey on data_class ta
         (cost=0.57..4935483.78 rows=216964862 width=8)
         (actual time=0.018..35022.908 rows=151321889 loops=1)
         Heap Fetches: 151321889

So yeah, 151M heap fetches, that's bound to be expensive :-/ Adding the condition on t2.a allows just skipping the first chunk of the index, eliminating the expensive part.

Of course, this breaks the estimates in the faster query, because we now apply the condition twice - once for the index scan, once as the join clause. So instead of ~100k rows the join is estimated as ~1000 rows.
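As a rough sanity check of that estimate (my own back-of-the-envelope arithmetic, not output from the planner), the redundant qual effectively gets applied on top of the join estimate: the join without it is estimated at 95226 rows, and (t2.a > 99000) is expected to match 96596 of the ~10M rows in t2, so applying that selectivity once more gives

  SELECT round(95226 * 96596.0 / 9999977) AS estimated_join_rows;
  -- ~920, matching the merge join estimate in the second plan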
I'm also not claiming this is 100% worth it - queries with a suitable combination of clauses (conditions on the join keys) seem rather uncommon. But it seems like an interesting example, because it may be seen either as a missed execution optimization (failing to skip the initial chunk of rows), or as a costing issue due to not accounting for having to process the rows (which would likely result in picking a different plan).

regards

[1] https://www.postgresql.org/message-id/CA%2B1Wm9U_sP9237f7OH7O%3D-UTab71DWOO4Qc-vnC78DfsJQBCwQ%40mail.gmail.com

--
Tomas Vondra
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company