Thread: Using b-tree index for >= condition when joining

Using b-tree index for >= condition when joining

From
Łukasz Dąbek
Date:
Hello All!

I am having a problem with nudging postgres to choose a good plan for
a query involving a left join and an inequality constraint on a column
with b-tree index.

Let's say both tbl1 and tbl2 tables have date column with an index on
it. Queries like "SELECT * FROM tbl1 WHERE date >= CONSTANT" are using
index scan, as expected. Now let's define a view:

=# CREATE VIEW vw1 AS SELECT t1.date as date, t1.x as x, t2.y as y
FROM tbl1 t1 LEFT JOIN tbl2 t2 USING (date);

Query of the form "SELECT * FROM vw1 WHERE date = '2020-04-21'" is
using index scan on both tables:

=# EXPLAIN SELECT * FROM vw1 WHERE date = '2020-04-21';
                                           QUERY PLAN
------------------------------------------------------------------------------------------------
 Hash Right Join  (cost=91208.02..112781024.50 rows=10000000000 width=12)
   Hash Cond: (t2.date = t1.date)
   ->  Index Scan using tbl2_date_idx on tbl2 t2
(cost=0.43..188393.92 rows=100000 width=8)
         Index Cond: (date = '2019-04-21'::date)
   ->  Hash  (cost=89566.58..89566.58 rows=100000 width=8)
         ->  Bitmap Heap Scan on tbl1 t1  (cost=1875.43..89566.58
rows=100000 width=8)
               Recheck Cond: (date = '2019-04-21'::date)
               ->  Bitmap Index Scan on tbl1_date_idx
(cost=0.00..1850.43 rows=100000 width=0)
                     Index Cond: (date = '2019-04-21'::date)

(I know the total number of rows estimated for this and next queries
is enormous, in reality there are more conditions on the join but I
want to keep the example small)

However when an inequality is used the query plan seems inefficient:

=# EXPLAIN SELECT * FROM vw1 WHERE date >= '2020-04-21';
                                          QUERY PLAN
----------------------------------------------------------------------------------------------
 Hash Left Join  (cost=483538.43..4617954384.38 rows=410400000000 width=12)
   Hash Cond: (t1.date = t2.date)
   ->  Index Scan using tbl1_date_idx on tbl1 t1
(cost=0.43..369147.38 rows=4104000 width=8)
         Index Cond: (date >= '2019-04-21'::date)
   ->  Hash  (cost=234163.00..234163.00 rows=15200000 width=8)
         ->  Seq Scan on tbl2 t2  (cost=0.00..234163.00 rows=15200000 width=8)

It looks like the inequality on date isn't pushed down below the left
join? I can get the plan I'd like to have by putting the same
constraint on the date column on the second table:

=# EXPLAIN SELECT * FROM tbl1 t1 LEFT JOIN tbl2 t2 USING (date) WHERE
t1.date >= '2019-04-21' AND t2.date >= '2019-04-21';
                                            QUERY PLAN
--------------------------------------------------------------------------------------------------
 Hash Join  (cost=281625.87..1651822721.88 rows=112860000000 width=26)
   Hash Cond: (t2.date = t1.date)
   ->  Index Scan using tbl2_date_idx on tbl2 t2
(cost=0.43..369784.44 rows=4180000 width=15)
         Index Cond: (date >= '2019-04-21'::date)
   ->  Hash  (cost=210285.43..210285.43 rows=4104000 width=15)
         ->  Bitmap Heap Scan on tbl1 t1  (cost=76822.43..210285.43
rows=4104000 width=15)
               Recheck Cond: (date >= '2019-04-21'::date)
               ->  Bitmap Index Scan on tbl1_date_idx
(cost=0.00..75796.43 rows=4104000 width=0)
                     Index Cond: (date >= '2019-04-21'::date)

Is it possible to define a view vw2 such that queries of the form
"SELECT * FROM vw2 WHERE date >= CONSTANT" use the plan I pasted
above?

Thanks in advance for help,
Lukasz



Re: Using b-tree index for >= condition when joining

From
Tom Lane
Date:
=?UTF-8?B?xYF1a2FzeiBExIViZWs=?= <sznurek@gmail.com> writes:
> I am having a problem with nudging postgres to choose a good plan for
> a query involving a left join and an inequality constraint on a column
> with b-tree index.
> ...
> It looks like the inequality on date isn't pushed down below the left
> join?

Nope.  The planner only derives implied conditions from equality clauses.
There've been discussions about that in the past, but it was (and remains)
unclear that trying to account for other clause types would be a net win.
The planner-cycles-expended versus number-of-queries-improved tradeoff
doesn't look promising.

> I can get the plan I'd like to have by putting the same
> constraint on the date column on the second table:

Note that you're not really getting the same plan that way: it's not
a left join anymore, because you put a strict constraint on the join's
inner relation, so the planner realizes it doesn't have to produce any
null-extended rows.  You could make it work with the desired semantics
with something along the lines of

SELECT * FROM tbl1 t1
  LEFT JOIN (select * from tbl2 where tbl2.date >= '2019-04-21') t2
  USING (date)
  WHERE t1.date >= '2019-04-21';

but of course that's even less easy :-(

            regards, tom lane



Re: Using b-tree index for >= condition when joining

From
Łukasz Dąbek
Date:
On 17/05/2020 at 04:22, Tom Lane wrote:> Note that you're not really 
getting the same plan that way: it's not
> a left join anymore, because you put a strict constraint on the join's
> inner relation, so the planner realizes it doesn't have to produce any
> null-extended rows.  You could make it work with the desired semantics
> with something along the lines of
> 
> SELECT * FROM tbl1 t1
>    LEFT JOIN (select * from tbl2 where tbl2.date >= '2019-04-21') t2
>    USING (date)
>    WHERE t1.date >= '2019-04-21';
> 
> but of course that's even less easy :-(

Thanks, I didn't realise my version of the query was incorrect.

It seems like there isn't much hope of creating a view equivalent to the 
query which would behave reasonably with "date >= CONST" constraint, or 
am I missing something? I think I can create a workaround using 
functions but would love to know if there's something simpler I could do.

Thanks,
Lukasz