Using b-tree index for >= condition when joining - Mailing list pgsql-general

From Łukasz Dąbek
Subject Using b-tree index for >= condition when joining
Date
Msg-id CAE2Xe9tLfLxE_pthPLFerywyA4RxtGTwMCqKpGk6VqMYOj6Lfw@mail.gmail.com
Whole thread Raw
Responses Re: Using b-tree index for >= condition when joining
List pgsql-general
Hello All!

I am having a problem with nudging postgres to choose a good plan for
a query involving a left join and an inequality constraint on a column
with b-tree index.

Let's say both tbl1 and tbl2 tables have date column with an index on
it. Queries like "SELECT * FROM tbl1 WHERE date >= CONSTANT" are using
index scan, as expected. Now let's define a view:

=# CREATE VIEW vw1 AS SELECT t1.date as date, t1.x as x, t2.y as y
FROM tbl1 t1 LEFT JOIN tbl2 t2 USING (date);

Query of the form "SELECT * FROM vw1 WHERE date = '2020-04-21'" is
using index scan on both tables:

=# EXPLAIN SELECT * FROM vw1 WHERE date = '2020-04-21';
                                           QUERY PLAN
------------------------------------------------------------------------------------------------
 Hash Right Join  (cost=91208.02..112781024.50 rows=10000000000 width=12)
   Hash Cond: (t2.date = t1.date)
   ->  Index Scan using tbl2_date_idx on tbl2 t2
(cost=0.43..188393.92 rows=100000 width=8)
         Index Cond: (date = '2019-04-21'::date)
   ->  Hash  (cost=89566.58..89566.58 rows=100000 width=8)
         ->  Bitmap Heap Scan on tbl1 t1  (cost=1875.43..89566.58
rows=100000 width=8)
               Recheck Cond: (date = '2019-04-21'::date)
               ->  Bitmap Index Scan on tbl1_date_idx
(cost=0.00..1850.43 rows=100000 width=0)
                     Index Cond: (date = '2019-04-21'::date)

(I know the total number of rows estimated for this and next queries
is enormous, in reality there are more conditions on the join but I
want to keep the example small)

However when an inequality is used the query plan seems inefficient:

=# EXPLAIN SELECT * FROM vw1 WHERE date >= '2020-04-21';
                                          QUERY PLAN
----------------------------------------------------------------------------------------------
 Hash Left Join  (cost=483538.43..4617954384.38 rows=410400000000 width=12)
   Hash Cond: (t1.date = t2.date)
   ->  Index Scan using tbl1_date_idx on tbl1 t1
(cost=0.43..369147.38 rows=4104000 width=8)
         Index Cond: (date >= '2019-04-21'::date)
   ->  Hash  (cost=234163.00..234163.00 rows=15200000 width=8)
         ->  Seq Scan on tbl2 t2  (cost=0.00..234163.00 rows=15200000 width=8)

It looks like the inequality on date isn't pushed down below the left
join? I can get the plan I'd like to have by putting the same
constraint on the date column on the second table:

=# EXPLAIN SELECT * FROM tbl1 t1 LEFT JOIN tbl2 t2 USING (date) WHERE
t1.date >= '2019-04-21' AND t2.date >= '2019-04-21';
                                            QUERY PLAN
--------------------------------------------------------------------------------------------------
 Hash Join  (cost=281625.87..1651822721.88 rows=112860000000 width=26)
   Hash Cond: (t2.date = t1.date)
   ->  Index Scan using tbl2_date_idx on tbl2 t2
(cost=0.43..369784.44 rows=4180000 width=15)
         Index Cond: (date >= '2019-04-21'::date)
   ->  Hash  (cost=210285.43..210285.43 rows=4104000 width=15)
         ->  Bitmap Heap Scan on tbl1 t1  (cost=76822.43..210285.43
rows=4104000 width=15)
               Recheck Cond: (date >= '2019-04-21'::date)
               ->  Bitmap Index Scan on tbl1_date_idx
(cost=0.00..75796.43 rows=4104000 width=0)
                     Index Cond: (date >= '2019-04-21'::date)

Is it possible to define a view vw2 such that queries of the form
"SELECT * FROM vw2 WHERE date >= CONSTANT" use the plan I pasted
above?

Thanks in advance for help,
Lukasz



pgsql-general by date:

Previous
From: Don Seiler
Date:
Subject: Re: template0 needing vacuum freeze?
Next
From: Christian Ramseyer
Date:
Subject: Re: Removing Last field from CSV string