Re: Filtering before join with date_trunc() - Mailing list pgsql-general

From Phil Endecott
Subject Re: Filtering before join with date_trunc()
Date
Msg-id 1539658709144@dmwebmail.dmwebmail.chezphil.org
Whole thread Raw
In response to Re: Filtering before join with date_trunc()  (Tom Lane <tgl@sss.pgh.pa.us>)
List pgsql-general
Thanks all for the replies.

Tom Lane wrote:
> You're expecting too much.

That often seems to be the case.

> I think you're also expecting the system to deduce that it can apply an
> inequality on one join column to the other one.  It doesn't; only equality
> constraints have any sort of transitivity logic.
>
> So you'll need to write out the BETWEEN separately for each table,
> and put it below the full join, which means you won't be able to
> use those nice views :-(

Here's an example:

create table t1 ("time" timestamptz, value1 numeric);
create index t1_time on t1("time");
\copy t1 from ......

create table t2 ("time" timestamptz, value2 numeric);
create index t2_time on t2("time");
\copy t2 from ......

explain select * from t1 join t2 using("time") where "time" between '2018-10-01' and '2018-10-02';
+------------------------------------------------------------------------------------------------+
|                                           QUERY PLAN                                           |
+------------------------------------------------------------------------------------------------+
| Hash Join  (cost=12.99..101.03 rows=138 width=21)                                              |
|   Hash Cond: (t2."time" = t1."time")                                                           |
|   ->  Seq Scan on t2  (cost=0.00..70.11 rows=4411 width=15)                                    |
|   ->  Hash  (cost=11.18..11.18 rows=145 width=14)                                              |
|         ->  Index Scan using t1_time on t1  (cost=0.28..11.18 rows=145 width=14)               |
|               Index Cond: (("time" >= '2018-10-01 00:00:00+00'::timestamp with time zone) AND .|
|.("time" <= '2018-10-02 00:00:00+00'::timestamp with time zone))                                |
+------------------------------------------------------------------------------------------------+

explain with q1 as (select * from t1 where "time" between '2018-10-01' and '2018-10-02'), q2 as (select * from t2 where
"time"between '2018-10-01' and '2018-10-02') select * from q1 join q2 using("time");
 
+------------------------------------------------------------------------------------------------+
|                                           QUERY PLAN                                           |
+------------------------------------------------------------------------------------------------+
| Hash Join  (cost=26.60..31.41 rows=136 width=72)                                               |
|   Hash Cond: (q1."time" = q2."time")                                                           |
|   CTE q1                                                                                       |
|     ->  Index Scan using t1_time on t1  (cost=0.28..11.18 rows=145 width=14)                   |
|           Index Cond: (("time" >= '2018-10-01 00:00:00+00'::timestamp with time zone) AND ("ti.|
|.me" <= '2018-10-02 00:00:00+00'::timestamp with time zone))                                    |
|   CTE q2                                                                                       |
|     ->  Index Scan using t2_time on t2  (cost=0.28..11.00 rows=136 width=15)                   |
|           Index Cond: (("time" >= '2018-10-01 00:00:00+00'::timestamp with time zone) AND ("ti.|
|.me" <= '2018-10-02 00:00:00+00'::timestamp with time zone))                                    |
|   ->  CTE Scan on q1  (cost=0.00..2.90 rows=145 width=40)                                      |
|   ->  Hash  (cost=2.72..2.72 rows=136 width=40)                                                |
|         ->  CTE Scan on q2  (cost=0.00..2.72 rows=136 width=40)                                |
+------------------------------------------------------------------------------------------------+


So.... as you say, even if I strip out all of the complexity of approximate 
timestamps and missing values, it's never going to push the BETWEEN filter 
down below the join.  Even with just a few thousand rows I see a 5X speedup 
with the second query with the explicit filtering below the join.

This is rather disappointing.  Am I the only person who's ever wanted to do 
this?


Regards, Phil.





pgsql-general by date:

Previous
From: Adrian Klaver
Date:
Subject: Re: FATAL: terminating connection because protocol synchronizationwas lost
Next
From: Ben Madin
Date:
Subject: Re: Saving view turns SELECT * into field list