Re: Filtering before join with date_trunc() - Mailing list pgsql-general
From: Adrian Klaver
Subject: Re: Filtering before join with date_trunc()
Msg-id: fcab9054-e2b3-fc08-3f57-80c7e0bff297@aklaver.com
In response to: Filtering before join with date_trunc() ("Phil Endecott" <spam_from_pgsql_lists@chezphil.org>)
List: pgsql-general
On 10/15/18 8:57 AM, Phil Endecott wrote:
> Dear Experts,
>
> I have a few tables with "raw" timestamps like this:
>
> +-------------------------------+----------+
> | time                          | pressure |
> +-------------------------------+----------+
> | 2018-09-14 00:00:07.148378+00 |  1007.52 |
> | 2018-09-14 00:10:07.147506+00 |  1007.43 |
> | 2018-09-14 00:20:07.147533+00 |  1007.28 |
> +-------------------------------+----------+
>
> For each of these tables I have a view which rounds the timestamp
> to the nearest minute, and ensures there is only one row per minute:
>
> SELECT date_trunc('minute'::text, tbl."time") AS "time",
>        max(tbl.pressure) AS pressure
>   FROM tbl
>  GROUP BY (date_trunc('minute'::text, tbl."time"))
>  ORDER BY (date_trunc('minute'::text, tbl."time"));
>
> I then join these tables on the rounded time:
>
> SELECT COALESCE(rain."time", pressures."time", temperatures."time") AS "time",
>        rain.rain,
>        pressures.pressure,
>        temperatures.temperature
>   FROM rain
>   FULL JOIN pressures USING ("time")
>   FULL JOIN temperatures USING ("time");
>
> +------------------------+------+----------+-------------+
> | time                   | rain | pressure | temperature |
> +------------------------+------+----------+-------------+
> | 2018-09-14 00:00:00+00 |    0 |  1007.52 |     11.349  |
> | 2018-09-14 00:10:00+00 |    0 |  1007.43 |     11.2317 |
> | 2018-09-14 00:20:00+00 |    0 |  1007.28 |     11.2317 |
> +------------------------+------+----------+-------------+
>
> The COALESCE for time and the full joins are needed because some
> columns may be missing for some minutes.
>
> Now I'd like to find the values for a particular short time period:
>
> SELECT * FROM readings
> WHERE "time" BETWEEN '2018-10-01T00:00:00' AND '2018-10-01T24:00:00'

Is readings a table or a view? If a view, is the SELECT COALESCE ...
query above its defining query?

> This works, but it is inefficient; it seems to create all the rounded
> data, do the join on all of it, and then filter on the time period.
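[A minimal sketch of the "filter before rounding" idea described above, not from the thread itself. It assumes an ordinary btree index on each raw table's "time" column; table and column names are illustrative. Because date_trunc() truncates downward, raw rows whose timestamps fall in [start, end + 1 minute) are exactly the rows whose truncated minute falls in [start, end], so the range predicate can be written against the raw column, where the index applies:]

```sql
-- Hypothetical sketch: push the time filter onto the raw column so the
-- planner can use the btree index on "time", then round and aggregate
-- only the surviving rows.
SELECT date_trunc('minute', "time") AS "time",
       max(pressure) AS pressure
  FROM tbl
 WHERE "time" >= '2018-10-01 00:00:00+00'            -- start of window
   AND "time" <  '2018-10-02 00:01:00+00'            -- end + 1 minute
 GROUP BY 1
 ORDER BY 1;
```

Note that an expression index on date_trunc('minute', "time") is not a drop-in alternative here: for a timestamptz column the two-argument date_trunc() is only STABLE (its result depends on the session timezone), so CREATE INDEX rejects it.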
> Ideally it would filter the raw data, and then need to round and join
> far fewer rows.
>
> It would not be difficult for me to round the timestamps when inserting
> the data, and also ensure that there is only one row for each minute.
> But I've done some experiments and even if I remove all the rounding and
> replace the full joins with regular joins, it still does sequential
> scans on at least one of the tables:
>
> Nested Loop  (cost=12.95..144.99 rows=135 width=20)
>   Join Filter: (x_rain."time" = x_pressures."time")
>   ->  Hash Join  (cost=12.67..97.83 rows=135 width=24)
>         Hash Cond: (x_temperatures."time" = x_rain."time")
>         ->  Seq Scan on x_temperatures  (cost=0.00..67.50 rows=4350 width=12)
>         ->  Hash  (cost=10.98..10.98 rows=135 width=12)
>               ->  Index Scan using x_rain_by_time on x_rain  (cost=0.28..10.98 rows=135 width=12)
>                     Index Cond: (("time" >= '2018-10-01 00:00:00+00'::timestamp with time zone) AND ("time" <= '2018-10-02 00:00:00+00'::timestamp with time zone))
>   ->  Index Scan using x_pressures_by_time on x_pressures  (cost=0.28..0.34 rows=1 width=12)
>         Index Cond: ("time" = x_temperatures."time")
>
> Maybe that is because the tables are currently relatively small (a
> few thousand rows) and it believes that sequential scans are faster.
> (I have sometimes wished for an "explain" variant that tells me what
> query plan it would choose if all the tables were 100X larger.)
>
> Is there anything I can do to make this more efficient when the tables
> are larger?
>
> Thanks for any suggestions.
>
> Regards, Phil.


--
Adrian Klaver
adrian.klaver@aklaver.com
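[There is no built-in "what if the table were 100X larger" mode, but the experiment Phil wishes for can be approximated by hand, as in this hypothetical sketch (names are illustrative, not from the thread): inflate a scratch copy of one table with generate_series(), re-ANALYZE it, and see which plan the planner chooses at the larger size.]

```sql
-- Hypothetical experiment: build a ~100x copy of x_temperatures by
-- shifting each row's timestamp into 100 non-overlapping windows.
CREATE TABLE x_temperatures_big AS
SELECT "time" + (n * interval '30 days') AS "time",
       temperature
  FROM x_temperatures,
       generate_series(0, 99) AS n;

CREATE INDEX ON x_temperatures_big ("time");
ANALYZE x_temperatures_big;   -- refresh the planner's row-count statistics

-- Now the plan reflects the 100x row count; with enough rows the
-- range scan should switch from Seq Scan to an Index Scan.
EXPLAIN
SELECT * FROM x_temperatures_big
 WHERE "time" BETWEEN '2018-10-01 00:00:00+00'
                  AND '2018-10-02 00:00:00+00';
```

As a quicker (if cruder) diagnostic, `SET enable_seqscan = off;` in a session forces the planner away from sequential scans, which shows whether a usable index plan exists at all.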