Re: Filtering before join with date_trunc() - Mailing list pgsql-general
From: Adrian Klaver
Subject: Re: Filtering before join with date_trunc()
Msg-id: fcab9054-e2b3-fc08-3f57-80c7e0bff297@aklaver.com
In response to: Filtering before join with date_trunc() ("Phil Endecott" <spam_from_pgsql_lists@chezphil.org>)
List: pgsql-general
On 10/15/18 8:57 AM, Phil Endecott wrote:
> Dear Experts,
>
> I have a few tables with "raw" timestamps like this:
>
> +-------------------------------+----------+
> | time                          | pressure |
> +-------------------------------+----------+
> | 2018-09-14 00:00:07.148378+00 |  1007.52 |
> | 2018-09-14 00:10:07.147506+00 |  1007.43 |
> | 2018-09-14 00:20:07.147533+00 |  1007.28 |
> +-------------------------------+----------+
>
> For each of these tables I have a view which rounds the timestamp
> to the nearest minute, and ensures there is only one row per minute:
>
> SELECT date_trunc('minute'::text, tbl."time") AS "time",
>        max(tbl.pressure) AS pressure
>   FROM tbl
>  GROUP BY (date_trunc('minute'::text, tbl."time"))
>  ORDER BY (date_trunc('minute'::text, tbl."time"));
>
> I then join these tables on the rounded time:
>
> SELECT COALESCE(rain."time", pressures."time", temperatures."time") AS "time",
>        rain.rain,
>        pressures.pressure,
>        temperatures.temperature
>   FROM rain
>   FULL JOIN pressures USING ("time")
>   FULL JOIN temperatures USING ("time");
>
> +------------------------+------+----------+-------------+
> | time                   | rain | pressure | temperature |
> +------------------------+------+----------+-------------+
> | 2018-09-14 00:00:00+00 |    0 |  1007.52 |     11.349  |
> | 2018-09-14 00:10:00+00 |    0 |  1007.43 |     11.2317 |
> | 2018-09-14 00:20:00+00 |    0 |  1007.28 |     11.2317 |
> +------------------------+------+----------+-------------+
>
> The COALESCE for time and the full joins are needed because some
> columns may be missing for some minutes.
>
> Now I'd like to find the values for a particular short time period:
>
> SELECT * FROM readings
> WHERE "time" BETWEEN '2018-10-01T00:00:00' AND '2018-10-01T24:00:00'

Is readings a table or a view? If a view, is the SELECT COALESCE ...
query above its defining query?

> This works, but it is inefficient; it seems to create all the rounded
> data, do the join on all of it, and then filter on the time period.
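[A minimal sketch of the "filter before rounding" idea described above, not from the thread itself. It assumes an ordinary btree index on each raw table's "time" column; table and column names are illustrative. Because date_trunc() truncates downward, raw rows whose timestamps fall in [start, end + 1 minute) are exactly the rows whose truncated minute falls in [start, end], so the range predicate can be written against the raw column, where the index applies:]

```sql
-- Hypothetical sketch: push the time filter onto the raw column so the
-- planner can use the btree index on "time", then round and aggregate
-- only the surviving rows.
SELECT date_trunc('minute', "time") AS "time",
       max(pressure) AS pressure
  FROM tbl
 WHERE "time" >= '2018-10-01 00:00:00+00'            -- start of window
   AND "time" <  '2018-10-02 00:01:00+00'            -- end + 1 minute
 GROUP BY 1
 ORDER BY 1;
```

Note that an expression index on date_trunc('minute', "time") is not a drop-in alternative here: for a timestamptz column the two-argument date_trunc() is only STABLE (its result depends on the session timezone), so CREATE INDEX rejects it.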
> Ideally it would filter the raw data, and then need to round and join
> far fewer rows.
>
> It would not be difficult for me to round the timestamps when inserting
> the data, and also ensure that there is only one row for each minute.
> But I've done some experiments and even if I remove all the rounding and
> replace the full joins with regular joins, it still does sequential
> scans on at least one of the tables:
>
> Nested Loop  (cost=12.95..144.99 rows=135 width=20)
>   Join Filter: (x_rain."time" = x_pressures."time")
>   ->  Hash Join  (cost=12.67..97.83 rows=135 width=24)
>         Hash Cond: (x_temperatures."time" = x_rain."time")
>         ->  Seq Scan on x_temperatures  (cost=0.00..67.50 rows=4350 width=12)
>         ->  Hash  (cost=10.98..10.98 rows=135 width=12)
>               ->  Index Scan using x_rain_by_time on x_rain  (cost=0.28..10.98 rows=135 width=12)
>                     Index Cond: (("time" >= '2018-10-01 00:00:00+00'::timestamp with time zone) AND ("time" <= '2018-10-02 00:00:00+00'::timestamp with time zone))
>   ->  Index Scan using x_pressures_by_time on x_pressures  (cost=0.28..0.34 rows=1 width=12)
>         Index Cond: ("time" = x_temperatures."time")
>
> Maybe that is because the tables are currently relatively small (a
> few thousand rows) and it believes that sequential scans are faster.
> (I have sometimes wished for an "explain" variant that tells me what
> query plan it would choose if all the tables were 100X larger.)
>
> Is there anything I can do to make this more efficient when the tables
> are larger?
>
> Thanks for any suggestions.
>
> Regards, Phil.


--
Adrian Klaver
adrian.klaver@aklaver.com
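[There is no built-in "what if the table were 100X larger" mode, but the experiment Phil wishes for can be approximated by hand, as in this hypothetical sketch (names are illustrative, not from the thread): inflate a scratch copy of one table with generate_series(), re-ANALYZE it, and see which plan the planner chooses at the larger size.]

```sql
-- Hypothetical experiment: build a ~100x copy of x_temperatures by
-- shifting each row's timestamp into 100 non-overlapping windows.
CREATE TABLE x_temperatures_big AS
SELECT "time" + (n * interval '30 days') AS "time",
       temperature
  FROM x_temperatures,
       generate_series(0, 99) AS n;

CREATE INDEX ON x_temperatures_big ("time");
ANALYZE x_temperatures_big;   -- refresh the planner's row-count statistics

-- Now the plan reflects the 100x row count; with enough rows the
-- range scan should switch from Seq Scan to an Index Scan.
EXPLAIN
SELECT * FROM x_temperatures_big
 WHERE "time" BETWEEN '2018-10-01 00:00:00+00'
                  AND '2018-10-02 00:00:00+00';
```

As a quicker (if cruder) diagnostic, `SET enable_seqscan = off;` in a session forces the planner away from sequential scans, which shows whether a usable index plan exists at all.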