Re: Filtering before join with date_trunc() - Mailing list pgsql-general

From Francisco Olarte
Subject Re: Filtering before join with date_trunc()
Date
Msg-id CA+bJJbyBWWM9zKoRrybMMRKUkAU=9svqs+CvJYg2-Bwmh+qdBw@mail.gmail.com
Whole thread Raw
In response to Filtering before join with date_trunc()  ("Phil Endecott" <spam_from_pgsql_lists@chezphil.org>)
List pgsql-general
Hi Phil:

On Mon, Oct 15, 2018 at 5:57 PM, Phil Endecott
<spam_from_pgsql_lists@chezphil.org> wrote:
...
> For each of these tables I have a view which rounds the timestamp
> to the nearest minute, and ensures there is only one row per minute:
...
> I then join these tables on the rounded time:
....
> Now I'd like to find the values for a particular short time period:

For what I propose I assume the SHORT time is really short....

...
> This works, but it is inefficient; it seems to create all the rounded
> data, do the join on all of it, and then filter on the time period.
> Ideally it would filter the raw data, and then need to round and join
> far fewer rows.

I think you are expecting too much, I mean, you are expecting the
server to know it can expand your time-period into a (maybe) bigger
one which covers the original data and push that condition down. In my
experience this is normally too much.

...

> Is there anything I can do to make this more efficient when the tables
> are larger?

If your periods are really short you could try to replace the time
condition on the views to a time condition in the tables and do the
rounding and grouping afterwards. I mean, use a half-open interval to
catch the relevant chunks of the tables and then join the short
results.

I think with a trio of CTE selecting with a time interval on the WHERE
and doing the date_trunc()/MAX() group by you should be able to do
three index scans producing short results which can then be
full-joined and coalesced. If you want the interval from $A to $B (
rounded to minutes ), do something like....

WITH pressures AS (
 SELECT date_trunc('minute'::text, tbl."time") AS "time",
    max(tbl.pressure) AS pressure
   FROM tbl
-- Chop the relevant time..
WHERE time >= $A and time < $B + '1 minute'::interval
-- There may be easier ways to make the above condition if you are
generating the text, but always use half-open
  GROUP BY 1 ),  -- Easier to type, and no order-by here ( and I
normally label order by in views as a code smell )....
,
yada, yada....-- repeat for temperatures, rain

SELECT
  SELECT COALESCE(rain."time", pressures."time", temperatures."time") AS "time",
    rain.rain,
    pressures.pressure,
    temperatures.temperature
   FROM rain
     FULL JOIN pressures USING ("time")
     FULL JOIN temperatures USING ("time")
ORDER BY 1; -- ORDER BY GOES HERE, I suspect your example got ordered
by chance, not by dessign.

( use other names, I just used the view names for C&P, lazy me ).


Francisco Olarte.


pgsql-general by date:

Previous
From: Adrian Klaver
Date:
Subject: Re: Regarding varchar max length in postgres
Next
From: Adrian Klaver
Date:
Subject: Re: Regarding varchar max length in postgres