Re: Filtering before join with date_trunc() - Mailing list pgsql-general

From Tom Lane
Subject Re: Filtering before join with date_trunc()
Date
Msg-id 2018.1539621952@sss.pgh.pa.us
Whole thread Raw
In response to Filtering before join with date_trunc()  ("Phil Endecott" <spam_from_pgsql_lists@chezphil.org>)
Responses Re: Filtering before join with date_trunc()  ("Phil Endecott" <spam_from_pgsql_lists@chezphil.org>)
List pgsql-general
"Phil Endecott" <spam_from_pgsql_lists@chezphil.org> writes:
> ...
> For each of these tables I have a view which rounds the timestamp
> to the nearest minute, and ensures there is only one row per minute:

>  SELECT date_trunc('minute'::text, tbl."time") AS "time",
>     max(tbl.pressure) AS pressure
>    FROM tbl
>   GROUP BY (date_trunc('minute'::text, tbl."time"))
>   ORDER BY (date_trunc('minute'::text, tbl."time"));

> I then join these tables on the rounded time:

>  SELECT COALESCE(rain."time", pressures."time", temperatures."time") AS "time",
>     rain.rain,
>     pressures.pressure,
>     temperatures.temperature
>    FROM rain
>      FULL JOIN pressures USING ("time")
>      FULL JOIN temperatures USING ("time");

> The COALESCE for time and the full joins are needed because some
> columns may be missing for some minutes.

> Now I'd like to find the values for a particular short time period:

> SELECT * FROM readings
> WHERE "time" BETWEEN '2018-10-01T00:00:00' AND '2018-10-01T24:00:00'

> This works, but it is inefficient; it seems to create all the rounded
> data, do the join on all of it, and then filter on the time period.
> Ideally it would filter the raw data, and then need to round and join
> far fewer rows.

You're expecting too much.  In the first place, it's generally impossible
to push constraints down through full joins, because that could change the
results.  (For example, if we remove a row from one join input, that could
result in getting unwanted null-extended rows from the other join input.)
Maybe with a whole lot of analysis we could prove that applying the "same"
constraint to both join keys doesn't break anything, but there's no such
intelligence there now --- in general, we've expended little if any effort
on optimizing full joins.  It's also not very clear to me that we can do
anything at all with pushing down constraints that are expressed in terms
of a JOIN USING merged column; they don't really constrain either input
individually.

> But I've done some experiments and even if I remove all the rounding and
> replace the full joins with regular joins, it still does sequential
> scans on at least one of the tables:

I think you're also expecting the system to deduce that it can apply an
inequality on one join column to the other one.  It doesn't; only equality
constraints have any sort of transitivity logic.

So you'll need to write out the BETWEEN separately for each table,
and put it below the full join, which means you won't be able to
use those nice views :-(

            regards, tom lane


pgsql-general by date:

Previous
From: Adrian Klaver
Date:
Subject: Re: Filtering before join with date_trunc()
Next
From: David Steele
Date:
Subject: Re: Setting up continuous archiving