Re: [HACKERS] [PROPOSAL] Temporal query processing with range types - Mailing list pgsql-hackers

From Peter Moser
Subject Re: [HACKERS] [PROPOSAL] Temporal query processing with range types
Date
Msg-id CAHO0eLYyvuqwF=2FsgDn1xOs_NOrFBu9Xh-Wq+aWfFy0y6=jWQ@mail.gmail.com
Whole thread Raw
In response to Re: [HACKERS] [PROPOSAL] Temporal query processing with range types  (Robert Haas <robertmhaas@gmail.com>)
List pgsql-hackers
2017-02-15 20:24 GMT+01:00 Robert Haas <robertmhaas@gmail.com>:
> So it seems like an ALIGN or NORMALIZE option is kind of like a JOIN,
> except apparently there's no join type and the optimizer can never
> reorder these operations with each other or with other joins.  Is that
> right?  The optimizer changes in this patch seem fairly minimal, so
> I'm guessing it can't be doing anything very complex here.

ALIGN/NORMALIZE operators are aliased from-clause items, which get rewritten
into a subquery using LEFT OUTER JOIN. The main idea behind that
is to reuse as much as possible of the existing PostgreSQL code, and
just provide
one executor function to process the output of these rewritten
queries. The optimizer
code is minimal, because we do not use any new constructs (except the
temporal adjustment
node for align/normalize) due to these rewrites. That is, all needed
optimization
techniques are already present.

> What happens if you perform the ALIGN or NORMALIZE operation using
> something other than an equality operator, like, say, less-than?  Or
> an arbitrary user-defined operator.

It is possible to use ALIGN/NORMALIZE with user-defined functions,
and non-equality operators.

> There's no documentation in this patch.  I'm not sure you want to go
> to the trouble of writing SGML documentation until this has been
> reviewed enough that it has a real chance of getting committed, but on
> the other hand we're obviously all struggling to understand what it
> does, so I think if not SGML documentation it at least needs a real
> clear explanation of what the syntax is and does in a README or
> something, even just for initial review.

We are currently writing SGML documentation and extend in-code comments.
Both, will be send soon.

> It's hard to see what's going on here.  What's ts?  What's te?  If you
> used longer names for these things, it might be a bit more
> self-documenting.

ts, te describe an half-open interval in which the tuple is considered valid:

  [time point start, time point end).

We have added an extended version of comments for both parser functions, i.e.,
transformTemporalAligner and transformTemporalNormalizer. See attached patch
(src/backend/parser/parse_temporal.c).

> If we are going to transform an ALIGN operator in to a left outer
> join, why do we also have an executor node for it?
>
> +               fcLowerLarg = makeFuncCall(SystemFuncName("lower"),
> +
> list_make1(crLargTs),
> +
> UNKNOWN_LOCATION);
> +               fcLowerRarg = makeFuncCall(SystemFuncName("lower"),
> +
> list_make1(crRargTs),
> +
> UNKNOWN_LOCATION);
> +               fcUpperLarg = makeFuncCall(SystemFuncName("upper"),
> +
> list_make1(crLargTs),
> +
> UNKNOWN_LOCATION);
> +               fcUpperRarg = makeFuncCall(SystemFuncName("upper"),
> +
> list_make1(crRargTs),
> +
> UNKNOWN_LOCATION);
>
> Why is a temporal operator calling functions that upper-case and
> lower-case strings?  In one sense this whole function (and much of the
> nearby code) is very straightforward code and you can see exactly why
> it's doing it.  In another sense it's totally inscrutable: WHY is it
> doing any of that stuff?

These functions extract the lower-bound, and upper-bound of range types.

>
> -       char       *strategy;           /* partitioning strategy
> ('list' or 'range') */
> -       List       *partParams;         /* List of PartitionElems */
> -       int                     location;               /* token
> location, or -1 if unknown */
> +       char       *strategy;   /* partitioning strategy ('list' or 'range') */
> +       List       *partParams; /* List of PartitionElems */
> +       int                     location;       /* token location, or
> -1 if unknown */
>
> I think this is some kind of mistake on your end while generating the
> patch.  It looks like you patched one version of the source code, and
> diffed against another.

Thank you for pointing this out, we fixed it in our new patch.



2017-02-16 13:41 GMT+01:00 Robert Haas <robertmhaas@gmail.com>:
> Also, it sounds like all of this is intended to work with ranges that
> are stored in different columns rather than with PostgreSQL's built-in
> range types.

Our syntax supports PostgreSQL's built-in range types and ranges that
are stored in different columns.

For instance, for range types an ALIGN query would look like this:
  SELECT * FROM (r ALIGN s ON q WITH (r.t, s.t)) c

... and for ranges in different columns like this:
  SELECT * FROM (r ALIGN s ON q WITH (r.ts, r.te, s.ts, s.te)) c

... where r and s are input relations, q can be any join qualifier, and
    r.t, s.t, r.ts, r.te, s.ts, and s.te can be any column name. The
    latter represent the valid time intervals, that is time point start,
    and time point end of each tuple for each input relation. These can
    be defined as four scalars, or two half-open, i.e., [), range typed
    values.



Best regards,
Anton, Johann, Michael, Peter



ps. The patch has been rebased on top of
commit 090f21bbad98001979da8589e9647a1d49bce4ee
from "Sun Feb 19 17:18:10 2017 -0500"

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Attachment

pgsql-hackers by date:

Previous
From: Peter Moser
Date:
Subject: Re: [HACKERS] [PROPOSAL] Temporal query processing with range types
Next
From: Tomas Vondra
Date:
Subject: Re: [HACKERS] Checksums by default?