Thread: [PROPOSAL] Temporal query processing with range types

[PROPOSAL] Temporal query processing with range types

From

Anton Dignös

Date:

22 July 2016, 16:37:39

Hi hackers,

we are a group of researches that work on temporal databases. Our main focus is
the processing of data with time intervals, such as the range types in
PostgreSQL.

For this type of processing the range types that are stored with the tuples are
considered to represent the tuples' valid time, such as for instance the time
period an employee works for a project. Intuitively, the processing of temporal
operators corresponds to the use of "at each time point" in natural language.
For example, a temporal count on such a relation (temporal relation) shall
determine the count at each time point, while an anti-join shall compute an
anti-join at each time point.

We would like to contribute to PostgreSQL a solution that supports the query
processing of "at each time point". The basic idea is to offer two new
operators, NORMALIZE and ALIGN, whose purpose is to adjust (or split) the
ranges of tuples so that subsequent queries can use the usual grouping and
equality conditions to get the intended results.

Query processing works as follows:
<temporal operator> := <NORMALIZE | ALIGN> + <traditional operator>

Attached is a proposal patch file that implements this extension and on which
the example queries below can be run. The support includes distinct,
aggregation, set operations, Cartesian product, Join, Left Join, Right Join,
Full Join, and Anti-join.

In our prototype the syntax for the two new operators is:

... FROM (table_ref NORMALIZE table_ref
          USING(att_list)
          WITH (columnref, columnref)
         ) alias_clause ...

and

... FROM (table_ref ALIGN table_ref
          ON a_expr
          WITH (columnref, columnref)
         ) alias_clause ...

where NORMALIZE is used for distinct, aggregation and set operations and ALIGN
is used for Cartesian product, Join, Left Join, Right Join, Full Join, and
Anti-join.

-------------------------------------------------------------------------------
EXAMPLE FOR AGGREGATION
-------------------------------------------------------------------------------

Relation 'projects' records when employees are assigned to projects.

CREATE TABLE projects (empl VARCHAR, proj VARCHAR, pay FLOAT, t DATERANGE);
INSERT INTO projects VALUES
('Ann', 'P1', 60, DATERANGE('2016-01-01', '2016-09-01')),
('Sam', 'P1', 40, DATERANGE('2016-06-01', '2016-12-01')),
('Joe', 'P2', 80, DATERANGE('2016-03-01', '2016-06-01'));

TABLE projects;

 empl | proj | pay |            t
------+------+-----+-------------------------
 Ann  | P1   |  60 | [2016-01-01,2016-09-01)
 Sam  | P1   |  40 | [2016-06-01,2016-12-01)
 Joe  | P2   |  80 | [2016-03-01,2016-06-01)
(3 rows)

QUERY1: At each time point, what is the number of employees assigned
to projects?

First, we use NORMALIZE to adjust the ranges so that they can be
aggregated as usual by using grouping on the adjusted timestamp t:

SELECT *
FROM (projects p1 NORMALIZE projects p2 USING() WITH(t,t)) p_adjusted;

 empl | proj | pay |            t
------+------+-----+-------------------------
 Ann  | P1   |  60 | [2016-01-01,2016-03-01)
 Ann  | P1   |  60 | [2016-03-01,2016-06-01)
 Ann  | P1   |  60 | [2016-06-01,2016-09-01)
 Sam  | P1   |  40 | [2016-06-01,2016-09-01)
 Sam  | P1   |  40 | [2016-09-01,2016-12-01)
 Joe  | P2   |  80 | [2016-03-01,2016-06-01)
(6 rows)

Observe that the time ranges have been adjusted, and it is now trivial
to compute the count of employees at each time point by grouping on t:

SELECT COUNT(*), t
FROM (projects p1 NORMALIZE projects p2 USING() WITH(t,t)) p_adjusted
GROUP BY t;

 count |            t
-------+-------------------------
     1 | [2016-01-01,2016-03-01)
     2 | [2016-03-01,2016-06-01)
     2 | [2016-06-01,2016-09-01)
     1 | [2016-09-01,2016-12-01)
(4 rows)

QUERY2: At each time point, what is the number of employees assigned
to EACH project?

SELECT *
FROM (projects p1 NORMALIZE projects p2 USING(proj) WITH(t,t)) p_adjusted;

 empl | proj | pay |            t
------+------+-----+-------------------------
 Ann  | P1   |  60 | [2016-01-01,2016-06-01)
 Ann  | P1   |  60 | [2016-06-01,2016-09-01)
 Sam  | P1   |  40 | [2016-06-01,2016-09-01)
 Sam  | P1   |  40 | [2016-09-01,2016-12-01)
 Joe  | P2   |  80 | [2016-03-01,2016-06-01)
(5 rows)

and

SELECT COUNT(*), proj, t
FROM (projects p1 NORMALIZE projects p2 USING(proj) WITH(t,t)) p_adjusted
GROUP BY proj, t;

 count | proj |            t
-------+------+-------------------------
     1 | P2   | [2016-03-01,2016-06-01)
     1 | P1   | [2016-01-01,2016-06-01)
     2 | P1   | [2016-06-01,2016-09-01)
     1 | P1   | [2016-09-01,2016-12-01)
(4 rows)

-------------------------------------------------------------------------------
EXAMPLE FOR ANTI-JOIN
-------------------------------------------------------------------------------

QUERY3: At each time point, deterime the employees with the highest
salary.

Without time this query is answered with a NOT EXISTS subquery. With
time periods everything remains, but the time ranges must be adjusted
first:

SELECT *
FROM (projects p1 ALIGN projects p2 ON p1.pay < p2.pay WITH (t,t)) p1_adjusted
WHERE NOT EXISTS (
 SELECT *
 FROM (projects p2 ALIGN projects p1 ON p1.pay < p2.pay WITH (t,t)) p2_adjusted
 WHERE p1_adjusted.pay < p2_adjusted.pay
       AND p1_adjusted.t = p2_adjusted.t );

 empl | proj | pay |            t
------+------+-----+-------------------------
 Ann  | P1   |  60 | [2016-01-01,2016-03-01)
 Ann  | P1   |  60 | [2016-06-01,2016-09-01)
 Sam  | P1   |  40 | [2016-09-01,2016-12-01)
 Joe  | P2   |  80 | [2016-03-01,2016-06-01)
(4 rows)


-------------------------------------------------------------------------------
EXAMPLE FOR JOINS
-------------------------------------------------------------------------------

CREATE TABLE managers (mgr VARCHAR, proj VARCHAR, t DATERANGE);
INSERT INTO managers VALUES
('Tom', 'P1', DATERANGE('2016-09-01', '2016-12-01')),
('Frank', 'P2', DATERANGE('2016-01-01', '2016-12-01'));

TABLE managers;

  mgr  | proj |            t
-------+------+-------------------------
 Tom   | P1   | [2016-09-01,2016-12-01)
 Frank | P2   | [2016-01-01,2016-12-01)
(2 rows)

QUERY4: At each time point, determine employees and their managers
(including the periods with no employee or no manager).

WITH p_ext AS (SELECT *, t u FROM projects),
     m_ext AS (SELECT *, t v FROM managers)
SELECT proj, empl, mgr, t
FROM (p_ext ALIGN m_ext ON m_ext.proj = p_ext.proj WITH (t,t)) p_adjusted
       FULL OUTER JOIN
     (m_ext ALIGN p_ext ON m_ext.proj = p_ext.proj WITH (t,t)) m_adjusted
       USING (proj, t)
WHERE t = u * v OR v IS NULL OR u IS NULL;

 proj | empl |  mgr  |            t
------+------+-------+-------------------------
 P1   | Ann  |       | [2016-01-01,2016-09-01)
 P1   | Sam  |       | [2016-06-01,2016-09-01)
 P1   | Sam  | Tom   | [2016-09-01,2016-12-01)
 P2   |      | Frank | [2016-01-01,2016-03-01)
 P2   | Joe  | Frank | [2016-03-01,2016-06-01)
 P2   |      | Frank | [2016-06-01,2016-12-01)
(6 rows)

-------------------------------------------------------------------------------
IMPLEMENTATION
-------------------------------------------------------------------------------

Our implementation approach is to use two operators that take care of
adjusting the ranges in such a way that afterwards the corresponding
nontemporal operator can be used to compute the result:
input --> {NORMALIZE/ALIGN} --> nontemporal operator --> result.

The two operators (NORMALIZE and ALIGN) in this prototype are rewritten into
subqueries, which then serve as input for a new executor function
ExecTemporalAdjustment. The executor function takes care of the splitting of
the range types.

For NORMALIZE the tuples' ranges need to be split into all sub-ranges
according to all matching ranges of the second relation. For this we
create a subquery that first joins one relation with the range
boundaries of the other and then sorts the result. The executor
function splits the ranges in a sweep-line based manner.

For ALIGN the tuples' ranges must be split into all intersections and
differences with the other relation according to the join condition.
For this we create a subquery that first joins the two relations and
then sorts the result. The executor function splits the ranges
accordingly in a sweep-line based manner.

After the splitting it is possible to only use equality (GROUP BY,
DISTINCT, JOIN CONDITION, ...) on the range types to get the intended
result.

The subquery is 'visible' with the EXPLAIN command.

Range types need to be of half-open, i.e., [lower, upper).

Unbound ranges (Infinity), NaN, and NULL values (of ranges and range
bounds) are not yet supported.

-------------------------------------------------------------------------------

We are grateful for any feedback.

Best regards,

Anton, Johann, Michael, Peter

Attachment

tpg_primitives_out_v1.patch

Re: [PROPOSAL] Temporal query processing with range types

From

David Fetter

Date:

22 July 2016, 22:01:48

On Fri, Jul 22, 2016 at 01:15:17PM +0200, Anton Dignös wrote:
> Hi hackers,
> 
> we are a group of researches that work on temporal databases.  Our
> main focus is the processing of data with time intervals, such as
> the range types in PostgreSQL.

Thanks for your hard work so far!

[Explanation and examples elided]

To what extent, if any, are you attempting to follow the SQL:2011
standard?

http://cs.ulb.ac.be/public/_media/teaching/infoh415/tempfeaturessql2011.pdf

Best,
David.
-- 
David Fetter <david(at)fetter(dot)org> http://fetter.org/
Phone: +1 415 235 3778  AIM: dfetter666  Yahoo!: dfetter
Skype: davidfetter      XMPP: david(dot)fetter(at)gmail(dot)com

Remember to vote!
Consider donating to Postgres: http://www.postgresql.org/about/donate

Re: [PROPOSAL] Temporal query processing with range types

From

Anton Dignös

Date:

23 July 2016, 13:37:54

On Sat, Jul 23, 2016 at 12:01 AM, David Fetter <david@fetter.org> wrote:
> On Fri, Jul 22, 2016 at 01:15:17PM +0200, Anton Dignös wrote:
>> Hi hackers,
>>
>> we are a group of researches that work on temporal databases.  Our
>> main focus is the processing of data with time intervals, such as
>> the range types in PostgreSQL.
>
> Thanks for your hard work so far!
>
> [Explanation and examples elided]
>
> To what extent, if any, are you attempting to follow the SQL:2011
> standard?
>
> http://cs.ulb.ac.be/public/_media/teaching/infoh415/tempfeaturessql2011.pdf

The querying in the SQL:2011 standard is based on simple SQL range restrictions
and period predicates (OVERLAP, PRECEDES, FOR SYSTEM_TIME AS OF, etc) that
functionality-wise in PostgreSQL are already covered by the operators and
functions on range types.

Operations such as aggregation, outer joins, set-operations on ranges
(mentioned in
Section 2.5 "Future directions" in the above paper) are not yet part of the
standard. These are the operations that require the adjustment (or splitting) of
ranges.

Best,

Anton

Re: [PROPOSAL] Temporal query processing with range types

From

Robert Haas

Date:

27 July 2016, 14:09:59

On Fri, Jul 22, 2016 at 7:15 AM, Anton Dignös <dignoes@inf.unibz.it> wrote:
> We would like to contribute to PostgreSQL a solution that supports the query
> processing of "at each time point". The basic idea is to offer two new
> operators, NORMALIZE and ALIGN, whose purpose is to adjust (or split) the
> ranges of tuples so that subsequent queries can use the usual grouping and
> equality conditions to get the intended results.

I think that it is great that you want to contribute your work to
PostgreSQL.  I don't know whether there will be a consensus that this
is generally useful functionality that we should accept, but I commend
the effort anyhow.  Assuming there is, getting this into a state that
we consider committable will probably take quite a bit of additional
work on your part; no one will do it for you.  If you're still
interested in proceeding given those caveats, please add your patch
here so that it gets reviewed:

https://commitfest.postgresql.org/action/commitfest_view/open

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

Re: [PROPOSAL] Temporal query processing with range types

From

Peter Moser

Date:

25 October 2016, 09:45:57

On 27.07.2016 at 16:09 Robert Haas wrote:
> On Fri, Jul 22, 2016 at 7:15 AM, Anton Dignös <dignoes@inf.unibz.it> wrote:
>> We would like to contribute to PostgreSQL a solution that supports the query
>> processing of "at each time point". The basic idea is to offer two new
>> operators, NORMALIZE and ALIGN, whose purpose is to adjust (or split) the
>> ranges of tuples so that subsequent queries can use the usual grouping and
>> equality conditions to get the intended results.
>
> I think that it is great that you want to contribute your work to
> PostgreSQL.  I don't know whether there will be a consensus that this
> is generally useful functionality that we should accept, but I commend
> the effort anyhow.  Assuming there is, getting this into a state that
> we consider committable will probably take quite a bit of additional
> work on your part; no one will do it for you.


Hi hackers,

thank you for your feedback.

We are aware that contributing to PostgreSQL is a long way with a lot
of work.  We are committed to go all the way and do the work as
discussed in the community.

We had some internal discussions about the project, looking also at
some other patches to better understand whether the patch is 
work-in-progress or ready for commitfest.


> If you're still
> interested in proceeding given those caveats, please add your patch
> here so that it gets reviewed:
>
> https://commitfest.postgresql.org/action/commitfest_view/open


We decided to follow your recommendation and add the patch to the 
commitfest.


Looking forward for your feedback,
Anton, Johann, Michael, Peter

Re: [PROPOSAL] Temporal query processing with range types

From

Haribabu Kommi

Date:

05 December 2016, 05:12:03

On Tue, Oct 25, 2016 at 8:44 PM, Peter Moser <pitiz29a@gmail.com> wrote:

We decided to follow your recommendation and add the patch to the commitfest.

Path is not applying properly to HEAD.

Moved to next CF with "waiting on author" status.

Regards,

Hari Babu

Fujitsu Australia

Re: [PROPOSAL] Temporal query processing with range types

From

Peter Moser

Date:

07 December 2016, 14:59:55

Am 05.12.2016 um 06:11 schrieb Haribabu Kommi:
>
>
> On Tue, Oct 25, 2016 at 8:44 PM, Peter Moser <pitiz29a@gmail.com
> <mailto:pitiz29a@gmail.com>> wrote:
>
>
>     We decided to follow your recommendation and add the patch to the
>     commitfest.
>
>
> Path is not applying properly to HEAD.
> Moved to next CF with "waiting on author" status.
>

We updated our patch. We tested it with the latest
commit dfe530a09226a9de80f2b4c3d5f667bf51481c49.

>
> Regards,
> Hari Babu
> Fujitsu Australia

Best regards,
Anton, Johann, Michael, Peter

Attachment

tpg_primitives_out_v2.patch

Re: [HACKERS] [PROPOSAL] Temporal query processing with range types

From

David Fetter

Date:

16 December 2016, 09:17:28

On Wed, Dec 07, 2016 at 03:57:33PM +0100, Peter Moser wrote:
> Am 05.12.2016 um 06:11 schrieb Haribabu Kommi:
> > 
> > 
> > On Tue, Oct 25, 2016 at 8:44 PM, Peter Moser <pitiz29a@gmail.com
> > <mailto:pitiz29a@gmail.com>> wrote:
> > 
> > 
> >     We decided to follow your recommendation and add the patch to the
> >     commitfest.
> > 
> > 
> > Path is not applying properly to HEAD.
> > Moved to next CF with "waiting on author" status.
> > 
> 
> We updated our patch. We tested it with the latest
> commit dfe530a09226a9de80f2b4c3d5f667bf51481c49.

This looks neat, but it no longer applies to master.  Is a rebase in
the offing?

Best,
David.
-- 
David Fetter <david(at)fetter(dot)org> http://fetter.org/
Phone: +1 415 235 3778  AIM: dfetter666  Yahoo!: dfetter
Skype: davidfetter      XMPP: david(dot)fetter(at)gmail(dot)com

Remember to vote!
Consider donating to Postgres: http://www.postgresql.org/about/donate

Re: [HACKERS] [PROPOSAL] Temporal query processing with range types

From

Peter Moser

Date:

16 December 2016, 16:55:42

Am 16.12.2016 um 07:17 schrieb David Fetter:
> On Wed, Dec 07, 2016 at 03:57:33PM +0100, Peter Moser wrote:
>> Am 05.12.2016 um 06:11 schrieb Haribabu Kommi:
>>>
>>>
>>> On Tue, Oct 25, 2016 at 8:44 PM, Peter Moser <pitiz29a@gmail.com
>>> <mailto:pitiz29a@gmail.com>> wrote:
>>>
>>>
>>>     We decided to follow your recommendation and add the patch to the
>>>     commitfest.
>>>
>>>
>>> Path is not applying properly to HEAD.
>>> Moved to next CF with "waiting on author" status.
>>>
>>
>> We updated our patch. We tested it with the latest
>> commit dfe530a09226a9de80f2b4c3d5f667bf51481c49.
>
> This looks neat, but it no longer applies to master.  Is a rebase in
> the offing?

We rebased our patch on top of HEAD, that is, commit 
93513d1b6559b2d0805f0b02d312ee550e3d010b.


Best regards,
Anton, Johann, Michael, Peter

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Attachment

tpg_primitives_out_v3.patch

Re: [HACKERS] [PROPOSAL] Temporal query processing with range types

From

Peter Eisentraut

Date:

04 January 2017, 18:21:21

So I'm looking at this patch in the commit fest.  I have only a general
understanding of temporal query processing.

What this patch does is to add two new clauses for FROM-list items,
NORMALIZE and ALIGN, which reshuffle a set of ranges into a new list
that can then be aggregated more easily.  From the original message:

> For NORMALIZE the tuples' ranges need to be split into all sub-ranges
> according to all matching ranges of the second relation. For this we
> create a subquery that first joins one relation with the range
> boundaries of the other and then sorts the result. The executor
> function splits the ranges in a sweep-line based manner.
>
> For ALIGN the tuples' ranges must be split into all intersections and
> differences with the other relation according to the join condition.
> For this we create a subquery that first joins the two relations and
> then sorts the result. The executor function splits the ranges
> accordingly in a sweep-line based manner.

So there isn't really temporal query processing as such here, only some
helpers that can make it easier.

I can see how those operations can be useful, but it would help if there
were a more formal definition to be able to check that further.

What I'm missing here is some references: existing implementations,
standards, documentation, research papers, alternative ideas, rejected
alternatives, etc.

Also, the submission is missing documentation and test cases.  There are
technical terms used in the code that I don't understand.

I think there are probably many interesting applications for normalizing
or otherwise adjusting ranges.  I'd like to see an overview and
consideration of other applications.

Ideally, I'd like to see these things implemented as some kind of
user-space construct, like an operator or function.  I think we'd need a
clearer definition of what it is they do before we can evaluate that.

-- 
Peter Eisentraut              http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

Re: [HACKERS] [PROPOSAL] Temporal query processing with range types

From

Peter Moser

Date:

13 January 2017, 17:22:14

> What this patch does is to add two new clauses for FROM-list items,
> NORMALIZE and ALIGN, which reshuffle a set of ranges into a new list
> that can then be aggregated more easily. From the original message:
>
> > For NORMALIZE the tuples' ranges need to be split into all sub-ranges
> > according to all matching ranges of the second relation. For this we
> > create a subquery that first joins one relation with the range
> > boundaries of the other and then sorts the result. The executor
> > function splits the ranges in a sweep-line based manner.
> >
> > For ALIGN the tuples' ranges must be split into all intersections and
> > differences with the other relation according to the join condition.
> > For this we create a subquery that first joins the two relations and
> > then sorts the result. The executor function splits the ranges
> > accordingly in a sweep-line based manner.
>
> So there isn't really temporal query processing as such here, only some
> helpers that can make it easier.

The goal of temporal aligners and normalizers is to split ranges to allow a
reduction from temporal queries to their non-temporal counterparts. Splitting
ranges is necessary for temporal query processing. Temporal aligners and
normalizer may then be used as building-blocks for any temporal query construct.

> I can see how those operations can be useful, but it would help if there
> were a more formal definition to be able to check that further.

We have published two papers, that contain formal definitions and related work
for the temporal aligner and normalizer. Please see [1] and [2].

> What I'm missing here is some references: existing implementations,
> standards, documentation, research papers, alternative ideas, rejected
> alternatives, etc.

A good overview of existing implementations in DBMSs, SQL standard, and history
is given in [3].

> Also, the submission is missing documentation and test cases. There are
> technical terms used in the code that I don't understand.

We added a second patch with test cases and expected results. We are now
writing the documentation in sgml-format.

> I think there are probably many interesting applications for normalizing
> or otherwise adjusting ranges. I'd like to see an overview and
> consideration of other applications.

Please see the attached file adjustment.sql for some interesting applications.

> Ideally, I'd like to see these things implemented as some kind of
> user-space construct, like an operator or function. I think we'd need a
> clearer definition of what it is they do before we can evaluate that.

Can you please explain what you mean by "user-space construct" in this case.

Best regards,
Anton, Johann, Michael, Peter

----
[1] Anton Dignös, Michael H. Böhlen, Johann Gamper:
Temporal alignment. SIGMOD Conference 2012: 433-444
http://doi.acm.org/10.1145/2213836.2213886
[2] Anton Dignös, Michael H. Böhlen, Johann Gamper, Christian S. Jensen:
Extending the Kernel of a Relational DBMS with Comprehensive Support for
Sequenced Temporal Queries. ACM Trans. Database Syst. 41(4): 26:1-26:46 (2016)
http://doi.acm.org/10.1145/2967608
[3] https://www2.cs.arizona.edu/people/rts/sql3.html and
https://www2.cs.arizona.edu/people/rts/tsql2.html

Attachment

Re: [HACKERS] [PROPOSAL] Temporal query processing with range types

From

Peter Eisentraut

Date:

18 January 2017, 05:57:57

On 1/13/17 9:22 AM, Peter Moser wrote:
> The goal of temporal aligners and normalizers is to split ranges to allow a
> reduction from temporal queries to their non-temporal counterparts.
> Splitting
> ranges is necessary for temporal query processing. Temporal aligners and
> normalizer may then be used as building-blocks for any temporal query
> construct.

I would need to see the exact definitions of these constructs.  Please
send some documentation.

> We have published two papers, that contain formal definitions and
> related work
> for the temporal aligner and normalizer. Please see [1] and [2].

I don't have access to those.

>> I think there are probably many interesting applications for normalizing
>> or otherwise adjusting ranges.  I'd like to see an overview and
>> consideration of other applications.
> 
> Please see the attached file adjustment.sql for some interesting
> applications.

That's surely interesting, but without knowing what these operations are
supposed to do, I can only reverse engineer and guess.

>> Ideally, I'd like to see these things implemented as some kind of
>> user-space construct, like an operator or function.  I think we'd need a
>> clearer definition of what it is they do before we can evaluate that.
> 
> Can you please explain what you mean by "user-space construct" in this case.

Implement them using the extensibility features, such as a user-defined
operator.  I don't know if it's possible, but it's something to consider.

Using common terms such as ALIGN and NORMALIZE for such a specific
functionality seems a bit wrong.

-- 
Peter Eisentraut              http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

Re: [HACKERS] [PROPOSAL] Temporal query processing with range types

From

Peter Moser

Date:

24 January 2017, 12:32:51

2017-01-18 3:57 GMT+01:00 Peter Eisentraut <peter.eisentraut@2ndquadrant.com>:
>
> On 1/13/17 9:22 AM, Peter Moser wrote:
> > The goal of temporal aligners and normalizers is to split ranges to allow a
> > reduction from temporal queries to their non-temporal counterparts.
> > Splitting
> > ranges is necessary for temporal query processing. Temporal aligners and
> > normalizer may then be used as building-blocks for any temporal query
> > construct.
>
> I would need to see the exact definitions of these constructs.  Please
> send some documentation.
>
> > We have published two papers, that contain formal definitions and
> > related work
> > for the temporal aligner and normalizer. Please see [1] and [2].
>
> I don't have access to those.

The papers can be freely downloaded from
http://www.inf.unibz.it/~dignoes/publications.html using the "Author-ize link".

> >> I think there are probably many interesting applications for normalizing
> >> or otherwise adjusting ranges.  I'd like to see an overview and
> >> consideration of other applications.
> >
> > Please see the attached file adjustment.sql for some interesting
> > applications.
>
> That's surely interesting, but without knowing what these operations are
> supposed to do, I can only reverse engineer and guess.

Intuitively what they do is as follows:

NORMALIZE: splits all the ranges of one relation according to all the range
boundaries of another (but possibly the same) relation whenever some equality
condition over some given attributes is satisfied.

When the two relations are the same, all ranges with the given equal attributes
are either equal or disjoint. After this, the traditional GROUP BY or DISTINCT
can be applied. The attributes given to NORMALIZE for a (temporal) GROUP BY are
the grouping attributes and for a (temporal) DISTINCT the target list
attributes.

When the two relations are different, but they each contain disjoint ranges
for the same attributes (as the current limitation for the set operations is)
we perform a symmetric NORMALIZE on each of them. Then we have a similar
situation as before, i.e., in both relations ranges with the same attributes
are either equal or disjoint and a traditional set operation
(EXCEPT/INTERSECT/UNION) can be applied. The attributes given to NORMALIZE for
a (temporal) EXCEPT/INTERSECT/UNION are the target list attributes.

ALIGN: splits all the ranges of one relation according to all the range
intersections of another relation, i.e., it produces all intersections and
non-overlapping parts, whenever some condition is satisfied.

We perform a symmetric ALIGN on each relation, after which a traditional inner
or outer join can be applied using equality on the ranges to calculate the
overlap. The condition given to a (temporal) inner or outer join is the
join condition without overlap.

> >> Ideally, I'd like to see these things implemented as some kind of
> >> user-space construct, like an operator or function.  I think we'd need a
> >> clearer definition of what it is they do before we can evaluate that.
> >
> > Can you please explain what you mean by "user-space construct" in this case.
>
> Implement them using the extensibility features, such as a user-defined
> operator.  I don't know if it's possible, but it's something to consider.

We experimented with user-defined operators and set-returning functions. The
main issue with these is that ALIGN and NORMALIZE rely on comparison and
processing of one tuple with many tuples at a time that is not possible with
operators, and for set-returning functions there is no clean way of passing
tables and/or conditions.

> Using common terms such as ALIGN and NORMALIZE for such a specific
> functionality seems a bit wrong.

Would ALIGN RANGES/RANGE ALIGN and NORMALIZE RANGES/RANGE NORMALIZE be better
options? We are also thankful for any suggestion or comments about the syntax.

Re: [HACKERS] [PROPOSAL] Temporal query processing with range types

From

Michael Paquier

Date:

01 February 2017, 08:19:59

On Tue, Jan 24, 2017 at 6:32 PM, Peter Moser <pitiz29a@gmail.com> wrote:
> [reviews and discussions]

The patch proposed has rotten. Please provide a rebase. By the way, I
am having a hard time applying your patches with patch or any other
methods... I am moving it to CF 2017-03 because of the lack of
reviews.
-- 
Michael

Re: [HACKERS] [PROPOSAL] Temporal query processing with range types

From

Peter Moser

Date:

02 February 2017, 20:43:46

2017-02-01 6:19 GMT+01:00 Michael Paquier <michael.paquier@gmail.com>:
> The patch proposed has rotten. Please provide a rebase. By the way, I
> am having a hard time applying your patches with patch or any other
> methods... I am moving it to CF 2017-03 because of the lack of
> reviews.

We have rebased our patches on top of commit
53dd2da257fb5904b087b97dd9c2867390d309c1
from "Thu Feb 2 14:12:35 2017 +0200".

Hereby, we used the following commands to create both patches:
git diff --no-prefix origin/master -- src/ ':!src/test/*' >
tpg_primitives_out_v4.patch

git diff --no-prefix origin/master -- src/test/ >
tpg_primitives_out_tests_v2.patch

We have also tested our patches on the current HEAD with the command:
patch -p0 < patch-file

Both worked without problems or warnings on our Linux machine.
Could you please explain, which problems occurred while you tried to
apply our patches?


Best regards,
Anton, Johann, Michael, Peter

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Attachment

Re: [HACKERS] [PROPOSAL] Temporal query processing with range types

From

Peter Eisentraut

Date:

06 February 2017, 22:29:18

On 2/2/17 12:43 PM, Peter Moser wrote:
> Hereby, we used the following commands to create both patches:
> git diff --no-prefix origin/master -- src/ ':!src/test/*' >
> tpg_primitives_out_v4.patch
> 
> git diff --no-prefix origin/master -- src/test/ >
> tpg_primitives_out_tests_v2.patch
> 
> We have also tested our patches on the current HEAD with the command:
> patch -p0 < patch-file
> 
> Both worked without problems or warnings on our Linux machine.
> Could you please explain, which problems occurred while you tried to
> apply our patches?

Your patches apply OK for me.

In the future, just use git diff without any options or git
format-patch, and put both the code and the tests in one patch.

-- 
Peter Eisentraut              http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

Re: [HACKERS] [PROPOSAL] Temporal query processing with range types

From

Robert Haas

Date:

15 February 2017, 22:03:46

On Tue, Jan 24, 2017 at 4:32 AM, Peter Moser <pitiz29a@gmail.com> wrote:
> NORMALIZE: splits all the ranges of one relation according to all the range
> boundaries of another (but possibly the same) relation whenever some equality
> condition over some given attributes is satisfied.
>
> When the two relations are the same, all ranges with the given equal attributes
> are either equal or disjoint. After this, the traditional GROUP BY or DISTINCT
> can be applied. The attributes given to NORMALIZE for a (temporal) GROUP BY are
> the grouping attributes and for a (temporal) DISTINCT the target list
> attributes.
>
> When the two relations are different, but they each contain disjoint ranges
> for the same attributes (as the current limitation for the set operations is)
> we perform a symmetric NORMALIZE on each of them. Then we have a similar
> situation as before, i.e., in both relations ranges with the same attributes
> are either equal or disjoint and a traditional set operation
> (EXCEPT/INTERSECT/UNION) can be applied. The attributes given to NORMALIZE for
> a (temporal) EXCEPT/INTERSECT/UNION are the target list attributes.
>
>
> ALIGN: splits all the ranges of one relation according to all the range
> intersections of another relation, i.e., it produces all intersections and
> non-overlapping parts, whenever some condition is satisfied.
>
> We perform a symmetric ALIGN on each relation, after which a traditional inner
> or outer join can be applied using equality on the ranges to calculate the
> overlap. The condition given to a (temporal) inner or outer join is the
> join condition without overlap.

I don't quite understand the difference between NORMALIZE and ALIGN.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

Re: [HACKERS] [PROPOSAL] Temporal query processing with range types

From

Robert Haas

Date:

15 February 2017, 22:24:24

On Tue, Jan 24, 2017 at 4:32 AM, Peter Moser <pitiz29a@gmail.com> wrote:
>> Using common terms such as ALIGN and NORMALIZE for such a specific
>> functionality seems a bit wrong.
>
> Would ALIGN RANGES/RANGE ALIGN and NORMALIZE RANGES/RANGE NORMALIZE be better
> options? We are also thankful for any suggestion or comments about the syntax.

So it seems like an ALIGN or NORMALIZE option is kind of like a JOIN,
except apparently there's no join type and the optimizer can never
reorder these operations with each other or with other joins.  Is that
right?  The optimizer changes in this patch seem fairly minimal, so
I'm guessing it can't be doing anything very complex here.

What happens if you perform the ALIGN or NORMALIZE operation using
something other than an equality operator, like, say, less-than?  Or
an arbitrary user-defined operator.

There's no documentation in this patch.  I'm not sure you want to go
to the trouble of writing SGML documentation until this has been
reviewed enough that it has a real chance of getting committed, but on
the other hand we're obviously all struggling to understand what it
does, so I think if not SGML documentation it at least needs a real
clear explanation of what the syntax is and does in a README or
something, even just for initial review.

We don't have anything quite like this in PostgreSQL today.  An
ordinary join just matches up things in relation A and relation B and
outputs the matching rows, and something like a SRF takes a set of
input rows and returns a set of output rows.  This is a hybrid - it
takes in two sets of rows, one from each relation being joined, and
produces a derived set of output rows that takes into account both
inputs.

+ * INPUT:
+ *             (r ALIGN s ON q WITH (r.ts, r.te, s.ts, s.te)) c
+ *             where q can be any join qualifier, and r.ts, r.te, s.ts, and s.t
e
+ *             can be any column name.
+ *
+ * OUTPUT:
+ *             (
+ *             SELECT r.*, GREATEST(r.ts, s.ts) P1, LEAST(r.te, s.te) P2
+ *      FROM
+ *      (
+ *             SELECT *, row_id() OVER () rn FROM r
+ *      ) r
+ *      LEFT OUTER JOIN
+ *      s
+ *      ON q AND r.ts < s.te AND r.te > s.ts
+ *      ORDER BY rn, P1, P2
+ *      ) c

It's hard to see what's going on here.  What's ts?  What's te?  If you
used longer names for these things, it might be a bit more
self-documenting.

If we are going to transform an ALIGN operator in to a left outer
join, why do we also have an executor node for it?

+               fcLowerLarg = makeFuncCall(SystemFuncName("lower"),
+
list_make1(crLargTs),
+
UNKNOWN_LOCATION);
+               fcLowerRarg = makeFuncCall(SystemFuncName("lower"),
+
list_make1(crRargTs),
+
UNKNOWN_LOCATION);
+               fcUpperLarg = makeFuncCall(SystemFuncName("upper"),
+
list_make1(crLargTs),
+
UNKNOWN_LOCATION);
+               fcUpperRarg = makeFuncCall(SystemFuncName("upper"),
+
list_make1(crRargTs),
+
UNKNOWN_LOCATION);

Why is a temporal operator calling functions that upper-case and
lower-case strings?  In one sense this whole function (and much of the
nearby code) is very straightforward code and you can see exactly why
it's doing it.  In another sense it's totally inscrutable: WHY is it
doing any of that stuff?

-       char       *strategy;           /* partitioning strategy
('list' or 'range') */
-       List       *partParams;         /* List of PartitionElems */
-       int                     location;               /* token
location, or -1 if unknown */
+       char       *strategy;   /* partitioning strategy ('list' or 'range') */
+       List       *partParams; /* List of PartitionElems */
+       int                     location;       /* token location, or
-1 if unknown */

I think this is some kind of mistake on your end while generating the
patch.  It looks like you patched one version of the source code, and
diffed against another.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

Re: [HACKERS] [PROPOSAL] Temporal query processing with range types

From

"David G. Johnston"

Date:

15 February 2017, 23:33:52

On Wed, Feb 15, 2017 at 12:24 PM, Robert Haas <robertmhaas@gmail.com> wrote:

On Tue, Jan 24, 2017 at 4:32 AM, Peter Moser <pitiz29a@gmail.com> wrote:
>> Using common terms such as ALIGN and NORMALIZE for such a specific
>> functionality seems a bit wrong.
>
> Would ALIGN RANGES/RANGE ALIGN and NORMALIZE RANGES/RANGE NORMALIZE be better
> options? We are also thankful for any suggestion or comments about the syntax.

So it seems like an ALIGN or NORMALIZE option is kind of like a JOIN,
except apparently there's no join type and the optimizer can never
reorder these operations with each other or with other joins. Is that
right? The optimizer changes in this patch seem fairly minimal, so
I'm guessing it can't be doing anything very complex here.

+ * INPUT:
+ * (r ALIGN s ON q WITH (r.ts, r.te, s.ts, s.te)) c
+ * where q can be any join qualifier, and r.ts, r.te, s.ts, and s.t
e
+ * can be any column name.
+ *
+ * OUTPUT:
+ * (
+ * SELECT r.*, GREATEST(r.ts, s.ts) P1, LEAST(r.te, s.te) P2
+ * FROM
+ * (
+ * SELECT *, row_id() OVER () rn FROM r
+ * ) r
+ * LEFT OUTER JOIN
+ * s
+ * ON q AND r.ts < s.te AND r.te > s.ts
+ * ORDER BY rn, P1, P2
+ * ) c

It's hard to see what's going on here. What's ts? What's te? If you
used longer names for these things, it might be a bit more
self-documenting.

Just reasoning out loud here...

ISTM ts and te are "temporal [range] start" and "temporal [range] end" (or probably just the common "timestamp start/end")

From what I can see it is affecting an intersection of the two ranges and, furthermore, splitting the LEFT range into sub-ranges that match up with the sub-ranges found on the right side. From the example above this seems like it should be acting on self-normalized ranges - but I may be missing something by evaluating this out of context.

r1 [1, 6] [ts, te] [time period start, time period end]

s1 [2, 3]

s2 [3, 4]

s3 [5, 7]

r LEFT JOIN s ON (r.ts < s.te AND r.te > s.ts)

r1[1, 6],s1[2, 3] => [max(r.ts, s.ts),min(r.te, s.te)] => r1[1, 6],d[2, 3]

r1[1, 6],s2[3, 4] => [max(t.ts, s.ts),min(r.te, s.te)] => r1[1, 6],d[3, 4]

r1[1, 6],s3[5, 7] => [max(t.ts, s.ts),min(r.te, s.te)] => r1[1, 6],d[5, 6]

Thus the intersection is [2,6] but since s1 has three ranges that begin between 2 and 6 (i.e., 2, 3, and 5) there are three output records that correspond to those sub-ranges.

The description in the OP basically distinguishes between NORMALIZE and ALIGN in that ALIGN, as described above, affects an INTERSECTION on the two ranges - discarding the non-overlapping data - while NORMALIZE performs the alignment while also retaining the non-overlapping data.

The rest of the syntax seems to deal with selecting subsets of range records based upon attribute data.

David J.

Re: [HACKERS] [PROPOSAL] Temporal query processing with range types

From

Robert Haas

Date:

16 February 2017, 15:41:37

On Wed, Feb 15, 2017 at 3:33 PM, David G. Johnston
<david.g.johnston@gmail.com> wrote:
> The description in the OP basically distinguishes between NORMALIZE and
> ALIGN in that ALIGN, as described above, affects an INTERSECTION on the two
> ranges - discarding the non-overlapping data - while NORMALIZE performs the
> alignment while also retaining the non-overlapping data.

Hmm.  So is ALIGN like an inner join and NORMALIZE like a full outer
join?  Couldn't you have left and right outer joins, too, like you
null-extend the parts of the lefthand range that don't match but throw
away parts of the righthand range that don't match?

Also, it sounds like all of this is intended to work with ranges that
are stored in different columns rather than with PostgreSQL's built-in
range types.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

Re: [HACKERS] [PROPOSAL] Temporal query processing with range types

From

Peter Moser

Date:

20 February 2017, 19:38:10

On Wed, Feb 15, 2017 at 9:33 PM, David G. Johnston
<david.g.johnston@gmail.com> wrote:
> On Wed, Feb 15, 2017 at 12:24 PM, Robert Haas <robertmhaas@gmail.com> wrote:
>>
>> So it seems like an ALIGN or NORMALIZE option is kind of like a JOIN,
>> except apparently there's no join type and the optimizer can never
>> reorder these operations with each other or with other joins.  Is that
>> right?  The optimizer changes in this patch seem fairly minimal, so
>> I'm guessing it can't be doing anything very complex here.
>>
>> + * INPUT:
>> + *             (r ALIGN s ON q WITH (r.ts, r.te, s.ts, s.te)) c
>> + *             where q can be any join qualifier, and r.ts, r.te, s.ts,
>> and s.t
>> e
>> + *             can be any column name.
>> + *
>> + * OUTPUT:
>> + *             (
>> + *             SELECT r.*, GREATEST(r.ts, s.ts) P1, LEAST(r.te, s.te) P2
>> + *      FROM
>> + *      (
>> + *             SELECT *, row_id() OVER () rn FROM r
>> + *      ) r
>> + *      LEFT OUTER JOIN
>> + *      s
>> + *      ON q AND r.ts < s.te AND r.te > s.ts
>> + *      ORDER BY rn, P1, P2
>> + *      ) c
>>
>> It's hard to see what's going on here.  What's ts?  What's te?  If you
>> used longer names for these things, it might be a bit more
>> self-documenting.
>
>
> Just reasoning out loud here...
>
> ISTM ts and te are "temporal [range] start" and "temporal [range] end" (or
> probably just the common "timestamp start/end")
>
> From what I can see it is affecting an intersection of the two ranges and,
> furthermore, splitting the LEFT range into sub-ranges that match up with the
> sub-ranges found on the right side.  From the example above this seems like
> it should be acting on self-normalized ranges - but I may be missing
> something by evaluating this out of context.
>
> r1 [1, 6] [ts, te] [time period start, time period end]
> s1 [2, 3]
> s2 [3, 4]
> s3 [5, 7]
>
> r LEFT JOIN s ON (r.ts < s.te AND r.te > s.ts)
>
> r1[1, 6],s1[2, 3] => [max(r.ts, s.ts),min(r.te, s.te)] => r1[1, 6],d[2, 3]
> r1[1, 6],s2[3, 4] => [max(t.ts, s.ts),min(r.te, s.te)] => r1[1, 6],d[3, 4]
> r1[1, 6],s3[5, 7] => [max(t.ts, s.ts),min(r.te, s.te)] => r1[1, 6],d[5, 6]
>
> Thus the intersection is [2,6] but since s1 has three ranges that begin
> between 2 and 6 (i.e., 2, 3, and 5) there are three output records that
> correspond to those sub-ranges.

Yes, this is what the internal rewriting produces for r1.
Note that till now we only support half-open ranges, i.e., [), but for
visibility I will continue this example using closed ranges [].
The executor function (ExecTemporalAdjustment) gets this (the output above) as
the input and will then produce:

r1[1, 1]
r1[2, 3]
r1[3, 4]
r1[5, 6]

Which means also for the ALIGN the non-overlapping parts are retained.

>
> The description in the OP basically distinguishes between NORMALIZE and
> ALIGN in that ALIGN, as described above, affects an INTERSECTION on the two
> ranges - discarding the non-overlapping data - while NORMALIZE performs the
> alignment while also retaining the non-overlapping data.

Also for ALIGN we retain the non-overlapping part.
Intersections are symmetric/commutative, so a subsequent outer join can then use
equality on the ranges
to produce join matches (for overlapping) as well as null-extend the
produced non-overlapping parts.

The difference between ALIGN and NORMALIZE is how they split, while ALIGN
produces intersections between pairs of tuples (used for joins) and the
non-overlapping parts, NORMALIZE produces intersections between groups of tuples
(used for aggregation, so that all tuples with the same group have equal
ranges) and the non-overlapping parts.

For instance, the NORMALIZE between r1, s1, s2, and s3 in your example above
would give the following:

r1[1, 1]
r1[2, 2]
r1[3, 3]
r1[4, 4]
r1[5, 6]

>
> The rest of the syntax seems to deal with selecting subsets of range records
> based upon attribute data.

Yes, exactly!


Best regards,
Anton, Johann, Michael, Peter

Re: [HACKERS] [PROPOSAL] Temporal query processing with range types

From

Peter Moser

Date:

20 February 2017, 19:42:49

2017-02-15 20:24 GMT+01:00 Robert Haas <robertmhaas@gmail.com>:
> So it seems like an ALIGN or NORMALIZE option is kind of like a JOIN,
> except apparently there's no join type and the optimizer can never
> reorder these operations with each other or with other joins.  Is that
> right?  The optimizer changes in this patch seem fairly minimal, so
> I'm guessing it can't be doing anything very complex here.

ALIGN/NORMALIZE operators are aliased from-clause items, which get rewritten
into a subquery using LEFT OUTER JOIN. The main idea behind that
is to reuse as much as possible of the existing PostgreSQL code, and
just provide
one executor function to process the output of these rewritten
queries. The optimizer
code is minimal, because we do not use any new constructs (except the
temporal adjustment
node for align/normalize) due to these rewrites. That is, all needed
optimization
techniques are already present.

> What happens if you perform the ALIGN or NORMALIZE operation using
> something other than an equality operator, like, say, less-than?  Or
> an arbitrary user-defined operator.

It is possible to use ALIGN/NORMALIZE with user-defined functions,
and non-equality operators.

> There's no documentation in this patch.  I'm not sure you want to go
> to the trouble of writing SGML documentation until this has been
> reviewed enough that it has a real chance of getting committed, but on
> the other hand we're obviously all struggling to understand what it
> does, so I think if not SGML documentation it at least needs a real
> clear explanation of what the syntax is and does in a README or
> something, even just for initial review.

We are currently writing SGML documentation and extend in-code comments.
Both, will be send soon.

> It's hard to see what's going on here.  What's ts?  What's te?  If you
> used longer names for these things, it might be a bit more
> self-documenting.

ts, te describe an half-open interval in which the tuple is considered valid:

  [time point start, time point end).

We have added an extended version of comments for both parser functions, i.e.,
transformTemporalAligner and transformTemporalNormalizer. See attached patch
(src/backend/parser/parse_temporal.c).

> If we are going to transform an ALIGN operator in to a left outer
> join, why do we also have an executor node for it?
>
> +               fcLowerLarg = makeFuncCall(SystemFuncName("lower"),
> +
> list_make1(crLargTs),
> +
> UNKNOWN_LOCATION);
> +               fcLowerRarg = makeFuncCall(SystemFuncName("lower"),
> +
> list_make1(crRargTs),
> +
> UNKNOWN_LOCATION);
> +               fcUpperLarg = makeFuncCall(SystemFuncName("upper"),
> +
> list_make1(crLargTs),
> +
> UNKNOWN_LOCATION);
> +               fcUpperRarg = makeFuncCall(SystemFuncName("upper"),
> +
> list_make1(crRargTs),
> +
> UNKNOWN_LOCATION);
>
> Why is a temporal operator calling functions that upper-case and
> lower-case strings?  In one sense this whole function (and much of the
> nearby code) is very straightforward code and you can see exactly why
> it's doing it.  In another sense it's totally inscrutable: WHY is it
> doing any of that stuff?

These functions extract the lower-bound, and upper-bound of range types.

>
> -       char       *strategy;           /* partitioning strategy
> ('list' or 'range') */
> -       List       *partParams;         /* List of PartitionElems */
> -       int                     location;               /* token
> location, or -1 if unknown */
> +       char       *strategy;   /* partitioning strategy ('list' or 'range') */
> +       List       *partParams; /* List of PartitionElems */
> +       int                     location;       /* token location, or
> -1 if unknown */
>
> I think this is some kind of mistake on your end while generating the
> patch.  It looks like you patched one version of the source code, and
> diffed against another.

Thank you for pointing this out, we fixed it in our new patch.



2017-02-16 13:41 GMT+01:00 Robert Haas <robertmhaas@gmail.com>:
> Also, it sounds like all of this is intended to work with ranges that
> are stored in different columns rather than with PostgreSQL's built-in
> range types.

Our syntax supports PostgreSQL's built-in range types and ranges that
are stored in different columns.

For instance, for range types an ALIGN query would look like this:
  SELECT * FROM (r ALIGN s ON q WITH (r.t, s.t)) c

... and for ranges in different columns like this:
  SELECT * FROM (r ALIGN s ON q WITH (r.ts, r.te, s.ts, s.te)) c

... where r and s are input relations, q can be any join qualifier, and
    r.t, s.t, r.ts, r.te, s.ts, and s.te can be any column name. The
    latter represent the valid time intervals, that is time point start,
    and time point end of each tuple for each input relation. These can
    be defined as four scalars, or two half-open, i.e., [), range typed
    values.



Best regards,
Anton, Johann, Michael, Peter



ps. The patch has been rebased on top of
commit 090f21bbad98001979da8589e9647a1d49bce4ee
from "Sun Feb 19 17:18:10 2017 -0500"

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Attachment

tpg_primitives_out_v5.patch

Re: [HACKERS] [PROPOSAL] Temporal query processing with range types

From

Peter Eisentraut

Date:

22 February 2017, 21:43:38

On 2/16/17 07:41, Robert Haas wrote:
> Also, it sounds like all of this is intended to work with ranges that
> are stored in different columns rather than with PostgreSQL's built-in
> range types.

Yeah, that should certainly be changed.

-- 
Peter Eisentraut              http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

Re: [HACKERS] [PROPOSAL] Temporal query processing with range types

From

Peter Moser

Date:

24 February 2017, 15:40:20

2017-02-22 19:43 GMT+01:00 Peter Eisentraut <peter.eisentraut@2ndquadrant.com>:
> On 2/16/17 07:41, Robert Haas wrote:
>> Also, it sounds like all of this is intended to work with ranges that
>> are stored in different columns rather than with PostgreSQL's built-in
>> range types.
>
> Yeah, that should certainly be changed.

Our syntax supports PostgreSQL's built-in range types and ranges that
are stored in different columns.

For instance, for range types an ALIGN query would look like this: SELECT * FROM (r ALIGN s ON q WITH (r.t, s.t)) c

... and for ranges in different columns like this: SELECT * FROM (r ALIGN s ON q WITH (r.ts, r.te, s.ts, s.te)) c

... where r and s are input relations, q can be any join qualifier, and   r.t, s.t, r.ts, r.te, s.ts, and s.te can be
anycolumn name. The   latter represent the valid time intervals, that is time point start,   and time point end of each
tuplefor each input relation. These can   be defined as four scalars, or two half-open, i.e., [), range typed
values.

It would reduce the size of our patch and simplify the overall structure,
if we would remove the possibility to express valid time start points and end
points in different columns.

Do you think it is better to remove the syntax for ranges expressed in
different columns?

However, internally we still need to split the
range types into two separate points, because NORMALIZE does not make a
distinction between start and end timepoints while grouping, therefore we
have only one timepoint attribute there (i.e., P1), which is the union of
start and end timepoints (see executor/nodeTemporalAdjustment.c). A second
constraint is, that we support currently only half-open intervals, that is,
any interval definition (open/closed/half-open) different from the PostgreSQL's
default, i.e., [), leads to undefined results.

Best regards,
Anton, Johann, Michael, Peter

Re: [HACKERS] [PROPOSAL] Temporal query processing with range types

From

Jim Nasby

Date:

24 February 2017, 23:25:34

On 2/24/17 6:40 AM, Peter Moser wrote:
> Do you think it is better to remove the syntax for ranges expressed in
> different columns?

It's not that hard to construct a range type on-the-fly from 2 columns, 
so (without having looked at the patch or really followed the thread) I 
would think the answer is yes.
-- 
Jim Nasby, Data Architect, Blue Treble Consulting, Austin TX
Experts in Analytics, Data Architecture and PostgreSQL
Data in Trouble? Get it in Treble! http://BlueTreble.com
855-TREBLE2 (855-873-2532)

Re: [HACKERS] [PROPOSAL] Temporal query processing with range types

From

Peter Moser

Date:

27 February 2017, 13:21:52

2017-02-24 21:25 GMT+01:00 Jim Nasby <Jim.Nasby@bluetreble.com>:
> On 2/24/17 6:40 AM, Peter Moser wrote:
>>
>> Do you think it is better to remove the syntax for ranges expressed in
>> different columns?
>
>
> It's not that hard to construct a range type on-the-fly from 2 columns, so
> (without having looked at the patch or really followed the thread) I would
> think the answer is yes.

We discussed and decided to remove the syntax for separate columns.
The patch with "range-types-only" syntax will be send soon.

Re: [HACKERS] [PROPOSAL] Temporal query processing with range types

From

Peter Moser

Date:

28 February 2017, 17:09:05

2017-02-27 11:21 GMT+01:00 Peter Moser <pitiz29a@gmail.com>:
> 2017-02-24 21:25 GMT+01:00 Jim Nasby <Jim.Nasby@bluetreble.com>:
>> It's not that hard to construct a range type on-the-fly from 2 columns, so
>> (without having looked at the patch or really followed the thread) I would
>> think the answer is yes.

Thank you for your suggestion.

> We discussed and decided to remove the syntax for separate columns.

Please find attached the new patch with "range-type-only" syntax. It
is around 400 lines of
code shorter than its predecessor.

Now the syntax is as follows:

SELECT * FROM ( r ALIGN s ON q WITH (r.time, s.time) ) c;
SELECT * FROM ( r NORMALIZE s ON q WITH (r.time, s.time) ) c;
SELECT * FROM ( r NORMALIZE s USING(atts) WITH (r.time, s.time) ) c;

...where r and s are relations, c an alias, q any boolean expression,
atts a column name list, and r.time/s.time range typed columns.

This means that the syntax with four columns (i.e., scalar time point
start/end for relations r and s)
inside the WITH-clause is no longer supported.

Best regards,
Anton, Michael, Johann, Peter

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Attachment

tpg_primitives_out_v6.patch

Re: [HACKERS] [PROPOSAL] Temporal query processing with range types

From

Peter Moser

Date:

01 March 2017, 12:56:50

2017-02-15 20:24 GMT+01:00 Robert Haas <robertmhaas@gmail.com>:
> There's no documentation in this patch.  I'm not sure you want to go
> to the trouble of writing SGML documentation until this has been
> reviewed enough that it has a real chance of getting committed, but on
> the other hand we're obviously all struggling to understand what it
> does, so I think if not SGML documentation it at least needs a real
> clear explanation of what the syntax is and does in a README or
> something, even just for initial review.

The attached README explains the NORMALIZE operation step-by-step with
an example. That is, we start from a query input, show how we rewrite
it during parser stage, and show how the final execution generates
result tuples. A similar walkthrough for ALIGN will follow soon.

We are thankful for any suggestion or ideas, to be used to write a
good SGML documentation.

Best regards,
Anton, Michael, Johann, Peter

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Attachment

README-NORMALIZE.txt

Re: [PROPOSAL] Temporal query processing with range types

From

Peter Moser

Date:

30 March 2017, 15:11:28

2017-03-01 10:56 GMT+01:00 Peter Moser <pitiz29a@gmail.com>:
> A similar walkthrough for ALIGN will follow soon.
>
> We are thankful for any suggestion or ideas, to be used to write a
> good SGML documentation.

The attached README explains the ALIGN operation step-by-step with a
TEMPORAL LEFT OUTER JOIN example. That is, we start from a query
input, show how we rewrite it during parser stage, and show how the
final execution generates result tuples.


Best regards,
Anton, Michael, Johann, Peter

Attachment

README-ALIGN.txt

Re: [PROPOSAL] Temporal query processing with range types

From

Andres Freund

Date:

06 April 2017, 02:24:26

Hi,

On 2017-03-30 14:11:28 +0200, Peter Moser wrote:
> 2017-03-01 10:56 GMT+01:00 Peter Moser <pitiz29a@gmail.com>:
> > A similar walkthrough for ALIGN will follow soon.
> >
> > We are thankful for any suggestion or ideas, to be used to write a
> > good SGML documentation.
> 
> The attached README explains the ALIGN operation step-by-step with a
> TEMPORAL LEFT OUTER JOIN example. That is, we start from a query
> input, show how we rewrite it during parser stage, and show how the
> final execution generates result tuples.

Unfortunately I don't think this patch has received sufficient design
and implementation to consider merging it into v10.  As code freeze is
in two days, I think we'll have to move this to the next commitfest.

- Andres

Re: [HACKERS] [PROPOSAL] Temporal query processing with range types

From

Peter Moser

Date:

31 July 2017, 15:53:50

On 06.04.2017 01:24, Andres Freund wrote:
> Unfortunately I don't think this patch has received sufficient design
> and implementation to consider merging it into v10.  As code freeze is
> in two days, I think we'll have to move this to the next commitfest.

We rebased our patch on top of commit 
393d47ed0f5b764341c7733ef60e8442d3e9bdc2
from "Mon Jul 31 11:24:51 2017 +0900".

Best regards,
Anton, Johann, Michael, Peter

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Attachment

tpg_primitives_out_v7.patch

Re: [HACKERS] [PROPOSAL] Temporal query processing with range types

From

Thomas Munro

Date:

01 September 2017, 07:49:41

On Tue, Aug 1, 2017 at 12:53 AM, Peter Moser <pitiz29a@gmail.com> wrote:
> On 06.04.2017 01:24, Andres Freund wrote:
>>
>> Unfortunately I don't think this patch has received sufficient design
>> and implementation to consider merging it into v10.  As code freeze is
>> in two days, I think we'll have to move this to the next commitfest.
>
>
> We rebased our patch on top of commit
> 393d47ed0f5b764341c7733ef60e8442d3e9bdc2
> from "Mon Jul 31 11:24:51 2017 +0900".

Hi Peter,

This patch still applies, but no longer compiles:

nodeTemporalAdjustment.c: In function ‘ExecTemporalAdjustment’:
nodeTemporalAdjustment.c:286:21: error: incompatible types when
assigning to type ‘Form_pg_attribute’ from type
‘FormData_pg_attribute’  node->datumFormat = curr->tts_tupleDescriptor->attrs[tc->attNumP1 - 1];                    ^

After commits 2cd70845 and c6293249 you need to change expressions of
that format to, for example:
  node->datumFormat = TupleDescAttr(curr->tts_tupleDescriptor,
tc->attNumP1 - 1);

Thanks!

--
Thomas Munro
http://www.enterprisedb.com

Re: [HACKERS] [PROPOSAL] Temporal query processing with range types

From

Simon Riggs

Date:

12 September 2017, 17:33:34

On 30 March 2017 at 13:11, Peter Moser <pitiz29a@gmail.com> wrote:
> 2017-03-01 10:56 GMT+01:00 Peter Moser <pitiz29a@gmail.com>:
>> A similar walkthrough for ALIGN will follow soon.
>>
>> We are thankful for any suggestion or ideas, to be used to write a
>> good SGML documentation.
>
> The attached README explains the ALIGN operation step-by-step with a
> TEMPORAL LEFT OUTER JOIN example. That is, we start from a query
> input, show how we rewrite it during parser stage, and show how the
> final execution generates result tuples.

Thanks for this contribution. I know what it takes to do significant
contributions and I know it can be frustrating when things languish
for months.

I am starting to look at temporal queries myself so I will begin an interest.

PostgreSQL tries really very hard to implement the SQL Standard and
just the standard. ISTM that the feedback you should have been given
is that this is very interesting but will not be committed in its
current form; I am surprised to see nobody has said that, though you
can see the truth of that since nobody is actively looking to review
or commit this. Obviously if the standard were changed to support
these things we'd suddenly be interested...

What I think I'm lacking is a clear statement of why we need to have
new syntax to solve the problem and why the problem itself is
important.

PostgreSQL supports the ability to produce Set Returning Functions and
various other extensions. Would it be possible to change this so that
we don't add new syntax to the parser but rather we do this as a set
of functions?

An alternative might be for us to implement a pluggable parser, so
that you can have an "alternate syntax" plugin. If we did that, you
can then maintain the plugin outside of core until the time when SQL
Standard is updated and we can implement directly. We already support
the ability to invent new plan nodes, so this could be considered as
part of the plugin.

-- 
Simon Riggs                http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] [PROPOSAL] Temporal query processing with range types

From

Peter Moser

Date:

13 September 2017, 15:57:47

2017-09-01 6:49 GMT+02:00 Thomas Munro <thomas.munro@enterprisedb.com>:
> On Tue, Aug 1, 2017 at 12:53 AM, Peter Moser <pitiz29a@gmail.com> wrote:
> This patch still applies, but no longer compiles:
>
> nodeTemporalAdjustment.c: In function ‘ExecTemporalAdjustment’:
> nodeTemporalAdjustment.c:286:21: error: incompatible types when
> assigning to type ‘Form_pg_attribute’ from type
> ‘FormData_pg_attribute’
>    node->datumFormat = curr->tts_tupleDescriptor->attrs[tc->attNumP1 - 1];
>                      ^
>
> After commits 2cd70845 and c6293249 you need to change expressions of
> that format to, for example:
>
>    node->datumFormat = TupleDescAttr(curr->tts_tupleDescriptor,
> tc->attNumP1 - 1);

Hi Thomas,
thank you for your feedback.

We fixed the mentioned issues and rebased our patch on top of commit
69835bc8988812c960f4ed5aeee86b62ac73602a from "Tue Sep 12 19:27:48
2017 -0400".

Best regards,
Anton, Johann, Michael, Peter

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Attachment

tpg_primitives_out_v8.patch

Re: [HACKERS] [PROPOSAL] Temporal query processing with range types

From

Peter Moser

Date:

22 September 2017, 10:59:58

2017-09-12 16:33 GMT+02:00 Simon Riggs <simon@2ndquadrant.com>:
> PostgreSQL tries really very hard to implement the SQL Standard and
> just the standard. ISTM that the feedback you should have been given
> is that this is very interesting but will not be committed in its
> current form; I am surprised to see nobody has said that, though you
> can see the truth of that since nobody is actively looking to review
> or commit this. Obviously if the standard were changed to support
> these things we'd suddenly be interested...

Ok, we understand that PostgreSQL wants to strictly follow the SQL
standard, which is not yet defined for temporal databases. In this
context we understand your comment and agree on your position.

Our approach with the two temporal primitives is more far-reaching and
comprehensive: it supports all operators of a temporal relational
algebra by systematically transforming the temporal operators to the
nontemporal counterparts, thereby taking advantage of all features of
the underlying DBMS. This requires of course also new syntax.

> What I think I'm lacking is a clear statement of why we need to have
> new syntax to solve the problem ...

The newly introduced syntax of the two primitives comes from our
decision to divide a bigger patch into two parts: an primitives-only
patch, and a temporal query rewrite patch. We thought of discussing
and providing a second patch in the future, which would then
automatically rewrite temporal queries into their non-temporal
counterparts using these two primitives.

> ... and why the problem itself is
> important.

The main idea about these temporal primitives is to have only two new
operators to provide the whole range of temporal queries, that is,
temporal joins, temporal set operations, temporal duplicate
elimination, and temporal aggregation. The benefit of the primitives
is that it is minimal invasive to the postgres kernel due to the reuse of all
standard operators after the temporal split (or normalization). It can
therefore also use existing optimizations already implemented.

An alternative approach would be to implement for each operator a
separate algorithm.  For instance, Jeff Davis is implementing a temporal
join into the existing Merge Join Executor (see [1]). Note that a
temporal join is the only operator that can be implemented without
introducing new syntax due to the overlap predicate.  For all other
temporal operators a discussion about new syntax is necessary anyway,
independent of the implementation approach.

> PostgreSQL supports the ability to produce Set Returning Functions and
> various other extensions. Would it be possible to change this so that
> we don't add new syntax to the parser but rather we do this as a set
> of functions?

Set Returning Functions would indeed be a possibility to implement
temporal query processing without new syntax, though it has some serious
drawbacks: the user has to specify the schema of the query results; the
performance might be a problem, since functions are treated as
black-boxes for the optimizer, loosing selection-pushdown and similar
optimizations.

> An alternative might be for us to implement a pluggable parser, so
> that you can have an "alternate syntax" plugin. If we did that, you
> can then maintain the plugin outside of core until the time when SQL
> Standard is updated and we can implement directly. We already support
> the ability to invent new plan nodes, so this could be considered as
> part of the plugin.

This is an interesting idea to look into when it is ready some day.



With all these clarifications in mind, we thought to focus meanwhile
on improving the performance of temporal query processing in special
cases (eg., joins), similar to the pending Range Merge Join patch by
Jeff Davis in [1]. Hereby, we like to contribute to it as reviewers
and hopefully add some improvements or valuable ideas from our
research area.


Best regards,
Anton, Johann, Michael, Peter


[1]
https://www.postgresql.org/message-id/flat/CAMp0ubfwAFFW3O_NgKqpRPmm56M4weTEXjprb2gP_NrDaEC4Eg@mail.gmail.com#CAMp0ubfwAFFW3O_NgKqpRPmm56M4weTEXjprb2gP_NrDaEC4Eg@mail.gmail.com


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] [PROPOSAL] Temporal query processing with range types

From

Pavel Stehule

Date:

22 September 2017, 11:06:49

2017-09-22 9:59 GMT+02:00 Peter Moser <pitiz29a@gmail.com>:

2017-09-12 16:33 GMT+02:00 Simon Riggs <simon@2ndquadrant.com>:
> PostgreSQL tries really very hard to implement the SQL Standard and
> just the standard. ISTM that the feedback you should have been given
> is that this is very interesting but will not be committed in its
> current form; I am surprised to see nobody has said that, though you
> can see the truth of that since nobody is actively looking to review
> or commit this. Obviously if the standard were changed to support
> these things we'd suddenly be interested...

Ok, we understand that PostgreSQL wants to strictly follow the SQL
standard, which is not yet defined for temporal databases. In this
context we understand your comment and agree on your position.

ANSI SQL 2011 has temporal data support

https://www.slideshare.net/CraigBaumunk/temporal-extensions-tosql20112012010438

Regards

Pavel Stehule

Re: [HACKERS] [PROPOSAL] Temporal query processing with range types

From

Peter Moser

Date:

22 September 2017, 11:15:17

2017-09-22 10:06 GMT+02:00 Pavel Stehule <pavel.stehule@gmail.com>:
> ANSI SQL 2011 has temporal data support
>
> https://www.slideshare.net/CraigBaumunk/temporal-extensions-tosql20112012010438

As operations it only supports temporal inner joins using the overlap predicate.
Temporal aggregation, temporal outer joins, temporal duplicate
elimination, and temporal set operations are not supported in
SQL:2011.
Please see [1] Section 2.5 Future directions.

Best regards,
Anton, Johann, Michael, Peter


[1] https://cs.ulb.ac.be/public/_media/teaching/infoh415/tempfeaturessql2011.pdf


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] [PROPOSAL] Temporal query processing with range types

From

Pavel Stehule

Date:

22 September 2017, 11:21:01

2017-09-22 10:15 GMT+02:00 Peter Moser <pitiz29a@gmail.com>:

2017-09-22 10:06 GMT+02:00 Pavel Stehule <pavel.stehule@gmail.com>:
> ANSI SQL 2011 has temporal data support
>
> https://www.slideshare.net/CraigBaumunk/temporal-extensions-tosql20112012010438

As operations it only supports temporal inner joins using the overlap predicate.
Temporal aggregation, temporal outer joins, temporal duplicate
elimination, and temporal set operations are not supported in
SQL:2011.
Please see [1] Section 2.5 Future directions.

Best regards,
Anton, Johann, Michael, Peter

[1] https://cs.ulb.ac.be/public/_media/teaching/infoh415/tempfeaturessql2011.pdf

Thank you for info.

Currently Postgres has zero support for SQL:2011 temporal tables. Isn't better start with already standard features than appends some without standard? The standard has some concept and if we start out of this concept, then the result will be far to standard probably.

Regards

Pavel

Re: [HACKERS] [PROPOSAL] Temporal query processing with range types

From

Peter Moser

Date:

22 September 2017, 11:51:12

2017-09-22 10:21 GMT+02:00 Pavel Stehule <pavel.stehule@gmail.com>:
> Currently Postgres has zero support for SQL:2011 temporal tables. Isn't
> better start with already standard features than appends some without
> standard? The standard has some concept and if we start out of this concept,
> then the result will be far to standard probably.

We will focus for now on the Range Merge Join algorithm by Jeff Davis,
which implements a temporal join with overlap predicates.


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] [PROPOSAL] Temporal query processing with range types

From

Paul A Jungwirth

Date:

06 October 2017, 20:22:44

On Fri, Jul 22, 2016 at 4:15 AM, Anton Dignös <dignoes@inf.unibz.it> wrote:
> We would like to contribute to PostgreSQL a solution that supports the query
> processing of "at each time point". The basic idea is to offer two new
> operators, NORMALIZE and ALIGN, whose purpose is to adjust (or split) the
> ranges of tuples so that subsequent queries can use the usual grouping and
> equality conditions to get the intended results.

I just wanted to chime in and say that the work these people have done
is *amazing*. I read two of their papers yesterday [1, 2], and if you
are interested in temporal data, I encourage you to read them too. The
first one is only 12 pages and quite readable. After that the second
is easy because it covers a lot of the same ground but adds "scaling"
of values when a tuple is split, and some other interesting points.
Their contributions could be used to implement SQL:2011 syntax but go
way beyond that.

Almost every project I work on could use temporal database support,
but there is nothing available in the Open Source world. The
temporal_tables extension [3] offers transaction-time support, which
is great for auditing, but it has no valid-time support (aka
application-time or state-time). Same with Magnus Hagander's TARDIS
approach [4], Chronomodel [5] (an extension to the Rails ORM), or any
other project I've seen. But valid-time is the more valuable
dimension, because it tells you the history of the thing itself (not
just when the database was changed). Also nothing is even attempting
full bitemporal support.

The ideas behind temporal data are covered extensively in Snodgrass's
1999 book [6], which shows how valuable it is to handle temporal data
in a principled way, rather than ad hoc. But that book also
demonstrates how complex the queries become to do things like temporal
foreign key constraints and temporal joins. I was sad to learn that
his proposed TSQL2 was rejected as a standard back in the 90s,
although the critiques by C. J. Date [7] have some merit. In
particular, since TSQL2 used *statement* modifiers, some of the
behavior was unclear or bad when using subqueries, views, and
set-returning functions. It makes more sense to have temporal
*operators*, so alongside inner join you have temporal inner join, and
likewise with temporal left outer join, temporal
union/intersection/difference, temporal aggregation, etc. (I think the
drawbacks of TSQL2 came from pursuing an unachievable goal, which was
to enable seamlessly converting existing non-temporal tables to
temporal without breaking any queries.)

Another unsatisfactory approach at historical data, from the industry
rather than academia, is in chapter 4 and elsewhere of Ralph Kimball's
*Data Warehouse Toolkit* [8]. His first suggestion (Type 1 Dimensions)
is to ignore the problem and overwrite old data with new. His Type 2
approach (make a new row) is better but loses the continuity between
the old row and the new. Type 3 fixes that but supports only one
change, not several. And anyway his ideas are tailored to star-schema
designs so are not as broadly useful. Workarounds like bridge tables
and "put the data in the fact table" are even more wedded to a
star-schema approach. But I think his efforts do show how valuable
historical data is, and how hard it is to handle without built-in
support.

As far as I can tell SQL:2011 avoids the statement modifier problem
(I'm not 100% sure), but it is quite limited, mostly covering
transaction-time semantics and not giving any way to do valid-time
outer joins or aggregations. It is clearly an early first step.
Unfortunately the syntax feels (to me) crippled by over-specificity,
like it will have a hard time growing to support all the things you'd
want to do.

The research by Dignös et al shows how you can define temporal
variants for every operator in the relational algebra, and then
implement them by using just two transformations (ALIGN and NORMALIZE)
combined with the existing non-temporal operators. It has a strong
theoretical basis and avoids the TSQL2 problems with composability.
And unlike SQL:2011 it has a great elegance and completeness I haven't
seen anywhere else.

I believe with range types the approach was to build up useful
primitives rather than jumping straight to a less-factored full
implementation of temporal features. (This in spite of SQL:2011
choosing to model begin/end times as separate columns, not as ranges.
:-) It seems to me the Dignös work follows the same philosophy. Their
ALIGN and NORMALIZE could be used to implement SQL:2011 features, but
they are also useful for much more. In their papers they actually
suggest that these transformations need not be exposed to end-users,
although it was convenient to have access to them for their own
research. I think it'd be great if Postgres's SQL dialect supported
them though, since SQL:2011 leaves out so much.

Anyway, I wanted to thank them for their excellent work, their
generosity, and also their perseverance. ([1] is from 2012 and was
built against Postgres 9.0!) I hope we take their contribution
seriously, because it would truly move Postgres's temporal support
beyond any database on the market.

Yours,
Paul

[1] https://files.ifi.uzh.ch/boehlen/Papers/modf174-dignoes.pdf

[2] http://www.zora.uzh.ch/id/eprint/130374/1/Extending_the_kernel.pdf

[3] https://pgxn.org/dist/temporal_tables/

[4] https://www.youtube.com/watch?v=TRgni5q0YM8

[5] https://github.com/ifad/chronomodel

[6] *Developing Time-Oriented Database Applications in SQL*,
downloadable for free from
https://www2.cs.arizona.edu/~rts/publications.html

[7]
http://citeseerx.ist.psu.edu/viewdoc/download;jsessionid=8CA34414C364C1D859CD0EAE7A714DFF?doi=10.1.1.116.7598&rep=rep1&type=pdf

[8]
http://www.dsc.ufcg.edu.br/~sampaio/Livros/alph%20Kimball.%20The%20Data%20Warehouse%20Toolkit..%20The%20Complete%20Guide%20to%20Dimensional%20Modelling%20(Wiley,2002)(ISBN%200471200247)(449s).pdf

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] [PROPOSAL] Temporal query processing with range types

From

Mike Rylander

Date:

06 October 2017, 22:22:52

On Fri, Oct 6, 2017 at 1:22 PM, Paul A Jungwirth
<pj@illuminatedcomputing.com> wrote:
> On Fri, Jul 22, 2016 at 4:15 AM, Anton Dignös <dignoes@inf.unibz.it> wrote:
>> We would like to contribute to PostgreSQL a solution that supports the query
>> processing of "at each time point". The basic idea is to offer two new
>> operators, NORMALIZE and ALIGN, whose purpose is to adjust (or split) the
>> ranges of tuples so that subsequent queries can use the usual grouping and
>> equality conditions to get the intended results.
>
> I just wanted to chime in and say that the work these people have done
> is *amazing*. I read two of their papers yesterday [1, 2], and if you
> are interested in temporal data, I encourage you to read them too. The
> first one is only 12 pages and quite readable. After that the second
> is easy because it covers a lot of the same ground but adds "scaling"
> of values when a tuple is split, and some other interesting points.
> Their contributions could be used to implement SQL:2011 syntax but go
> way beyond that.
>

I've also been following this feature with great interest, and would
definitely throw whatever tiny weight I have, sitting out here in the
the peanut gallery, behind accepting the ALIGN and NORMALIZE syntax.
I estimate that about a third of the non-trivial queries in the
primary project I work on (and have, on Postgres, for the last 13+
years) would be simpler with support of the proposed syntax, and some
of the most complex business logic would be simplified nearly to the
point of triviality.

Anyway, that's my $0.02.

Thank you, Anton and Peter!

-- Mike

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] [PROPOSAL] Temporal query processing with range types

From

Robert Haas

Date:

11 November 2017, 15:19:26

On Fri, Oct 6, 2017 at 3:22 PM, Mike Rylander <mrylander@gmail.com> wrote:
> I've also been following this feature with great interest, and would
> definitely throw whatever tiny weight I have, sitting out here in the
> the peanut gallery, behind accepting the ALIGN and NORMALIZE syntax.
> I estimate that about a third of the non-trivial queries in the
> primary project I work on (and have, on Postgres, for the last 13+
> years) would be simpler with support of the proposed syntax, and some
> of the most complex business logic would be simplified nearly to the
> point of triviality.

This is really good input.  If the feature weren't useful, then it
wouldn't make sense to try to figure out how to integrate it, but if
it is, then we should try.

I don't think that implementing a feature like this by SQL
transformation can work.  It's certainly got the advantage of
simplicity of implemention, but there are quite a few things that seem
like they won't always work correctly.  For instance:

+ * INPUT:
+ *         (r ALIGN s ON q WITH (r.t, s.t)) c
+ *
+ *      where r and s are input relations, q can be any
+ *      join qualifier, and r.t, s.t can be any column name. The latter
+ *      represent the valid time intervals, that is time point start,
+ *      and time point end of each tuple for each input relation. These
+ *      are two half-open, i.e., [), range typed values.
+ *
+ * OUTPUT:
+ *      (
+ *         SELECT r.*, GREATEST(LOWER(r.t), LOWER(s.t)) P1,
+ *                     LEAST(UPPER(r.t), UPPER(s.t)) P2
+ *      FROM
+ *      (
+ *          SELECT *, row_id() OVER () rn FROM r
+ *      ) r
+ *      LEFT OUTER JOIN
+ *      s
+ *      ON q AND r.t && s.t
+ *      ORDER BY rn, P1, P2
+ *      ) c

One problem with this is that we end up looking up functions in
pg_catalog by name: LOWER, UPPER, LEAST, GREATEST.  In particular,
when we do this...

+    fcUpperRarg = makeFuncCall(SystemFuncName("upper"),
+                               list_make1(crRargTs),
+                               UNKNOWN_LOCATION);

...we're hoping and praying that we're going to latch onto the first of these:

rhaas=# \df upper                         List of functions  Schema   | Name  | Result data type | Argument data types
| Type

------------+-------+------------------+---------------------+--------pg_catalog | upper | anyelement       | anyrange
         | normalpg_catalog | upper | text             | text                | normal

(2 rows)

But that's only true as long as there isn't another function in
pg_catalog with a match to the specific range type that is being used
here, and there's nothing to stop a user from creating one, and then
their query, which does not anywhere in its query text mention the
name of that function, will start failing.  We're not going to accept
that limitation.  Looking up functions by name rather than by OID or
using an opclass or something is pretty much a death sentence for a
core feature, and this patch does a lot of it.

A related problem is that, because all of this transformation is being
done in the parser, when you use this temporal syntax to create a
view, and then run pg_dump to dump that view, you are going to (I
think) get the transformed version, not the original.  Things like
transformTemporalClause are going to have user-visible effects: the
renaming you do there will (I think) be reflected in the deparsed
output of views.  That's not good.  Users have a right to expect that
what comes out of deparsing will at least resemble what they put in.

Error reporting might be a problem too: makeTemporalQuerySkeleton is
creating parse nodes that have no location, so if an error develops at
that point, how will the user correlate that with what they typed in?

I suspect there are also problems with plan invalidation.  Any
decisions we make at parse time are fixed forever.  DDL changes can
force a re-plan, but not a re-parse.

Overall, I think that the whole approach here probably needs to be
scrapped and rethought.  The stuff this patch is doing really belongs
in the optimizer, not the parser, I think.  It could possibly happen
at a relatively early stage in the optimizer so that the rest of the
optimizer can see the results of the transformation and, well,
optimize.  But parse time is way too early.

Unrelated to the above, this patch introduces various kinds of helper
functions which are general-purpose in function but dumped in with the
temporal support because it happens to use them.  For instance:

+static Form_pg_type
+typeGet(Oid id)
+{
+    HeapTuple    tp;
+    Form_pg_type typtup;
+
+    tp = SearchSysCache1(TYPEOID, ObjectIdGetDatum(id));
+    if (!HeapTupleIsValid(tp))
+        ereport(ERROR,
+                (errcode(ERROR),
+                 errmsg("cache lookup failed for type %u", id)));
+
+    typtup = (Form_pg_type) GETSTRUCT(tp);
+    ReleaseSysCache(tp);
+    return typtup;
+}

A function as general as typeGet() certainly does not belong in
parse_clause.c in the middle of a long list of temporal functions.
This particular function is also a bad idea in general, because typtup
is only a valid pointer until ReleaseSysCache() is called.  This will
appear to work normally because, normally, no relevant cache
invalidation messages will show up at the wrong time, but if one does
then typtup will be pointing off into outer space.  You can find bugs
like this by testing with CLOBBER_CACHE_ALWAYS defined (warning: this
makes things very slow).  But the general point I want to make here is
actually about inventing your own idioms vs. using the ones we've got
-- if you're going to invent new general-purpose primitives, they need
to go next to the existing functions that do similar things, not in
whatever part of the code you first decided you needed them.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] [PROPOSAL] Temporal query processing with range types

From

Peter Moser

Date:

14 November 2017, 19:19:35

2017-10-06 19:22 GMT+02:00 Paul A Jungwirth <pj@illuminatedcomputing.com>:
> I just wanted to chime in and say that the work these people have done
> is *amazing*. I read two of their papers yesterday [1, 2], and if you
> are interested in temporal data, I encourage you to read them too. The
> first one is only 12 pages and quite readable. After that the second
> is easy because it covers a lot of the same ground but adds "scaling"
> of values when a tuple is split, and some other interesting points.
> Their contributions could be used to implement SQL:2011 syntax but go
> way beyond that.
>
> Almost every project I work on could use temporal database support,
> but there is nothing available in the Open Source world. The
> temporal_tables extension [3] offers transaction-time support, which
> is great for auditing, but it has no valid-time support (aka
> application-time or state-time). Same with Magnus Hagander's TARDIS
> approach [4], Chronomodel [5] (an extension to the Rails ORM), or any
> other project I've seen. But valid-time is the more valuable
> dimension, because it tells you the history of the thing itself (not
> just when the database was changed). Also nothing is even attempting
> full bitemporal support.
>
> The ideas behind temporal data are covered extensively in Snodgrass's
> 1999 book [6], which shows how valuable it is to handle temporal data
> in a principled way, rather than ad hoc. But that book also
> demonstrates how complex the queries become to do things like temporal
> foreign key constraints and temporal joins. I was sad to learn that
> his proposed TSQL2 was rejected as a standard back in the 90s,
> although the critiques by C. J. Date [7] have some merit. In
> particular, since TSQL2 used *statement* modifiers, some of the
> behavior was unclear or bad when using subqueries, views, and
> set-returning functions. It makes more sense to have temporal
> *operators*, so alongside inner join you have temporal inner join, and
> likewise with temporal left outer join, temporal
> union/intersection/difference, temporal aggregation, etc. (I think the
> drawbacks of TSQL2 came from pursuing an unachievable goal, which was
> to enable seamlessly converting existing non-temporal tables to
> temporal without breaking any queries.)
>
> Another unsatisfactory approach at historical data, from the industry
> rather than academia, is in chapter 4 and elsewhere of Ralph Kimball's
> *Data Warehouse Toolkit* [8]. His first suggestion (Type 1 Dimensions)
> is to ignore the problem and overwrite old data with new. His Type 2
> approach (make a new row) is better but loses the continuity between
> the old row and the new. Type 3 fixes that but supports only one
> change, not several. And anyway his ideas are tailored to star-schema
> designs so are not as broadly useful. Workarounds like bridge tables
> and "put the data in the fact table" are even more wedded to a
> star-schema approach. But I think his efforts do show how valuable
> historical data is, and how hard it is to handle without built-in
> support.
>
> As far as I can tell SQL:2011 avoids the statement modifier problem
> (I'm not 100% sure), but it is quite limited, mostly covering
> transaction-time semantics and not giving any way to do valid-time
> outer joins or aggregations. It is clearly an early first step.
> Unfortunately the syntax feels (to me) crippled by over-specificity,
> like it will have a hard time growing to support all the things you'd
> want to do.
>
> The research by Dignös et al shows how you can define temporal
> variants for every operator in the relational algebra, and then
> implement them by using just two transformations (ALIGN and NORMALIZE)
> combined with the existing non-temporal operators. It has a strong
> theoretical basis and avoids the TSQL2 problems with composability.
> And unlike SQL:2011 it has a great elegance and completeness I haven't
> seen anywhere else.
>
> I believe with range types the approach was to build up useful
> primitives rather than jumping straight to a less-factored full
> implementation of temporal features. (This in spite of SQL:2011
> choosing to model begin/end times as separate columns, not as ranges.
> :-) It seems to me the Dignös work follows the same philosophy. Their
> ALIGN and NORMALIZE could be used to implement SQL:2011 features, but
> they are also useful for much more. In their papers they actually
> suggest that these transformations need not be exposed to end-users,
> although it was convenient to have access to them for their own
> research. I think it'd be great if Postgres's SQL dialect supported
> them though, since SQL:2011 leaves out so much.
>
> Anyway, I wanted to thank them for their excellent work, their
> generosity, and also their perseverance. ([1] is from 2012 and was
> built against Postgres 9.0!) I hope we take their contribution
> seriously, because it would truly move Postgres's temporal support
> beyond any database on the market.

Paul, you are spot on.  Your comments are really insightful and the
understanding of the pros and cons of the different solutions is
impressive.

Some additional background about our approach:

During more than 20 years of research in the temporal database area we
have seen many ideas that at the end did not make a real difference for
temporal query processing.  Our goal was to identify and implement the
basic functionality that database systems must offer to support the
processing of temporal data.  Our answer to this is the adjustment of
time ranges with ALIGN and NORMALIZE.  We believe these are general and
useful primitives that will greatly benefit any possible temporal
extension of SQL.

We have been using ALIGN and NORMALIZE since several years in our
teaching at different universities.  Students have been using the
patched PostgreSQL kernel and everything (understanding the concepts;
using the primitives in numerous examples) has worked out really well.
As you have noticed the primitives were also well-received in the
flagship outlets of the database research community.

Best regards,
Anton, Johann, Michael, Peter







> [1] https://files.ifi.uzh.ch/boehlen/Papers/modf174-dignoes.pdf
>
> [2] http://www.zora.uzh.ch/id/eprint/130374/1/Extending_the_kernel.pdf
>
> [3] https://pgxn.org/dist/temporal_tables/
>
> [4] https://www.youtube.com/watch?v=TRgni5q0YM8
>
> [5] https://github.com/ifad/chronomodel
>
> [6] *Developing Time-Oriented Database Applications in SQL*,
> downloadable for free from
> https://www2.cs.arizona.edu/~rts/publications.html
>
> [7]
http://citeseerx.ist.psu.edu/viewdoc/download;jsessionid=8CA34414C364C1D859CD0EAE7A714DFF?doi=10.1.1.116.7598&rep=rep1&type=pdf
>
> [8]
http://www.dsc.ufcg.edu.br/~sampaio/Livros/alph%20Kimball.%20The%20Data%20Warehouse%20Toolkit..%20The%20Complete%20Guide%20to%20Dimensional%20Modelling%20(Wiley,2002)(ISBN%200471200247)(449s).pdf
>
>
> --
> Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
> To make changes to your subscription:
> http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] [PROPOSAL] Temporal query processing with range types

From

Peter Moser

Date:

14 November 2017, 19:33:17

2017-11-11 13:19 GMT+01:00 Robert Haas <robertmhaas@gmail.com>:
> This is really good input.  If the feature weren't useful, then it
> wouldn't make sense to try to figure out how to integrate it, but if
> it is, then we should try.

We are happy to hear this and will do the implementation.  Any input
regarding the implementation is much appreciated.

> I don't think that implementing a feature like this by SQL
> transformation can work.  It's certainly got the advantage of
> simplicity of implemention, but there are quite a few things that seem
> like they won't always work correctly.
> [...]
> Overall, I think that the whole approach here probably needs to be
> scrapped and rethought.  The stuff this patch is doing really belongs
> in the optimizer, not the parser, I think.  It could possibly happen
> at a relatively early stage in the optimizer so that the rest of the
> optimizer can see the results of the transformation and, well,
> optimize.  But parse time is way too early.

We create this query rewrites during parser stage, because we want
that the optimizer chooses the best strategies for each rewritten
subplan and that our executor nodes get the desired input format in
the most optimal way. Our goal was an integration that re-uses the
existing PostgreSQL rewrites and optimizations fully.

Another approach is to optimize the temporal primitives manually.
This does not reuse existing PostgreSQL optimizations automatically.

Is there a general guideline or policy as to which approach is
preferable?

Regarding all the other issues, we will look into them in detail and
report back soon.


Best regards,
Anton, Johann, Michael, Peter

Re: [HACKERS] [PROPOSAL] Temporal query processing with range types

From

Tom Lane

Date:

14 November 2017, 20:42:58

Peter Moser <peter.moser@unibz.it> writes:
> 2017-11-11 13:19 GMT+01:00 Robert Haas <robertmhaas@gmail.com>:
>> Overall, I think that the whole approach here probably needs to be
>> scrapped and rethought.  The stuff this patch is doing really belongs
>> in the optimizer, not the parser, I think.  It could possibly happen
>> at a relatively early stage in the optimizer so that the rest of the
>> optimizer can see the results of the transformation and, well,
>> optimize.  But parse time is way too early.

> We create this query rewrites during parser stage, because we want
> that the optimizer chooses the best strategies for each rewritten
> subplan and that our executor nodes get the desired input format in
> the most optimal way. Our goal was an integration that re-uses the
> existing PostgreSQL rewrites and optimizations fully.

Robert is correct that putting this into the parser is completely the
wrong thing.  If you do that, then for example views using the features
will reverse-list in the rewritten form, which we Do Not Want, even
if the rewritten form is completely valid SQL (is it?).

You might consider putting the rewriting into, um, the rewriter.
It could be a separate pass after view expansion, if direct integration
with the existing behavior seems unduly spaghetti-ish.  Or do it in
an early phase of planning as he suggested.  There's not really that
much difference between the rewriter and the planner for this purpose.
Although one way to draw the distinction is that the output of the
rewriter is (currently) still fully expressible as plain SQL, whereas
once the planner goes into action the intermediate states of the tree
might not really be SQL anymore (eg, it might contain join types that
don't correspond to any SQL syntax).  So depending on what your rewrite
emits, there would be a weak preference for calling it part of the
rewriter or planner respectively.
        regards, tom lane

Re: [HACKERS] [PROPOSAL] Temporal query processing with range types

From

Peter Moser

Date:

16 November 2017, 11:32:09

2017-11-14 18:42 GMT+01:00 Tom Lane <tgl@sss.pgh.pa.us>:
> Robert is correct that putting this into the parser is completely the
> wrong thing.  If you do that, then for example views using the features
> will reverse-list in the rewritten form, which we Do Not Want, even
> if the rewritten form is completely valid SQL (is it?).

Yes, the subnode to our executor is rewritten in valid SQL.

> You might consider putting the rewriting into, um, the rewriter.
> It could be a separate pass after view expansion, if direct integration
> with the existing behavior seems unduly spaghetti-ish.  Or do it in
> an early phase of planning as he suggested.  There's not really that
> much difference between the rewriter and the planner for this purpose.
> Although one way to draw the distinction is that the output of the
> rewriter is (currently) still fully expressible as plain SQL, whereas
> once the planner goes into action the intermediate states of the tree
> might not really be SQL anymore (eg, it might contain join types that
> don't correspond to any SQL syntax).  So depending on what your rewrite
> emits, there would be a weak preference for calling it part of the
> rewriter or planner respectively.

Thank you for your feedback. We'll have a look at this and come back to you.

Re: [HACKERS] [PROPOSAL] Temporal query processing with range types

From

Robert Haas

Date:

16 November 2017, 18:42:35

On Thu, Nov 16, 2017 at 3:32 AM, Peter Moser <peter.moser@unibz.it> wrote:
> Thank you for your feedback. We'll have a look at this and come back to you.

Another thing to think about is that even though the CURRENT
implementation just rewrites the relevant constructs as SQL, in the
future somebody might want to do something else.  I feel like it's not
hard to imagine a purpose-build ALIGN or NORMALIZE join type being a
lot faster than the version that's just done by rewriting the SQL.
That would be more work, potentially, but it would be nice if the
initial implementation leant itself to be extended that way in the
future, which an all-rewriter implementation would not.  On the other
hand, maybe an early-in-the-optimizer implementation wouldn't either,
and maybe it's not worth worrying about it anyway.  But it would be
cool if this worked out in a way that meant it could be further
improved without having to change it completely.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

Re: [HACKERS] [PROPOSAL] Temporal query processing with range types

From

Peter Moser

Date:

21 November 2017, 12:36:30

2017-11-14 18:42 GMT+01:00 Tom Lane <tgl@sss.pgh.pa.us>:
> You might consider putting the rewriting into, um, the rewriter.
> It could be a separate pass after view expansion, if direct integration
> with the existing behavior seems unduly spaghetti-ish.  Or do it in
> an early phase of planning as he suggested.  There's not really that
> much difference between the rewriter and the planner for this purpose.
> Although one way to draw the distinction is that the output of the
> rewriter is (currently) still fully expressible as plain SQL, whereas
> once the planner goes into action the intermediate states of the tree
> might not really be SQL anymore (eg, it might contain join types that
> don't correspond to any SQL syntax).  So depending on what your rewrite
> emits, there would be a weak preference for calling it part of the
> rewriter or planner respectively.

2017-11-16 16:42 GMT+01:00 Robert Haas <robertmhaas@gmail.com>:
> Another thing to think about is that even though the CURRENT
> implementation just rewrites the relevant constructs as SQL, in the
> future somebody might want to do something else.  I feel like it's not
> hard to imagine a purpose-build ALIGN or NORMALIZE join type being a
> lot faster than the version that's just done by rewriting the SQL.
> That would be more work, potentially, but it would be nice if the
> initial implementation leant itself to be extended that way in the
> future, which an all-rewriter implementation would not.  On the other
> hand, maybe an early-in-the-optimizer implementation wouldn't either,
> and maybe it's not worth worrying about it anyway.  But it would be
> cool if this worked out in a way that meant it could be further
> improved without having to change it completely.

Hi hackers,
we like to rethink our approach...

For simplicity I'll drop ALIGN for the moment and focus solely on NORMALIZE:

    SELECT * FROM (R NORMALIZE S ON R.x = S.y WITH (R.time, S.time)) c;

Our normalization executor node needs the following input (for now
expressed in plain SQL):

    SELECT R.*, p1
    FROM (SELECT *, row_id() OVER () rn FROM R) R LEFT OUTER JOIN (    SELECT y, LOWER(time) p1 FROM
S    UNION    SELECTy, UPPER(time) p1 FROM S ) S ON R.x = S.y AND p1 <@ R.time
 
    ORDER BY rn, p1;

In other words:
1) The left subquery adds an unique ID to each tuple (i.e., rn).
2) The right subquery creates two results for each input tuple: one for
   the upper and one for the lower bound of each input tuple's valid time
   column. The boundaries get put into a single (scalar) column, namely p1.
3) We join both subqueries if the normalization predicates hold (R.x = S.y)
   and p1 is inside the time of the current outer tuple.
4) Finally, we sort the result by the unique ID (rn) and p1, and give all
   columns of the outer relation, rn and p1 back.

Our first attempt to understand the new approach would be as follows: The
left base rel of the inner left-outer-join can be expressed as a WindowAgg
node. However, the right query of the join is much more difficult to build
(maybe through hash aggregates). Both queries could be put together with a
MergeJoin for instance. However, if we create the plan tree by hand and
choose algorithms for it manually, how is it possible to have it optimized
later? Or, if that is not possible, how do we choose the best algorithms
for it?

Best regards,
Anton, Johann, Michael, Peter

Re: [HACKERS] [PROPOSAL] Temporal query processing with range types

From

Michael Paquier

Date:

30 November 2017, 07:28:28

On Tue, Nov 21, 2017 at 6:36 PM, Peter Moser <pitiz29a@gmail.com> wrote:
> 2017-11-14 18:42 GMT+01:00 Tom Lane <tgl@sss.pgh.pa.us>:
>> You might consider putting the rewriting into, um, the rewriter.
>> It could be a separate pass after view expansion, if direct integration
>> with the existing behavior seems unduly spaghetti-ish.  Or do it in
>> an early phase of planning as he suggested.  There's not really that
>> much difference between the rewriter and the planner for this purpose.
>> Although one way to draw the distinction is that the output of the
>> rewriter is (currently) still fully expressible as plain SQL, whereas
>> once the planner goes into action the intermediate states of the tree
>> might not really be SQL anymore (eg, it might contain join types that
>> don't correspond to any SQL syntax).  So depending on what your rewrite
>> emits, there would be a weak preference for calling it part of the
>> rewriter or planner respectively.
>
> 2017-11-16 16:42 GMT+01:00 Robert Haas <robertmhaas@gmail.com>:
>> Another thing to think about is that even though the CURRENT
>> implementation just rewrites the relevant constructs as SQL, in the
>> future somebody might want to do something else.  I feel like it's not
>> hard to imagine a purpose-build ALIGN or NORMALIZE join type being a
>> lot faster than the version that's just done by rewriting the SQL.
>> That would be more work, potentially, but it would be nice if the
>> initial implementation leant itself to be extended that way in the
>> future, which an all-rewriter implementation would not.  On the other
>> hand, maybe an early-in-the-optimizer implementation wouldn't either,
>> and maybe it's not worth worrying about it anyway.  But it would be
>> cool if this worked out in a way that meant it could be further
>> improved without having to change it completely.
>
> Hi hackers,
> we like to rethink our approach...
>
> For simplicity I'll drop ALIGN for the moment and focus solely on NORMALIZE:
>
>     SELECT * FROM (R NORMALIZE S ON R.x = S.y WITH (R.time, S.time)) c;
>
> Our normalization executor node needs the following input (for now
> expressed in plain SQL):
>
>     SELECT R.*, p1
>     FROM (SELECT *, row_id() OVER () rn FROM R) R
>          LEFT OUTER JOIN (
>             SELECT y, LOWER(time) p1 FROM S
>             UNION
>             SELECT y, UPPER(time) p1 FROM S
>          ) S
>          ON R.x = S.y AND p1 <@ R.time
>     ORDER BY rn, p1;
>
> In other words:
> 1) The left subquery adds an unique ID to each tuple (i.e., rn).
> 2) The right subquery creates two results for each input tuple: one for
>    the upper and one for the lower bound of each input tuple's valid time
>    column. The boundaries get put into a single (scalar) column, namely p1.
> 3) We join both subqueries if the normalization predicates hold (R.x = S.y)
>    and p1 is inside the time of the current outer tuple.
> 4) Finally, we sort the result by the unique ID (rn) and p1, and give all
>    columns of the outer relation, rn and p1 back.
>
> Our first attempt to understand the new approach would be as follows: The
> left base rel of the inner left-outer-join can be expressed as a WindowAgg
> node. However, the right query of the join is much more difficult to build
> (maybe through hash aggregates). Both queries could be put together with a
> MergeJoin for instance. However, if we create the plan tree by hand and
> choose algorithms for it manually, how is it possible to have it optimized
> later? Or, if that is not possible, how do we choose the best algorithms
> for it?

As far as I can see, this patch has received some feedback. In order
to digest them properly, I am marking the patch as returned with
feedback.
-- 
Michael

Re: [HACKERS] [PROPOSAL] Temporal query processing with range types

From

Robert Haas

Date:

30 November 2017, 20:26:50

On Tue, Nov 21, 2017 at 4:36 AM, Peter Moser <pitiz29a@gmail.com> wrote:
> Hi hackers,
> we like to rethink our approach...
>
> For simplicity I'll drop ALIGN for the moment and focus solely on NORMALIZE:
>
>     SELECT * FROM (R NORMALIZE S ON R.x = S.y WITH (R.time, S.time)) c;
>
> Our normalization executor node needs the following input (for now
> expressed in plain SQL):
>
>     SELECT R.*, p1
>     FROM (SELECT *, row_id() OVER () rn FROM R) R
>          LEFT OUTER JOIN (
>             SELECT y, LOWER(time) p1 FROM S
>             UNION
>             SELECT y, UPPER(time) p1 FROM S
>          ) S
>          ON R.x = S.y AND p1 <@ R.time
>     ORDER BY rn, p1;
>
> In other words:
> 1) The left subquery adds an unique ID to each tuple (i.e., rn).
> 2) The right subquery creates two results for each input tuple: one for
>    the upper and one for the lower bound of each input tuple's valid time
>    column. The boundaries get put into a single (scalar) column, namely p1.
> 3) We join both subqueries if the normalization predicates hold (R.x = S.y)
>    and p1 is inside the time of the current outer tuple.
> 4) Finally, we sort the result by the unique ID (rn) and p1, and give all
>    columns of the outer relation, rn and p1 back.

So, if I understand correctly, there are three possible outcomes.  If
S.time and R.time are non-overlapping, then a null-extended row is
produced with the original data from R.  If S.time overlaps R.time but
is not contained within it, then this produces one result row, not
null-extended -- either the upper or lower bound will found to be
contained within R.time, but the other will not.  If S.time overlaps
R.time but is not contained within it, then this produces 2 result
rows, one with p1 containing S.time's lower bound and one with p1
containing S.time's upper bound.  Is that right?  Then, after all this
happens, the temporal normalizer code runs over the results and uses
the p1 values as split points for the ranges from R.

I wonder if we could instead think about R NORMALIZE S ON R.x = S.y
WITH (R.time, S.time) as an ordinary join on R.x = S.y with some extra
processing afterwards.  After finding all the join partners for a row
in R, extract all the lower and upper bounds from the S.time fields of
the join partners and use that to emit as many rows from R as may be
necessary.  The main problem that I see with that approach is that it
seems desirable to separate the extra-processing-afterwards step into
a separate executor node, and there's no way for a separate type of
plan node to know when the join advances the outer side of the join.
That's the purpose that row_id() is serving as you have it; we'd have
to invent some other kind of signalling mechanism, which might not be
entirely trivial.  :-(

If we could do it, though, it might be quite a bit more efficient,
because it would avoid scanning S twice and performing a UNION on the
results of those scans.  Also, we wouldn't necessarily need to sort
the whole set of rows from R, which I suspect is unavoidable in your
implementation.  We'd just need to sort the individual groups of rows
from S, and my guess is that in many practical cases those groups are
fairly small.

I wonder what identities hold for NORMALIZE.  It does not seem to be
commutative, but it looks like it might be associative... i.e. (R
NORMALIZE S) NORMALIZE T produces the same output as R NORMALIZE (S
NORMALIZE T), perhaps?

> Our first attempt to understand the new approach would be as follows: The
> left base rel of the inner left-outer-join can be expressed as a WindowAgg
> node. However, the right query of the join is much more difficult to build
> (maybe through hash aggregates). Both queries could be put together with a
> MergeJoin for instance. However, if we create the plan tree by hand and
> choose algorithms for it manually, how is it possible to have it optimized
> later? Or, if that is not possible, how do we choose the best algorithms
> for it?

You don't want to generate plan nodes directly like this, I think.
Generally, you create paths, and the cheapest path wins at the end of
planning.  The thing that makes this tricky is that the temporal
normalization phase can be separated from the inner left-outer-join by
other joins, and the planner doesn't really have a good way of
representing that concept.

More broadly, the very idea of creating plan nodes suggests that you
are approaching this from the point of view of sticking the logic in
near the end of the planning process, which I think is not going to
work.  Whatever is going to happen here needs to happen at path
generation time at the latest, or maybe even earlier, before the
optimizer even starts processing the join tree.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

Re: [HACKERS] [PROPOSAL] Temporal query processing with range types

From

Simon Riggs

Date:

06 January 2018, 23:29:07

On 30 November 2017 at 17:26, Robert Haas <robertmhaas@gmail.com> wrote:

> I wonder if we could instead think about R NORMALIZE S ON R.x = S.y
> WITH (R.time, S.time) as an ordinary join on R.x = S.y with some extra
> processing afterwards.

That would work nicely, kindof like a Projection, but one that can
vary the number of rows emitted.

For normal joins, we simply emit one row. For new style joins we call
a special PostJoinSetProjection function: one tuple in, potentially
many tuples out.

Peter, does Robert's proposed treatment give you what you need?

Overall, I like the goals expressed on this thread. I think if we
should focus on introducing useful new functionality, rather than
focusing on syntax.

I'm not very keen on adopting new syntax that isn't in the
SQLStandard. They have a bad habit of doing something completely
different. So a flexible approach will allow us to have functionality
now and we can adopt any new syntax later.

For any new syntax, I think the right approach would be to create a
new parser plugin. That way we could make all of this part of an
extension.
* a parser plugin for any new syntax
* various PostJoinSetProjection() functions to be called as needed
So the idea is we enable Postgres to allow major new functionality, as
was done for PostGIS so successfully.

We can adopt syntax into the main parser later once SQLStandard
accepts this, or some munged version of it.

-- 
Simon Riggs                http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

Re: [HACKERS] [PROPOSAL] Temporal query processing with range types

From

Simon Riggs

Date:

07 January 2018, 12:58:46

On 30 March 2017 at 13:11, Peter Moser <pitiz29a@gmail.com> wrote:
> 2017-03-01 10:56 GMT+01:00 Peter Moser <pitiz29a@gmail.com>:
>> A similar walkthrough for ALIGN will follow soon.
>>
>> We are thankful for any suggestion or ideas, to be used to write a
>> good SGML documentation.
>
> The attached README explains the ALIGN operation step-by-step with a
> TEMPORAL LEFT OUTER JOIN example. That is, we start from a query
> input, show how we rewrite it during parser stage, and show how the
> final execution generates result tuples.

Sorry, this was too complex for me.

Can we get a much simpler example please?

-- 
Simon Riggs                http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

Re: [HACKERS] [PROPOSAL] Temporal query processing with range types

From

Robert Haas

Date:

08 January 2018, 19:18:47

On Sat, Jan 6, 2018 at 3:29 PM, Simon Riggs <simon@2ndquadrant.com> wrote:
> I'm not very keen on adopting new syntax that isn't in the
> SQLStandard. They have a bad habit of doing something completely
> different. So a flexible approach will allow us to have functionality
> now and we can adopt any new syntax later.
>
> For any new syntax, I think the right approach would be to create a
> new parser plugin. That way we could make all of this part of an
> extension.
> * a parser plugin for any new syntax
> * various PostJoinSetProjection() functions to be called as needed
> So the idea is we enable Postgres to allow major new functionality, as
> was done for PostGIS so successfully.
>
> We can adopt syntax into the main parser later once SQLStandard
> accepts this, or some munged version of it.

We don't currently have any concept of a parser plugin, and I don't
think it's reasonable for this patch to try to invent one.  I can't
see where you could possibly put a hook that would handle something
like this.  I've thought about suggesting a hook that gets called if
parsing fails, before we give up and throw a syntax error.  That
would, perhaps, allow extension to implement additional DDL commands,
which would be pretty cool.  However, it wouldn't be practical to use
something like that for this because this is syntax that appears
buried deep down inside FROM clauses, and you'd basically have to
reimplement the core parser's entire handling of SELECT statements,
which would be immensely complicated to develop and a real pain to
maintain.

Furthermore, the changes here aren't only parsing changes.  There is a
proposal to add a new executor node which has to be supported by new
planner code.  A great deal more than parser pluggability would be
needed.  And if we did all that, it doesn't really solve anything
anyway: the potential problem with committing to this syntax is that
if we change it later, there could be upgrade problems.
Extension-izing this wouldn't make those problems go away.  People are
going to want to be able to run pg_dump on their old database and
restore the result on their new database, full stop.  If we add this
syntax, in core or in a hypothetical extension system, we're stuck
with it and must maintain it going forward.

I also don't agree with the idea that we should reject syntax that
doesn't appear in the SQL standard.  We have a great deal of such
syntax already, and we add more of it in every release, and a good
deal of it is contributed by you and your colleagues.  I don't see why
this patch should be held to a stricter standard than we do in
general.  I agree that there is some possibility for pain if the SQL
standards committee adopts syntax that is similar to whatever we pick
but different in detail, but I don't think we should be too worried
about that unless other database systems, such as Oracle, have syntax
that is similar to what is proposed here but different in detail.  The
SQL standards committee seems to like standardizing on whatever
companies with a lot of money have already implemented; it's unlikely
that they are going to adopt something totally different from any
existing system but inconveniently similar to ours.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

Re: [HACKERS] [PROPOSAL] Temporal query processing with range types

From

Peter Moser

Date:

26 January 2018, 09:55:11

On Thu, 2017-11-30 at 12:26 -0500, Robert Haas wrote:
> I wonder if we could instead think about R NORMALIZE S ON R.x = S.y
> WITH (R.time, S.time) as an ordinary join on R.x = S.y with some
> extra processing afterwards.  After finding all the join partners for
> a row in R, extract all the lower and upper bounds from the S.time
> fields of the join partners and use that to emit as many rows from R
> as may be necessary.

We have now a new approach to plan and execute NORMALIZE as a special
join node of type NORMALIZE, an append-plan on the inner join path,
and a merge-join executor. For the latter, we would need to
extend nodeMergejoin.c with an point-in-range-containment join.

That is, we create a new join path within sort_inner_and_outer
(joinpath.c). First two projection nodes to extract the start- and
end-timepoints of the range type used as interval, and above an
append-plan to merge both subplans. In detail, outer_path is just sort,
whereas inner_path is append of (B, ts) projection with (B, te)
projection.
Hereby, B is a set of non-temporal attributes used in join equality
predicates, and [ts,te) forms the valid-time interval. Non-equality
predicates must be handled separately as a filter step.

Do you think, it is a good idea to extend the sort_inner_and_outer()
with a new branch, where jointype == NORMALIZE and add the projection
and append sub-paths there?

> The main problem that I see with that approach is that it
> seems desirable to separate the extra-processing-afterwards step into
> a separate executor node, and there's no way for a separate type of
> plan node to know when the join advances the outer side of the join.
> That's the purpose that row_id() is serving as you have it; we'd have
> to invent some other kind of signalling mechanism, which might not be
> entirely trivial.  :-(

One advantage of the new approach is that row_id() is no longer needed.
The executor knows that it is inside a new "group" after every NEXTOUTER,
then the whole loop of the inner relation has the same row_id. The whole
mechanism could be handled through the MergeJoinState struct.

> If we could do it, though, it might be quite a bit more efficient,
> because it would avoid scanning S twice and performing a UNION on the
> results of those scans.  Also, we wouldn't necessarily need to sort
> the whole set of rows from R, which I suspect is unavoidable in your
> implementation.  We'd just need to sort the individual groups of rows
> from S, and my guess is that in many practical cases those groups are
> fairly small.

We would need a special SEQSCAN node/executor, that does the projection
and append steps in a single read. Do you have any suggestions how to
implement this?

> I wonder what identities hold for NORMALIZE.  It does not seem to be
> commutative, but it looks like it might be associative... i.e. (R
> NORMALIZE S) NORMALIZE T produces the same output as R NORMALIZE (S
> NORMALIZE T), perhaps?

It is not commutative and also not associative.
Assume relations that contain one tuple each as follows:

R={[1, 100)}, S={[50, 100)}, and T={[10, 20)}

(R NORMALIZE S) NORMALIZE T gives {[1, 10), [10, 20), [20,50), [50, 100)}

while

R NORMALIZE (S NORMALIZE T) gives {[1, 50), [50, 100)}

Best regards,
Anton, Johann, Michael, Peter

Re: [HACKERS] [PROPOSAL] Temporal query processing with range types

From

Peter Moser

Date:

26 January 2018, 09:56:23

On Sat, 2018-01-06 at 20:29 +0000, Simon Riggs wrote:
> * various PostJoinSetProjection() functions to be called as needed
> So the idea is we enable Postgres to allow major new functionality,
> as was done for PostGIS so successfully.

If we use a post-join approach many intermediate result tuples, that do
not contribute to the final result would be emmitted, and thus the
performance would suffer.

Best regards,
Anton, Johann, Michael, Peter

Re: [HACKERS] [PROPOSAL] Temporal query processing with range types

From

Peter Moser

Date:

26 January 2018, 09:57:25

On Sun, 2018-01-07 at 09:58 +0000, Simon Riggs wrote:
> > The attached README explains the ALIGN operation step-by-step with
> > a TEMPORAL LEFT OUTER JOIN example. That is, we start from a query
> > input, show how we rewrite it during parser stage, and show how the
> > final execution generates result tuples.
> Sorry, this was too complex for me.
>
> Can we get a much simpler example please?

Please see the following simpler example:

DROP TABLE budg;
CREATE TABLE budg(name VARCHAR(5), amnt INTEGER, t DATERANGE);
INSERT INTO budg VALUES ('Joe', 5, '[2012/2/1,2012/9/1)');
INSERT INTO budg VALUES ('Ann', 7, '[2012/5/1,2012/9/1)');
INSERT INTO budg VALUES ('Per', 3, '[2012/4/1,2012/10/1)');
SELECT * FROM budg AS r;

SELECT *
FROM ( budg r ALIGN budg s ON s.amnt > r.amnt WITH (t,t)) r
WHERE NOT EXISTS (
  SELECT *
  FROM (budg s ALIGN budg r ON s.amnt > r.amnt WITH (t,t)) s
  WHERE s.amnt > r.amnt
  AND r.t = s.t  );

--  name | amnt |            t
-- ------+------+-------------------------
--  Joe  |    5 | [2012-02-01,2012-05-01)
--  Ann  |    7 | [2012-05-01,2012-09-01)
--  Per  |    3 | [2012-09-01,2012-10-01)



Best regards,
Anton, Johann, Michael, Peter

Re: [HACKERS] [PROPOSAL] Temporal query processing with range types

From

Peter Moser

Date:

26 January 2018, 09:58:04

On Mon, 2018-01-08 at 11:18 -0500, Robert Haas wrote:
> I also don't agree with the idea that we should reject syntax that
> doesn't appear in the SQL standard.  We have a great deal of such
> syntax already, and we add more of it in every release, and a good
> deal of it is contributed by you and your colleagues.  I don't see
> why this patch should be held to a stricter standard than we do in
> general.  I agree that there is some possibility for pain if the SQL
> standards committee adopts syntax that is similar to whatever we pick
> but different in detail, but I don't think we should be too worried
> about that unless other database systems, such as Oracle, have syntax
> that is similar to what is proposed here but different in
> detail.  The
> SQL standards committee seems to like standardizing on whatever
> companies with a lot of money have already implemented; it's unlikely
> that they are going to adopt something totally different from any
> existing system but inconveniently similar to ours.

We agree with you.

Best regards,
Anton, Johann, Michael, Peter

Re: [HACKERS] [PROPOSAL] Temporal query processing with range types

From

Peter Moser

Date:

07 September 2018, 14:02:26

On 01/26/2018 07:55 AM, Peter Moser wrote:
> We have now a new approach to plan and execute NORMALIZE as a special
> join node of type NORMALIZE, an append-plan on the inner join path,
> and a merge-join executor. For the latter, we would need to
> extend nodeMergejoin.c with an point-in-range-containment join.

We are ready with a new prototype for the temporal NORMALIZE operation. 
In this prototype we do not rewrite queries as in the previous patch, 
but have one executor node, that solves the normalize operation. This 
executor is based on a merge-join.

Our new patch is based on top of 
75f7855369ec56d4a8e7d6eae98aff1182b85cac from September 6, 2018.

The syntax is
SELECT * FROM (r NORMALIZE s USING() WITH(period_r, period_s)) c;

It currently is only implemented for empty USING clauses, and solely 
int4range as range attributes.

Example:

A=# table r; 

  a | b | period_r 

---+---+---------- 

  a | B | [1,7) 

  b | B | [3,9) 

  c | G | [8,10) 

(3 rows) 

A=# table s; 

  c | d | period_s 

---+---+---------- 

  1 | B | [2,5) 

  2 | B | [3,4) 

  3 | B | [7,9) 

(3 rows) 

A=# SELECT * FROM (r NORMALIZE s USING() WITH(period_r, period_s)) c; 

  period_r | a | b 

----------+---+--- 

  [1,2)    | a | B 

  [2,3)    | a | B 

  [3,4)    | a | B 

  [4,5)    | a | B 

  [5,7)    | a | B 

  [3,4)    | b | B 

  [4,5)    | b | B 

  [5,7)    | b | B 

  [7,9)    | b | B 

(9 rows) 

A=# EXPLAIN SELECT * FROM (r NORMALIZE s USING() WITH(period_r, 
period_s)) c;
                                 QUERY PLAN 

-------------------------------------------------------------------------- 

  Result  (cost=2.15..2.22 rows=3 width=18) 

    ->  Merge ??? Join  (cost=2.15..2.23 rows=3 width=22) 

          Merge Cond: (r.period_r @> (range_split(s.period_s))) 

          ->  Sort  (cost=1.05..1.06 rows=3 width=18) 

                Sort Key: r.period_r 

                ->  Seq Scan on r  (cost=0.00..1.03 rows=3 width=18) 

          ->  Sort  (cost=1.09..1.10 rows=3 width=4) 

                Sort Key: (range_split(s.period_s)) USING < 

                ->  ProjectSet  (cost=0.00..1.07 rows=3 width=4) 

                      ->  Seq Scan on s  (cost=0.00..1.03 rows=3 
width=14)
(10 rows) 

> That is, we create a new join path within sort_inner_and_outer
> (joinpath.c). First two projection nodes to extract the start- and
> end-timepoints of the range type used as interval, and above an
> append-plan to merge both subplans. In detail, outer_path is just sort,
> whereas inner_path is append of (B, ts) projection with (B, te)
> projection.

We changed this implementation and use a set-returning function called 
"range_split", that extracts the upper and lower bound of a range and 
returns two tuples. For instance, a tuple '[4,10),a' becomes two tuples 
of the form '4,a' and '10,a'.

> Hereby, B is a set of non-temporal attributes used in join equality
> predicates, and [ts,te) forms the valid-time interval. Non-equality
> predicates must be handled separately as a filter step.

The current prototype supports only an integer range-type without any 
additional non-temporal attributes (empty USING clause).

> Do you think, it is a good idea to extend the sort_inner_and_outer()
> with a new branch, where jointype == NORMALIZE and add the projection
> and append sub-paths there?

We actually extended sort_inner_and_outer now. It is an early solution, 
to support discussions. Please see the two sections starting with "if 
(jointype == JOIN_TEMPORAL_NORMALIZE)" inside sort_inner_and_outer:

The purpose of these sections is to change the inner path's range type 
into its single bounds.

We accomplish this with a new function called range_split. We take the 
inner clause and extract the second operator of an RANGE_EQ expression 
out of it. We assume *for this prototype*, that their is only one such 
operator and that it is solely used for NORMALIZE. Then, we replace it 
with range_split. A range split returns a set of tuples, hence we add a 
new "set projection path" above the inner path, and another sort path 
above that.

What we like to discuss now is:
- Is sort_inner_and_outer the correct place to perform this split?
- How could we support OID_RANGE_ELEM_CONTAINED_OP for a NORMALIZE
   mergejoin executor? If we use RANGE_ELEM_CONTAINED as operator, it is
   not an equality operator, but if we use RANGE_EQ it assumes that the
   right-hand-side of the operator must be a range as well.
- Should we better change our mergeclause to a RANGE_ELEM_CONTAINED
   comparison, or keep RANGE_EQ and fix pathkeys later?
- How do we update equivalence classes, pathkeys, and any other struct,
   when changing the inner relation's data type from "int4range" to "int"
   in the query tree inside "sort_inner_and_outer" to get the correct
   ordering and data types

Best regards,
Anton, Johann, Michael, Peter

Attachment

tpg_normalize_v2.patch

Re: [HACKERS] [PROPOSAL] Temporal query processing with range types

From

Peter Moser

Date:

28 February 2019, 21:43:41

Dear all,
we rebased our temporal normalization patch on top of 554ebf687852d045f0418d3242b978b49f160f44 from 2019-02-28.

On 9/7/18 1:02 PM, Peter Moser wrote:

The syntax is
SELECT * FROM (r NORMALIZE s USING() WITH(period_r, period_s)) c;

Please find all information about our decisions and current state within the previous email.

What we like to discuss now is:
- Is sort_inner_and_outer the correct place to perform this split?
- How could we support OID_RANGE_ELEM_CONTAINED_OP for a NORMALIZE
mergejoin executor? If we use RANGE_ELEM_CONTAINED as operator, it is
not an equality operator, but if we use RANGE_EQ it assumes that the
right-hand-side of the operator must be a range as well.
- Should we better change our mergeclause to a RANGE_ELEM_CONTAINED
comparison, or keep RANGE_EQ and fix pathkeys later?
- How do we update equivalence classes, and pathkeys
when changing the inner relation's data type from "int4range" to "int"
in the query tree inside "sort_inner_and_outer" to get the correct
ordering and data types

I will also add this prototype (WIP) patch to the commitfest of March, as suggested by two developers met at the
FOSDEM some weeks ago.

Best regards,
Anton, Johann, Michael, Peter

Attachment

tpg_normalize_v3.patch

Re: Re: [HACKERS] [PROPOSAL] Temporal query processing with rangetypes

From

David Steele

Date:

05 March 2019, 08:46:01

Hi Peter,

On 2/28/19 11:43 PM, Peter Moser wrote:
> Dear all,
> we rebased our temporal normalization patch on top of 
> 554ebf687852d045f0418d3242b978b49f160f44 from 2019-02-28.
> 
> I will also add this prototype (WIP) patch to the commitfest of March, 
> as suggested by two developers met at the
> FOSDEM some weeks ago.

I have marked this entry as targeting PG13 since it is too late to 
consider for PG12.

Regards,
-- 
-David
david@pgmasters.net

Re: Re: [HACKERS] [PROPOSAL] Temporal query processing with range types

From

Robert Haas

Date:

05 March 2019, 23:58:07

On Tue, Mar 5, 2019 at 3:46 AM David Steele <david@pgmasters.net> wrote:
> I have marked this entry as targeting PG13 since it is too late to
> consider for PG12.

Sounds right.  As Peter said himself, this patch is WIP, so it's too
soon to consider integrating it.  This is also fairly evident from the
content of the patch, which is full of comments marked XXX and PEMOSER
that obviously need to be addressed somehow.  For all of that, I'd say
this patch is much closer to being on the right track than the old
design, even though it's clear that a lot of work remains.

Some desultory review comments:

+#define setSweepline(datum) \
+ node->sweepline = datumCopy(datum, node->datumFormat->attbyval,
node->datumFormat->attlen)
+
+#define freeSweepline() \
+ if (! node->datumFormat->attbyval) pfree(DatumGetPointer(node->sweepline))

I am quite dubious about this. Almost everywhere in the executor we
rely on slots to keep track of tuples and free memory for us. It seems
unlikely that this should be the one place where we have code that
looks completely different.  Aside from that, this seems to propose
there is only one relevant column, which seems like an assumption that
we probably don't want to bake too deeply into the code.

+ ereport(ERROR,
+ (errcode(ERRCODE_NOT_NULL_VIOLATION),
+ errmsg("Attribute \"%s\" at position %d is null. Temporal " \
+ "adjustment not possible.",
+ NameStr(TupleDescAttr(slot->tts_tupleDescriptor, attnum - 1)->attname),
+ attnum)));

This error message is marked as translatable (ereport used rather than
elog) but the error-message text is unsuitable for a user-facing
error.  If this is a user-facing error, we need a better error, or
maybe we need to rethink the semantics altogether (e.g. just skip such
rows instead of erroring out, or something).  If it's an internal
error that should not be possible no matter what query the user
enters, and is only here as a sanity test, just simplify and use elog
(and maybe add some comments explaining why that's so).

+ * heapGetAttrNotNull

I may be a bit behind the times here, but it seems to me that this is
functionally equivalent to slotGetAttrNotNull and thus we shouldn't
need both.

+ bool        empty = false;

Not much point in declaring a variable whose value is never changed
and whose value takes up exactly the same number of characters as the
variable name.

+ * temporalAdjustmentStoreTuple
+ *      While we store result tuples, we must add the newly calculated temporal
+ *      boundaries as two scalar fields or create a single range-typed field
+ *      with the two given boundaries.

This doesn't seem to describe what the function is actually doing.

+ * This should ideally be done with RangeBound types on the right-hand-side
+ * created during range_split execution. Otherwise, we loose information about
+ * inclusive/exclusive bounds and infinity. We would need to implement btree
+ * operators for RangeBounds.

This seems like an idea for future improvement, but it's unclear to me
how the proposed idea is different from the state created by the
patch.

Also, materializing the slot to a heap tuple so that we can modify it
seems inefficient.  I wonder if we can't use a virtual slot here.

+ if (qual->opno == OID_RANGE_EQ_OP) {
+ Oid rngtypid;
+
+ // XXX PEMOSER Change opfamily and opfunc
+ qual->opfuncid = F_RANGE_CONTAINS; //<<--- opfuncid can be 0 during planning
+ qual->opno = OID_RANGE_CONTAINS_ELEM_OP; //OID_RANGE_CONTAINS_OP;
+ clause->isnormalize = true;
+
+ // Attention: cannot merge using non-equality operator 3890 <---
OID_RANGE_CONTAINS_OP
+ opfamily = 4103; //range_inclusion_ops from pg_opfamily.h
+
+ rngtypid = exprType((Node*)clause->lexpr->expr);
+ clause->range_typcache = lookup_type_cache(rngtypid, TYPECACHE_RANGE_INFO);
+ testmytypcache = clause->range_typcache;
+ } else {
+ clause->isnormalize = false;
+ }

This is pretty obviously a mess of hardcoded constants which are,
furthermore, not explained.  I can't tell whether this is intended as
a dirty hack to make some cases work while other things remain broken,
or whether you believe that OID_RANGE_EQ_OP.  If it's the latter, this
needs a much better explanation.  You probably realize that
"Attention: cannot merge using non-equality operator 3890" is not a
compelling justification, and that hard-coding all of these things
here doesn't look good.

In general, this patch needs both user-facing documentation and more
and better code comments.  I would suggest writing the user-facing
documentation soon.  It is pretty clear that you've got the basics of
this working, but it's impossible to understand what the semantics are
supposed to be except by staring at the code until you figure it out,
or running experiments.  People who are interested in this
functionality are more likely to provide useful feedback if there's
something they can read and go "yeah, that looks right" or "wait, that
sounds wrong."  Also, there may be places where, in the process of
documenting, you realize that things should be changed to make them
more consistent or easier to understand.  Adding a developer README to
the patch might be good, too.

With respect to this specific point, I'm wondering if the patch is
trying to handle two cases in somewhat different ways -- one where
there are actual range types involved and the other where there are
two different columns that can be view as constituting a range.  That
might explain the special-casing here, but if so there is probably a
need to clearly distinguish those cases much earlier, probably
someplace in the planner, not just at execution time.

+static Datum
+getLower(Datum range, TypeCacheEntry *typcache)
...
+static Datum
+getUpper(Datum range, TypeCacheEntry *typcache)

Anything that calls both of these functions is going to call
range_deserialize() twice, which seems inefficient.

+ if (node->sweepline < currP1)

This looks very strange.  Those are just Datums.  Typically what you'd
do is determine based on the datatype which opclass's comparator to
use and then call that.  I'm not sure what purpose could be served by
comparing Datums directly.  Maybe it would give you sensible answers
if these are integers, but not if they're pointers and probably not
even if they are native floats.

+ // XXX PEMOSER Manually fix sort operation of second attribute
(former time, now upper(time))
+ // We must fix that in general... this is just a proof of concept
brute-force solution!
+ if (plannode->plan.lefttree->type == T_ProjectSet) {
+ plannode->sortOperators[0] = 97; // 97 means "<" for int4, it was
"<" for int4range
+ }

Aside from the fact that this shouldn't be hard-coded, this shouldn't
be in the executor at all.  The planner should be figuring this out
for us.  If the planner is making a bad decision, then it needs to be
fixed so that it makes a good decision.  Just trying to work around
the wrong decision elsewhere in the code won't lead to anything good.

+ /*
+ * TEMPORAL NORMALIZE: To improve this, we would need to remove only
+ * temporal range types from the path key list, not all
+ */

If you can do a temporal join that has leading columns being joined in
a standard way, then it might be possible to truncate the PathKey list
to include just the leading columns, provided the join preserves
ordering on those columns -- but once you get to the first column in
the PathKey where ordering is not preserved, any columns after that
point are certainly no longer part of the PathKey.

+ /*
+ * XXX PEMOSER NORMALIZE needs a result node above to properly
+ * handle target lists, functions and constants
+ * Maybe we need to refine this like in create_unique_plan:
+ * "If the top plan node can't do projections..."
+ */
+ if (best_path->jpath.jointype == JOIN_TEMPORAL_NORMALIZE)
+ join_plan = make_result(tlist, NULL, join_plan);

Actually, I think what you need to do is make sure that MergeJoin can
still project in all cases after your changes.  Breaking that only for
the temporal normalize case isn't going to be acceptable, and it
shouldn't be hard to avoid having such a restriction.  It's "just" a
matter of figuring out how a bunch of fiddly executor bits work...

+ // XXX PEMOSER Hardcoded NORMALIZE detection... change this. Read
the note below...

What note below?

+ /* Temporal NORMALIZE appends an expression to compare temporal bounds */
+ if (normalizeVars)
+ {
+ A_Expr *e;
+ e = makeSimpleA_Expr(AEXPR_OP, "=",
+ (Node *) copyObject(linitial(normalizeVars)),
+ (Node *) copyObject(lsecond(normalizeVars)),
+ -1);
+ andargs = lappend(andargs, e);
+ }

You absolutely cannot do stuff like this in parse analysis.  It's
doubtful whether it's acceptable in general to include hard-coded
knowledge of = as a general point, but it's certainly not OK to do it
in parse analysis.  Parse analysis's job is to determine the objects
to which the query refers, NOT to massage it as a preparation for
further optimization.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

Re: [PROPOSAL] Temporal query processing with range types

From

Ibrar Ahmed

Date:

02 April 2019, 13:11:51

I start looking at the patch, there is a couple of problems with the patch. The first one is the OID conflict, which I
fixedon my machine. The second problem is assertion failure. I think you have not compiled the PostgreSQL code with the
assertion.
 

...
postgres=# SELECT *
FROM (projects p1 NORMALIZE projects p2 USING() WITH(t,t)) p_adjusted;
TRAP: FailedAssertion("!(ptr == ((void *)0) || (((const Node*)(ptr))->type) == type)", File:
"../../../src/include/nodes/nodes.h",Line: 588)
 
psql: server closed the connection unexpectedly
    This probably means the server terminated abnormally
    before or while processing the request.
The connection to the server was lost. Attempting reset: 2019-04-02 12:50:09.654 UTC [27550] LOG:  server process (PID
27559)was terminated by signal 6: Aborted
 
...

Although this patch is WIP, but please avoid mix declaration to avoid the compiler warning message.

...
joinpath.c:993:3: warning: ISO C90 forbids mixed declarations and code [-Wdeclaration-after-statement]
   PathTarget *target_split = makeNode(PathTarget);
...

I am still looking at the patch.

Re: [PROPOSAL] Temporal query processing with range types

From

Thomas Munro

Date:

01 July 2019, 11:44:31

On Wed, Apr 3, 2019 at 2:12 AM Ibrar Ahmed <ibrar.ahmad@gmail.com> wrote:
> I start looking at the patch, there is a couple of problems with the patch. The first one is the OID conflict, which
Ifixed on my machine. The second problem is assertion failure. I think you have not compiled the PostgreSQL code with
theassertion.
 

Hi Peter,

Looks like there was some good feedback for this WIP project last time
around.  It's currently in "Needs Review" status in the July
Commitfest.  To encourage more review and see some automated compile
and test results, could we please have a fresh rebase?  The earlier
patches no longer apply.

Thanks,

-- 
Thomas Munro
https://enterprisedb.com

Re: [PROPOSAL] Temporal query processing with range types

From

Ibrar Ahmed

Date:

02 August 2019, 21:00:22

Hi,

I have rebased the patch and currently reviewing the patch

on master (1e2fddfa33d3c7cc93ca3ee0f32852699bd3e012).

On Mon, Jul 1, 2019 at 4:45 PM Thomas Munro <thomas.munro@gmail.com> wrote:

On Wed, Apr 3, 2019 at 2:12 AM Ibrar Ahmed <ibrar.ahmad@gmail.com> wrote:
> I start looking at the patch, there is a couple of problems with the patch. The first one is the OID conflict, which I fixed on my machine. The second problem is assertion failure. I think you have not compiled the PostgreSQL code with the assertion.

Hi Peter,

Looks like there was some good feedback for this WIP project last time
around. It's currently in "Needs Review" status in the July
Commitfest. To encourage more review and see some automated compile
and test results, could we please have a fresh rebase? The earlier
patches no longer apply.

Thanks,

--
Thomas Munro
https://enterprisedb.com

Ibrar Ahmed

Attachment

001_temporal_query_processing_with_range_types_v4.patch

Re: [PROPOSAL] Temporal query processing with range types

From

Peter Moser

Date:

08 August 2019, 07:58:31

Hi Ibrar, Thomas and Robert,

On 8/2/19 11:00 PM, Ibrar Ahmed wrote:
> I have rebased the patch and currently reviewing the patch 
> on master (1e2fddfa33d3c7cc93ca3ee0f32852699bd3e012).

Thanks a lot for your effort. We are now trying to put again more work
and time in this patch.
We are grateful for any feedback.

Thanks,
Peter

Re: [PROPOSAL] Temporal query processing with range types

From

Michael Paquier

Date:

01 December 2019, 03:45:56

On Thu, Aug 08, 2019 at 09:58:31AM +0200, Peter Moser wrote:
> Thanks a lot for your effort. We are now trying to put again more work
> and time in this patch.
> We are grateful for any feedback.

The latest patch applies, but does not build because of an OID
conflict.  For development purposes, please make sure to use an OID in
the range 8000~9000 which are reserved for development per the
recently-added new project policy.  For now I have moved the patch to
next CF, waiting on author.
--
Michael

Attachment

signature.asc

Re: [PROPOSAL] Temporal query processing with range types

From

Peter Moser

Date:

30 January 2020, 15:00:27

Hi Hackers,

On 12/1/19 4:45 AM, Michael Paquier wrote:
> For now I have moved the patch to
> next CF, waiting on author.

We have withdrawn this patch for now. The reason for this is, that we
had ideas on how to split it into multiple simpler independent patches,
that can be reviewed and committed one by one.

Best regards,
Anton and Peter