Re: [HACKERS] [PROPOSAL] Temporal query processing with range types - Mailing list pgsql-hackers

From Peter Moser
Subject Re: [HACKERS] [PROPOSAL] Temporal query processing with range types
Date
Msg-id 87682356-8fdf-217c-bcfb-4d4d91aa6e8c@gmail.com
Whole thread Raw
In response to Re: [HACKERS] [PROPOSAL] Temporal query processing with range types  (Peter Moser <pitiz29a@gmail.com>)
Responses Re: [HACKERS] [PROPOSAL] Temporal query processing with range types
List pgsql-hackers
On 01/26/2018 07:55 AM, Peter Moser wrote:
> We have now a new approach to plan and execute NORMALIZE as a special
> join node of type NORMALIZE, an append-plan on the inner join path,
> and a merge-join executor. For the latter, we would need to
> extend nodeMergejoin.c with an point-in-range-containment join.

We are ready with a new prototype for the temporal NORMALIZE operation. 
In this prototype we do not rewrite queries as in the previous patch, 
but have one executor node, that solves the normalize operation. This 
executor is based on a merge-join.

Our new patch is based on top of 
75f7855369ec56d4a8e7d6eae98aff1182b85cac from September 6, 2018.

The syntax is
SELECT * FROM (r NORMALIZE s USING() WITH(period_r, period_s)) c;

It currently is only implemented for empty USING clauses, and solely 
int4range as range attributes.

Example:

A=# table r; 

  a | b | period_r 

---+---+---------- 

  a | B | [1,7) 

  b | B | [3,9) 

  c | G | [8,10) 

(3 rows) 

 

A=# table s; 

  c | d | period_s 

---+---+---------- 

  1 | B | [2,5) 

  2 | B | [3,4) 

  3 | B | [7,9) 

(3 rows) 

 

A=# SELECT * FROM (r NORMALIZE s USING() WITH(period_r, period_s)) c; 

  period_r | a | b 

----------+---+--- 

  [1,2)    | a | B 

  [2,3)    | a | B 

  [3,4)    | a | B 

  [4,5)    | a | B 

  [5,7)    | a | B 

  [3,4)    | b | B 

  [4,5)    | b | B 

  [5,7)    | b | B 

  [7,9)    | b | B 

(9 rows) 



A=# EXPLAIN SELECT * FROM (r NORMALIZE s USING() WITH(period_r, 
period_s)) c;
                                 QUERY PLAN 

-------------------------------------------------------------------------- 

  Result  (cost=2.15..2.22 rows=3 width=18) 

    ->  Merge ??? Join  (cost=2.15..2.23 rows=3 width=22) 

          Merge Cond: (r.period_r @> (range_split(s.period_s))) 

          ->  Sort  (cost=1.05..1.06 rows=3 width=18) 

                Sort Key: r.period_r 

                ->  Seq Scan on r  (cost=0.00..1.03 rows=3 width=18) 

          ->  Sort  (cost=1.09..1.10 rows=3 width=4) 

                Sort Key: (range_split(s.period_s)) USING < 

                ->  ProjectSet  (cost=0.00..1.07 rows=3 width=4) 

                      ->  Seq Scan on s  (cost=0.00..1.03 rows=3 
width=14)
(10 rows) 



> That is, we create a new join path within sort_inner_and_outer
> (joinpath.c). First two projection nodes to extract the start- and
> end-timepoints of the range type used as interval, and above an
> append-plan to merge both subplans. In detail, outer_path is just sort,
> whereas inner_path is append of (B, ts) projection with (B, te)
> projection.

We changed this implementation and use a set-returning function called 
"range_split", that extracts the upper and lower bound of a range and 
returns two tuples. For instance, a tuple '[4,10),a' becomes two tuples 
of the form '4,a' and '10,a'.

> Hereby, B is a set of non-temporal attributes used in join equality
> predicates, and [ts,te) forms the valid-time interval. Non-equality
> predicates must be handled separately as a filter step.

The current prototype supports only an integer range-type without any 
additional non-temporal attributes (empty USING clause).

> Do you think, it is a good idea to extend the sort_inner_and_outer()
> with a new branch, where jointype == NORMALIZE and add the projection
> and append sub-paths there?

We actually extended sort_inner_and_outer now. It is an early solution, 
to support discussions. Please see the two sections starting with "if 
(jointype == JOIN_TEMPORAL_NORMALIZE)" inside sort_inner_and_outer:

The purpose of these sections is to change the inner path's range type 
into its single bounds.

We accomplish this with a new function called range_split. We take the 
inner clause and extract the second operator of an RANGE_EQ expression 
out of it. We assume *for this prototype*, that their is only one such 
operator and that it is solely used for NORMALIZE. Then, we replace it 
with range_split. A range split returns a set of tuples, hence we add a 
new "set projection path" above the inner path, and another sort path 
above that.

What we like to discuss now is:
- Is sort_inner_and_outer the correct place to perform this split?
- How could we support OID_RANGE_ELEM_CONTAINED_OP for a NORMALIZE
   mergejoin executor? If we use RANGE_ELEM_CONTAINED as operator, it is
   not an equality operator, but if we use RANGE_EQ it assumes that the
   right-hand-side of the operator must be a range as well.
- Should we better change our mergeclause to a RANGE_ELEM_CONTAINED
   comparison, or keep RANGE_EQ and fix pathkeys later?
- How do we update equivalence classes, pathkeys, and any other struct,
   when changing the inner relation's data type from "int4range" to "int"
   in the query tree inside "sort_inner_and_outer" to get the correct
   ordering and data types


Best regards,
Anton, Johann, Michael, Peter


Attachment

pgsql-hackers by date:

Previous
From: Rajkumar Raghuwanshi
Date:
Subject: cache lookup failed for constraint when alter table referred bypartition table
Next
From: Adrien NAYRAT
Date:
Subject: Standby reads fail when autovacuum take AEL during truncation