Re: [HACKERS] [PROPOSAL] Temporal query processing with range types - Mailing list pgsql-hackers
From | Peter Moser |
---|---|
Subject | Re: [HACKERS] [PROPOSAL] Temporal query processing with range types |
Date | |
Msg-id | 87682356-8fdf-217c-bcfb-4d4d91aa6e8c@gmail.com Whole thread Raw |
In response to | Re: [HACKERS] [PROPOSAL] Temporal query processing with range types (Peter Moser <pitiz29a@gmail.com>) |
Responses |
Re: [HACKERS] [PROPOSAL] Temporal query processing with range types
|
List | pgsql-hackers |
On 01/26/2018 07:55 AM, Peter Moser wrote: > We have now a new approach to plan and execute NORMALIZE as a special > join node of type NORMALIZE, an append-plan on the inner join path, > and a merge-join executor. For the latter, we would need to > extend nodeMergejoin.c with an point-in-range-containment join. We are ready with a new prototype for the temporal NORMALIZE operation. In this prototype we do not rewrite queries as in the previous patch, but have one executor node, that solves the normalize operation. This executor is based on a merge-join. Our new patch is based on top of 75f7855369ec56d4a8e7d6eae98aff1182b85cac from September 6, 2018. The syntax is SELECT * FROM (r NORMALIZE s USING() WITH(period_r, period_s)) c; It currently is only implemented for empty USING clauses, and solely int4range as range attributes. Example: A=# table r; a | b | period_r ---+---+---------- a | B | [1,7) b | B | [3,9) c | G | [8,10) (3 rows) A=# table s; c | d | period_s ---+---+---------- 1 | B | [2,5) 2 | B | [3,4) 3 | B | [7,9) (3 rows) A=# SELECT * FROM (r NORMALIZE s USING() WITH(period_r, period_s)) c; period_r | a | b ----------+---+--- [1,2) | a | B [2,3) | a | B [3,4) | a | B [4,5) | a | B [5,7) | a | B [3,4) | b | B [4,5) | b | B [5,7) | b | B [7,9) | b | B (9 rows) A=# EXPLAIN SELECT * FROM (r NORMALIZE s USING() WITH(period_r, period_s)) c; QUERY PLAN -------------------------------------------------------------------------- Result (cost=2.15..2.22 rows=3 width=18) -> Merge ??? Join (cost=2.15..2.23 rows=3 width=22) Merge Cond: (r.period_r @> (range_split(s.period_s))) -> Sort (cost=1.05..1.06 rows=3 width=18) Sort Key: r.period_r -> Seq Scan on r (cost=0.00..1.03 rows=3 width=18) -> Sort (cost=1.09..1.10 rows=3 width=4) Sort Key: (range_split(s.period_s)) USING < -> ProjectSet (cost=0.00..1.07 rows=3 width=4) -> Seq Scan on s (cost=0.00..1.03 rows=3 width=14) (10 rows) > That is, we create a new join path within sort_inner_and_outer > (joinpath.c). First two projection nodes to extract the start- and > end-timepoints of the range type used as interval, and above an > append-plan to merge both subplans. In detail, outer_path is just sort, > whereas inner_path is append of (B, ts) projection with (B, te) > projection. We changed this implementation and use a set-returning function called "range_split", that extracts the upper and lower bound of a range and returns two tuples. For instance, a tuple '[4,10),a' becomes two tuples of the form '4,a' and '10,a'. > Hereby, B is a set of non-temporal attributes used in join equality > predicates, and [ts,te) forms the valid-time interval. Non-equality > predicates must be handled separately as a filter step. The current prototype supports only an integer range-type without any additional non-temporal attributes (empty USING clause). > Do you think, it is a good idea to extend the sort_inner_and_outer() > with a new branch, where jointype == NORMALIZE and add the projection > and append sub-paths there? We actually extended sort_inner_and_outer now. It is an early solution, to support discussions. Please see the two sections starting with "if (jointype == JOIN_TEMPORAL_NORMALIZE)" inside sort_inner_and_outer: The purpose of these sections is to change the inner path's range type into its single bounds. We accomplish this with a new function called range_split. We take the inner clause and extract the second operator of an RANGE_EQ expression out of it. We assume *for this prototype*, that their is only one such operator and that it is solely used for NORMALIZE. Then, we replace it with range_split. A range split returns a set of tuples, hence we add a new "set projection path" above the inner path, and another sort path above that. What we like to discuss now is: - Is sort_inner_and_outer the correct place to perform this split? - How could we support OID_RANGE_ELEM_CONTAINED_OP for a NORMALIZE mergejoin executor? If we use RANGE_ELEM_CONTAINED as operator, it is not an equality operator, but if we use RANGE_EQ it assumes that the right-hand-side of the operator must be a range as well. - Should we better change our mergeclause to a RANGE_ELEM_CONTAINED comparison, or keep RANGE_EQ and fix pathkeys later? - How do we update equivalence classes, pathkeys, and any other struct, when changing the inner relation's data type from "int4range" to "int" in the query tree inside "sort_inner_and_outer" to get the correct ordering and data types Best regards, Anton, Johann, Michael, Peter
Attachment
pgsql-hackers by date: