Re: Specifying attribute slot for storing/reading statistics - Mailing list pgsql-hackers
From | Tomas Vondra |
---|---|
Subject | Re: Specifying attribute slot for storing/reading statistics |
Date | |
Msg-id | 20190910133035.fecpdiynqqvcszpj@development |
In response to | Re: Specifying attribute slot for storing/reading statistics (Esteban Zimanyi <ezimanyi@ulb.ac.be>) |
Responses | Re: Specifying attribute slot for storing/reading statistics |
List | pgsql-hackers |
Hi,

Please don't top-post. If you're not responding to parts of the e-mail,
then don't quote it.

On Fri, Sep 06, 2019 at 12:50:33PM +0200, Esteban Zimanyi wrote:
>Dear Tom
>
>Many thanks for your quick reply. Indeed both solutions you proposed can be
>combined together in order to solve all the problems. However changes in
>the code are needed. Let me now elaborate on the solution concerning the
>combination of stakind/staop first and I will elaborate on adding a new
>kind identifier after.
>
>In order to understand the setting, let me explain a little more about the
>different kinds of temporal types. As explained in my previous email these
>are types whose values are composed of elements v@t where v is a
>PostgreSQL/PostGIS type (float or geometry) and t is a TimestampTz. There
>are four kinds of temporal types, depending on their duration
>* Instant: Values of the form v@t. These are used for example to represent
>car accidents as in Point(0 0)@2000-01-01 08:30
>* InstantSet: A set of values {v1@t1, ..., vn@tn} where the values between
>the points are unknown. These are used for example to represent checkins in
>FourSquare or RFID readings
>* Sequence: A sequence of values [v1@t1, ..., vn@tn] where the values
>between two successive instants vi@ti vj@tj are (linearly) interpolated.
>These are used to represent for example GPS tracks.
>* SequenceSet: A set of sequences {s1, ... , sn} where there is a temporal
>gap between them. These are used to represent for example GPS tracks where
>the signal was lost during a time period.
>

So these are 4 different data types (or classes of data types) that you
introduce in your extension? Or is that just a conceptual view and it's
stored in some other way (e.g. normalized in some way)?

>To compute the selectivity of temporal types we assume that time and space
>dimensions are independent and thus we can reuse all existing analyze and
>selectivity infrastructure in PostgreSQL/PostGIS. For the various durations
>this amounts to
>* Instant: Use the functions in analyze.c and selfuncs.c independently for
>the value and time dimensions
>* InstantSet: Use the functions in array_typanalyze.c, array_selfuncs.c
>independently for the value and time dimensions
>* Sequence and SequenceSet: To simplify, we do not take into account the
>gaps, and thus use the functions in rangetypes_typanalyze.c,
>rangetypes_selfuncs.c independently for the value and time dimensions
>

OK.

>However, this requires that the analyze and selectivity functions in all
>the above files satisfy the following
>* Set the staop when computing statistics. For example in
>rangetypes_typanalyze.c the staop is set for
>STATISTIC_KIND_RANGE_LENGTH_HISTOGRAM but not for
>STATISTIC_KIND_BOUNDS_HISTOGRAM
>* Always call get_attstatsslot with the operator Oid not with InvalidOid.
>For example, from the 17 times this function is called in selfuncs.c only
>two are passed with an operator. This also requires to pass the operator as
>an additional parameter to several functions. For example, the operator
>should be passed to the function ineq_histogram_selectivity in selfuncs.c
>* Export several top-level functions which are currently static. For
>example, var_eq_const, ineq_histogram_selectivity, eqjoinsel_inner and
>several others in the file selfuncs.c should be exported.
>
>That would solve all the problems except for
>STATISTIC_KIND_RANGE_LENGTH_HISTOGRAM, since in this case the staop will
>always be Float8LessOperator, independently of whether we are computing
>lengths of value ranges or of tstzranges. This could be solved by using a
>different stakind for the value and time dimensions.
>

I don't think we're strongly against changing the code to allow this, as
long as it does not break existing extensions/code (unnecessarily).

>If you want I can prepare a PR in order to understand the implications of
>these changes. Please let me know.
>

I think having an actual patch to look at would be helpful.

regards

--
Tomas Vondra                  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
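
The slot-matching issue discussed above can be sketched with a small
standalone C model. This is not PostgreSQL source: StatSlot, find_slot,
NUM_SLOTS and KIND_TIME_LENGTH_HIST are hypothetical names, and the numeric
kind and operator values are illustrative assumptions. The sketch only
assumes that a lookup matches a slot on its stakind and, when an operator is
supplied, on its staop as well, with InvalidOid accepting any operator.

```c
#include <assert.h>
#include <stddef.h>

/* Standalone model, not PostgreSQL code: all names and values are assumptions. */
typedef unsigned int Oid;
#define InvalidOid ((Oid) 0)
#define NUM_SLOTS 5                     /* assumed number of slots per attribute */

enum {
    KIND_RANGE_LENGTH_HIST = 6,         /* illustrative value */
    KIND_BOUNDS_HIST = 7,               /* illustrative value */
    KIND_TIME_LENGTH_HIST = 10001       /* hypothetical extension-defined kind */
};
#define FLOAT8_LT_OP ((Oid) 672)        /* illustrative stand-in for Float8LessOperator */

typedef struct StatSlot {
    int stakind;                        /* kind of statistic stored in the slot */
    Oid staop;                          /* operator the statistic was built with */
} StatSlot;

/*
 * Return the index of the first slot whose kind equals reqkind and whose
 * operator equals reqop, or -1 if none matches. Passing InvalidOid as reqop
 * accepts any operator, so only the kind is compared.
 */
int find_slot(const StatSlot *slots, int reqkind, Oid reqop)
{
    assert(slots != NULL);
    for (int i = 0; i < NUM_SLOTS; i++) {
        if (slots[i].stakind == reqkind &&
            (reqop == InvalidOid || slots[i].staop == reqop))
            return i;
    }
    return -1;
}
```

If two slots both hold (KIND_RANGE_LENGTH_HIST, FLOAT8_LT_OP), one for the
value dimension and one for the time dimension, find_slot always returns the
first and the second is unreachable. Storing the time-dimension histogram
under a separate kind such as KIND_TIME_LENGTH_HIST makes both slots
addressable, which is the effect of the "different stakind" suggestion in the
quoted text.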