Re: Specifying attribute slot for storing/reading statistics - Mailing list pgsql-hackers

From Tomas Vondra
Subject Re: Specifying attribute slot for storing/reading statistics
Msg-id 20190910133035.fecpdiynqqvcszpj@development
In response to Re: Specifying attribute slot for storing/reading statistics  (Esteban Zimanyi <ezimanyi@ulb.ac.be>)
Responses Re: Specifying attribute slot for storing/reading statistics
List pgsql-hackers
Hi,

Please don't top-post. If you're not responding to parts of the e-mail,
then don't quote it.

On Fri, Sep 06, 2019 at 12:50:33PM +0200, Esteban Zimanyi wrote:
>Dear Tom
>
>Many thanks for your quick reply. Indeed both solutions you proposed can be
>combined together in order to solve all the problems. However changes in
>the code are needed. Let me now elaborate on the solution concerning the
>combination of stakind/staop first and I will elaborate on adding a new
>kind identifier after.
>
>In order to understand the setting, let me explain a little more about the
>different kinds of temporal types. As explained in my previous email these
>are types whose values are composed of elements v@t where v is a
>PostgreSQL/PostGIS type (float or geometry) and t is a TimestampTz. There
>are four kinds of temporal types, depending on their duration:
>* Instant: Values of the form v@t. These are used for example to represent
>car accidents as in Point(0 0)@2000-01-01 08:30
>* InstantSet: A set of values {v1@t1, ...., vn@tn} where the values between
>the points are unknown. These are used for example to represent checkins in
>FourSquare or RFID readings
>* Sequence: A sequence of values [v1@t1, ...., vn@tn] where the values
>between two successive instants vi@ti vj@tj are (linearly) interpolated.
>These are used to represent for example GPS tracks.
>* SequenceSet: A set of sequences {s1, ... , sn} where there is a temporal
>gap between them. These are used to represent for example GPS tracks where
>the signal was lost during a time period.
>

So these are 4 different data types (or classes of data types) that you
introduce in your extension? Or is that just a conceptual view and it's
stored in some other way (e.g. normalized in some way)?
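
To make the difference between the kinds concrete, here is a minimal
sketch of how a lookup at time t would behave for an InstantSet versus
a Sequence, going only by the description quoted above. It is plain
Python, not code from the extension: the function names are made up,
values are floats, and times are floats standing in for TimestampTz.
It says nothing about how the values are actually stored.

```python
from bisect import bisect_left

# Each temporal value is modelled as a list of (value, time) pairs sorted by time.
# Assumption: times are plain floats here; the real types use TimestampTz.


def instantset_value_at(instants, t):
    """InstantSet: the value is known only at the listed instants."""
    for v, ti in instants:
        if ti == t:
            return v
    return None  # unknown between instants


def sequence_value_at(instants, t):
    """Sequence: linearly interpolate between two successive instants."""
    if not instants:
        return None
    times = [ti for _, ti in instants]
    if t < times[0] or t > times[-1]:
        return None  # outside the period covered by the sequence
    i = bisect_left(times, t)
    if times[i] == t:
        return instants[i][0]
    (v0, t0), (v1, t1) = instants[i - 1], instants[i]
    return v0 + (v1 - v0) * (t - t0) / (t1 - t0)


readings = [(10.0, 0.0), (20.0, 10.0)]
```

With these readings, a lookup at t = 5.0 is unknown for the InstantSet
but interpolated for the Sequence. A SequenceSet would be a list of
such sequences with gaps between them.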

>To compute the selectivity of temporal types we assume that time and space
>dimensions are independent and thus we can reuse all existing analyze and
>selectivity infrastructure in PostgreSQL/PostGIS. For the various durations
>this amounts to
>* Instant: Use the functions in analyze.c and selfuncs.c independently for
>the value and time dimensions
>* InstantSet: Use the functions in array_typanalyze.c, array_selfuncs.c
>independently for the value and time dimensions
>* Sequence and SequenceSet: To simplify, we do not take into account the
>gaps, and thus use the functions in rangetypes_typanalyze.c,
>rangetypes_selfuncs.c independently for the value and time dimensions
>

OK.
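
For readers of the archive, a small sketch of what the independence
assumption buys. The quoted mail does not spell out how the two
per-dimension estimates are combined; the sketch assumes they are
multiplied, which is the usual way to combine independent clauses.
The function name is hypothetical.

```python
def combined_selectivity(value_sel: float, time_sel: float) -> float:
    """Selectivity of (value predicate AND time predicate), assuming the
    value and time dimensions are statistically independent."""
    for sel in (value_sel, time_sel):
        if not 0.0 <= sel <= 1.0:
            raise ValueError("selectivity must be in [0, 1]")
    return value_sel * time_sel


# Example: the value predicate keeps 20% of the rows, the time predicate 50%.
estimate = combined_selectivity(0.2, 0.5)
```

Each factor can then come from the existing per-type estimators,
which is why the statistics for the two dimensions have to be stored
and looked up separately.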

>However, this requires that the analyze and selectivity functions in all
>the above files satisfy the following
>* Set the staop when computing statistics. For example in
>rangetypes_typanalyze.c the staop is set for
>STATISTIC_KIND_RANGE_LENGTH_HISTOGRAM but not for
>STATISTIC_KIND_BOUNDS_HISTOGRAM
>* Always call get_attstatsslot with the operator Oid, not with InvalidOid.
>For example, of the 17 calls to this function in selfuncs.c, only two
>pass an operator. This also requires passing the operator as an
>additional parameter to several functions. For example, the operator
>should be passed to the function ineq_histogram_selectivity in selfuncs.c
>* Export several top-level functions which are currently static. For
>example, var_eq_const, ineq_histogram_selectivity, eqjoinsel_inner and
>several others in the file selfuncs.c should be exported.
>
>That would solve all the problems except for
>STATISTIC_KIND_RANGE_LENGTH_HISTOGRAM, since in this case the staop will
>always be Float8LessOperator, independently of whether we are computing
>lengths of value ranges or of tstzranges. This could be solved by using a
>different stakind for the value and time dimensions.
>

I don't think we're strongly against changing the code to allow this, as 
long as it does not break existing extensions/code (unnecessarily).
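
To illustrate the matching rule being discussed, here is a simplified
Python model of the slot lookup. It is not PostgreSQL code: the Slot
class, find_slot and the operator OIDs are invented for the sketch.
What it mirrors is the rule that a slot matches when its stakind
equals the requested kind and the requested operator is either
InvalidOid or equal to the slot's staop. It also assumes staop has
been set for the bounds histograms, as proposed above.

```python
from dataclasses import dataclass

INVALID_OID = 0  # mirrors InvalidOid

# Kind numbers as defined in pg_statistic.h.
STATISTIC_KIND_RANGE_LENGTH_HISTOGRAM = 6
STATISTIC_KIND_BOUNDS_HISTOGRAM = 7

# Hypothetical operator OIDs, invented for this sketch only.
VALUE_LT_OP = 9001   # "<" on the value dimension
TIME_LT_OP = 9002    # "<" on the time dimension
FLOAT8_LT_OP = 9003  # stands in for Float8LessOperator


@dataclass(frozen=True)
class Slot:
    stakind: int
    staop: int
    label: str


def find_slot(slots, reqkind, reqop=INVALID_OID):
    """Return the first slot of the requested kind; if reqop is given,
    the slot's staop must match it as well."""
    for slot in slots:
        if slot.stakind == reqkind and (reqop == INVALID_OID or slot.staop == reqop):
            return slot
    return None


slots = [
    Slot(STATISTIC_KIND_BOUNDS_HISTOGRAM, VALUE_LT_OP, "value bounds"),
    Slot(STATISTIC_KIND_BOUNDS_HISTOGRAM, TIME_LT_OP, "time bounds"),
    Slot(STATISTIC_KIND_RANGE_LENGTH_HISTOGRAM, FLOAT8_LT_OP, "value lengths"),
    Slot(STATISTIC_KIND_RANGE_LENGTH_HISTOGRAM, FLOAT8_LT_OP, "time lengths"),
]
```

Looking up the bounds histogram without an operator always returns
the "value bounds" slot, while passing TIME_LT_OP reaches the
"time bounds" slot. For the length histograms both slots carry the
same staop, so the "time lengths" slot can never be selected this
way, which is the case a separate stakind would have to cover.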

>If you want I can prepare a PR in order to understand the implications of
>these changes. Please let me know.
>

I think having an actual patch to look at would be helpful.


regards

-- 
Tomas Vondra                  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



