Re: Weighted Stats - Mailing list pgsql-hackers

From Haribabu Kommi
Subject Re: Weighted Stats
Msg-id CAJrrPGf1KT-5BN+RPuNXMkCiGXNQs5_RatYqseun-xfF9H+8aQ@mail.gmail.com
In response to Re: Weighted Stats  (David Fetter <david@fetter.org>)
Responses Re: Weighted Stats  (David Fetter <david@fetter.org>)
Re: Weighted Stats  (David Fetter <david@fetter.org>)
List pgsql-hackers
On Mon, Dec 21, 2015 at 1:50 PM, David Fetter <david@fetter.org> wrote:
> On Sun, Dec 20, 2015 at 06:13:33PM -0600, Jim Nasby wrote:
>> On 11/2/15 5:46 PM, David Fetter wrote:
>> >I'd like to add weighted statistics to PostgreSQL
>>
>> Anything happen with this? If community isn't interested, ISTM it'd be good
>> to put this in PGXN.
>
> I think it's already in PGXN as an extension, and I'll get another
> version out this early this week, as it involves mostly adding some
> tests.
>
> I'll do the float8 ones for core this week, too, and unless there's a
> really great reason to do more data types on the first pass, it should
> be in committable shape.

I reviewed the patch; my observations follow.

1. +       precision</type>, <type>numeric</type>, or <type>interval</type>

This fails with the interval type. The interval data type is not supported,
so it should be removed from the list of supported inputs.

postgres=# select weighted_avg(f7,f1) from tbl;
ERROR:  function weighted_avg(interval, smallint) does not exist at character 8
HINT:  No function matches the given name and argument types. You might need to add explicit type casts.


2. +float8_weighted_avg(PG_FUNCTION_ARGS)

It would be helpful to add a function header comment explaining how the
weighted average is calculated, as is done for the other weighted functions.
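
A minimal sketch of the kind of header comment meant here, attached to a standalone function that computes the textbook weighted mean. The name weighted_avg_sketch and its signature are hypothetical and not taken from the patch, which may well compute the result from its running transition state instead.

```c
#include <stddef.h>

/*
 * weighted_avg_sketch (hypothetical name, not from the patch)
 *
 * Returns the weighted mean: sum(w[i] * x[i]) / sum(w[i]).
 *
 * If the total weight is zero the mean is undefined; in that case *ok is
 * set to 0 and 0.0 is returned, leaving the caller to decide what to do.
 */
double
weighted_avg_sketch(const double *x, const double *w, size_t n, int *ok)
{
	double		sum_wx = 0.0;	/* running sum of weight * value */
	double		sum_w = 0.0;	/* running sum of weights */
	size_t		i;

	for (i = 0; i < n; i++)
	{
		sum_wx += w[i] * x[i];
		sum_w += w[i];
	}

	*ok = (sum_w != 0.0);
	return *ok ? sum_wx / sum_w : 0.0;
}
```

For values {1, 2, 4} with weights {1, 1, 2}, the weighted sum is 11 and the total weight is 4.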


3. + transvalues = check_float8_array(transarray, "float8_weighted_stddev_accum", 4);

The second parameter to check_float8_array should be "float8_weighted_accum".


4. OID 4066 conflicts with the latest master code.


5. + A += newvalW * ( newvalX - transvalues[2] ) / W;
+ CHECKFLOATVAL(A, isinf(newvalW) || isinf(newvalX - transvalues[2]) || isinf(1.0/W), true);

+ Q += newvalW * (newvalX - transvalues[2]) * (newvalX - A);
+ CHECKFLOATVAL(A, isinf(newvalX -  transvalues[3]) || isinf(newvalX - A) || isinf(1.0/W), true);


Do the intermediate expressions of the calculation also need to be passed
to CHECKFLOATVAL? Isn't it enough to pass just the variables involved in
the calculation? If the expressions are required, shouldn't the checks look
something like the following?

CHECKFLOATVAL(A, isinf(transvalues[2]) || isinf(newvalW) ||
isinf(newvalX - transvalues[2]) || isinf(1.0/W), true);

CHECKFLOATVAL(Q, isinf(transvalues[3]) || isinf(newvalX -
transvalues[2]) || isinf(newvalX - A) || isinf(1.0/W), true);
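
For context, a sketch of how the second argument is used, assuming CHECKFLOATVAL behaves as it does in PostgreSQL's float code at the time of this review: an infinite result is an overflow error unless the second argument is true, and a zero result is an underflow error unless the third is true. The function check_float_val and its enum are hypothetical stand-ins that return a code instead of raising an error through ereport.

```c
#include <math.h>

/* Hypothetical result codes; the real macro raises an error instead. */
enum check_result
{
	CHECK_OK,
	CHECK_OVERFLOW,
	CHECK_UNDERFLOW
};

/*
 * Stand-in for CHECKFLOATVAL(val, inf_is_valid, zero_is_valid).
 *
 * inf_is_valid should be true when an infinite result is legitimate,
 * typically because one of the inputs was already infinite.
 */
enum check_result
check_float_val(double val, int inf_is_valid, int zero_is_valid)
{
	if (isinf(val) && !inf_is_valid)
		return CHECK_OVERFLOW;
	if (val == 0.0 && !zero_is_valid)
		return CHECK_UNDERFLOW;
	return CHECK_OK;
}
```

Under that reading, the second argument only has to be true whenever an infinite input could have produced an infinite result, which is why the choice of expressions passed to it matters.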


I verified the stddev transition and final function calculations
against Wikipedia, and they are fine.
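
The update steps quoted in item 5 match the weighted incremental algorithm for mean and variance, which is presumably the Wikipedia material referred to. A self-contained sketch of that recurrence follows. The struct, the function names, and the zero-weight guard are assumptions made for illustration; the patch keeps its state in transvalues[] and appears to hold the running mean in transvalues[2].

```c
#include <stddef.h>

/* Hypothetical running state, not the patch's actual layout. */
typedef struct
{
	double		W;		/* sum of weights seen so far */
	double		A;		/* running weighted mean */
	double		Q;		/* running weighted sum of squared deviations */
} WeightedState;

/* Fold one (value, weight) pair into the running state. */
void
weighted_accum(WeightedState *s, double x, double w)
{
	double		old_A = s->A;	/* mean before this row */

	/* Assumption: skip zero weights to avoid 0/0 while W is still zero. */
	if (w == 0.0)
		return;

	s->W += w;
	s->A += w * (x - old_A) / s->W;
	s->Q += w * (x - old_A) * (x - s->A);
}

/* Population weighted variance; only meaningful once W is nonzero. */
double
weighted_var_pop(const WeightedState *s)
{
	return s->Q / s->W;
}
```

Feeding in the pairs (1, 1), (2, 1), (4, 2) from a zeroed state gives A = 2.75 and Q = 6.75, which agrees with the two-pass computation sum(w * (x - mean)^2) over the same data.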

Regards,
Hari Babu
Fujitsu Australia


