Thread: Do we need multiple forms of the SQL2003 statistics aggregates?

Do we need multiple forms of the SQL2003 statistics aggregates?

From
Tom Lane
Date:
Sergey's recent patch for the SQL2003 binary aggregates implements each
of them three times: in float4, float8, and numeric arithmetic.  This
seems like vast overkill.  The float4 versions certainly ought to go,
as they won't perform noticeably better than float8 and will likely be
subject to terrible roundoff-error problems.  I'm also dubious about the
numeric versions.  The spec says
b) Let DTDVE be the declared type of DVE and let DTIVE be thedeclared type of IVE.
c) Case: 
i) The declared type of REGR_COUNT is an implementation-definedexact numeric type with scale of 0 (zero).
ii) Otherwise, the declared type of the result is animplementation-defined approximate numeric type. If DTDVE is
anapproximatenumeric type, then the precision of the result isnot less than the precision of DTDVE. If DTIVE is an
approximatenumerictype, then the precision of the result is not less thanthe precision of DTIVE.
 

so as far as I can see we are *only* required to provide a float8
implementation.

There is room to argue that the numeric-arithmetic version would be
worth having on the grounds of greater precision or range, but it's a
big chunk of code and the public demand for the functionality has not
exactly been overwhelming.

Comments?
        regards, tom lane


Re: Do we need multiple forms of the SQL2003 statistics aggregates?

From
Tom Lane
Date:
I wrote:
> There is room to argue that the numeric-arithmetic version would be
> worth having on the grounds of greater precision or range, but it's a
> big chunk of code and the public demand for the functionality has not
> exactly been overwhelming.

> Comments?

Since no one's even bothered to respond, I take it there's insufficient
interest in the numeric versions of these aggregates.  I've committed
just the float8 versions.

I added some very trivial regression tests, which we'll have to keep an
eye on to see if they have any portability problems.  We may need to
back off the number of displayed fraction digits to get them to pass
everywhere.

If anyone wants to do better tests, feel free...
        regards, tom lane


Re: Do we need multiple forms of the SQL2003 statistics

From
"Sergey E. Koposov"
Date:
Tom, thank you for the reviewing/correcting/applying my patches...

On Fri, 28 Jul 2006, Tom Lane wrote:

> I wrote:
>> There is room to argue that the numeric-arithmetic version would be
>> worth having on the grounds of greater precision or range, but it's a
>> big chunk of code and the public demand for the functionality has not
>> exactly been overwhelming.
>
>> Comments?
>
> Since no one's even bothered to respond, I take it there's insufficient
> interest in the numeric versions of these aggregates.  I've committed
> just the float8 versions.


My opinion on that is 
1) I agree that really the float4 version are useless, because they don't
offer anything new... comparing to the float8 
2) But I think the numeric versions of these functions are not useless...
(if somebody have numeric columns... ) 
I don't think the 10-15 additional functions in pg_proc is too much. 
Concerning to the amount of code in numeric.c I think it is large, but not
complicated at all, so I don't think that it will be a problem to support
that code.

I think since we are supporting the numeric type as a special 
high-precision type, Postgres must have the high-precision 
versions of all computational functions. Just my opinion.

>
> I added some very trivial regression tests, which we'll have to keep an
> eye on to see if they have any portability problems.  We may need to
> back off the number of displayed fraction digits to get them to pass
> everywhere.
>
> If anyone wants to do better tests, feel free...

I will try to write some better tests and send a patch.

Regards,    Sergey

*******************************************************************
Sergey E. Koposov
Max Planck Institute for Astronomy/Sternberg Astronomical Institute
Tel: +49-6221-528-349
Web: http://lnfm1.sai.msu.ru/~math
E-mail: math@sai.msu.ru


Re: Do we need multiple forms of the SQL2003 statistics

From
Peter Eisentraut
Date:
Sergey E. Koposov wrote:
> I think since we are supporting the numeric type as a special
> high-precision type, Postgres must have the high-precision
> versions of all computational functions. Just my opinion.

Another way to look at it is whether you want to have accurate 
computations (numeric) or approximate computations (float).  I'm not a 
statistician, so I don't know what most of these functions are used 
for.  From a mathematician's point of view, however, some of these 
functions normally produce irrational numbers anyway, so it seems 
unlikely that numeric will be useful.  But looking at the definition 
of, say, regr_avgx(Y, X), if all the input values are integers, it 
might be useful if I could get an exact integer or rational number as 
output, instead of a float, that is.

-- 
Peter Eisentraut
http://developer.postgresql.org/~petere/


Re: Do we need multiple forms of the SQL2003 statistics

From
"Sergey E. Koposov"
Date:
On Sat, 29 Jul 2006, Peter Eisentraut wrote:

> Sergey E. Koposov wrote:
>> I think since we are supporting the numeric type as a special
>> high-precision type, Postgres must have the high-precision
>> versions of all computational functions. Just my opinion.
>
> Another way to look at it is whether you want to have accurate
> computations (numeric) or approximate computations (float).  I'm not a
> statistician, so I don't know what most of these functions are used
> for.  From a mathematician's point of view, however, some of these
> functions normally produce irrational numbers anyway, so it seems
> unlikely that numeric will be useful.  But looking at the definition
> of, say, regr_avgx(Y, X), if all the input values are integers, it
> might be useful if I could get an exact integer or rational number as
> output, instead of a float, that is.

Exactly from the statistical point of view, there is no need to have the 
integer output of those 2-arg. aggregates. For example corr(), regr_*() 
are by definition not integer (they just don't have any sense as 
integers...)( -1<= corr(Y,X)<=1 ) (for example the stddev(int) do 
not produce int also, because it does not have any sense)

So it's perfectly fine that they are producing only floating numbers...

Regards,    Sergey

*******************************************************************
Sergey E. Koposov
Max Planck Institute for Astronomy/Sternberg Astronomical Institute
Tel: +49-6221-528-349
Web: http://lnfm1.sai.msu.ru/~math
E-mail: math@sai.msu.ru


Re: Do we need multiple forms of the SQL2003 statistics

From
Tom Lane
Date:
Peter Eisentraut <peter_e@gmx.net> writes:
> ... From a mathematician's point of view, however, some of these 
> functions normally produce irrational numbers anyway, so it seems 
> unlikely that numeric will be useful.  But looking at the definition 
> of, say, regr_avgx(Y, X), if all the input values are integers, it 
> might be useful if I could get an exact integer or rational number as 
> output, instead of a float, that is.

The question is whether this is useful enough to justify adding a
thousand lines to numeric.c.

I believe also that the numeric forms of the aggregates will be
enormously slower than the float forms, at least on most modern
machines with decent floating-point performance.  I don't have time
to do any performance testing though.
        regards, tom lane


Re: Do we need multiple forms of the SQL2003 statistics

From
"Sergey E. Koposov"
Date:
On Sat, 29 Jul 2006, Tom Lane wrote:

> Peter Eisentraut <peter_e@gmx.net> writes:
>> ... From a mathematician's point of view, however, some of these
>> functions normally produce irrational numbers anyway, so it seems
>> unlikely that numeric will be useful.  But looking at the definition
>> of, say, regr_avgx(Y, X), if all the input values are integers, it
>> might be useful if I could get an exact integer or rational number as
>> output, instead of a float, that is.
>
> The question is whether this is useful enough to justify adding a
> thousand lines to numeric.c.
>
> I believe also that the numeric forms of the aggregates will be
> enormously slower than the float forms, at least on most modern
> machines with decent floating-point performance.  I don't have time
> to do any performance testing though.
>

Based on just simple tests On my laptop (1.8Ghz Centrino) the numeric 
versions are 5-10 times slower (depending on what 2arg function is used)

The example:

template1=# \d  test          Table "public.test" Column |       Type       | Modifiers
--------+------------------+----------- na     | numeric          | nb     | numeric          | da     | double
precision| db     | double precision |
 
template1=# select count(*) from test;  count
--------- 1000000
(1 row)

template1=# select regr_avgx(da,db) from test;    regr_avgx
----------------- 0.5002412120227
(1 row)

Time: 1052,893 ms
template1=# select regr_avgx(na,nb) from test;        regr_avgx
------------------------- 0.500040167263887822939
(1 row)

Time: 4459,739 ms

template1=# select regr_sxx(da,db) from test;     regr_sxx
------------------ 83303.6317359119
(1 row)

Time: 1043,891 ms
template1=# select regr_sxx(na,nb) from test;                     regr_sxx
-------------------------------------------------- 83342.044294954140912267902323775495680113567986
(1 row)

Time: 8514,843 ms
template1=# select corr(da,db) from test;         corr
---------------------- 0.000527588261283456
(1 row)

Time: 1074,948 ms
template1=# select corr(na,nb) from test;                                          corr
----------------------------------------------------------------------------------------

0.000759857150984988517883855238363403977440313567465424735082001422354119457463407737
(1 row)

Time: 18327,376 ms


Yes, that's significantly slower, but I don't think that it make the 
functions unusable...

Regards,    Sergey

*******************************************************************
Sergey E. Koposov
Max Planck Institute for Astronomy/Sternberg Astronomical Institute
Tel: +49-6221-528-349
Web: http://lnfm1.sai.msu.ru/~math
E-mail: math@sai.msu.ru