Re: BUG #19340: Wrong result from CORR() function - Mailing list pgsql-bugs

From Tom Lane
Subject Re: BUG #19340: Wrong result from CORR() function
Date
Msg-id 545890.1764725276@sss.pgh.pa.us
Whole thread Raw
In response to Re: BUG #19340: Wrong result from CORR() function  (Tom Lane <tgl@sss.pgh.pa.us>)
List pgsql-bugs
I wrote:
> I'm coming around to the conclusion that your way is better,
> though.  It seems good that "any NaN in the input results in
> NaN output", which your way does and mine doesn't.

Poking further at this, I found that my v2 patch fails that principle
in one case:

regression=# SELECT corr( 0.1 , 'nan' ) FROM generate_series(1,1000) g;
 corr 
------
     
(1 row)

We see that Y is constant and therefore return NULL, despite the
other NaN input.

I think we can fix that along these lines:

@@ -3776,8 +3776,12 @@ float8_corr(PG_FUNCTION_ARGS)
     if (N < 1.0)
         PG_RETURN_NULL();
 
-    /* per spec, return NULL for horizontal and vertical lines */
-    if (!isnan(commonX) || !isnan(commonY))
+    /*
+     * per spec, return NULL for horizontal and vertical lines; but not if the
+     * result would otherwise be NaN
+     */
+    if ((!isnan(commonX) || !isnan(commonY)) &&
+        (!isnan(Sxx) && !isnan(Syy)))
         PG_RETURN_NULL();
 
     /* at this point, Sxx and Syy cannot be zero or negative */

(don't think it should be necessary to also check Sxy)


BTW, HEAD is inconsistent: it will return NaN for this example, but
only because it's confused by roundoff error into thinking that Y
isn't constant.  With few enough inputs, it produces NULL too:

regression=# SELECT corr( 0.1 , 'nan' ) FROM generate_series(1,3) g;
 corr 
------
     
(1 row)

            regards, tom lane



pgsql-bugs by date:

Previous
From: Tom Lane
Date:
Subject: Re: BUG #19340: Wrong result from CORR() function