I wrote:
> I'm coming around to the conclusion that your way is better,
> though. It seems good that "any NaN in the input results in
> NaN output", which your way does and mine doesn't.
Poking further at this, I found that my v2 patch fails that principle
in one case:
regression=# SELECT corr( 0.1 , 'nan' ) FROM generate_series(1,1000) g;
corr
------
(1 row)
We see that Y is constant and therefore return NULL, despite the
other NaN input.
I think we can fix that along these lines:
@@ -3776,8 +3776,12 @@ float8_corr(PG_FUNCTION_ARGS)
if (N < 1.0)
PG_RETURN_NULL();
- /* per spec, return NULL for horizontal and vertical lines */
- if (!isnan(commonX) || !isnan(commonY))
+ /*
+ * per spec, return NULL for horizontal and vertical lines; but not if the
+ * result would otherwise be NaN
+ */
+ if ((!isnan(commonX) || !isnan(commonY)) &&
+ (!isnan(Sxx) && !isnan(Syy)))
PG_RETURN_NULL();
/* at this point, Sxx and Syy cannot be zero or negative */
(don't think it should be necessary to also check Sxy)
BTW, HEAD is inconsistent: it will return NaN for this example, but
only because it's confused by roundoff error into thinking that Y
isn't constant. With few enough inputs, it produces NULL too:
regression=# SELECT corr( 0.1 , 'nan' ) FROM generate_series(1,3) g;
corr
------
(1 row)
regards, tom lane