Re: BUG #19340: Wrong result from CORR() function - Mailing list pgsql-bugs

From Dean Rasheed
Subject Re: BUG #19340: Wrong result from CORR() function
Date
Msg-id CAEZATCUyPKDb5z7yasp5v5HN9dTzPKqqa4u20goJK2fC15LpcQ@mail.gmail.com
Whole thread Raw
In response to Re: BUG #19340: Wrong result from CORR() function  (Tom Lane <tgl@sss.pgh.pa.us>)
List pgsql-bugs
On Wed, 3 Dec 2025 at 22:52, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>
> Poking at this, I soon found a test case where even with the separate
> sqrt() calls we'd produce a result slightly outside [-1, 1] (running
> this test over more values of x is sufficient).  So now I think we
> should do both the separate sqrt and the clamp.
>

I'm starting to have doubts about having 2 sqrt() calls. The problem
is that it seems to produce a noticeable reduction in accuracy in
quite a few cases. This is especially noticeable with fully-correlated
data. For example:

SELECT n, (SELECT corr(x, x) FROM generate_series(1, n) x)
  FROM generate_series(1, 10) g(n);

 n  |        corr
----+--------------------
  1 |
  2 | 0.9999999999999998
  3 | 0.9999999999999998
  4 | 0.9999999999999998
  5 | 0.9999999999999998
  6 |                  1
  7 | 0.9999999999999999
  8 |                  1
  9 | 0.9999999999999999
 10 |                  1
(10 rows)

Now I'm not sure that the current code can be expected to get cases
like this exactly right 100% of the time, but it's pretty close. For
example, if I do this:

WITH t1 AS (
  SELECT n, random() * 1000 AS r FROM generate_series(1, 1000000) n
), t2 AS (
  SELECT corr(r, r) FROM t1 GROUP BY n % 10000
)
SELECT count(*), count(*) FILTER (WHERE corr != 1) FROM t2;

on HEAD it produced corr = 1 every time I ran it, whereas the patch
gives rounding errors roughly 25% of the time, which seems likely to
be noticed.

Perhaps we should only use 2 sqrt()'s if the product Sxx * Syy overflows.

Regards,
Dean



pgsql-bugs by date:

Previous
From: Dean Rasheed
Date:
Subject: Re: BUG #19340: Wrong result from CORR() function
Next
From: Anthonin Bonnefoy
Date:
Subject: Re: Segfault due to NULL ParamExecData value