Re: BUG #19340: Wrong result from CORR() function - Mailing list pgsql-bugs

From Dean Rasheed
Subject Re: BUG #19340: Wrong result from CORR() function
Date
Msg-id CAEZATCU8rnyKN3z_5-osk3Bn8dtzWf9nKjTr2E16-ExXiESNrQ@mail.gmail.com
Whole thread Raw
In response to Re: BUG #19340: Wrong result from CORR() function  (Tom Lane <tgl@sss.pgh.pa.us>)
Responses Re: BUG #19340: Wrong result from CORR() function
List pgsql-bugs
On Tue, 2 Dec 2025 at 17:22, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>
> It's fundamentally impossible to guarantee exact results with
> floating-point arithmetic, so if you are expecting that you need
> to readjust your expectations.  But having said that, it does
> seem a bit sad that we can't detect constant-input cases exactly.

Yes, indeed. I tested the following query:

SELECT n,
       (SELECT variance(1.3::float8) FROM generate_series(1, n)),
       (SELECT corr(1.3, 1.3) FROM generate_series(1, n))
  FROM generate_series(1, 10) g(n);

In v11 (with the old algorithm) this produces

 n  |       variance       | corr
----+----------------------+------
  1 |                      |
  2 |                    0 |
  3 |                    0 |
  4 |                    0 |
  5 |  3.5527136788005e-16 |    1
  6 |   2.368475785867e-16 |    1
  7 | 3.38353683695286e-16 |    1
  8 |                    0 |
  9 |                    0 |
 10 |                    0 |
(10 rows)

whereas in HEAD (with the Youngs-Cramer algorithm) it produces

 n  |        variance        | corr
----+------------------------+------
  1 |                        |
  2 |                      0 |
  3 |                      0 |
  4 |                      0 |
  5 |                      0 |
  6 |  5.259072701473412e-33 |    1
  7 |  4.382560584561177e-33 |    1
  8 |  3.756480501052437e-33 |    1
  9 | 3.2869204384208825e-33 |    1
 10 |  6.817316464872942e-33 |    1
(10 rows)

so the errors in the variance are smaller, but any non-zero error
makes the correlation completely wrong.

> I wonder whether it'd be worth carrying additional state to
> check that explicitly (instead of assuming that "if (Sxx == 0 ||
> Syy == 0)" will catch it).

I wondered the same thing. It's not nice to have to do that, but
clearly the existing test for constant inputs is no good. The question
is, do we really want to spend extra cycles on every query just to
catch this odd corner case?

Regards,
Dean



pgsql-bugs by date:

Previous
From: Laurenz Albe
Date:
Subject: Re: BUG #19341: REPLACE() fails to match final character when using nondeterministic ICU collation
Next
From: Tom Lane
Date:
Subject: Re: BUG #19340: Wrong result from CORR() function