Re: regr_slope returning NULL - Mailing list pgsql-general

From Dean Rasheed
Subject Re: regr_slope returning NULL
Date
Msg-id CAEZATCUMo3vwEgk6oa_4SqDXketLbFqhvh7_XD1p5+=0JeCFFw@mail.gmail.com
Whole thread Raw
In response to Re: regr_slope returning NULL  (Steve Baldwin <steve.baldwin@gmail.com>)
Responses Re: regr_slope returning NULL
List pgsql-general
On Sun, 24 Mar 2019 at 08:01, Steve Baldwin <steve.baldwin@gmail.com> wrote:
>
> Thanks Tom,
>
> I've tried this on 11.2 (OS X 10.14.3, installed locally) and 10.6 (AWS RDS) instances with identical results.  The
valuesyou show are identical to those returned by Oracle so that's great but why am I seeing different results?
 
>

This is caused by the large magnitude of the ts values, which causes a
cancellation error in the Sxx calculation, which is what commit
e954a727f0 fixed in HEAD, and will be available in PG12 [1].

You can see that by including regr_sxx in the results. With PG11, this
gives the following:

select id, regr_slope(elapsed, ts) as trend, regr_sxx(elapsed, ts) as sxx
  from sb1 group by id;

  id  |        trend         |     sxx
------+----------------------+-------------
 c742 |                      |           0
 317e |                      |           0
 5fe6 | 5.78750952760444e-06 | 19905896448
 3441 |                      |           0
(4 rows)

Those zeros for Sxx are the result of calculating the sum of the
squares of ts values and then subtracting off the square of the mean,
which results in a complete loss of accuracy because the intermediate
values are so large they don't differ according to double precision
arithmetic.

A workaround in PG11 is to just offset the ts values by something
close to their mean (offsetting the ts values by a constant amount
shouldn't affect the mathematical result, but does eliminate the
cancellation errors):

select id, regr_slope(elapsed, ts-1552892914) as trend,
       regr_sxx(elapsed, ts-1552892914) as sxx
  from sb1 group by id;

  id  |        trend         |        sxx
------+----------------------+--------------------
 c742 |     19.6077357654714 | 0.0468182563781738
 317e |    -1.08385104429772 |   59.2381523980035
 5fe6 | 5.78750948360697e-06 |   19905896596.7403
 3441 |    -3.82839508895523 |   20.1098628044128
(4 rows)


For PG12 the algorithm for calculating these quantities has been
changed by e954a727f0, so the result should be more accurate
regardless of the offset:

select id, regr_slope(elapsed, ts) as trend, regr_sxx(elapsed, ts) as sxx
  from sb1 group by id;

  id  |        trend         |        sxx
------+----------------------+--------------------
 c742 |     19.6078587812905 | 0.0468179252929986
 317e |     -1.0838511987809 |   59.2381423694815
 5fe6 | 5.78750948358674e-06 |   19905896596.7605
 3441 |    -3.82839546309736 |   20.1098619909822
(4 rows)

select id, regr_slope(elapsed, ts-1552892914) as trend,
       regr_sxx(elapsed, ts-1552892914) as sxx
  from sb1 group by id;

  id  |        trend         |        sxx
------+----------------------+--------------------
 c742 |     19.6078431374563 | 0.0468179999990382
 317e |    -1.08385109620679 |   59.2381495556381
 5fe6 | 5.78750948360693e-06 |   19905896596.7403
 3441 |    -3.82839509931361 |    20.109862749992
(4 rows)

Regards,
Dean

[1] https://github.com/postgres/postgres/commit/e954a727f0


pgsql-general by date:

Previous
From: Ron
Date:
Subject: Re: When to store data that could be derived
Next
From: Chris Travers
Date:
Subject: Re: When to store data that could be derived