Thread: regr_slope returning NULL

regr_slope returning NULL

From
Steve Baldwin
Date:
Hi,

I'm not sure whether or not this is a bug, so I've posted here first (after having posted on Stack Overflow).

Consider the following:

log=# create table sb1(id text, elapsed int, ts numeric);
CREATE TABLE
log=# insert into sb1 values ('317e',86,1552861322.627),('317e',58,1552861324.747),('317e',52,1552861325.722),('317e',58,1552861326.647),('317e',82,1552861327.609),('317e',118,1552861328.514),('317e',58,1552861329.336),('317e',58,1552861330.317),('317e',54,1552861330.935),('3441',68,1552861324.765),('3441',84,1552861326.665),('3441',56,1552861327.627),('3441',50,1552861330.952),('5fe6',42,1552993248.398),('5fe6',44,1552993255.883),('5fe6',44,1553166049.261),('c742',62,1552861322.149),('c742',68,1552861322.455);
INSERT 0 18
log=# select * from sb1 order by id, ts;
  id  | elapsed |       ts       
------+---------+----------------
 317e |      86 | 1552861322.627
 317e |      58 | 1552861324.747
 317e |      52 | 1552861325.722
 317e |      58 | 1552861326.647
 317e |      82 | 1552861327.609
 317e |     118 | 1552861328.514
 317e |      58 | 1552861329.336
 317e |      58 | 1552861330.317
 317e |      54 | 1552861330.935
 3441 |      68 | 1552861324.765
 3441 |      84 | 1552861326.665
 3441 |      56 | 1552861327.627
 3441 |      50 | 1552861330.952
 5fe6 |      42 | 1552993248.398
 5fe6 |      44 | 1552993255.883
 5fe6 |      44 | 1553166049.261
 c742 |      62 | 1552861322.149
 c742 |      68 | 1552861322.455
(18 rows)

log=# select id, regr_slope(elapsed, ts) as trend from sb1 group by id;
  id  |        trend         
------+----------------------
 c742 |                     
 317e |                     
 5fe6 | 5.78750952760444e-06
 3441 |                     
(4 rows)

Interestingly, the same dataset and function in Oracle 11.2 returns the same value for one of the ids and non-null values for the other ids :

SQL> select * from sb1 order by id, ts;

ID            ELAPSED               TS
---------- ---------- ----------------
317e               86   1552861322.627
317e               58   1552861324.747
317e               52   1552861325.722
317e               58   1552861326.647
317e               82   1552861327.609
317e              118   1552861328.514
317e               58   1552861329.336
317e               58   1552861330.317
317e               54   1552861330.935
3441               68   1552861324.765
3441               84   1552861326.665
3441               56   1552861327.627
3441               50   1552861330.952
5fe6               42   1552993248.398
5fe6               44   1552993255.883
5fe6               44   1553166049.261
c742               62   1552861322.149
c742               68   1552861322.455

18 rows selected.

SQL> select id, regr_slope(elapsed, ts) from sb1 group by id;

ID         REGR_SLOPE(ELAPSED,TS)
---------- ----------------------
c742                   19.6078431
5fe6                   5.7875E-06
317e                   -1.0838511
3441                   -3.8283951

If pg is correctly returning NULL, I'd be interested to understand the circumstances under which this can occur.

Thanks,

Steve

Re: regr_slope returning NULL

From
Tom Lane
Date:
Steve Baldwin <steve.baldwin@gmail.com> writes:
> Consider the following:
> ...
> log=# select id, regr_slope(elapsed, ts) as trend from sb1 group by id;
>   id  |        trend
> ------+----------------------
>  c742 |
>  317e |
>  5fe6 | 5.78750952760444e-06
>  3441 |
> (4 rows)

Hm, I get

regression=# select id, regr_slope(elapsed, ts) as trend from sb1 group by id;
  id  |         trend         
------+-----------------------
 c742 |    19.607858781290517
 317e |   -1.0838511987808963
 5fe6 | 5.787509483586743e-06
 3441 |    -3.828395463097356
(4 rows)

What platform are you doing this on, and what exactly is the PG version?

> If pg is correctly returning NULL, I'd be interested to understand the
> circumstances under which this can occur.

The source code shows two cases in which NULL would be returned:

    /* if N is 0 we should return NULL */
    if (N < 1.0)
        PG_RETURN_NULL();

    /* per spec, return NULL for a vertical line */
    if (Sxx == 0)
        PG_RETURN_NULL();

Maybe the cases you're looking at are sufficiently numerically
ill-conditioned that you could get Sxx == 0 depending on platform-
specific roundoff error, but it seems fishy.

            regards, tom lane


Re: regr_slope returning NULL

From
Steve Baldwin
Date:
Thanks Tom,

I've tried this on 11.2 (OS X 10.14.3, installed locally) and 10.6 (AWS RDS) instances with identical results.  The values you show are identical to those returned by Oracle so that's great but why am I seeing different results?

This is from my local install:

log=# select version();
                                                     version                                                      
------------------------------------------------------------------------------------------------------------------
 PostgreSQL 11.2 on x86_64-apple-darwin18.2.0, compiled by Apple LLVM version 10.0.0 (clang-1000.11.45.5), 64-bit

Cheers,

Steve

On Sun, Mar 24, 2019 at 4:34 PM Tom Lane <tgl@sss.pgh.pa.us> wrote:
Steve Baldwin <steve.baldwin@gmail.com> writes:
> Consider the following:
> ...
> log=# select id, regr_slope(elapsed, ts) as trend from sb1 group by id;
>   id  |        trend
> ------+----------------------
>  c742 |
>  317e |
>  5fe6 | 5.78750952760444e-06
>  3441 |
> (4 rows)

Hm, I get

regression=# select id, regr_slope(elapsed, ts) as trend from sb1 group by id;
  id  |         trend         
------+-----------------------
 c742 |    19.607858781290517
 317e |   -1.0838511987808963
 5fe6 | 5.787509483586743e-06
 3441 |    -3.828395463097356
(4 rows)

What platform are you doing this on, and what exactly is the PG version?

> If pg is correctly returning NULL, I'd be interested to understand the
> circumstances under which this can occur.

The source code shows two cases in which NULL would be returned:

        /* if N is 0 we should return NULL */
        if (N < 1.0)
                PG_RETURN_NULL();

        /* per spec, return NULL for a vertical line */
        if (Sxx == 0)
                PG_RETURN_NULL();

Maybe the cases you're looking at are sufficiently numerically
ill-conditioned that you could get Sxx == 0 depending on platform-
specific roundoff error, but it seems fishy.

                        regards, tom lane

Re: regr_slope returning NULL

From
Dean Rasheed
Date:
On Sun, 24 Mar 2019 at 08:01, Steve Baldwin <steve.baldwin@gmail.com> wrote:
>
> Thanks Tom,
>
> I've tried this on 11.2 (OS X 10.14.3, installed locally) and 10.6 (AWS RDS) instances with identical results.  The
valuesyou show are identical to those returned by Oracle so that's great but why am I seeing different results?
 
>

This is caused by the large magnitude of the ts values, which causes a
cancellation error in the Sxx calculation, which is what commit
e954a727f0 fixed in HEAD, and will be available in PG12 [1].

You can see that by including regr_sxx in the results. With PG11, this
gives the following:

select id, regr_slope(elapsed, ts) as trend, regr_sxx(elapsed, ts) as sxx
  from sb1 group by id;

  id  |        trend         |     sxx
------+----------------------+-------------
 c742 |                      |           0
 317e |                      |           0
 5fe6 | 5.78750952760444e-06 | 19905896448
 3441 |                      |           0
(4 rows)

Those zeros for Sxx are the result of calculating the sum of the
squares of ts values and then subtracting off the square of the mean,
which results in a complete loss of accuracy because the intermediate
values are so large they don't differ according to double precision
arithmetic.

A workaround in PG11 is to just offset the ts values by something
close to their mean (offsetting the ts values by a constant amount
shouldn't affect the mathematical result, but does eliminate the
cancellation errors):

select id, regr_slope(elapsed, ts-1552892914) as trend,
       regr_sxx(elapsed, ts-1552892914) as sxx
  from sb1 group by id;

  id  |        trend         |        sxx
------+----------------------+--------------------
 c742 |     19.6077357654714 | 0.0468182563781738
 317e |    -1.08385104429772 |   59.2381523980035
 5fe6 | 5.78750948360697e-06 |   19905896596.7403
 3441 |    -3.82839508895523 |   20.1098628044128
(4 rows)


For PG12 the algorithm for calculating these quantities has been
changed by e954a727f0, so the result should be more accurate
regardless of the offset:

select id, regr_slope(elapsed, ts) as trend, regr_sxx(elapsed, ts) as sxx
  from sb1 group by id;

  id  |        trend         |        sxx
------+----------------------+--------------------
 c742 |     19.6078587812905 | 0.0468179252929986
 317e |     -1.0838511987809 |   59.2381423694815
 5fe6 | 5.78750948358674e-06 |   19905896596.7605
 3441 |    -3.82839546309736 |   20.1098619909822
(4 rows)

select id, regr_slope(elapsed, ts-1552892914) as trend,
       regr_sxx(elapsed, ts-1552892914) as sxx
  from sb1 group by id;

  id  |        trend         |        sxx
------+----------------------+--------------------
 c742 |     19.6078431374563 | 0.0468179999990382
 317e |    -1.08385109620679 |   59.2381495556381
 5fe6 | 5.78750948360693e-06 |   19905896596.7403
 3441 |    -3.82839509931361 |    20.109862749992
(4 rows)

Regards,
Dean

[1] https://github.com/postgres/postgres/commit/e954a727f0


Re: regr_slope returning NULL

From
Steve Baldwin
Date:
Thanks Dean, that's really helpful. Because my x axis values are actually derived from 'extract(epoch from tstz_col)', it is simple for me to subtract an offset.

Cheers,

Steve

On Sun, Mar 24, 2019 at 7:55 PM Dean Rasheed <dean.a.rasheed@gmail.com> wrote:
On Sun, 24 Mar 2019 at 08:01, Steve Baldwin <steve.baldwin@gmail.com> wrote:
>
> Thanks Tom,
>
> I've tried this on 11.2 (OS X 10.14.3, installed locally) and 10.6 (AWS RDS) instances with identical results.  The values you show are identical to those returned by Oracle so that's great but why am I seeing different results?
>

This is caused by the large magnitude of the ts values, which causes a
cancellation error in the Sxx calculation, which is what commit
e954a727f0 fixed in HEAD, and will be available in PG12 [1].

You can see that by including regr_sxx in the results. With PG11, this
gives the following:

select id, regr_slope(elapsed, ts) as trend, regr_sxx(elapsed, ts) as sxx
  from sb1 group by id;

  id  |        trend         |     sxx
------+----------------------+-------------
 c742 |                      |           0
 317e |                      |           0
 5fe6 | 5.78750952760444e-06 | 19905896448
 3441 |                      |           0
(4 rows)

Those zeros for Sxx are the result of calculating the sum of the
squares of ts values and then subtracting off the square of the mean,
which results in a complete loss of accuracy because the intermediate
values are so large they don't differ according to double precision
arithmetic.

A workaround in PG11 is to just offset the ts values by something
close to their mean (offsetting the ts values by a constant amount
shouldn't affect the mathematical result, but does eliminate the
cancellation errors):

select id, regr_slope(elapsed, ts-1552892914) as trend,
       regr_sxx(elapsed, ts-1552892914) as sxx
  from sb1 group by id;

  id  |        trend         |        sxx
------+----------------------+--------------------
 c742 |     19.6077357654714 | 0.0468182563781738
 317e |    -1.08385104429772 |   59.2381523980035
 5fe6 | 5.78750948360697e-06 |   19905896596.7403
 3441 |    -3.82839508895523 |   20.1098628044128
(4 rows)


For PG12 the algorithm for calculating these quantities has been
changed by e954a727f0, so the result should be more accurate
regardless of the offset:

select id, regr_slope(elapsed, ts) as trend, regr_sxx(elapsed, ts) as sxx
  from sb1 group by id;

  id  |        trend         |        sxx
------+----------------------+--------------------
 c742 |     19.6078587812905 | 0.0468179252929986
 317e |     -1.0838511987809 |   59.2381423694815
 5fe6 | 5.78750948358674e-06 |   19905896596.7605
 3441 |    -3.82839546309736 |   20.1098619909822
(4 rows)

select id, regr_slope(elapsed, ts-1552892914) as trend,
       regr_sxx(elapsed, ts-1552892914) as sxx
  from sb1 group by id;

  id  |        trend         |        sxx
------+----------------------+--------------------
 c742 |     19.6078431374563 | 0.0468179999990382
 317e |    -1.08385109620679 |   59.2381495556381
 5fe6 | 5.78750948360693e-06 |   19905896596.7403
 3441 |    -3.82839509931361 |    20.109862749992
(4 rows)

Regards,
Dean

[1] https://github.com/postgres/postgres/commit/e954a727f0