Thread: Interpolation problem - pg 12.4 - full correct version!

Interpolation problem - pg 12.4 - full correct version!

From
Pól Ua Laoínecháin
Date:
Hi all - here is the entire correct problem - apologies again!

Hi all,

I have an interpolation problem as follows - fiddle available here:

https://dbfiddle.uk/?rdbms=postgres_12&fiddle=428aa76d49b37961088d3dfef190757f


A table:

CREATE TABLE data
(
  s SERIAL PRIMARY KEY,
  t TIMESTAMP,
  lat NUMERIC
);

and data:


INSERT INTO data (t, lat)
VALUES
('2019-01-01 00:00:00', 5.07),
('2019-01-01 01:00:00', 4.60),
('2019-01-01 02:00:00', NULL),
('2019-01-01 03:00:00', NULL),
('2019-01-01 04:00:00', 4.7),
('2019-01-01 05:00:00', 4.20),
('2019-01-01 06:00:00', NULL),
('2019-01-01 07:00:00', 4.98),
('2019-01-01 08:00:00', 4.50),
('2019-01-01 09:00:00', 4.7),
('2019-01-01 10:00:00', NULL),
('2019-01-01 11:00:00', NULL),
('2019-01-01 12:00:00', NULL),
('2019-01-01 13:00:00', 6.45),
('2019-01-01 14:00:00', 3.50);


There are gaps in the data as you can see - I'm trying to fill them
using the algorithm:

 - a sequence of 1 NULL - take the average of the reading above and
the reading below

- a sequence of 2 NULLs - the top assigned value is the average of the
two records above it and the bottom assigned one is the average of the
two records below.

So far, so good - I'm able to do this (but see discussion below)

 - a sequence of 3 NULLs - the middle one is assigned a value equal to
average of the non-NULL record above and the non-null record below,
and then the remaining NULLs above and below the average of the middle
one and the non-NULL ones above and below.

This is where it gets tricky - I'm getting answers, but I don't think
they're correct. The result of the massive  SQL shown below are here
(also on fiddle):

s    lat     final_val
1    5.07         5.07
2    4.60         4.60
3    NULL         4.84
4    NULL         4.45
5     4.7          4.7
6    4.20         4.20
7    NULL         4.59
8    4.98         4.98
9    4.50         4.50
10    4.7          4.7
11   NULL         4.60
12   NULL         5.58
13   NULL         4.98
14   6.45         6.45
15   3.50         3.50

The value for record 12 is correct, ,but not those above and below it.

I think my *MAJOR* problem is that I've developed what is,
essentially, a totally brute force approach - and this simply won't
work at the scenario becomes more complex - take a look at the CASE
statement - it's horrible and would only become exponentially worse as
the number NULLs rises.

So, my question is: Is there a recognised technique (using SQL only,
not PL/pgSQL - soutions based on the latter are easy to find) whereby
I can do a basic Linear Interpolation?

Should you require any further information, please don't hesitate to contact me.

TIA and rgs,


Pól...


=========================================================

My mega SQL:

WITH cte1 AS
(
  SELECT d1.s,
    d1.t AS t1, d1.lat AS l1,
    LAG(d1.lat, 2)  OVER (ORDER BY t ASC) AS lag_t1_2,
    LAG(d1.lat, 1)  OVER (ORDER BY t ASC) AS lag_t1,
    LEAD(d1.lat, 1) OVER (ORDER BY t ASC) AS lead_t1,
    LEAD(d1.lat, 2) OVER (ORDER BY t ASC) AS lead_t1_2
  FROM data d1
),
cte2 AS
(
  SELECT
    d2.t AS t2, d2.lat AS l2,
    LAG(d2.lat, 1) OVER(ORDER BY t DESC) AS lag_t2,
    LEAD(d2.lat, 1) OVER(ORDER BY t DESC) AS lead_t2
  FROM data d2
),
cte3 AS
(
  SELECT t1.s,
    t1.t1,  t1.lag_t1_2, t1.lag_t1, t2.lag_t2, t1.l1, t2.l2,
    t1.lead_t1, t2.lead_t2, t1.lead_t1_2
  FROM cte1 t1
  JOIN cte2 t2
  ON t1.t1 = t2.t2
),
cte4 AS
(
  SELECT t1.s,
  t1.l1 AS lat,
    CASE

      -- The WHEN for the middle of 3 NULLs has to be at the beginning
      -- of the CASE - if at the end, it remains NULL - why?

      WHEN (t1.lag_t1 IS NULL) AND (t1.lag_t2 IS NULL) AND (t1.l1 IS NULL)
        AND (t1.lead_t1 IS NULL) AND (t1.lead_t2 IS NULL)
      THEN ROUND((t1.lag_t1_2 + t1.lead_t1_2)/2, 2)

      WHEN (t1.l1 IS NOT NULL) THEN t1.l1
      WHEN (t1.l1 IS NULL) AND (t1.l2) IS NULL AND (t1.lag_t1 IS NOT NULL)
        AND (t1.lag_t2 IS NOT NULL) THEN ROUND((t1.lag_t1 + t1.lag_t2)/2, 2)
      WHEN (t1.lag_t2 IS NULL) AND (t1.l1 IS NULL) AND (t1.l2 IS NULL)
        AND (t1.lead_t1 IS NULL) THEN ROUND((t1.lag_t1 + t1.lag_t1_2)/2, 2)
      WHEN (t1.l1 IS NULL) AND (t1.l2 IS NULL)  AND (t1.lag_t1 IS NULL)
        AND (t1.lead_t2 IS NULL) THEN ROUND((t1.lead_t1 + t1.lead_t1_2)/2, 2)
      ELSE 0
    END AS final_val
  FROM cte3 t1
)
SELECT s, lat, final_val FROM cte4;



Re: Interpolation problem - pg 12.4 - full correct version!

From
"David G. Johnston"
Date:
On Wed, Aug 19, 2020 at 11:51 AM Pól Ua Laoínecháin <linehanp@tcd.ie> wrote:

I think my *MAJOR* problem is that I've developed what is,
essentially, a totally brute force approach - and this simply won't
work at the scenario becomes more complex - take a look at the CASE
statement - it's horrible and would only become exponentially worse as
the number NULLs rises.

So, my question is: Is there a recognised technique (using SQL only,
not PL/pgSQL - soutions based on the latter are easy to find) whereby
I can do a basic Linear Interpolation?


I don't have a recognized technique, nor care to ponder one right now, but what you've described would best be done in pure SQL using WITH RECURSIVE, which provides an iterative approach to SQL result building.  Which is more commonly done in a procedural language.  The algorithm you describe is an iterative algorithm and so I'm wondering why place the arbitrary restriction on using pure SQL when it likely wouldn't provide a very readable nor performant solution relative to a procedural (pl/pgsql or otherwise) one?

David J.

Re: Interpolation problem - pg 12.4 - full correct version!

From
Joe Conway
Date:
On 8/19/20 3:08 PM, David G. Johnston wrote:
> On Wed, Aug 19, 2020 at 11:51 AM Pól Ua Laoínecháin <linehanp@tcd.ie
> <mailto:linehanp@tcd.ie>> wrote:
>
>
>     I think my *MAJOR* problem is that I've developed what is,
>     essentially, a totally brute force approach - and this simply won't
>     work at the scenario becomes more complex - take a look at the CASE
>     statement - it's horrible and would only become exponentially worse as
>     the number NULLs rises.
>
>     So, my question is: Is there a recognised technique (using SQL only,
>     not PL/pgSQL - soutions based on the latter are easy to find) whereby
>     I can do a basic Linear Interpolation?
>
> I don't have a recognized technique, nor care to ponder one right now, but what
> you've described would best be done in pure SQL using WITH RECURSIVE, which
> provides an iterative approach to SQL result building.  Which is more commonly
> done in a procedural language.  The algorithm you describe is an iterative
> algorithm and so I'm wondering why place the arbitrary restriction on using pure
> SQL when it likely wouldn't provide a very readable nor performant solution
> relative to a procedural (pl/pgsql or otherwise) one?


Might want to consider using PL/R with something like the R imputeTS package:

https://cran.r-project.org/web/packages/imputeTS/readme/README.html

HTH,

Joe
--
Crunchy Data - http://crunchydata.com
PostgreSQL Support for Secure Enterprises
Consulting, Training, & Open Source Development


Attachment

Re: Interpolation problem - pg 12.4 - full correct version!

From
Pól Ua Laoínecháin
Date:
Hi,

>> So, my question is: Is there a recognised technique (using SQL only,
>> not PL/pgSQL - soutions based on the latter are easy to find) whereby
>> I can do a basic Linear Interpolation?

> I don't have a recognized technique, nor care to ponder one right now, but what you've described would best be done
inpure SQL using WITH RECURSIVE, which provides an iterative approach to SQL result building.  Which is more commonly
donein a procedural language.  The algorithm you describe is an iterative algorithm and so I'm wondering why place the
arbitraryrestriction on using pure SQL when it likely wouldn't provide a very readable nor performant solution relative
toa procedural (pl/pgsql or otherwise) one? 

Well, it would be trivial to do something with PL/pgSQL (just Google
and download the code - there are a few examples around).

I''m a big fan of this site https://explainextended.com/ - this guy
can do anything with SQL - I'd like to try and learn how to do the
same.

You're idea of RECURSIVE is exactly along the lines I was thinking of
- I'll persevere and see what I can come up with using WITH RECURSIVE
- I've use it before.

Thanks for your input and rgs,


Pól...


> David J.



Re: Interpolation problem - pg 12.4 - full correct version!

From
Pól Ua Laoínecháin
Date:
Hi,

> Might want to consider using PL/R with something like the R imputeTS package:
> https://cran.r-project.org/web/packages/imputeTS/readme/README.html

Thanks for your input - looks interesting, but see my reply to David Johnston.

Rgs,

Pól

> Joe