Thread: Interpolation problem - pg 12.4 - full correct version!
Hi all - here is the entire correct problem - apologies again! Hi all, I have an interpolation problem as follows - fiddle available here: https://dbfiddle.uk/?rdbms=postgres_12&fiddle=428aa76d49b37961088d3dfef190757f A table: CREATE TABLE data ( s SERIAL PRIMARY KEY, t TIMESTAMP, lat NUMERIC ); and data: INSERT INTO data (t, lat) VALUES ('2019-01-01 00:00:00', 5.07), ('2019-01-01 01:00:00', 4.60), ('2019-01-01 02:00:00', NULL), ('2019-01-01 03:00:00', NULL), ('2019-01-01 04:00:00', 4.7), ('2019-01-01 05:00:00', 4.20), ('2019-01-01 06:00:00', NULL), ('2019-01-01 07:00:00', 4.98), ('2019-01-01 08:00:00', 4.50), ('2019-01-01 09:00:00', 4.7), ('2019-01-01 10:00:00', NULL), ('2019-01-01 11:00:00', NULL), ('2019-01-01 12:00:00', NULL), ('2019-01-01 13:00:00', 6.45), ('2019-01-01 14:00:00', 3.50); There are gaps in the data as you can see - I'm trying to fill them using the algorithm: - a sequence of 1 NULL - take the average of the reading above and the reading below - a sequence of 2 NULLs - the top assigned value is the average of the two records above it and the bottom assigned one is the average of the two records below. So far, so good - I'm able to do this (but see discussion below) - a sequence of 3 NULLs - the middle one is assigned a value equal to average of the non-NULL record above and the non-null record below, and then the remaining NULLs above and below the average of the middle one and the non-NULL ones above and below. This is where it gets tricky - I'm getting answers, but I don't think they're correct. The result of the massive SQL shown below are here (also on fiddle): s lat final_val 1 5.07 5.07 2 4.60 4.60 3 NULL 4.84 4 NULL 4.45 5 4.7 4.7 6 4.20 4.20 7 NULL 4.59 8 4.98 4.98 9 4.50 4.50 10 4.7 4.7 11 NULL 4.60 12 NULL 5.58 13 NULL 4.98 14 6.45 6.45 15 3.50 3.50 The value for record 12 is correct, ,but not those above and below it. I think my *MAJOR* problem is that I've developed what is, essentially, a totally brute force approach - and this simply won't work at the scenario becomes more complex - take a look at the CASE statement - it's horrible and would only become exponentially worse as the number NULLs rises. So, my question is: Is there a recognised technique (using SQL only, not PL/pgSQL - soutions based on the latter are easy to find) whereby I can do a basic Linear Interpolation? Should you require any further information, please don't hesitate to contact me. TIA and rgs, Pól... ========================================================= My mega SQL: WITH cte1 AS ( SELECT d1.s, d1.t AS t1, d1.lat AS l1, LAG(d1.lat, 2) OVER (ORDER BY t ASC) AS lag_t1_2, LAG(d1.lat, 1) OVER (ORDER BY t ASC) AS lag_t1, LEAD(d1.lat, 1) OVER (ORDER BY t ASC) AS lead_t1, LEAD(d1.lat, 2) OVER (ORDER BY t ASC) AS lead_t1_2 FROM data d1 ), cte2 AS ( SELECT d2.t AS t2, d2.lat AS l2, LAG(d2.lat, 1) OVER(ORDER BY t DESC) AS lag_t2, LEAD(d2.lat, 1) OVER(ORDER BY t DESC) AS lead_t2 FROM data d2 ), cte3 AS ( SELECT t1.s, t1.t1, t1.lag_t1_2, t1.lag_t1, t2.lag_t2, t1.l1, t2.l2, t1.lead_t1, t2.lead_t2, t1.lead_t1_2 FROM cte1 t1 JOIN cte2 t2 ON t1.t1 = t2.t2 ), cte4 AS ( SELECT t1.s, t1.l1 AS lat, CASE -- The WHEN for the middle of 3 NULLs has to be at the beginning -- of the CASE - if at the end, it remains NULL - why? WHEN (t1.lag_t1 IS NULL) AND (t1.lag_t2 IS NULL) AND (t1.l1 IS NULL) AND (t1.lead_t1 IS NULL) AND (t1.lead_t2 IS NULL) THEN ROUND((t1.lag_t1_2 + t1.lead_t1_2)/2, 2) WHEN (t1.l1 IS NOT NULL) THEN t1.l1 WHEN (t1.l1 IS NULL) AND (t1.l2) IS NULL AND (t1.lag_t1 IS NOT NULL) AND (t1.lag_t2 IS NOT NULL) THEN ROUND((t1.lag_t1 + t1.lag_t2)/2, 2) WHEN (t1.lag_t2 IS NULL) AND (t1.l1 IS NULL) AND (t1.l2 IS NULL) AND (t1.lead_t1 IS NULL) THEN ROUND((t1.lag_t1 + t1.lag_t1_2)/2, 2) WHEN (t1.l1 IS NULL) AND (t1.l2 IS NULL) AND (t1.lag_t1 IS NULL) AND (t1.lead_t2 IS NULL) THEN ROUND((t1.lead_t1 + t1.lead_t1_2)/2, 2) ELSE 0 END AS final_val FROM cte3 t1 ) SELECT s, lat, final_val FROM cte4;
On Wed, Aug 19, 2020 at 11:51 AM Pól Ua Laoínecháin <linehanp@tcd.ie> wrote:
I think my *MAJOR* problem is that I've developed what is,
essentially, a totally brute force approach - and this simply won't
work at the scenario becomes more complex - take a look at the CASE
statement - it's horrible and would only become exponentially worse as
the number NULLs rises.
So, my question is: Is there a recognised technique (using SQL only,
not PL/pgSQL - soutions based on the latter are easy to find) whereby
I can do a basic Linear Interpolation?
I don't have a recognized technique, nor care to ponder one right now, but what you've described would best be done in pure SQL using WITH RECURSIVE, which provides an iterative approach to SQL result building. Which is more commonly done in a procedural language. The algorithm you describe is an iterative algorithm and so I'm wondering why place the arbitrary restriction on using pure SQL when it likely wouldn't provide a very readable nor performant solution relative to a procedural (pl/pgsql or otherwise) one?
David J.
On 8/19/20 3:08 PM, David G. Johnston wrote: > On Wed, Aug 19, 2020 at 11:51 AM Pól Ua Laoínecháin <linehanp@tcd.ie > <mailto:linehanp@tcd.ie>> wrote: > > > I think my *MAJOR* problem is that I've developed what is, > essentially, a totally brute force approach - and this simply won't > work at the scenario becomes more complex - take a look at the CASE > statement - it's horrible and would only become exponentially worse as > the number NULLs rises. > > So, my question is: Is there a recognised technique (using SQL only, > not PL/pgSQL - soutions based on the latter are easy to find) whereby > I can do a basic Linear Interpolation? > > I don't have a recognized technique, nor care to ponder one right now, but what > you've described would best be done in pure SQL using WITH RECURSIVE, which > provides an iterative approach to SQL result building. Which is more commonly > done in a procedural language. The algorithm you describe is an iterative > algorithm and so I'm wondering why place the arbitrary restriction on using pure > SQL when it likely wouldn't provide a very readable nor performant solution > relative to a procedural (pl/pgsql or otherwise) one? Might want to consider using PL/R with something like the R imputeTS package: https://cran.r-project.org/web/packages/imputeTS/readme/README.html HTH, Joe -- Crunchy Data - http://crunchydata.com PostgreSQL Support for Secure Enterprises Consulting, Training, & Open Source Development
Attachment
Hi, >> So, my question is: Is there a recognised technique (using SQL only, >> not PL/pgSQL - soutions based on the latter are easy to find) whereby >> I can do a basic Linear Interpolation? > I don't have a recognized technique, nor care to ponder one right now, but what you've described would best be done inpure SQL using WITH RECURSIVE, which provides an iterative approach to SQL result building. Which is more commonly donein a procedural language. The algorithm you describe is an iterative algorithm and so I'm wondering why place the arbitraryrestriction on using pure SQL when it likely wouldn't provide a very readable nor performant solution relative toa procedural (pl/pgsql or otherwise) one? Well, it would be trivial to do something with PL/pgSQL (just Google and download the code - there are a few examples around). I''m a big fan of this site https://explainextended.com/ - this guy can do anything with SQL - I'd like to try and learn how to do the same. You're idea of RECURSIVE is exactly along the lines I was thinking of - I'll persevere and see what I can come up with using WITH RECURSIVE - I've use it before. Thanks for your input and rgs, Pól... > David J.
Hi, > Might want to consider using PL/R with something like the R imputeTS package: > https://cran.r-project.org/web/packages/imputeTS/readme/README.html Thanks for your input - looks interesting, but see my reply to David Johnston. Rgs, Pól > Joe